:p
atchew
Login
Another very large pullreq (this one mostly because it has RTH's decodetree conversion series in it), but this should be the last of the really large things in my to-review queue... thanks -- PMM The following changes since commit 83aaec1d5a49f158abaa31797a0f976b3c07e5ca: Merge tag 'pull-tcg-20241212' of https://gitlab.com/rth7680/qemu into staging (2024-12-12 18:45:39 -0500) are available in the Git repository at: https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20241213 for you to fetch changes up to 48e652c4bd9570f6f24def25355cb3009a7300f8: target/arm: Simplify condition for tlbi_el2_cp_reginfo[] (2024-12-13 15:41:09 +0000) ---------------------------------------------------------------- target-arm queue: * Finish conversion of A64 decoder to decodetree * Use float_round_to_odd in helper_fcvtx_f64_to_f32 * Move TLBI insn emulation code out to its own source file * docs/system/arm: fix broken links, document undocumented properties * MAINTAINERS: correct an email address ---------------------------------------------------------------- Brian Cain (1): MAINTAINERS: correct my email address Peter Maydell (10): target/arm: Move some TLBI insns to their own source file target/arm: Move TLBI insns for AArch32 EL2 to tlbi_insn_helper.c target/arm: Move AArch64 TLBI insns from v8_cp_reginfo[] target/arm: Move the AArch64 EL2 TLBI insns target/arm: Move AArch64 EL3 TLBI insns target/arm: Move TLBI range insns target/arm: Move the TLBI OS insns to tlb-insns.c. target/arm: Move small helper functions to tlb-insns.c target/arm: Move RME TLB insns to tlb-insns.c target/arm: Simplify condition for tlbi_el2_cp_reginfo[] Pierrick Bouvier (4): docs/system/arm/orangepi: update links docs/system/arm/fby35: document execute-in-place property docs/system/arm/xlnx-versal-virt: document ospi-flash property docs/system/arm/virt: document missing properties Richard Henderson (70): target/arm: Add section labels for "Data Processing (register)" target/arm: Convert UDIV, SDIV to decodetree target/arm: Convert LSLV, LSRV, ASRV, RORV to decodetree target/arm: Convert CRC32, CRC32C to decodetree target/arm: Convert SUBP, IRG, GMI to decodetree target/arm: Convert PACGA to decodetree target/arm: Convert RBIT, REV16, REV32, REV64 to decodetree target/arm: Convert CLZ, CLS to decodetree target/arm: Convert PAC[ID]*, AUT[ID]* to decodetree target/arm: Convert XPAC[ID] to decodetree target/arm: Convert disas_logic_reg to decodetree target/arm: Convert disas_add_sub_ext_reg to decodetree target/arm: Convert disas_add_sub_reg to decodetree target/arm: Convert disas_data_proc_3src to decodetree target/arm: Convert disas_adc_sbc to decodetree target/arm: Convert RMIF to decodetree target/arm: Convert SETF8, SETF16 to decodetree target/arm: Convert CCMP, CCMN to decodetree target/arm: Convert disas_cond_select to decodetree target/arm: Introduce fp_access_check_scalar_hsd target/arm: Introduce fp_access_check_vector_hsd target/arm: Convert FCMP, FCMPE, FCCMP, FCCMPE to decodetree target/arm: Fix decode of fp16 vector fabs, fneg, fsqrt target/arm: Convert FMOV, FABS, FNEG (scalar) to decodetree target/arm: Pass fpstatus to vfp_sqrt* target/arm: Remove helper_sqrt_f16 target/arm: Convert FSQRT (scalar) to decodetree target/arm: Convert FRINT[NPMSAXI] (scalar) to decodetree target/arm: Convert BFCVT to decodetree target/arm: Convert FRINT{32, 64}[ZX] (scalar) to decodetree target/arm: Convert FCVT (scalar) to decodetree target/arm: Convert handle_fpfpcvt to decodetree target/arm: Convert FJCVTZS to decodetree target/arm: Convert handle_fmov to decodetree target/arm: Convert SQABS, SQNEG to decodetree target/arm: Convert ABS, NEG to decodetree target/arm: Introduce gen_gvec_cls, gen_gvec_clz target/arm: Convert CLS, CLZ (vector) to decodetree target/arm: Introduce gen_gvec_cnt, gen_gvec_rbit target/arm: Convert CNT, NOT, RBIT (vector) to decodetree target/arm: Convert CMGT, CMGE, GMLT, GMLE, CMEQ (zero) to decodetree target/arm: Introduce gen_gvec_rev{16,32,64} target/arm: Convert handle_rev to decodetree target/arm: Move helper_neon_addlp_{s8, s16} to neon_helper.c target/arm: Introduce gen_gvec_{s,u}{add,ada}lp target/arm: Convert handle_2misc_pairwise to decodetree target/arm: Remove helper_neon_{add,sub}l_u{16,32} target/arm: Introduce clear_vec target/arm: Convert XTN, SQXTUN, SQXTN, UQXTN to decodetree target/arm: Convert FCVTN, BFCVTN to decodetree target/arm: Convert FCVTXN to decodetree target/arm: Convert SHLL to decodetree target/arm: Implement gen_gvec_fabs, gen_gvec_fneg target/arm: Convert FABS, FNEG (vector) to decodetree target/arm: Convert FSQRT (vector) to decodetree target/arm: Convert FRINT* (vector) to decodetree target/arm: Convert FCVT* (vector, integer) scalar to decodetree target/arm: Convert FCVT* (vector, fixed-point) scalar to decodetree target/arm: Convert [US]CVTF (vector, integer) scalar to decodetree target/arm: Convert [US]CVTF (vector, fixed-point) scalar to decodetree target/arm: Rename helper_gvec_vcvt_[hf][su] with _rz target/arm: Convert [US]CVTF (vector) to decodetree target/arm: Convert FCVTZ[SU] (vector, fixed-point) to decodetree target/arm: Convert FCVT* (vector, integer) to decodetree target/arm: Convert handle_2misc_fcmp_zero to decodetree target/arm: Convert FRECPE, FRECPX, FRSQRTE to decodetree target/arm: Introduce gen_gvec_urecpe, gen_gvec_ursqrte target/arm: Convert URECPE and URSQRTE to decodetree target/arm: Convert FCVTL to decodetree target/arm: Use float_round_to_odd in helper_fcvtx_f64_to_f32 MAINTAINERS | 2 +- docs/system/arm/fby35.rst | 5 + docs/system/arm/orangepi.rst | 4 +- docs/system/arm/virt.rst | 16 + docs/system/arm/xlnx-versal-virt.rst | 3 + target/arm/helper.h | 43 +- target/arm/internals.h | 9 + target/arm/tcg/helper-a64.h | 7 - target/arm/tcg/translate.h | 35 + target/arm/tcg/a64.decode | 502 ++- target/arm/helper.c | 1208 +------- target/arm/tcg-stubs.c | 5 + target/arm/tcg/gengvec.c | 369 +++ target/arm/tcg/helper-a64.c | 122 +- target/arm/tcg/neon_helper.c | 106 +- target/arm/tcg/tlb-insns.c | 1266 ++++++++ target/arm/tcg/translate-a64.c | 5670 +++++++++++----------------------- target/arm/tcg/translate-neon.c | 337 +- target/arm/tcg/translate-vfp.c | 6 +- target/arm/tcg/vec_helper.c | 65 +- target/arm/vfp_helper.c | 16 +- target/arm/tcg/meson.build | 1 + 22 files changed, 4203 insertions(+), 5594 deletions(-) create mode 100644 target/arm/tcg/tlb-insns.c
From: Richard Henderson <richard.henderson@linaro.org> At the same time, use ### to separate 3rd-level sections. We already use ### for 4.1.92 Data Processing (immediate), but not the two following two third-level sections: 4.1.93 Branches, and 4.1.94 Loads and stores. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ UBFM . 10 100110 . ...... ...... ..... ..... @bitfield_32 EXTR 1 00 100111 1 0 rm:5 imm:6 rn:5 rd:5 &extract sf=1 EXTR 0 00 100111 0 0 rm:5 0 imm:5 rn:5 rd:5 &extract sf=0 -# Branches +### Branches %imm26 0:s26 !function=times_4 @branch . ..... .......................... &i imm=%imm26 @@ -XXX,XX +XXX,XX @@ HLT 1101 0100 010 ................ 000 00 @i16 # DCPS2 1101 0100 101 ................ 000 10 @i16 # DCPS3 1101 0100 101 ................ 000 11 @i16 -# Loads and stores +### Loads and stores &stxr rn rt rt2 rs sz lasr &stlr rn rt sz lasr @@ -XXX,XX +XXX,XX @@ CPYP 00 011 1 01000 ..... .... 01 ..... ..... @cpy CPYM 00 011 1 01010 ..... .... 01 ..... ..... @cpy CPYE 00 011 1 01100 ..... .... 01 ..... ..... @cpy +### Data Processing (register) + +# Data Processing (2-source) +# Data Processing (1-source) +# Logical (shifted reg) +# Add/subtract (shifted reg) +# Add/subtract (extended reg) +# Add/subtract (carry) +# Rotate right into flags +# Evaluate into flags +# Conditional compare (regster) +# Conditional compare (immediate) +# Conditional select +# Data Processing (3-source) + ### Cryptographic AES AESE 01001110 00 10100 00100 10 ..... ..... @r2r_q1e0 -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 7 ++++ target/arm/tcg/translate-a64.c | 64 +++++++++++++++++----------------- 2 files changed, 39 insertions(+), 32 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ &r rn &ri rd imm &rri_sf rd rn imm sf +&rrr_sf rd rn rm sf &i imm &rr_e rd rn esz &rri_e rd rn imm esz @@ -XXX,XX +XXX,XX @@ CPYE 00 011 1 01100 ..... .... 01 ..... ..... @cpy ### Data Processing (register) # Data Processing (2-source) + +@rrr_sf sf:1 .......... rm:5 ...... rn:5 rd:5 &rrr_sf + +UDIV . 00 11010110 ..... 00001 0 ..... ..... @rrr_sf +SDIV . 00 11010110 ..... 00001 1 ..... ..... @rrr_sf + # Data Processing (1-source) # Logical (shifted reg) # Add/subtract (shifted reg) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(UQRSHRN_si, do_scalar_shift_imm_narrow, a, uqrshrn_fns, 0, false) TRANS(SQSHRUN_si, do_scalar_shift_imm_narrow, a, sqshrun_fns, MO_SIGN, false) TRANS(SQRSHRUN_si, do_scalar_shift_imm_narrow, a, sqrshrun_fns, MO_SIGN, false) +static bool do_div(DisasContext *s, arg_rrr_sf *a, bool is_signed) +{ + TCGv_i64 tcg_n, tcg_m, tcg_rd; + tcg_rd = cpu_reg(s, a->rd); + + if (!a->sf && is_signed) { + tcg_n = tcg_temp_new_i64(); + tcg_m = tcg_temp_new_i64(); + tcg_gen_ext32s_i64(tcg_n, cpu_reg(s, a->rn)); + tcg_gen_ext32s_i64(tcg_m, cpu_reg(s, a->rm)); + } else { + tcg_n = read_cpu_reg(s, a->rn, a->sf); + tcg_m = read_cpu_reg(s, a->rm, a->sf); + } + + if (is_signed) { + gen_helper_sdiv64(tcg_rd, tcg_n, tcg_m); + } else { + gen_helper_udiv64(tcg_rd, tcg_n, tcg_m); + } + + if (!a->sf) { /* zero extend final result */ + tcg_gen_ext32u_i64(tcg_rd, tcg_rd); + } + return true; +} + +TRANS(SDIV, do_div, a, true) +TRANS(UDIV, do_div, a, false) + /* Shift a TCGv src by TCGv shift_amount, put result in dst. * Note that it is the caller's responsibility to ensure that the * shift amount is in range (ie 0..31 or 0..63) and provide the ARM @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) #undef MAP } -static void handle_div(DisasContext *s, bool is_signed, unsigned int sf, - unsigned int rm, unsigned int rn, unsigned int rd) -{ - TCGv_i64 tcg_n, tcg_m, tcg_rd; - tcg_rd = cpu_reg(s, rd); - - if (!sf && is_signed) { - tcg_n = tcg_temp_new_i64(); - tcg_m = tcg_temp_new_i64(); - tcg_gen_ext32s_i64(tcg_n, cpu_reg(s, rn)); - tcg_gen_ext32s_i64(tcg_m, cpu_reg(s, rm)); - } else { - tcg_n = read_cpu_reg(s, rn, sf); - tcg_m = read_cpu_reg(s, rm, sf); - } - - if (is_signed) { - gen_helper_sdiv64(tcg_rd, tcg_n, tcg_m); - } else { - gen_helper_udiv64(tcg_rd, tcg_n, tcg_m); - } - - if (!sf) { /* zero extend final result */ - tcg_gen_ext32u_i64(tcg_rd, tcg_rd); - } -} /* LSLV, LSRV, ASRV, RORV */ static void handle_shift_reg(DisasContext *s, @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn) } } break; - case 2: /* UDIV */ - handle_div(s, false, sf, rm, rn, rd); - break; - case 3: /* SDIV */ - handle_div(s, true, sf, rm, rn, rd); - break; case 4: /* IRG */ if (sf == 0 || !dc_isar_feature(aa64_mte_insn_reg, s)) { goto do_unallocated; @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn) } default: do_unallocated: + case 2: /* UDIV */ + case 3: /* SDIV */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 4 +++ target/arm/tcg/translate-a64.c | 46 ++++++++++++++++------------------ 2 files changed, 25 insertions(+), 25 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ CPYE 00 011 1 01100 ..... .... 01 ..... ..... @cpy UDIV . 00 11010110 ..... 00001 0 ..... ..... @rrr_sf SDIV . 00 11010110 ..... 00001 1 ..... ..... @rrr_sf +LSLV . 00 11010110 ..... 00100 0 ..... ..... @rrr_sf +LSRV . 00 11010110 ..... 00100 1 ..... ..... @rrr_sf +ASRV . 00 11010110 ..... 00101 0 ..... ..... @rrr_sf +RORV . 00 11010110 ..... 00101 1 ..... ..... @rrr_sf # Data Processing (1-source) # Logical (shifted reg) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void shift_reg_imm(TCGv_i64 dst, TCGv_i64 src, int sf, } } +static bool do_shift_reg(DisasContext *s, arg_rrr_sf *a, + enum a64_shift_type shift_type) +{ + TCGv_i64 tcg_shift = tcg_temp_new_i64(); + TCGv_i64 tcg_rd = cpu_reg(s, a->rd); + TCGv_i64 tcg_rn = read_cpu_reg(s, a->rn, a->sf); + + tcg_gen_andi_i64(tcg_shift, cpu_reg(s, a->rm), a->sf ? 63 : 31); + shift_reg(tcg_rd, tcg_rn, a->sf, shift_type, tcg_shift); + return true; +} + +TRANS(LSLV, do_shift_reg, a, A64_SHIFT_TYPE_LSL) +TRANS(LSRV, do_shift_reg, a, A64_SHIFT_TYPE_LSR) +TRANS(ASRV, do_shift_reg, a, A64_SHIFT_TYPE_ASR) +TRANS(RORV, do_shift_reg, a, A64_SHIFT_TYPE_ROR) + /* Logical (shifted register) * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 * +----+-----+-----------+-------+---+------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) } -/* LSLV, LSRV, ASRV, RORV */ -static void handle_shift_reg(DisasContext *s, - enum a64_shift_type shift_type, unsigned int sf, - unsigned int rm, unsigned int rn, unsigned int rd) -{ - TCGv_i64 tcg_shift = tcg_temp_new_i64(); - TCGv_i64 tcg_rd = cpu_reg(s, rd); - TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf); - - tcg_gen_andi_i64(tcg_shift, cpu_reg(s, rm), sf ? 63 : 31); - shift_reg(tcg_rd, tcg_rn, sf, shift_type, tcg_shift); -} - /* CRC32[BHWX], CRC32C[BHWX] */ static void handle_crc32(DisasContext *s, unsigned int sf, unsigned int sz, bool crc32c, @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn) tcg_gen_or_i64(cpu_reg(s, rd), cpu_reg(s, rm), t); } break; - case 8: /* LSLV */ - handle_shift_reg(s, A64_SHIFT_TYPE_LSL, sf, rm, rn, rd); - break; - case 9: /* LSRV */ - handle_shift_reg(s, A64_SHIFT_TYPE_LSR, sf, rm, rn, rd); - break; - case 10: /* ASRV */ - handle_shift_reg(s, A64_SHIFT_TYPE_ASR, sf, rm, rn, rd); - break; - case 11: /* RORV */ - handle_shift_reg(s, A64_SHIFT_TYPE_ROR, sf, rm, rn, rd); - break; case 12: /* PACGA */ if (sf == 0 || !dc_isar_feature(aa64_pauth, s)) { goto do_unallocated; @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn) do_unallocated: case 2: /* UDIV */ case 3: /* SDIV */ + case 8: /* LSLV */ + case 9: /* LSRV */ + case 10: /* ASRV */ + case 11: /* RORV */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 12 ++++ target/arm/tcg/translate-a64.c | 101 +++++++++++++-------------------- 2 files changed, 53 insertions(+), 60 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ @rr_d ........ ... ..... ...... rn:5 rd:5 &rr_e esz=3 @rr_sd ........ ... ..... ...... rn:5 rd:5 &rr_e esz=%esz_sd +@rrr_b ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=0 @rrr_h ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=1 +@rrr_s ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=2 @rrr_d ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=3 @rrr_sd ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=%esz_sd @rrr_hsd ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=%esz_hsd @@ -XXX,XX +XXX,XX @@ LSRV . 00 11010110 ..... 00100 1 ..... ..... @rrr_sf ASRV . 00 11010110 ..... 00101 0 ..... ..... @rrr_sf RORV . 00 11010110 ..... 00101 1 ..... ..... @rrr_sf +CRC32 0 00 11010110 ..... 0100 00 ..... ..... @rrr_b +CRC32 0 00 11010110 ..... 0100 01 ..... ..... @rrr_h +CRC32 0 00 11010110 ..... 0100 10 ..... ..... @rrr_s +CRC32 1 00 11010110 ..... 0100 11 ..... ..... @rrr_d + +CRC32C 0 00 11010110 ..... 0101 00 ..... ..... @rrr_b +CRC32C 0 00 11010110 ..... 0101 01 ..... ..... @rrr_h +CRC32C 0 00 11010110 ..... 0101 10 ..... ..... @rrr_s +CRC32C 1 00 11010110 ..... 0101 11 ..... ..... @rrr_d + # Data Processing (1-source) # Logical (shifted reg) # Add/subtract (shifted reg) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(LSRV, do_shift_reg, a, A64_SHIFT_TYPE_LSR) TRANS(ASRV, do_shift_reg, a, A64_SHIFT_TYPE_ASR) TRANS(RORV, do_shift_reg, a, A64_SHIFT_TYPE_ROR) +static bool do_crc32(DisasContext *s, arg_rrr_e *a, bool crc32c) +{ + TCGv_i64 tcg_acc, tcg_val, tcg_rd; + TCGv_i32 tcg_bytes; + + switch (a->esz) { + case MO_8: + case MO_16: + case MO_32: + tcg_val = tcg_temp_new_i64(); + tcg_gen_extract_i64(tcg_val, cpu_reg(s, a->rm), 0, 8 << a->esz); + break; + case MO_64: + tcg_val = cpu_reg(s, a->rm); + break; + default: + g_assert_not_reached(); + } + tcg_acc = cpu_reg(s, a->rn); + tcg_bytes = tcg_constant_i32(1 << a->esz); + tcg_rd = cpu_reg(s, a->rd); + + if (crc32c) { + gen_helper_crc32c_64(tcg_rd, tcg_acc, tcg_val, tcg_bytes); + } else { + gen_helper_crc32_64(tcg_rd, tcg_acc, tcg_val, tcg_bytes); + } + return true; +} + +TRANS_FEAT(CRC32, aa64_crc32, do_crc32, a, false) +TRANS_FEAT(CRC32C, aa64_crc32, do_crc32, a, true) + /* Logical (shifted register) * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 * +----+-----+-----------+-------+---+------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) } -/* CRC32[BHWX], CRC32C[BHWX] */ -static void handle_crc32(DisasContext *s, - unsigned int sf, unsigned int sz, bool crc32c, - unsigned int rm, unsigned int rn, unsigned int rd) -{ - TCGv_i64 tcg_acc, tcg_val; - TCGv_i32 tcg_bytes; - - if (!dc_isar_feature(aa64_crc32, s) - || (sf == 1 && sz != 3) - || (sf == 0 && sz == 3)) { - unallocated_encoding(s); - return; - } - - if (sz == 3) { - tcg_val = cpu_reg(s, rm); - } else { - uint64_t mask; - switch (sz) { - case 0: - mask = 0xFF; - break; - case 1: - mask = 0xFFFF; - break; - case 2: - mask = 0xFFFFFFFF; - break; - default: - g_assert_not_reached(); - } - tcg_val = tcg_temp_new_i64(); - tcg_gen_andi_i64(tcg_val, cpu_reg(s, rm), mask); - } - - tcg_acc = cpu_reg(s, rn); - tcg_bytes = tcg_constant_i32(1 << sz); - - if (crc32c) { - gen_helper_crc32c_64(cpu_reg(s, rd), tcg_acc, tcg_val, tcg_bytes); - } else { - gen_helper_crc32_64(cpu_reg(s, rd), tcg_acc, tcg_val, tcg_bytes); - } -} - /* Data-processing (2 source) * 31 30 29 28 21 20 16 15 10 9 5 4 0 * +----+---+---+-----------------+------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn) gen_helper_pacga(cpu_reg(s, rd), tcg_env, cpu_reg(s, rn), cpu_reg_sp(s, rm)); break; - case 16: - case 17: - case 18: - case 19: - case 20: - case 21: - case 22: - case 23: /* CRC32 */ - { - int sz = extract32(opcode, 0, 2); - bool crc32c = extract32(opcode, 2, 1); - handle_crc32(s, sf, sz, crc32c, rm, rn, rd); - break; - } default: do_unallocated: case 2: /* UDIV */ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn) case 9: /* LSRV */ case 10: /* ASRV */ case 11: /* RORV */ + case 16: + case 17: + case 18: + case 19: + case 20: + case 21: + case 22: + case 23: /* CRC32 */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-6-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 7 +++ target/arm/tcg/translate-a64.c | 94 +++++++++++++++++++--------------- 2 files changed, 59 insertions(+), 42 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ %hlm 11:1 20:2 &r rn +&rrr rd rn rm &ri rd imm &rri_sf rd rn imm sf &rrr_sf rd rn rm sf @@ -XXX,XX +XXX,XX @@ CPYE 00 011 1 01100 ..... .... 01 ..... ..... @cpy # Data Processing (2-source) +@rrr . .......... rm:5 ...... rn:5 rd:5 &rrr @rrr_sf sf:1 .......... rm:5 ...... rn:5 rd:5 &rrr_sf UDIV . 00 11010110 ..... 00001 0 ..... ..... @rrr_sf @@ -XXX,XX +XXX,XX @@ CRC32C 0 00 11010110 ..... 0101 01 ..... ..... @rrr_h CRC32C 0 00 11010110 ..... 0101 10 ..... ..... @rrr_s CRC32C 1 00 11010110 ..... 0101 11 ..... ..... @rrr_d +SUBP 1 00 11010110 ..... 000000 ..... ..... @rrr +SUBPS 1 01 11010110 ..... 000000 ..... ..... @rrr +IRG 1 00 11010110 ..... 000100 ..... ..... @rrr +GMI 1 00 11010110 ..... 000101 ..... ..... @rrr + # Data Processing (1-source) # Logical (shifted reg) # Add/subtract (shifted reg) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_crc32(DisasContext *s, arg_rrr_e *a, bool crc32c) TRANS_FEAT(CRC32, aa64_crc32, do_crc32, a, false) TRANS_FEAT(CRC32C, aa64_crc32, do_crc32, a, true) +static bool do_subp(DisasContext *s, arg_rrr *a, bool setflag) +{ + TCGv_i64 tcg_n = read_cpu_reg_sp(s, a->rn, true); + TCGv_i64 tcg_m = read_cpu_reg_sp(s, a->rm, true); + TCGv_i64 tcg_d = cpu_reg(s, a->rd); + + tcg_gen_sextract_i64(tcg_n, tcg_n, 0, 56); + tcg_gen_sextract_i64(tcg_m, tcg_m, 0, 56); + + if (setflag) { + gen_sub_CC(true, tcg_d, tcg_n, tcg_m); + } else { + tcg_gen_sub_i64(tcg_d, tcg_n, tcg_m); + } + return true; +} + +TRANS_FEAT(SUBP, aa64_mte_insn_reg, do_subp, a, false) +TRANS_FEAT(SUBPS, aa64_mte_insn_reg, do_subp, a, true) + +static bool trans_IRG(DisasContext *s, arg_rrr *a) +{ + if (dc_isar_feature(aa64_mte_insn_reg, s)) { + TCGv_i64 tcg_rd = cpu_reg_sp(s, a->rd); + TCGv_i64 tcg_rn = cpu_reg_sp(s, a->rn); + + if (s->ata[0]) { + gen_helper_irg(tcg_rd, tcg_env, tcg_rn, cpu_reg(s, a->rm)); + } else { + gen_address_with_allocation_tag0(tcg_rd, tcg_rn); + } + return true; + } + return false; +} + +static bool trans_GMI(DisasContext *s, arg_rrr *a) +{ + if (dc_isar_feature(aa64_mte_insn_reg, s)) { + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_extract_i64(t, cpu_reg_sp(s, a->rn), 56, 4); + tcg_gen_shl_i64(t, tcg_constant_i64(1), t); + tcg_gen_or_i64(cpu_reg(s, a->rd), cpu_reg(s, a->rm), t); + return true; + } + return false; +} + /* Logical (shifted register) * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 * +----+-----+-----------+-------+---+------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn) } switch (opcode) { - case 0: /* SUBP(S) */ - if (sf == 0 || !dc_isar_feature(aa64_mte_insn_reg, s)) { - goto do_unallocated; - } else { - TCGv_i64 tcg_n, tcg_m, tcg_d; - - tcg_n = read_cpu_reg_sp(s, rn, true); - tcg_m = read_cpu_reg_sp(s, rm, true); - tcg_gen_sextract_i64(tcg_n, tcg_n, 0, 56); - tcg_gen_sextract_i64(tcg_m, tcg_m, 0, 56); - tcg_d = cpu_reg(s, rd); - - if (setflag) { - gen_sub_CC(true, tcg_d, tcg_n, tcg_m); - } else { - tcg_gen_sub_i64(tcg_d, tcg_n, tcg_m); - } - } - break; - case 4: /* IRG */ - if (sf == 0 || !dc_isar_feature(aa64_mte_insn_reg, s)) { - goto do_unallocated; - } - if (s->ata[0]) { - gen_helper_irg(cpu_reg_sp(s, rd), tcg_env, - cpu_reg_sp(s, rn), cpu_reg(s, rm)); - } else { - gen_address_with_allocation_tag0(cpu_reg_sp(s, rd), - cpu_reg_sp(s, rn)); - } - break; - case 5: /* GMI */ - if (sf == 0 || !dc_isar_feature(aa64_mte_insn_reg, s)) { - goto do_unallocated; - } else { - TCGv_i64 t = tcg_temp_new_i64(); - - tcg_gen_extract_i64(t, cpu_reg_sp(s, rn), 56, 4); - tcg_gen_shl_i64(t, tcg_constant_i64(1), t); - tcg_gen_or_i64(cpu_reg(s, rd), cpu_reg(s, rm), t); - } - break; case 12: /* PACGA */ if (sf == 0 || !dc_isar_feature(aa64_pauth, s)) { goto do_unallocated; @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn) break; default: do_unallocated: + case 0: /* SUBP(S) */ case 2: /* UDIV */ case 3: /* SDIV */ + case 4: /* IRG */ + case 5: /* GMI */ case 8: /* LSLV */ case 9: /* LSRV */ case 10: /* ASRV */ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove disas_data_proc_2src, as this was the last insn decoded by that function. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 2 ++ target/arm/tcg/translate-a64.c | 65 ++++++---------------------------- 2 files changed, 13 insertions(+), 54 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SUBPS 1 01 11010110 ..... 000000 ..... ..... @rrr IRG 1 00 11010110 ..... 000100 ..... ..... @rrr GMI 1 00 11010110 ..... 000101 ..... ..... @rrr +PACGA 1 00 11010110 ..... 001100 ..... ..... @rrr + # Data Processing (1-source) # Logical (shifted reg) # Add/subtract (shifted reg) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_GMI(DisasContext *s, arg_rrr *a) return false; } +static bool trans_PACGA(DisasContext *s, arg_rrr *a) +{ + if (dc_isar_feature(aa64_pauth, s)) { + gen_helper_pacga(cpu_reg(s, a->rd), tcg_env, + cpu_reg(s, a->rn), cpu_reg_sp(s, a->rm)); + return true; + } + return false; +} + /* Logical (shifted register) * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 * +----+-----+-----------+-------+---+------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) } -/* Data-processing (2 source) - * 31 30 29 28 21 20 16 15 10 9 5 4 0 - * +----+---+---+-----------------+------+--------+------+------+ - * | sf | 0 | S | 1 1 0 1 0 1 1 0 | Rm | opcode | Rn | Rd | - * +----+---+---+-----------------+------+--------+------+------+ - */ -static void disas_data_proc_2src(DisasContext *s, uint32_t insn) -{ - unsigned int sf, rm, opcode, rn, rd, setflag; - sf = extract32(insn, 31, 1); - setflag = extract32(insn, 29, 1); - rm = extract32(insn, 16, 5); - opcode = extract32(insn, 10, 6); - rn = extract32(insn, 5, 5); - rd = extract32(insn, 0, 5); - - if (setflag && opcode != 0) { - unallocated_encoding(s); - return; - } - - switch (opcode) { - case 12: /* PACGA */ - if (sf == 0 || !dc_isar_feature(aa64_pauth, s)) { - goto do_unallocated; - } - gen_helper_pacga(cpu_reg(s, rd), tcg_env, - cpu_reg(s, rn), cpu_reg_sp(s, rm)); - break; - default: - do_unallocated: - case 0: /* SUBP(S) */ - case 2: /* UDIV */ - case 3: /* SDIV */ - case 4: /* IRG */ - case 5: /* GMI */ - case 8: /* LSLV */ - case 9: /* LSRV */ - case 10: /* ASRV */ - case 11: /* RORV */ - case 16: - case 17: - case 18: - case 19: - case 20: - case 21: - case 22: - case 23: /* CRC32 */ - unallocated_encoding(s); - break; - } -} - /* * Data processing - register * 31 30 29 28 25 21 20 16 10 0 @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) if (op0) { /* (1 source) */ disas_data_proc_1src(s, insn); } else { /* (2 source) */ - disas_data_proc_2src(s, insn); + goto do_unallocated; } break; case 0x8 ... 0xf: /* (3 source) */ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-8-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 11 +++ target/arm/tcg/translate-a64.c | 137 +++++++++++++++------------------ 2 files changed, 72 insertions(+), 76 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ &r rn &rrr rd rn rm &ri rd imm +&rr rd rn +&rr_sf rd rn sf &rri_sf rd rn imm sf &rrr_sf rd rn rm sf &i imm @@ -XXX,XX +XXX,XX @@ GMI 1 00 11010110 ..... 000101 ..... ..... @rrr PACGA 1 00 11010110 ..... 001100 ..... ..... @rrr # Data Processing (1-source) + +@rr . .......... ..... ...... rn:5 rd:5 &rr +@rr_sf sf:1 .......... ..... ...... rn:5 rd:5 &rr_sf + +RBIT . 10 11010110 00000 000000 ..... ..... @rr_sf +REV16 . 10 11010110 00000 000001 ..... ..... @rr_sf +REV32 . 10 11010110 00000 000010 ..... ..... @rr_sf +REV64 1 10 11010110 00000 000011 ..... ..... @rr + # Logical (shifted reg) # Add/subtract (shifted reg) # Add/subtract (extended reg) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_PACGA(DisasContext *s, arg_rrr *a) return false; } +typedef void ArithOneOp(TCGv_i64, TCGv_i64); + +static bool gen_rr(DisasContext *s, int rd, int rn, ArithOneOp fn) +{ + fn(cpu_reg(s, rd), cpu_reg(s, rn)); + return true; +} + +static void gen_rbit32(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn) +{ + TCGv_i32 t32 = tcg_temp_new_i32(); + + tcg_gen_extrl_i64_i32(t32, tcg_rn); + gen_helper_rbit(t32, t32); + tcg_gen_extu_i32_i64(tcg_rd, t32); +} + +static void gen_rev16_xx(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i64 mask) +{ + TCGv_i64 tcg_tmp = tcg_temp_new_i64(); + + tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8); + tcg_gen_and_i64(tcg_rd, tcg_rn, mask); + tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask); + tcg_gen_shli_i64(tcg_rd, tcg_rd, 8); + tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp); +} + +static void gen_rev16_32(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn) +{ + gen_rev16_xx(tcg_rd, tcg_rn, tcg_constant_i64(0x00ff00ff)); +} + +static void gen_rev16_64(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn) +{ + gen_rev16_xx(tcg_rd, tcg_rn, tcg_constant_i64(0x00ff00ff00ff00ffull)); +} + +static void gen_rev_32(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn) +{ + tcg_gen_bswap32_i64(tcg_rd, tcg_rn, TCG_BSWAP_OZ); +} + +static void gen_rev32(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn) +{ + tcg_gen_bswap64_i64(tcg_rd, tcg_rn); + tcg_gen_rotri_i64(tcg_rd, tcg_rd, 32); +} + +TRANS(RBIT, gen_rr, a->rd, a->rn, a->sf ? gen_helper_rbit64 : gen_rbit32) +TRANS(REV16, gen_rr, a->rd, a->rn, a->sf ? gen_rev16_64 : gen_rev16_32) +TRANS(REV32, gen_rr, a->rd, a->rn, a->sf ? gen_rev32 : gen_rev_32) +TRANS(REV64, gen_rr, a->rd, a->rn, tcg_gen_bswap64_i64) + /* Logical (shifted register) * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 * +----+-----+-----------+-------+---+------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void handle_cls(DisasContext *s, unsigned int sf, } } -static void handle_rbit(DisasContext *s, unsigned int sf, - unsigned int rn, unsigned int rd) -{ - TCGv_i64 tcg_rd, tcg_rn; - tcg_rd = cpu_reg(s, rd); - tcg_rn = cpu_reg(s, rn); - - if (sf) { - gen_helper_rbit64(tcg_rd, tcg_rn); - } else { - TCGv_i32 tcg_tmp32 = tcg_temp_new_i32(); - tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn); - gen_helper_rbit(tcg_tmp32, tcg_tmp32); - tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32); - } -} - -/* REV with sf==1, opcode==3 ("REV64") */ -static void handle_rev64(DisasContext *s, unsigned int sf, - unsigned int rn, unsigned int rd) -{ - if (!sf) { - unallocated_encoding(s); - return; - } - tcg_gen_bswap64_i64(cpu_reg(s, rd), cpu_reg(s, rn)); -} - -/* REV with sf==0, opcode==2 - * REV32 (sf==1, opcode==2) - */ -static void handle_rev32(DisasContext *s, unsigned int sf, - unsigned int rn, unsigned int rd) -{ - TCGv_i64 tcg_rd = cpu_reg(s, rd); - TCGv_i64 tcg_rn = cpu_reg(s, rn); - - if (sf) { - tcg_gen_bswap64_i64(tcg_rd, tcg_rn); - tcg_gen_rotri_i64(tcg_rd, tcg_rd, 32); - } else { - tcg_gen_bswap32_i64(tcg_rd, tcg_rn, TCG_BSWAP_OZ); - } -} - -/* REV16 (opcode==1) */ -static void handle_rev16(DisasContext *s, unsigned int sf, - unsigned int rn, unsigned int rd) -{ - TCGv_i64 tcg_rd = cpu_reg(s, rd); - TCGv_i64 tcg_tmp = tcg_temp_new_i64(); - TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf); - TCGv_i64 mask = tcg_constant_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff); - - tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8); - tcg_gen_and_i64(tcg_rd, tcg_rn, mask); - tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask); - tcg_gen_shli_i64(tcg_rd, tcg_rd, 8); - tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp); -} - /* Data-processing (1 source) * 31 30 29 28 21 20 16 15 10 9 5 4 0 * +----+---+---+-----------------+---------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) #define MAP(SF, O2, O1) ((SF) | (O1 << 1) | (O2 << 7)) switch (MAP(sf, opcode2, opcode)) { - case MAP(0, 0x00, 0x00): /* RBIT */ - case MAP(1, 0x00, 0x00): - handle_rbit(s, sf, rn, rd); - break; - case MAP(0, 0x00, 0x01): /* REV16 */ - case MAP(1, 0x00, 0x01): - handle_rev16(s, sf, rn, rd); - break; - case MAP(0, 0x00, 0x02): /* REV/REV32 */ - case MAP(1, 0x00, 0x02): - handle_rev32(s, sf, rn, rd); - break; - case MAP(1, 0x00, 0x03): /* REV64 */ - handle_rev64(s, sf, rn, rd); - break; case MAP(0, 0x00, 0x04): /* CLZ */ case MAP(1, 0x00, 0x04): handle_clz(s, sf, rn, rd); @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) break; default: do_unallocated: + case MAP(0, 0x00, 0x00): /* RBIT */ + case MAP(1, 0x00, 0x00): + case MAP(0, 0x00, 0x01): /* REV16 */ + case MAP(1, 0x00, 0x01): + case MAP(0, 0x00, 0x02): /* REV/REV32 */ + case MAP(1, 0x00, 0x02): + case MAP(1, 0x00, 0x03): /* REV64 */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-9-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 3 ++ target/arm/tcg/translate-a64.c | 72 ++++++++++++++-------------------- 2 files changed, 33 insertions(+), 42 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ REV16 . 10 11010110 00000 000001 ..... ..... @rr_sf REV32 . 10 11010110 00000 000010 ..... ..... @rr_sf REV64 1 10 11010110 00000 000011 ..... ..... @rr +CLZ . 10 11010110 00000 000100 ..... ..... @rr_sf +CLS . 10 11010110 00000 000101 ..... ..... @rr_sf + # Logical (shifted reg) # Add/subtract (shifted reg) # Add/subtract (extended reg) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(REV16, gen_rr, a->rd, a->rn, a->sf ? gen_rev16_64 : gen_rev16_32) TRANS(REV32, gen_rr, a->rd, a->rn, a->sf ? gen_rev32 : gen_rev_32) TRANS(REV64, gen_rr, a->rd, a->rn, tcg_gen_bswap64_i64) +static void gen_clz32(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn) +{ + TCGv_i32 t32 = tcg_temp_new_i32(); + + tcg_gen_extrl_i64_i32(t32, tcg_rn); + tcg_gen_clzi_i32(t32, t32, 32); + tcg_gen_extu_i32_i64(tcg_rd, t32); +} + +static void gen_clz64(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn) +{ + tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64); +} + +static void gen_cls32(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn) +{ + TCGv_i32 t32 = tcg_temp_new_i32(); + + tcg_gen_extrl_i64_i32(t32, tcg_rn); + tcg_gen_clrsb_i32(t32, t32); + tcg_gen_extu_i32_i64(tcg_rd, t32); +} + +TRANS(CLZ, gen_rr, a->rd, a->rn, a->sf ? gen_clz64 : gen_clz32) +TRANS(CLS, gen_rr, a->rd, a->rn, a->sf ? tcg_gen_clrsb_i64 : gen_cls32) + /* Logical (shifted register) * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 * +----+-----+-----------+-------+---+------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_cond_select(DisasContext *s, uint32_t insn) } } -static void handle_clz(DisasContext *s, unsigned int sf, - unsigned int rn, unsigned int rd) -{ - TCGv_i64 tcg_rd, tcg_rn; - tcg_rd = cpu_reg(s, rd); - tcg_rn = cpu_reg(s, rn); - - if (sf) { - tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64); - } else { - TCGv_i32 tcg_tmp32 = tcg_temp_new_i32(); - tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn); - tcg_gen_clzi_i32(tcg_tmp32, tcg_tmp32, 32); - tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32); - } -} - -static void handle_cls(DisasContext *s, unsigned int sf, - unsigned int rn, unsigned int rd) -{ - TCGv_i64 tcg_rd, tcg_rn; - tcg_rd = cpu_reg(s, rd); - tcg_rn = cpu_reg(s, rn); - - if (sf) { - tcg_gen_clrsb_i64(tcg_rd, tcg_rn); - } else { - TCGv_i32 tcg_tmp32 = tcg_temp_new_i32(); - tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn); - tcg_gen_clrsb_i32(tcg_tmp32, tcg_tmp32); - tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32); - } -} - /* Data-processing (1 source) * 31 30 29 28 21 20 16 15 10 9 5 4 0 * +----+---+---+-----------------+---------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) #define MAP(SF, O2, O1) ((SF) | (O1 << 1) | (O2 << 7)) switch (MAP(sf, opcode2, opcode)) { - case MAP(0, 0x00, 0x04): /* CLZ */ - case MAP(1, 0x00, 0x04): - handle_clz(s, sf, rn, rd); - break; - case MAP(0, 0x00, 0x05): /* CLS */ - case MAP(1, 0x00, 0x05): - handle_cls(s, sf, rn, rd); - break; case MAP(1, 0x01, 0x00): /* PACIA */ if (s->pauth_active) { tcg_rd = cpu_reg(s, rd); @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) case MAP(0, 0x00, 0x02): /* REV/REV32 */ case MAP(1, 0x00, 0x02): case MAP(1, 0x00, 0x03): /* REV64 */ + case MAP(0, 0x00, 0x04): /* CLZ */ + case MAP(1, 0x00, 0x04): + case MAP(0, 0x00, 0x05): /* CLS */ + case MAP(1, 0x00, 0x05): unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes PACIA, PACIZA, PACIB, PACIZB, PACDA, PACDZA, PACDB, PACDZB, AUTIA, AUTIZA, AUTIB, AUTIZB, AUTDA, AUTDZA, AUTDB, AUTDZB. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-10-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 13 +++ target/arm/tcg/translate-a64.c | 173 +++++++++------------------------ 2 files changed, 58 insertions(+), 128 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ REV64 1 10 11010110 00000 000011 ..... ..... @rr CLZ . 10 11010110 00000 000100 ..... ..... @rr_sf CLS . 10 11010110 00000 000101 ..... ..... @rr_sf +&pacaut rd rn z +@pacaut . .. ........ ..... .. z:1 ... rn:5 rd:5 &pacaut + +PACIA 1 10 11010110 00001 00.000 ..... ..... @pacaut +PACIB 1 10 11010110 00001 00.001 ..... ..... @pacaut +PACDA 1 10 11010110 00001 00.010 ..... ..... @pacaut +PACDB 1 10 11010110 00001 00.011 ..... ..... @pacaut + +AUTIA 1 10 11010110 00001 00.100 ..... ..... @pacaut +AUTIB 1 10 11010110 00001 00.101 ..... ..... @pacaut +AUTDA 1 10 11010110 00001 00.110 ..... ..... @pacaut +AUTDB 1 10 11010110 00001 00.111 ..... ..... @pacaut + # Logical (shifted reg) # Add/subtract (shifted reg) # Add/subtract (extended reg) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void gen_cls32(TCGv_i64 tcg_rd, TCGv_i64 tcg_rn) TRANS(CLZ, gen_rr, a->rd, a->rn, a->sf ? gen_clz64 : gen_clz32) TRANS(CLS, gen_rr, a->rd, a->rn, a->sf ? tcg_gen_clrsb_i64 : gen_cls32) +static bool gen_pacaut(DisasContext *s, arg_pacaut *a, NeonGenTwo64OpEnvFn fn) +{ + TCGv_i64 tcg_rd, tcg_rn; + + if (a->z) { + if (a->rn != 31) { + return false; + } + tcg_rn = tcg_constant_i64(0); + } else { + tcg_rn = cpu_reg_sp(s, a->rn); + } + if (s->pauth_active) { + tcg_rd = cpu_reg(s, a->rd); + fn(tcg_rd, tcg_env, tcg_rd, tcg_rn); + } + return true; +} + +TRANS_FEAT(PACIA, aa64_pauth, gen_pacaut, a, gen_helper_pacia) +TRANS_FEAT(PACIB, aa64_pauth, gen_pacaut, a, gen_helper_pacib) +TRANS_FEAT(PACDA, aa64_pauth, gen_pacaut, a, gen_helper_pacda) +TRANS_FEAT(PACDB, aa64_pauth, gen_pacaut, a, gen_helper_pacdb) + +TRANS_FEAT(AUTIA, aa64_pauth, gen_pacaut, a, gen_helper_autia) +TRANS_FEAT(AUTIB, aa64_pauth, gen_pacaut, a, gen_helper_autib) +TRANS_FEAT(AUTDA, aa64_pauth, gen_pacaut, a, gen_helper_autda) +TRANS_FEAT(AUTDB, aa64_pauth, gen_pacaut, a, gen_helper_autdb) + /* Logical (shifted register) * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 * +----+-----+-----------+-------+---+------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) #define MAP(SF, O2, O1) ((SF) | (O1 << 1) | (O2 << 7)) switch (MAP(sf, opcode2, opcode)) { - case MAP(1, 0x01, 0x00): /* PACIA */ - if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_pacia(tcg_rd, tcg_env, tcg_rd, cpu_reg_sp(s, rn)); - } else if (!dc_isar_feature(aa64_pauth, s)) { - goto do_unallocated; - } - break; - case MAP(1, 0x01, 0x01): /* PACIB */ - if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_pacib(tcg_rd, tcg_env, tcg_rd, cpu_reg_sp(s, rn)); - } else if (!dc_isar_feature(aa64_pauth, s)) { - goto do_unallocated; - } - break; - case MAP(1, 0x01, 0x02): /* PACDA */ - if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_pacda(tcg_rd, tcg_env, tcg_rd, cpu_reg_sp(s, rn)); - } else if (!dc_isar_feature(aa64_pauth, s)) { - goto do_unallocated; - } - break; - case MAP(1, 0x01, 0x03): /* PACDB */ - if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_pacdb(tcg_rd, tcg_env, tcg_rd, cpu_reg_sp(s, rn)); - } else if (!dc_isar_feature(aa64_pauth, s)) { - goto do_unallocated; - } - break; - case MAP(1, 0x01, 0x04): /* AUTIA */ - if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_autia(tcg_rd, tcg_env, tcg_rd, cpu_reg_sp(s, rn)); - } else if (!dc_isar_feature(aa64_pauth, s)) { - goto do_unallocated; - } - break; - case MAP(1, 0x01, 0x05): /* AUTIB */ - if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_autib(tcg_rd, tcg_env, tcg_rd, cpu_reg_sp(s, rn)); - } else if (!dc_isar_feature(aa64_pauth, s)) { - goto do_unallocated; - } - break; - case MAP(1, 0x01, 0x06): /* AUTDA */ - if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_autda(tcg_rd, tcg_env, tcg_rd, cpu_reg_sp(s, rn)); - } else if (!dc_isar_feature(aa64_pauth, s)) { - goto do_unallocated; - } - break; - case MAP(1, 0x01, 0x07): /* AUTDB */ - if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_autdb(tcg_rd, tcg_env, tcg_rd, cpu_reg_sp(s, rn)); - } else if (!dc_isar_feature(aa64_pauth, s)) { - goto do_unallocated; - } - break; - case MAP(1, 0x01, 0x08): /* PACIZA */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_pacia(tcg_rd, tcg_env, tcg_rd, tcg_constant_i64(0)); - } - break; - case MAP(1, 0x01, 0x09): /* PACIZB */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_pacib(tcg_rd, tcg_env, tcg_rd, tcg_constant_i64(0)); - } - break; - case MAP(1, 0x01, 0x0a): /* PACDZA */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_pacda(tcg_rd, tcg_env, tcg_rd, tcg_constant_i64(0)); - } - break; - case MAP(1, 0x01, 0x0b): /* PACDZB */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_pacdb(tcg_rd, tcg_env, tcg_rd, tcg_constant_i64(0)); - } - break; - case MAP(1, 0x01, 0x0c): /* AUTIZA */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_autia(tcg_rd, tcg_env, tcg_rd, tcg_constant_i64(0)); - } - break; - case MAP(1, 0x01, 0x0d): /* AUTIZB */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_autib(tcg_rd, tcg_env, tcg_rd, tcg_constant_i64(0)); - } - break; - case MAP(1, 0x01, 0x0e): /* AUTDZA */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_autda(tcg_rd, tcg_env, tcg_rd, tcg_constant_i64(0)); - } - break; - case MAP(1, 0x01, 0x0f): /* AUTDZB */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_autdb(tcg_rd, tcg_env, tcg_rd, tcg_constant_i64(0)); - } - break; case MAP(1, 0x01, 0x10): /* XPACI */ if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { goto do_unallocated; @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) case MAP(1, 0x00, 0x04): case MAP(0, 0x00, 0x05): /* CLS */ case MAP(1, 0x00, 0x05): + case MAP(1, 0x01, 0x00): /* PACIA */ + case MAP(1, 0x01, 0x01): /* PACIB */ + case MAP(1, 0x01, 0x02): /* PACDA */ + case MAP(1, 0x01, 0x03): /* PACDB */ + case MAP(1, 0x01, 0x04): /* AUTIA */ + case MAP(1, 0x01, 0x05): /* AUTIB */ + case MAP(1, 0x01, 0x06): /* AUTDA */ + case MAP(1, 0x01, 0x07): /* AUTDB */ + case MAP(1, 0x01, 0x08): /* PACIZA */ + case MAP(1, 0x01, 0x09): /* PACIZB */ + case MAP(1, 0x01, 0x0a): /* PACDZA */ + case MAP(1, 0x01, 0x0b): /* PACDZB */ + case MAP(1, 0x01, 0x0c): /* AUTIZA */ + case MAP(1, 0x01, 0x0d): /* AUTIZB */ + case MAP(1, 0x01, 0x0e): /* AUTDZA */ + case MAP(1, 0x01, 0x0f): /* AUTDZB */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove disas_data_proc_1src, as these were the last insns decoded by that function. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-11-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 3 ++ target/arm/tcg/translate-a64.c | 99 +++++----------------------------- 2 files changed, 16 insertions(+), 86 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ AUTIB 1 10 11010110 00001 00.101 ..... ..... @pacaut AUTDA 1 10 11010110 00001 00.110 ..... ..... @pacaut AUTDB 1 10 11010110 00001 00.111 ..... ..... @pacaut +XPACI 1 10 11010110 00001 010000 11111 rd:5 +XPACD 1 10 11010110 00001 010001 11111 rd:5 + # Logical (shifted reg) # Add/subtract (shifted reg) # Add/subtract (extended reg) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(AUTIB, aa64_pauth, gen_pacaut, a, gen_helper_autib) TRANS_FEAT(AUTDA, aa64_pauth, gen_pacaut, a, gen_helper_autda) TRANS_FEAT(AUTDB, aa64_pauth, gen_pacaut, a, gen_helper_autdb) +static bool do_xpac(DisasContext *s, int rd, NeonGenOne64OpEnvFn *fn) +{ + if (s->pauth_active) { + TCGv_i64 tcg_rd = cpu_reg(s, rd); + fn(tcg_rd, tcg_env, tcg_rd); + } + return true; +} + +TRANS_FEAT(XPACI, aa64_pauth, do_xpac, a->rd, gen_helper_xpaci) +TRANS_FEAT(XPACD, aa64_pauth, do_xpac, a->rd, gen_helper_xpacd) + /* Logical (shifted register) * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 * +----+-----+-----------+-------+---+------+--------+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_cond_select(DisasContext *s, uint32_t insn) } } -/* Data-processing (1 source) - * 31 30 29 28 21 20 16 15 10 9 5 4 0 - * +----+---+---+-----------------+---------+--------+------+------+ - * | sf | 1 | S | 1 1 0 1 0 1 1 0 | opcode2 | opcode | Rn | Rd | - * +----+---+---+-----------------+---------+--------+------+------+ - */ -static void disas_data_proc_1src(DisasContext *s, uint32_t insn) -{ - unsigned int sf, opcode, opcode2, rn, rd; - TCGv_i64 tcg_rd; - - if (extract32(insn, 29, 1)) { - unallocated_encoding(s); - return; - } - - sf = extract32(insn, 31, 1); - opcode = extract32(insn, 10, 6); - opcode2 = extract32(insn, 16, 5); - rn = extract32(insn, 5, 5); - rd = extract32(insn, 0, 5); - -#define MAP(SF, O2, O1) ((SF) | (O1 << 1) | (O2 << 7)) - - switch (MAP(sf, opcode2, opcode)) { - case MAP(1, 0x01, 0x10): /* XPACI */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_xpaci(tcg_rd, tcg_env, tcg_rd); - } - break; - case MAP(1, 0x01, 0x11): /* XPACD */ - if (!dc_isar_feature(aa64_pauth, s) || rn != 31) { - goto do_unallocated; - } else if (s->pauth_active) { - tcg_rd = cpu_reg(s, rd); - gen_helper_xpacd(tcg_rd, tcg_env, tcg_rd); - } - break; - default: - do_unallocated: - case MAP(0, 0x00, 0x00): /* RBIT */ - case MAP(1, 0x00, 0x00): - case MAP(0, 0x00, 0x01): /* REV16 */ - case MAP(1, 0x00, 0x01): - case MAP(0, 0x00, 0x02): /* REV/REV32 */ - case MAP(1, 0x00, 0x02): - case MAP(1, 0x00, 0x03): /* REV64 */ - case MAP(0, 0x00, 0x04): /* CLZ */ - case MAP(1, 0x00, 0x04): - case MAP(0, 0x00, 0x05): /* CLS */ - case MAP(1, 0x00, 0x05): - case MAP(1, 0x01, 0x00): /* PACIA */ - case MAP(1, 0x01, 0x01): /* PACIB */ - case MAP(1, 0x01, 0x02): /* PACDA */ - case MAP(1, 0x01, 0x03): /* PACDB */ - case MAP(1, 0x01, 0x04): /* AUTIA */ - case MAP(1, 0x01, 0x05): /* AUTIB */ - case MAP(1, 0x01, 0x06): /* AUTDA */ - case MAP(1, 0x01, 0x07): /* AUTDB */ - case MAP(1, 0x01, 0x08): /* PACIZA */ - case MAP(1, 0x01, 0x09): /* PACIZB */ - case MAP(1, 0x01, 0x0a): /* PACDZA */ - case MAP(1, 0x01, 0x0b): /* PACDZB */ - case MAP(1, 0x01, 0x0c): /* AUTIZA */ - case MAP(1, 0x01, 0x0d): /* AUTIZB */ - case MAP(1, 0x01, 0x0e): /* AUTDZA */ - case MAP(1, 0x01, 0x0f): /* AUTDZB */ - unallocated_encoding(s); - break; - } - -#undef MAP -} - - /* * Data processing - register * 31 30 29 28 25 21 20 16 10 0 @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn) */ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) { - int op0 = extract32(insn, 30, 1); int op1 = extract32(insn, 28, 1); int op2 = extract32(insn, 21, 4); int op3 = extract32(insn, 10, 6); @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) disas_cond_select(s, insn); break; - case 0x6: /* Data-processing */ - if (op0) { /* (1 source) */ - disas_data_proc_1src(s, insn); - } else { /* (2 source) */ - goto do_unallocated; - } - break; case 0x8 ... 0xf: /* (3 source) */ disas_data_proc_3src(s, insn); break; default: do_unallocated: + case 0x6: /* Data-processing */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes AND, BIC, ORR, ORN, EOR, EON, ANDS, BICS (shifted reg). Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-12-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 9 +++ target/arm/tcg/translate-a64.c | 117 ++++++++++++--------------------- 2 files changed, 51 insertions(+), 75 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ XPACI 1 10 11010110 00001 010000 11111 rd:5 XPACD 1 10 11010110 00001 010001 11111 rd:5 # Logical (shifted reg) + +&logic_shift rd rn rm sf sa st n +@logic_shift sf:1 .. ..... st:2 n:1 rm:5 sa:6 rn:5 rd:5 &logic_shift + +AND_r . 00 01010 .. . ..... ...... ..... ..... @logic_shift +ORR_r . 01 01010 .. . ..... ...... ..... ..... @logic_shift +EOR_r . 10 01010 .. . ..... ...... ..... ..... @logic_shift +ANDS_r . 11 01010 .. . ..... ...... ..... ..... @logic_shift + # Add/subtract (shifted reg) # Add/subtract (extended reg) # Add/subtract (carry) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_xpac(DisasContext *s, int rd, NeonGenOne64OpEnvFn *fn) TRANS_FEAT(XPACI, aa64_pauth, do_xpac, a->rd, gen_helper_xpaci) TRANS_FEAT(XPACD, aa64_pauth, do_xpac, a->rd, gen_helper_xpacd) -/* Logical (shifted register) - * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 - * +----+-----+-----------+-------+---+------+--------+------+------+ - * | sf | opc | 0 1 0 1 0 | shift | N | Rm | imm6 | Rn | Rd | - * +----+-----+-----------+-------+---+------+--------+------+------+ - */ -static void disas_logic_reg(DisasContext *s, uint32_t insn) +static bool do_logic_reg(DisasContext *s, arg_logic_shift *a, + ArithTwoOp *fn, ArithTwoOp *inv_fn, bool setflags) { TCGv_i64 tcg_rd, tcg_rn, tcg_rm; - unsigned int sf, opc, shift_type, invert, rm, shift_amount, rn, rd; - sf = extract32(insn, 31, 1); - opc = extract32(insn, 29, 2); - shift_type = extract32(insn, 22, 2); - invert = extract32(insn, 21, 1); - rm = extract32(insn, 16, 5); - shift_amount = extract32(insn, 10, 6); - rn = extract32(insn, 5, 5); - rd = extract32(insn, 0, 5); - - if (!sf && (shift_amount & (1 << 5))) { - unallocated_encoding(s); - return; + if (!a->sf && (a->sa & (1 << 5))) { + return false; } - tcg_rd = cpu_reg(s, rd); + tcg_rd = cpu_reg(s, a->rd); + tcg_rn = cpu_reg(s, a->rn); - if (opc == 1 && shift_amount == 0 && shift_type == 0 && rn == 31) { - /* Unshifted ORR and ORN with WZR/XZR is the standard encoding for - * register-register MOV and MVN, so it is worth special casing. - */ - tcg_rm = cpu_reg(s, rm); - if (invert) { + tcg_rm = read_cpu_reg(s, a->rm, a->sf); + if (a->sa) { + shift_reg_imm(tcg_rm, tcg_rm, a->sf, a->st, a->sa); + } + + (a->n ? inv_fn : fn)(tcg_rd, tcg_rn, tcg_rm); + if (!a->sf) { + tcg_gen_ext32u_i64(tcg_rd, tcg_rd); + } + if (setflags) { + gen_logic_CC(a->sf, tcg_rd); + } + return true; +} + +static bool trans_ORR_r(DisasContext *s, arg_logic_shift *a) +{ + /* + * Unshifted ORR and ORN with WZR/XZR is the standard encoding for + * register-register MOV and MVN, so it is worth special casing. + */ + if (a->sa == 0 && a->st == 0 && a->rn == 31) { + TCGv_i64 tcg_rd = cpu_reg(s, a->rd); + TCGv_i64 tcg_rm = cpu_reg(s, a->rm); + + if (a->n) { tcg_gen_not_i64(tcg_rd, tcg_rm); - if (!sf) { + if (!a->sf) { tcg_gen_ext32u_i64(tcg_rd, tcg_rd); } } else { - if (sf) { + if (a->sf) { tcg_gen_mov_i64(tcg_rd, tcg_rm); } else { tcg_gen_ext32u_i64(tcg_rd, tcg_rm); } } - return; + return true; } - tcg_rm = read_cpu_reg(s, rm, sf); - - if (shift_amount) { - shift_reg_imm(tcg_rm, tcg_rm, sf, shift_type, shift_amount); - } - - tcg_rn = cpu_reg(s, rn); - - switch (opc | (invert << 2)) { - case 0: /* AND */ - case 3: /* ANDS */ - tcg_gen_and_i64(tcg_rd, tcg_rn, tcg_rm); - break; - case 1: /* ORR */ - tcg_gen_or_i64(tcg_rd, tcg_rn, tcg_rm); - break; - case 2: /* EOR */ - tcg_gen_xor_i64(tcg_rd, tcg_rn, tcg_rm); - break; - case 4: /* BIC */ - case 7: /* BICS */ - tcg_gen_andc_i64(tcg_rd, tcg_rn, tcg_rm); - break; - case 5: /* ORN */ - tcg_gen_orc_i64(tcg_rd, tcg_rn, tcg_rm); - break; - case 6: /* EON */ - tcg_gen_eqv_i64(tcg_rd, tcg_rn, tcg_rm); - break; - default: - assert(FALSE); - break; - } - - if (!sf) { - tcg_gen_ext32u_i64(tcg_rd, tcg_rd); - } - - if (opc == 3) { - gen_logic_CC(sf, tcg_rd); - } + return do_logic_reg(s, a, tcg_gen_or_i64, tcg_gen_orc_i64, false); } +TRANS(AND_r, do_logic_reg, a, tcg_gen_and_i64, tcg_gen_andc_i64, false) +TRANS(ANDS_r, do_logic_reg, a, tcg_gen_and_i64, tcg_gen_andc_i64, true) +TRANS(EOR_r, do_logic_reg, a, tcg_gen_xor_i64, tcg_gen_eqv_i64, false) + /* * Add/subtract (extended register) * @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) /* Add/sub (shifted register) */ disas_add_sub_reg(s, insn); } - } else { - /* Logical (shifted register) */ - disas_logic_reg(s, insn); + return; } - return; + goto do_unallocated; } switch (op2) { -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes ADD, SUB, ADDS, SUBS (extended register). Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-13-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 9 +++++ target/arm/tcg/translate-a64.c | 65 +++++++++++----------------------- 2 files changed, 29 insertions(+), 45 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ ANDS_r . 11 01010 .. . ..... ...... ..... ..... @logic_shift # Add/subtract (shifted reg) # Add/subtract (extended reg) + +&addsub_ext rd rn rm sf sa st +@addsub_ext sf:1 .. ........ rm:5 st:3 sa:3 rn:5 rd:5 &addsub_ext + +ADD_ext . 00 01011001 ..... ... ... ..... ..... @addsub_ext +SUB_ext . 10 01011001 ..... ... ... ..... ..... @addsub_ext +ADDS_ext . 01 01011001 ..... ... ... ..... ..... @addsub_ext +SUBS_ext . 11 01011001 ..... ... ... ..... ..... @addsub_ext + # Add/subtract (carry) # Rotate right into flags # Evaluate into flags diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(AND_r, do_logic_reg, a, tcg_gen_and_i64, tcg_gen_andc_i64, false) TRANS(ANDS_r, do_logic_reg, a, tcg_gen_and_i64, tcg_gen_andc_i64, true) TRANS(EOR_r, do_logic_reg, a, tcg_gen_xor_i64, tcg_gen_eqv_i64, false) -/* - * Add/subtract (extended register) - * - * 31|30|29|28 24|23 22|21|20 16|15 13|12 10|9 5|4 0| - * +--+--+--+-----------+-----+--+-------+------+------+----+----+ - * |sf|op| S| 0 1 0 1 1 | opt | 1| Rm |option| imm3 | Rn | Rd | - * +--+--+--+-----------+-----+--+-------+------+------+----+----+ - * - * sf: 0 -> 32bit, 1 -> 64bit - * op: 0 -> add , 1 -> sub - * S: 1 -> set flags - * opt: 00 - * option: extension type (see DecodeRegExtend) - * imm3: optional shift to Rm - * - * Rd = Rn + LSL(extend(Rm), amount) - */ -static void disas_add_sub_ext_reg(DisasContext *s, uint32_t insn) +static bool do_addsub_ext(DisasContext *s, arg_addsub_ext *a, + bool sub_op, bool setflags) { - int rd = extract32(insn, 0, 5); - int rn = extract32(insn, 5, 5); - int imm3 = extract32(insn, 10, 3); - int option = extract32(insn, 13, 3); - int rm = extract32(insn, 16, 5); - int opt = extract32(insn, 22, 2); - bool setflags = extract32(insn, 29, 1); - bool sub_op = extract32(insn, 30, 1); - bool sf = extract32(insn, 31, 1); + TCGv_i64 tcg_rm, tcg_rn, tcg_rd, tcg_result; - TCGv_i64 tcg_rm, tcg_rn; /* temps */ - TCGv_i64 tcg_rd; - TCGv_i64 tcg_result; - - if (imm3 > 4 || opt != 0) { - unallocated_encoding(s); - return; + if (a->sa > 4) { + return false; } /* non-flag setting ops may use SP */ if (!setflags) { - tcg_rd = cpu_reg_sp(s, rd); + tcg_rd = cpu_reg_sp(s, a->rd); } else { - tcg_rd = cpu_reg(s, rd); + tcg_rd = cpu_reg(s, a->rd); } - tcg_rn = read_cpu_reg_sp(s, rn, sf); + tcg_rn = read_cpu_reg_sp(s, a->rn, a->sf); - tcg_rm = read_cpu_reg(s, rm, sf); - ext_and_shift_reg(tcg_rm, tcg_rm, option, imm3); + tcg_rm = read_cpu_reg(s, a->rm, a->sf); + ext_and_shift_reg(tcg_rm, tcg_rm, a->st, a->sa); tcg_result = tcg_temp_new_i64(); - if (!setflags) { if (sub_op) { tcg_gen_sub_i64(tcg_result, tcg_rn, tcg_rm); @@ -XXX,XX +XXX,XX @@ static void disas_add_sub_ext_reg(DisasContext *s, uint32_t insn) } } else { if (sub_op) { - gen_sub_CC(sf, tcg_result, tcg_rn, tcg_rm); + gen_sub_CC(a->sf, tcg_result, tcg_rn, tcg_rm); } else { - gen_add_CC(sf, tcg_result, tcg_rn, tcg_rm); + gen_add_CC(a->sf, tcg_result, tcg_rn, tcg_rm); } } - if (sf) { + if (a->sf) { tcg_gen_mov_i64(tcg_rd, tcg_result); } else { tcg_gen_ext32u_i64(tcg_rd, tcg_result); } + return true; } +TRANS(ADD_ext, do_addsub_ext, a, false, false) +TRANS(SUB_ext, do_addsub_ext, a, true, false) +TRANS(ADDS_ext, do_addsub_ext, a, false, true) +TRANS(SUBS_ext, do_addsub_ext, a, true, true) + /* * Add/subtract (shifted register) * @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) if (!op1) { if (op2 & 8) { if (op2 & 1) { - /* Add/sub (extended register) */ - disas_add_sub_ext_reg(s, insn); + goto do_unallocated; } else { /* Add/sub (shifted register) */ disas_add_sub_reg(s, insn); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes ADD, SUB, ADDS, SUBS (shifted register). Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-14-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 9 +++++ target/arm/tcg/translate-a64.c | 64 ++++++++++------------------------ 2 files changed, 27 insertions(+), 46 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ EOR_r . 10 01010 .. . ..... ...... ..... ..... @logic_shift ANDS_r . 11 01010 .. . ..... ...... ..... ..... @logic_shift # Add/subtract (shifted reg) + +&addsub_shift rd rn rm sf sa st +@addsub_shift sf:1 .. ..... st:2 . rm:5 sa:6 rn:5 rd:5 &addsub_shift + +ADD_r . 00 01011 .. 0 ..... ...... ..... ..... @addsub_shift +SUB_r . 10 01011 .. 0 ..... ...... ..... ..... @addsub_shift +ADDS_r . 01 01011 .. 0 ..... ...... ..... ..... @addsub_shift +SUBS_r . 11 01011 .. 0 ..... ...... ..... ..... @addsub_shift + # Add/subtract (extended reg) &addsub_ext rd rn rm sf sa st diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(SUB_ext, do_addsub_ext, a, true, false) TRANS(ADDS_ext, do_addsub_ext, a, false, true) TRANS(SUBS_ext, do_addsub_ext, a, true, true) -/* - * Add/subtract (shifted register) - * - * 31 30 29 28 24 23 22 21 20 16 15 10 9 5 4 0 - * +--+--+--+-----------+-----+--+-------+---------+------+------+ - * |sf|op| S| 0 1 0 1 1 |shift| 0| Rm | imm6 | Rn | Rd | - * +--+--+--+-----------+-----+--+-------+---------+------+------+ - * - * sf: 0 -> 32bit, 1 -> 64bit - * op: 0 -> add , 1 -> sub - * S: 1 -> set flags - * shift: 00 -> LSL, 01 -> LSR, 10 -> ASR, 11 -> RESERVED - * imm6: Shift amount to apply to Rm before the add/sub - */ -static void disas_add_sub_reg(DisasContext *s, uint32_t insn) +static bool do_addsub_reg(DisasContext *s, arg_addsub_shift *a, + bool sub_op, bool setflags) { - int rd = extract32(insn, 0, 5); - int rn = extract32(insn, 5, 5); - int imm6 = extract32(insn, 10, 6); - int rm = extract32(insn, 16, 5); - int shift_type = extract32(insn, 22, 2); - bool setflags = extract32(insn, 29, 1); - bool sub_op = extract32(insn, 30, 1); - bool sf = extract32(insn, 31, 1); + TCGv_i64 tcg_rd, tcg_rn, tcg_rm, tcg_result; - TCGv_i64 tcg_rd = cpu_reg(s, rd); - TCGv_i64 tcg_rn, tcg_rm; - TCGv_i64 tcg_result; - - if ((shift_type == 3) || (!sf && (imm6 > 31))) { - unallocated_encoding(s); - return; + if (a->st == 3 || (!a->sf && (a->sa & 32))) { + return false; } - tcg_rn = read_cpu_reg(s, rn, sf); - tcg_rm = read_cpu_reg(s, rm, sf); + tcg_rd = cpu_reg(s, a->rd); + tcg_rn = read_cpu_reg(s, a->rn, a->sf); + tcg_rm = read_cpu_reg(s, a->rm, a->sf); - shift_reg_imm(tcg_rm, tcg_rm, sf, shift_type, imm6); + shift_reg_imm(tcg_rm, tcg_rm, a->sf, a->st, a->sa); tcg_result = tcg_temp_new_i64(); - if (!setflags) { if (sub_op) { tcg_gen_sub_i64(tcg_result, tcg_rn, tcg_rm); @@ -XXX,XX +XXX,XX @@ static void disas_add_sub_reg(DisasContext *s, uint32_t insn) } } else { if (sub_op) { - gen_sub_CC(sf, tcg_result, tcg_rn, tcg_rm); + gen_sub_CC(a->sf, tcg_result, tcg_rn, tcg_rm); } else { - gen_add_CC(sf, tcg_result, tcg_rn, tcg_rm); + gen_add_CC(a->sf, tcg_result, tcg_rn, tcg_rm); } } - if (sf) { + if (a->sf) { tcg_gen_mov_i64(tcg_rd, tcg_result); } else { tcg_gen_ext32u_i64(tcg_rd, tcg_result); } + return true; } +TRANS(ADD_r, do_addsub_reg, a, false, false) +TRANS(SUB_r, do_addsub_reg, a, true, false) +TRANS(ADDS_r, do_addsub_reg, a, false, true) +TRANS(SUBS_r, do_addsub_reg, a, true, true) + /* Data-processing (3 source) * * 31 30 29 28 24 23 21 20 16 15 14 10 9 5 4 0 @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) int op3 = extract32(insn, 10, 6); if (!op1) { - if (op2 & 8) { - if (op2 & 1) { - goto do_unallocated; - } else { - /* Add/sub (shifted register) */ - disas_add_sub_reg(s, insn); - } - return; - } goto do_unallocated; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes MADD, MSUB, SMADDL, SMSUBL, UMADDL, UMSUBL, SMULH, UMULH. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-15-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 16 +++++ target/arm/tcg/translate-a64.c | 119 ++++++++++++--------------------- 2 files changed, 59 insertions(+), 76 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SUBS_ext . 11 01011001 ..... ... ... ..... ..... @addsub_ext # Conditional select # Data Processing (3-source) +&rrrr rd rn rm ra +@rrrr . .. ........ rm:5 . ra:5 rn:5 rd:5 &rrrr + +MADD_w 0 00 11011000 ..... 0 ..... ..... ..... @rrrr +MSUB_w 0 00 11011000 ..... 1 ..... ..... ..... @rrrr +MADD_x 1 00 11011000 ..... 0 ..... ..... ..... @rrrr +MSUB_x 1 00 11011000 ..... 1 ..... ..... ..... @rrrr + +SMADDL 1 00 11011001 ..... 0 ..... ..... ..... @rrrr +SMSUBL 1 00 11011001 ..... 1 ..... ..... ..... @rrrr +UMADDL 1 00 11011101 ..... 0 ..... ..... ..... @rrrr +UMSUBL 1 00 11011101 ..... 1 ..... ..... ..... @rrrr + +SMULH 1 00 11011010 ..... 0 11111 ..... ..... @rrr +UMULH 1 00 11011110 ..... 0 11111 ..... ..... @rrr + ### Cryptographic AES AESE 01001110 00 10100 00100 10 ..... ..... @r2r_q1e0 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(SUB_r, do_addsub_reg, a, true, false) TRANS(ADDS_r, do_addsub_reg, a, false, true) TRANS(SUBS_r, do_addsub_reg, a, true, true) -/* Data-processing (3 source) - * - * 31 30 29 28 24 23 21 20 16 15 14 10 9 5 4 0 - * +--+------+-----------+------+------+----+------+------+------+ - * |sf| op54 | 1 1 0 1 1 | op31 | Rm | o0 | Ra | Rn | Rd | - * +--+------+-----------+------+------+----+------+------+------+ - */ -static void disas_data_proc_3src(DisasContext *s, uint32_t insn) +static bool do_mulh(DisasContext *s, arg_rrr *a, + void (*fn)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64)) { - int rd = extract32(insn, 0, 5); - int rn = extract32(insn, 5, 5); - int ra = extract32(insn, 10, 5); - int rm = extract32(insn, 16, 5); - int op_id = (extract32(insn, 29, 3) << 4) | - (extract32(insn, 21, 3) << 1) | - extract32(insn, 15, 1); - bool sf = extract32(insn, 31, 1); - bool is_sub = extract32(op_id, 0, 1); - bool is_high = extract32(op_id, 2, 1); - bool is_signed = false; - TCGv_i64 tcg_op1; - TCGv_i64 tcg_op2; - TCGv_i64 tcg_tmp; + TCGv_i64 discard = tcg_temp_new_i64(); + TCGv_i64 tcg_rd = cpu_reg(s, a->rd); + TCGv_i64 tcg_rn = cpu_reg(s, a->rn); + TCGv_i64 tcg_rm = cpu_reg(s, a->rm); - /* Note that op_id is sf:op54:op31:o0 so it includes the 32/64 size flag */ - switch (op_id) { - case 0x42: /* SMADDL */ - case 0x43: /* SMSUBL */ - case 0x44: /* SMULH */ - is_signed = true; - break; - case 0x0: /* MADD (32bit) */ - case 0x1: /* MSUB (32bit) */ - case 0x40: /* MADD (64bit) */ - case 0x41: /* MSUB (64bit) */ - case 0x4a: /* UMADDL */ - case 0x4b: /* UMSUBL */ - case 0x4c: /* UMULH */ - break; - default: - unallocated_encoding(s); - return; - } + fn(discard, tcg_rd, tcg_rn, tcg_rm); + return true; +} - if (is_high) { - TCGv_i64 low_bits = tcg_temp_new_i64(); /* low bits discarded */ - TCGv_i64 tcg_rd = cpu_reg(s, rd); - TCGv_i64 tcg_rn = cpu_reg(s, rn); - TCGv_i64 tcg_rm = cpu_reg(s, rm); +TRANS(SMULH, do_mulh, a, tcg_gen_muls2_i64) +TRANS(UMULH, do_mulh, a, tcg_gen_mulu2_i64) - if (is_signed) { - tcg_gen_muls2_i64(low_bits, tcg_rd, tcg_rn, tcg_rm); - } else { - tcg_gen_mulu2_i64(low_bits, tcg_rd, tcg_rn, tcg_rm); - } - return; - } +static bool do_muladd(DisasContext *s, arg_rrrr *a, + bool sf, bool is_sub, MemOp mop) +{ + TCGv_i64 tcg_rd = cpu_reg(s, a->rd); + TCGv_i64 tcg_op1, tcg_op2; - tcg_op1 = tcg_temp_new_i64(); - tcg_op2 = tcg_temp_new_i64(); - tcg_tmp = tcg_temp_new_i64(); - - if (op_id < 0x42) { - tcg_gen_mov_i64(tcg_op1, cpu_reg(s, rn)); - tcg_gen_mov_i64(tcg_op2, cpu_reg(s, rm)); + if (mop == MO_64) { + tcg_op1 = cpu_reg(s, a->rn); + tcg_op2 = cpu_reg(s, a->rm); } else { - if (is_signed) { - tcg_gen_ext32s_i64(tcg_op1, cpu_reg(s, rn)); - tcg_gen_ext32s_i64(tcg_op2, cpu_reg(s, rm)); - } else { - tcg_gen_ext32u_i64(tcg_op1, cpu_reg(s, rn)); - tcg_gen_ext32u_i64(tcg_op2, cpu_reg(s, rm)); - } + tcg_op1 = tcg_temp_new_i64(); + tcg_op2 = tcg_temp_new_i64(); + tcg_gen_ext_i64(tcg_op1, cpu_reg(s, a->rn), mop); + tcg_gen_ext_i64(tcg_op2, cpu_reg(s, a->rm), mop); } - if (ra == 31 && !is_sub) { + if (a->ra == 31 && !is_sub) { /* Special-case MADD with rA == XZR; it is the standard MUL alias */ - tcg_gen_mul_i64(cpu_reg(s, rd), tcg_op1, tcg_op2); + tcg_gen_mul_i64(tcg_rd, tcg_op1, tcg_op2); } else { + TCGv_i64 tcg_tmp = tcg_temp_new_i64(); + TCGv_i64 tcg_ra = cpu_reg(s, a->ra); + tcg_gen_mul_i64(tcg_tmp, tcg_op1, tcg_op2); if (is_sub) { - tcg_gen_sub_i64(cpu_reg(s, rd), cpu_reg(s, ra), tcg_tmp); + tcg_gen_sub_i64(tcg_rd, tcg_ra, tcg_tmp); } else { - tcg_gen_add_i64(cpu_reg(s, rd), cpu_reg(s, ra), tcg_tmp); + tcg_gen_add_i64(tcg_rd, tcg_ra, tcg_tmp); } } if (!sf) { - tcg_gen_ext32u_i64(cpu_reg(s, rd), cpu_reg(s, rd)); + tcg_gen_ext32u_i64(tcg_rd, tcg_rd); } + return true; } +TRANS(MADD_w, do_muladd, a, false, false, MO_64) +TRANS(MSUB_w, do_muladd, a, false, true, MO_64) +TRANS(MADD_x, do_muladd, a, true, false, MO_64) +TRANS(MSUB_x, do_muladd, a, true, true, MO_64) + +TRANS(SMADDL, do_muladd, a, true, false, MO_SL) +TRANS(SMSUBL, do_muladd, a, true, true, MO_SL) +TRANS(UMADDL, do_muladd, a, true, false, MO_UL) +TRANS(UMSUBL, do_muladd, a, true, true, MO_UL) + /* Add/subtract (with carry) * 31 30 29 28 27 26 25 24 23 22 21 20 16 15 10 9 5 4 0 * +--+--+--+------------------------+------+-------------+------+-----+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) disas_cond_select(s, insn); break; - case 0x8 ... 0xf: /* (3 source) */ - disas_data_proc_3src(s, insn); - break; - default: do_unallocated: case 0x6: /* Data-processing */ + case 0x8 ... 0xf: /* (3 source) */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes ADC, SBC, ADCS, SBCS. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-16-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 6 +++++ target/arm/tcg/translate-a64.c | 43 +++++++++++++--------------------- 2 files changed, 22 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ ADDS_ext . 01 01011001 ..... ... ... ..... ..... @addsub_ext SUBS_ext . 11 01011001 ..... ... ... ..... ..... @addsub_ext # Add/subtract (carry) + +ADC . 00 11010000 ..... 000000 ..... ..... @rrr_sf +ADCS . 01 11010000 ..... 000000 ..... ..... @rrr_sf +SBC . 10 11010000 ..... 000000 ..... ..... @rrr_sf +SBCS . 11 11010000 ..... 000000 ..... ..... @rrr_sf + # Rotate right into flags # Evaluate into flags # Conditional compare (regster) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(SMSUBL, do_muladd, a, true, true, MO_SL) TRANS(UMADDL, do_muladd, a, true, false, MO_UL) TRANS(UMSUBL, do_muladd, a, true, true, MO_UL) -/* Add/subtract (with carry) - * 31 30 29 28 27 26 25 24 23 22 21 20 16 15 10 9 5 4 0 - * +--+--+--+------------------------+------+-------------+------+-----+ - * |sf|op| S| 1 1 0 1 0 0 0 0 | rm | 0 0 0 0 0 0 | Rn | Rd | - * +--+--+--+------------------------+------+-------------+------+-----+ - */ - -static void disas_adc_sbc(DisasContext *s, uint32_t insn) +static bool do_adc_sbc(DisasContext *s, arg_rrr_sf *a, + bool is_sub, bool setflags) { - unsigned int sf, op, setflags, rm, rn, rd; TCGv_i64 tcg_y, tcg_rn, tcg_rd; - sf = extract32(insn, 31, 1); - op = extract32(insn, 30, 1); - setflags = extract32(insn, 29, 1); - rm = extract32(insn, 16, 5); - rn = extract32(insn, 5, 5); - rd = extract32(insn, 0, 5); + tcg_rd = cpu_reg(s, a->rd); + tcg_rn = cpu_reg(s, a->rn); - tcg_rd = cpu_reg(s, rd); - tcg_rn = cpu_reg(s, rn); - - if (op) { + if (is_sub) { tcg_y = tcg_temp_new_i64(); - tcg_gen_not_i64(tcg_y, cpu_reg(s, rm)); + tcg_gen_not_i64(tcg_y, cpu_reg(s, a->rm)); } else { - tcg_y = cpu_reg(s, rm); + tcg_y = cpu_reg(s, a->rm); } if (setflags) { - gen_adc_CC(sf, tcg_rd, tcg_rn, tcg_y); + gen_adc_CC(a->sf, tcg_rd, tcg_rn, tcg_y); } else { - gen_adc(sf, tcg_rd, tcg_rn, tcg_y); + gen_adc(a->sf, tcg_rd, tcg_rn, tcg_y); } + return true; } +TRANS(ADC, do_adc_sbc, a, false, false) +TRANS(SBC, do_adc_sbc, a, true, false) +TRANS(ADCS, do_adc_sbc, a, false, true) +TRANS(SBCS, do_adc_sbc, a, true, true) + /* * Rotate right into flags * 31 30 29 21 15 10 5 4 0 @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) switch (op2) { case 0x0: switch (op3) { - case 0x00: /* Add/subtract (with carry) */ - disas_adc_sbc(s, insn); - break; - case 0x01: /* Rotate right into flags */ case 0x21: disas_rotate_right_into_flags(s, insn); @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) break; default: + case 0x00: /* Add/subtract (with carry) */ goto do_unallocated; } break; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-17-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 3 +++ target/arm/tcg/translate-a64.c | 32 +++++++++----------------------- 2 files changed, 12 insertions(+), 23 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SBC . 10 11010000 ..... 000000 ..... ..... @rrr_sf SBCS . 11 11010000 ..... 000000 ..... ..... @rrr_sf # Rotate right into flags + +RMIF 1 01 11010000 imm:6 00001 rn:5 0 mask:4 + # Evaluate into flags # Conditional compare (regster) # Conditional compare (immediate) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(SBC, do_adc_sbc, a, true, false) TRANS(ADCS, do_adc_sbc, a, false, true) TRANS(SBCS, do_adc_sbc, a, true, true) -/* - * Rotate right into flags - * 31 30 29 21 15 10 5 4 0 - * +--+--+--+-----------------+--------+-----------+------+--+------+ - * |sf|op| S| 1 1 0 1 0 0 0 0 | imm6 | 0 0 0 0 1 | Rn |o2| mask | - * +--+--+--+-----------------+--------+-----------+------+--+------+ - */ -static void disas_rotate_right_into_flags(DisasContext *s, uint32_t insn) +static bool trans_RMIF(DisasContext *s, arg_RMIF *a) { - int mask = extract32(insn, 0, 4); - int o2 = extract32(insn, 4, 1); - int rn = extract32(insn, 5, 5); - int imm6 = extract32(insn, 15, 6); - int sf_op_s = extract32(insn, 29, 3); + int mask = a->mask; TCGv_i64 tcg_rn; TCGv_i32 nzcv; - if (sf_op_s != 5 || o2 != 0 || !dc_isar_feature(aa64_condm_4, s)) { - unallocated_encoding(s); - return; + if (!dc_isar_feature(aa64_condm_4, s)) { + return false; } - tcg_rn = read_cpu_reg(s, rn, 1); - tcg_gen_rotri_i64(tcg_rn, tcg_rn, imm6); + tcg_rn = read_cpu_reg(s, a->rn, 1); + tcg_gen_rotri_i64(tcg_rn, tcg_rn, a->imm); nzcv = tcg_temp_new_i32(); tcg_gen_extrl_i64_i32(nzcv, tcg_rn); @@ -XXX,XX +XXX,XX @@ static void disas_rotate_right_into_flags(DisasContext *s, uint32_t insn) if (mask & 1) { /* V */ tcg_gen_shli_i32(cpu_VF, nzcv, 31 - 0); } + return true; } /* @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) switch (op2) { case 0x0: switch (op3) { - case 0x01: /* Rotate right into flags */ - case 0x21: - disas_rotate_right_into_flags(s, insn); - break; - case 0x02: /* Evaluate into flags */ case 0x12: case 0x22: @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) default: case 0x00: /* Add/subtract (with carry) */ + case 0x01: /* Rotate right into flags */ + case 0x21: goto do_unallocated; } break; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-18-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 4 +++ target/arm/tcg/translate-a64.c | 48 +++++----------------------------- 2 files changed, 11 insertions(+), 41 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SBCS . 11 11010000 ..... 000000 ..... ..... @rrr_sf RMIF 1 01 11010000 imm:6 00001 rn:5 0 mask:4 # Evaluate into flags + +SETF8 0 01 11010000 00000 000010 rn:5 01101 +SETF16 0 01 11010000 00000 010010 rn:5 01101 + # Conditional compare (regster) # Conditional compare (immediate) # Conditional select diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_RMIF(DisasContext *s, arg_RMIF *a) return true; } -/* - * Evaluate into flags - * 31 30 29 21 15 14 10 5 4 0 - * +--+--+--+-----------------+---------+----+---------+------+--+------+ - * |sf|op| S| 1 1 0 1 0 0 0 0 | opcode2 | sz | 0 0 1 0 | Rn |o3| mask | - * +--+--+--+-----------------+---------+----+---------+------+--+------+ - */ -static void disas_evaluate_into_flags(DisasContext *s, uint32_t insn) +static bool do_setf(DisasContext *s, int rn, int shift) { - int o3_mask = extract32(insn, 0, 5); - int rn = extract32(insn, 5, 5); - int o2 = extract32(insn, 15, 6); - int sz = extract32(insn, 14, 1); - int sf_op_s = extract32(insn, 29, 3); - TCGv_i32 tmp; - int shift; + TCGv_i32 tmp = tcg_temp_new_i32(); - if (sf_op_s != 1 || o2 != 0 || o3_mask != 0xd || - !dc_isar_feature(aa64_condm_4, s)) { - unallocated_encoding(s); - return; - } - shift = sz ? 16 : 24; /* SETF16 or SETF8 */ - - tmp = tcg_temp_new_i32(); tcg_gen_extrl_i64_i32(tmp, cpu_reg(s, rn)); tcg_gen_shli_i32(cpu_NF, tmp, shift); tcg_gen_shli_i32(cpu_VF, tmp, shift - 1); tcg_gen_mov_i32(cpu_ZF, cpu_NF); tcg_gen_xor_i32(cpu_VF, cpu_VF, cpu_NF); + return true; } +TRANS_FEAT(SETF8, aa64_condm_4, do_setf, a->rn, 24) +TRANS_FEAT(SETF16, aa64_condm_4, do_setf, a->rn, 16) + /* Conditional compare (immediate / register) * 31 30 29 28 27 26 25 24 23 22 21 20 16 15 12 11 10 9 5 4 3 0 * +--+--+--+------------------------+--------+------+----+--+------+--+-----+ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) { int op1 = extract32(insn, 28, 1); int op2 = extract32(insn, 21, 4); - int op3 = extract32(insn, 10, 6); if (!op1) { goto do_unallocated; } switch (op2) { - case 0x0: - switch (op3) { - case 0x02: /* Evaluate into flags */ - case 0x12: - case 0x22: - case 0x32: - disas_evaluate_into_flags(s, insn); - break; - - default: - case 0x00: /* Add/subtract (with carry) */ - case 0x01: /* Rotate right into flags */ - case 0x21: - goto do_unallocated; - } - break; - case 0x2: /* Conditional compare */ disas_cc(s, insn); /* both imm and reg forms */ break; @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) default: do_unallocated: + case 0x0: case 0x6: /* Data-processing */ case 0x8 ... 0xf: /* (3 source) */ unallocated_encoding(s); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-19-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 6 ++-- target/arm/tcg/translate-a64.c | 66 +++++++++++----------------------- 2 files changed, 25 insertions(+), 47 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ RMIF 1 01 11010000 imm:6 00001 rn:5 0 mask:4 SETF8 0 01 11010000 00000 000010 rn:5 01101 SETF16 0 01 11010000 00000 010010 rn:5 01101 -# Conditional compare (regster) -# Conditional compare (immediate) +# Conditional compare + +CCMP sf:1 op:1 1 11010010 y:5 cond:4 imm:1 0 rn:5 0 nzcv:4 + # Conditional select # Data Processing (3-source) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_setf(DisasContext *s, int rn, int shift) TRANS_FEAT(SETF8, aa64_condm_4, do_setf, a->rn, 24) TRANS_FEAT(SETF16, aa64_condm_4, do_setf, a->rn, 16) -/* Conditional compare (immediate / register) - * 31 30 29 28 27 26 25 24 23 22 21 20 16 15 12 11 10 9 5 4 3 0 - * +--+--+--+------------------------+--------+------+----+--+------+--+-----+ - * |sf|op| S| 1 1 0 1 0 0 1 0 |imm5/rm | cond |i/r |o2| Rn |o3|nzcv | - * +--+--+--+------------------------+--------+------+----+--+------+--+-----+ - * [1] y [0] [0] - */ -static void disas_cc(DisasContext *s, uint32_t insn) +/* CCMP, CCMN */ +static bool trans_CCMP(DisasContext *s, arg_CCMP *a) { - unsigned int sf, op, y, cond, rn, nzcv, is_imm; - TCGv_i32 tcg_t0, tcg_t1, tcg_t2; - TCGv_i64 tcg_tmp, tcg_y, tcg_rn; + TCGv_i32 tcg_t0 = tcg_temp_new_i32(); + TCGv_i32 tcg_t1 = tcg_temp_new_i32(); + TCGv_i32 tcg_t2 = tcg_temp_new_i32(); + TCGv_i64 tcg_tmp = tcg_temp_new_i64(); + TCGv_i64 tcg_rn, tcg_y; DisasCompare c; - - if (!extract32(insn, 29, 1)) { - unallocated_encoding(s); - return; - } - if (insn & (1 << 10 | 1 << 4)) { - unallocated_encoding(s); - return; - } - sf = extract32(insn, 31, 1); - op = extract32(insn, 30, 1); - is_imm = extract32(insn, 11, 1); - y = extract32(insn, 16, 5); /* y = rm (reg) or imm5 (imm) */ - cond = extract32(insn, 12, 4); - rn = extract32(insn, 5, 5); - nzcv = extract32(insn, 0, 4); + unsigned nzcv; /* Set T0 = !COND. */ - tcg_t0 = tcg_temp_new_i32(); - arm_test_cc(&c, cond); + arm_test_cc(&c, a->cond); tcg_gen_setcondi_i32(tcg_invert_cond(c.cond), tcg_t0, c.value, 0); /* Load the arguments for the new comparison. */ - if (is_imm) { - tcg_y = tcg_temp_new_i64(); - tcg_gen_movi_i64(tcg_y, y); + if (a->imm) { + tcg_y = tcg_constant_i64(a->y); } else { - tcg_y = cpu_reg(s, y); + tcg_y = cpu_reg(s, a->y); } - tcg_rn = cpu_reg(s, rn); + tcg_rn = cpu_reg(s, a->rn); /* Set the flags for the new comparison. */ - tcg_tmp = tcg_temp_new_i64(); - if (op) { - gen_sub_CC(sf, tcg_tmp, tcg_rn, tcg_y); + if (a->op) { + gen_sub_CC(a->sf, tcg_tmp, tcg_rn, tcg_y); } else { - gen_add_CC(sf, tcg_tmp, tcg_rn, tcg_y); + gen_add_CC(a->sf, tcg_tmp, tcg_rn, tcg_y); } - /* If COND was false, force the flags to #nzcv. Compute two masks + /* + * If COND was false, force the flags to #nzcv. Compute two masks * to help with this: T1 = (COND ? 0 : -1), T2 = (COND ? -1 : 0). * For tcg hosts that support ANDC, we can make do with just T1. * In either case, allow the tcg optimizer to delete any unused mask. */ - tcg_t1 = tcg_temp_new_i32(); - tcg_t2 = tcg_temp_new_i32(); tcg_gen_neg_i32(tcg_t1, tcg_t0); tcg_gen_subi_i32(tcg_t2, tcg_t0, 1); + nzcv = a->nzcv; if (nzcv & 8) { /* N */ tcg_gen_or_i32(cpu_NF, cpu_NF, tcg_t1); } else { @@ -XXX,XX +XXX,XX @@ static void disas_cc(DisasContext *s, uint32_t insn) tcg_gen_and_i32(cpu_VF, cpu_VF, tcg_t2); } } + return true; } /* Conditional select @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) } switch (op2) { - case 0x2: /* Conditional compare */ - disas_cc(s, insn); /* both imm and reg forms */ - break; - case 0x4: /* Conditional select */ disas_cond_select(s, insn); break; @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn) default: do_unallocated: case 0x0: + case 0x2: /* Conditional compare */ case 0x6: /* Data-processing */ case 0x8 ... 0xf: /* (3 source) */ unallocated_encoding(s); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes CSEL, CSINC, CSINV, CSNEG. Remove disas_data_proc_reg, as these were the last insns decoded by that function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-20-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 3 ++ target/arm/tcg/translate-a64.c | 84 ++++++---------------------------- 2 files changed, 17 insertions(+), 70 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SETF16 0 01 11010000 00000 010010 rn:5 01101 CCMP sf:1 op:1 1 11010010 y:5 cond:4 imm:1 0 rn:5 0 nzcv:4 # Conditional select + +CSEL sf:1 else_inv:1 011010100 rm:5 cond:4 0 else_inc:1 rn:5 rd:5 + # Data Processing (3-source) &rrrr rd rn rm ra diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_CCMP(DisasContext *s, arg_CCMP *a) return true; } -/* Conditional select - * 31 30 29 28 21 20 16 15 12 11 10 9 5 4 0 - * +----+----+---+-----------------+------+------+-----+------+------+ - * | sf | op | S | 1 1 0 1 0 1 0 0 | Rm | cond | op2 | Rn | Rd | - * +----+----+---+-----------------+------+------+-----+------+------+ - */ -static void disas_cond_select(DisasContext *s, uint32_t insn) +static bool trans_CSEL(DisasContext *s, arg_CSEL *a) { - unsigned int sf, else_inv, rm, cond, else_inc, rn, rd; - TCGv_i64 tcg_rd, zero; + TCGv_i64 tcg_rd = cpu_reg(s, a->rd); + TCGv_i64 zero = tcg_constant_i64(0); DisasCompare64 c; - if (extract32(insn, 29, 1) || extract32(insn, 11, 1)) { - /* S == 1 or op2<1> == 1 */ - unallocated_encoding(s); - return; - } - sf = extract32(insn, 31, 1); - else_inv = extract32(insn, 30, 1); - rm = extract32(insn, 16, 5); - cond = extract32(insn, 12, 4); - else_inc = extract32(insn, 10, 1); - rn = extract32(insn, 5, 5); - rd = extract32(insn, 0, 5); + a64_test_cc(&c, a->cond); - tcg_rd = cpu_reg(s, rd); - - a64_test_cc(&c, cond); - zero = tcg_constant_i64(0); - - if (rn == 31 && rm == 31 && (else_inc ^ else_inv)) { + if (a->rn == 31 && a->rm == 31 && (a->else_inc ^ a->else_inv)) { /* CSET & CSETM. */ - if (else_inv) { + if (a->else_inv) { tcg_gen_negsetcond_i64(tcg_invert_cond(c.cond), tcg_rd, c.value, zero); } else { @@ -XXX,XX +XXX,XX @@ static void disas_cond_select(DisasContext *s, uint32_t insn) tcg_rd, c.value, zero); } } else { - TCGv_i64 t_true = cpu_reg(s, rn); - TCGv_i64 t_false = read_cpu_reg(s, rm, 1); - if (else_inv && else_inc) { + TCGv_i64 t_true = cpu_reg(s, a->rn); + TCGv_i64 t_false = read_cpu_reg(s, a->rm, 1); + + if (a->else_inv && a->else_inc) { tcg_gen_neg_i64(t_false, t_false); - } else if (else_inv) { + } else if (a->else_inv) { tcg_gen_not_i64(t_false, t_false); - } else if (else_inc) { + } else if (a->else_inc) { tcg_gen_addi_i64(t_false, t_false, 1); } tcg_gen_movcond_i64(c.cond, tcg_rd, c.value, zero, t_true, t_false); } - if (!sf) { + if (!a->sf) { tcg_gen_ext32u_i64(tcg_rd, tcg_rd); } -} - -/* - * Data processing - register - * 31 30 29 28 25 21 20 16 10 0 - * +--+---+--+---+-------+-----+-------+-------+---------+ - * | |op0| |op1| 1 0 1 | op2 | | op3 | | - * +--+---+--+---+-------+-----+-------+-------+---------+ - */ -static void disas_data_proc_reg(DisasContext *s, uint32_t insn) -{ - int op1 = extract32(insn, 28, 1); - int op2 = extract32(insn, 21, 4); - - if (!op1) { - goto do_unallocated; - } - - switch (op2) { - case 0x4: /* Conditional select */ - disas_cond_select(s, insn); - break; - - default: - do_unallocated: - case 0x0: - case 0x2: /* Conditional compare */ - case 0x6: /* Data-processing */ - case 0x8 ... 0xf: /* (3 source) */ - unallocated_encoding(s); - break; - } + return true; } static void handle_fp_compare(DisasContext *s, int size, @@ -XXX,XX +XXX,XX @@ static bool btype_destination_ok(uint32_t insn, bool bt, int btype) static void disas_a64_legacy(DisasContext *s, uint32_t insn) { switch (extract32(insn, 25, 4)) { - case 0x5: - case 0xd: /* Data processing - register */ - disas_data_proc_reg(s, insn); - break; case 0x7: case 0xf: /* Data processing - SIMD and floating point */ disas_data_proc_simd_fp(s, insn); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Provide a simple way to check for float64, float32, and float16 support, as well as the fpu enabled. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-21-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 62 ++++++++++++++++++---------------- 1 file changed, 32 insertions(+), 30 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool fp_access_check(DisasContext *s) return true; } +/* + * Return <0 for non-supported element sizes, with MO_16 controlled by + * FEAT_FP16; return 0 for fp disabled; otherwise return >0 for success. + */ +static int fp_access_check_scalar_hsd(DisasContext *s, MemOp esz) +{ + switch (esz) { + case MO_64: + case MO_32: + break; + case MO_16: + if (!dc_isar_feature(aa64_fp16, s)) { + return -1; + } + break; + default: + return -1; + } + return fp_access_check(s); +} + /* * Check that SVE access is enabled. If it is, return true. * If not, emit code to generate an appropriate exception and return false. @@ -XXX,XX +XXX,XX @@ static bool trans_FCSEL(DisasContext *s, arg_FCSEL *a) { TCGv_i64 t_true, t_false; DisasCompare64 c; + int check = fp_access_check_scalar_hsd(s, a->esz); - switch (a->esz) { - case MO_32: - case MO_64: - break; - case MO_16: - if (!dc_isar_feature(aa64_fp16, s)) { - return false; - } - break; - default: - return false; - } - - if (!fp_access_check(s)) { - return true; + if (check <= 0) { + return check == 0; } /* Zero extend sreg & hreg inputs to 64 bits now. */ @@ -XXX,XX +XXX,XX @@ TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins) static bool trans_FMOVI_s(DisasContext *s, arg_FMOVI_s *a) { - switch (a->esz) { - case MO_32: - case MO_64: - break; - case MO_16: - if (!dc_isar_feature(aa64_fp16, s)) { - return false; - } - break; - default: - return false; - } - if (fp_access_check(s)) { - uint64_t imm = vfp_expand_imm(a->esz, a->imm); - write_fp_dreg(s, a->rd, tcg_constant_i64(imm)); + int check = fp_access_check_scalar_hsd(s, a->esz); + uint64_t imm; + + if (check <= 0) { + return check == 0; } + + imm = vfp_expand_imm(a->esz, a->imm); + write_fp_dreg(s, a->rd, tcg_constant_i64(imm)); return true; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Provide a simple way to check for float64, float32, and float16 support vs vector width, as well as the fpu enabled. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-22-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 135 +++++++++++++-------------------- 1 file changed, 54 insertions(+), 81 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static int fp_access_check_scalar_hsd(DisasContext *s, MemOp esz) return fp_access_check(s); } +/* Likewise, but vector MO_64 must have two elements. */ +static int fp_access_check_vector_hsd(DisasContext *s, bool is_q, MemOp esz) +{ + switch (esz) { + case MO_64: + if (!is_q) { + return -1; + } + break; + case MO_32: + break; + case MO_16: + if (!dc_isar_feature(aa64_fp16, s)) { + return -1; + } + break; + default: + return -1; + } + return fp_access_check(s); +} + /* * Check that SVE access is enabled. If it is, return true. * If not, emit code to generate an appropriate exception and return false. @@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a, int data, gen_helper_gvec_3_ptr * const fns[3]) { MemOp esz = a->esz; + int check = fp_access_check_vector_hsd(s, a->q, esz); - switch (esz) { - case MO_64: - if (!a->q) { - return false; - } - break; - case MO_32: - break; - case MO_16: - if (!dc_isar_feature(aa64_fp16, s)) { - return false; - } - break; - default: - return false; - } - if (fp_access_check(s)) { - gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, - esz == MO_16, data, fns[esz - 1]); + if (check <= 0) { + return check == 0; } + + gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, + esz == MO_16, data, fns[esz - 1]); return true; } @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCADD_270, aa64_fcma, do_fp3_vector, a, 1, f_vector_fcadd) static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a) { - gen_helper_gvec_4_ptr *fn; + static gen_helper_gvec_4_ptr * const fn[] = { + [MO_16] = gen_helper_gvec_fcmlah, + [MO_32] = gen_helper_gvec_fcmlas, + [MO_64] = gen_helper_gvec_fcmlad, + }; + int check; if (!dc_isar_feature(aa64_fcma, s)) { return false; } - switch (a->esz) { - case MO_64: - if (!a->q) { - return false; - } - fn = gen_helper_gvec_fcmlad; - break; - case MO_32: - fn = gen_helper_gvec_fcmlas; - break; - case MO_16: - if (!dc_isar_feature(aa64_fp16, s)) { - return false; - } - fn = gen_helper_gvec_fcmlah; - break; - default: - return false; - } - if (fp_access_check(s)) { - gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd, - a->esz == MO_16, a->rot, fn); + + check = fp_access_check_vector_hsd(s, a->q, a->esz); + if (check <= 0) { + return check == 0; } + + gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd, + a->esz == MO_16, a->rot, fn[a->esz]); return true; } @@ -XXX,XX +XXX,XX @@ static bool do_fp3_vector_idx(DisasContext *s, arg_qrrx_e *a, gen_helper_gvec_3_ptr * const fns[3]) { MemOp esz = a->esz; + int check = fp_access_check_vector_hsd(s, a->q, esz); - switch (esz) { - case MO_64: - if (!a->q) { - return false; - } - break; - case MO_32: - break; - case MO_16: - if (!dc_isar_feature(aa64_fp16, s)) { - return false; - } - break; - default: - g_assert_not_reached(); - } - if (fp_access_check(s)) { - gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, - esz == MO_16, a->idx, fns[esz - 1]); + if (check <= 0) { + return check == 0; } + + gen_gvec_op3_fpst(s, a->q, a->rd, a->rn, a->rm, + esz == MO_16, a->idx, fns[esz - 1]); return true; } @@ -XXX,XX +XXX,XX @@ static bool do_fmla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool neg) gen_helper_gvec_fmla_idx_d, }; MemOp esz = a->esz; + int check = fp_access_check_vector_hsd(s, a->q, esz); - switch (esz) { - case MO_64: - if (!a->q) { - return false; - } - break; - case MO_32: - break; - case MO_16: - if (!dc_isar_feature(aa64_fp16, s)) { - return false; - } - break; - default: - g_assert_not_reached(); - } - if (fp_access_check(s)) { - gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd, - esz == MO_16, (a->idx << 1) | neg, - fns[esz - 1]); + if (check <= 0) { + return check == 0; } + + gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd, + esz == MO_16, (a->idx << 1) | neg, + fns[esz - 1]); return true; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-23-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 8 + target/arm/tcg/translate-a64.c | 283 ++++++++++++--------------------- 2 files changed, 112 insertions(+), 179 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FMINV_s 0110 1110 10 11000 01111 10 ..... ..... @rr_q1e2 FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd +# Floating-point Compare + +FCMP 00011110 .. 1 rm:5 001000 rn:5 e:1 z:1 000 esz=%esz_hsd + +# Floating-point Conditional Compare + +FCCMP 00011110 .. 1 rm:5 cond:4 01 rn:5 e:1 nzcv:4 esz=%esz_hsd + # Advanced SIMD Modified Immediate / Shift by Immediate %abcdefgh 16:3 5:5 diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_FMOVI_s(DisasContext *s, arg_FMOVI_s *a) return true; } +/* + * Floating point compare, conditional compare + */ + +static void handle_fp_compare(DisasContext *s, int size, + unsigned int rn, unsigned int rm, + bool cmp_with_zero, bool signal_all_nans) +{ + TCGv_i64 tcg_flags = tcg_temp_new_i64(); + TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); + + if (size == MO_64) { + TCGv_i64 tcg_vn, tcg_vm; + + tcg_vn = read_fp_dreg(s, rn); + if (cmp_with_zero) { + tcg_vm = tcg_constant_i64(0); + } else { + tcg_vm = read_fp_dreg(s, rm); + } + if (signal_all_nans) { + gen_helper_vfp_cmped_a64(tcg_flags, tcg_vn, tcg_vm, fpst); + } else { + gen_helper_vfp_cmpd_a64(tcg_flags, tcg_vn, tcg_vm, fpst); + } + } else { + TCGv_i32 tcg_vn = tcg_temp_new_i32(); + TCGv_i32 tcg_vm = tcg_temp_new_i32(); + + read_vec_element_i32(s, tcg_vn, rn, 0, size); + if (cmp_with_zero) { + tcg_gen_movi_i32(tcg_vm, 0); + } else { + read_vec_element_i32(s, tcg_vm, rm, 0, size); + } + + switch (size) { + case MO_32: + if (signal_all_nans) { + gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst); + } else { + gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst); + } + break; + case MO_16: + if (signal_all_nans) { + gen_helper_vfp_cmpeh_a64(tcg_flags, tcg_vn, tcg_vm, fpst); + } else { + gen_helper_vfp_cmph_a64(tcg_flags, tcg_vn, tcg_vm, fpst); + } + break; + default: + g_assert_not_reached(); + } + } + + gen_set_nzcv(tcg_flags); +} + +/* FCMP, FCMPE */ +static bool trans_FCMP(DisasContext *s, arg_FCMP *a) +{ + int check = fp_access_check_scalar_hsd(s, a->esz); + + if (check <= 0) { + return check == 0; + } + + handle_fp_compare(s, a->esz, a->rn, a->rm, a->z, a->e); + return true; +} + +/* FCCMP, FCCMPE */ +static bool trans_FCCMP(DisasContext *s, arg_FCCMP *a) +{ + TCGLabel *label_continue = NULL; + int check = fp_access_check_scalar_hsd(s, a->esz); + + if (check <= 0) { + return check == 0; + } + + if (a->cond < 0x0e) { /* not always */ + TCGLabel *label_match = gen_new_label(); + label_continue = gen_new_label(); + arm_gen_test_cc(a->cond, label_match); + /* nomatch: */ + gen_set_nzcv(tcg_constant_i64(a->nzcv << 28)); + tcg_gen_br(label_continue); + gen_set_label(label_match); + } + + handle_fp_compare(s, a->esz, a->rn, a->rm, false, a->e); + + if (label_continue) { + gen_set_label(label_continue); + } + return true; +} + /* * Advanced SIMD Modified Immediate */ @@ -XXX,XX +XXX,XX @@ static bool trans_CSEL(DisasContext *s, arg_CSEL *a) return true; } -static void handle_fp_compare(DisasContext *s, int size, - unsigned int rn, unsigned int rm, - bool cmp_with_zero, bool signal_all_nans) -{ - TCGv_i64 tcg_flags = tcg_temp_new_i64(); - TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); - - if (size == MO_64) { - TCGv_i64 tcg_vn, tcg_vm; - - tcg_vn = read_fp_dreg(s, rn); - if (cmp_with_zero) { - tcg_vm = tcg_constant_i64(0); - } else { - tcg_vm = read_fp_dreg(s, rm); - } - if (signal_all_nans) { - gen_helper_vfp_cmped_a64(tcg_flags, tcg_vn, tcg_vm, fpst); - } else { - gen_helper_vfp_cmpd_a64(tcg_flags, tcg_vn, tcg_vm, fpst); - } - } else { - TCGv_i32 tcg_vn = tcg_temp_new_i32(); - TCGv_i32 tcg_vm = tcg_temp_new_i32(); - - read_vec_element_i32(s, tcg_vn, rn, 0, size); - if (cmp_with_zero) { - tcg_gen_movi_i32(tcg_vm, 0); - } else { - read_vec_element_i32(s, tcg_vm, rm, 0, size); - } - - switch (size) { - case MO_32: - if (signal_all_nans) { - gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst); - } else { - gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst); - } - break; - case MO_16: - if (signal_all_nans) { - gen_helper_vfp_cmpeh_a64(tcg_flags, tcg_vn, tcg_vm, fpst); - } else { - gen_helper_vfp_cmph_a64(tcg_flags, tcg_vn, tcg_vm, fpst); - } - break; - default: - g_assert_not_reached(); - } - } - - gen_set_nzcv(tcg_flags); -} - -/* Floating point compare - * 31 30 29 28 24 23 22 21 20 16 15 14 13 10 9 5 4 0 - * +---+---+---+-----------+------+---+------+-----+---------+------+-------+ - * | M | 0 | S | 1 1 1 1 0 | type | 1 | Rm | op | 1 0 0 0 | Rn | op2 | - * +---+---+---+-----------+------+---+------+-----+---------+------+-------+ - */ -static void disas_fp_compare(DisasContext *s, uint32_t insn) -{ - unsigned int mos, type, rm, op, rn, opc, op2r; - int size; - - mos = extract32(insn, 29, 3); - type = extract32(insn, 22, 2); - rm = extract32(insn, 16, 5); - op = extract32(insn, 14, 2); - rn = extract32(insn, 5, 5); - opc = extract32(insn, 3, 2); - op2r = extract32(insn, 0, 3); - - if (mos || op || op2r) { - unallocated_encoding(s); - return; - } - - switch (type) { - case 0: - size = MO_32; - break; - case 1: - size = MO_64; - break; - case 3: - size = MO_16; - if (dc_isar_feature(aa64_fp16, s)) { - break; - } - /* fallthru */ - default: - unallocated_encoding(s); - return; - } - - if (!fp_access_check(s)) { - return; - } - - handle_fp_compare(s, size, rn, rm, opc & 1, opc & 2); -} - -/* Floating point conditional compare - * 31 30 29 28 24 23 22 21 20 16 15 12 11 10 9 5 4 3 0 - * +---+---+---+-----------+------+---+------+------+-----+------+----+------+ - * | M | 0 | S | 1 1 1 1 0 | type | 1 | Rm | cond | 0 1 | Rn | op | nzcv | - * +---+---+---+-----------+------+---+------+------+-----+------+----+------+ - */ -static void disas_fp_ccomp(DisasContext *s, uint32_t insn) -{ - unsigned int mos, type, rm, cond, rn, op, nzcv; - TCGLabel *label_continue = NULL; - int size; - - mos = extract32(insn, 29, 3); - type = extract32(insn, 22, 2); - rm = extract32(insn, 16, 5); - cond = extract32(insn, 12, 4); - rn = extract32(insn, 5, 5); - op = extract32(insn, 4, 1); - nzcv = extract32(insn, 0, 4); - - if (mos) { - unallocated_encoding(s); - return; - } - - switch (type) { - case 0: - size = MO_32; - break; - case 1: - size = MO_64; - break; - case 3: - size = MO_16; - if (dc_isar_feature(aa64_fp16, s)) { - break; - } - /* fallthru */ - default: - unallocated_encoding(s); - return; - } - - if (!fp_access_check(s)) { - return; - } - - if (cond < 0x0e) { /* not always */ - TCGLabel *label_match = gen_new_label(); - label_continue = gen_new_label(); - arm_gen_test_cc(cond, label_match); - /* nomatch: */ - gen_set_nzcv(tcg_constant_i64(nzcv << 28)); - tcg_gen_br(label_continue); - gen_set_label(label_match); - } - - handle_fp_compare(s, size, rn, rm, false, op); - - if (cond < 0x0e) { - gen_set_label(label_continue); - } -} - /* Floating-point data-processing (1 source) - half precision */ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn) { @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) disas_fp_fixed_conv(s, insn); } else { switch (extract32(insn, 10, 2)) { - case 1: - /* Floating point conditional compare */ - disas_fp_ccomp(s, insn); - break; - case 2: - /* Floating point data-processing (2 source) */ - unallocated_encoding(s); /* in decodetree */ - break; - case 3: - /* Floating point conditional select */ + case 1: /* Floating point conditional compare */ + case 2: /* Floating point data-processing (2 source) */ + case 3: /* Floating point conditional select */ unallocated_encoding(s); /* in decodetree */ break; case 0: @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) break; case 1: /* [15:12] == xx10 */ /* Floating point compare */ - disas_fp_compare(s, insn); + unallocated_encoding(s); /* in decodetree */ break; case 2: /* [15:12] == x100 */ /* Floating point data-processing (1 source) */ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> These opcodes are only supported as vector operations, not as advsimd scalar. Set only_in_vector, and remove the unreachable implementation of scalar fneg. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20241211163036.2297116-24-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) break; case 0x2f: /* FABS */ case 0x6f: /* FNEG */ + only_in_vector = true; need_fpst = false; break; case 0x7d: /* FRSQRTE */ + break; case 0x7f: /* FSQRT (vector) */ + only_in_vector = true; break; default: unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x7b: /* FCVTZU */ gen_helper_advsimd_f16touinth(tcg_res, tcg_op, tcg_fpstatus); break; - case 0x6f: /* FNEG */ - tcg_gen_xori_i32(tcg_res, tcg_op, 0x8000); - break; case 0x7d: /* FRSQRTE */ gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); break; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-25-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 7 +++ target/arm/tcg/translate-a64.c | 105 +++++++++++++++++++++++---------- 2 files changed, 81 insertions(+), 31 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ @rr_h ........ ... ..... ...... rn:5 rd:5 &rr_e esz=1 @rr_d ........ ... ..... ...... rn:5 rd:5 &rr_e esz=3 @rr_sd ........ ... ..... ...... rn:5 rd:5 &rr_e esz=%esz_sd +@rr_hsd ........ ... ..... ...... rn:5 rd:5 &rr_e esz=%esz_hsd @rrr_b ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=0 @rrr_h ........ ... rm:5 ...... rn:5 rd:5 &rrr_e esz=1 @@ -XXX,XX +XXX,XX @@ FMAXV_s 0110 1110 00 11000 01111 10 ..... ..... @rr_q1e2 FMINV_h 0.00 1110 10 11000 01111 10 ..... ..... @qrr_h FMINV_s 0110 1110 10 11000 01111 10 ..... ..... @rr_q1e2 +# Floating-point data processing (1 source) + +FMOV_s 00011110 .. 1 000000 10000 ..... ..... @rr_hsd +FABS_s 00011110 .. 1 000001 10000 ..... ..... @rr_hsd +FNEG_s 00011110 .. 1 000010 10000 ..... ..... @rr_hsd + # Floating-point Immediate FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_CSEL(DisasContext *s, arg_CSEL *a) return true; } +typedef struct FPScalar1Int { + void (*gen_h)(TCGv_i32, TCGv_i32); + void (*gen_s)(TCGv_i32, TCGv_i32); + void (*gen_d)(TCGv_i64, TCGv_i64); +} FPScalar1Int; + +static bool do_fp1_scalar_int(DisasContext *s, arg_rr_e *a, + const FPScalar1Int *f) +{ + switch (a->esz) { + case MO_64: + if (fp_access_check(s)) { + TCGv_i64 t = read_fp_dreg(s, a->rn); + f->gen_d(t, t); + write_fp_dreg(s, a->rd, t); + } + break; + case MO_32: + if (fp_access_check(s)) { + TCGv_i32 t = read_fp_sreg(s, a->rn); + f->gen_s(t, t); + write_fp_sreg(s, a->rd, t); + } + break; + case MO_16: + if (!dc_isar_feature(aa64_fp16, s)) { + return false; + } + if (fp_access_check(s)) { + TCGv_i32 t = read_fp_hreg(s, a->rn); + f->gen_h(t, t); + write_fp_sreg(s, a->rd, t); + } + break; + default: + return false; + } + return true; +} + +static const FPScalar1Int f_scalar_fmov = { + tcg_gen_mov_i32, + tcg_gen_mov_i32, + tcg_gen_mov_i64, +}; +TRANS(FMOV_s, do_fp1_scalar_int, a, &f_scalar_fmov) + +static const FPScalar1Int f_scalar_fabs = { + gen_vfp_absh, + gen_vfp_abss, + gen_vfp_absd, +}; +TRANS(FABS_s, do_fp1_scalar_int, a, &f_scalar_fabs) + +static const FPScalar1Int f_scalar_fneg = { + gen_vfp_negh, + gen_vfp_negs, + gen_vfp_negd, +}; +TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg) + /* Floating-point data-processing (1 source) - half precision */ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn) { @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn) TCGv_i32 tcg_res = tcg_temp_new_i32(); switch (opcode) { - case 0x0: /* FMOV */ - tcg_gen_mov_i32(tcg_res, tcg_op); - break; - case 0x1: /* FABS */ - gen_vfp_absh(tcg_res, tcg_op); - break; - case 0x2: /* FNEG */ - gen_vfp_negh(tcg_res, tcg_op); - break; case 0x3: /* FSQRT */ fpst = fpstatus_ptr(FPST_FPCR_F16); gen_helper_sqrt_f16(tcg_res, tcg_op, fpst); @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn) gen_helper_advsimd_rinth(tcg_res, tcg_op, fpst); break; default: + case 0x0: /* FMOV */ + case 0x1: /* FABS */ + case 0x2: /* FNEG */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) tcg_res = tcg_temp_new_i32(); switch (opcode) { - case 0x0: /* FMOV */ - tcg_gen_mov_i32(tcg_res, tcg_op); - goto done; - case 0x1: /* FABS */ - gen_vfp_abss(tcg_res, tcg_op); - goto done; - case 0x2: /* FNEG */ - gen_vfp_negs(tcg_res, tcg_op); - goto done; case 0x3: /* FSQRT */ gen_helper_vfp_sqrts(tcg_res, tcg_op, tcg_env); goto done; @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) gen_fpst = gen_helper_frint64_s; break; default: + case 0x0: /* FMOV */ + case 0x1: /* FABS */ + case 0x2: /* FNEG */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) TCGv_ptr fpst; int rmode = -1; - switch (opcode) { - case 0x0: /* FMOV */ - gen_gvec_fn2(s, false, rd, rn, tcg_gen_gvec_mov, 0); - return; - } - tcg_op = read_fp_dreg(s, rn); tcg_res = tcg_temp_new_i64(); switch (opcode) { - case 0x1: /* FABS */ - gen_vfp_absd(tcg_res, tcg_op); - goto done; - case 0x2: /* FNEG */ - gen_vfp_negd(tcg_res, tcg_op); - goto done; case 0x3: /* FSQRT */ gen_helper_vfp_sqrtd(tcg_res, tcg_op, tcg_env); goto done; @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) gen_fpst = gen_helper_frint64_d; break; default: + case 0x0: /* FMOV */ + case 0x1: /* FABS */ + case 0x2: /* FNEG */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) goto do_unallocated; } /* fall through */ - case 0x0 ... 0x3: + case 0x3: case 0x8 ... 0xc: case 0xe ... 0xf: /* 32-to-32 and 64-to-64 ops */ @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) default: do_unallocated: + case 0x0: /* FMOV */ + case 0x1: /* FABS */ + case 0x2: /* FNEG */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Pass fpstatus not env, like most other fp helpers. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-26-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 6 +++--- target/arm/tcg/translate-a64.c | 15 +++++++-------- target/arm/tcg/translate-vfp.c | 6 +++--- target/arm/vfp_helper.c | 12 ++++++------ 4 files changed, 19 insertions(+), 20 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_maxnumd, f64, f64, f64, ptr) DEF_HELPER_3(vfp_minnumh, f16, f16, f16, ptr) DEF_HELPER_3(vfp_minnums, f32, f32, f32, ptr) DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr) -DEF_HELPER_2(vfp_sqrth, f16, f16, env) -DEF_HELPER_2(vfp_sqrts, f32, f32, env) -DEF_HELPER_2(vfp_sqrtd, f64, f64, env) +DEF_HELPER_2(vfp_sqrth, f16, f16, ptr) +DEF_HELPER_2(vfp_sqrts, f32, f32, ptr) +DEF_HELPER_2(vfp_sqrtd, f64, f64, ptr) DEF_HELPER_3(vfp_cmph, void, f16, f16, env) DEF_HELPER_3(vfp_cmps, void, f32, f32, env) DEF_HELPER_3(vfp_cmpd, void, f64, f64, env) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) switch (opcode) { case 0x3: /* FSQRT */ - gen_helper_vfp_sqrts(tcg_res, tcg_op, tcg_env); - goto done; + gen_fpst = gen_helper_vfp_sqrts; + break; case 0x6: /* BFCVT */ gen_fpst = gen_helper_bfcvt; break; @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) gen_fpst(tcg_res, tcg_op, fpst); } - done: write_fp_sreg(s, rd, tcg_res); } @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) switch (opcode) { case 0x3: /* FSQRT */ - gen_helper_vfp_sqrtd(tcg_res, tcg_op, tcg_env); - goto done; + gen_fpst = gen_helper_vfp_sqrtd; + break; case 0x8: /* FRINTN */ case 0x9: /* FRINTP */ case 0xa: /* FRINTM */ @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) gen_fpst(tcg_res, tcg_op, fpst); } - done: write_fp_dreg(s, rd, tcg_res); } @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, gen_vfp_negd(tcg_rd, tcg_rn); break; case 0x7f: /* FSQRT */ - gen_helper_vfp_sqrtd(tcg_rd, tcg_rn, tcg_env); + gen_helper_vfp_sqrtd(tcg_rd, tcg_rn, tcg_fpstatus); break; case 0x1a: /* FCVTNS */ case 0x1b: /* FCVTMS */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) handle_2misc_fcmp_zero(s, opcode, false, u, is_q, size, rn, rd); return; case 0x7f: /* FSQRT */ + need_fpstatus = true; if (size == 3 && !is_q) { unallocated_encoding(s); return; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) gen_vfp_negs(tcg_res, tcg_op); break; case 0x7f: /* FSQRT */ - gen_helper_vfp_sqrts(tcg_res, tcg_op, tcg_env); + gen_helper_vfp_sqrts(tcg_res, tcg_op, tcg_fpstatus); break; case 0x1a: /* FCVTNS */ case 0x1b: /* FCVTMS */ diff --git a/target/arm/tcg/translate-vfp.c b/target/arm/tcg/translate-vfp.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-vfp.c +++ b/target/arm/tcg/translate-vfp.c @@ -XXX,XX +XXX,XX @@ DO_VFP_2OP(VNEG, dp, gen_vfp_negd, aa32_fpdp_v2) static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm) { - gen_helper_vfp_sqrth(vd, vm, tcg_env); + gen_helper_vfp_sqrth(vd, vm, fpstatus_ptr(FPST_FPCR_F16)); } static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm) { - gen_helper_vfp_sqrts(vd, vm, tcg_env); + gen_helper_vfp_sqrts(vd, vm, fpstatus_ptr(FPST_FPCR)); } static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm) { - gen_helper_vfp_sqrtd(vd, vm, tcg_env); + gen_helper_vfp_sqrtd(vd, vm, fpstatus_ptr(FPST_FPCR)); } DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp, aa32_fp16_arith) diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/vfp_helper.c +++ b/target/arm/vfp_helper.c @@ -XXX,XX +XXX,XX @@ VFP_BINOP(minnum) VFP_BINOP(maxnum) #undef VFP_BINOP -dh_ctype_f16 VFP_HELPER(sqrt, h)(dh_ctype_f16 a, CPUARMState *env) +dh_ctype_f16 VFP_HELPER(sqrt, h)(dh_ctype_f16 a, void *fpstp) { - return float16_sqrt(a, &env->vfp.fp_status_f16); + return float16_sqrt(a, fpstp); } -float32 VFP_HELPER(sqrt, s)(float32 a, CPUARMState *env) +float32 VFP_HELPER(sqrt, s)(float32 a, void *fpstp) { - return float32_sqrt(a, &env->vfp.fp_status); + return float32_sqrt(a, fpstp); } -float64 VFP_HELPER(sqrt, d)(float64 a, CPUARMState *env) +float64 VFP_HELPER(sqrt, d)(float64 a, void *fpstp) { - return float64_sqrt(a, &env->vfp.fp_status); + return float64_sqrt(a, fpstp); } static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp) -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This function is identical with helper_vfp_sqrth. Replace all uses. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-27-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-a64.h | 1 - target/arm/tcg/helper-a64.c | 11 ----------- target/arm/tcg/translate-a64.c | 4 ++-- 3 files changed, 2 insertions(+), 14 deletions(-) diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.h +++ b/target/arm/tcg/helper-a64.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(advsimd_rinth_exact, f16, f16, ptr) DEF_HELPER_2(advsimd_rinth, f16, f16, ptr) DEF_HELPER_2(advsimd_f16tosinth, i32, f16, ptr) DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr) -DEF_HELPER_2(sqrt_f16, f16, f16, ptr) DEF_HELPER_2(exception_return, void, env, i64) DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64) diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -XXX,XX +XXX,XX @@ illegal_return: "resuming execution at 0x%" PRIx64 "\n", cur_el, env->pc); } -/* - * Square Root and Reciprocal square root - */ - -uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp) -{ - float_status *s = fpstp; - - return float16_sqrt(a, s); -} - void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in) { uintptr_t ra = GETPC(); diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn) switch (opcode) { case 0x3: /* FSQRT */ fpst = fpstatus_ptr(FPST_FPCR_F16); - gen_helper_sqrt_f16(tcg_res, tcg_op, fpst); + gen_helper_vfp_sqrth(tcg_res, tcg_op, fpst); break; case 0x8: /* FRINTN */ case 0x9: /* FRINTP */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); break; case 0x7f: /* FSQRT */ - gen_helper_sqrt_f16(tcg_res, tcg_op, tcg_fpstatus); + gen_helper_vfp_sqrth(tcg_res, tcg_op, tcg_fpstatus); break; default: g_assert_not_reached(); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-28-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 1 + target/arm/tcg/translate-a64.c | 72 ++++++++++++++++++++++++++++------ 2 files changed, 62 insertions(+), 11 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FMINV_s 0110 1110 10 11000 01111 10 ..... ..... @rr_q1e2 FMOV_s 00011110 .. 1 000000 10000 ..... ..... @rr_hsd FABS_s 00011110 .. 1 000001 10000 ..... ..... @rr_hsd FNEG_s 00011110 .. 1 000010 10000 ..... ..... @rr_hsd +FSQRT_s 00011110 .. 1 000011 10000 ..... ..... @rr_hsd # Floating-point Immediate diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static const FPScalar1Int f_scalar_fneg = { }; TRANS(FNEG_s, do_fp1_scalar_int, a, &f_scalar_fneg) +typedef struct FPScalar1 { + void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_ptr); + void (*gen_s)(TCGv_i32, TCGv_i32, TCGv_ptr); + void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_ptr); +} FPScalar1; + +static bool do_fp1_scalar(DisasContext *s, arg_rr_e *a, + const FPScalar1 *f, int rmode) +{ + TCGv_i32 tcg_rmode = NULL; + TCGv_ptr fpst; + TCGv_i64 t64; + TCGv_i32 t32; + int check = fp_access_check_scalar_hsd(s, a->esz); + + if (check <= 0) { + return check == 0; + } + + fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); + if (rmode >= 0) { + tcg_rmode = gen_set_rmode(rmode, fpst); + } + + switch (a->esz) { + case MO_64: + t64 = read_fp_dreg(s, a->rn); + f->gen_d(t64, t64, fpst); + write_fp_dreg(s, a->rd, t64); + break; + case MO_32: + t32 = read_fp_sreg(s, a->rn); + f->gen_s(t32, t32, fpst); + write_fp_sreg(s, a->rd, t32); + break; + case MO_16: + t32 = read_fp_hreg(s, a->rn); + f->gen_h(t32, t32, fpst); + write_fp_sreg(s, a->rd, t32); + break; + default: + g_assert_not_reached(); + } + + if (rmode >= 0) { + gen_restore_rmode(tcg_rmode, fpst); + } + return true; +} + +static const FPScalar1 f_scalar_fsqrt = { + gen_helper_vfp_sqrth, + gen_helper_vfp_sqrts, + gen_helper_vfp_sqrtd, +}; +TRANS(FSQRT_s, do_fp1_scalar, a, &f_scalar_fsqrt, -1) + /* Floating-point data-processing (1 source) - half precision */ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn) { @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn) TCGv_i32 tcg_res = tcg_temp_new_i32(); switch (opcode) { - case 0x3: /* FSQRT */ - fpst = fpstatus_ptr(FPST_FPCR_F16); - gen_helper_vfp_sqrth(tcg_res, tcg_op, fpst); - break; case 0x8: /* FRINTN */ case 0x9: /* FRINTP */ case 0xa: /* FRINTM */ @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn) case 0x0: /* FMOV */ case 0x1: /* FABS */ case 0x2: /* FNEG */ + case 0x3: /* FSQRT */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) tcg_res = tcg_temp_new_i32(); switch (opcode) { - case 0x3: /* FSQRT */ - gen_fpst = gen_helper_vfp_sqrts; - break; case 0x6: /* BFCVT */ gen_fpst = gen_helper_bfcvt; break; @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) case 0x0: /* FMOV */ case 0x1: /* FABS */ case 0x2: /* FNEG */ + case 0x3: /* FSQRT */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) tcg_res = tcg_temp_new_i64(); switch (opcode) { - case 0x3: /* FSQRT */ - gen_fpst = gen_helper_vfp_sqrtd; - break; case 0x8: /* FRINTN */ case 0x9: /* FRINTP */ case 0xa: /* FRINTM */ @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) case 0x0: /* FMOV */ case 0x1: /* FABS */ case 0x2: /* FNEG */ + case 0x3: /* FSQRT */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) goto do_unallocated; } /* fall through */ - case 0x3: case 0x8 ... 0xc: case 0xe ... 0xf: /* 32-to-32 and 64-to-64 ops */ @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) case 0x0: /* FMOV */ case 0x1: /* FABS */ case 0x2: /* FNEG */ + case 0x3: /* FSQRT */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove handle_fp_1src_half as these were the last insns decoded by that function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-29-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 8 +++ target/arm/tcg/translate-a64.c | 117 +++++++++++---------------------- 2 files changed, 46 insertions(+), 79 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FABS_s 00011110 .. 1 000001 10000 ..... ..... @rr_hsd FNEG_s 00011110 .. 1 000010 10000 ..... ..... @rr_hsd FSQRT_s 00011110 .. 1 000011 10000 ..... ..... @rr_hsd +FRINTN_s 00011110 .. 1 001000 10000 ..... ..... @rr_hsd +FRINTP_s 00011110 .. 1 001001 10000 ..... ..... @rr_hsd +FRINTM_s 00011110 .. 1 001010 10000 ..... ..... @rr_hsd +FRINTZ_s 00011110 .. 1 001011 10000 ..... ..... @rr_hsd +FRINTA_s 00011110 .. 1 001100 10000 ..... ..... @rr_hsd +FRINTX_s 00011110 .. 1 001110 10000 ..... ..... @rr_hsd +FRINTI_s 00011110 .. 1 001111 10000 ..... ..... @rr_hsd + # Floating-point Immediate FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_fsqrt = { }; TRANS(FSQRT_s, do_fp1_scalar, a, &f_scalar_fsqrt, -1) -/* Floating-point data-processing (1 source) - half precision */ -static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn) -{ - TCGv_ptr fpst = NULL; - TCGv_i32 tcg_op = read_fp_hreg(s, rn); - TCGv_i32 tcg_res = tcg_temp_new_i32(); +static const FPScalar1 f_scalar_frint = { + gen_helper_advsimd_rinth, + gen_helper_rints, + gen_helper_rintd, +}; +TRANS(FRINTN_s, do_fp1_scalar, a, &f_scalar_frint, FPROUNDING_TIEEVEN) +TRANS(FRINTP_s, do_fp1_scalar, a, &f_scalar_frint, FPROUNDING_POSINF) +TRANS(FRINTM_s, do_fp1_scalar, a, &f_scalar_frint, FPROUNDING_NEGINF) +TRANS(FRINTZ_s, do_fp1_scalar, a, &f_scalar_frint, FPROUNDING_ZERO) +TRANS(FRINTA_s, do_fp1_scalar, a, &f_scalar_frint, FPROUNDING_TIEAWAY) +TRANS(FRINTI_s, do_fp1_scalar, a, &f_scalar_frint, -1) - switch (opcode) { - case 0x8: /* FRINTN */ - case 0x9: /* FRINTP */ - case 0xa: /* FRINTM */ - case 0xb: /* FRINTZ */ - case 0xc: /* FRINTA */ - { - TCGv_i32 tcg_rmode; - - fpst = fpstatus_ptr(FPST_FPCR_F16); - tcg_rmode = gen_set_rmode(opcode & 7, fpst); - gen_helper_advsimd_rinth(tcg_res, tcg_op, fpst); - gen_restore_rmode(tcg_rmode, fpst); - break; - } - case 0xe: /* FRINTX */ - fpst = fpstatus_ptr(FPST_FPCR_F16); - gen_helper_advsimd_rinth_exact(tcg_res, tcg_op, fpst); - break; - case 0xf: /* FRINTI */ - fpst = fpstatus_ptr(FPST_FPCR_F16); - gen_helper_advsimd_rinth(tcg_res, tcg_op, fpst); - break; - default: - case 0x0: /* FMOV */ - case 0x1: /* FABS */ - case 0x2: /* FNEG */ - case 0x3: /* FSQRT */ - g_assert_not_reached(); - } - - write_fp_sreg(s, rd, tcg_res); -} +static const FPScalar1 f_scalar_frintx = { + gen_helper_advsimd_rinth_exact, + gen_helper_rints_exact, + gen_helper_rintd_exact, +}; +TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1) /* Floating-point data-processing (1 source) - single precision */ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) case 0x6: /* BFCVT */ gen_fpst = gen_helper_bfcvt; break; - case 0x8: /* FRINTN */ - case 0x9: /* FRINTP */ - case 0xa: /* FRINTM */ - case 0xb: /* FRINTZ */ - case 0xc: /* FRINTA */ - rmode = opcode & 7; - gen_fpst = gen_helper_rints; - break; - case 0xe: /* FRINTX */ - gen_fpst = gen_helper_rints_exact; - break; - case 0xf: /* FRINTI */ - gen_fpst = gen_helper_rints; - break; case 0x10: /* FRINT32Z */ rmode = FPROUNDING_ZERO; gen_fpst = gen_helper_frint32_s; @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) case 0x1: /* FABS */ case 0x2: /* FNEG */ case 0x3: /* FSQRT */ + case 0x8: /* FRINTN */ + case 0x9: /* FRINTP */ + case 0xa: /* FRINTM */ + case 0xb: /* FRINTZ */ + case 0xc: /* FRINTA */ + case 0xe: /* FRINTX */ + case 0xf: /* FRINTI */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) tcg_res = tcg_temp_new_i64(); switch (opcode) { - case 0x8: /* FRINTN */ - case 0x9: /* FRINTP */ - case 0xa: /* FRINTM */ - case 0xb: /* FRINTZ */ - case 0xc: /* FRINTA */ - rmode = opcode & 7; - gen_fpst = gen_helper_rintd; - break; - case 0xe: /* FRINTX */ - gen_fpst = gen_helper_rintd_exact; - break; - case 0xf: /* FRINTI */ - gen_fpst = gen_helper_rintd; - break; case 0x10: /* FRINT32Z */ rmode = FPROUNDING_ZERO; gen_fpst = gen_helper_frint32_d; @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) case 0x1: /* FABS */ case 0x2: /* FNEG */ case 0x3: /* FSQRT */ + case 0x8: /* FRINTN */ + case 0x9: /* FRINTP */ + case 0xa: /* FRINTM */ + case 0xb: /* FRINTZ */ + case 0xc: /* FRINTA */ + case 0xe: /* FRINTX */ + case 0xf: /* FRINTI */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) if (type > 1 || !dc_isar_feature(aa64_frint, s)) { goto do_unallocated; } - /* fall through */ - case 0x8 ... 0xc: - case 0xe ... 0xf: /* 32-to-32 and 64-to-64 ops */ switch (type) { case 0: @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) handle_fp_1src_double(s, opcode, rd, rn); break; case 3: - if (!dc_isar_feature(aa64_fp16, s)) { - goto do_unallocated; - } - - if (!fp_access_check(s)) { - return; - } - handle_fp_1src_half(s, opcode, rd, rn); - break; default: goto do_unallocated; } @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) case 0x1: /* FABS */ case 0x2: /* FNEG */ case 0x3: /* FSQRT */ + case 0x8: /* FRINTN */ + case 0x9: /* FRINTP */ + case 0xa: /* FRINTM */ + case 0xb: /* FRINTZ */ + case 0xc: /* FRINTA */ + case 0xe: /* FRINTX */ + case 0xf: /* FRINTI */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-30-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 3 +++ target/arm/tcg/translate-a64.c | 26 +++++++------------------- 2 files changed, 10 insertions(+), 19 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ &qrrrr_e q rd rn rm ra esz @rr_h ........ ... ..... ...... rn:5 rd:5 &rr_e esz=1 +@rr_s ........ ... ..... ...... rn:5 rd:5 &rr_e esz=2 @rr_d ........ ... ..... ...... rn:5 rd:5 &rr_e esz=3 @rr_sd ........ ... ..... ...... rn:5 rd:5 &rr_e esz=%esz_sd @rr_hsd ........ ... ..... ...... rn:5 rd:5 &rr_e esz=%esz_hsd @@ -XXX,XX +XXX,XX @@ FRINTA_s 00011110 .. 1 001100 10000 ..... ..... @rr_hsd FRINTX_s 00011110 .. 1 001110 10000 ..... ..... @rr_hsd FRINTI_s 00011110 .. 1 001111 10000 ..... ..... @rr_hsd +BFCVT_s 00011110 01 1 000110 10000 ..... ..... @rr_s + # Floating-point Immediate FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_frintx = { }; TRANS(FRINTX_s, do_fp1_scalar, a, &f_scalar_frintx, -1) +static const FPScalar1 f_scalar_bfcvt = { + .gen_s = gen_helper_bfcvt, +}; +TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar, a, &f_scalar_bfcvt, -1) + /* Floating-point data-processing (1 source) - single precision */ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) { @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) tcg_res = tcg_temp_new_i32(); switch (opcode) { - case 0x6: /* BFCVT */ - gen_fpst = gen_helper_bfcvt; - break; case 0x10: /* FRINT32Z */ rmode = FPROUNDING_ZERO; gen_fpst = gen_helper_frint32_s; @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) case 0x1: /* FABS */ case 0x2: /* FNEG */ case 0x3: /* FSQRT */ + case 0x6: /* BFCVT */ case 0x8: /* FRINTN */ case 0x9: /* FRINTP */ case 0xa: /* FRINTM */ @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) } break; - case 0x6: - switch (type) { - case 1: /* BFCVT */ - if (!dc_isar_feature(aa64_bf16, s)) { - goto do_unallocated; - } - if (!fp_access_check(s)) { - return; - } - handle_fp_1src_single(s, opcode, rd, rn); - break; - default: - goto do_unallocated; - } - break; - default: do_unallocated: case 0x0: /* FMOV */ case 0x1: /* FABS */ case 0x2: /* FNEG */ case 0x3: /* FSQRT */ + case 0x6: /* BFCVT */ case 0x8: /* FRINTN */ case 0x9: /* FRINTP */ case 0xa: /* FRINTM */ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove handle_fp_1src_single and handle_fp_1src_double as these were the last insns decoded by those functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-31-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 5 ++ target/arm/tcg/translate-a64.c | 146 ++++----------------------------- 2 files changed, 22 insertions(+), 129 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FRINTI_s 00011110 .. 1 001111 10000 ..... ..... @rr_hsd BFCVT_s 00011110 01 1 000110 10000 ..... ..... @rr_s +FRINT32Z_s 00011110 0. 1 010000 10000 ..... ..... @rr_sd +FRINT32X_s 00011110 0. 1 010001 10000 ..... ..... @rr_sd +FRINT64Z_s 00011110 0. 1 010010 10000 ..... ..... @rr_sd +FRINT64X_s 00011110 0. 1 010011 10000 ..... ..... @rr_sd + # Floating-point Immediate FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static const FPScalar1 f_scalar_bfcvt = { }; TRANS_FEAT(BFCVT_s, aa64_bf16, do_fp1_scalar, a, &f_scalar_bfcvt, -1) -/* Floating-point data-processing (1 source) - single precision */ -static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) -{ - void (*gen_fpst)(TCGv_i32, TCGv_i32, TCGv_ptr); - TCGv_i32 tcg_op, tcg_res; - TCGv_ptr fpst; - int rmode = -1; +static const FPScalar1 f_scalar_frint32 = { + NULL, + gen_helper_frint32_s, + gen_helper_frint32_d, +}; +TRANS_FEAT(FRINT32Z_s, aa64_frint, do_fp1_scalar, a, + &f_scalar_frint32, FPROUNDING_ZERO) +TRANS_FEAT(FRINT32X_s, aa64_frint, do_fp1_scalar, a, &f_scalar_frint32, -1) - tcg_op = read_fp_sreg(s, rn); - tcg_res = tcg_temp_new_i32(); - - switch (opcode) { - case 0x10: /* FRINT32Z */ - rmode = FPROUNDING_ZERO; - gen_fpst = gen_helper_frint32_s; - break; - case 0x11: /* FRINT32X */ - gen_fpst = gen_helper_frint32_s; - break; - case 0x12: /* FRINT64Z */ - rmode = FPROUNDING_ZERO; - gen_fpst = gen_helper_frint64_s; - break; - case 0x13: /* FRINT64X */ - gen_fpst = gen_helper_frint64_s; - break; - default: - case 0x0: /* FMOV */ - case 0x1: /* FABS */ - case 0x2: /* FNEG */ - case 0x3: /* FSQRT */ - case 0x6: /* BFCVT */ - case 0x8: /* FRINTN */ - case 0x9: /* FRINTP */ - case 0xa: /* FRINTM */ - case 0xb: /* FRINTZ */ - case 0xc: /* FRINTA */ - case 0xe: /* FRINTX */ - case 0xf: /* FRINTI */ - g_assert_not_reached(); - } - - fpst = fpstatus_ptr(FPST_FPCR); - if (rmode >= 0) { - TCGv_i32 tcg_rmode = gen_set_rmode(rmode, fpst); - gen_fpst(tcg_res, tcg_op, fpst); - gen_restore_rmode(tcg_rmode, fpst); - } else { - gen_fpst(tcg_res, tcg_op, fpst); - } - - write_fp_sreg(s, rd, tcg_res); -} - -/* Floating-point data-processing (1 source) - double precision */ -static void handle_fp_1src_double(DisasContext *s, int opcode, int rd, int rn) -{ - void (*gen_fpst)(TCGv_i64, TCGv_i64, TCGv_ptr); - TCGv_i64 tcg_op, tcg_res; - TCGv_ptr fpst; - int rmode = -1; - - tcg_op = read_fp_dreg(s, rn); - tcg_res = tcg_temp_new_i64(); - - switch (opcode) { - case 0x10: /* FRINT32Z */ - rmode = FPROUNDING_ZERO; - gen_fpst = gen_helper_frint32_d; - break; - case 0x11: /* FRINT32X */ - gen_fpst = gen_helper_frint32_d; - break; - case 0x12: /* FRINT64Z */ - rmode = FPROUNDING_ZERO; - gen_fpst = gen_helper_frint64_d; - break; - case 0x13: /* FRINT64X */ - gen_fpst = gen_helper_frint64_d; - break; - default: - case 0x0: /* FMOV */ - case 0x1: /* FABS */ - case 0x2: /* FNEG */ - case 0x3: /* FSQRT */ - case 0x8: /* FRINTN */ - case 0x9: /* FRINTP */ - case 0xa: /* FRINTM */ - case 0xb: /* FRINTZ */ - case 0xc: /* FRINTA */ - case 0xe: /* FRINTX */ - case 0xf: /* FRINTI */ - g_assert_not_reached(); - } - - fpst = fpstatus_ptr(FPST_FPCR); - if (rmode >= 0) { - TCGv_i32 tcg_rmode = gen_set_rmode(rmode, fpst); - gen_fpst(tcg_res, tcg_op, fpst); - gen_restore_rmode(tcg_rmode, fpst); - } else { - gen_fpst(tcg_res, tcg_op, fpst); - } - - write_fp_dreg(s, rd, tcg_res); -} +static const FPScalar1 f_scalar_frint64 = { + NULL, + gen_helper_frint64_s, + gen_helper_frint64_d, +}; +TRANS_FEAT(FRINT64Z_s, aa64_frint, do_fp1_scalar, a, + &f_scalar_frint64, FPROUNDING_ZERO) +TRANS_FEAT(FRINT64X_s, aa64_frint, do_fp1_scalar, a, &f_scalar_frint64, -1) static void handle_fp_fcvt(DisasContext *s, int opcode, int rd, int rn, int dtype, int ntype) @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) break; } - case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */ - if (type > 1 || !dc_isar_feature(aa64_frint, s)) { - goto do_unallocated; - } - /* 32-to-32 and 64-to-64 ops */ - switch (type) { - case 0: - if (!fp_access_check(s)) { - return; - } - handle_fp_1src_single(s, opcode, rd, rn); - break; - case 1: - if (!fp_access_check(s)) { - return; - } - handle_fp_1src_double(s, opcode, rd, rn); - break; - case 3: - default: - goto do_unallocated; - } - break; - default: do_unallocated: case 0x0: /* FMOV */ @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) case 0xc: /* FRINTA */ case 0xe: /* FRINTX */ case 0xf: /* FRINTI */ + case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove handle_fp_fcvt and disas_fp_1src as these were the last insns decoded by those functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-32-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 7 ++ target/arm/tcg/translate-a64.c | 172 +++++++++++++-------------------- 2 files changed, 74 insertions(+), 105 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FRINT32X_s 00011110 0. 1 010001 10000 ..... ..... @rr_sd FRINT64Z_s 00011110 0. 1 010010 10000 ..... ..... @rr_sd FRINT64X_s 00011110 0. 1 010011 10000 ..... ..... @rr_sd +FCVT_s_ds 00011110 00 1 000101 10000 ..... ..... @rr +FCVT_s_hs 00011110 00 1 000111 10000 ..... ..... @rr +FCVT_s_sd 00011110 01 1 000100 10000 ..... ..... @rr +FCVT_s_hd 00011110 01 1 000111 10000 ..... ..... @rr +FCVT_s_sh 00011110 11 1 000100 10000 ..... ..... @rr +FCVT_s_dh 00011110 11 1 000101 10000 ..... ..... @rr + # Floating-point Immediate FMOVI_s 0001 1110 .. 1 imm:8 100 00000 rd:5 esz=%esz_hsd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINT64Z_s, aa64_frint, do_fp1_scalar, a, &f_scalar_frint64, FPROUNDING_ZERO) TRANS_FEAT(FRINT64X_s, aa64_frint, do_fp1_scalar, a, &f_scalar_frint64, -1) -static void handle_fp_fcvt(DisasContext *s, int opcode, - int rd, int rn, int dtype, int ntype) +static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a) { - switch (ntype) { - case 0x0: - { - TCGv_i32 tcg_rn = read_fp_sreg(s, rn); - if (dtype == 1) { - /* Single to double */ - TCGv_i64 tcg_rd = tcg_temp_new_i64(); - gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, tcg_env); - write_fp_dreg(s, rd, tcg_rd); - } else { - /* Single to half */ - TCGv_i32 tcg_rd = tcg_temp_new_i32(); - TCGv_i32 ahp = get_ahp_flag(); - TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); + if (fp_access_check(s)) { + TCGv_i32 tcg_rn = read_fp_sreg(s, a->rn); + TCGv_i64 tcg_rd = tcg_temp_new_i64(); - gen_helper_vfp_fcvt_f32_to_f16(tcg_rd, tcg_rn, fpst, ahp); - /* write_fp_sreg is OK here because top half of tcg_rd is zero */ - write_fp_sreg(s, rd, tcg_rd); - } - break; - } - case 0x1: - { - TCGv_i64 tcg_rn = read_fp_dreg(s, rn); - TCGv_i32 tcg_rd = tcg_temp_new_i32(); - if (dtype == 0) { - /* Double to single */ - gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, tcg_env); - } else { - TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); - TCGv_i32 ahp = get_ahp_flag(); - /* Double to half */ - gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp); - /* write_fp_sreg is OK here because top half of tcg_rd is zero */ - } - write_fp_sreg(s, rd, tcg_rd); - break; - } - case 0x3: - { - TCGv_i32 tcg_rn = read_fp_sreg(s, rn); - TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR); - TCGv_i32 tcg_ahp = get_ahp_flag(); - tcg_gen_ext16u_i32(tcg_rn, tcg_rn); - if (dtype == 0) { - /* Half to single */ - TCGv_i32 tcg_rd = tcg_temp_new_i32(); - gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp); - write_fp_sreg(s, rd, tcg_rd); - } else { - /* Half to double */ - TCGv_i64 tcg_rd = tcg_temp_new_i64(); - gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp); - write_fp_dreg(s, rd, tcg_rd); - } - break; - } - default: - g_assert_not_reached(); + gen_helper_vfp_fcvtds(tcg_rd, tcg_rn, tcg_env); + write_fp_dreg(s, a->rd, tcg_rd); } + return true; } -/* Floating point data-processing (1 source) - * 31 30 29 28 24 23 22 21 20 15 14 10 9 5 4 0 - * +---+---+---+-----------+------+---+--------+-----------+------+------+ - * | M | 0 | S | 1 1 1 1 0 | type | 1 | opcode | 1 0 0 0 0 | Rn | Rd | - * +---+---+---+-----------+------+---+--------+-----------+------+------+ - */ -static void disas_fp_1src(DisasContext *s, uint32_t insn) +static bool trans_FCVT_s_hs(DisasContext *s, arg_rr *a) { - int mos = extract32(insn, 29, 3); - int type = extract32(insn, 22, 2); - int opcode = extract32(insn, 15, 6); - int rn = extract32(insn, 5, 5); - int rd = extract32(insn, 0, 5); + if (fp_access_check(s)) { + TCGv_i32 tmp = read_fp_sreg(s, a->rn); + TCGv_i32 ahp = get_ahp_flag(); + TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); - if (mos) { - goto do_unallocated; + gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp); + /* write_fp_sreg is OK here because top half of result is zero */ + write_fp_sreg(s, a->rd, tmp); } + return true; +} - switch (opcode) { - case 0x4: case 0x5: case 0x7: - { - /* FCVT between half, single and double precision */ - int dtype = extract32(opcode, 0, 2); - if (type == 2 || dtype == type) { - goto do_unallocated; - } - if (!fp_access_check(s)) { - return; - } +static bool trans_FCVT_s_sd(DisasContext *s, arg_rr *a) +{ + if (fp_access_check(s)) { + TCGv_i64 tcg_rn = read_fp_dreg(s, a->rn); + TCGv_i32 tcg_rd = tcg_temp_new_i32(); - handle_fp_fcvt(s, opcode, rd, rn, dtype, type); - break; + gen_helper_vfp_fcvtsd(tcg_rd, tcg_rn, tcg_env); + write_fp_sreg(s, a->rd, tcg_rd); } + return true; +} - default: - do_unallocated: - case 0x0: /* FMOV */ - case 0x1: /* FABS */ - case 0x2: /* FNEG */ - case 0x3: /* FSQRT */ - case 0x6: /* BFCVT */ - case 0x8: /* FRINTN */ - case 0x9: /* FRINTP */ - case 0xa: /* FRINTM */ - case 0xb: /* FRINTZ */ - case 0xc: /* FRINTA */ - case 0xe: /* FRINTX */ - case 0xf: /* FRINTI */ - case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */ - unallocated_encoding(s); - break; +static bool trans_FCVT_s_hd(DisasContext *s, arg_rr *a) +{ + if (fp_access_check(s)) { + TCGv_i64 tcg_rn = read_fp_dreg(s, a->rn); + TCGv_i32 tcg_rd = tcg_temp_new_i32(); + TCGv_i32 ahp = get_ahp_flag(); + TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); + + gen_helper_vfp_fcvt_f64_to_f16(tcg_rd, tcg_rn, fpst, ahp); + /* write_fp_sreg is OK here because top half of tcg_rd is zero */ + write_fp_sreg(s, a->rd, tcg_rd); } + return true; +} + +static bool trans_FCVT_s_sh(DisasContext *s, arg_rr *a) +{ + if (fp_access_check(s)) { + TCGv_i32 tcg_rn = read_fp_hreg(s, a->rn); + TCGv_i32 tcg_rd = tcg_temp_new_i32(); + TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR); + TCGv_i32 tcg_ahp = get_ahp_flag(); + + gen_helper_vfp_fcvt_f16_to_f32(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp); + write_fp_sreg(s, a->rd, tcg_rd); + } + return true; +} + +static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a) +{ + if (fp_access_check(s)) { + TCGv_i32 tcg_rn = read_fp_hreg(s, a->rn); + TCGv_i64 tcg_rd = tcg_temp_new_i64(); + TCGv_ptr tcg_fpst = fpstatus_ptr(FPST_FPCR); + TCGv_i32 tcg_ahp = get_ahp_flag(); + + gen_helper_vfp_fcvt_f16_to_f64(tcg_rd, tcg_rn, tcg_fpst, tcg_ahp); + write_fp_dreg(s, a->rd, tcg_rd); + } + return true; } /* Handle floating point <=> fixed point conversions. Note that we can @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) break; case 2: /* [15:12] == x100 */ /* Floating point data-processing (1 source) */ - disas_fp_1src(s, insn); + unallocated_encoding(s); /* in decodetree */ break; case 3: /* [15:12] == 1000 */ unallocated_encoding(s); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes SCVTF, UCVTF, FCVT{N,P,M,Z,A}{S,U}. Remove disas_fp_fixed_conv as those were the last insns decoded by that function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-33-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 40 ++++ target/arm/tcg/translate-a64.c | 391 ++++++++++++++------------------- 2 files changed, 209 insertions(+), 222 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FMAXV_s 0110 1110 00 11000 01111 10 ..... ..... @rr_q1e2 FMINV_h 0.00 1110 10 11000 01111 10 ..... ..... @qrr_h FMINV_s 0110 1110 10 11000 01111 10 ..... ..... @rr_q1e2 +# Conversion between floating-point and fixed-point (general register) + +&fcvt rd rn esz sf shift +%fcvt_shift32 10:5 !function=rsub_32 +%fcvt_shift64 10:6 !function=rsub_64 + +@fcvt32 0 ....... .. ...... 1..... rn:5 rd:5 \ + &fcvt sf=0 esz=%esz_hsd shift=%fcvt_shift32 +@fcvt64 1 ....... .. ...... ...... rn:5 rd:5 \ + &fcvt sf=1 esz=%esz_hsd shift=%fcvt_shift64 + +SCVTF_g . 0011110 .. 000010 ...... ..... ..... @fcvt32 +SCVTF_g . 0011110 .. 000010 ...... ..... ..... @fcvt64 +UCVTF_g . 0011110 .. 000011 ...... ..... ..... @fcvt32 +UCVTF_g . 0011110 .. 000011 ...... ..... ..... @fcvt64 + +FCVTZS_g . 0011110 .. 011000 ...... ..... ..... @fcvt32 +FCVTZS_g . 0011110 .. 011000 ...... ..... ..... @fcvt64 +FCVTZU_g . 0011110 .. 011001 ...... ..... ..... @fcvt32 +FCVTZU_g . 0011110 .. 011001 ...... ..... ..... @fcvt64 + +# Conversion between floating-point and integer (general register) + +@icvt sf:1 ....... .. ...... ...... rn:5 rd:5 \ + &fcvt esz=%esz_hsd shift=0 + +SCVTF_g . 0011110 .. 100010 000000 ..... ..... @icvt +UCVTF_g . 0011110 .. 100011 000000 ..... ..... @icvt + +FCVTNS_g . 0011110 .. 100000 000000 ..... ..... @icvt +FCVTNU_g . 0011110 .. 100001 000000 ..... ..... @icvt +FCVTPS_g . 0011110 .. 101000 000000 ..... ..... @icvt +FCVTPU_g . 0011110 .. 101001 000000 ..... ..... @icvt +FCVTMS_g . 0011110 .. 110000 000000 ..... ..... @icvt +FCVTMU_g . 0011110 .. 110001 000000 ..... ..... @icvt +FCVTZS_g . 0011110 .. 111000 000000 ..... ..... @icvt +FCVTZU_g . 0011110 .. 111001 000000 ..... ..... @icvt +FCVTAS_g . 0011110 .. 100100 000000 ..... ..... @icvt +FCVTAU_g . 0011110 .. 100101 000000 ..... ..... @icvt + # Floating-point data processing (1 source) FMOV_s 00011110 .. 1 000000 10000 ..... ..... @rr_hsd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_s_dh(DisasContext *s, arg_rr *a) return true; } -/* Handle floating point <=> fixed point conversions. Note that we can - * also deal with fp <=> integer conversions as a special case (scale == 64) - * OPTME: consider handling that special case specially or at least skipping - * the call to scalbn in the helpers for zero shifts. - */ -static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode, - bool itof, int rmode, int scale, int sf, int type) +static bool do_cvtf_scalar(DisasContext *s, MemOp esz, int rd, int shift, + TCGv_i64 tcg_int, bool is_signed) { - bool is_signed = !(opcode & 1); TCGv_ptr tcg_fpstatus; TCGv_i32 tcg_shift, tcg_single; TCGv_i64 tcg_double; - tcg_fpstatus = fpstatus_ptr(type == 3 ? FPST_FPCR_F16 : FPST_FPCR); + tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); + tcg_shift = tcg_constant_i32(shift); - tcg_shift = tcg_constant_i32(64 - scale); - - if (itof) { - TCGv_i64 tcg_int = cpu_reg(s, rn); - if (!sf) { - TCGv_i64 tcg_extend = tcg_temp_new_i64(); - - if (is_signed) { - tcg_gen_ext32s_i64(tcg_extend, tcg_int); - } else { - tcg_gen_ext32u_i64(tcg_extend, tcg_int); - } - - tcg_int = tcg_extend; + switch (esz) { + case MO_64: + tcg_double = tcg_temp_new_i64(); + if (is_signed) { + gen_helper_vfp_sqtod(tcg_double, tcg_int, tcg_shift, tcg_fpstatus); + } else { + gen_helper_vfp_uqtod(tcg_double, tcg_int, tcg_shift, tcg_fpstatus); } + write_fp_dreg(s, rd, tcg_double); + break; - switch (type) { - case 1: /* float64 */ - tcg_double = tcg_temp_new_i64(); - if (is_signed) { - gen_helper_vfp_sqtod(tcg_double, tcg_int, - tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_uqtod(tcg_double, tcg_int, - tcg_shift, tcg_fpstatus); - } - write_fp_dreg(s, rd, tcg_double); - break; - - case 0: /* float32 */ - tcg_single = tcg_temp_new_i32(); - if (is_signed) { - gen_helper_vfp_sqtos(tcg_single, tcg_int, - tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_uqtos(tcg_single, tcg_int, - tcg_shift, tcg_fpstatus); - } - write_fp_sreg(s, rd, tcg_single); - break; - - case 3: /* float16 */ - tcg_single = tcg_temp_new_i32(); - if (is_signed) { - gen_helper_vfp_sqtoh(tcg_single, tcg_int, - tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_uqtoh(tcg_single, tcg_int, - tcg_shift, tcg_fpstatus); - } - write_fp_sreg(s, rd, tcg_single); - break; - - default: - g_assert_not_reached(); + case MO_32: + tcg_single = tcg_temp_new_i32(); + if (is_signed) { + gen_helper_vfp_sqtos(tcg_single, tcg_int, tcg_shift, tcg_fpstatus); + } else { + gen_helper_vfp_uqtos(tcg_single, tcg_int, tcg_shift, tcg_fpstatus); } - } else { - TCGv_i64 tcg_int = cpu_reg(s, rd); - TCGv_i32 tcg_rmode; + write_fp_sreg(s, rd, tcg_single); + break; - if (extract32(opcode, 2, 1)) { - /* There are too many rounding modes to all fit into rmode, - * so FCVTA[US] is a special case. - */ - rmode = FPROUNDING_TIEAWAY; + case MO_16: + tcg_single = tcg_temp_new_i32(); + if (is_signed) { + gen_helper_vfp_sqtoh(tcg_single, tcg_int, tcg_shift, tcg_fpstatus); + } else { + gen_helper_vfp_uqtoh(tcg_single, tcg_int, tcg_shift, tcg_fpstatus); } + write_fp_sreg(s, rd, tcg_single); + break; - tcg_rmode = gen_set_rmode(rmode, tcg_fpstatus); - - switch (type) { - case 1: /* float64 */ - tcg_double = read_fp_dreg(s, rn); - if (is_signed) { - if (!sf) { - gen_helper_vfp_tosld(tcg_int, tcg_double, - tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_tosqd(tcg_int, tcg_double, - tcg_shift, tcg_fpstatus); - } - } else { - if (!sf) { - gen_helper_vfp_tould(tcg_int, tcg_double, - tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_touqd(tcg_int, tcg_double, - tcg_shift, tcg_fpstatus); - } - } - if (!sf) { - tcg_gen_ext32u_i64(tcg_int, tcg_int); - } - break; - - case 0: /* float32 */ - tcg_single = read_fp_sreg(s, rn); - if (sf) { - if (is_signed) { - gen_helper_vfp_tosqs(tcg_int, tcg_single, - tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_touqs(tcg_int, tcg_single, - tcg_shift, tcg_fpstatus); - } - } else { - TCGv_i32 tcg_dest = tcg_temp_new_i32(); - if (is_signed) { - gen_helper_vfp_tosls(tcg_dest, tcg_single, - tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_touls(tcg_dest, tcg_single, - tcg_shift, tcg_fpstatus); - } - tcg_gen_extu_i32_i64(tcg_int, tcg_dest); - } - break; - - case 3: /* float16 */ - tcg_single = read_fp_sreg(s, rn); - if (sf) { - if (is_signed) { - gen_helper_vfp_tosqh(tcg_int, tcg_single, - tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_touqh(tcg_int, tcg_single, - tcg_shift, tcg_fpstatus); - } - } else { - TCGv_i32 tcg_dest = tcg_temp_new_i32(); - if (is_signed) { - gen_helper_vfp_toslh(tcg_dest, tcg_single, - tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_toulh(tcg_dest, tcg_single, - tcg_shift, tcg_fpstatus); - } - tcg_gen_extu_i32_i64(tcg_int, tcg_dest); - } - break; - - default: - g_assert_not_reached(); - } - - gen_restore_rmode(tcg_rmode, tcg_fpstatus); + default: + g_assert_not_reached(); } + return true; } -/* Floating point <-> fixed point conversions - * 31 30 29 28 24 23 22 21 20 19 18 16 15 10 9 5 4 0 - * +----+---+---+-----------+------+---+-------+--------+-------+------+------+ - * | sf | 0 | S | 1 1 1 1 0 | type | 0 | rmode | opcode | scale | Rn | Rd | - * +----+---+---+-----------+------+---+-------+--------+-------+------+------+ - */ -static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn) +static bool do_cvtf_g(DisasContext *s, arg_fcvt *a, bool is_signed) { - int rd = extract32(insn, 0, 5); - int rn = extract32(insn, 5, 5); - int scale = extract32(insn, 10, 6); - int opcode = extract32(insn, 16, 3); - int rmode = extract32(insn, 19, 2); - int type = extract32(insn, 22, 2); - bool sbit = extract32(insn, 29, 1); - bool sf = extract32(insn, 31, 1); - bool itof; + TCGv_i64 tcg_int; + int check = fp_access_check_scalar_hsd(s, a->esz); - if (sbit || (!sf && scale < 32)) { - unallocated_encoding(s); - return; + if (check <= 0) { + return check == 0; } - switch (type) { - case 0: /* float32 */ - case 1: /* float64 */ - break; - case 3: /* float16 */ - if (dc_isar_feature(aa64_fp16, s)) { - break; + if (a->sf) { + tcg_int = cpu_reg(s, a->rn); + } else { + tcg_int = read_cpu_reg(s, a->rn, true); + if (is_signed) { + tcg_gen_ext32s_i64(tcg_int, tcg_int); + } else { + tcg_gen_ext32u_i64(tcg_int, tcg_int); } - /* fallthru */ - default: - unallocated_encoding(s); - return; } - - switch ((rmode << 3) | opcode) { - case 0x2: /* SCVTF */ - case 0x3: /* UCVTF */ - itof = true; - break; - case 0x18: /* FCVTZS */ - case 0x19: /* FCVTZU */ - itof = false; - break; - default: - unallocated_encoding(s); - return; - } - - if (!fp_access_check(s)) { - return; - } - - handle_fpfpcvt(s, rd, rn, opcode, itof, FPROUNDING_ZERO, scale, sf, type); + return do_cvtf_scalar(s, a->esz, a->rd, a->shift, tcg_int, is_signed); } +TRANS(SCVTF_g, do_cvtf_g, a, true) +TRANS(UCVTF_g, do_cvtf_g, a, false) + +static void do_fcvt_scalar(DisasContext *s, MemOp out, MemOp esz, + TCGv_i64 tcg_out, int shift, int rn, + ARMFPRounding rmode) +{ + TCGv_ptr tcg_fpstatus; + TCGv_i32 tcg_shift, tcg_rmode, tcg_single; + + tcg_fpstatus = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); + tcg_shift = tcg_constant_i32(shift); + tcg_rmode = gen_set_rmode(rmode, tcg_fpstatus); + + switch (esz) { + case MO_64: + read_vec_element(s, tcg_out, rn, 0, MO_64); + switch (out) { + case MO_64 | MO_SIGN: + gen_helper_vfp_tosqd(tcg_out, tcg_out, tcg_shift, tcg_fpstatus); + break; + case MO_64: + gen_helper_vfp_touqd(tcg_out, tcg_out, tcg_shift, tcg_fpstatus); + break; + case MO_32 | MO_SIGN: + gen_helper_vfp_tosld(tcg_out, tcg_out, tcg_shift, tcg_fpstatus); + break; + case MO_32: + gen_helper_vfp_tould(tcg_out, tcg_out, tcg_shift, tcg_fpstatus); + break; + default: + g_assert_not_reached(); + } + break; + + case MO_32: + tcg_single = read_fp_sreg(s, rn); + switch (out) { + case MO_64 | MO_SIGN: + gen_helper_vfp_tosqs(tcg_out, tcg_single, tcg_shift, tcg_fpstatus); + break; + case MO_64: + gen_helper_vfp_touqs(tcg_out, tcg_single, tcg_shift, tcg_fpstatus); + break; + case MO_32 | MO_SIGN: + gen_helper_vfp_tosls(tcg_single, tcg_single, + tcg_shift, tcg_fpstatus); + tcg_gen_extu_i32_i64(tcg_out, tcg_single); + break; + case MO_32: + gen_helper_vfp_touls(tcg_single, tcg_single, + tcg_shift, tcg_fpstatus); + tcg_gen_extu_i32_i64(tcg_out, tcg_single); + break; + default: + g_assert_not_reached(); + } + break; + + case MO_16: + tcg_single = read_fp_hreg(s, rn); + switch (out) { + case MO_64 | MO_SIGN: + gen_helper_vfp_tosqh(tcg_out, tcg_single, tcg_shift, tcg_fpstatus); + break; + case MO_64: + gen_helper_vfp_touqh(tcg_out, tcg_single, tcg_shift, tcg_fpstatus); + break; + case MO_32 | MO_SIGN: + gen_helper_vfp_toslh(tcg_single, tcg_single, + tcg_shift, tcg_fpstatus); + tcg_gen_extu_i32_i64(tcg_out, tcg_single); + break; + case MO_32: + gen_helper_vfp_toulh(tcg_single, tcg_single, + tcg_shift, tcg_fpstatus); + tcg_gen_extu_i32_i64(tcg_out, tcg_single); + break; + default: + g_assert_not_reached(); + } + break; + + default: + g_assert_not_reached(); + } + + gen_restore_rmode(tcg_rmode, tcg_fpstatus); +} + +static bool do_fcvt_g(DisasContext *s, arg_fcvt *a, + ARMFPRounding rmode, bool is_signed) +{ + TCGv_i64 tcg_int; + int check = fp_access_check_scalar_hsd(s, a->esz); + + if (check <= 0) { + return check == 0; + } + + tcg_int = cpu_reg(s, a->rd); + do_fcvt_scalar(s, (a->sf ? MO_64 : MO_32) | (is_signed ? MO_SIGN : 0), + a->esz, tcg_int, a->shift, a->rn, rmode); + + if (!a->sf) { + tcg_gen_ext32u_i64(tcg_int, tcg_int); + } + return true; +} + +TRANS(FCVTNS_g, do_fcvt_g, a, FPROUNDING_TIEEVEN, true) +TRANS(FCVTNU_g, do_fcvt_g, a, FPROUNDING_TIEEVEN, false) +TRANS(FCVTPS_g, do_fcvt_g, a, FPROUNDING_POSINF, true) +TRANS(FCVTPU_g, do_fcvt_g, a, FPROUNDING_POSINF, false) +TRANS(FCVTMS_g, do_fcvt_g, a, FPROUNDING_NEGINF, true) +TRANS(FCVTMU_g, do_fcvt_g, a, FPROUNDING_NEGINF, false) +TRANS(FCVTZS_g, do_fcvt_g, a, FPROUNDING_ZERO, true) +TRANS(FCVTZU_g, do_fcvt_g, a, FPROUNDING_ZERO, false) +TRANS(FCVTAS_g, do_fcvt_g, a, FPROUNDING_TIEAWAY, true) +TRANS(FCVTAU_g, do_fcvt_g, a, FPROUNDING_TIEAWAY, false) + static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof) { /* FMOV: gpr to or from float, double, or top half of quad fp reg, @@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn) switch (opcode) { case 2: /* SCVTF */ case 3: /* UCVTF */ - itof = true; - /* fallthru */ case 4: /* FCVTAS */ case 5: /* FCVTAU */ - if (rmode != 0) { - goto do_unallocated; - } - /* fallthru */ case 0: /* FCVT[NPMZ]S */ case 1: /* FCVT[NPMZ]U */ - switch (type) { - case 0: /* float32 */ - case 1: /* float64 */ - break; - case 3: /* float16 */ - if (!dc_isar_feature(aa64_fp16, s)) { - goto do_unallocated; - } - break; - default: - goto do_unallocated; - } - if (!fp_access_check(s)) { - return; - } - handle_fpfpcvt(s, rd, rn, opcode, itof, rmode, 64, sf, type); - break; + goto do_unallocated; default: switch (sf << 7 | type << 5 | rmode << 3 | opcode) { @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) unallocated_encoding(s); /* in decodetree */ } else if (extract32(insn, 21, 1) == 0) { /* Floating point to fixed point conversions */ - disas_fp_fixed_conv(s, insn); + unallocated_encoding(s); /* in decodetree */ } else { switch (extract32(insn, 10, 2)) { case 1: /* Floating point conditional compare */ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-34-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 2 ++ target/arm/tcg/translate-a64.c | 41 +++++++++++++++++----------------- 2 files changed, 22 insertions(+), 21 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FCVTZU_g . 0011110 .. 111001 000000 ..... ..... @icvt FCVTAS_g . 0011110 .. 100100 000000 ..... ..... @icvt FCVTAU_g . 0011110 .. 100101 000000 ..... ..... @icvt +FJCVTZS 0 0011110 01 111110 000000 ..... ..... @rr + # Floating-point data processing (1 source) FMOV_s 00011110 .. 1 000000 10000 ..... ..... @rr_hsd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(FCVTZU_g, do_fcvt_g, a, FPROUNDING_ZERO, false) TRANS(FCVTAS_g, do_fcvt_g, a, FPROUNDING_TIEAWAY, true) TRANS(FCVTAU_g, do_fcvt_g, a, FPROUNDING_TIEAWAY, false) +static bool trans_FJCVTZS(DisasContext *s, arg_FJCVTZS *a) +{ + if (!dc_isar_feature(aa64_jscvt, s)) { + return false; + } + if (fp_access_check(s)) { + TCGv_i64 t = read_fp_dreg(s, a->rn); + TCGv_ptr fpstatus = fpstatus_ptr(FPST_FPCR); + + gen_helper_fjcvtzs(t, t, fpstatus); + + tcg_gen_ext32u_i64(cpu_reg(s, a->rd), t); + tcg_gen_extrh_i64_i32(cpu_ZF, t); + tcg_gen_movi_i32(cpu_CF, 0); + tcg_gen_movi_i32(cpu_NF, 0); + tcg_gen_movi_i32(cpu_VF, 0); + } + return true; +} + static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof) { /* FMOV: gpr to or from float, double, or top half of quad fp reg, @@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof) } } -static void handle_fjcvtzs(DisasContext *s, int rd, int rn) -{ - TCGv_i64 t = read_fp_dreg(s, rn); - TCGv_ptr fpstatus = fpstatus_ptr(FPST_FPCR); - - gen_helper_fjcvtzs(t, t, fpstatus); - - tcg_gen_ext32u_i64(cpu_reg(s, rd), t); - tcg_gen_extrh_i64_i32(cpu_ZF, t); - tcg_gen_movi_i32(cpu_CF, 0); - tcg_gen_movi_i32(cpu_NF, 0); - tcg_gen_movi_i32(cpu_VF, 0); -} - /* Floating point <-> integer conversions * 31 30 29 28 24 23 22 21 20 19 18 16 15 10 9 5 4 0 * +----+---+---+-----------+------+---+-------+-----+-------------+----+----+ @@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn) break; case 0b00111110: /* FJCVTZS */ - if (!dc_isar_feature(aa64_jscvt, s)) { - goto do_unallocated; - } else if (fp_access_check(s)) { - handle_fjcvtzs(s, rd, rn); - } - break; - default: do_unallocated: unallocated_encoding(s); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove disas_fp_int_conv and disas_data_proc_fp as these were the last insns decoded by those functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-35-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 14 ++ target/arm/tcg/translate-a64.c | 232 ++++++++++----------------------- 2 files changed, 86 insertions(+), 160 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FCVTAU_g . 0011110 .. 100101 000000 ..... ..... @icvt FJCVTZS 0 0011110 01 111110 000000 ..... ..... @rr +FMOV_ws 0 0011110 00 100110 000000 ..... ..... @rr +FMOV_sw 0 0011110 00 100111 000000 ..... ..... @rr + +FMOV_xd 1 0011110 01 100110 000000 ..... ..... @rr +FMOV_dx 1 0011110 01 100111 000000 ..... ..... @rr + +# Move to/from upper half of 128-bit +FMOV_xu 1 0011110 10 101110 000000 ..... ..... @rr +FMOV_ux 1 0011110 10 101111 000000 ..... ..... @rr + +# Half-precision allows both sf=0 and sf=1 with identical results +FMOV_xh - 0011110 11 100110 000000 ..... ..... @rr +FMOV_hx - 0011110 11 100111 000000 ..... ..... @rr + # Floating-point data processing (1 source) FMOV_s 00011110 .. 1 000000 10000 ..... ..... @rr_hsd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_FJCVTZS(DisasContext *s, arg_FJCVTZS *a) return true; } -static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof) +static bool trans_FMOV_hx(DisasContext *s, arg_rr *a) { - /* FMOV: gpr to or from float, double, or top half of quad fp reg, - * without conversion. - */ - - if (itof) { - TCGv_i64 tcg_rn = cpu_reg(s, rn); - TCGv_i64 tmp; - - switch (type) { - case 0: - /* 32 bit */ - tmp = tcg_temp_new_i64(); - tcg_gen_ext32u_i64(tmp, tcg_rn); - write_fp_dreg(s, rd, tmp); - break; - case 1: - /* 64 bit */ - write_fp_dreg(s, rd, tcg_rn); - break; - case 2: - /* 64 bit to top half. */ - tcg_gen_st_i64(tcg_rn, tcg_env, fp_reg_hi_offset(s, rd)); - clear_vec_high(s, true, rd); - break; - case 3: - /* 16 bit */ - tmp = tcg_temp_new_i64(); - tcg_gen_ext16u_i64(tmp, tcg_rn); - write_fp_dreg(s, rd, tmp); - break; - default: - g_assert_not_reached(); - } - } else { - TCGv_i64 tcg_rd = cpu_reg(s, rd); - - switch (type) { - case 0: - /* 32 bit */ - tcg_gen_ld32u_i64(tcg_rd, tcg_env, fp_reg_offset(s, rn, MO_32)); - break; - case 1: - /* 64 bit */ - tcg_gen_ld_i64(tcg_rd, tcg_env, fp_reg_offset(s, rn, MO_64)); - break; - case 2: - /* 64 bits from top half */ - tcg_gen_ld_i64(tcg_rd, tcg_env, fp_reg_hi_offset(s, rn)); - break; - case 3: - /* 16 bit */ - tcg_gen_ld16u_i64(tcg_rd, tcg_env, fp_reg_offset(s, rn, MO_16)); - break; - default: - g_assert_not_reached(); - } + if (!dc_isar_feature(aa64_fp16, s)) { + return false; } + if (fp_access_check(s)) { + TCGv_i64 tcg_rn = cpu_reg(s, a->rn); + TCGv_i64 tmp = tcg_temp_new_i64(); + tcg_gen_ext16u_i64(tmp, tcg_rn); + write_fp_dreg(s, a->rd, tmp); + } + return true; } -/* Floating point <-> integer conversions - * 31 30 29 28 24 23 22 21 20 19 18 16 15 10 9 5 4 0 - * +----+---+---+-----------+------+---+-------+-----+-------------+----+----+ - * | sf | 0 | S | 1 1 1 1 0 | type | 1 | rmode | opc | 0 0 0 0 0 0 | Rn | Rd | - * +----+---+---+-----------+------+---+-------+-----+-------------+----+----+ - */ -static void disas_fp_int_conv(DisasContext *s, uint32_t insn) +static bool trans_FMOV_sw(DisasContext *s, arg_rr *a) { - int rd = extract32(insn, 0, 5); - int rn = extract32(insn, 5, 5); - int opcode = extract32(insn, 16, 3); - int rmode = extract32(insn, 19, 2); - int type = extract32(insn, 22, 2); - bool sbit = extract32(insn, 29, 1); - bool sf = extract32(insn, 31, 1); - bool itof = false; - - if (sbit) { - goto do_unallocated; - } - - switch (opcode) { - case 2: /* SCVTF */ - case 3: /* UCVTF */ - case 4: /* FCVTAS */ - case 5: /* FCVTAU */ - case 0: /* FCVT[NPMZ]S */ - case 1: /* FCVT[NPMZ]U */ - goto do_unallocated; - - default: - switch (sf << 7 | type << 5 | rmode << 3 | opcode) { - case 0b01100110: /* FMOV half <-> 32-bit int */ - case 0b01100111: - case 0b11100110: /* FMOV half <-> 64-bit int */ - case 0b11100111: - if (!dc_isar_feature(aa64_fp16, s)) { - goto do_unallocated; - } - /* fallthru */ - case 0b00000110: /* FMOV 32-bit */ - case 0b00000111: - case 0b10100110: /* FMOV 64-bit */ - case 0b10100111: - case 0b11001110: /* FMOV top half of 128-bit */ - case 0b11001111: - if (!fp_access_check(s)) { - return; - } - itof = opcode & 1; - handle_fmov(s, rd, rn, type, itof); - break; - - case 0b00111110: /* FJCVTZS */ - default: - do_unallocated: - unallocated_encoding(s); - return; - } - break; + if (fp_access_check(s)) { + TCGv_i64 tcg_rn = cpu_reg(s, a->rn); + TCGv_i64 tmp = tcg_temp_new_i64(); + tcg_gen_ext32u_i64(tmp, tcg_rn); + write_fp_dreg(s, a->rd, tmp); } + return true; } -/* FP-specific subcases of table C3-6 (SIMD and FP data processing) - * 31 30 29 28 25 24 0 - * +---+---+---+---------+-----------------------------+ - * | | 0 | | 1 1 1 1 | | - * +---+---+---+---------+-----------------------------+ - */ -static void disas_data_proc_fp(DisasContext *s, uint32_t insn) +static bool trans_FMOV_dx(DisasContext *s, arg_rr *a) { - if (extract32(insn, 24, 1)) { - unallocated_encoding(s); /* in decodetree */ - } else if (extract32(insn, 21, 1) == 0) { - /* Floating point to fixed point conversions */ - unallocated_encoding(s); /* in decodetree */ - } else { - switch (extract32(insn, 10, 2)) { - case 1: /* Floating point conditional compare */ - case 2: /* Floating point data-processing (2 source) */ - case 3: /* Floating point conditional select */ - unallocated_encoding(s); /* in decodetree */ - break; - case 0: - switch (ctz32(extract32(insn, 12, 4))) { - case 0: /* [15:12] == xxx1 */ - /* Floating point immediate */ - unallocated_encoding(s); /* in decodetree */ - break; - case 1: /* [15:12] == xx10 */ - /* Floating point compare */ - unallocated_encoding(s); /* in decodetree */ - break; - case 2: /* [15:12] == x100 */ - /* Floating point data-processing (1 source) */ - unallocated_encoding(s); /* in decodetree */ - break; - case 3: /* [15:12] == 1000 */ - unallocated_encoding(s); - break; - default: /* [15:12] == 0000 */ - /* Floating point <-> integer conversions */ - disas_fp_int_conv(s, insn); - break; - } - break; - } + if (fp_access_check(s)) { + TCGv_i64 tcg_rn = cpu_reg(s, a->rn); + write_fp_dreg(s, a->rd, tcg_rn); } + return true; +} + +static bool trans_FMOV_ux(DisasContext *s, arg_rr *a) +{ + if (fp_access_check(s)) { + TCGv_i64 tcg_rn = cpu_reg(s, a->rn); + tcg_gen_st_i64(tcg_rn, tcg_env, fp_reg_hi_offset(s, a->rd)); + clear_vec_high(s, true, a->rd); + } + return true; +} + +static bool trans_FMOV_xh(DisasContext *s, arg_rr *a) +{ + if (!dc_isar_feature(aa64_fp16, s)) { + return false; + } + if (fp_access_check(s)) { + TCGv_i64 tcg_rd = cpu_reg(s, a->rd); + tcg_gen_ld16u_i64(tcg_rd, tcg_env, fp_reg_offset(s, a->rn, MO_16)); + } + return true; +} + +static bool trans_FMOV_ws(DisasContext *s, arg_rr *a) +{ + if (fp_access_check(s)) { + TCGv_i64 tcg_rd = cpu_reg(s, a->rd); + tcg_gen_ld32u_i64(tcg_rd, tcg_env, fp_reg_offset(s, a->rn, MO_32)); + } + return true; +} + +static bool trans_FMOV_xd(DisasContext *s, arg_rr *a) +{ + if (fp_access_check(s)) { + TCGv_i64 tcg_rd = cpu_reg(s, a->rd); + tcg_gen_ld_i64(tcg_rd, tcg_env, fp_reg_offset(s, a->rn, MO_64)); + } + return true; +} + +static bool trans_FMOV_xu(DisasContext *s, arg_rr *a) +{ + if (fp_access_check(s)) { + TCGv_i64 tcg_rd = cpu_reg(s, a->rd); + tcg_gen_ld_i64(tcg_rd, tcg_env, fp_reg_hi_offset(s, a->rn)); + } + return true; } /* Common vector code for handling integer to FP conversion */ @@ -XXX,XX +XXX,XX @@ static void disas_data_proc_simd(DisasContext *s, uint32_t insn) static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn) { if (extract32(insn, 28, 1) == 1 && extract32(insn, 30, 1) == 0) { - disas_data_proc_fp(s, insn); + unallocated_encoding(s); /* in decodetree */ } else { /* SIMD, including crypto */ disas_data_proc_simd(s, insn); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-36-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 11 +++ target/arm/tcg/translate-a64.c | 123 +++++++++++++++++++++------------ 2 files changed, 89 insertions(+), 45 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ @rr_h ........ ... ..... ...... rn:5 rd:5 &rr_e esz=1 @rr_s ........ ... ..... ...... rn:5 rd:5 &rr_e esz=2 @rr_d ........ ... ..... ...... rn:5 rd:5 &rr_e esz=3 +@rr_e ........ esz:2 . ..... ...... rn:5 rd:5 &rr_e @rr_sd ........ ... ..... ...... rn:5 rd:5 &rr_e esz=%esz_sd @rr_hsd ........ ... ..... ...... rn:5 rd:5 &rr_e esz=%esz_hsd @@ -XXX,XX +XXX,XX @@ UQRSHRN_si 0111 11110 .... ... 10011 1 ..... ..... @shri_s SQRSHRUN_si 0111 11110 .... ... 10001 1 ..... ..... @shri_b SQRSHRUN_si 0111 11110 .... ... 10001 1 ..... ..... @shri_h SQRSHRUN_si 0111 11110 .... ... 10001 1 ..... ..... @shri_s + +# Advanced SIMD scalar two-register miscellaneous + +SQABS_s 0101 1110 ..1 00000 01111 0 ..... ..... @rr_e +SQNEG_s 0111 1110 ..1 00000 01111 0 ..... ..... @rr_e + +# Advanced SIMD two-register miscellaneous + +SQABS_v 0.00 1110 ..1 00000 01111 0 ..... ..... @qrr_e +SQNEG_v 0.10 1110 ..1 00000 01111 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_FMOV_xu(DisasContext *s, arg_rr *a) return true; } +typedef struct ENVScalar1 { + NeonGenOneOpEnvFn *gen_bhs[3]; + NeonGenOne64OpEnvFn *gen_d; +} ENVScalar1; + +static bool do_env_scalar1(DisasContext *s, arg_rr_e *a, const ENVScalar1 *f) +{ + if (!fp_access_check(s)) { + return true; + } + if (a->esz == MO_64) { + TCGv_i64 t = read_fp_dreg(s, a->rn); + f->gen_d(t, tcg_env, t); + write_fp_dreg(s, a->rd, t); + } else { + TCGv_i32 t = tcg_temp_new_i32(); + + read_vec_element_i32(s, t, a->rn, 0, a->esz); + f->gen_bhs[a->esz](t, tcg_env, t); + write_fp_sreg(s, a->rd, t); + } + return true; +} + +static bool do_env_vector1(DisasContext *s, arg_qrr_e *a, const ENVScalar1 *f) +{ + if (a->esz == MO_64 && !a->q) { + return false; + } + if (!fp_access_check(s)) { + return true; + } + if (a->esz == MO_64) { + TCGv_i64 t = tcg_temp_new_i64(); + + for (int i = 0; i < 2; ++i) { + read_vec_element(s, t, a->rn, i, MO_64); + f->gen_d(t, tcg_env, t); + write_vec_element(s, t, a->rd, i, MO_64); + } + } else { + TCGv_i32 t = tcg_temp_new_i32(); + int n = (a->q ? 16 : 8) >> a->esz; + + for (int i = 0; i < n; ++i) { + read_vec_element_i32(s, t, a->rn, i, a->esz); + f->gen_bhs[a->esz](t, tcg_env, t); + write_vec_element_i32(s, t, a->rd, i, a->esz); + } + } + clear_vec_high(s, a->q, a->rd); + return true; +} + +static const ENVScalar1 f_scalar_sqabs = { + { gen_helper_neon_qabs_s8, + gen_helper_neon_qabs_s16, + gen_helper_neon_qabs_s32 }, + gen_helper_neon_qabs_s64, +}; +TRANS(SQABS_s, do_env_scalar1, a, &f_scalar_sqabs) +TRANS(SQABS_v, do_env_vector1, a, &f_scalar_sqabs) + +static const ENVScalar1 f_scalar_sqneg = { + { gen_helper_neon_qneg_s8, + gen_helper_neon_qneg_s16, + gen_helper_neon_qneg_s32 }, + gen_helper_neon_qneg_s64, +}; +TRANS(SQNEG_s, do_env_scalar1, a, &f_scalar_sqneg) +TRANS(SQNEG_v, do_env_vector1, a, &f_scalar_sqneg) + /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, int elements, int is_signed, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, */ tcg_gen_not_i64(tcg_rd, tcg_rn); break; - case 0x7: /* SQABS, SQNEG */ - if (u) { - gen_helper_neon_qneg_s64(tcg_rd, tcg_env, tcg_rn); - } else { - gen_helper_neon_qabs_s64(tcg_rd, tcg_env, tcg_rn); - } - break; case 0xa: /* CMLT */ cond = TCG_COND_LT; do_cmop: @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, gen_helper_frint64_d(tcg_rd, tcg_rn, tcg_fpstatus); break; default: + case 0x7: /* SQABS, SQNEG */ g_assert_not_reached(); } } @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) TCGv_ptr tcg_fpstatus; switch (opcode) { - case 0x7: /* SQABS / SQNEG */ - break; case 0xa: /* CMLT */ if (u) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) break; default: case 0x3: /* USQADD / SUQADD */ + case 0x7: /* SQABS / SQNEG */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) read_vec_element_i32(s, tcg_rn, rn, 0, size); switch (opcode) { - case 0x7: /* SQABS, SQNEG */ - { - NeonGenOneOpEnvFn *genfn; - static NeonGenOneOpEnvFn * const fns[3][2] = { - { gen_helper_neon_qabs_s8, gen_helper_neon_qneg_s8 }, - { gen_helper_neon_qabs_s16, gen_helper_neon_qneg_s16 }, - { gen_helper_neon_qabs_s32, gen_helper_neon_qneg_s32 }, - }; - genfn = fns[size][u]; - genfn(tcg_rd, tcg_env, tcg_rn); - break; - } case 0x1a: /* FCVTNS */ case 0x1b: /* FCVTMS */ case 0x1c: /* FCVTAS */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) tcg_fpstatus); break; default: + case 0x7: /* SQABS, SQNEG */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) return; } break; - case 0x7: /* SQABS, SQNEG */ - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } - break; case 0xc ... 0xf: case 0x16 ... 0x1f: { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } default: case 0x3: /* SUQADD, USQADD */ + case 0x7: /* SQABS, SQNEG */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) tcg_gen_clrsb_i32(tcg_res, tcg_op); } break; - case 0x7: /* SQABS, SQNEG */ - if (u) { - gen_helper_neon_qneg_s32(tcg_res, tcg_env, tcg_op); - } else { - gen_helper_neon_qabs_s32(tcg_res, tcg_env, tcg_op); - } - break; case 0x2f: /* FABS */ gen_vfp_abss(tcg_res, tcg_op); break; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) gen_helper_frint64_s(tcg_res, tcg_op, tcg_fpstatus); break; default: + case 0x7: /* SQABS, SQNEG */ g_assert_not_reached(); } } else { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) gen_helper_neon_cnt_u8(tcg_res, tcg_op); } break; - case 0x7: /* SQABS, SQNEG */ - { - NeonGenOneOpEnvFn *genfn; - static NeonGenOneOpEnvFn * const fns[2][2] = { - { gen_helper_neon_qabs_s8, gen_helper_neon_qneg_s8 }, - { gen_helper_neon_qabs_s16, gen_helper_neon_qneg_s16 }, - }; - genfn = fns[size][u]; - genfn(tcg_res, tcg_env, tcg_op); - break; - } case 0x4: /* CLS, CLZ */ if (u) { if (size == 0) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } break; default: + case 0x7: /* SQABS, SQNEG */ g_assert_not_reached(); } } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-37-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 4 +++ target/arm/tcg/translate-a64.c | 46 +++++++++++++++++++++++----------- 2 files changed, 35 insertions(+), 15 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SQRSHRUN_si 0111 11110 .... ... 10001 1 ..... ..... @shri_s SQABS_s 0101 1110 ..1 00000 01111 0 ..... ..... @rr_e SQNEG_s 0111 1110 ..1 00000 01111 0 ..... ..... @rr_e +ABS_s 0101 1110 111 00000 10111 0 ..... ..... @rr +NEG_s 0111 1110 111 00000 10111 0 ..... ..... @rr # Advanced SIMD two-register miscellaneous SQABS_v 0.00 1110 ..1 00000 01111 0 ..... ..... @qrr_e SQNEG_v 0.10 1110 ..1 00000 01111 0 ..... ..... @qrr_e +ABS_v 0.00 1110 ..1 00000 10111 0 ..... ..... @qrr_e +NEG_v 0.10 1110 ..1 00000 10111 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static const ENVScalar1 f_scalar_sqneg = { TRANS(SQNEG_s, do_env_scalar1, a, &f_scalar_sqneg) TRANS(SQNEG_v, do_env_vector1, a, &f_scalar_sqneg) +static bool do_scalar1_d(DisasContext *s, arg_rr *a, ArithOneOp *f) +{ + if (fp_access_check(s)) { + TCGv_i64 t = read_fp_dreg(s, a->rn); + f(t, t); + write_fp_dreg(s, a->rd, t); + } + return true; +} + +TRANS(ABS_s, do_scalar1_d, a, tcg_gen_abs_i64) +TRANS(NEG_s, do_scalar1_d, a, tcg_gen_neg_i64) + +static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) +{ + if (!a->q && a->esz == MO_64) { + return false; + } + if (fp_access_check(s)) { + gen_gvec_fn2(s, a->q, a->rd, a->rn, fn, a->esz); + } + return true; +} + +TRANS(ABS_v, do_gvec_fn2, a, tcg_gen_gvec_abs) +TRANS(NEG_v, do_gvec_fn2, a, tcg_gen_gvec_neg) + /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, int elements, int is_signed, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, case 0x9: /* CMEQ, CMLE */ cond = u ? TCG_COND_LE : TCG_COND_EQ; goto do_cmop; - case 0xb: /* ABS, NEG */ - if (u) { - tcg_gen_neg_i64(tcg_rd, tcg_rn); - } else { - tcg_gen_abs_i64(tcg_rd, tcg_rn); - } - break; case 0x2f: /* FABS */ gen_vfp_absd(tcg_rd, tcg_rn); break; @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, break; default: case 0x7: /* SQABS, SQNEG */ + case 0xb: /* ABS, NEG */ g_assert_not_reached(); } } @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) /* fall through */ case 0x8: /* CMGT, CMGE */ case 0x9: /* CMEQ, CMLE */ - case 0xb: /* ABS, NEG */ if (size != 3) { unallocated_encoding(s); return; @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) default: case 0x3: /* USQADD / SUQADD */ case 0x7: /* SQABS / SQNEG */ + case 0xb: /* ABS, NEG */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) /* fall through */ case 0x8: /* CMGT, CMGE */ case 0x9: /* CMEQ, CMLE */ - case 0xb: /* ABS, NEG */ if (size == 3 && !is_q) { unallocated_encoding(s); return; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) default: case 0x3: /* SUQADD, USQADD */ case 0x7: /* SQABS, SQNEG */ + case 0xb: /* ABS, NEG */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_clt0, size); return; case 0xb: - if (u) { /* ABS, NEG */ - gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_neg, size); - } else { - gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_abs, size); - } - return; + g_assert_not_reached(); } if (size == 3) { -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Add gvec interfaces for CLS and CLZ operations. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-38-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 5 +++++ target/arm/tcg/gengvec.c | 35 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-a64.c | 29 +++++++-------------------- target/arm/tcg/translate-neon.c | 29 ++------------------------- 4 files changed, 49 insertions(+), 49 deletions(-) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_umaxp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, void gen_gvec_uminp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_cls(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_clz(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); + /* * Forward to the isar_feature_* tests given a DisasContext pointer. */ diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/gengvec.c +++ b/target/arm/tcg/gengvec.c @@ -XXX,XX +XXX,XX @@ void gen_gvec_urhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, assert(vece <= MO_32); tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]); } + +void gen_gvec_cls(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static const GVecGen2 g[] = { + { .fni4 = gen_helper_neon_cls_s8, + .vece = MO_8 }, + { .fni4 = gen_helper_neon_cls_s16, + .vece = MO_16 }, + { .fni4 = tcg_gen_clrsb_i32, + .vece = MO_32 }, + }; + assert(vece <= MO_32); + tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]); +} + +static void gen_clz32_i32(TCGv_i32 d, TCGv_i32 n) +{ + tcg_gen_clzi_i32(d, n, 32); +} + +void gen_gvec_clz(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static const GVecGen2 g[] = { + { .fni4 = gen_helper_neon_clz_u8, + .vece = MO_8 }, + { .fni4 = gen_helper_neon_clz_u16, + .vece = MO_16 }, + { .fni4 = gen_clz32_i32, + .vece = MO_32 }, + }; + assert(vece <= MO_32); + tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]); +} diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } switch (opcode) { + case 0x4: /* CLZ, CLS */ + if (u) { + gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_clz, size); + } else { + gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cls, size); + } + return; case 0x5: if (u && size == 0) { /* NOT */ gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_not, 0); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) if (size == 2) { /* Special cases for 32 bit elements */ switch (opcode) { - case 0x4: /* CLS */ - if (u) { - tcg_gen_clzi_i32(tcg_res, tcg_op, 32); - } else { - tcg_gen_clrsb_i32(tcg_res, tcg_op); - } - break; case 0x2f: /* FABS */ gen_vfp_abss(tcg_res, tcg_op); break; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) gen_helper_neon_cnt_u8(tcg_res, tcg_op); } break; - case 0x4: /* CLS, CLZ */ - if (u) { - if (size == 0) { - gen_helper_neon_clz_u8(tcg_res, tcg_op); - } else { - gen_helper_neon_clz_u16(tcg_res, tcg_op); - } - } else { - if (size == 0) { - gen_helper_neon_cls_s8(tcg_res, tcg_op); - } else { - gen_helper_neon_cls_s16(tcg_res, tcg_op); - } - } - break; default: case 0x7: /* SQABS, SQNEG */ g_assert_not_reached(); diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ DO_2MISC_VEC(VCGT0, gen_gvec_cgt0) DO_2MISC_VEC(VCLE0, gen_gvec_cle0) DO_2MISC_VEC(VCGE0, gen_gvec_cge0) DO_2MISC_VEC(VCLT0, gen_gvec_clt0) +DO_2MISC_VEC(VCLS, gen_gvec_cls) +DO_2MISC_VEC(VCLZ, gen_gvec_clz) static bool trans_VMVN(DisasContext *s, arg_2misc *a) { @@ -XXX,XX +XXX,XX @@ static bool trans_VREV16(DisasContext *s, arg_2misc *a) return do_2misc(s, a, gen_rev16); } -static bool trans_VCLS(DisasContext *s, arg_2misc *a) -{ - static NeonGenOneOpFn * const fn[] = { - gen_helper_neon_cls_s8, - gen_helper_neon_cls_s16, - gen_helper_neon_cls_s32, - NULL, - }; - return do_2misc(s, a, fn[a->size]); -} - -static void do_VCLZ_32(TCGv_i32 rd, TCGv_i32 rm) -{ - tcg_gen_clzi_i32(rd, rm, 32); -} - -static bool trans_VCLZ(DisasContext *s, arg_2misc *a) -{ - static NeonGenOneOpFn * const fn[] = { - gen_helper_neon_clz_u8, - gen_helper_neon_clz_u16, - do_VCLZ_32, - NULL, - }; - return do_2misc(s, a, fn[a->size]); -} - static bool trans_VCNT(DisasContext *s, arg_2misc *a) { if (a->size != 0) { -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-39-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 2 ++ target/arm/tcg/translate-a64.c | 37 ++++++++++++++++------------------ 2 files changed, 19 insertions(+), 20 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SQABS_v 0.00 1110 ..1 00000 01111 0 ..... ..... @qrr_e SQNEG_v 0.10 1110 ..1 00000 01111 0 ..... ..... @qrr_e ABS_v 0.00 1110 ..1 00000 10111 0 ..... ..... @qrr_e NEG_v 0.10 1110 ..1 00000 10111 0 ..... ..... @qrr_e +CLS_v 0.00 1110 ..1 00000 01001 0 ..... ..... @qrr_e +CLZ_v 0.10 1110 ..1 00000 01001 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) TRANS(ABS_v, do_gvec_fn2, a, tcg_gen_gvec_abs) TRANS(NEG_v, do_gvec_fn2, a, tcg_gen_gvec_neg) +static bool do_gvec_fn2_bhs(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) +{ + if (a->esz == MO_64) { + return false; + } + if (fp_access_check(s)) { + gen_gvec_fn2(s, a->q, a->rd, a->rn, fn, a->esz); + } + return true; +} + +TRANS(CLS_v, do_gvec_fn2_bhs, a, gen_gvec_cls) +TRANS(CLZ_v, do_gvec_fn2_bhs, a, gen_gvec_clz) + /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, int elements, int is_signed, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, TCGCond cond; switch (opcode) { - case 0x4: /* CLS, CLZ */ - if (u) { - tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64); - } else { - tcg_gen_clrsb_i64(tcg_rd, tcg_rn); - } - break; case 0x5: /* NOT */ /* This opcode is shared with CNT and RBIT but we have earlier * enforced that size == 3 if and only if this is the NOT insn. @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, gen_helper_frint64_d(tcg_rd, tcg_rn, tcg_fpstatus); break; default: + case 0x4: /* CLS, CLZ */ case 0x7: /* SQABS, SQNEG */ case 0xb: /* ABS, NEG */ g_assert_not_reached(); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) handle_2misc_narrow(s, false, opcode, u, is_q, size, rn, rd); return; - case 0x4: /* CLS, CLZ */ - if (size == 3) { - unallocated_encoding(s); - return; - } - break; case 0x2: /* SADDLP, UADDLP */ case 0x6: /* SADALP, UADALP */ if (size == 3) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } default: case 0x3: /* SUQADD, USQADD */ + case 0x4: /* CLS, CLZ */ case 0x7: /* SQABS, SQNEG */ case 0xb: /* ABS, NEG */ unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } switch (opcode) { - case 0x4: /* CLZ, CLS */ - if (u) { - gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_clz, size); - } else { - gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cls, size); - } - return; case 0x5: if (u && size == 0) { /* NOT */ gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_not, 0); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0xa: /* CMLT */ gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_clt0, size); return; + case 0x4: /* CLZ, CLS */ case 0xb: g_assert_not_reached(); } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Add gvec interfaces for CNT and RBIT operations. Use ctpop8 for CNT and revbit+bswap for RBIT. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-40-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 4 ++-- target/arm/tcg/translate.h | 4 ++++ target/arm/tcg/gengvec.c | 16 ++++++++++++++++ target/arm/tcg/neon_helper.c | 21 --------------------- target/arm/tcg/translate-a64.c | 32 +++++++++----------------------- target/arm/tcg/translate-neon.c | 16 ++++++++-------- target/arm/tcg/vec_helper.c | 24 ++++++++++++++++++++++++ 7 files changed, 63 insertions(+), 54 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_1(neon_clz_u16, i32, i32) DEF_HELPER_1(neon_cls_s8, i32, i32) DEF_HELPER_1(neon_cls_s16, i32, i32) DEF_HELPER_1(neon_cls_s32, i32, i32) -DEF_HELPER_1(neon_cnt_u8, i32, i32) -DEF_HELPER_FLAGS_1(neon_rbit_u8, TCG_CALL_NO_RWG_SE, i32, i32) +DEF_HELPER_FLAGS_3(gvec_cnt_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_rbit_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_3(neon_qdmulh_s16, i32, env, i32, i32) DEF_HELPER_3(neon_qrdmulh_s16, i32, env, i32, i32) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_cls(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t opr_sz, uint32_t max_sz); void gen_gvec_clz(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_cnt(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); /* * Forward to the isar_feature_* tests given a DisasContext pointer. diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/gengvec.c +++ b/target/arm/tcg/gengvec.c @@ -XXX,XX +XXX,XX @@ void gen_gvec_clz(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, assert(vece <= MO_32); tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]); } + +void gen_gvec_cnt(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + assert(vece == MO_8); + tcg_gen_gvec_2_ool(rd_ofs, rn_ofs, opr_sz, max_sz, 0, + gen_helper_gvec_cnt_b); +} + +void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + assert(vece == MO_8); + tcg_gen_gvec_2_ool(rd_ofs, rn_ofs, opr_sz, max_sz, 0, + gen_helper_gvec_rbit_b); +} diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/neon_helper.c +++ b/target/arm/tcg/neon_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(neon_cls_s32)(uint32_t x) return count - 1; } -/* Bit count. */ -uint32_t HELPER(neon_cnt_u8)(uint32_t x) -{ - x = (x & 0x55555555) + ((x >> 1) & 0x55555555); - x = (x & 0x33333333) + ((x >> 2) & 0x33333333); - x = (x & 0x0f0f0f0f) + ((x >> 4) & 0x0f0f0f0f); - return x; -} - -/* Reverse bits in each 8 bit word */ -uint32_t HELPER(neon_rbit_u8)(uint32_t x) -{ - x = ((x & 0xf0f0f0f0) >> 4) - | ((x & 0x0f0f0f0f) << 4); - x = ((x & 0x88888888) >> 3) - | ((x & 0x44444444) >> 1) - | ((x & 0x22222222) << 1) - | ((x & 0x11111111) << 3); - return x; -} - #define NEON_QDMULH16(dest, src1, src2, round) do { \ uint32_t tmp = (int32_t)(int16_t) src1 * (int16_t) src2; \ if ((tmp ^ (tmp << 1)) & SIGNBIT) { \ diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } switch (opcode) { - case 0x5: - if (u && size == 0) { /* NOT */ + case 0x5: /* CNT, NOT, RBIT */ + if (!u) { + gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cnt, 0); + } else if (size) { + gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_rbit, 0); + } else { gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_not, 0); - return; } - break; + return; case 0x8: /* CMGT, CMGE */ if (u) { gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cge0, size); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } else { int pass; + assert(size == 2); for (pass = 0; pass < (is_q ? 4 : 2); pass++) { TCGv_i32 tcg_op = tcg_temp_new_i32(); TCGv_i32 tcg_res = tcg_temp_new_i32(); read_vec_element_i32(s, tcg_op, rn, pass, MO_32); - if (size == 2) { + { /* Special cases for 32 bit elements */ switch (opcode) { case 0x2f: /* FABS */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x7: /* SQABS, SQNEG */ g_assert_not_reached(); } - } else { - /* Use helpers for 8 and 16 bit elements */ - switch (opcode) { - case 0x5: /* CNT, RBIT */ - /* For these two insns size is part of the opcode specifier - * (handled earlier); they always operate on byte elements. - */ - if (u) { - gen_helper_neon_rbit_u8(tcg_res, tcg_op); - } else { - gen_helper_neon_cnt_u8(tcg_res, tcg_op); - } - break; - default: - case 0x7: /* SQABS, SQNEG */ - g_assert_not_reached(); - } } - write_vec_element_i32(s, tcg_res, rd, pass, MO_32); } } diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool trans_VMVN(DisasContext *s, arg_2misc *a) return do_2misc_vec(s, a, tcg_gen_gvec_not); } +static bool trans_VCNT(DisasContext *s, arg_2misc *a) +{ + if (a->size != 0) { + return false; + } + return do_2misc_vec(s, a, gen_gvec_cnt); +} + #define WRAP_2M_3_OOL_FN(WRAPNAME, FUNC, DATA) \ static void WRAPNAME(unsigned vece, uint32_t rd_ofs, \ uint32_t rm_ofs, uint32_t oprsz, \ @@ -XXX,XX +XXX,XX @@ static bool trans_VREV16(DisasContext *s, arg_2misc *a) return do_2misc(s, a, gen_rev16); } -static bool trans_VCNT(DisasContext *s, arg_2misc *a) -{ - if (a->size != 0) { - return false; - } - return do_2misc(s, a, gen_helper_neon_cnt_u8); -} - static void gen_VABS_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz) { diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_CLAMP(gvec_uclamp_b, uint8_t) DO_CLAMP(gvec_uclamp_h, uint16_t) DO_CLAMP(gvec_uclamp_s, uint32_t) DO_CLAMP(gvec_uclamp_d, uint64_t) + +/* Bit count in each 8-bit word. */ +void HELPER(gvec_cnt_b)(void *vd, void *vn, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc); + uint8_t *d = vd, *n = vn; + + for (i = 0; i < opr_sz; ++i) { + d[i] = ctpop8(n[i]); + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} + +/* Reverse bits in each 8 bit word */ +void HELPER(gvec_rbit_b)(void *vd, void *vn, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc); + uint64_t *d = vd, *n = vn; + + for (i = 0; i < opr_sz / 8; ++i) { + d[i] = revbit64(bswap64(n[i])); + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-41-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 4 ++++ target/arm/tcg/translate-a64.c | 34 ++++++---------------------------- 2 files changed, 10 insertions(+), 28 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ @rrr_q1e3 ........ ... rm:5 ...... rn:5 rd:5 &qrrr_e q=1 esz=3 @rrrr_q1e3 ........ ... rm:5 . ra:5 rn:5 rd:5 &qrrrr_e q=1 esz=3 +@qrr_b . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=0 @qrr_h . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=1 @qrr_e . q:1 ...... esz:2 ...... ...... rn:5 rd:5 &qrr_e @@ -XXX,XX +XXX,XX @@ ABS_v 0.00 1110 ..1 00000 10111 0 ..... ..... @qrr_e NEG_v 0.10 1110 ..1 00000 10111 0 ..... ..... @qrr_e CLS_v 0.00 1110 ..1 00000 01001 0 ..... ..... @qrr_e CLZ_v 0.10 1110 ..1 00000 01001 0 ..... ..... @qrr_e +CNT_v 0.00 1110 001 00000 01011 0 ..... ..... @qrr_b +NOT_v 0.10 1110 001 00000 01011 0 ..... ..... @qrr_b +RBIT_v 0.10 1110 011 00000 01011 0 ..... ..... @qrr_b diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) TRANS(ABS_v, do_gvec_fn2, a, tcg_gen_gvec_abs) TRANS(NEG_v, do_gvec_fn2, a, tcg_gen_gvec_neg) +TRANS(NOT_v, do_gvec_fn2, a, tcg_gen_gvec_not) +TRANS(CNT_v, do_gvec_fn2, a, gen_gvec_cnt) +TRANS(RBIT_v, do_gvec_fn2, a, gen_gvec_rbit) static bool do_gvec_fn2_bhs(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) { @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, TCGCond cond; switch (opcode) { - case 0x5: /* NOT */ - /* This opcode is shared with CNT and RBIT but we have earlier - * enforced that size == 3 if and only if this is the NOT insn. - */ - tcg_gen_not_i64(tcg_rd, tcg_rn); - break; case 0xa: /* CMLT */ cond = TCG_COND_LT; do_cmop: @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, break; default: case 0x4: /* CLS, CLZ */ + case 0x5: /* NOT */ case 0x7: /* SQABS, SQNEG */ case 0xb: /* ABS, NEG */ g_assert_not_reached(); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x1: /* REV16 */ handle_rev(s, opcode, u, is_q, size, rn, rd); return; - case 0x5: /* CNT, NOT, RBIT */ - if (u && size == 0) { - /* NOT */ - break; - } else if (u && size == 1) { - /* RBIT */ - break; - } else if (!u && size == 0) { - /* CNT */ - break; - } - unallocated_encoding(s); - return; case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */ case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */ if (size == 3) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) default: case 0x3: /* SUQADD, USQADD */ case 0x4: /* CLS, CLZ */ + case 0x5: /* CNT, NOT, RBIT */ case 0x7: /* SQABS, SQNEG */ case 0xb: /* ABS, NEG */ unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } switch (opcode) { - case 0x5: /* CNT, NOT, RBIT */ - if (!u) { - gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cnt, 0); - } else if (size) { - gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_rbit, 0); - } else { - gen_gvec_fn2(s, is_q, rd, rn, tcg_gen_gvec_not, 0); - } - return; case 0x8: /* CMGT, CMGE */ if (u) { gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cge0, size); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_clt0, size); return; case 0x4: /* CLZ, CLS */ + case 0x5: /* CNT, NOT, RBIT */ case 0xb: g_assert_not_reached(); } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-42-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 10 ++++ target/arm/tcg/translate-a64.c | 94 +++++++++++----------------------- 2 files changed, 40 insertions(+), 64 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SQABS_s 0101 1110 ..1 00000 01111 0 ..... ..... @rr_e SQNEG_s 0111 1110 ..1 00000 01111 0 ..... ..... @rr_e ABS_s 0101 1110 111 00000 10111 0 ..... ..... @rr NEG_s 0111 1110 111 00000 10111 0 ..... ..... @rr +CMGT0_s 0101 1110 111 00000 10001 0 ..... ..... @rr +CMGE0_s 0111 1110 111 00000 10001 0 ..... ..... @rr +CMEQ0_s 0101 1110 111 00000 10011 0 ..... ..... @rr +CMLE0_s 0111 1110 111 00000 10011 0 ..... ..... @rr +CMLT0_s 0101 1110 111 00000 10101 0 ..... ..... @rr # Advanced SIMD two-register miscellaneous @@ -XXX,XX +XXX,XX @@ CLZ_v 0.10 1110 ..1 00000 01001 0 ..... ..... @qrr_e CNT_v 0.00 1110 001 00000 01011 0 ..... ..... @qrr_b NOT_v 0.10 1110 001 00000 01011 0 ..... ..... @qrr_b RBIT_v 0.10 1110 011 00000 01011 0 ..... ..... @qrr_b +CMGT0_v 0.00 1110 ..1 00000 10001 0 ..... ..... @qrr_e +CMGE0_v 0.10 1110 ..1 00000 10001 0 ..... ..... @qrr_e +CMEQ0_v 0.00 1110 ..1 00000 10011 0 ..... ..... @qrr_e +CMLE0_v 0.10 1110 ..1 00000 10011 0 ..... ..... @qrr_e +CMLT0_v 0.00 1110 ..1 00000 10101 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_scalar1_d(DisasContext *s, arg_rr *a, ArithOneOp *f) TRANS(ABS_s, do_scalar1_d, a, tcg_gen_abs_i64) TRANS(NEG_s, do_scalar1_d, a, tcg_gen_neg_i64) +static bool do_cmop0_d(DisasContext *s, arg_rr *a, TCGCond cond) +{ + if (fp_access_check(s)) { + TCGv_i64 t = read_fp_dreg(s, a->rn); + tcg_gen_negsetcond_i64(cond, t, t, tcg_constant_i64(0)); + write_fp_dreg(s, a->rd, t); + } + return true; +} + +TRANS(CMGT0_s, do_cmop0_d, a, TCG_COND_GT) +TRANS(CMGE0_s, do_cmop0_d, a, TCG_COND_GE) +TRANS(CMLE0_s, do_cmop0_d, a, TCG_COND_LE) +TRANS(CMLT0_s, do_cmop0_d, a, TCG_COND_LT) +TRANS(CMEQ0_s, do_cmop0_d, a, TCG_COND_EQ) + static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) { if (!a->q && a->esz == MO_64) { @@ -XXX,XX +XXX,XX @@ TRANS(NEG_v, do_gvec_fn2, a, tcg_gen_gvec_neg) TRANS(NOT_v, do_gvec_fn2, a, tcg_gen_gvec_not) TRANS(CNT_v, do_gvec_fn2, a, gen_gvec_cnt) TRANS(RBIT_v, do_gvec_fn2, a, gen_gvec_rbit) +TRANS(CMGT0_v, do_gvec_fn2, a, gen_gvec_cgt0) +TRANS(CMGE0_v, do_gvec_fn2, a, gen_gvec_cge0) +TRANS(CMLT0_v, do_gvec_fn2, a, gen_gvec_clt0) +TRANS(CMLE0_v, do_gvec_fn2, a, gen_gvec_cle0) +TRANS(CMEQ0_v, do_gvec_fn2, a, gen_gvec_ceq0) static bool do_gvec_fn2_bhs(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) { @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, * The caller only need provide tcg_rmode and tcg_fpstatus if the op * requires them. */ - TCGCond cond; - switch (opcode) { - case 0xa: /* CMLT */ - cond = TCG_COND_LT; - do_cmop: - /* 64 bit integer comparison against zero, result is test ? -1 : 0. */ - tcg_gen_negsetcond_i64(cond, tcg_rd, tcg_rn, tcg_constant_i64(0)); - break; - case 0x8: /* CMGT, CMGE */ - cond = u ? TCG_COND_GE : TCG_COND_GT; - goto do_cmop; - case 0x9: /* CMEQ, CMLE */ - cond = u ? TCG_COND_LE : TCG_COND_EQ; - goto do_cmop; case 0x2f: /* FABS */ gen_vfp_absd(tcg_rd, tcg_rn); break; @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, case 0x4: /* CLS, CLZ */ case 0x5: /* NOT */ case 0x7: /* SQABS, SQNEG */ + case 0x8: /* CMGT, CMGE */ + case 0x9: /* CMEQ, CMLE */ + case 0xa: /* CMLT */ case 0xb: /* ABS, NEG */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) TCGv_ptr tcg_fpstatus; switch (opcode) { - case 0xa: /* CMLT */ - if (u) { - unallocated_encoding(s); - return; - } - /* fall through */ - case 0x8: /* CMGT, CMGE */ - case 0x9: /* CMEQ, CMLE */ - if (size != 3) { - unallocated_encoding(s); - return; - } - break; case 0x12: /* SQXTUN */ if (!u) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) default: case 0x3: /* USQADD / SUQADD */ case 0x7: /* SQABS / SQNEG */ + case 0x8: /* CMGT, CMGE */ + case 0x9: /* CMEQ, CMLE */ + case 0xa: /* CMLT */ case 0xb: /* ABS, NEG */ unallocated_encoding(s); return; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } handle_shll(s, is_q, size, rn, rd); return; - case 0xa: /* CMLT */ - if (u == 1) { - unallocated_encoding(s); - return; - } - /* fall through */ - case 0x8: /* CMGT, CMGE */ - case 0x9: /* CMEQ, CMLE */ - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } - break; case 0xc ... 0xf: case 0x16 ... 0x1f: { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x4: /* CLS, CLZ */ case 0x5: /* CNT, NOT, RBIT */ case 0x7: /* SQABS, SQNEG */ + case 0x8: /* CMGT, CMGE */ + case 0x9: /* CMEQ, CMLE */ + case 0xa: /* CMLT */ case 0xb: /* ABS, NEG */ unallocated_encoding(s); return; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) tcg_rmode = NULL; } - switch (opcode) { - case 0x8: /* CMGT, CMGE */ - if (u) { - gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cge0, size); - } else { - gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cgt0, size); - } - return; - case 0x9: /* CMEQ, CMLE */ - if (u) { - gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_cle0, size); - } else { - gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_ceq0, size); - } - return; - case 0xa: /* CMLT */ - gen_gvec_fn2(s, is_q, rd, rn, gen_gvec_clt0, size); - return; - case 0x4: /* CLZ, CLS */ - case 0x5: /* CNT, NOT, RBIT */ - case 0xb: - g_assert_not_reached(); - } - if (size == 3) { /* All 64-bit element operations can be shared with scalar 2misc */ int pass; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-43-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 6 +++ target/arm/tcg/gengvec.c | 58 ++++++++++++++++++++++ target/arm/tcg/translate-neon.c | 88 +++++++-------------------------- 3 files changed, 81 insertions(+), 71 deletions(-) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_cnt(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t opr_sz, uint32_t max_sz); void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_rev16(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_rev32(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_rev64(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); /* * Forward to the isar_feature_* tests given a DisasContext pointer. diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/gengvec.c +++ b/target/arm/tcg/gengvec.c @@ -XXX,XX +XXX,XX @@ void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, tcg_gen_gvec_2_ool(rd_ofs, rn_ofs, opr_sz, max_sz, 0, gen_helper_gvec_rbit_b); } + +void gen_gvec_rev16(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + assert(vece == MO_8); + tcg_gen_gvec_rotli(MO_16, rd_ofs, rn_ofs, 8, opr_sz, max_sz); +} + +static void gen_bswap32_i64(TCGv_i64 d, TCGv_i64 n) +{ + tcg_gen_bswap64_i64(d, n); + tcg_gen_rotli_i64(d, d, 32); +} + +void gen_gvec_rev32(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static const GVecGen2 g = { + .fni8 = gen_bswap32_i64, + .fni4 = tcg_gen_bswap32_i32, + .prefer_i64 = TCG_TARGET_REG_BITS == 64, + .vece = MO_32 + }; + + switch (vece) { + case MO_16: + tcg_gen_gvec_rotli(MO_32, rd_ofs, rn_ofs, 16, opr_sz, max_sz); + break; + case MO_8: + tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g); + break; + default: + g_assert_not_reached(); + } +} + +void gen_gvec_rev64(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static const GVecGen2 g[] = { + { .fni8 = tcg_gen_bswap64_i64, + .vece = MO_64 }, + { .fni8 = tcg_gen_hswap_i64, + .vece = MO_64 }, + }; + + switch (vece) { + case MO_32: + tcg_gen_gvec_rotli(MO_64, rd_ofs, rn_ofs, 32, opr_sz, max_sz); + break; + case MO_8: + case MO_16: + tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]); + break; + default: + g_assert_not_reached(); + } +} diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a) return true; } -static bool trans_VREV64(DisasContext *s, arg_VREV64 *a) -{ - int pass, half; - TCGv_i32 tmp[2]; - - if (!arm_dc_feature(s, ARM_FEATURE_NEON)) { - return false; - } - - /* UNDEF accesses to D16-D31 if they don't exist. */ - if (!dc_isar_feature(aa32_simd_r32, s) && - ((a->vd | a->vm) & 0x10)) { - return false; - } - - if ((a->vd | a->vm) & a->q) { - return false; - } - - if (a->size == 3) { - return false; - } - - if (!vfp_access_check(s)) { - return true; - } - - tmp[0] = tcg_temp_new_i32(); - tmp[1] = tcg_temp_new_i32(); - - for (pass = 0; pass < (a->q ? 2 : 1); pass++) { - for (half = 0; half < 2; half++) { - read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32); - switch (a->size) { - case 0: - tcg_gen_bswap32_i32(tmp[half], tmp[half]); - break; - case 1: - gen_swap_half(tmp[half], tmp[half]); - break; - case 2: - break; - default: - g_assert_not_reached(); - } - } - write_neon_element32(tmp[1], a->vd, pass * 2, MO_32); - write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32); - } - return true; -} - static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a, NeonGenWidenFn *widenfn, NeonGenTwo64OpFn *opfn, @@ -XXX,XX +XXX,XX @@ DO_2MISC_VEC(VCGE0, gen_gvec_cge0) DO_2MISC_VEC(VCLT0, gen_gvec_clt0) DO_2MISC_VEC(VCLS, gen_gvec_cls) DO_2MISC_VEC(VCLZ, gen_gvec_clz) +DO_2MISC_VEC(VREV64, gen_gvec_rev64) static bool trans_VMVN(DisasContext *s, arg_2misc *a) { @@ -XXX,XX +XXX,XX @@ static bool trans_VCNT(DisasContext *s, arg_2misc *a) return do_2misc_vec(s, a, gen_gvec_cnt); } +static bool trans_VREV16(DisasContext *s, arg_2misc *a) +{ + if (a->size != 0) { + return false; + } + return do_2misc_vec(s, a, gen_gvec_rev16); +} + +static bool trans_VREV32(DisasContext *s, arg_2misc *a) +{ + if (a->size != 0 && a->size != 1) { + return false; + } + return do_2misc_vec(s, a, gen_gvec_rev32); +} + #define WRAP_2M_3_OOL_FN(WRAPNAME, FUNC, DATA) \ static void WRAPNAME(unsigned vece, uint32_t rd_ofs, \ uint32_t rm_ofs, uint32_t oprsz, \ @@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn) return true; } -static bool trans_VREV32(DisasContext *s, arg_2misc *a) -{ - static NeonGenOneOpFn * const fn[] = { - tcg_gen_bswap32_i32, - gen_swap_half, - NULL, - NULL, - }; - return do_2misc(s, a, fn[a->size]); -} - -static bool trans_VREV16(DisasContext *s, arg_2misc *a) -{ - if (a->size != 0) { - return false; - } - return do_2misc(s, a, gen_rev16); -} - static void gen_VABS_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz) { -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes REV16, REV32, REV64. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-44-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 5 +++ target/arm/tcg/translate-a64.c | 79 +++------------------------------- 2 files changed, 10 insertions(+), 74 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ @qrr_b . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=0 @qrr_h . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=1 +@qrr_bh . q:1 ...... . esz:1 ...... ...... rn:5 rd:5 &qrr_e @qrr_e . q:1 ...... esz:2 ...... ...... rn:5 rd:5 &qrr_e @qrrr_b . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=0 @@ -XXX,XX +XXX,XX @@ CMGE0_v 0.10 1110 ..1 00000 10001 0 ..... ..... @qrr_e CMEQ0_v 0.00 1110 ..1 00000 10011 0 ..... ..... @qrr_e CMLE0_v 0.10 1110 ..1 00000 10011 0 ..... ..... @qrr_e CMLT0_v 0.00 1110 ..1 00000 10101 0 ..... ..... @qrr_e + +REV16_v 0.00 1110 001 00000 00011 0 ..... ..... @qrr_b +REV32_v 0.10 1110 0.1 00000 00001 0 ..... ..... @qrr_bh +REV64_v 0.00 1110 ..1 00000 00001 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(CMGE0_v, do_gvec_fn2, a, gen_gvec_cge0) TRANS(CMLT0_v, do_gvec_fn2, a, gen_gvec_clt0) TRANS(CMLE0_v, do_gvec_fn2, a, gen_gvec_cle0) TRANS(CMEQ0_v, do_gvec_fn2, a, gen_gvec_ceq0) +TRANS(REV16_v, do_gvec_fn2, a, gen_gvec_rev16) +TRANS(REV32_v, do_gvec_fn2, a, gen_gvec_rev32) static bool do_gvec_fn2_bhs(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) { @@ -XXX,XX +XXX,XX @@ static bool do_gvec_fn2_bhs(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) TRANS(CLS_v, do_gvec_fn2_bhs, a, gen_gvec_cls) TRANS(CLZ_v, do_gvec_fn2_bhs, a, gen_gvec_clz) +TRANS(REV64_v, do_gvec_fn2_bhs, a, gen_gvec_rev64) /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q, } } -static void handle_rev(DisasContext *s, int opcode, bool u, - bool is_q, int size, int rn, int rd) -{ - int op = (opcode << 1) | u; - int opsz = op + size; - int grp_size = 3 - opsz; - int dsize = is_q ? 128 : 64; - int i; - - if (opsz >= 3) { - unallocated_encoding(s); - return; - } - - if (!fp_access_check(s)) { - return; - } - - if (size == 0) { - /* Special case bytes, use bswap op on each group of elements */ - int groups = dsize / (8 << grp_size); - - for (i = 0; i < groups; i++) { - TCGv_i64 tcg_tmp = tcg_temp_new_i64(); - - read_vec_element(s, tcg_tmp, rn, i, grp_size); - switch (grp_size) { - case MO_16: - tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp, TCG_BSWAP_IZ); - break; - case MO_32: - tcg_gen_bswap32_i64(tcg_tmp, tcg_tmp, TCG_BSWAP_IZ); - break; - case MO_64: - tcg_gen_bswap64_i64(tcg_tmp, tcg_tmp); - break; - default: - g_assert_not_reached(); - } - write_vec_element(s, tcg_tmp, rd, i, grp_size); - } - clear_vec_high(s, is_q, rd); - } else { - int revmask = (1 << grp_size) - 1; - int esize = 8 << size; - int elements = dsize / esize; - TCGv_i64 tcg_rn = tcg_temp_new_i64(); - TCGv_i64 tcg_rd[2]; - - for (i = 0; i < 2; i++) { - tcg_rd[i] = tcg_temp_new_i64(); - tcg_gen_movi_i64(tcg_rd[i], 0); - } - - for (i = 0; i < elements; i++) { - int e_rev = (i & 0xf) ^ revmask; - int w = (e_rev * esize) / 64; - int o = (e_rev * esize) % 64; - - read_vec_element(s, tcg_rn, rn, i, size); - tcg_gen_deposit_i64(tcg_rd[w], tcg_rd[w], tcg_rn, o, esize); - } - - for (i = 0; i < 2; i++) { - write_vec_element(s, tcg_rd[i], rd, i, MO_64); - } - clear_vec_high(s, true, rd); - } -} - static void handle_2misc_pairwise(DisasContext *s, int opcode, bool u, bool is_q, int size, int rn, int rd) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) TCGv_ptr tcg_fpstatus; switch (opcode) { - case 0x0: /* REV64, REV32 */ - case 0x1: /* REV16 */ - handle_rev(s, opcode, u, is_q, size, rn, rd); - return; case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */ case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */ if (size == 3) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) break; } default: + case 0x0: /* REV64, REV32 */ + case 0x1: /* REV16 */ case 0x3: /* SUQADD, USQADD */ case 0x4: /* CLS, CLZ */ case 0x5: /* CNT, NOT, RBIT */ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Move from helper-a64.c to neon_helper.c so that these functions are available for arm32 code as well. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-45-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 2 ++ target/arm/tcg/helper-a64.h | 2 -- target/arm/tcg/helper-a64.c | 43 ------------------------------------ target/arm/tcg/neon_helper.c | 43 ++++++++++++++++++++++++++++++++++++ 4 files changed, 45 insertions(+), 45 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_addl_u16, i64, i64, i64) DEF_HELPER_2(neon_addl_u32, i64, i64, i64) DEF_HELPER_2(neon_paddl_u16, i64, i64, i64) DEF_HELPER_2(neon_paddl_u32, i64, i64, i64) +DEF_HELPER_FLAGS_1(neon_addlp_s8, TCG_CALL_NO_RWG_SE, i64, i64) +DEF_HELPER_FLAGS_1(neon_addlp_s16, TCG_CALL_NO_RWG_SE, i64, i64) DEF_HELPER_2(neon_subl_u16, i64, i64, i64) DEF_HELPER_2(neon_subl_u32, i64, i64, i64) DEF_HELPER_3(neon_addl_saturate_s32, i64, env, i64, i64) diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.h +++ b/target/arm/tcg/helper-a64.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, ptr) DEF_HELPER_FLAGS_3(rsqrtsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, ptr) DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, ptr) DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, ptr) -DEF_HELPER_FLAGS_1(neon_addlp_s8, TCG_CALL_NO_RWG_SE, i64, i64) DEF_HELPER_FLAGS_1(neon_addlp_u8, TCG_CALL_NO_RWG_SE, i64, i64) -DEF_HELPER_FLAGS_1(neon_addlp_s16, TCG_CALL_NO_RWG_SE, i64, i64) DEF_HELPER_FLAGS_1(neon_addlp_u16, TCG_CALL_NO_RWG_SE, i64, i64) DEF_HELPER_FLAGS_2(frecpx_f64, TCG_CALL_NO_RWG, f64, f64, ptr) DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, ptr) diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, void *fpstp) return float64_muladd(a, b, float64_three, float_muladd_halve_result, fpst); } -/* Pairwise long add: add pairs of adjacent elements into - * double-width elements in the result (eg _s8 is an 8x8->16 op) - */ -uint64_t HELPER(neon_addlp_s8)(uint64_t a) -{ - uint64_t nsignmask = 0x0080008000800080ULL; - uint64_t wsignmask = 0x8000800080008000ULL; - uint64_t elementmask = 0x00ff00ff00ff00ffULL; - uint64_t tmp1, tmp2; - uint64_t res, signres; - - /* Extract odd elements, sign extend each to a 16 bit field */ - tmp1 = a & elementmask; - tmp1 ^= nsignmask; - tmp1 |= wsignmask; - tmp1 = (tmp1 - nsignmask) ^ wsignmask; - /* Ditto for the even elements */ - tmp2 = (a >> 8) & elementmask; - tmp2 ^= nsignmask; - tmp2 |= wsignmask; - tmp2 = (tmp2 - nsignmask) ^ wsignmask; - - /* calculate the result by summing bits 0..14, 16..22, etc, - * and then adjusting the sign bits 15, 23, etc manually. - * This ensures the addition can't overflow the 16 bit field. - */ - signres = (tmp1 ^ tmp2) & wsignmask; - res = (tmp1 & ~wsignmask) + (tmp2 & ~wsignmask); - res ^= signres; - - return res; -} - uint64_t HELPER(neon_addlp_u8)(uint64_t a) { uint64_t tmp; @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_addlp_u8)(uint64_t a) return tmp; } -uint64_t HELPER(neon_addlp_s16)(uint64_t a) -{ - int32_t reslo, reshi; - - reslo = (int32_t)(int16_t)a + (int32_t)(int16_t)(a >> 16); - reshi = (int32_t)(int16_t)(a >> 32) + (int32_t)(int16_t)(a >> 48); - - return (uint32_t)reslo | (((uint64_t)reshi) << 32); -} - uint64_t HELPER(neon_addlp_u16)(uint64_t a) { uint64_t tmp; diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/neon_helper.c +++ b/target/arm/tcg/neon_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_paddl_u32)(uint64_t a, uint64_t b) return low + ((uint64_t)high << 32); } +/* Pairwise long add: add pairs of adjacent elements into + * double-width elements in the result (eg _s8 is an 8x8->16 op) + */ +uint64_t HELPER(neon_addlp_s8)(uint64_t a) +{ + uint64_t nsignmask = 0x0080008000800080ULL; + uint64_t wsignmask = 0x8000800080008000ULL; + uint64_t elementmask = 0x00ff00ff00ff00ffULL; + uint64_t tmp1, tmp2; + uint64_t res, signres; + + /* Extract odd elements, sign extend each to a 16 bit field */ + tmp1 = a & elementmask; + tmp1 ^= nsignmask; + tmp1 |= wsignmask; + tmp1 = (tmp1 - nsignmask) ^ wsignmask; + /* Ditto for the even elements */ + tmp2 = (a >> 8) & elementmask; + tmp2 ^= nsignmask; + tmp2 |= wsignmask; + tmp2 = (tmp2 - nsignmask) ^ wsignmask; + + /* calculate the result by summing bits 0..14, 16..22, etc, + * and then adjusting the sign bits 15, 23, etc manually. + * This ensures the addition can't overflow the 16 bit field. + */ + signres = (tmp1 ^ tmp2) & wsignmask; + res = (tmp1 & ~wsignmask) + (tmp2 & ~wsignmask); + res ^= signres; + + return res; +} + +uint64_t HELPER(neon_addlp_s16)(uint64_t a) +{ + int32_t reslo, reshi; + + reslo = (int32_t)(int16_t)a + (int32_t)(int16_t)(a >> 16); + reshi = (int32_t)(int16_t)(a >> 32) + (int32_t)(int16_t)(a >> 48); + + return (uint32_t)reslo | (((uint64_t)reshi) << 32); +} + uint64_t HELPER(neon_subl_u16)(uint64_t a, uint64_t b) { uint64_t mask; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Pairwise addition with and without accumulation. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-46-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 2 - target/arm/tcg/translate.h | 9 ++ target/arm/tcg/gengvec.c | 230 ++++++++++++++++++++++++++++++++ target/arm/tcg/neon_helper.c | 22 --- target/arm/tcg/translate-neon.c | 150 +-------------------- 5 files changed, 243 insertions(+), 170 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_1(neon_widen_s16, i64, i32) DEF_HELPER_2(neon_addl_u16, i64, i64, i64) DEF_HELPER_2(neon_addl_u32, i64, i64, i64) -DEF_HELPER_2(neon_paddl_u16, i64, i64, i64) -DEF_HELPER_2(neon_paddl_u32, i64, i64, i64) DEF_HELPER_FLAGS_1(neon_addlp_s8, TCG_CALL_NO_RWG_SE, i64, i64) DEF_HELPER_FLAGS_1(neon_addlp_s16, TCG_CALL_NO_RWG_SE, i64, i64) DEF_HELPER_2(neon_subl_u16, i64, i64, i64) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_rev32(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, void gen_gvec_rev64(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_saddlp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_sadalp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_uaddlp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_uadalp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); + /* * Forward to the isar_feature_* tests given a DisasContext pointer. */ diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/gengvec.c +++ b/target/arm/tcg/gengvec.c @@ -XXX,XX +XXX,XX @@ void gen_gvec_rev64(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, g_assert_not_reached(); } } + +static void gen_saddlp_vec(unsigned vece, TCGv_vec d, TCGv_vec n) +{ + int half = 4 << vece; + TCGv_vec t = tcg_temp_new_vec_matching(d); + + tcg_gen_shli_vec(vece, t, n, half); + tcg_gen_sari_vec(vece, d, n, half); + tcg_gen_sari_vec(vece, t, t, half); + tcg_gen_add_vec(vece, d, d, t); +} + +static void gen_saddlp_s_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_ext32s_i64(t, n); + tcg_gen_sari_i64(d, n, 32); + tcg_gen_add_i64(d, d, t); +} + +void gen_gvec_saddlp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_sari_vec, INDEX_op_shli_vec, INDEX_op_add_vec, 0 + }; + static const GVecGen2 g[] = { + { .fniv = gen_saddlp_vec, + .fni8 = gen_helper_neon_addlp_s8, + .opt_opc = vecop_list, + .vece = MO_16 }, + { .fniv = gen_saddlp_vec, + .fni8 = gen_helper_neon_addlp_s16, + .opt_opc = vecop_list, + .vece = MO_32 }, + { .fniv = gen_saddlp_vec, + .fni8 = gen_saddlp_s_i64, + .opt_opc = vecop_list, + .vece = MO_64 }, + }; + assert(vece <= MO_32); + tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]); +} + +static void gen_sadalp_vec(unsigned vece, TCGv_vec d, TCGv_vec n) +{ + TCGv_vec t = tcg_temp_new_vec_matching(d); + + gen_saddlp_vec(vece, t, n); + tcg_gen_add_vec(vece, d, d, t); +} + +static void gen_sadalp_b_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + + gen_helper_neon_addlp_s8(t, n); + tcg_gen_vec_add16_i64(d, d, t); +} + +static void gen_sadalp_h_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + + gen_helper_neon_addlp_s16(t, n); + tcg_gen_vec_add32_i64(d, d, t); +} + +static void gen_sadalp_s_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + + gen_saddlp_s_i64(t, n); + tcg_gen_add_i64(d, d, t); +} + +void gen_gvec_sadalp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_sari_vec, INDEX_op_shli_vec, INDEX_op_add_vec, 0 + }; + static const GVecGen2 g[] = { + { .fniv = gen_sadalp_vec, + .fni8 = gen_sadalp_b_i64, + .opt_opc = vecop_list, + .load_dest = true, + .vece = MO_16 }, + { .fniv = gen_sadalp_vec, + .fni8 = gen_sadalp_h_i64, + .opt_opc = vecop_list, + .load_dest = true, + .vece = MO_32 }, + { .fniv = gen_sadalp_vec, + .fni8 = gen_sadalp_s_i64, + .opt_opc = vecop_list, + .load_dest = true, + .vece = MO_64 }, + }; + assert(vece <= MO_32); + tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]); +} + +static void gen_uaddlp_vec(unsigned vece, TCGv_vec d, TCGv_vec n) +{ + int half = 4 << vece; + TCGv_vec t = tcg_temp_new_vec_matching(d); + TCGv_vec m = tcg_constant_vec_matching(d, vece, MAKE_64BIT_MASK(0, half)); + + tcg_gen_shri_vec(vece, t, n, half); + tcg_gen_and_vec(vece, d, n, m); + tcg_gen_add_vec(vece, d, d, t); +} + +static void gen_uaddlp_b_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + TCGv_i64 m = tcg_constant_i64(dup_const(MO_16, 0xff)); + + tcg_gen_shri_i64(t, n, 8); + tcg_gen_and_i64(d, n, m); + tcg_gen_and_i64(t, t, m); + /* No carry between widened unsigned elements. */ + tcg_gen_add_i64(d, d, t); +} + +static void gen_uaddlp_h_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + TCGv_i64 m = tcg_constant_i64(dup_const(MO_32, 0xffff)); + + tcg_gen_shri_i64(t, n, 16); + tcg_gen_and_i64(d, n, m); + tcg_gen_and_i64(t, t, m); + /* No carry between widened unsigned elements. */ + tcg_gen_add_i64(d, d, t); +} + +static void gen_uaddlp_s_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + + tcg_gen_ext32u_i64(t, n); + tcg_gen_shri_i64(d, n, 32); + tcg_gen_add_i64(d, d, t); +} + +void gen_gvec_uaddlp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_shri_vec, INDEX_op_add_vec, 0 + }; + static const GVecGen2 g[] = { + { .fniv = gen_uaddlp_vec, + .fni8 = gen_uaddlp_b_i64, + .opt_opc = vecop_list, + .vece = MO_16 }, + { .fniv = gen_uaddlp_vec, + .fni8 = gen_uaddlp_h_i64, + .opt_opc = vecop_list, + .vece = MO_32 }, + { .fniv = gen_uaddlp_vec, + .fni8 = gen_uaddlp_s_i64, + .opt_opc = vecop_list, + .vece = MO_64 }, + }; + assert(vece <= MO_32); + tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]); +} + +static void gen_uadalp_vec(unsigned vece, TCGv_vec d, TCGv_vec n) +{ + TCGv_vec t = tcg_temp_new_vec_matching(d); + + gen_uaddlp_vec(vece, t, n); + tcg_gen_add_vec(vece, d, d, t); +} + +static void gen_uadalp_b_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + + gen_uaddlp_b_i64(t, n); + tcg_gen_vec_add16_i64(d, d, t); +} + +static void gen_uadalp_h_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + + gen_uaddlp_h_i64(t, n); + tcg_gen_vec_add32_i64(d, d, t); +} + +static void gen_uadalp_s_i64(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i64 t = tcg_temp_new_i64(); + + gen_uaddlp_s_i64(t, n); + tcg_gen_add_i64(d, d, t); +} + +void gen_gvec_uadalp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static const TCGOpcode vecop_list[] = { + INDEX_op_shri_vec, INDEX_op_add_vec, 0 + }; + static const GVecGen2 g[] = { + { .fniv = gen_uadalp_vec, + .fni8 = gen_uadalp_b_i64, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_16 }, + { .fniv = gen_uadalp_vec, + .fni8 = gen_uadalp_h_i64, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_32 }, + { .fniv = gen_uadalp_vec, + .fni8 = gen_uadalp_s_i64, + .load_dest = true, + .opt_opc = vecop_list, + .vece = MO_64 }, + }; + assert(vece <= MO_32); + tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]); +} diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/neon_helper.c +++ b/target/arm/tcg/neon_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_addl_u32)(uint64_t a, uint64_t b) return (a + b) ^ mask; } -uint64_t HELPER(neon_paddl_u16)(uint64_t a, uint64_t b) -{ - uint64_t tmp; - uint64_t tmp2; - - tmp = a & 0x0000ffff0000ffffull; - tmp += (a >> 16) & 0x0000ffff0000ffffull; - tmp2 = b & 0xffff0000ffff0000ull; - tmp2 += (b << 16) & 0xffff0000ffff0000ull; - return ( tmp & 0xffff) - | ((tmp >> 16) & 0xffff0000ull) - | ((tmp2 << 16) & 0xffff00000000ull) - | ( tmp2 & 0xffff000000000000ull); -} - -uint64_t HELPER(neon_paddl_u32)(uint64_t a, uint64_t b) -{ - uint32_t low = a + (a >> 32); - uint32_t high = b + (b >> 32); - return low + ((uint64_t)high << 32); -} - /* Pairwise long add: add pairs of adjacent elements into * double-width elements in the result (eg _s8 is an 8x8->16 op) */ diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a) return true; } -static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a, - NeonGenWidenFn *widenfn, - NeonGenTwo64OpFn *opfn, - NeonGenTwo64OpFn *accfn) -{ - /* - * Pairwise long operations: widen both halves of the pair, - * combine the pairs with the opfn, and then possibly accumulate - * into the destination with the accfn. - */ - int pass; - - if (!arm_dc_feature(s, ARM_FEATURE_NEON)) { - return false; - } - - /* UNDEF accesses to D16-D31 if they don't exist. */ - if (!dc_isar_feature(aa32_simd_r32, s) && - ((a->vd | a->vm) & 0x10)) { - return false; - } - - if ((a->vd | a->vm) & a->q) { - return false; - } - - if (!widenfn) { - return false; - } - - if (!vfp_access_check(s)) { - return true; - } - - for (pass = 0; pass < a->q + 1; pass++) { - TCGv_i32 tmp; - TCGv_i64 rm0_64, rm1_64, rd_64; - - rm0_64 = tcg_temp_new_i64(); - rm1_64 = tcg_temp_new_i64(); - rd_64 = tcg_temp_new_i64(); - - tmp = tcg_temp_new_i32(); - read_neon_element32(tmp, a->vm, pass * 2, MO_32); - widenfn(rm0_64, tmp); - read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32); - widenfn(rm1_64, tmp); - - opfn(rd_64, rm0_64, rm1_64); - - if (accfn) { - TCGv_i64 tmp64 = tcg_temp_new_i64(); - read_neon_element64(tmp64, a->vd, pass, MO_64); - accfn(rd_64, tmp64, rd_64); - } - write_neon_element64(rd_64, a->vd, pass, MO_64); - } - return true; -} - -static bool trans_VPADDL_S(DisasContext *s, arg_2misc *a) -{ - static NeonGenWidenFn * const widenfn[] = { - gen_helper_neon_widen_s8, - gen_helper_neon_widen_s16, - tcg_gen_ext_i32_i64, - NULL, - }; - static NeonGenTwo64OpFn * const opfn[] = { - gen_helper_neon_paddl_u16, - gen_helper_neon_paddl_u32, - tcg_gen_add_i64, - NULL, - }; - - return do_2misc_pairwise(s, a, widenfn[a->size], opfn[a->size], NULL); -} - -static bool trans_VPADDL_U(DisasContext *s, arg_2misc *a) -{ - static NeonGenWidenFn * const widenfn[] = { - gen_helper_neon_widen_u8, - gen_helper_neon_widen_u16, - tcg_gen_extu_i32_i64, - NULL, - }; - static NeonGenTwo64OpFn * const opfn[] = { - gen_helper_neon_paddl_u16, - gen_helper_neon_paddl_u32, - tcg_gen_add_i64, - NULL, - }; - - return do_2misc_pairwise(s, a, widenfn[a->size], opfn[a->size], NULL); -} - -static bool trans_VPADAL_S(DisasContext *s, arg_2misc *a) -{ - static NeonGenWidenFn * const widenfn[] = { - gen_helper_neon_widen_s8, - gen_helper_neon_widen_s16, - tcg_gen_ext_i32_i64, - NULL, - }; - static NeonGenTwo64OpFn * const opfn[] = { - gen_helper_neon_paddl_u16, - gen_helper_neon_paddl_u32, - tcg_gen_add_i64, - NULL, - }; - static NeonGenTwo64OpFn * const accfn[] = { - gen_helper_neon_addl_u16, - gen_helper_neon_addl_u32, - tcg_gen_add_i64, - NULL, - }; - - return do_2misc_pairwise(s, a, widenfn[a->size], opfn[a->size], - accfn[a->size]); -} - -static bool trans_VPADAL_U(DisasContext *s, arg_2misc *a) -{ - static NeonGenWidenFn * const widenfn[] = { - gen_helper_neon_widen_u8, - gen_helper_neon_widen_u16, - tcg_gen_extu_i32_i64, - NULL, - }; - static NeonGenTwo64OpFn * const opfn[] = { - gen_helper_neon_paddl_u16, - gen_helper_neon_paddl_u32, - tcg_gen_add_i64, - NULL, - }; - static NeonGenTwo64OpFn * const accfn[] = { - gen_helper_neon_addl_u16, - gen_helper_neon_addl_u32, - tcg_gen_add_i64, - NULL, - }; - - return do_2misc_pairwise(s, a, widenfn[a->size], opfn[a->size], - accfn[a->size]); -} - typedef void ZipFn(TCGv_ptr, TCGv_ptr); static bool do_zip_uzp(DisasContext *s, arg_2misc *a, @@ -XXX,XX +XXX,XX @@ DO_2MISC_VEC(VCLT0, gen_gvec_clt0) DO_2MISC_VEC(VCLS, gen_gvec_cls) DO_2MISC_VEC(VCLZ, gen_gvec_clz) DO_2MISC_VEC(VREV64, gen_gvec_rev64) +DO_2MISC_VEC(VPADDL_S, gen_gvec_saddlp) +DO_2MISC_VEC(VPADDL_U, gen_gvec_uaddlp) +DO_2MISC_VEC(VPADAL_S, gen_gvec_sadalp) +DO_2MISC_VEC(VPADAL_U, gen_gvec_uadalp) static bool trans_VMVN(DisasContext *s, arg_2misc *a) { -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes SADDLP, UADDLP, SADALP, UADALP. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-47-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-a64.h | 2 - target/arm/tcg/a64.decode | 5 ++ target/arm/tcg/helper-a64.c | 18 -------- target/arm/tcg/translate-a64.c | 84 +++------------------------------- 4 files changed, 11 insertions(+), 98 deletions(-) diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.h +++ b/target/arm/tcg/helper-a64.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, ptr) DEF_HELPER_FLAGS_3(rsqrtsf_f16, TCG_CALL_NO_RWG, f16, f16, f16, ptr) DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, ptr) DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, ptr) -DEF_HELPER_FLAGS_1(neon_addlp_u8, TCG_CALL_NO_RWG_SE, i64, i64) -DEF_HELPER_FLAGS_1(neon_addlp_u16, TCG_CALL_NO_RWG_SE, i64, i64) DEF_HELPER_FLAGS_2(frecpx_f64, TCG_CALL_NO_RWG, f64, f64, ptr) DEF_HELPER_FLAGS_2(frecpx_f32, TCG_CALL_NO_RWG, f32, f32, ptr) DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, ptr) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ CMLT0_v 0.00 1110 ..1 00000 10101 0 ..... ..... @qrr_e REV16_v 0.00 1110 001 00000 00011 0 ..... ..... @qrr_b REV32_v 0.10 1110 0.1 00000 00001 0 ..... ..... @qrr_bh REV64_v 0.00 1110 ..1 00000 00001 0 ..... ..... @qrr_e + +SADDLP_v 0.00 1110 ..1 00000 00101 0 ..... ..... @qrr_e +UADDLP_v 0.10 1110 ..1 00000 00101 0 ..... ..... @qrr_e +SADALP_v 0.00 1110 ..1 00000 01101 0 ..... ..... @qrr_e +UADALP_v 0.10 1110 ..1 00000 01101 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, void *fpstp) return float64_muladd(a, b, float64_three, float_muladd_halve_result, fpst); } -uint64_t HELPER(neon_addlp_u8)(uint64_t a) -{ - uint64_t tmp; - - tmp = a & 0x00ff00ff00ff00ffULL; - tmp += (a >> 8) & 0x00ff00ff00ff00ffULL; - return tmp; -} - -uint64_t HELPER(neon_addlp_u16)(uint64_t a) -{ - uint64_t tmp; - - tmp = a & 0x0000ffff0000ffffULL; - tmp += (a >> 16) & 0x0000ffff0000ffffULL; - return tmp; -} - /* Floating-point reciprocal exponent - see FPRecpX in ARM ARM */ uint32_t HELPER(frecpx_f16)(uint32_t a, void *fpstp) { diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_gvec_fn2_bhs(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) TRANS(CLS_v, do_gvec_fn2_bhs, a, gen_gvec_cls) TRANS(CLZ_v, do_gvec_fn2_bhs, a, gen_gvec_clz) TRANS(REV64_v, do_gvec_fn2_bhs, a, gen_gvec_rev64) +TRANS(SADDLP_v, do_gvec_fn2_bhs, a, gen_gvec_saddlp) +TRANS(UADDLP_v, do_gvec_fn2_bhs, a, gen_gvec_uaddlp) +TRANS(SADALP_v, do_gvec_fn2_bhs, a, gen_gvec_sadalp) +TRANS(UADALP_v, do_gvec_fn2_bhs, a, gen_gvec_uadalp) /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q, } } -static void handle_2misc_pairwise(DisasContext *s, int opcode, bool u, - bool is_q, int size, int rn, int rd) -{ - /* Implement the pairwise operations from 2-misc: - * SADDLP, UADDLP, SADALP, UADALP. - * These all add pairs of elements in the input to produce a - * double-width result element in the output (possibly accumulating). - */ - bool accum = (opcode == 0x6); - int maxpass = is_q ? 2 : 1; - int pass; - TCGv_i64 tcg_res[2]; - - if (size == 2) { - /* 32 + 32 -> 64 op */ - MemOp memop = size + (u ? 0 : MO_SIGN); - - for (pass = 0; pass < maxpass; pass++) { - TCGv_i64 tcg_op1 = tcg_temp_new_i64(); - TCGv_i64 tcg_op2 = tcg_temp_new_i64(); - - tcg_res[pass] = tcg_temp_new_i64(); - - read_vec_element(s, tcg_op1, rn, pass * 2, memop); - read_vec_element(s, tcg_op2, rn, pass * 2 + 1, memop); - tcg_gen_add_i64(tcg_res[pass], tcg_op1, tcg_op2); - if (accum) { - read_vec_element(s, tcg_op1, rd, pass, MO_64); - tcg_gen_add_i64(tcg_res[pass], tcg_res[pass], tcg_op1); - } - } - } else { - for (pass = 0; pass < maxpass; pass++) { - TCGv_i64 tcg_op = tcg_temp_new_i64(); - NeonGenOne64OpFn *genfn; - static NeonGenOne64OpFn * const fns[2][2] = { - { gen_helper_neon_addlp_s8, gen_helper_neon_addlp_u8 }, - { gen_helper_neon_addlp_s16, gen_helper_neon_addlp_u16 }, - }; - - genfn = fns[size][u]; - - tcg_res[pass] = tcg_temp_new_i64(); - - read_vec_element(s, tcg_op, rn, pass, MO_64); - genfn(tcg_res[pass], tcg_op); - - if (accum) { - read_vec_element(s, tcg_op, rd, pass, MO_64); - if (size == 0) { - gen_helper_neon_addl_u16(tcg_res[pass], - tcg_res[pass], tcg_op); - } else { - gen_helper_neon_addl_u32(tcg_res[pass], - tcg_res[pass], tcg_op); - } - } - } - } - if (!is_q) { - tcg_res[1] = tcg_constant_i64(0); - } - for (pass = 0; pass < 2; pass++) { - write_vec_element(s, tcg_res[pass], rd, pass, MO_64); - } -} - static void handle_shll(DisasContext *s, bool is_q, int size, int rn, int rd) { /* Implement SHLL and SHLL2 */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) handle_2misc_narrow(s, false, opcode, u, is_q, size, rn, rd); return; - case 0x2: /* SADDLP, UADDLP */ - case 0x6: /* SADALP, UADALP */ - if (size == 3) { - unallocated_encoding(s); - return; - } - if (!fp_access_check(s)) { - return; - } - handle_2misc_pairwise(s, opcode, u, is_q, size, rn, rd); - return; case 0x13: /* SHLL, SHLL2 */ if (u == 0 || size == 3) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) default: case 0x0: /* REV64, REV32 */ case 0x1: /* REV16 */ + case 0x2: /* SADDLP, UADDLP */ case 0x3: /* SUQADD, USQADD */ case 0x4: /* CLS, CLZ */ case 0x5: /* CNT, NOT, RBIT */ + case 0x6: /* SADALP, UADALP */ case 0x7: /* SQABS, SQNEG */ case 0x8: /* CMGT, CMGE */ case 0x9: /* CMEQ, CMLE */ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> These have generic equivalents: tcg_gen_vec_{add,sub}{16,32}_i64. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-48-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 4 ---- target/arm/tcg/neon_helper.c | 36 --------------------------------- target/arm/tcg/translate-neon.c | 22 ++++++++++---------- 3 files changed, 11 insertions(+), 51 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_1(neon_widen_s8, i64, i32) DEF_HELPER_1(neon_widen_u16, i64, i32) DEF_HELPER_1(neon_widen_s16, i64, i32) -DEF_HELPER_2(neon_addl_u16, i64, i64, i64) -DEF_HELPER_2(neon_addl_u32, i64, i64, i64) DEF_HELPER_FLAGS_1(neon_addlp_s8, TCG_CALL_NO_RWG_SE, i64, i64) DEF_HELPER_FLAGS_1(neon_addlp_s16, TCG_CALL_NO_RWG_SE, i64, i64) -DEF_HELPER_2(neon_subl_u16, i64, i64, i64) -DEF_HELPER_2(neon_subl_u32, i64, i64, i64) DEF_HELPER_3(neon_addl_saturate_s32, i64, env, i64, i64) DEF_HELPER_3(neon_addl_saturate_s64, i64, env, i64, i64) DEF_HELPER_2(neon_abdl_u16, i64, i32, i32) diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/neon_helper.c +++ b/target/arm/tcg/neon_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_widen_s16)(uint32_t x) return ((uint32_t)(int16_t)x) | (high << 32); } -uint64_t HELPER(neon_addl_u16)(uint64_t a, uint64_t b) -{ - uint64_t mask; - mask = (a ^ b) & 0x8000800080008000ull; - a &= ~0x8000800080008000ull; - b &= ~0x8000800080008000ull; - return (a + b) ^ mask; -} - -uint64_t HELPER(neon_addl_u32)(uint64_t a, uint64_t b) -{ - uint64_t mask; - mask = (a ^ b) & 0x8000000080000000ull; - a &= ~0x8000000080000000ull; - b &= ~0x8000000080000000ull; - return (a + b) ^ mask; -} - /* Pairwise long add: add pairs of adjacent elements into * double-width elements in the result (eg _s8 is an 8x8->16 op) */ @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_addlp_s16)(uint64_t a) return (uint32_t)reslo | (((uint64_t)reshi) << 32); } -uint64_t HELPER(neon_subl_u16)(uint64_t a, uint64_t b) -{ - uint64_t mask; - mask = (a ^ ~b) & 0x8000800080008000ull; - a |= 0x8000800080008000ull; - b &= ~0x8000800080008000ull; - return (a - b) ^ mask; -} - -uint64_t HELPER(neon_subl_u32)(uint64_t a, uint64_t b) -{ - uint64_t mask; - mask = (a ^ ~b) & 0x8000000080000000ull; - a |= 0x8000000080000000ull; - b &= ~0x8000000080000000ull; - return (a - b) ^ mask; -} - uint64_t HELPER(neon_addl_saturate_s32)(CPUARMState *env, uint64_t a, uint64_t b) { uint32_t x, y; diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a, NULL, NULL, \ }; \ static NeonGenTwo64OpFn * const addfn[] = { \ - gen_helper_neon_##OP##l_u16, \ - gen_helper_neon_##OP##l_u32, \ + tcg_gen_vec_##OP##16_i64, \ + tcg_gen_vec_##OP##32_i64, \ tcg_gen_##OP##_i64, \ NULL, \ }; \ @@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a, static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a) \ { \ static NeonGenTwo64OpFn * const addfn[] = { \ - gen_helper_neon_##OP##l_u16, \ - gen_helper_neon_##OP##l_u32, \ + tcg_gen_vec_##OP##16_i64, \ + tcg_gen_vec_##OP##32_i64, \ tcg_gen_##OP##_i64, \ NULL, \ }; \ @@ -XXX,XX +XXX,XX @@ static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a) NULL, }; static NeonGenTwo64OpFn * const addfn[] = { - gen_helper_neon_addl_u16, - gen_helper_neon_addl_u32, + tcg_gen_vec_add16_i64, + tcg_gen_vec_add32_i64, tcg_gen_add_i64, NULL, }; @@ -XXX,XX +XXX,XX @@ static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a) NULL, }; static NeonGenTwo64OpFn * const addfn[] = { - gen_helper_neon_addl_u16, - gen_helper_neon_addl_u32, + tcg_gen_vec_add16_i64, + tcg_gen_vec_add32_i64, tcg_gen_add_i64, NULL, }; @@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_U_3d(DisasContext *s, arg_3diff *a) NULL, \ }; \ static NeonGenTwo64OpFn * const accfn[] = { \ - gen_helper_neon_##ACC##l_u16, \ - gen_helper_neon_##ACC##l_u32, \ + tcg_gen_vec_##ACC##16_i64, \ + tcg_gen_vec_##ACC##32_i64, \ tcg_gen_##ACC##_i64, \ NULL, \ }; \ @@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a) }; \ static NeonGenTwo64OpFn * const accfn[] = { \ NULL, \ - gen_helper_neon_##ACC##l_u32, \ + tcg_gen_vec_##ACC##32_i64, \ tcg_gen_##ACC##_i64, \ NULL, \ }; \ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> In a couple of places, clearing the entire vector before storing one element is the easiest solution. Wrap that into a helper function. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-49-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static TCGv_i32 read_fp_hreg(DisasContext *s, int reg) return v; } -/* Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64). +static void clear_vec(DisasContext *s, int rd) +{ + unsigned ofs = fp_reg_offset(s, rd, MO_64); + unsigned vsz = vec_full_reg_size(s); + + tcg_gen_gvec_dup_imm(MO_64, ofs, vsz, vsz, 0); +} + +/* + * Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64). * If SVE is not enabled, then there are only 128 bits in the vector. */ static void clear_vec_high(DisasContext *s, bool is_q, int rd) @@ -XXX,XX +XXX,XX @@ static bool trans_SM3SS1(DisasContext *s, arg_SM3SS1 *a) TCGv_i32 tcg_op2 = tcg_temp_new_i32(); TCGv_i32 tcg_op3 = tcg_temp_new_i32(); TCGv_i32 tcg_res = tcg_temp_new_i32(); - unsigned vsz, dofs; read_vec_element_i32(s, tcg_op1, a->rn, 3, MO_32); read_vec_element_i32(s, tcg_op2, a->rm, 3, MO_32); @@ -XXX,XX +XXX,XX @@ static bool trans_SM3SS1(DisasContext *s, arg_SM3SS1 *a) tcg_gen_rotri_i32(tcg_res, tcg_res, 25); /* Clear the whole register first, then store bits [127:96]. */ - vsz = vec_full_reg_size(s); - dofs = vec_full_reg_offset(s, a->rd); - tcg_gen_gvec_dup_imm(MO_64, dofs, vsz, vsz, 0); + clear_vec(s, a->rd); write_vec_element_i32(s, tcg_res, a->rd, 3, MO_32); } return true; @@ -XXX,XX +XXX,XX @@ static bool do_scalar_muladd_widening_idx(DisasContext *s, arg_rrx_e *a, TCGv_i64 t0 = tcg_temp_new_i64(); TCGv_i64 t1 = tcg_temp_new_i64(); TCGv_i64 t2 = tcg_temp_new_i64(); - unsigned vsz, dofs; if (acc) { read_vec_element(s, t0, a->rd, 0, a->esz + 1); @@ -XXX,XX +XXX,XX @@ static bool do_scalar_muladd_widening_idx(DisasContext *s, arg_rrx_e *a, fn(t0, t1, t2); /* Clear the whole register first, then store scalar. */ - vsz = vec_full_reg_size(s); - dofs = vec_full_reg_offset(s, a->rd); - tcg_gen_gvec_dup_imm(MO_64, dofs, vsz, vsz, 0); + clear_vec(s, a->rd); write_vec_element(s, t0, a->rd, 0, a->esz + 1); } return true; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-50-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 9 ++ target/arm/tcg/translate-a64.c | 153 ++++++++++++++++++++------------- 2 files changed, 102 insertions(+), 60 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ CMEQ0_s 0101 1110 111 00000 10011 0 ..... ..... @rr CMLE0_s 0111 1110 111 00000 10011 0 ..... ..... @rr CMLT0_s 0101 1110 111 00000 10101 0 ..... ..... @rr +SQXTUN_s 0111 1110 ..1 00001 00101 0 ..... ..... @rr_e +SQXTN_s 0101 1110 ..1 00001 01001 0 ..... ..... @rr_e +UQXTN_s 0111 1110 ..1 00001 01001 0 ..... ..... @rr_e + # Advanced SIMD two-register miscellaneous SQABS_v 0.00 1110 ..1 00000 01111 0 ..... ..... @qrr_e @@ -XXX,XX +XXX,XX @@ SADDLP_v 0.00 1110 ..1 00000 00101 0 ..... ..... @qrr_e UADDLP_v 0.10 1110 ..1 00000 00101 0 ..... ..... @qrr_e SADALP_v 0.00 1110 ..1 00000 01101 0 ..... ..... @qrr_e UADALP_v 0.10 1110 ..1 00000 01101 0 ..... ..... @qrr_e + +XTN 0.00 1110 ..1 00001 00101 0 ..... ..... @qrr_e +SQXTUN_v 0.10 1110 ..1 00001 00101 0 ..... ..... @qrr_e +SQXTN_v 0.00 1110 ..1 00001 01001 0 ..... ..... @qrr_e +UQXTN_v 0.10 1110 ..1 00001 01001 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(CMLE0_s, do_cmop0_d, a, TCG_COND_LE) TRANS(CMLT0_s, do_cmop0_d, a, TCG_COND_LT) TRANS(CMEQ0_s, do_cmop0_d, a, TCG_COND_EQ) +static bool do_2misc_narrow_scalar(DisasContext *s, arg_rr_e *a, + ArithOneOp * const fn[3]) +{ + if (a->esz == MO_64) { + return false; + } + if (fp_access_check(s)) { + TCGv_i64 t = tcg_temp_new_i64(); + + read_vec_element(s, t, a->rn, 0, a->esz + 1); + fn[a->esz](t, t); + clear_vec(s, a->rd); + write_vec_element(s, t, a->rd, 0, a->esz); + } + return true; +} + +#define WRAP_ENV(NAME) \ + static void gen_##NAME(TCGv_i64 d, TCGv_i64 n) \ + { gen_helper_##NAME(d, tcg_env, n); } + +WRAP_ENV(neon_unarrow_sat8) +WRAP_ENV(neon_unarrow_sat16) +WRAP_ENV(neon_unarrow_sat32) + +static ArithOneOp * const f_scalar_sqxtun[] = { + gen_neon_unarrow_sat8, + gen_neon_unarrow_sat16, + gen_neon_unarrow_sat32, +}; +TRANS(SQXTUN_s, do_2misc_narrow_scalar, a, f_scalar_sqxtun) + +WRAP_ENV(neon_narrow_sat_s8) +WRAP_ENV(neon_narrow_sat_s16) +WRAP_ENV(neon_narrow_sat_s32) + +static ArithOneOp * const f_scalar_sqxtn[] = { + gen_neon_narrow_sat_s8, + gen_neon_narrow_sat_s16, + gen_neon_narrow_sat_s32, +}; +TRANS(SQXTN_s, do_2misc_narrow_scalar, a, f_scalar_sqxtn) + +WRAP_ENV(neon_narrow_sat_u8) +WRAP_ENV(neon_narrow_sat_u16) +WRAP_ENV(neon_narrow_sat_u32) + +static ArithOneOp * const f_scalar_uqxtn[] = { + gen_neon_narrow_sat_u8, + gen_neon_narrow_sat_u16, + gen_neon_narrow_sat_u32, +}; +TRANS(UQXTN_s, do_2misc_narrow_scalar, a, f_scalar_uqxtn) + +#undef WRAP_ENV + static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) { if (!a->q && a->esz == MO_64) { @@ -XXX,XX +XXX,XX @@ TRANS(UADDLP_v, do_gvec_fn2_bhs, a, gen_gvec_uaddlp) TRANS(SADALP_v, do_gvec_fn2_bhs, a, gen_gvec_sadalp) TRANS(UADALP_v, do_gvec_fn2_bhs, a, gen_gvec_uadalp) +static bool do_2misc_narrow_vector(DisasContext *s, arg_qrr_e *a, + ArithOneOp * const fn[3]) +{ + if (a->esz == MO_64) { + return false; + } + if (fp_access_check(s)) { + TCGv_i64 t0 = tcg_temp_new_i64(); + TCGv_i64 t1 = tcg_temp_new_i64(); + + read_vec_element(s, t0, a->rn, 0, MO_64); + read_vec_element(s, t1, a->rn, 1, MO_64); + fn[a->esz](t0, t0); + fn[a->esz](t1, t1); + write_vec_element(s, t0, a->rd, a->q ? 2 : 0, MO_32); + write_vec_element(s, t1, a->rd, a->q ? 3 : 1, MO_32); + clear_vec_high(s, a->q, a->rd); + } + return true; +} + +static ArithOneOp * const f_scalar_xtn[] = { + gen_helper_neon_narrow_u8, + gen_helper_neon_narrow_u16, + tcg_gen_ext32u_i64, +}; +TRANS(XTN, do_2misc_narrow_vector, a, f_scalar_xtn) +TRANS(SQXTUN_v, do_2misc_narrow_vector, a, f_scalar_sqxtun) +TRANS(SQXTN_v, do_2misc_narrow_vector, a, f_scalar_sqxtn) +TRANS(UQXTN_v, do_2misc_narrow_vector, a, f_scalar_uqxtn) + /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, int elements, int is_signed, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, tcg_res[pass] = tcg_temp_new_i64(); switch (opcode) { - case 0x12: /* XTN, SQXTUN */ - { - static NeonGenOne64OpFn * const xtnfns[3] = { - gen_helper_neon_narrow_u8, - gen_helper_neon_narrow_u16, - tcg_gen_ext32u_i64, - }; - static NeonGenOne64OpEnvFn * const sqxtunfns[3] = { - gen_helper_neon_unarrow_sat8, - gen_helper_neon_unarrow_sat16, - gen_helper_neon_unarrow_sat32, - }; - if (u) { - genenvfn = sqxtunfns[size]; - } else { - genfn = xtnfns[size]; - } - break; - } - case 0x14: /* SQXTN, UQXTN */ - { - static NeonGenOne64OpEnvFn * const fns[3][2] = { - { gen_helper_neon_narrow_sat_s8, - gen_helper_neon_narrow_sat_u8 }, - { gen_helper_neon_narrow_sat_s16, - gen_helper_neon_narrow_sat_u16 }, - { gen_helper_neon_narrow_sat_s32, - gen_helper_neon_narrow_sat_u32 }, - }; - genenvfn = fns[size][u]; - break; - } case 0x16: /* FCVTN, FCVTN2 */ /* 32 bit to 16 bit or 64 bit to 32 bit float conversion */ if (size == 2) { @@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, } break; default: + case 0x12: /* XTN, SQXTUN */ + case 0x14: /* SQXTN, UQXTN */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) TCGv_ptr tcg_fpstatus; switch (opcode) { - case 0x12: /* SQXTUN */ - if (!u) { - unallocated_encoding(s); - return; - } - /* fall through */ - case 0x14: /* SQXTN, UQXTN */ - if (size == 3) { - unallocated_encoding(s); - return; - } - if (!fp_access_check(s)) { - return; - } - handle_2misc_narrow(s, true, opcode, u, false, size, rn, rd); - return; case 0xc ... 0xf: case 0x16 ... 0x1d: case 0x1f: @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) case 0x9: /* CMEQ, CMLE */ case 0xa: /* CMLT */ case 0xb: /* ABS, NEG */ + case 0x12: /* SQXTUN */ + case 0x14: /* SQXTN, UQXTN */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) TCGv_ptr tcg_fpstatus; switch (opcode) { - case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */ - case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */ - if (size == 3) { - unallocated_encoding(s); - return; - } - if (!fp_access_check(s)) { - return; - } - - handle_2misc_narrow(s, false, opcode, u, is_q, size, rn, rd); - return; case 0x13: /* SHLL, SHLL2 */ if (u == 0 || size == 3) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x9: /* CMEQ, CMLE */ case 0xa: /* CMLT */ case 0xb: /* ABS, NEG */ + case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */ + case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */ unallocated_encoding(s); return; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-51-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 5 ++ target/arm/tcg/translate-a64.c | 89 ++++++++++++++++++---------------- 2 files changed, 52 insertions(+), 42 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ %rd 0:5 %esz_sd 22:1 !function=plus_2 +%esz_hs 22:1 !function=plus_1 %esz_hsd 22:2 !function=xor_2 %hl 11:1 21:1 %hlm 11:1 20:2 @@ -XXX,XX +XXX,XX @@ @qrr_b . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=0 @qrr_h . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=1 @qrr_bh . q:1 ...... . esz:1 ...... ...... rn:5 rd:5 &qrr_e +@qrr_hs . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=%esz_hs @qrr_e . q:1 ...... esz:2 ...... ...... rn:5 rd:5 &qrr_e @qrrr_b . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=0 @@ -XXX,XX +XXX,XX @@ XTN 0.00 1110 ..1 00001 00101 0 ..... ..... @qrr_e SQXTUN_v 0.10 1110 ..1 00001 00101 0 ..... ..... @qrr_e SQXTN_v 0.00 1110 ..1 00001 01001 0 ..... ..... @qrr_e UQXTN_v 0.10 1110 ..1 00001 01001 0 ..... ..... @qrr_e + +FCVTN_v 0.00 1110 0.1 00001 01101 0 ..... ..... @qrr_hs +BFCVTN_v 0.00 1110 101 00001 01101 0 ..... ..... @qrr_h diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(SQXTUN_v, do_2misc_narrow_vector, a, f_scalar_sqxtun) TRANS(SQXTN_v, do_2misc_narrow_vector, a, f_scalar_sqxtn) TRANS(UQXTN_v, do_2misc_narrow_vector, a, f_scalar_uqxtn) +static void gen_fcvtn_hs(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i32 tcg_lo = tcg_temp_new_i32(); + TCGv_i32 tcg_hi = tcg_temp_new_i32(); + TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); + TCGv_i32 ahp = get_ahp_flag(); + + tcg_gen_extr_i64_i32(tcg_lo, tcg_hi, n); + gen_helper_vfp_fcvt_f32_to_f16(tcg_lo, tcg_lo, fpst, ahp); + gen_helper_vfp_fcvt_f32_to_f16(tcg_hi, tcg_hi, fpst, ahp); + tcg_gen_deposit_i32(tcg_lo, tcg_lo, tcg_hi, 16, 16); + tcg_gen_extu_i32_i64(d, tcg_lo); +} + +static void gen_fcvtn_sd(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_i32 tmp = tcg_temp_new_i32(); + gen_helper_vfp_fcvtsd(tmp, n, tcg_env); + tcg_gen_extu_i32_i64(d, tmp); +} + +static ArithOneOp * const f_vector_fcvtn[] = { + NULL, + gen_fcvtn_hs, + gen_fcvtn_sd, +}; +TRANS(FCVTN_v, do_2misc_narrow_vector, a, f_vector_fcvtn) + +static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n) +{ + TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); + TCGv_i32 tmp = tcg_temp_new_i32(); + gen_helper_bfcvt_pair(tmp, n, fpst); + tcg_gen_extu_i32_i64(d, tmp); +} + +static ArithOneOp * const f_vector_bfcvtn[] = { + NULL, + gen_bfcvtn_hs, + NULL, +}; +TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a, f_vector_bfcvtn) + /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, int elements, int is_signed, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, tcg_res[pass] = tcg_temp_new_i64(); switch (opcode) { - case 0x16: /* FCVTN, FCVTN2 */ - /* 32 bit to 16 bit or 64 bit to 32 bit float conversion */ - if (size == 2) { - TCGv_i32 tmp = tcg_temp_new_i32(); - gen_helper_vfp_fcvtsd(tmp, tcg_op, tcg_env); - tcg_gen_extu_i32_i64(tcg_res[pass], tmp); - } else { - TCGv_i32 tcg_lo = tcg_temp_new_i32(); - TCGv_i32 tcg_hi = tcg_temp_new_i32(); - TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); - TCGv_i32 ahp = get_ahp_flag(); - - tcg_gen_extr_i64_i32(tcg_lo, tcg_hi, tcg_op); - gen_helper_vfp_fcvt_f32_to_f16(tcg_lo, tcg_lo, fpst, ahp); - gen_helper_vfp_fcvt_f32_to_f16(tcg_hi, tcg_hi, fpst, ahp); - tcg_gen_deposit_i32(tcg_lo, tcg_lo, tcg_hi, 16, 16); - tcg_gen_extu_i32_i64(tcg_res[pass], tcg_lo); - } - break; - case 0x36: /* BFCVTN, BFCVTN2 */ - { - TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); - TCGv_i32 tmp = tcg_temp_new_i32(); - gen_helper_bfcvt_pair(tmp, tcg_op, fpst); - tcg_gen_extu_i32_i64(tcg_res[pass], tmp); - } - break; case 0x56: /* FCVTXN, FCVTXN2 */ { /* @@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, default: case 0x12: /* XTN, SQXTUN */ case 0x14: /* SQXTN, UQXTN */ + case 0x16: /* FCVTN, FCVTN2 */ + case 0x36: /* BFCVTN, BFCVTN2 */ g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) unallocated_encoding(s); return; } - /* fall through */ - case 0x16: /* FCVTN, FCVTN2 */ - /* handle_2misc_narrow does a 2*size -> size operation, but these - * instructions encode the source size rather than dest size. - */ - if (!fp_access_check(s)) { - return; - } - handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd); - return; - case 0x36: /* BFCVTN, BFCVTN2 */ - if (!dc_isar_feature(aa64_bf16, s) || size != 2) { - unallocated_encoding(s); - return; - } if (!fp_access_check(s)) { return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } break; default: + case 0x16: /* FCVTN, FCVTN2 */ + case 0x36: /* BFCVTN, BFCVTN2 */ unallocated_encoding(s); return; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove handle_2misc_narrow as this was the last insn decoded by that function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-52-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 4 ++ target/arm/tcg/translate-a64.c | 101 +++++++-------------------------- 2 files changed, 24 insertions(+), 81 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ @qrr_b . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=0 @qrr_h . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=1 +@qrr_s . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=2 @qrr_bh . q:1 ...... . esz:1 ...... ...... rn:5 rd:5 &qrr_e @qrr_hs . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=%esz_hs @qrr_e . q:1 ...... esz:2 ...... ...... rn:5 rd:5 &qrr_e @@ -XXX,XX +XXX,XX @@ SQXTUN_s 0111 1110 ..1 00001 00101 0 ..... ..... @rr_e SQXTN_s 0101 1110 ..1 00001 01001 0 ..... ..... @rr_e UQXTN_s 0111 1110 ..1 00001 01001 0 ..... ..... @rr_e +FCVTXN_s 0111 1110 011 00001 01101 0 ..... ..... @rr_s + # Advanced SIMD two-register miscellaneous SQABS_v 0.00 1110 ..1 00000 01111 0 ..... ..... @qrr_e @@ -XXX,XX +XXX,XX @@ SQXTN_v 0.00 1110 ..1 00001 01001 0 ..... ..... @qrr_e UQXTN_v 0.10 1110 ..1 00001 01001 0 ..... ..... @qrr_e FCVTN_v 0.00 1110 0.1 00001 01101 0 ..... ..... @qrr_hs +FCVTXN_v 0.10 1110 011 00001 01101 0 ..... ..... @qrr_s BFCVTN_v 0.00 1110 101 00001 01101 0 ..... ..... @qrr_h diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static ArithOneOp * const f_scalar_uqxtn[] = { }; TRANS(UQXTN_s, do_2misc_narrow_scalar, a, f_scalar_uqxtn) +static void gen_fcvtxn_sd(TCGv_i64 d, TCGv_i64 n) +{ + /* + * 64 bit to 32 bit float conversion + * with von Neumann rounding (round to odd) + */ + TCGv_i32 tmp = tcg_temp_new_i32(); + gen_helper_fcvtx_f64_to_f32(tmp, n, tcg_env); + tcg_gen_extu_i32_i64(d, tmp); +} + +static ArithOneOp * const f_scalar_fcvtxn[] = { + NULL, + NULL, + gen_fcvtxn_sd, +}; +TRANS(FCVTXN_s, do_2misc_narrow_scalar, a, f_scalar_fcvtxn) + #undef WRAP_ENV static bool do_gvec_fn2(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) @@ -XXX,XX +XXX,XX @@ static ArithOneOp * const f_vector_fcvtn[] = { gen_fcvtn_sd, }; TRANS(FCVTN_v, do_2misc_narrow_vector, a, f_vector_fcvtn) +TRANS(FCVTXN_v, do_2misc_narrow_vector, a, f_scalar_fcvtxn) static void gen_bfcvtn_hs(TCGv_i64 d, TCGv_i64 n) { @@ -XXX,XX +XXX,XX @@ static void handle_2misc_reciprocal(DisasContext *s, int opcode, } } -static void handle_2misc_narrow(DisasContext *s, bool scalar, - int opcode, bool u, bool is_q, - int size, int rn, int rd) -{ - /* Handle 2-reg-misc ops which are narrowing (so each 2*size element - * in the source becomes a size element in the destination). - */ - int pass; - TCGv_i64 tcg_res[2]; - int destelt = is_q ? 2 : 0; - int passes = scalar ? 1 : 2; - - if (scalar) { - tcg_res[1] = tcg_constant_i64(0); - } - - for (pass = 0; pass < passes; pass++) { - TCGv_i64 tcg_op = tcg_temp_new_i64(); - NeonGenOne64OpFn *genfn = NULL; - NeonGenOne64OpEnvFn *genenvfn = NULL; - - if (scalar) { - read_vec_element(s, tcg_op, rn, pass, size + 1); - } else { - read_vec_element(s, tcg_op, rn, pass, MO_64); - } - tcg_res[pass] = tcg_temp_new_i64(); - - switch (opcode) { - case 0x56: /* FCVTXN, FCVTXN2 */ - { - /* - * 64 bit to 32 bit float conversion - * with von Neumann rounding (round to odd) - */ - TCGv_i32 tmp = tcg_temp_new_i32(); - assert(size == 2); - gen_helper_fcvtx_f64_to_f32(tmp, tcg_op, tcg_env); - tcg_gen_extu_i32_i64(tcg_res[pass], tmp); - } - break; - default: - case 0x12: /* XTN, SQXTUN */ - case 0x14: /* SQXTN, UQXTN */ - case 0x16: /* FCVTN, FCVTN2 */ - case 0x36: /* BFCVTN, BFCVTN2 */ - g_assert_not_reached(); - } - - if (genfn) { - genfn(tcg_res[pass], tcg_op); - } else if (genenvfn) { - genenvfn(tcg_res[pass], tcg_env, tcg_op); - } - } - - for (pass = 0; pass < 2; pass++) { - write_vec_element(s, tcg_res[pass], rd, destelt + pass, MO_32); - } - clear_vec_high(s, is_q, rd); -} - /* AdvSIMD scalar two reg misc * 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0 * +-----+---+-----------+------+-----------+--------+-----+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) rmode = FPROUNDING_TIEAWAY; break; case 0x56: /* FCVTXN, FCVTXN2 */ - if (size == 2) { - unallocated_encoding(s); - return; - } - if (!fp_access_check(s)) { - return; - } - handle_2misc_narrow(s, true, opcode, u, false, size - 1, rn, rd); - return; default: unallocated_encoding(s); return; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } handle_2misc_reciprocal(s, opcode, false, u, is_q, size, rn, rd); return; - case 0x56: /* FCVTXN, FCVTXN2 */ - if (size == 2) { - unallocated_encoding(s); - return; - } - if (!fp_access_check(s)) { - return; - } - handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd); - return; case 0x17: /* FCVTL, FCVTL2 */ if (!fp_access_check(s)) { return; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) default: case 0x16: /* FCVTN, FCVTN2 */ case 0x36: /* BFCVTN, BFCVTN2 */ + case 0x56: /* FCVTXN, FCVTXN2 */ unallocated_encoding(s); return; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-53-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 2 + target/arm/tcg/translate-a64.c | 75 +++++++++++++++++----------------- 2 files changed, 40 insertions(+), 37 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ UQXTN_v 0.10 1110 ..1 00001 01001 0 ..... ..... @qrr_e FCVTN_v 0.00 1110 0.1 00001 01101 0 ..... ..... @qrr_hs FCVTXN_v 0.10 1110 011 00001 01101 0 ..... ..... @qrr_s BFCVTN_v 0.00 1110 101 00001 01101 0 ..... ..... @qrr_h + +SHLL_v 0.10 1110 ..1 00001 00111 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static ArithOneOp * const f_vector_bfcvtn[] = { }; TRANS_FEAT(BFCVTN_v, aa64_bf16, do_2misc_narrow_vector, a, f_vector_bfcvtn) +static bool trans_SHLL_v(DisasContext *s, arg_qrr_e *a) +{ + static NeonGenWidenFn * const widenfns[3] = { + gen_helper_neon_widen_u8, + gen_helper_neon_widen_u16, + tcg_gen_extu_i32_i64, + }; + NeonGenWidenFn *widenfn; + TCGv_i64 tcg_res[2]; + TCGv_i32 tcg_op; + int part, pass; + + if (a->esz == MO_64) { + return false; + } + if (!fp_access_check(s)) { + return true; + } + + tcg_op = tcg_temp_new_i32(); + widenfn = widenfns[a->esz]; + part = a->q ? 2 : 0; + + for (pass = 0; pass < 2; pass++) { + read_vec_element_i32(s, tcg_op, a->rn, part + pass, MO_32); + tcg_res[pass] = tcg_temp_new_i64(); + widenfn(tcg_res[pass], tcg_op); + tcg_gen_shli_i64(tcg_res[pass], tcg_res[pass], 8 << a->esz); + } + + for (pass = 0; pass < 2; pass++) { + write_vec_element(s, tcg_res[pass], a->rd, pass, MO_64); + } + return true; +} + + /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, int elements, int is_signed, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q, } } -static void handle_shll(DisasContext *s, bool is_q, int size, int rn, int rd) -{ - /* Implement SHLL and SHLL2 */ - int pass; - int part = is_q ? 2 : 0; - TCGv_i64 tcg_res[2]; - - for (pass = 0; pass < 2; pass++) { - static NeonGenWidenFn * const widenfns[3] = { - gen_helper_neon_widen_u8, - gen_helper_neon_widen_u16, - tcg_gen_extu_i32_i64, - }; - NeonGenWidenFn *widenfn = widenfns[size]; - TCGv_i32 tcg_op = tcg_temp_new_i32(); - - read_vec_element_i32(s, tcg_op, rn, part + pass, MO_32); - tcg_res[pass] = tcg_temp_new_i64(); - widenfn(tcg_res[pass], tcg_op); - tcg_gen_shli_i64(tcg_res[pass], tcg_res[pass], 8 << size); - } - - for (pass = 0; pass < 2; pass++) { - write_vec_element(s, tcg_res[pass], rd, pass, MO_64); - } -} - /* AdvSIMD two reg misc * 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0 * +---+---+---+-----------+------+-----------+--------+-----+------+------+ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) TCGv_ptr tcg_fpstatus; switch (opcode) { - case 0x13: /* SHLL, SHLL2 */ - if (u == 0 || size == 3) { - unallocated_encoding(s); - return; - } - if (!fp_access_check(s)) { - return; - } - handle_shll(s, is_q, size, rn, rd); - return; case 0xc ... 0xf: case 0x16 ... 0x1f: { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0xa: /* CMLT */ case 0xb: /* ABS, NEG */ case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */ + case 0x13: /* SHLL, SHLL2 */ case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */ unallocated_encoding(s); return; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Move the current implementation out of translate-neon.c, and extend to handle all element sizes. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-54-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 6 ++++++ target/arm/tcg/gengvec.c | 14 ++++++++++++++ target/arm/tcg/translate-neon.c | 20 ++------------------ 3 files changed, 22 insertions(+), 18 deletions(-) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_uaddlp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, void gen_gvec_uadalp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t opr_sz, uint32_t max_sz); +/* These exclusively manipulate the sign bit. */ +void gen_gvec_fabs(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz); +void gen_gvec_fneg(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz); + /* * Forward to the isar_feature_* tests given a DisasContext pointer. */ diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/gengvec.c +++ b/target/arm/tcg/gengvec.c @@ -XXX,XX +XXX,XX @@ void gen_gvec_uadalp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, assert(vece <= MO_32); tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]); } + +void gen_gvec_fabs(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + uint64_t s_bit = 1ull << ((8 << vece) - 1); + tcg_gen_gvec_andi(vece, dofs, aofs, s_bit - 1, oprsz, maxsz); +} + +void gen_gvec_fneg(unsigned vece, uint32_t dofs, uint32_t aofs, + uint32_t oprsz, uint32_t maxsz) +{ + uint64_t s_bit = 1ull << ((8 << vece) - 1); + tcg_gen_gvec_xori(vece, dofs, aofs, s_bit, oprsz, maxsz); +} diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn) return true; } -static void gen_VABS_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, - uint32_t oprsz, uint32_t maxsz) -{ - tcg_gen_gvec_andi(vece, rd_ofs, rm_ofs, - vece == MO_16 ? 0x7fff : 0x7fffffff, - oprsz, maxsz); -} - static bool trans_VABS_F(DisasContext *s, arg_2misc *a) { if (a->size == MO_16) { @@ -XXX,XX +XXX,XX @@ static bool trans_VABS_F(DisasContext *s, arg_2misc *a) } else if (a->size != MO_32) { return false; } - return do_2misc_vec(s, a, gen_VABS_F); -} - -static void gen_VNEG_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, - uint32_t oprsz, uint32_t maxsz) -{ - tcg_gen_gvec_xori(vece, rd_ofs, rm_ofs, - vece == MO_16 ? 0x8000 : 0x80000000, - oprsz, maxsz); + return do_2misc_vec(s, a, gen_gvec_fabs); } static bool trans_VNEG_F(DisasContext *s, arg_2misc *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VNEG_F(DisasContext *s, arg_2misc *a) } else if (a->size != MO_32) { return false; } - return do_2misc_vec(s, a, gen_VNEG_F); + return do_2misc_vec(s, a, gen_gvec_fneg); } static bool trans_VRECPE(DisasContext *s, arg_2misc *a) -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-55-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 7 +++++ target/arm/tcg/translate-a64.c | 54 +++++++++++++++------------------- 2 files changed, 31 insertions(+), 30 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ @qrr_s . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=2 @qrr_bh . q:1 ...... . esz:1 ...... ...... rn:5 rd:5 &qrr_e @qrr_hs . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=%esz_hs +@qrr_sd . q:1 ...... .. ...... ...... rn:5 rd:5 &qrr_e esz=%esz_sd @qrr_e . q:1 ...... esz:2 ...... ...... rn:5 rd:5 &qrr_e @qrrr_b . q:1 ...... ... rm:5 ...... rn:5 rd:5 &qrrr_e esz=0 @@ -XXX,XX +XXX,XX @@ FCVTXN_v 0.10 1110 011 00001 01101 0 ..... ..... @qrr_s BFCVTN_v 0.00 1110 101 00001 01101 0 ..... ..... @qrr_h SHLL_v 0.10 1110 ..1 00001 00111 0 ..... ..... @qrr_e + +FABS_v 0.00 1110 111 11000 11111 0 ..... ..... @qrr_h +FABS_v 0.00 1110 1.1 00000 11111 0 ..... ..... @qrr_sd + +FNEG_v 0.10 1110 111 11000 11111 0 ..... ..... @qrr_h +FNEG_v 0.10 1110 1.1 00000 11111 0 ..... ..... @qrr_sd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool trans_SHLL_v(DisasContext *s, arg_qrr_e *a) return true; } +static bool do_fabs_fneg_v(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) +{ + int check = fp_access_check_vector_hsd(s, a->q, a->esz); + + if (check <= 0) { + return check == 0; + } + + gen_gvec_fn2(s, a->q, a->rd, a->rn, fn, a->esz); + return true; +} + +TRANS(FABS_v, do_fabs_fneg_v, a, gen_gvec_fabs) +TRANS(FNEG_v, do_fabs_fneg_v, a, gen_gvec_fneg) /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, * requires them. */ switch (opcode) { - case 0x2f: /* FABS */ - gen_vfp_absd(tcg_rd, tcg_rn); - break; - case 0x6f: /* FNEG */ - gen_vfp_negd(tcg_rd, tcg_rn); - break; case 0x7f: /* FSQRT */ gen_helper_vfp_sqrtd(tcg_rd, tcg_rn, tcg_fpstatus); break; @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, case 0x9: /* CMEQ, CMLE */ case 0xa: /* CMLT */ case 0xb: /* ABS, NEG */ + case 0x2f: /* FABS */ + case 0x6f: /* FNEG */ g_assert_not_reached(); } } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) opcode |= (extract32(size, 1, 1) << 5) | (u << 6); size = is_double ? 3 : 2; switch (opcode) { - case 0x2f: /* FABS */ - case 0x6f: /* FNEG */ - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } - break; case 0x1d: /* SCVTF */ case 0x5d: /* UCVTF */ { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x16: /* FCVTN, FCVTN2 */ case 0x36: /* BFCVTN, BFCVTN2 */ case 0x56: /* FCVTXN, FCVTXN2 */ + case 0x2f: /* FABS */ + case 0x6f: /* FNEG */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) { /* Special cases for 32 bit elements */ switch (opcode) { - case 0x2f: /* FABS */ - gen_vfp_abss(tcg_res, tcg_op); - break; - case 0x6f: /* FNEG */ - gen_vfp_negs(tcg_res, tcg_op); - break; case 0x7f: /* FSQRT */ gen_helper_vfp_sqrts(tcg_res, tcg_op, tcg_fpstatus); break; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) break; default: case 0x7: /* SQABS, SQNEG */ + case 0x2f: /* FABS */ + case 0x6f: /* FNEG */ g_assert_not_reached(); } } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x7b: /* FCVTZU */ rmode = FPROUNDING_ZERO; break; - case 0x2f: /* FABS */ - case 0x6f: /* FNEG */ - only_in_vector = true; - need_fpst = false; - break; case 0x7d: /* FRSQRTE */ break; case 0x7f: /* FSQRT (vector) */ only_in_vector = true; break; default: + case 0x2f: /* FABS */ + case 0x6f: /* FNEG */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x59: /* FRINTX */ gen_helper_advsimd_rinth_exact(tcg_res, tcg_op, tcg_fpstatus); break; - case 0x2f: /* FABS */ - tcg_gen_andi_i32(tcg_res, tcg_op, 0x7fff); - break; - case 0x6f: /* FNEG */ - tcg_gen_xori_i32(tcg_res, tcg_op, 0x8000); - break; case 0x7d: /* FRSQRTE */ gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); break; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) gen_helper_vfp_sqrth(tcg_res, tcg_op, tcg_fpstatus); break; default: + case 0x2f: /* FABS */ + case 0x6f: /* FNEG */ g_assert_not_reached(); } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-56-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 3 ++ target/arm/tcg/translate-a64.c | 69 ++++++++++++++++++++++++---------- 2 files changed, 53 insertions(+), 19 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FABS_v 0.00 1110 1.1 00000 11111 0 ..... ..... @qrr_sd FNEG_v 0.10 1110 111 11000 11111 0 ..... ..... @qrr_h FNEG_v 0.10 1110 1.1 00000 11111 0 ..... ..... @qrr_sd + +FSQRT_v 0.10 1110 111 11001 11111 0 ..... ..... @qrr_h +FSQRT_v 0.10 1110 1.1 00001 11111 0 ..... ..... @qrr_sd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_fabs_fneg_v(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) TRANS(FABS_v, do_fabs_fneg_v, a, gen_gvec_fabs) TRANS(FNEG_v, do_fabs_fneg_v, a, gen_gvec_fneg) +static bool do_fp1_vector(DisasContext *s, arg_qrr_e *a, + const FPScalar1 *f, int rmode) +{ + TCGv_i32 tcg_rmode = NULL; + TCGv_ptr fpst; + int check = fp_access_check_vector_hsd(s, a->q, a->esz); + + if (check <= 0) { + return check == 0; + } + + fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); + if (rmode >= 0) { + tcg_rmode = gen_set_rmode(rmode, fpst); + } + + if (a->esz == MO_64) { + TCGv_i64 t64 = tcg_temp_new_i64(); + + for (int pass = 0; pass < 2; ++pass) { + read_vec_element(s, t64, a->rn, pass, MO_64); + f->gen_d(t64, t64, fpst); + write_vec_element(s, t64, a->rd, pass, MO_64); + } + } else { + TCGv_i32 t32 = tcg_temp_new_i32(); + void (*gen)(TCGv_i32, TCGv_i32, TCGv_ptr) + = (a->esz == MO_16 ? f->gen_h : f->gen_s); + + for (int pass = 0, n = (a->q ? 16 : 8) >> a->esz; pass < n; ++pass) { + read_vec_element_i32(s, t32, a->rn, pass, a->esz); + gen(t32, t32, fpst); + write_vec_element_i32(s, t32, a->rd, pass, a->esz); + } + } + clear_vec_high(s, a->q, a->rd); + + if (rmode >= 0) { + gen_restore_rmode(tcg_rmode, fpst); + } + return true; +} + +TRANS(FSQRT_v, do_fp1_vector, a, &f_scalar_fsqrt, -1) + /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, int elements, int is_signed, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, * requires them. */ switch (opcode) { - case 0x7f: /* FSQRT */ - gen_helper_vfp_sqrtd(tcg_rd, tcg_rn, tcg_fpstatus); - break; case 0x1a: /* FCVTNS */ case 0x1b: /* FCVTMS */ case 0x1c: /* FCVTAS */ @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, case 0xb: /* ABS, NEG */ case 0x2f: /* FABS */ case 0x6f: /* FNEG */ + case 0x7f: /* FSQRT */ g_assert_not_reached(); } } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } handle_2misc_fcmp_zero(s, opcode, false, u, is_q, size, rn, rd); return; - case 0x7f: /* FSQRT */ - need_fpstatus = true; - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } - break; case 0x1a: /* FCVTNS */ case 0x1b: /* FCVTMS */ case 0x3a: /* FCVTPS */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x56: /* FCVTXN, FCVTXN2 */ case 0x2f: /* FABS */ case 0x6f: /* FNEG */ + case 0x7f: /* FSQRT */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) { /* Special cases for 32 bit elements */ switch (opcode) { - case 0x7f: /* FSQRT */ - gen_helper_vfp_sqrts(tcg_res, tcg_op, tcg_fpstatus); - break; case 0x1a: /* FCVTNS */ case 0x1b: /* FCVTMS */ case 0x1c: /* FCVTAS */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x7: /* SQABS, SQNEG */ case 0x2f: /* FABS */ case 0x6f: /* FNEG */ + case 0x7f: /* FSQRT */ g_assert_not_reached(); } } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) break; case 0x7d: /* FRSQRTE */ break; - case 0x7f: /* FSQRT (vector) */ - only_in_vector = true; - break; default: case 0x2f: /* FABS */ case 0x6f: /* FNEG */ + case 0x7f: /* FSQRT (vector) */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x7d: /* FRSQRTE */ gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); break; - case 0x7f: /* FSQRT */ - gen_helper_vfp_sqrth(tcg_res, tcg_op, tcg_fpstatus); - break; default: case 0x2f: /* FABS */ case 0x6f: /* FNEG */ + case 0x7f: /* FSQRT */ g_assert_not_reached(); } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-57-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 26 +++++ target/arm/tcg/translate-a64.c | 176 ++++++++++++--------------------- 2 files changed, 88 insertions(+), 114 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FNEG_v 0.10 1110 1.1 00000 11111 0 ..... ..... @qrr_sd FSQRT_v 0.10 1110 111 11001 11111 0 ..... ..... @qrr_h FSQRT_v 0.10 1110 1.1 00001 11111 0 ..... ..... @qrr_sd + +FRINTN_v 0.00 1110 011 11001 10001 0 ..... ..... @qrr_h +FRINTN_v 0.00 1110 0.1 00001 10001 0 ..... ..... @qrr_sd + +FRINTM_v 0.00 1110 011 11001 10011 0 ..... ..... @qrr_h +FRINTM_v 0.00 1110 0.1 00001 10011 0 ..... ..... @qrr_sd + +FRINTP_v 0.00 1110 111 11001 10001 0 ..... ..... @qrr_h +FRINTP_v 0.00 1110 1.1 00001 10001 0 ..... ..... @qrr_sd + +FRINTZ_v 0.00 1110 111 11001 10011 0 ..... ..... @qrr_h +FRINTZ_v 0.00 1110 1.1 00001 10011 0 ..... ..... @qrr_sd + +FRINTA_v 0.10 1110 011 11001 10001 0 ..... ..... @qrr_h +FRINTA_v 0.10 1110 0.1 00001 10001 0 ..... ..... @qrr_sd + +FRINTX_v 0.10 1110 011 11001 10011 0 ..... ..... @qrr_h +FRINTX_v 0.10 1110 0.1 00001 10011 0 ..... ..... @qrr_sd + +FRINTI_v 0.10 1110 111 11001 10011 0 ..... ..... @qrr_h +FRINTI_v 0.10 1110 1.1 00001 10011 0 ..... ..... @qrr_sd + +FRINT32Z_v 0.00 1110 0.1 00001 11101 0 ..... ..... @qrr_sd +FRINT32X_v 0.10 1110 0.1 00001 11101 0 ..... ..... @qrr_sd +FRINT64Z_v 0.00 1110 0.1 00001 11111 0 ..... ..... @qrr_sd +FRINT64X_v 0.10 1110 0.1 00001 11111 0 ..... ..... @qrr_sd diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_fp1_vector(DisasContext *s, arg_qrr_e *a, TRANS(FSQRT_v, do_fp1_vector, a, &f_scalar_fsqrt, -1) +TRANS(FRINTN_v, do_fp1_vector, a, &f_scalar_frint, FPROUNDING_TIEEVEN) +TRANS(FRINTP_v, do_fp1_vector, a, &f_scalar_frint, FPROUNDING_POSINF) +TRANS(FRINTM_v, do_fp1_vector, a, &f_scalar_frint, FPROUNDING_NEGINF) +TRANS(FRINTZ_v, do_fp1_vector, a, &f_scalar_frint, FPROUNDING_ZERO) +TRANS(FRINTA_v, do_fp1_vector, a, &f_scalar_frint, FPROUNDING_TIEAWAY) +TRANS(FRINTI_v, do_fp1_vector, a, &f_scalar_frint, -1) +TRANS(FRINTX_v, do_fp1_vector, a, &f_scalar_frintx, -1) + +TRANS_FEAT(FRINT32Z_v, aa64_frint, do_fp1_vector, a, + &f_scalar_frint32, FPROUNDING_ZERO) +TRANS_FEAT(FRINT32X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint32, -1) +TRANS_FEAT(FRINT64Z_v, aa64_frint, do_fp1_vector, a, + &f_scalar_frint64, FPROUNDING_ZERO) +TRANS_FEAT(FRINT64X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, -1) + /* Common vector code for handling integer to FP conversion */ static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, int elements, int is_signed, @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, case 0x7b: /* FCVTZU */ gen_helper_vfp_touqd(tcg_rd, tcg_rn, tcg_constant_i32(0), tcg_fpstatus); break; - case 0x18: /* FRINTN */ - case 0x19: /* FRINTM */ - case 0x38: /* FRINTP */ - case 0x39: /* FRINTZ */ - case 0x58: /* FRINTA */ - case 0x79: /* FRINTI */ - gen_helper_rintd(tcg_rd, tcg_rn, tcg_fpstatus); - break; - case 0x59: /* FRINTX */ - gen_helper_rintd_exact(tcg_rd, tcg_rn, tcg_fpstatus); - break; - case 0x1e: /* FRINT32Z */ - case 0x5e: /* FRINT32X */ - gen_helper_frint32_d(tcg_rd, tcg_rn, tcg_fpstatus); - break; - case 0x1f: /* FRINT64Z */ - case 0x5f: /* FRINT64X */ - gen_helper_frint64_d(tcg_rd, tcg_rn, tcg_fpstatus); - break; default: case 0x4: /* CLS, CLZ */ case 0x5: /* NOT */ @@ -XXX,XX +XXX,XX @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u, case 0x2f: /* FABS */ case 0x6f: /* FNEG */ case 0x7f: /* FSQRT */ + case 0x18: /* FRINTN */ + case 0x19: /* FRINTM */ + case 0x38: /* FRINTP */ + case 0x39: /* FRINTZ */ + case 0x58: /* FRINTA */ + case 0x79: /* FRINTI */ + case 0x59: /* FRINTX */ + case 0x1e: /* FRINT32Z */ + case 0x5e: /* FRINT32X */ + case 0x1f: /* FRINT64Z */ + case 0x5f: /* FRINT64X */ g_assert_not_reached(); } } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } handle_2misc_widening(s, opcode, is_q, size, rn, rd); return; - case 0x18: /* FRINTN */ - case 0x19: /* FRINTM */ - case 0x38: /* FRINTP */ - case 0x39: /* FRINTZ */ - rmode = extract32(opcode, 5, 1) | (extract32(opcode, 0, 1) << 1); - /* fall through */ - case 0x59: /* FRINTX */ - case 0x79: /* FRINTI */ - need_fpstatus = true; - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } - break; - case 0x58: /* FRINTA */ - rmode = FPROUNDING_TIEAWAY; - need_fpstatus = true; - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } - break; case 0x7c: /* URSQRTE */ if (size == 3) { unallocated_encoding(s); return; } break; - case 0x1e: /* FRINT32Z */ - case 0x1f: /* FRINT64Z */ - rmode = FPROUNDING_ZERO; - /* fall through */ - case 0x5e: /* FRINT32X */ - case 0x5f: /* FRINT64X */ - need_fpstatus = true; - if ((size == 3 && !is_q) || !dc_isar_feature(aa64_frint, s)) { - unallocated_encoding(s); - return; - } - break; default: case 0x16: /* FCVTN, FCVTN2 */ case 0x36: /* BFCVTN, BFCVTN2 */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x2f: /* FABS */ case 0x6f: /* FNEG */ case 0x7f: /* FSQRT */ + case 0x18: /* FRINTN */ + case 0x19: /* FRINTM */ + case 0x38: /* FRINTP */ + case 0x39: /* FRINTZ */ + case 0x59: /* FRINTX */ + case 0x79: /* FRINTI */ + case 0x58: /* FRINTA */ + case 0x1e: /* FRINT32Z */ + case 0x1f: /* FRINT64Z */ + case 0x5e: /* FRINT32X */ + case 0x5f: /* FRINT64X */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) gen_helper_vfp_touls(tcg_res, tcg_op, tcg_constant_i32(0), tcg_fpstatus); break; - case 0x18: /* FRINTN */ - case 0x19: /* FRINTM */ - case 0x38: /* FRINTP */ - case 0x39: /* FRINTZ */ - case 0x58: /* FRINTA */ - case 0x79: /* FRINTI */ - gen_helper_rints(tcg_res, tcg_op, tcg_fpstatus); - break; - case 0x59: /* FRINTX */ - gen_helper_rints_exact(tcg_res, tcg_op, tcg_fpstatus); - break; case 0x7c: /* URSQRTE */ gen_helper_rsqrte_u32(tcg_res, tcg_op); break; - case 0x1e: /* FRINT32Z */ - case 0x5e: /* FRINT32X */ - gen_helper_frint32_s(tcg_res, tcg_op, tcg_fpstatus); - break; - case 0x1f: /* FRINT64Z */ - case 0x5f: /* FRINT64X */ - gen_helper_frint64_s(tcg_res, tcg_op, tcg_fpstatus); - break; default: case 0x7: /* SQABS, SQNEG */ case 0x2f: /* FABS */ case 0x6f: /* FNEG */ case 0x7f: /* FSQRT */ + case 0x18: /* FRINTN */ + case 0x19: /* FRINTM */ + case 0x38: /* FRINTP */ + case 0x39: /* FRINTZ */ + case 0x58: /* FRINTA */ + case 0x79: /* FRINTI */ + case 0x59: /* FRINTX */ + case 0x1e: /* FRINT32Z */ + case 0x5e: /* FRINT32X */ + case 0x1f: /* FRINT64Z */ + case 0x5f: /* FRINT64X */ g_assert_not_reached(); } } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) int rn, rd; bool is_q; bool is_scalar; - bool only_in_vector = false; int pass; TCGv_i32 tcg_rmode = NULL; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x3d: /* FRECPE */ case 0x3f: /* FRECPX */ break; - case 0x18: /* FRINTN */ - only_in_vector = true; - rmode = FPROUNDING_TIEEVEN; - break; - case 0x19: /* FRINTM */ - only_in_vector = true; - rmode = FPROUNDING_NEGINF; - break; - case 0x38: /* FRINTP */ - only_in_vector = true; - rmode = FPROUNDING_POSINF; - break; - case 0x39: /* FRINTZ */ - only_in_vector = true; - rmode = FPROUNDING_ZERO; - break; - case 0x58: /* FRINTA */ - only_in_vector = true; - rmode = FPROUNDING_TIEAWAY; - break; - case 0x59: /* FRINTX */ - case 0x79: /* FRINTI */ - only_in_vector = true; - /* current rounding mode */ - break; case 0x1a: /* FCVTNS */ rmode = FPROUNDING_TIEEVEN; break; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x2f: /* FABS */ case 0x6f: /* FNEG */ case 0x7f: /* FSQRT (vector) */ + case 0x18: /* FRINTN */ + case 0x19: /* FRINTM */ + case 0x38: /* FRINTP */ + case 0x39: /* FRINTZ */ + case 0x58: /* FRINTA */ + case 0x59: /* FRINTX */ + case 0x79: /* FRINTI */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) unallocated_encoding(s); return; } - /* FRINTxx is only in the vector form */ - if (only_in_vector) { - unallocated_encoding(s); - return; - } } if (!fp_access_check(s)) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x7b: /* FCVTZU */ gen_helper_advsimd_f16touinth(tcg_res, tcg_op, tcg_fpstatus); break; - case 0x18: /* FRINTN */ - case 0x19: /* FRINTM */ - case 0x38: /* FRINTP */ - case 0x39: /* FRINTZ */ - case 0x58: /* FRINTA */ - case 0x79: /* FRINTI */ - gen_helper_advsimd_rinth(tcg_res, tcg_op, tcg_fpstatus); - break; - case 0x59: /* FRINTX */ - gen_helper_advsimd_rinth_exact(tcg_res, tcg_op, tcg_fpstatus); - break; case 0x7d: /* FRSQRTE */ gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); break; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x2f: /* FABS */ case 0x6f: /* FNEG */ case 0x7f: /* FSQRT */ + case 0x18: /* FRINTN */ + case 0x19: /* FRINTM */ + case 0x38: /* FRINTP */ + case 0x39: /* FRINTZ */ + case 0x58: /* FRINTA */ + case 0x79: /* FRINTI */ + case 0x59: /* FRINTX */ g_assert_not_reached(); } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Arm silliness with naming, the scalar insns described as part of the vector instructions, as separate from the "regular" scalar insns which output to general registers. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-58-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 30 ++++++++ target/arm/tcg/translate-a64.c | 133 ++++++++++++++------------------- 2 files changed, 86 insertions(+), 77 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ UQXTN_s 0111 1110 ..1 00001 01001 0 ..... ..... @rr_e FCVTXN_s 0111 1110 011 00001 01101 0 ..... ..... @rr_s +@icvt_h . ....... .. ...... ...... rn:5 rd:5 \ + &fcvt sf=0 esz=1 shift=0 +@icvt_sd . ....... .. ...... ...... rn:5 rd:5 \ + &fcvt sf=0 esz=%esz_sd shift=0 + +FCVTNS_f 0101 1110 011 11001 10101 0 ..... ..... @icvt_h +FCVTNS_f 0101 1110 0.1 00001 10101 0 ..... ..... @icvt_sd +FCVTNU_f 0111 1110 011 11001 10101 0 ..... ..... @icvt_h +FCVTNU_f 0111 1110 0.1 00001 10101 0 ..... ..... @icvt_sd + +FCVTPS_f 0101 1110 111 11001 10101 0 ..... ..... @icvt_h +FCVTPS_f 0101 1110 1.1 00001 10101 0 ..... ..... @icvt_sd +FCVTPU_f 0111 1110 111 11001 10101 0 ..... ..... @icvt_h +FCVTPU_f 0111 1110 1.1 00001 10101 0 ..... ..... @icvt_sd + +FCVTMS_f 0101 1110 011 11001 10111 0 ..... ..... @icvt_h +FCVTMS_f 0101 1110 0.1 00001 10111 0 ..... ..... @icvt_sd +FCVTMU_f 0111 1110 011 11001 10111 0 ..... ..... @icvt_h +FCVTMU_f 0111 1110 0.1 00001 10111 0 ..... ..... @icvt_sd + +FCVTZS_f 0101 1110 111 11001 10111 0 ..... ..... @icvt_h +FCVTZS_f 0101 1110 1.1 00001 10111 0 ..... ..... @icvt_sd +FCVTZU_f 0111 1110 111 11001 10111 0 ..... ..... @icvt_h +FCVTZU_f 0111 1110 1.1 00001 10111 0 ..... ..... @icvt_sd + +FCVTAS_f 0101 1110 011 11001 11001 0 ..... ..... @icvt_h +FCVTAS_f 0101 1110 0.1 00001 11001 0 ..... ..... @icvt_sd +FCVTAU_f 0111 1110 011 11001 11001 0 ..... ..... @icvt_h +FCVTAU_f 0111 1110 0.1 00001 11001 0 ..... ..... @icvt_sd + # Advanced SIMD two-register miscellaneous SQABS_v 0.00 1110 ..1 00000 01111 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void do_fcvt_scalar(DisasContext *s, MemOp out, MemOp esz, tcg_shift, tcg_fpstatus); tcg_gen_extu_i32_i64(tcg_out, tcg_single); break; + case MO_16 | MO_SIGN: + gen_helper_vfp_toshh(tcg_single, tcg_single, + tcg_shift, tcg_fpstatus); + tcg_gen_extu_i32_i64(tcg_out, tcg_single); + break; + case MO_16: + gen_helper_vfp_touhh(tcg_single, tcg_single, + tcg_shift, tcg_fpstatus); + tcg_gen_extu_i32_i64(tcg_out, tcg_single); + break; default: g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ TRANS(FCVTZU_g, do_fcvt_g, a, FPROUNDING_ZERO, false) TRANS(FCVTAS_g, do_fcvt_g, a, FPROUNDING_TIEAWAY, true) TRANS(FCVTAU_g, do_fcvt_g, a, FPROUNDING_TIEAWAY, false) +/* + * FCVT* (vector), scalar version. + * Which sounds weird, but really just means output to fp register + * instead of output to general register. Input and output element + * size are always equal. + */ +static bool do_fcvt_f(DisasContext *s, arg_fcvt *a, + ARMFPRounding rmode, bool is_signed) +{ + TCGv_i64 tcg_int; + int check = fp_access_check_scalar_hsd(s, a->esz); + + if (check <= 0) { + return check == 0; + } + + tcg_int = tcg_temp_new_i64(); + do_fcvt_scalar(s, a->esz | (is_signed ? MO_SIGN : 0), + a->esz, tcg_int, a->shift, a->rn, rmode); + + clear_vec(s, a->rd); + write_vec_element(s, tcg_int, a->rd, 0, a->esz); + return true; +} + +TRANS(FCVTNS_f, do_fcvt_f, a, FPROUNDING_TIEEVEN, true) +TRANS(FCVTNU_f, do_fcvt_f, a, FPROUNDING_TIEEVEN, false) +TRANS(FCVTPS_f, do_fcvt_f, a, FPROUNDING_POSINF, true) +TRANS(FCVTPU_f, do_fcvt_f, a, FPROUNDING_POSINF, false) +TRANS(FCVTMS_f, do_fcvt_f, a, FPROUNDING_NEGINF, true) +TRANS(FCVTMU_f, do_fcvt_f, a, FPROUNDING_NEGINF, false) +TRANS(FCVTZS_f, do_fcvt_f, a, FPROUNDING_ZERO, true) +TRANS(FCVTZU_f, do_fcvt_f, a, FPROUNDING_ZERO, false) +TRANS(FCVTAS_f, do_fcvt_f, a, FPROUNDING_TIEAWAY, true) +TRANS(FCVTAU_f, do_fcvt_f, a, FPROUNDING_TIEAWAY, false) + static bool trans_FJCVTZS(DisasContext *s, arg_FJCVTZS *a) { if (!dc_isar_feature(aa64_jscvt, s)) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) int opcode = extract32(insn, 12, 5); int size = extract32(insn, 22, 2); bool u = extract32(insn, 29, 1); - bool is_fcvt = false; - int rmode; - TCGv_i32 tcg_rmode; - TCGv_ptr tcg_fpstatus; switch (opcode) { case 0xc ... 0xf: @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) case 0x5b: /* FCVTMU */ case 0x7a: /* FCVTPU */ case 0x7b: /* FCVTZU */ - is_fcvt = true; - rmode = extract32(opcode, 5, 1) | (extract32(opcode, 0, 1) << 1); - break; case 0x1c: /* FCVTAS */ case 0x5c: /* FCVTAU */ - /* TIEAWAY doesn't fit in the usual rounding mode encoding */ - is_fcvt = true; - rmode = FPROUNDING_TIEAWAY; - break; case 0x56: /* FCVTXN, FCVTXN2 */ default: unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) unallocated_encoding(s); return; } - - if (!fp_access_check(s)) { - return; - } - - if (is_fcvt) { - tcg_fpstatus = fpstatus_ptr(FPST_FPCR); - tcg_rmode = gen_set_rmode(rmode, tcg_fpstatus); - } else { - tcg_fpstatus = NULL; - tcg_rmode = NULL; - } - - if (size == 3) { - TCGv_i64 tcg_rn = read_fp_dreg(s, rn); - TCGv_i64 tcg_rd = tcg_temp_new_i64(); - - handle_2misc_64(s, opcode, u, tcg_rd, tcg_rn, tcg_rmode, tcg_fpstatus); - write_fp_dreg(s, rd, tcg_rd); - } else { - TCGv_i32 tcg_rn = tcg_temp_new_i32(); - TCGv_i32 tcg_rd = tcg_temp_new_i32(); - - read_vec_element_i32(s, tcg_rn, rn, 0, size); - - switch (opcode) { - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x1c: /* FCVTAS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - gen_helper_vfp_tosls(tcg_rd, tcg_rn, tcg_constant_i32(0), - tcg_fpstatus); - break; - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x5c: /* FCVTAU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - gen_helper_vfp_touls(tcg_rd, tcg_rn, tcg_constant_i32(0), - tcg_fpstatus); - break; - default: - case 0x7: /* SQABS, SQNEG */ - g_assert_not_reached(); - } - - write_fp_sreg(s, rd, tcg_rd); - } - - if (is_fcvt) { - gen_restore_rmode(tcg_rmode, tcg_fpstatus); - } + g_assert_not_reached(); } /* AdvSIMD shift by immediate @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) TCGv_i32 tcg_res = tcg_temp_new_i32(); switch (fpop) { - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x1c: /* FCVTAS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - gen_helper_advsimd_f16tosinth(tcg_res, tcg_op, tcg_fpstatus); - break; case 0x3d: /* FRECPE */ gen_helper_recpe_f16(tcg_res, tcg_op, tcg_fpstatus); break; case 0x3f: /* FRECPX */ gen_helper_frecpx_f16(tcg_res, tcg_op, tcg_fpstatus); break; + case 0x7d: /* FRSQRTE */ + gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); + break; + default: + case 0x1a: /* FCVTNS */ + case 0x1b: /* FCVTMS */ + case 0x1c: /* FCVTAS */ + case 0x3a: /* FCVTPS */ + case 0x3b: /* FCVTZS */ case 0x5a: /* FCVTNU */ case 0x5b: /* FCVTMU */ case 0x5c: /* FCVTAU */ case 0x7a: /* FCVTPU */ case 0x7b: /* FCVTZU */ - gen_helper_advsimd_f16touinth(tcg_res, tcg_op, tcg_fpstatus); - break; - case 0x7d: /* FRSQRTE */ - gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); - break; - default: g_assert_not_reached(); } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-59-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 19 +++++++++++++++++++ target/arm/tcg/translate-a64.c | 4 +--- 2 files changed, 20 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FCVTAS_f 0101 1110 0.1 00001 11001 0 ..... ..... @icvt_sd FCVTAU_f 0111 1110 011 11001 11001 0 ..... ..... @icvt_h FCVTAU_f 0111 1110 0.1 00001 11001 0 ..... ..... @icvt_sd +%fcvt_f_sh_h 16:4 !function=rsub_16 +%fcvt_f_sh_s 16:5 !function=rsub_32 +%fcvt_f_sh_d 16:6 !function=rsub_64 + +@fcvt_fixed_h .... .... . 001 .... ...... rn:5 rd:5 \ + &fcvt sf=0 esz=1 shift=%fcvt_f_sh_h +@fcvt_fixed_s .... .... . 01 ..... ...... rn:5 rd:5 \ + &fcvt sf=0 esz=2 shift=%fcvt_f_sh_s +@fcvt_fixed_d .... .... . 1 ...... ...... rn:5 rd:5 \ + &fcvt sf=0 esz=3 shift=%fcvt_f_sh_d + +FCVTZS_f 0101 1111 0 ....... 111111 ..... ..... @fcvt_fixed_h +FCVTZS_f 0101 1111 0 ....... 111111 ..... ..... @fcvt_fixed_s +FCVTZS_f 0101 1111 0 ....... 111111 ..... ..... @fcvt_fixed_d + +FCVTZU_f 0111 1111 0 ....... 111111 ..... ..... @fcvt_fixed_h +FCVTZU_f 0111 1111 0 ....... 111111 ..... ..... @fcvt_fixed_s +FCVTZU_f 0111 1111 0 ....... 111111 ..... ..... @fcvt_fixed_d + # Advanced SIMD two-register miscellaneous SQABS_v 0.00 1110 ..1 00000 01111 0 ..... ..... @qrr_e diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_shift_imm(DisasContext *s, uint32_t insn) handle_simd_shift_intfp_conv(s, true, false, is_u, immh, immb, opcode, rn, rd); break; - case 0x1f: /* FCVTZS, FCVTZU */ - handle_simd_shift_fpint_conv(s, true, false, is_u, immh, immb, rn, rd); - break; default: case 0x00: /* SSHR / USHR */ case 0x02: /* SSRA / USRA */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_shift_imm(DisasContext *s, uint32_t insn) case 0x11: /* SQRSHRUN */ case 0x12: /* SQSHRN, UQSHRN */ case 0x13: /* SQRSHRN, UQRSHRN */ + case 0x1f: /* FCVTZS, FCVTZU */ unallocated_encoding(s); break; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-60-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 6 ++++++ target/arm/tcg/translate-a64.c | 35 ++++++++++++++++++++++++---------- 2 files changed, 31 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FCVTXN_s 0111 1110 011 00001 01101 0 ..... ..... @rr_s @icvt_sd . ....... .. ...... ...... rn:5 rd:5 \ &fcvt sf=0 esz=%esz_sd shift=0 +SCVTF_f 0101 1110 011 11001 11011 0 ..... ..... @icvt_h +SCVTF_f 0101 1110 0.1 00001 11011 0 ..... ..... @icvt_sd + +UCVTF_f 0111 1110 011 11001 11011 0 ..... ..... @icvt_h +UCVTF_f 0111 1110 0.1 00001 11011 0 ..... ..... @icvt_sd + FCVTNS_f 0101 1110 011 11001 10101 0 ..... ..... @icvt_h FCVTNS_f 0101 1110 0.1 00001 10101 0 ..... ..... @icvt_sd FCVTNU_f 0111 1110 011 11001 10101 0 ..... ..... @icvt_h diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_cvtf_g(DisasContext *s, arg_fcvt *a, bool is_signed) TRANS(SCVTF_g, do_cvtf_g, a, true) TRANS(UCVTF_g, do_cvtf_g, a, false) +/* + * [US]CVTF (vector), scalar version. + * Which sounds weird, but really just means input from fp register + * instead of input from general register. Input and output element + * size are always equal. + */ +static bool do_cvtf_f(DisasContext *s, arg_fcvt *a, bool is_signed) +{ + TCGv_i64 tcg_int; + int check = fp_access_check_scalar_hsd(s, a->esz); + + if (check <= 0) { + return check == 0; + } + + tcg_int = tcg_temp_new_i64(); + read_vec_element(s, tcg_int, a->rn, 0, a->esz | (is_signed ? MO_SIGN : 0)); + return do_cvtf_scalar(s, a->esz, a->rd, a->shift, tcg_int, is_signed); +} + +TRANS(SCVTF_f, do_cvtf_f, a, true) +TRANS(UCVTF_f, do_cvtf_f, a, false) + static void do_fcvt_scalar(DisasContext *s, MemOp out, MemOp esz, TCGv_i64 tcg_out, int shift, int rn, ARMFPRounding rmode) @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) case 0x6d: /* FCMLE (zero) */ handle_2misc_fcmp_zero(s, opcode, true, u, true, size, rn, rd); return; - case 0x1d: /* SCVTF */ - case 0x5d: /* UCVTF */ - { - bool is_signed = (opcode == 0x1d); - if (!fp_access_check(s)) { - return; - } - handle_simd_intfp_conv(s, rd, rn, 1, is_signed, 0, size); - return; - } case 0x3d: /* FRECPE */ case 0x3f: /* FRECPX */ case 0x7d: /* FRSQRTE */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) case 0x1c: /* FCVTAS */ case 0x5c: /* FCVTAU */ case 0x56: /* FCVTXN, FCVTXN2 */ + case 0x1d: /* SCVTF */ + case 0x5d: /* UCVTF */ default: unallocated_encoding(s); return; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove disas_simd_scalar_shift_imm as these were the last insns decoded by that function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-61-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 8 ++++++ target/arm/tcg/translate-a64.c | 47 ---------------------------------- 2 files changed, 8 insertions(+), 47 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FCVTAU_f 0111 1110 0.1 00001 11001 0 ..... ..... @icvt_sd @fcvt_fixed_d .... .... . 1 ...... ...... rn:5 rd:5 \ &fcvt sf=0 esz=3 shift=%fcvt_f_sh_d +SCVTF_f 0101 1111 0 ....... 111001 ..... ..... @fcvt_fixed_h +SCVTF_f 0101 1111 0 ....... 111001 ..... ..... @fcvt_fixed_s +SCVTF_f 0101 1111 0 ....... 111001 ..... ..... @fcvt_fixed_d + +UCVTF_f 0111 1111 0 ....... 111001 ..... ..... @fcvt_fixed_h +UCVTF_f 0111 1111 0 ....... 111001 ..... ..... @fcvt_fixed_s +UCVTF_f 0111 1111 0 ....... 111001 ..... ..... @fcvt_fixed_d + FCVTZS_f 0101 1111 0 ....... 111111 ..... ..... @fcvt_fixed_h FCVTZS_f 0101 1111 0 ....... 111111 ..... ..... @fcvt_fixed_s FCVTZS_f 0101 1111 0 ....... 111111 ..... ..... @fcvt_fixed_d diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar, gen_restore_rmode(tcg_rmode, tcg_fpstatus); } -/* AdvSIMD scalar shift by immediate - * 31 30 29 28 23 22 19 18 16 15 11 10 9 5 4 0 - * +-----+---+-------------+------+------+--------+---+------+------+ - * | 0 1 | U | 1 1 1 1 1 0 | immh | immb | opcode | 1 | Rn | Rd | - * +-----+---+-------------+------+------+--------+---+------+------+ - * - * This is the scalar version so it works on a fixed sized registers - */ -static void disas_simd_scalar_shift_imm(DisasContext *s, uint32_t insn) -{ - int rd = extract32(insn, 0, 5); - int rn = extract32(insn, 5, 5); - int opcode = extract32(insn, 11, 5); - int immb = extract32(insn, 16, 3); - int immh = extract32(insn, 19, 4); - bool is_u = extract32(insn, 29, 1); - - if (immh == 0) { - unallocated_encoding(s); - return; - } - - switch (opcode) { - case 0x1c: /* SCVTF, UCVTF */ - handle_simd_shift_intfp_conv(s, true, false, is_u, immh, immb, - opcode, rn, rd); - break; - default: - case 0x00: /* SSHR / USHR */ - case 0x02: /* SSRA / USRA */ - case 0x04: /* SRSHR / URSHR */ - case 0x06: /* SRSRA / URSRA */ - case 0x08: /* SRI */ - case 0x0a: /* SHL / SLI */ - case 0x0c: /* SQSHLU */ - case 0x0e: /* SQSHL, UQSHL */ - case 0x10: /* SQSHRUN */ - case 0x11: /* SQRSHRUN */ - case 0x12: /* SQSHRN, UQSHRN */ - case 0x13: /* SQRSHRN, UQRSHRN */ - case 0x1f: /* FCVTZS, FCVTZU */ - unallocated_encoding(s); - break; - } -} - static void handle_2misc_64(DisasContext *s, int opcode, bool u, TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i32 tcg_rmode, TCGv_ptr tcg_fpstatus) @@ -XXX,XX +XXX,XX @@ static const AArch64DecodeTable data_proc_simd[] = { { 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc }, { 0x0f000400, 0x9f800400, disas_simd_shift_imm }, { 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc }, - { 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm }, { 0x0e780800, 0x8f7e0c00, disas_simd_two_reg_misc_fp16 }, { 0x00000000, 0x00000000, NULL } }; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Emphasize that these functions use round-to-zero mode. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-62-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 8 ++++---- target/arm/tcg/translate-neon.c | 8 ++++---- target/arm/tcg/vec_helper.c | 8 ++++---- 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_touizs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_sf, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_uf, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_vcvt_fs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_vcvt_fu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_rz_fs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_rz_fu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_vcvt_hs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_4(gvec_vcvt_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_rz_hs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_rz_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rm_ss, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a, DO_FP_2SH(VCVT_SF, gen_helper_gvec_vcvt_sf) DO_FP_2SH(VCVT_UF, gen_helper_gvec_vcvt_uf) -DO_FP_2SH(VCVT_FS, gen_helper_gvec_vcvt_fs) -DO_FP_2SH(VCVT_FU, gen_helper_gvec_vcvt_fu) +DO_FP_2SH(VCVT_FS, gen_helper_gvec_vcvt_rz_fs) +DO_FP_2SH(VCVT_FU, gen_helper_gvec_vcvt_rz_fu) DO_FP_2SH(VCVT_SH, gen_helper_gvec_vcvt_sh) DO_FP_2SH(VCVT_UH, gen_helper_gvec_vcvt_uh) -DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_hs) -DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_hu) +DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_rz_hs) +DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_rz_hu) static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a, GVecGen2iFn *fn) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_3OP_PAIR(gvec_uminp_s, MIN, uint32_t, H4) DO_VCVT_FIXED(gvec_vcvt_sf, helper_vfp_sltos, uint32_t) DO_VCVT_FIXED(gvec_vcvt_uf, helper_vfp_ultos, uint32_t) -DO_VCVT_FIXED(gvec_vcvt_fs, helper_vfp_tosls_round_to_zero, uint32_t) -DO_VCVT_FIXED(gvec_vcvt_fu, helper_vfp_touls_round_to_zero, uint32_t) +DO_VCVT_FIXED(gvec_vcvt_rz_fs, helper_vfp_tosls_round_to_zero, uint32_t) +DO_VCVT_FIXED(gvec_vcvt_rz_fu, helper_vfp_touls_round_to_zero, uint32_t) DO_VCVT_FIXED(gvec_vcvt_sh, helper_vfp_shtoh, uint16_t) DO_VCVT_FIXED(gvec_vcvt_uh, helper_vfp_uhtoh, uint16_t) -DO_VCVT_FIXED(gvec_vcvt_hs, helper_vfp_toshh_round_to_zero, uint16_t) -DO_VCVT_FIXED(gvec_vcvt_hu, helper_vfp_touhh_round_to_zero, uint16_t) +DO_VCVT_FIXED(gvec_vcvt_rz_hs, helper_vfp_toshh_round_to_zero, uint16_t) +DO_VCVT_FIXED(gvec_vcvt_rz_hu, helper_vfp_touhh_round_to_zero, uint16_t) #undef DO_VCVT_FIXED -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove handle_simd_intfp_conv and handle_simd_shift_intfp_conv as these were the last insns decoded by those functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-63-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 3 + target/arm/tcg/a64.decode | 22 ++++ target/arm/tcg/translate-a64.c | 201 ++++++--------------------------- target/arm/tcg/vec_helper.c | 7 +- 4 files changed, 66 insertions(+), 167 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rz_hs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rz_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_sd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_ud, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(gvec_vcvt_rm_ss, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rm_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FRINT32Z_v 0.00 1110 0.1 00001 11101 0 ..... ..... @qrr_sd FRINT32X_v 0.10 1110 0.1 00001 11101 0 ..... ..... @qrr_sd FRINT64Z_v 0.00 1110 0.1 00001 11111 0 ..... ..... @qrr_sd FRINT64X_v 0.10 1110 0.1 00001 11111 0 ..... ..... @qrr_sd + +SCVTF_vi 0.00 1110 011 11001 11011 0 ..... ..... @qrr_h +SCVTF_vi 0.00 1110 0.1 00001 11011 0 ..... ..... @qrr_sd + +UCVTF_vi 0.10 1110 011 11001 11011 0 ..... ..... @qrr_h +UCVTF_vi 0.10 1110 0.1 00001 11011 0 ..... ..... @qrr_sd + +&fcvt_q rd rn esz q shift +@fcvtq_h . q:1 . ...... 001 .... ...... rn:5 rd:5 \ + &fcvt_q esz=1 shift=%fcvt_f_sh_h +@fcvtq_s . q:1 . ...... 01 ..... ...... rn:5 rd:5 \ + &fcvt_q esz=2 shift=%fcvt_f_sh_s +@fcvtq_d . q:1 . ...... 1 ...... ...... rn:5 rd:5 \ + &fcvt_q esz=3 shift=%fcvt_f_sh_d + +SCVTF_vf 0.00 11110 ....... 111001 ..... ..... @fcvtq_h +SCVTF_vf 0.00 11110 ....... 111001 ..... ..... @fcvtq_s +SCVTF_vf 0.00 11110 ....... 111001 ..... ..... @fcvtq_d + +UCVTF_vf 0.10 11110 ....... 111001 ..... ..... @fcvtq_h +UCVTF_vf 0.10 11110 ....... 111001 ..... ..... @fcvtq_s +UCVTF_vf 0.10 11110 ....... 111001 ..... ..... @fcvtq_d diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINT64Z_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, FPROUNDING_ZERO) TRANS_FEAT(FRINT64X_v, aa64_frint, do_fp1_vector, a, &f_scalar_frint64, -1) -/* Common vector code for handling integer to FP conversion */ -static void handle_simd_intfp_conv(DisasContext *s, int rd, int rn, - int elements, int is_signed, - int fracbits, int size) +static bool do_gvec_op2_fpst(DisasContext *s, MemOp esz, bool is_q, + int rd, int rn, int data, + gen_helper_gvec_2_ptr * const fns[3]) { - TCGv_ptr tcg_fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); - TCGv_i32 tcg_shift = NULL; + int check = fp_access_check_vector_hsd(s, is_q, esz); + TCGv_ptr fpst; - MemOp mop = size | (is_signed ? MO_SIGN : 0); - int pass; - - if (fracbits || size == MO_64) { - tcg_shift = tcg_constant_i32(fracbits); + if (check <= 0) { + return check == 0; } - if (size == MO_64) { - TCGv_i64 tcg_int64 = tcg_temp_new_i64(); - TCGv_i64 tcg_double = tcg_temp_new_i64(); - - for (pass = 0; pass < elements; pass++) { - read_vec_element(s, tcg_int64, rn, pass, mop); - - if (is_signed) { - gen_helper_vfp_sqtod(tcg_double, tcg_int64, - tcg_shift, tcg_fpst); - } else { - gen_helper_vfp_uqtod(tcg_double, tcg_int64, - tcg_shift, tcg_fpst); - } - if (elements == 1) { - write_fp_dreg(s, rd, tcg_double); - } else { - write_vec_element(s, tcg_double, rd, pass, MO_64); - } - } - } else { - TCGv_i32 tcg_int32 = tcg_temp_new_i32(); - TCGv_i32 tcg_float = tcg_temp_new_i32(); - - for (pass = 0; pass < elements; pass++) { - read_vec_element_i32(s, tcg_int32, rn, pass, mop); - - switch (size) { - case MO_32: - if (fracbits) { - if (is_signed) { - gen_helper_vfp_sltos(tcg_float, tcg_int32, - tcg_shift, tcg_fpst); - } else { - gen_helper_vfp_ultos(tcg_float, tcg_int32, - tcg_shift, tcg_fpst); - } - } else { - if (is_signed) { - gen_helper_vfp_sitos(tcg_float, tcg_int32, tcg_fpst); - } else { - gen_helper_vfp_uitos(tcg_float, tcg_int32, tcg_fpst); - } - } - break; - case MO_16: - if (fracbits) { - if (is_signed) { - gen_helper_vfp_sltoh(tcg_float, tcg_int32, - tcg_shift, tcg_fpst); - } else { - gen_helper_vfp_ultoh(tcg_float, tcg_int32, - tcg_shift, tcg_fpst); - } - } else { - if (is_signed) { - gen_helper_vfp_sitoh(tcg_float, tcg_int32, tcg_fpst); - } else { - gen_helper_vfp_uitoh(tcg_float, tcg_int32, tcg_fpst); - } - } - break; - default: - g_assert_not_reached(); - } - - if (elements == 1) { - write_fp_sreg(s, rd, tcg_float); - } else { - write_vec_element_i32(s, tcg_float, rd, pass, size); - } - } - } - - clear_vec_high(s, elements << size == 16, rd); + fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); + tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd), + vec_full_reg_offset(s, rn), fpst, + is_q ? 16 : 8, vec_full_reg_size(s), + data, fns[esz - 1]); + return true; } -/* UCVTF/SCVTF - Integer to FP conversion */ -static void handle_simd_shift_intfp_conv(DisasContext *s, bool is_scalar, - bool is_q, bool is_u, - int immh, int immb, int opcode, - int rn, int rd) -{ - int size, elements, fracbits; - int immhb = immh << 3 | immb; +static gen_helper_gvec_2_ptr * const f_scvtf_v[] = { + gen_helper_gvec_vcvt_sh, + gen_helper_gvec_vcvt_sf, + gen_helper_gvec_vcvt_sd, +}; +TRANS(SCVTF_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, 0, f_scvtf_v) +TRANS(SCVTF_vf, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, a->shift, f_scvtf_v) - if (immh & 8) { - size = MO_64; - if (!is_scalar && !is_q) { - unallocated_encoding(s); - return; - } - } else if (immh & 4) { - size = MO_32; - } else if (immh & 2) { - size = MO_16; - if (!dc_isar_feature(aa64_fp16, s)) { - unallocated_encoding(s); - return; - } - } else { - /* immh == 0 would be a failure of the decode logic */ - g_assert(immh == 1); - unallocated_encoding(s); - return; - } - - if (is_scalar) { - elements = 1; - } else { - elements = (8 << is_q) >> size; - } - fracbits = (16 << size) - immhb; - - if (!fp_access_check(s)) { - return; - } - - handle_simd_intfp_conv(s, rd, rn, elements, !is_u, fracbits, size); -} +static gen_helper_gvec_2_ptr * const f_ucvtf_v[] = { + gen_helper_gvec_vcvt_uh, + gen_helper_gvec_vcvt_uf, + gen_helper_gvec_vcvt_ud, +}; +TRANS(UCVTF_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, 0, f_ucvtf_v) +TRANS(UCVTF_vf, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, a->shift, f_ucvtf_v) /* FCVTZS, FVCVTZU - FP to fixedpoint conversion */ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar, @@ -XXX,XX +XXX,XX @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) } switch (opcode) { - case 0x1c: /* SCVTF / UCVTF */ - handle_simd_shift_intfp_conv(s, false, is_q, is_u, immh, immb, - opcode, rn, rd); - break; case 0x1f: /* FCVTZS/ FCVTZU */ handle_simd_shift_fpint_conv(s, false, is_q, is_u, immh, immb, rn, rd); return; @@ -XXX,XX +XXX,XX @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) case 0x12: /* SQSHRN / UQSHRN */ case 0x13: /* SQRSHRN / UQRSHRN */ case 0x14: /* SSHLL / USHLL */ + case 0x1c: /* SCVTF / UCVTF */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) opcode |= (extract32(size, 1, 1) << 5) | (u << 6); size = is_double ? 3 : 2; switch (opcode) { - case 0x1d: /* SCVTF */ - case 0x5d: /* UCVTF */ - { - bool is_signed = (opcode == 0x1d) ? true : false; - int elements = is_double ? 2 : is_q ? 4 : 2; - if (is_double && !is_q) { - unallocated_encoding(s); - return; - } - if (!fp_access_check(s)) { - return; - } - handle_simd_intfp_conv(s, rd, rn, elements, is_signed, 0, size); - return; - } case 0x2c: /* FCMGT (zero) */ case 0x2d: /* FCMEQ (zero) */ case 0x2e: /* FCMLT (zero) */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x1f: /* FRINT64Z */ case 0x5e: /* FRINT32X */ case 0x5f: /* FRINT64X */ + case 0x1d: /* SCVTF */ + case 0x5d: /* UCVTF */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) fpop = deposit32(fpop, 6, 1, u); switch (fpop) { - case 0x1d: /* SCVTF */ - case 0x5d: /* UCVTF */ - { - int elements; - - if (is_scalar) { - elements = 1; - } else { - elements = (is_q ? 8 : 4); - } - - if (!fp_access_check(s)) { - return; - } - handle_simd_intfp_conv(s, rd, rn, elements, !u, 0, MO_16); - return; - } - break; case 0x2c: /* FCMGT (zero) */ case 0x2d: /* FCMEQ (zero) */ case 0x2e: /* FCMLT (zero) */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x58: /* FRINTA */ case 0x59: /* FRINTX */ case 0x79: /* FRINTI */ + case 0x1d: /* SCVTF */ + case 0x5d: /* UCVTF */ unallocated_encoding(s); return; } diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_3OP_PAIR(gvec_uminp_s, MIN, uint32_t, H4) clear_tail(d, oprsz, simd_maxsz(desc)); \ } +DO_VCVT_FIXED(gvec_vcvt_sd, helper_vfp_sqtod, uint64_t) +DO_VCVT_FIXED(gvec_vcvt_ud, helper_vfp_uqtod, uint64_t) DO_VCVT_FIXED(gvec_vcvt_sf, helper_vfp_sltos, uint32_t) DO_VCVT_FIXED(gvec_vcvt_uf, helper_vfp_ultos, uint32_t) -DO_VCVT_FIXED(gvec_vcvt_rz_fs, helper_vfp_tosls_round_to_zero, uint32_t) -DO_VCVT_FIXED(gvec_vcvt_rz_fu, helper_vfp_touls_round_to_zero, uint32_t) DO_VCVT_FIXED(gvec_vcvt_sh, helper_vfp_shtoh, uint16_t) DO_VCVT_FIXED(gvec_vcvt_uh, helper_vfp_uhtoh, uint16_t) + +DO_VCVT_FIXED(gvec_vcvt_rz_fs, helper_vfp_tosls_round_to_zero, uint32_t) +DO_VCVT_FIXED(gvec_vcvt_rz_fu, helper_vfp_touls_round_to_zero, uint32_t) DO_VCVT_FIXED(gvec_vcvt_rz_hs, helper_vfp_toshh_round_to_zero, uint16_t) DO_VCVT_FIXED(gvec_vcvt_rz_hu, helper_vfp_touhh_round_to_zero, uint16_t) -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove handle_simd_shift_fpint_conv and disas_simd_shift_imm as these were the last insns decoded by those functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-64-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 4 + target/arm/tcg/a64.decode | 8 ++ target/arm/tcg/translate-a64.c | 160 +++------------------------------ target/arm/tcg/vec_helper.c | 2 + target/arm/vfp_helper.c | 4 + 5 files changed, 32 insertions(+), 146 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_touhs_round_to_zero, i32, f32, i32, ptr) DEF_HELPER_3(vfp_touls_round_to_zero, i32, f32, i32, ptr) DEF_HELPER_3(vfp_toshd_round_to_zero, i64, f64, i32, ptr) DEF_HELPER_3(vfp_tosld_round_to_zero, i64, f64, i32, ptr) +DEF_HELPER_3(vfp_tosqd_round_to_zero, i64, f64, i32, ptr) DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr) DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr) +DEF_HELPER_3(vfp_touqd_round_to_zero, i64, f64, i32, ptr) DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr) DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr) DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_rz_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_sd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_ud, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_rz_ds, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_rz_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rm_ss, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SCVTF_vf 0.00 11110 ....... 111001 ..... ..... @fcvtq_d UCVTF_vf 0.10 11110 ....... 111001 ..... ..... @fcvtq_h UCVTF_vf 0.10 11110 ....... 111001 ..... ..... @fcvtq_s UCVTF_vf 0.10 11110 ....... 111001 ..... ..... @fcvtq_d + +FCVTZS_vf 0.00 11110 ....... 111111 ..... ..... @fcvtq_h +FCVTZS_vf 0.00 11110 ....... 111111 ..... ..... @fcvtq_s +FCVTZS_vf 0.00 11110 ....... 111111 ..... ..... @fcvtq_d + +FCVTZU_vf 0.10 11110 ....... 111111 ..... ..... @fcvtq_h +FCVTZU_vf 0.10 11110 ....... 111111 ..... ..... @fcvtq_s +FCVTZU_vf 0.10 11110 ....... 111111 ..... ..... @fcvtq_d diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(UCVTF_vi, do_gvec_op2_fpst, TRANS(UCVTF_vf, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, a->shift, f_ucvtf_v) -/* FCVTZS, FVCVTZU - FP to fixedpoint conversion */ -static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar, - bool is_q, bool is_u, - int immh, int immb, int rn, int rd) -{ - int immhb = immh << 3 | immb; - int pass, size, fracbits; - TCGv_ptr tcg_fpstatus; - TCGv_i32 tcg_rmode, tcg_shift; +static gen_helper_gvec_2_ptr * const f_fcvtzs_vf[] = { + gen_helper_gvec_vcvt_rz_hs, + gen_helper_gvec_vcvt_rz_fs, + gen_helper_gvec_vcvt_rz_ds, +}; +TRANS(FCVTZS_vf, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, a->shift, f_fcvtzs_vf) - if (immh & 0x8) { - size = MO_64; - if (!is_scalar && !is_q) { - unallocated_encoding(s); - return; - } - } else if (immh & 0x4) { - size = MO_32; - } else if (immh & 0x2) { - size = MO_16; - if (!dc_isar_feature(aa64_fp16, s)) { - unallocated_encoding(s); - return; - } - } else { - /* Should have split out AdvSIMD modified immediate earlier. */ - assert(immh == 1); - unallocated_encoding(s); - return; - } - - if (!fp_access_check(s)) { - return; - } - - assert(!(is_scalar && is_q)); - - tcg_fpstatus = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); - tcg_rmode = gen_set_rmode(FPROUNDING_ZERO, tcg_fpstatus); - fracbits = (16 << size) - immhb; - tcg_shift = tcg_constant_i32(fracbits); - - if (size == MO_64) { - int maxpass = is_scalar ? 1 : 2; - - for (pass = 0; pass < maxpass; pass++) { - TCGv_i64 tcg_op = tcg_temp_new_i64(); - - read_vec_element(s, tcg_op, rn, pass, MO_64); - if (is_u) { - gen_helper_vfp_touqd(tcg_op, tcg_op, tcg_shift, tcg_fpstatus); - } else { - gen_helper_vfp_tosqd(tcg_op, tcg_op, tcg_shift, tcg_fpstatus); - } - write_vec_element(s, tcg_op, rd, pass, MO_64); - } - clear_vec_high(s, is_q, rd); - } else { - void (*fn)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr); - int maxpass = is_scalar ? 1 : ((8 << is_q) >> size); - - switch (size) { - case MO_16: - if (is_u) { - fn = gen_helper_vfp_touhh; - } else { - fn = gen_helper_vfp_toshh; - } - break; - case MO_32: - if (is_u) { - fn = gen_helper_vfp_touls; - } else { - fn = gen_helper_vfp_tosls; - } - break; - default: - g_assert_not_reached(); - } - - for (pass = 0; pass < maxpass; pass++) { - TCGv_i32 tcg_op = tcg_temp_new_i32(); - - read_vec_element_i32(s, tcg_op, rn, pass, size); - fn(tcg_op, tcg_op, tcg_shift, tcg_fpstatus); - if (is_scalar) { - if (size == MO_16 && !is_u) { - tcg_gen_ext16u_i32(tcg_op, tcg_op); - } - write_fp_sreg(s, rd, tcg_op); - } else { - write_vec_element_i32(s, tcg_op, rd, pass, size); - } - } - if (!is_scalar) { - clear_vec_high(s, is_q, rd); - } - } - - gen_restore_rmode(tcg_rmode, tcg_fpstatus); -} +static gen_helper_gvec_2_ptr * const f_fcvtzu_vf[] = { + gen_helper_gvec_vcvt_rz_hu, + gen_helper_gvec_vcvt_rz_fu, + gen_helper_gvec_vcvt_rz_du, +}; +TRANS(FCVTZU_vf, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, a->shift, f_fcvtzu_vf) static void handle_2misc_64(DisasContext *s, int opcode, bool u, TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) g_assert_not_reached(); } -/* AdvSIMD shift by immediate - * 31 30 29 28 23 22 19 18 16 15 11 10 9 5 4 0 - * +---+---+---+-------------+------+------+--------+---+------+------+ - * | 0 | Q | U | 0 1 1 1 1 0 | immh | immb | opcode | 1 | Rn | Rd | - * +---+---+---+-------------+------+------+--------+---+------+------+ - */ -static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) -{ - int rd = extract32(insn, 0, 5); - int rn = extract32(insn, 5, 5); - int opcode = extract32(insn, 11, 5); - int immb = extract32(insn, 16, 3); - int immh = extract32(insn, 19, 4); - bool is_u = extract32(insn, 29, 1); - bool is_q = extract32(insn, 30, 1); - - if (immh == 0) { - unallocated_encoding(s); - return; - } - - switch (opcode) { - case 0x1f: /* FCVTZS/ FCVTZU */ - handle_simd_shift_fpint_conv(s, false, is_q, is_u, immh, immb, rn, rd); - return; - default: - case 0x00: /* SSHR / USHR */ - case 0x02: /* SSRA / USRA (accumulate) */ - case 0x04: /* SRSHR / URSHR (rounding) */ - case 0x06: /* SRSRA / URSRA (accum + rounding) */ - case 0x08: /* SRI */ - case 0x0a: /* SHL / SLI */ - case 0x0c: /* SQSHLU */ - case 0x0e: /* SQSHL, UQSHL */ - case 0x10: /* SHRN / SQSHRUN */ - case 0x11: /* RSHRN / SQRSHRUN */ - case 0x12: /* SQSHRN / UQSHRN */ - case 0x13: /* SQRSHRN / UQRSHRN */ - case 0x14: /* SSHLL / USHLL */ - case 0x1c: /* SCVTF / UCVTF */ - unallocated_encoding(s); - return; - } -} - static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q, int size, int rn, int rd) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) static const AArch64DecodeTable data_proc_simd[] = { /* pattern , mask , fn */ { 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc }, - { 0x0f000400, 0x9f800400, disas_simd_shift_imm }, { 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc }, { 0x0e780800, 0x8f7e0c00, disas_simd_two_reg_misc_fp16 }, { 0x00000000, 0x00000000, NULL } diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(gvec_vcvt_uf, helper_vfp_ultos, uint32_t) DO_VCVT_FIXED(gvec_vcvt_sh, helper_vfp_shtoh, uint16_t) DO_VCVT_FIXED(gvec_vcvt_uh, helper_vfp_uhtoh, uint16_t) +DO_VCVT_FIXED(gvec_vcvt_rz_ds, helper_vfp_tosqd_round_to_zero, uint64_t) +DO_VCVT_FIXED(gvec_vcvt_rz_du, helper_vfp_touqd_round_to_zero, uint64_t) DO_VCVT_FIXED(gvec_vcvt_rz_fs, helper_vfp_tosls_round_to_zero, uint32_t) DO_VCVT_FIXED(gvec_vcvt_rz_fu, helper_vfp_touls_round_to_zero, uint32_t) DO_VCVT_FIXED(gvec_vcvt_rz_hs, helper_vfp_toshh_round_to_zero, uint16_t) diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/vfp_helper.c +++ b/target/arm/vfp_helper.c @@ -XXX,XX +XXX,XX @@ VFP_CONV_FIX_A64(sq, h, 16, dh_ctype_f16, 64, int64) VFP_CONV_FIX(uh, h, 16, dh_ctype_f16, 32, uint16) VFP_CONV_FIX(ul, h, 16, dh_ctype_f16, 32, uint32) VFP_CONV_FIX_A64(uq, h, 16, dh_ctype_f16, 64, uint64) +VFP_CONV_FLOAT_FIX_ROUND(sq, d, 64, float64, 64, int64, + float_round_to_zero, _round_to_zero) +VFP_CONV_FLOAT_FIX_ROUND(uq, d, 64, float64, 64, uint64, + float_round_to_zero, _round_to_zero) #undef VFP_CONV_FIX #undef VFP_CONV_FIX_FLOAT -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove handle_2misc_64 as these were the last insns decoded by that function. Remove helper_advsimd_f16to[su]inth as unused; we now always go through helper_vfp_to[su]hh or a specialized vector function instead. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-65-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 2 + target/arm/tcg/helper-a64.h | 2 - target/arm/tcg/a64.decode | 25 ++++ target/arm/tcg/helper-a64.c | 32 ----- target/arm/tcg/translate-a64.c | 227 +++++++++++---------------------- target/arm/tcg/vec_helper.c | 2 + 6 files changed, 102 insertions(+), 188 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_ud, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rz_ds, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rz_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_rm_sd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_vcvt_rm_ud, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rm_ss, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_vcvt_rm_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.h +++ b/target/arm/tcg/helper-a64.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(advsimd_mulx2h, i32, i32, i32, ptr) DEF_HELPER_4(advsimd_muladd2h, i32, i32, i32, i32, ptr) DEF_HELPER_2(advsimd_rinth_exact, f16, f16, ptr) DEF_HELPER_2(advsimd_rinth, f16, f16, ptr) -DEF_HELPER_2(advsimd_f16tosinth, i32, f16, ptr) -DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr) DEF_HELPER_2(exception_return, void, env, i64) DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ SCVTF_vi 0.00 1110 0.1 00001 11011 0 ..... ..... @qrr_sd UCVTF_vi 0.10 1110 011 11001 11011 0 ..... ..... @qrr_h UCVTF_vi 0.10 1110 0.1 00001 11011 0 ..... ..... @qrr_sd +FCVTNS_vi 0.00 1110 011 11001 10101 0 ..... ..... @qrr_h +FCVTNS_vi 0.00 1110 0.1 00001 10101 0 ..... ..... @qrr_sd +FCVTNU_vi 0.10 1110 011 11001 10101 0 ..... ..... @qrr_h +FCVTNU_vi 0.10 1110 0.1 00001 10101 0 ..... ..... @qrr_sd + +FCVTPS_vi 0.00 1110 111 11001 10101 0 ..... ..... @qrr_h +FCVTPS_vi 0.00 1110 1.1 00001 10101 0 ..... ..... @qrr_sd +FCVTPU_vi 0.10 1110 111 11001 10101 0 ..... ..... @qrr_h +FCVTPU_vi 0.10 1110 1.1 00001 10101 0 ..... ..... @qrr_sd + +FCVTMS_vi 0.00 1110 011 11001 10111 0 ..... ..... @qrr_h +FCVTMS_vi 0.00 1110 0.1 00001 10111 0 ..... ..... @qrr_sd +FCVTMU_vi 0.10 1110 011 11001 10111 0 ..... ..... @qrr_h +FCVTMU_vi 0.10 1110 0.1 00001 10111 0 ..... ..... @qrr_sd + +FCVTZS_vi 0.00 1110 111 11001 10111 0 ..... ..... @qrr_h +FCVTZS_vi 0.00 1110 1.1 00001 10111 0 ..... ..... @qrr_sd +FCVTZU_vi 0.10 1110 111 11001 10111 0 ..... ..... @qrr_h +FCVTZU_vi 0.10 1110 1.1 00001 10111 0 ..... ..... @qrr_sd + +FCVTAS_vi 0.00 1110 011 11001 11001 0 ..... ..... @qrr_h +FCVTAS_vi 0.00 1110 0.1 00001 11001 0 ..... ..... @qrr_sd +FCVTAU_vi 0.10 1110 011 11001 11001 0 ..... ..... @qrr_h +FCVTAU_vi 0.10 1110 0.1 00001 11001 0 ..... ..... @qrr_sd + &fcvt_q rd rn esz q shift @fcvtq_h . q:1 . ...... 001 .... ...... rn:5 rd:5 \ &fcvt_q esz=1 shift=%fcvt_f_sh_h diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(advsimd_rinth)(uint32_t x, void *fp_status) return ret; } -/* - * Half-precision floating point conversion functions - * - * There are a multitude of conversion functions with various - * different rounding modes. This is dealt with by the calling code - * setting the mode appropriately before calling the helper. - */ - -uint32_t HELPER(advsimd_f16tosinth)(uint32_t a, void *fpstp) -{ - float_status *fpst = fpstp; - - /* Invalid if we are passed a NaN */ - if (float16_is_any_nan(a)) { - float_raise(float_flag_invalid, fpst); - return 0; - } - return float16_to_int16(a, fpst); -} - -uint32_t HELPER(advsimd_f16touinth)(uint32_t a, void *fpstp) -{ - float_status *fpst = fpstp; - - /* Invalid if we are passed a NaN */ - if (float16_is_any_nan(a)) { - float_raise(float_flag_invalid, fpst); - return 0; - } - return float16_to_uint16(a, fpst); -} - static int el_from_spsr(uint32_t spsr) { /* Return the exception level that this SPSR is requesting a return to, diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_fcvtzu_vf[] = { TRANS(FCVTZU_vf, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, a->shift, f_fcvtzu_vf) -static void handle_2misc_64(DisasContext *s, int opcode, bool u, - TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, - TCGv_i32 tcg_rmode, TCGv_ptr tcg_fpstatus) -{ - /* Handle 64->64 opcodes which are shared between the scalar and - * vector 2-reg-misc groups. We cover every integer opcode where size == 3 - * is valid in either group and also the double-precision fp ops. - * The caller only need provide tcg_rmode and tcg_fpstatus if the op - * requires them. - */ - switch (opcode) { - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x1c: /* FCVTAS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - gen_helper_vfp_tosqd(tcg_rd, tcg_rn, tcg_constant_i32(0), tcg_fpstatus); - break; - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x5c: /* FCVTAU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - gen_helper_vfp_touqd(tcg_rd, tcg_rn, tcg_constant_i32(0), tcg_fpstatus); - break; - default: - case 0x4: /* CLS, CLZ */ - case 0x5: /* NOT */ - case 0x7: /* SQABS, SQNEG */ - case 0x8: /* CMGT, CMGE */ - case 0x9: /* CMEQ, CMLE */ - case 0xa: /* CMLT */ - case 0xb: /* ABS, NEG */ - case 0x2f: /* FABS */ - case 0x6f: /* FNEG */ - case 0x7f: /* FSQRT */ - case 0x18: /* FRINTN */ - case 0x19: /* FRINTM */ - case 0x38: /* FRINTP */ - case 0x39: /* FRINTZ */ - case 0x58: /* FRINTA */ - case 0x79: /* FRINTI */ - case 0x59: /* FRINTX */ - case 0x1e: /* FRINT32Z */ - case 0x5e: /* FRINT32X */ - case 0x1f: /* FRINT64Z */ - case 0x5f: /* FRINT64X */ - g_assert_not_reached(); - } -} +static gen_helper_gvec_2_ptr * const f_fcvt_s_vi[] = { + gen_helper_gvec_vcvt_rm_sh, + gen_helper_gvec_vcvt_rm_ss, + gen_helper_gvec_vcvt_rm_sd, +}; + +static gen_helper_gvec_2_ptr * const f_fcvt_u_vi[] = { + gen_helper_gvec_vcvt_rm_uh, + gen_helper_gvec_vcvt_rm_us, + gen_helper_gvec_vcvt_rm_ud, +}; + +TRANS(FCVTNS_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_nearest_even, f_fcvt_s_vi) +TRANS(FCVTNU_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_nearest_even, f_fcvt_u_vi) +TRANS(FCVTPS_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_up, f_fcvt_s_vi) +TRANS(FCVTPU_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_up, f_fcvt_u_vi) +TRANS(FCVTMS_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_down, f_fcvt_s_vi) +TRANS(FCVTMU_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_down, f_fcvt_u_vi) +TRANS(FCVTZS_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_to_zero, f_fcvt_s_vi) +TRANS(FCVTZU_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_to_zero, f_fcvt_u_vi) +TRANS(FCVTAS_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_ties_away, f_fcvt_s_vi) +TRANS(FCVTAU_vi, do_gvec_op2_fpst, + a->esz, a->q, a->rd, a->rn, float_round_ties_away, f_fcvt_u_vi) static void handle_2misc_fcmp_zero(DisasContext *s, int opcode, bool is_scalar, bool is_u, bool is_q, @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } handle_2misc_fcmp_zero(s, opcode, false, u, is_q, size, rn, rd); return; - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - need_fpstatus = true; - rmode = extract32(opcode, 5, 1) | (extract32(opcode, 0, 1) << 1); - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } - break; - case 0x5c: /* FCVTAU */ - case 0x1c: /* FCVTAS */ - need_fpstatus = true; - rmode = FPROUNDING_TIEAWAY; - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } - break; case 0x3c: /* URECPE */ if (size == 3) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x5f: /* FRINT64X */ case 0x1d: /* SCVTF */ case 0x5d: /* UCVTF */ + case 0x1a: /* FCVTNS */ + case 0x1b: /* FCVTMS */ + case 0x3a: /* FCVTPS */ + case 0x3b: /* FCVTZS */ + case 0x5a: /* FCVTNU */ + case 0x5b: /* FCVTMU */ + case 0x7a: /* FCVTPU */ + case 0x7b: /* FCVTZU */ + case 0x5c: /* FCVTAU */ + case 0x1c: /* FCVTAS */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) tcg_rmode = NULL; } - if (size == 3) { - /* All 64-bit element operations can be shared with scalar 2misc */ - int pass; - - /* Coverity claims (size == 3 && !is_q) has been eliminated - * from all paths leading to here. - */ - tcg_debug_assert(is_q); - for (pass = 0; pass < 2; pass++) { - TCGv_i64 tcg_op = tcg_temp_new_i64(); - TCGv_i64 tcg_res = tcg_temp_new_i64(); - - read_vec_element(s, tcg_op, rn, pass, MO_64); - - handle_2misc_64(s, opcode, u, tcg_res, tcg_op, - tcg_rmode, tcg_fpstatus); - - write_vec_element(s, tcg_res, rd, pass, MO_64); - } - } else { + { int pass; assert(size == 2); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) { /* Special cases for 32 bit elements */ switch (opcode) { - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x1c: /* FCVTAS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - gen_helper_vfp_tosls(tcg_res, tcg_op, - tcg_constant_i32(0), tcg_fpstatus); - break; - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x5c: /* FCVTAU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - gen_helper_vfp_touls(tcg_res, tcg_op, - tcg_constant_i32(0), tcg_fpstatus); - break; case 0x7c: /* URSQRTE */ gen_helper_rsqrte_u32(tcg_res, tcg_op); break; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x5e: /* FRINT32X */ case 0x1f: /* FRINT64Z */ case 0x5f: /* FRINT64X */ + case 0x1a: /* FCVTNS */ + case 0x1b: /* FCVTMS */ + case 0x1c: /* FCVTAS */ + case 0x3a: /* FCVTPS */ + case 0x3b: /* FCVTZS */ + case 0x5a: /* FCVTNU */ + case 0x5b: /* FCVTMU */ + case 0x5c: /* FCVTAU */ + case 0x7a: /* FCVTPU */ + case 0x7b: /* FCVTZU */ g_assert_not_reached(); } } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x3d: /* FRECPE */ case 0x3f: /* FRECPX */ break; - case 0x1a: /* FCVTNS */ - rmode = FPROUNDING_TIEEVEN; - break; - case 0x1b: /* FCVTMS */ - rmode = FPROUNDING_NEGINF; - break; - case 0x1c: /* FCVTAS */ - rmode = FPROUNDING_TIEAWAY; - break; - case 0x3a: /* FCVTPS */ - rmode = FPROUNDING_POSINF; - break; - case 0x3b: /* FCVTZS */ - rmode = FPROUNDING_ZERO; - break; - case 0x5a: /* FCVTNU */ - rmode = FPROUNDING_TIEEVEN; - break; - case 0x5b: /* FCVTMU */ - rmode = FPROUNDING_NEGINF; - break; - case 0x5c: /* FCVTAU */ - rmode = FPROUNDING_TIEAWAY; - break; - case 0x7a: /* FCVTPU */ - rmode = FPROUNDING_POSINF; - break; - case 0x7b: /* FCVTZU */ - rmode = FPROUNDING_ZERO; - break; case 0x7d: /* FRSQRTE */ break; default: @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x79: /* FRINTI */ case 0x1d: /* SCVTF */ case 0x5d: /* UCVTF */ + case 0x1a: /* FCVTNS */ + case 0x1b: /* FCVTMS */ + case 0x1c: /* FCVTAS */ + case 0x3a: /* FCVTPS */ + case 0x3b: /* FCVTZS */ + case 0x5a: /* FCVTNU */ + case 0x5b: /* FCVTMU */ + case 0x5c: /* FCVTAU */ + case 0x7a: /* FCVTPU */ + case 0x7b: /* FCVTZU */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) read_vec_element_i32(s, tcg_op, rn, pass, MO_16); switch (fpop) { - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x1c: /* FCVTAS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - gen_helper_advsimd_f16tosinth(tcg_res, tcg_op, tcg_fpstatus); - break; case 0x3d: /* FRECPE */ gen_helper_recpe_f16(tcg_res, tcg_op, tcg_fpstatus); break; - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x5c: /* FCVTAU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - gen_helper_advsimd_f16touinth(tcg_res, tcg_op, tcg_fpstatus); - break; case 0x7d: /* FRSQRTE */ gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); break; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x58: /* FRINTA */ case 0x79: /* FRINTI */ case 0x59: /* FRINTX */ + case 0x1a: /* FCVTNS */ + case 0x1b: /* FCVTMS */ + case 0x1c: /* FCVTAS */ + case 0x3a: /* FCVTPS */ + case 0x3b: /* FCVTZS */ + case 0x5a: /* FCVTNU */ + case 0x5b: /* FCVTMU */ + case 0x5c: /* FCVTAU */ + case 0x7a: /* FCVTPU */ + case 0x7b: /* FCVTZU */ g_assert_not_reached(); } diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(gvec_vcvt_rz_hu, helper_vfp_touhh_round_to_zero, uint16_t) clear_tail(d, oprsz, simd_maxsz(desc)); \ } +DO_VCVT_RMODE(gvec_vcvt_rm_sd, helper_vfp_tosqd, uint64_t) +DO_VCVT_RMODE(gvec_vcvt_rm_ud, helper_vfp_touqd, uint64_t) DO_VCVT_RMODE(gvec_vcvt_rm_ss, helper_vfp_tosls, uint32_t) DO_VCVT_RMODE(gvec_vcvt_rm_us, helper_vfp_touls, uint32_t) DO_VCVT_RMODE(gvec_vcvt_rm_sh, helper_vfp_toshh, uint16_t) -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This includes FCMEQ, FCMGT, FCMGE, FCMLT, FCMLE. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-66-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 5 + target/arm/tcg/a64.decode | 30 ++++ target/arm/tcg/translate-a64.c | 249 +++++++++++++-------------------- target/arm/tcg/vec_helper.c | 4 +- 4 files changed, 138 insertions(+), 150 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fcgt0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_fcgt0_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fcge0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fcge0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_fcge0_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fceq0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fceq0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_fceq0_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fcle0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fcle0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_fcle0_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fclt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_fclt0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(gvec_fclt0_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ UQXTN_s 0111 1110 ..1 00001 01001 0 ..... ..... @rr_e FCVTXN_s 0111 1110 011 00001 01101 0 ..... ..... @rr_s +FCMGT0_s 0101 1110 111 11000 11001 0 ..... ..... @rr_h +FCMGT0_s 0101 1110 1.1 00000 11001 0 ..... ..... @rr_sd + +FCMGE0_s 0111 1110 111 11000 11001 0 ..... ..... @rr_h +FCMGE0_s 0111 1110 1.1 00000 11001 0 ..... ..... @rr_sd + +FCMEQ0_s 0101 1110 111 11000 11011 0 ..... ..... @rr_h +FCMEQ0_s 0101 1110 1.1 00000 11011 0 ..... ..... @rr_sd + +FCMLE0_s 0111 1110 111 11000 11011 0 ..... ..... @rr_h +FCMLE0_s 0111 1110 1.1 00000 11011 0 ..... ..... @rr_sd + +FCMLT0_s 0101 1110 111 11000 11101 0 ..... ..... @rr_h +FCMLT0_s 0101 1110 1.1 00000 11101 0 ..... ..... @rr_sd + @icvt_h . ....... .. ...... ...... rn:5 rd:5 \ &fcvt sf=0 esz=1 shift=0 @icvt_sd . ....... .. ...... ...... rn:5 rd:5 \ @@ -XXX,XX +XXX,XX @@ FCVTAS_vi 0.00 1110 0.1 00001 11001 0 ..... ..... @qrr_sd FCVTAU_vi 0.10 1110 011 11001 11001 0 ..... ..... @qrr_h FCVTAU_vi 0.10 1110 0.1 00001 11001 0 ..... ..... @qrr_sd +FCMGT0_v 0.00 1110 111 11000 11001 0 ..... ..... @qrr_h +FCMGT0_v 0.00 1110 1.1 00000 11001 0 ..... ..... @qrr_sd + +FCMGE0_v 0.10 1110 111 11000 11001 0 ..... ..... @qrr_h +FCMGE0_v 0.10 1110 1.1 00000 11001 0 ..... ..... @qrr_sd + +FCMEQ0_v 0.00 1110 111 11000 11011 0 ..... ..... @qrr_h +FCMEQ0_v 0.00 1110 1.1 00000 11011 0 ..... ..... @qrr_sd + +FCMLE0_v 0.10 1110 111 11000 11011 0 ..... ..... @qrr_h +FCMLE0_v 0.10 1110 1.1 00000 11011 0 ..... ..... @qrr_sd + +FCMLT0_v 0.00 1110 111 11000 11101 0 ..... ..... @qrr_h +FCMLT0_v 0.00 1110 1.1 00000 11101 0 ..... ..... @qrr_sd + &fcvt_q rd rn esz q shift @fcvtq_h . q:1 . ...... 001 .... ...... rn:5 rd:5 \ &fcvt_q esz=1 shift=%fcvt_f_sh_h diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static const FPScalar f_scalar_frsqrts = { }; TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts) +static bool do_fcmp0_s(DisasContext *s, arg_rr_e *a, + const FPScalar *f, bool swap) +{ + switch (a->esz) { + case MO_64: + if (fp_access_check(s)) { + TCGv_i64 t0 = read_fp_dreg(s, a->rn); + TCGv_i64 t1 = tcg_constant_i64(0); + if (swap) { + f->gen_d(t0, t1, t0, fpstatus_ptr(FPST_FPCR)); + } else { + f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR)); + } + write_fp_dreg(s, a->rd, t0); + } + break; + case MO_32: + if (fp_access_check(s)) { + TCGv_i32 t0 = read_fp_sreg(s, a->rn); + TCGv_i32 t1 = tcg_constant_i32(0); + if (swap) { + f->gen_s(t0, t1, t0, fpstatus_ptr(FPST_FPCR)); + } else { + f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR)); + } + write_fp_sreg(s, a->rd, t0); + } + break; + case MO_16: + if (!dc_isar_feature(aa64_fp16, s)) { + return false; + } + if (fp_access_check(s)) { + TCGv_i32 t0 = read_fp_hreg(s, a->rn); + TCGv_i32 t1 = tcg_constant_i32(0); + if (swap) { + f->gen_h(t0, t1, t0, fpstatus_ptr(FPST_FPCR_F16)); + } else { + f->gen_h(t0, t0, t1, fpstatus_ptr(FPST_FPCR_F16)); + } + write_fp_sreg(s, a->rd, t0); + } + break; + default: + return false; + } + return true; +} + +TRANS(FCMEQ0_s, do_fcmp0_s, a, &f_scalar_fcmeq, false) +TRANS(FCMGT0_s, do_fcmp0_s, a, &f_scalar_fcmgt, false) +TRANS(FCMGE0_s, do_fcmp0_s, a, &f_scalar_fcmge, false) +TRANS(FCMLT0_s, do_fcmp0_s, a, &f_scalar_fcmgt, true) +TRANS(FCMLE0_s, do_fcmp0_s, a, &f_scalar_fcmge, true) + static bool do_satacc_s(DisasContext *s, arg_rrr_e *a, MemOp sgn_n, MemOp sgn_m, void (*gen_bhs)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp), @@ -XXX,XX +XXX,XX @@ TRANS(FCVTAS_vi, do_gvec_op2_fpst, TRANS(FCVTAU_vi, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, float_round_ties_away, f_fcvt_u_vi) -static void handle_2misc_fcmp_zero(DisasContext *s, int opcode, - bool is_scalar, bool is_u, bool is_q, - int size, int rn, int rd) -{ - bool is_double = (size == MO_64); - TCGv_ptr fpst; +static gen_helper_gvec_2_ptr * const f_fceq0[] = { + gen_helper_gvec_fceq0_h, + gen_helper_gvec_fceq0_s, + gen_helper_gvec_fceq0_d, +}; +TRANS(FCMEQ0_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_fceq0) - if (!fp_access_check(s)) { - return; - } +static gen_helper_gvec_2_ptr * const f_fcgt0[] = { + gen_helper_gvec_fcgt0_h, + gen_helper_gvec_fcgt0_s, + gen_helper_gvec_fcgt0_d, +}; +TRANS(FCMGT0_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_fcgt0) - fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); +static gen_helper_gvec_2_ptr * const f_fcge0[] = { + gen_helper_gvec_fcge0_h, + gen_helper_gvec_fcge0_s, + gen_helper_gvec_fcge0_d, +}; +TRANS(FCMGE0_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_fcge0) - if (is_double) { - TCGv_i64 tcg_op = tcg_temp_new_i64(); - TCGv_i64 tcg_zero = tcg_constant_i64(0); - TCGv_i64 tcg_res = tcg_temp_new_i64(); - NeonGenTwoDoubleOpFn *genfn; - bool swap = false; - int pass; +static gen_helper_gvec_2_ptr * const f_fclt0[] = { + gen_helper_gvec_fclt0_h, + gen_helper_gvec_fclt0_s, + gen_helper_gvec_fclt0_d, +}; +TRANS(FCMLT0_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_fclt0) - switch (opcode) { - case 0x2e: /* FCMLT (zero) */ - swap = true; - /* fallthrough */ - case 0x2c: /* FCMGT (zero) */ - genfn = gen_helper_neon_cgt_f64; - break; - case 0x2d: /* FCMEQ (zero) */ - genfn = gen_helper_neon_ceq_f64; - break; - case 0x6d: /* FCMLE (zero) */ - swap = true; - /* fall through */ - case 0x6c: /* FCMGE (zero) */ - genfn = gen_helper_neon_cge_f64; - break; - default: - g_assert_not_reached(); - } - - for (pass = 0; pass < (is_scalar ? 1 : 2); pass++) { - read_vec_element(s, tcg_op, rn, pass, MO_64); - if (swap) { - genfn(tcg_res, tcg_zero, tcg_op, fpst); - } else { - genfn(tcg_res, tcg_op, tcg_zero, fpst); - } - write_vec_element(s, tcg_res, rd, pass, MO_64); - } - - clear_vec_high(s, !is_scalar, rd); - } else { - TCGv_i32 tcg_op = tcg_temp_new_i32(); - TCGv_i32 tcg_zero = tcg_constant_i32(0); - TCGv_i32 tcg_res = tcg_temp_new_i32(); - NeonGenTwoSingleOpFn *genfn; - bool swap = false; - int pass, maxpasses; - - if (size == MO_16) { - switch (opcode) { - case 0x2e: /* FCMLT (zero) */ - swap = true; - /* fall through */ - case 0x2c: /* FCMGT (zero) */ - genfn = gen_helper_advsimd_cgt_f16; - break; - case 0x2d: /* FCMEQ (zero) */ - genfn = gen_helper_advsimd_ceq_f16; - break; - case 0x6d: /* FCMLE (zero) */ - swap = true; - /* fall through */ - case 0x6c: /* FCMGE (zero) */ - genfn = gen_helper_advsimd_cge_f16; - break; - default: - g_assert_not_reached(); - } - } else { - switch (opcode) { - case 0x2e: /* FCMLT (zero) */ - swap = true; - /* fall through */ - case 0x2c: /* FCMGT (zero) */ - genfn = gen_helper_neon_cgt_f32; - break; - case 0x2d: /* FCMEQ (zero) */ - genfn = gen_helper_neon_ceq_f32; - break; - case 0x6d: /* FCMLE (zero) */ - swap = true; - /* fall through */ - case 0x6c: /* FCMGE (zero) */ - genfn = gen_helper_neon_cge_f32; - break; - default: - g_assert_not_reached(); - } - } - - if (is_scalar) { - maxpasses = 1; - } else { - int vector_size = 8 << is_q; - maxpasses = vector_size >> size; - } - - for (pass = 0; pass < maxpasses; pass++) { - read_vec_element_i32(s, tcg_op, rn, pass, size); - if (swap) { - genfn(tcg_res, tcg_zero, tcg_op, fpst); - } else { - genfn(tcg_res, tcg_op, tcg_zero, fpst); - } - if (is_scalar) { - write_fp_sreg(s, rd, tcg_res); - } else { - write_vec_element_i32(s, tcg_res, rd, pass, size); - } - } - - if (!is_scalar) { - clear_vec_high(s, is_q, rd); - } - } -} +static gen_helper_gvec_2_ptr * const f_fcle0[] = { + gen_helper_gvec_fcle0_h, + gen_helper_gvec_fcle0_s, + gen_helper_gvec_fcle0_d, +}; +TRANS(FCMLE0_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_fcle0) static void handle_2misc_reciprocal(DisasContext *s, int opcode, bool is_scalar, bool is_u, bool is_q, @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) opcode |= (extract32(size, 1, 1) << 5) | (u << 6); size = extract32(size, 0, 1) ? 3 : 2; switch (opcode) { - case 0x2c: /* FCMGT (zero) */ - case 0x2d: /* FCMEQ (zero) */ - case 0x2e: /* FCMLT (zero) */ - case 0x6c: /* FCMGE (zero) */ - case 0x6d: /* FCMLE (zero) */ - handle_2misc_fcmp_zero(s, opcode, true, u, true, size, rn, rd); - return; case 0x3d: /* FRECPE */ case 0x3f: /* FRECPX */ case 0x7d: /* FRSQRTE */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) case 0x56: /* FCVTXN, FCVTXN2 */ case 0x1d: /* SCVTF */ case 0x5d: /* UCVTF */ + case 0x2c: /* FCMGT (zero) */ + case 0x2d: /* FCMEQ (zero) */ + case 0x2e: /* FCMLT (zero) */ + case 0x6c: /* FCMGE (zero) */ + case 0x6d: /* FCMLE (zero) */ default: unallocated_encoding(s); return; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) opcode |= (extract32(size, 1, 1) << 5) | (u << 6); size = is_double ? 3 : 2; switch (opcode) { - case 0x2c: /* FCMGT (zero) */ - case 0x2d: /* FCMEQ (zero) */ - case 0x2e: /* FCMLT (zero) */ - case 0x6c: /* FCMGE (zero) */ - case 0x6d: /* FCMLE (zero) */ - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } - handle_2misc_fcmp_zero(s, opcode, false, u, is_q, size, rn, rd); - return; case 0x3c: /* URECPE */ if (size == 3) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x7b: /* FCVTZU */ case 0x5c: /* FCVTAU */ case 0x1c: /* FCVTAS */ + case 0x2c: /* FCMGT (zero) */ + case 0x2d: /* FCMEQ (zero) */ + case 0x2e: /* FCMLT (zero) */ + case 0x6c: /* FCMGE (zero) */ + case 0x6d: /* FCMLE (zero) */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) fpop = deposit32(fpop, 6, 1, u); switch (fpop) { - case 0x2c: /* FCMGT (zero) */ - case 0x2d: /* FCMEQ (zero) */ - case 0x2e: /* FCMLT (zero) */ - case 0x6c: /* FCMGE (zero) */ - case 0x6d: /* FCMLE (zero) */ - handle_2misc_fcmp_zero(s, fpop, is_scalar, 0, is_q, MO_16, rn, rd); - return; case 0x3d: /* FRECPE */ case 0x3f: /* FRECPX */ break; @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) case 0x5c: /* FCVTAU */ case 0x7a: /* FCVTPU */ case 0x7b: /* FCVTZU */ + case 0x2c: /* FCMGT (zero) */ + case 0x2d: /* FCMEQ (zero) */ + case 0x2e: /* FCMLT (zero) */ + case 0x6c: /* FCMGE (zero) */ + case 0x6d: /* FCMLE (zero) */ unallocated_encoding(s); return; } diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_2OP(gvec_touszh, vfp_touszh, float16) #define DO_2OP_CMP0(FN, CMPOP, DIRN) \ WRAP_CMP0_##DIRN(FN, CMPOP, float16) \ WRAP_CMP0_##DIRN(FN, CMPOP, float32) \ + WRAP_CMP0_##DIRN(FN, CMPOP, float64) \ DO_2OP(gvec_f##FN##0_h, float16_##FN##0, float16) \ - DO_2OP(gvec_f##FN##0_s, float32_##FN##0, float32) + DO_2OP(gvec_f##FN##0_s, float32_##FN##0, float32) \ + DO_2OP(gvec_f##FN##0_d, float64_##FN##0, float64) DO_2OP_CMP0(cgt, cgt, FWD) DO_2OP_CMP0(cge, cge, FWD) -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove disas_simd_scalar_two_reg_misc and disas_simd_two_reg_misc_fp16 as these were the last insns decoded by those functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-67-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 15 ++ target/arm/tcg/translate-a64.c | 329 ++++----------------------------- 2 files changed, 53 insertions(+), 291 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FCMLE0_s 0111 1110 1.1 00000 11011 0 ..... ..... @rr_sd FCMLT0_s 0101 1110 111 11000 11101 0 ..... ..... @rr_h FCMLT0_s 0101 1110 1.1 00000 11101 0 ..... ..... @rr_sd +FRECPE_s 0101 1110 111 11001 11011 0 ..... ..... @rr_h +FRECPE_s 0101 1110 1.1 00001 11011 0 ..... ..... @rr_sd + +FRECPX_s 0101 1110 111 11001 11111 0 ..... ..... @rr_h +FRECPX_s 0101 1110 1.1 00001 11111 0 ..... ..... @rr_sd + +FRSQRTE_s 0111 1110 111 11001 11011 0 ..... ..... @rr_h +FRSQRTE_s 0111 1110 1.1 00001 11011 0 ..... ..... @rr_sd + @icvt_h . ....... .. ...... ...... rn:5 rd:5 \ &fcvt sf=0 esz=1 shift=0 @icvt_sd . ....... .. ...... ...... rn:5 rd:5 \ @@ -XXX,XX +XXX,XX @@ FCMLE0_v 0.10 1110 1.1 00000 11011 0 ..... ..... @qrr_sd FCMLT0_v 0.00 1110 111 11000 11101 0 ..... ..... @qrr_h FCMLT0_v 0.00 1110 1.1 00000 11101 0 ..... ..... @qrr_sd +FRECPE_v 0.00 1110 111 11001 11011 0 ..... ..... @qrr_h +FRECPE_v 0.00 1110 1.1 00001 11011 0 ..... ..... @qrr_sd + +FRSQRTE_v 0.10 1110 111 11001 11011 0 ..... ..... @qrr_h +FRSQRTE_v 0.10 1110 1.1 00001 11011 0 ..... ..... @qrr_sd + &fcvt_q rd rn esz q shift @fcvtq_h . q:1 . ...... 001 .... ...... rn:5 rd:5 \ &fcvt_q esz=1 shift=%fcvt_f_sh_h diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINT64Z_s, aa64_frint, do_fp1_scalar, a, &f_scalar_frint64, FPROUNDING_ZERO) TRANS_FEAT(FRINT64X_s, aa64_frint, do_fp1_scalar, a, &f_scalar_frint64, -1) +static const FPScalar1 f_scalar_frecpe = { + gen_helper_recpe_f16, + gen_helper_recpe_f32, + gen_helper_recpe_f64, +}; +TRANS(FRECPE_s, do_fp1_scalar, a, &f_scalar_frecpe, -1) + +static const FPScalar1 f_scalar_frecpx = { + gen_helper_frecpx_f16, + gen_helper_frecpx_f32, + gen_helper_frecpx_f64, +}; +TRANS(FRECPX_s, do_fp1_scalar, a, &f_scalar_frecpx, -1) + +static const FPScalar1 f_scalar_frsqrte = { + gen_helper_rsqrte_f16, + gen_helper_rsqrte_f32, + gen_helper_rsqrte_f64, +}; +TRANS(FRSQRTE_s, do_fp1_scalar, a, &f_scalar_frsqrte, -1) + static bool trans_FCVT_s_ds(DisasContext *s, arg_rr *a) { if (fp_access_check(s)) { @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_fcle0[] = { }; TRANS(FCMLE0_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_fcle0) +static gen_helper_gvec_2_ptr * const f_frecpe[] = { + gen_helper_gvec_frecpe_h, + gen_helper_gvec_frecpe_s, + gen_helper_gvec_frecpe_d, +}; +TRANS(FRECPE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frecpe) + +static gen_helper_gvec_2_ptr * const f_frsqrte[] = { + gen_helper_gvec_frsqrte_h, + gen_helper_gvec_frsqrte_s, + gen_helper_gvec_frsqrte_d, +}; +TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte) + static void handle_2misc_reciprocal(DisasContext *s, int opcode, bool is_scalar, bool is_u, bool is_q, int size, int rn, int rd) { bool is_double = (size == 3); - TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); if (is_double) { - TCGv_i64 tcg_op = tcg_temp_new_i64(); - TCGv_i64 tcg_res = tcg_temp_new_i64(); - int pass; - - for (pass = 0; pass < (is_scalar ? 1 : 2); pass++) { - read_vec_element(s, tcg_op, rn, pass, MO_64); - switch (opcode) { - case 0x3d: /* FRECPE */ - gen_helper_recpe_f64(tcg_res, tcg_op, fpst); - break; - case 0x3f: /* FRECPX */ - gen_helper_frecpx_f64(tcg_res, tcg_op, fpst); - break; - case 0x7d: /* FRSQRTE */ - gen_helper_rsqrte_f64(tcg_res, tcg_op, fpst); - break; - default: - g_assert_not_reached(); - } - write_vec_element(s, tcg_res, rd, pass, MO_64); - } - clear_vec_high(s, !is_scalar, rd); + g_assert_not_reached(); } else { TCGv_i32 tcg_op = tcg_temp_new_i32(); TCGv_i32 tcg_res = tcg_temp_new_i32(); @@ -XXX,XX +XXX,XX @@ static void handle_2misc_reciprocal(DisasContext *s, int opcode, gen_helper_recpe_u32(tcg_res, tcg_op); break; case 0x3d: /* FRECPE */ - gen_helper_recpe_f32(tcg_res, tcg_op, fpst); - break; case 0x3f: /* FRECPX */ - gen_helper_frecpx_f32(tcg_res, tcg_op, fpst); - break; case 0x7d: /* FRSQRTE */ - gen_helper_rsqrte_f32(tcg_res, tcg_op, fpst); - break; default: g_assert_not_reached(); } @@ -XXX,XX +XXX,XX @@ static void handle_2misc_reciprocal(DisasContext *s, int opcode, } } -/* AdvSIMD scalar two reg misc - * 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0 - * +-----+---+-----------+------+-----------+--------+-----+------+------+ - * | 0 1 | U | 1 1 1 1 0 | size | 1 0 0 0 0 | opcode | 1 0 | Rn | Rd | - * +-----+---+-----------+------+-----------+--------+-----+------+------+ - */ -static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) -{ - int rd = extract32(insn, 0, 5); - int rn = extract32(insn, 5, 5); - int opcode = extract32(insn, 12, 5); - int size = extract32(insn, 22, 2); - bool u = extract32(insn, 29, 1); - - switch (opcode) { - case 0xc ... 0xf: - case 0x16 ... 0x1d: - case 0x1f: - /* Floating point: U, size[1] and opcode indicate operation; - * size[0] indicates single or double precision. - */ - opcode |= (extract32(size, 1, 1) << 5) | (u << 6); - size = extract32(size, 0, 1) ? 3 : 2; - switch (opcode) { - case 0x3d: /* FRECPE */ - case 0x3f: /* FRECPX */ - case 0x7d: /* FRSQRTE */ - if (!fp_access_check(s)) { - return; - } - handle_2misc_reciprocal(s, opcode, true, u, true, size, rn, rd); - return; - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - case 0x1c: /* FCVTAS */ - case 0x5c: /* FCVTAU */ - case 0x56: /* FCVTXN, FCVTXN2 */ - case 0x1d: /* SCVTF */ - case 0x5d: /* UCVTF */ - case 0x2c: /* FCMGT (zero) */ - case 0x2d: /* FCMEQ (zero) */ - case 0x2e: /* FCMLT (zero) */ - case 0x6c: /* FCMGE (zero) */ - case 0x6d: /* FCMLE (zero) */ - default: - unallocated_encoding(s); - return; - } - break; - default: - case 0x3: /* USQADD / SUQADD */ - case 0x7: /* SQABS / SQNEG */ - case 0x8: /* CMGT, CMGE */ - case 0x9: /* CMEQ, CMLE */ - case 0xa: /* CMLT */ - case 0xb: /* ABS, NEG */ - case 0x12: /* SQXTUN */ - case 0x14: /* SQXTN, UQXTN */ - unallocated_encoding(s); - return; - } - g_assert_not_reached(); -} - static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q, int size, int rn, int rd) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) unallocated_encoding(s); return; } - /* fall through */ - case 0x3d: /* FRECPE */ - case 0x7d: /* FRSQRTE */ - if (size == 3 && !is_q) { - unallocated_encoding(s); - return; - } if (!fp_access_check(s)) { return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x2e: /* FCMLT (zero) */ case 0x6c: /* FCMGE (zero) */ case 0x6d: /* FCMLE (zero) */ + case 0x3d: /* FRECPE */ + case 0x7d: /* FRSQRTE */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } } -/* AdvSIMD [scalar] two register miscellaneous (FP16) - * - * 31 30 29 28 27 24 23 22 21 17 16 12 11 10 9 5 4 0 - * +---+---+---+---+---------+---+-------------+--------+-----+------+------+ - * | 0 | Q | U | S | 1 1 1 0 | a | 1 1 1 1 0 0 | opcode | 1 0 | Rn | Rd | - * +---+---+---+---+---------+---+-------------+--------+-----+------+------+ - * mask: 1000 1111 0111 1110 0000 1100 0000 0000 0x8f7e 0c00 - * val: 0000 1110 0111 1000 0000 1000 0000 0000 0x0e78 0800 - * - * This actually covers two groups where scalar access is governed by - * bit 28. A bunch of the instructions (float to integral) only exist - * in the vector form and are un-allocated for the scalar decode. Also - * in the scalar decode Q is always 1. - */ -static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) -{ - int fpop, opcode, a, u; - int rn, rd; - bool is_q; - bool is_scalar; - - int pass; - TCGv_i32 tcg_rmode = NULL; - TCGv_ptr tcg_fpstatus = NULL; - bool need_fpst = true; - int rmode = -1; - - if (!dc_isar_feature(aa64_fp16, s)) { - unallocated_encoding(s); - return; - } - - rd = extract32(insn, 0, 5); - rn = extract32(insn, 5, 5); - - a = extract32(insn, 23, 1); - u = extract32(insn, 29, 1); - is_scalar = extract32(insn, 28, 1); - is_q = extract32(insn, 30, 1); - - opcode = extract32(insn, 12, 5); - fpop = deposit32(opcode, 5, 1, a); - fpop = deposit32(fpop, 6, 1, u); - - switch (fpop) { - case 0x3d: /* FRECPE */ - case 0x3f: /* FRECPX */ - break; - case 0x7d: /* FRSQRTE */ - break; - default: - case 0x2f: /* FABS */ - case 0x6f: /* FNEG */ - case 0x7f: /* FSQRT (vector) */ - case 0x18: /* FRINTN */ - case 0x19: /* FRINTM */ - case 0x38: /* FRINTP */ - case 0x39: /* FRINTZ */ - case 0x58: /* FRINTA */ - case 0x59: /* FRINTX */ - case 0x79: /* FRINTI */ - case 0x1d: /* SCVTF */ - case 0x5d: /* UCVTF */ - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x1c: /* FCVTAS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x5c: /* FCVTAU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - case 0x2c: /* FCMGT (zero) */ - case 0x2d: /* FCMEQ (zero) */ - case 0x2e: /* FCMLT (zero) */ - case 0x6c: /* FCMGE (zero) */ - case 0x6d: /* FCMLE (zero) */ - unallocated_encoding(s); - return; - } - - - /* Check additional constraints for the scalar encoding */ - if (is_scalar) { - if (!is_q) { - unallocated_encoding(s); - return; - } - } - - if (!fp_access_check(s)) { - return; - } - - if (rmode >= 0 || need_fpst) { - tcg_fpstatus = fpstatus_ptr(FPST_FPCR_F16); - } - - if (rmode >= 0) { - tcg_rmode = gen_set_rmode(rmode, tcg_fpstatus); - } - - if (is_scalar) { - TCGv_i32 tcg_op = read_fp_hreg(s, rn); - TCGv_i32 tcg_res = tcg_temp_new_i32(); - - switch (fpop) { - case 0x3d: /* FRECPE */ - gen_helper_recpe_f16(tcg_res, tcg_op, tcg_fpstatus); - break; - case 0x3f: /* FRECPX */ - gen_helper_frecpx_f16(tcg_res, tcg_op, tcg_fpstatus); - break; - case 0x7d: /* FRSQRTE */ - gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); - break; - default: - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x1c: /* FCVTAS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x5c: /* FCVTAU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - g_assert_not_reached(); - } - - /* limit any sign extension going on */ - tcg_gen_andi_i32(tcg_res, tcg_res, 0xffff); - write_fp_sreg(s, rd, tcg_res); - } else { - for (pass = 0; pass < (is_q ? 8 : 4); pass++) { - TCGv_i32 tcg_op = tcg_temp_new_i32(); - TCGv_i32 tcg_res = tcg_temp_new_i32(); - - read_vec_element_i32(s, tcg_op, rn, pass, MO_16); - - switch (fpop) { - case 0x3d: /* FRECPE */ - gen_helper_recpe_f16(tcg_res, tcg_op, tcg_fpstatus); - break; - case 0x7d: /* FRSQRTE */ - gen_helper_rsqrte_f16(tcg_res, tcg_op, tcg_fpstatus); - break; - default: - case 0x2f: /* FABS */ - case 0x6f: /* FNEG */ - case 0x7f: /* FSQRT */ - case 0x18: /* FRINTN */ - case 0x19: /* FRINTM */ - case 0x38: /* FRINTP */ - case 0x39: /* FRINTZ */ - case 0x58: /* FRINTA */ - case 0x79: /* FRINTI */ - case 0x59: /* FRINTX */ - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x1c: /* FCVTAS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x5c: /* FCVTAU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - g_assert_not_reached(); - } - - write_vec_element_i32(s, tcg_res, rd, pass, MO_16); - } - - clear_vec_high(s, is_q, rd); - } - - if (tcg_rmode) { - gen_restore_rmode(tcg_rmode, tcg_fpstatus); - } -} - /* C3.6 Data processing - SIMD, inc Crypto * * As the decode gets a little complex we are using a table based @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) static const AArch64DecodeTable data_proc_simd[] = { /* pattern , mask , fn */ { 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc }, - { 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc }, - { 0x0e780800, 0x8f7e0c00, disas_simd_two_reg_misc_fp16 }, { 0x00000000, 0x00000000, NULL } }; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-68-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.h | 3 +++ target/arm/tcg/translate.h | 5 +++++ target/arm/tcg/gengvec.c | 16 ++++++++++++++++ target/arm/tcg/translate-neon.c | 4 ++-- target/arm/tcg/vec_helper.c | 22 ++++++++++++++++++++++ 5 files changed, 48 insertions(+), 2 deletions(-) diff --git a/target/arm/helper.h b/target/arm/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.h +++ b/target/arm/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_uminp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_uminp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_uminp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_urecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(gvec_ursqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + #ifdef TARGET_AARCH64 #include "tcg/helper-a64.h" #include "tcg/helper-sve.h" diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_fabs(unsigned vece, uint32_t dofs, uint32_t aofs, void gen_gvec_fneg(unsigned vece, uint32_t dofs, uint32_t aofs, uint32_t oprsz, uint32_t maxsz); +void gen_gvec_urecpe(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_ursqrte(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz); + /* * Forward to the isar_feature_* tests given a DisasContext pointer. */ diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/gengvec.c +++ b/target/arm/tcg/gengvec.c @@ -XXX,XX +XXX,XX @@ void gen_gvec_fneg(unsigned vece, uint32_t dofs, uint32_t aofs, uint64_t s_bit = 1ull << ((8 << vece) - 1); tcg_gen_gvec_xori(vece, dofs, aofs, s_bit, oprsz, maxsz); } + +void gen_gvec_urecpe(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + assert(vece == MO_32); + tcg_gen_gvec_2_ool(rd_ofs, rn_ofs, opr_sz, max_sz, 0, + gen_helper_gvec_urecpe_s); +} + +void gen_gvec_ursqrte(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + assert(vece == MO_32); + tcg_gen_gvec_2_ool(rd_ofs, rn_ofs, opr_sz, max_sz, 0, + gen_helper_gvec_ursqrte_s); +} diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool trans_VRECPE(DisasContext *s, arg_2misc *a) if (a->size != 2) { return false; } - return do_2misc(s, a, gen_helper_recpe_u32); + return do_2misc_vec(s, a, gen_gvec_urecpe); } static bool trans_VRSQRTE(DisasContext *s, arg_2misc *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VRSQRTE(DisasContext *s, arg_2misc *a) if (a->size != 2) { return false; } - return do_2misc(s, a, gen_helper_rsqrte_u32); + return do_2misc_vec(s, a, gen_gvec_ursqrte); } #define WRAP_1OP_ENV_FN(WRAPNAME, FUNC) \ diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_rbit_b)(void *vd, void *vn, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +void HELPER(gvec_urecpe_s)(void *vd, void *vn, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc); + uint32_t *d = vd, *n = vn; + + for (i = 0; i < opr_sz / 4; ++i) { + d[i] = helper_recpe_u32(n[i]); + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} + +void HELPER(gvec_ursqrte_s)(void *vd, void *vn, uint32_t desc) +{ + intptr_t i, opr_sz = simd_oprsz(desc); + uint32_t *d = vd, *n = vn; + + for (i = 0; i < opr_sz / 4; ++i) { + d[i] = helper_rsqrte_u32(n[i]); + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove handle_2misc_reciprocal as these were the last insns decoded by that function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-69-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 3 + target/arm/tcg/translate-a64.c | 139 ++------------------------------- 2 files changed, 8 insertions(+), 134 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FRECPE_v 0.00 1110 1.1 00001 11011 0 ..... ..... @qrr_sd FRSQRTE_v 0.10 1110 111 11001 11011 0 ..... ..... @qrr_h FRSQRTE_v 0.10 1110 1.1 00001 11011 0 ..... ..... @qrr_sd +URECPE_v 0.00 1110 101 00001 11001 0 ..... ..... @qrr_s +URSQRTE_v 0.10 1110 101 00001 11001 0 ..... ..... @qrr_s + &fcvt_q rd rn esz q shift @fcvtq_h . q:1 . ...... 001 .... ...... rn:5 rd:5 \ &fcvt_q esz=1 shift=%fcvt_f_sh_h diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TRANS(CMLE0_v, do_gvec_fn2, a, gen_gvec_cle0) TRANS(CMEQ0_v, do_gvec_fn2, a, gen_gvec_ceq0) TRANS(REV16_v, do_gvec_fn2, a, gen_gvec_rev16) TRANS(REV32_v, do_gvec_fn2, a, gen_gvec_rev32) +TRANS(URECPE_v, do_gvec_fn2, a, gen_gvec_urecpe) +TRANS(URSQRTE_v, do_gvec_fn2, a, gen_gvec_ursqrte) static bool do_gvec_fn2_bhs(DisasContext *s, arg_qrr_e *a, GVecGen2Fn *fn) { @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frsqrte[] = { }; TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte) -static void handle_2misc_reciprocal(DisasContext *s, int opcode, - bool is_scalar, bool is_u, bool is_q, - int size, int rn, int rd) -{ - bool is_double = (size == 3); - - if (is_double) { - g_assert_not_reached(); - } else { - TCGv_i32 tcg_op = tcg_temp_new_i32(); - TCGv_i32 tcg_res = tcg_temp_new_i32(); - int pass, maxpasses; - - if (is_scalar) { - maxpasses = 1; - } else { - maxpasses = is_q ? 4 : 2; - } - - for (pass = 0; pass < maxpasses; pass++) { - read_vec_element_i32(s, tcg_op, rn, pass, MO_32); - - switch (opcode) { - case 0x3c: /* URECPE */ - gen_helper_recpe_u32(tcg_res, tcg_op); - break; - case 0x3d: /* FRECPE */ - case 0x3f: /* FRECPX */ - case 0x7d: /* FRSQRTE */ - default: - g_assert_not_reached(); - } - - if (is_scalar) { - write_fp_sreg(s, rd, tcg_res); - } else { - write_vec_element_i32(s, tcg_res, rd, pass, MO_32); - } - } - if (!is_scalar) { - clear_vec_high(s, is_q, rd); - } - } -} - static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q, int size, int rn, int rd) { @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) bool is_q = extract32(insn, 30, 1); int rn = extract32(insn, 5, 5); int rd = extract32(insn, 0, 5); - bool need_fpstatus = false; - int rmode = -1; - TCGv_i32 tcg_rmode; - TCGv_ptr tcg_fpstatus; switch (opcode) { case 0xc ... 0xf: @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) opcode |= (extract32(size, 1, 1) << 5) | (u << 6); size = is_double ? 3 : 2; switch (opcode) { - case 0x3c: /* URECPE */ - if (size == 3) { - unallocated_encoding(s); - return; - } - if (!fp_access_check(s)) { - return; - } - handle_2misc_reciprocal(s, opcode, false, u, is_q, size, rn, rd); - return; case 0x17: /* FCVTL, FCVTL2 */ if (!fp_access_check(s)) { return; } handle_2misc_widening(s, opcode, is_q, size, rn, rd); return; - case 0x7c: /* URSQRTE */ - if (size == 3) { - unallocated_encoding(s); - return; - } - break; default: case 0x16: /* FCVTN, FCVTN2 */ case 0x36: /* BFCVTN, BFCVTN2 */ @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x6d: /* FCMLE (zero) */ case 0x3d: /* FRECPE */ case 0x7d: /* FRSQRTE */ + case 0x3c: /* URECPE */ + case 0x7c: /* URSQRTE */ unallocated_encoding(s); return; } @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) unallocated_encoding(s); return; } - - if (!fp_access_check(s)) { - return; - } - - if (need_fpstatus || rmode >= 0) { - tcg_fpstatus = fpstatus_ptr(FPST_FPCR); - } else { - tcg_fpstatus = NULL; - } - if (rmode >= 0) { - tcg_rmode = gen_set_rmode(rmode, tcg_fpstatus); - } else { - tcg_rmode = NULL; - } - - { - int pass; - - assert(size == 2); - for (pass = 0; pass < (is_q ? 4 : 2); pass++) { - TCGv_i32 tcg_op = tcg_temp_new_i32(); - TCGv_i32 tcg_res = tcg_temp_new_i32(); - - read_vec_element_i32(s, tcg_op, rn, pass, MO_32); - - { - /* Special cases for 32 bit elements */ - switch (opcode) { - case 0x7c: /* URSQRTE */ - gen_helper_rsqrte_u32(tcg_res, tcg_op); - break; - default: - case 0x7: /* SQABS, SQNEG */ - case 0x2f: /* FABS */ - case 0x6f: /* FNEG */ - case 0x7f: /* FSQRT */ - case 0x18: /* FRINTN */ - case 0x19: /* FRINTM */ - case 0x38: /* FRINTP */ - case 0x39: /* FRINTZ */ - case 0x58: /* FRINTA */ - case 0x79: /* FRINTI */ - case 0x59: /* FRINTX */ - case 0x1e: /* FRINT32Z */ - case 0x5e: /* FRINT32X */ - case 0x1f: /* FRINT64Z */ - case 0x5f: /* FRINT64X */ - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x1c: /* FCVTAS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x5c: /* FCVTAU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - g_assert_not_reached(); - } - } - write_vec_element_i32(s, tcg_res, rd, pass, MO_32); - } - } - clear_vec_high(s, is_q, rd); - - if (tcg_rmode) { - gen_restore_rmode(tcg_rmode, tcg_fpstatus); - } + g_assert_not_reached(); } /* C3.6 Data processing - SIMD, inc Crypto -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Remove lookup_disas_fn, handle_2misc_widening, disas_simd_two_reg_misc, disas_data_proc_simd, disas_data_proc_simd_fp, disas_a64_legacy, as this is the final insn to be converted. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241211163036.2297116-70-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/a64.decode | 2 + target/arm/tcg/translate-a64.c | 202 +++------------------------------ 2 files changed, 18 insertions(+), 186 deletions(-) diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -XXX,XX +XXX,XX @@ FRSQRTE_v 0.10 1110 1.1 00001 11011 0 ..... ..... @qrr_sd URECPE_v 0.00 1110 101 00001 11001 0 ..... ..... @qrr_s URSQRTE_v 0.10 1110 101 00001 11001 0 ..... ..... @qrr_s +FCVTL_v 0.00 1110 0.1 00001 01111 0 ..... ..... @qrr_sd + &fcvt_q rd rn esz q shift @fcvtq_h . q:1 . ...... 001 .... ...... rn:5 rd:5 \ &fcvt_q esz=1 shift=%fcvt_f_sh_h diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static inline void gen_check_sp_alignment(DisasContext *s) */ } -/* - * This provides a simple table based table lookup decoder. It is - * intended to be used when the relevant bits for decode are too - * awkwardly placed and switch/if based logic would be confusing and - * deeply nested. Since it's a linear search through the table, tables - * should be kept small. - * - * It returns the first handler where insn & mask == pattern, or - * NULL if there is no match. - * The table is terminated by an empty mask (i.e. 0) - */ -static inline AArch64DecodeFn *lookup_disas_fn(const AArch64DecodeTable *table, - uint32_t insn) -{ - const AArch64DecodeTable *tptr = table; - - while (tptr->mask) { - if ((insn & tptr->mask) == tptr->pattern) { - return tptr->disas_fn; - } - tptr++; - } - return NULL; -} - /* * The instruction disassembly implemented here matches * the instruction encoding classifications in chapter C4 @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2_ptr * const f_frsqrte[] = { }; TRANS(FRSQRTE_v, do_gvec_op2_fpst, a->esz, a->q, a->rd, a->rn, 0, f_frsqrte) -static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q, - int size, int rn, int rd) +static bool trans_FCVTL_v(DisasContext *s, arg_qrr_e *a) { /* Handle 2-reg-misc ops which are widening (so each size element * in the source becomes a 2*size element in the destination. @@ -XXX,XX +XXX,XX @@ static void handle_2misc_widening(DisasContext *s, int opcode, bool is_q, */ int pass; - if (size == 3) { + if (!fp_access_check(s)) { + return true; + } + + if (a->esz == MO_64) { /* 32 -> 64 bit fp conversion */ TCGv_i64 tcg_res[2]; - int srcelt = is_q ? 2 : 0; + TCGv_i32 tcg_op = tcg_temp_new_i32(); + int srcelt = a->q ? 2 : 0; for (pass = 0; pass < 2; pass++) { - TCGv_i32 tcg_op = tcg_temp_new_i32(); tcg_res[pass] = tcg_temp_new_i64(); - - read_vec_element_i32(s, tcg_op, rn, srcelt + pass, MO_32); + read_vec_element_i32(s, tcg_op, a->rn, srcelt + pass, MO_32); gen_helper_vfp_fcvtds(tcg_res[pass], tcg_op, tcg_env); } for (pass = 0; pass < 2; pass++) { - write_vec_element(s, tcg_res[pass], rd, pass, MO_64); + write_vec_element(s, tcg_res[pass], a->rd, pass, MO_64); } } else { /* 16 -> 32 bit fp conversion */ - int srcelt = is_q ? 4 : 0; + int srcelt = a->q ? 4 : 0; TCGv_i32 tcg_res[4]; TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); TCGv_i32 ahp = get_ahp_flag(); for (pass = 0; pass < 4; pass++) { tcg_res[pass] = tcg_temp_new_i32(); - - read_vec_element_i32(s, tcg_res[pass], rn, srcelt + pass, MO_16); + read_vec_element_i32(s, tcg_res[pass], a->rn, srcelt + pass, MO_16); gen_helper_vfp_fcvt_f16_to_f32(tcg_res[pass], tcg_res[pass], fpst, ahp); } for (pass = 0; pass < 4; pass++) { - write_vec_element_i32(s, tcg_res[pass], rd, pass, MO_32); + write_vec_element_i32(s, tcg_res[pass], a->rd, pass, MO_32); } } -} - -/* AdvSIMD two reg misc - * 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0 - * +---+---+---+-----------+------+-----------+--------+-----+------+------+ - * | 0 | Q | U | 0 1 1 1 0 | size | 1 0 0 0 0 | opcode | 1 0 | Rn | Rd | - * +---+---+---+-----------+------+-----------+--------+-----+------+------+ - */ -static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) -{ - int size = extract32(insn, 22, 2); - int opcode = extract32(insn, 12, 5); - bool u = extract32(insn, 29, 1); - bool is_q = extract32(insn, 30, 1); - int rn = extract32(insn, 5, 5); - int rd = extract32(insn, 0, 5); - - switch (opcode) { - case 0xc ... 0xf: - case 0x16 ... 0x1f: - { - /* Floating point: U, size[1] and opcode indicate operation; - * size[0] indicates single or double precision. - */ - int is_double = extract32(size, 0, 1); - opcode |= (extract32(size, 1, 1) << 5) | (u << 6); - size = is_double ? 3 : 2; - switch (opcode) { - case 0x17: /* FCVTL, FCVTL2 */ - if (!fp_access_check(s)) { - return; - } - handle_2misc_widening(s, opcode, is_q, size, rn, rd); - return; - default: - case 0x16: /* FCVTN, FCVTN2 */ - case 0x36: /* BFCVTN, BFCVTN2 */ - case 0x56: /* FCVTXN, FCVTXN2 */ - case 0x2f: /* FABS */ - case 0x6f: /* FNEG */ - case 0x7f: /* FSQRT */ - case 0x18: /* FRINTN */ - case 0x19: /* FRINTM */ - case 0x38: /* FRINTP */ - case 0x39: /* FRINTZ */ - case 0x59: /* FRINTX */ - case 0x79: /* FRINTI */ - case 0x58: /* FRINTA */ - case 0x1e: /* FRINT32Z */ - case 0x1f: /* FRINT64Z */ - case 0x5e: /* FRINT32X */ - case 0x5f: /* FRINT64X */ - case 0x1d: /* SCVTF */ - case 0x5d: /* UCVTF */ - case 0x1a: /* FCVTNS */ - case 0x1b: /* FCVTMS */ - case 0x3a: /* FCVTPS */ - case 0x3b: /* FCVTZS */ - case 0x5a: /* FCVTNU */ - case 0x5b: /* FCVTMU */ - case 0x7a: /* FCVTPU */ - case 0x7b: /* FCVTZU */ - case 0x5c: /* FCVTAU */ - case 0x1c: /* FCVTAS */ - case 0x2c: /* FCMGT (zero) */ - case 0x2d: /* FCMEQ (zero) */ - case 0x2e: /* FCMLT (zero) */ - case 0x6c: /* FCMGE (zero) */ - case 0x6d: /* FCMLE (zero) */ - case 0x3d: /* FRECPE */ - case 0x7d: /* FRSQRTE */ - case 0x3c: /* URECPE */ - case 0x7c: /* URSQRTE */ - unallocated_encoding(s); - return; - } - break; - } - default: - case 0x0: /* REV64, REV32 */ - case 0x1: /* REV16 */ - case 0x2: /* SADDLP, UADDLP */ - case 0x3: /* SUQADD, USQADD */ - case 0x4: /* CLS, CLZ */ - case 0x5: /* CNT, NOT, RBIT */ - case 0x6: /* SADALP, UADALP */ - case 0x7: /* SQABS, SQNEG */ - case 0x8: /* CMGT, CMGE */ - case 0x9: /* CMEQ, CMLE */ - case 0xa: /* CMLT */ - case 0xb: /* ABS, NEG */ - case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */ - case 0x13: /* SHLL, SHLL2 */ - case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */ - unallocated_encoding(s); - return; - } - g_assert_not_reached(); -} - -/* C3.6 Data processing - SIMD, inc Crypto - * - * As the decode gets a little complex we are using a table based - * approach for this part of the decode. - */ -static const AArch64DecodeTable data_proc_simd[] = { - /* pattern , mask , fn */ - { 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc }, - { 0x00000000, 0x00000000, NULL } -}; - -static void disas_data_proc_simd(DisasContext *s, uint32_t insn) -{ - /* Note that this is called with all non-FP cases from - * table C3-6 so it must UNDEF for entries not specifically - * allocated to instructions in that table. - */ - AArch64DecodeFn *fn = lookup_disas_fn(&data_proc_simd[0], insn); - if (fn) { - fn(s, insn); - } else { - unallocated_encoding(s); - } -} - -/* C3.6 Data processing - SIMD and floating point */ -static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn) -{ - if (extract32(insn, 28, 1) == 1 && extract32(insn, 30, 1) == 0) { - unallocated_encoding(s); /* in decodetree */ - } else { - /* SIMD, including crypto */ - disas_data_proc_simd(s, insn); - } + clear_vec_high(s, true, a->rd); + return true; } static bool trans_OK(DisasContext *s, arg_OK *a) @@ -XXX,XX +XXX,XX @@ static bool btype_destination_ok(uint32_t insn, bool bt, int btype) return false; } -/* C3.1 A64 instruction index by encoding */ -static void disas_a64_legacy(DisasContext *s, uint32_t insn) -{ - switch (extract32(insn, 25, 4)) { - case 0x7: - case 0xf: /* Data processing - SIMD and floating point */ - disas_data_proc_simd_fp(s, insn); - break; - default: - unallocated_encoding(s); - break; - } -} - static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cpu) { @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu) if (!disas_a64(s, insn) && !disas_sme(s, insn) && !disas_sve(s, insn)) { - disas_a64_legacy(s, insn); + unallocated_encoding(s); } /* -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Softfloat has native support for round-to-odd. Use it. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241206031428.78634-1-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-a64.c | 18 ++++-------------- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -XXX,XX +XXX,XX @@ float64 HELPER(frecpx_f64)(float64 a, void *fpstp) float32 HELPER(fcvtx_f64_to_f32)(float64 a, CPUARMState *env) { - /* Von Neumann rounding is implemented by using round-to-zero - * and then setting the LSB of the result if Inexact was raised. - */ float32 r; float_status *fpst = &env->vfp.fp_status; - float_status tstat = *fpst; - int exflags; + int old = get_float_rounding_mode(fpst); - set_float_rounding_mode(float_round_to_zero, &tstat); - set_float_exception_flags(0, &tstat); - r = float64_to_float32(a, &tstat); - exflags = get_float_exception_flags(&tstat); - if (exflags & float_flag_inexact) { - r = make_float32(float32_val(r) | 1); - } - exflags |= get_float_exception_flags(fpst); - set_float_exception_flags(exflags, fpst); + set_float_rounding_mode(float_round_to_odd, fpst); + r = float64_to_float32(a, fpst); + set_float_rounding_mode(old, fpst); return r; } -- 2.34.1
From: Pierrick Bouvier <pierrick.bouvier@linaro.org> www.orangepi.org does not support https, it's expected to stick to http. Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-id: 20241206192254.3889131-2-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/orangepi.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/system/arm/orangepi.rst b/docs/system/arm/orangepi.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/arm/orangepi.rst +++ b/docs/system/arm/orangepi.rst @@ -XXX,XX +XXX,XX @@ Orange Pi PC images Note that the mainline kernel does not have a root filesystem. You may provide it with an official Orange Pi PC image from the official website: - http://www.orangepi.org/downloadresources/ + http://www.orangepi.org/html/serviceAndSupport/index.html Another possibility is to run an Armbian image for Orange Pi PC which can be downloaded from: @@ -XXX,XX +XXX,XX @@ including the Orange Pi PC. NetBSD 9.0 is known to work best for the Orange Pi P board and provides a fully working system with serial console, networking and storage. For the Orange Pi PC machine, get the 'evbarm-earmv7hf' based image from: - https://cdn.netbsd.org/pub/NetBSD/NetBSD-9.0/evbarm-earmv7hf/binary/gzimg/armv7.img.gz + https://archive.netbsd.org/pub/NetBSD-archive/NetBSD-9.0/evbarm-earmv7hf/binary/gzimg/armv7.img.gz The image requires manually installing U-Boot in the image. Build U-Boot with the orangepi_pc_defconfig configuration as described in the previous section. -- 2.34.1
From: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-id: 20241206192254.3889131-3-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/fby35.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/system/arm/fby35.rst b/docs/system/arm/fby35.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/arm/fby35.rst +++ b/docs/system/arm/fby35.rst @@ -XXX,XX +XXX,XX @@ process starts. $ screen /dev/tty0 # In a separate TMUX pane, terminal window, etc. $ screen /dev/tty1 $ (qemu) c # Start the boot process once screen is setup. + +This machine model supports emulation of the boot from the CE0 flash device by +setting option ``execute-in-place``. When using this option, the CPU fetches +instructions to execute by reading CE0 and not from a preloaded ROM +initialized at machine init time. As a result, execution will be slower. -- 2.34.1
From: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-id: 20241206192254.3889131-4-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/xlnx-versal-virt.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/system/arm/xlnx-versal-virt.rst b/docs/system/arm/xlnx-versal-virt.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/arm/xlnx-versal-virt.rst +++ b/docs/system/arm/xlnx-versal-virt.rst @@ -XXX,XX +XXX,XX @@ Run the following at the U-Boot prompt: fdt set /chosen/dom0 reg <0x00000000 0x40000000 0x0 0x03100000> booti 30000000 - 20000000 +It's possible to change the OSPI flash model emulated by using the machine model +option ``ospi-flash``. + BBRAM File Backend """""""""""""""""" BBRAM can have an optional file backend, which must be a seekable -- 2.34.1
From: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-id: 20241206192254.3889131-5-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/virt.rst | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -XXX,XX +XXX,XX @@ iommu ``smmuv3`` Create an SMMUv3 +default-bus-bypass-iommu + Set ``on``/``off`` to enable/disable `bypass_iommu + <https://gitlab.com/qemu-project/qemu/-/blob/master/docs/bypass-iommu.txt>`_ + for default root bus. + ras Set ``on``/``off`` to enable/disable reporting host memory errors to a guest using ACPI and guest external abort exceptions. The default is off. +acpi + Set ``on``/``off``/``auto`` to enable/disable ACPI. + dtb-randomness Set ``on``/``off`` to pass random seeds via the guest DTB rng-seed and kaslr-seed nodes (in both "/chosen" and @@ -XXX,XX +XXX,XX @@ dtb-randomness dtb-kaslr-seed A deprecated synonym for dtb-randomness. +x-oem-id + Set string (up to 6 bytes) to override the default value of field OEMID in ACPI + table header. + +x-oem-table-id + Set string (up to 8 bytes) to override the default value of field OEM Table ID + in ACPI table header. + Linux guest kernel configuration """""""""""""""""""""""""""""""" -- 2.34.1
From: Brian Cain <brian.cain@oss.qualcomm.com> Mea culpa, I don't know how I got this wrong in 2dfe93699c. Still getting used to the new address, I suppose. Somehow I got it right in the mailmap, though. Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com> Message-id: 20241209181242.1434231-1-brian.cain@oss.qualcomm.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index XXXXXXX..XXXXXXX 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -XXX,XX +XXX,XX @@ F: target/avr/ F: tests/functional/test_avr_mega2560.py Hexagon TCG CPUs -M: Brian Cain <bcain@oss.qualcomm.com> +M: Brian Cain <brian.cain@oss.qualcomm.com> S: Supported F: target/hexagon/ X: target/hexagon/idef-parser/ -- 2.34.1
target/arm/helper.c is very large and unwieldy. One subset of code that we can pull out into its own file is the cpreg arrays and corresponding functions for the TLBI instructions. Because these are instructions they are only relevant for TCG and we can make the new file only be built for CONFIG_TCG. In this commit we move the AArch32 instructions from: not_v7_cp_reginfo[] v7_cp_reginfo[] v7mp_cp_reginfo[] v8_cp_reginfo[] into a new file target/arm/tcg/tlb-insns.c. A few small functions are used both by functions we haven't yet moved across and by functions we have already moved. We temporarily make these global with a prototype in cpregs.h; when the move of all TLBI insns is complete these will return to being file-local. For CONFIG_TCG, this is just moving code around. For a KVM only build, these cpregs will no longer be added to the cpregs hashtable for the CPU. However this should not be a behaviour change, because: * we never try to migration sync or otherwise include ARM_CP_NO_RAW cpregs * for migration we treat the kernel's list of system registers as the authoritative one, so these TLBI insns were never in it anyway The no-tcg stub of define_tlb_insn_regs() therefore does nothing. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-2-peter.maydell@linaro.org --- target/arm/cpregs.h | 14 +++ target/arm/internals.h | 3 + target/arm/helper.c | 231 ++-------------------------------- target/arm/tcg-stubs.c | 5 + target/arm/tcg/tlb-insns.c | 246 +++++++++++++++++++++++++++++++++++++ target/arm/tcg/meson.build | 1 + 6 files changed, 280 insertions(+), 220 deletions(-) create mode 100644 target/arm/tcg/tlb-insns.c diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpregs.h +++ b/target/arm/cpregs.h @@ -XXX,XX +XXX,XX @@ static inline bool arm_cpreg_traps_in_nv(const ARMCPRegInfo *ri) return ri->opc1 == 4 || ri->opc1 == 5; } +/* + * Temporary declarations of functions until the move to tlb_insn_helper.c + * is complete and we can make the functions static again + */ +CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread); +CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread); +bool tlb_force_broadcast(CPUARMState *env); +void tlbimva_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value); +void tlbimva_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value); + #endif /* TARGET_ARM_CPREGS_H */ diff --git a/target/arm/internals.h b/target/arm/internals.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/internals.h +++ b/target/arm/internals.h @@ -XXX,XX +XXX,XX @@ static inline uint64_t pauth_ptr_mask(ARMVAParameters param) /* Add the cpreg definitions for debug related system registers */ void define_debug_regs(ARMCPU *cpu); +/* Add the cpreg definitions for TLBI instructions */ +void define_tlb_insn_regs(ARMCPU *cpu); + /* Effective value of MDCR_EL2 */ static inline uint64_t arm_mdcr_el2_eff(CPUARMState *env) { diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tacr(CPUARMState *env, const ARMCPRegInfo *ri, } /* Check for traps from EL1 due to HCR_EL2.TTLB. */ -static CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, - bool isread) +CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread) { if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TTLB)) { return CP_ACCESS_TRAP_EL2; @@ -XXX,XX +XXX,XX @@ static CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, } /* Check for traps from EL1 due to HCR_EL2.TTLB or TTLBIS. */ -static CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, - bool isread) +CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread) { if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & (HCR_TTLB | HCR_TTLBIS))) { @@ -XXX,XX +XXX,XX @@ static int alle1_tlbmask(CPUARMState *env) ARMMMUIdxBit_Stage2_S); } - -/* IS variants of TLB operations must affect all cores */ -static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_all_cpus_synced(cs); -} - -static void tlbiasid_is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_all_cpus_synced(cs); -} - -static void tlbimva_is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_page_all_cpus_synced(cs, value & TARGET_PAGE_MASK); -} - -static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_page_all_cpus_synced(cs, value & TARGET_PAGE_MASK); -} - /* * Non-IS variants of TLB operations are upgraded to * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to * force broadcast of these operations. */ -static bool tlb_force_broadcast(CPUARMState *env) +bool tlb_force_broadcast(CPUARMState *env) { return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB); } -static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - /* Invalidate all (TLBIALL) */ - CPUState *cs = env_cpu(env); - - if (tlb_force_broadcast(env)) { - tlb_flush_all_cpus_synced(cs); - } else { - tlb_flush(cs); - } -} - -static void tlbimva_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - /* Invalidate single TLB entry by MVA and ASID (TLBIMVA) */ - CPUState *cs = env_cpu(env); - - value &= TARGET_PAGE_MASK; - if (tlb_force_broadcast(env)) { - tlb_flush_page_all_cpus_synced(cs, value); - } else { - tlb_flush_page(cs, value); - } -} - -static void tlbiasid_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - /* Invalidate by ASID (TLBIASID) */ - CPUState *cs = env_cpu(env); - - if (tlb_force_broadcast(env)) { - tlb_flush_all_cpus_synced(cs); - } else { - tlb_flush(cs); - } -} - -static void tlbimvaa_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - /* Invalidate single entry by MVA, all ASIDs (TLBIMVAA) */ - CPUState *cs = env_cpu(env); - - value &= TARGET_PAGE_MASK; - if (tlb_force_broadcast(env)) { - tlb_flush_page_all_cpus_synced(cs, value); - } else { - tlb_flush_page(cs, value); - } -} - static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbiall_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_E2); } -static void tlbimva_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) +void tlbimva_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) { CPUState *cs = env_cpu(env); uint64_t pageaddr = value & ~MAKE_64BIT_MASK(0, 12); @@ -XXX,XX +XXX,XX @@ static void tlbimva_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_E2); } -static void tlbimva_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) +void tlbimva_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) { CPUState *cs = env_cpu(env); uint64_t pageaddr = value & ~MAKE_64BIT_MASK(0, 12); @@ -XXX,XX +XXX,XX @@ static void tlbimva_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, ARMMMUIdxBit_E2); } -static void tlbiipas2_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - uint64_t pageaddr = (value & MAKE_64BIT_MASK(0, 28)) << 12; - - tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2); -} - -static void tlbiipas2is_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - uint64_t pageaddr = (value & MAKE_64BIT_MASK(0, 28)) << 12; - - tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, ARMMMUIdxBit_Stage2); -} - static const ARMCPRegInfo cp_reginfo[] = { /* * Define the secure and non-secure FCSE identifier CP registers @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo not_v7_cp_reginfo[] = { */ { .name = "DBGDIDR", .cp = 14, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 0, .access = PL0_R, .type = ARM_CP_CONST, .resetvalue = 0 }, - /* - * MMU TLB control. Note that the wildcarding means we cover not just - * the unified TLB ops but also the dside/iside/inner-shareable variants. - */ - { .name = "TLBIALL", .cp = 15, .crn = 8, .crm = CP_ANY, - .opc1 = CP_ANY, .opc2 = 0, .access = PL1_W, .writefn = tlbiall_write, - .type = ARM_CP_NO_RAW }, - { .name = "TLBIMVA", .cp = 15, .crn = 8, .crm = CP_ANY, - .opc1 = CP_ANY, .opc2 = 1, .access = PL1_W, .writefn = tlbimva_write, - .type = ARM_CP_NO_RAW }, - { .name = "TLBIASID", .cp = 15, .crn = 8, .crm = CP_ANY, - .opc1 = CP_ANY, .opc2 = 2, .access = PL1_W, .writefn = tlbiasid_write, - .type = ARM_CP_NO_RAW }, - { .name = "TLBIMVAA", .cp = 15, .crn = 8, .crm = CP_ANY, - .opc1 = CP_ANY, .opc2 = 3, .access = PL1_W, .writefn = tlbimvaa_write, - .type = ARM_CP_NO_RAW }, { .name = "PRRR", .cp = 15, .crn = 10, .crm = 2, .opc1 = 0, .opc2 = 0, .access = PL1_RW, .type = ARM_CP_NOP }, { .name = "NMRR", .cp = 15, .crn = 10, .crm = 2, @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = { .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 1, .opc2 = 0, .fgt = FGT_ISR_EL1, .type = ARM_CP_NO_RAW, .access = PL1_R, .readfn = isr_read }, - /* 32 bit ITLB invalidates */ - { .name = "ITLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 0, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbiall_write }, - { .name = "ITLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbimva_write }, - { .name = "ITLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 2, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbiasid_write }, - /* 32 bit DTLB invalidates */ - { .name = "DTLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 0, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbiall_write }, - { .name = "DTLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 1, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbimva_write }, - { .name = "DTLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 2, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbiasid_write }, - /* 32 bit TLB invalidates */ - { .name = "TLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbiall_write }, - { .name = "TLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbimva_write }, - { .name = "TLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbiasid_write }, - { .name = "TLBIMVAA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbimvaa_write }, -}; - -static const ARMCPRegInfo v7mp_cp_reginfo[] = { - /* 32 bit TLB invalidates, Inner Shareable */ - { .name = "TLBIALLIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, - .writefn = tlbiall_is_write }, - { .name = "TLBIMVAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, - .writefn = tlbimva_is_write }, - { .name = "TLBIASIDIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, - .writefn = tlbiasid_is_write }, - { .name = "TLBIMVAAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, - .writefn = tlbimvaa_is_write }, }; static const ARMCPRegInfo pmovsset_cp_reginfo[] = { @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = { .fieldoffset = offsetof(CPUARMState, cp15.par_el[1]), .writefn = par_write }, #endif - /* TLB invalidate last level of translation table walk */ - { .name = "TLBIMVALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, - .writefn = tlbimva_is_write }, - { .name = "TLBIMVAALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, - .writefn = tlbimvaa_is_write }, - { .name = "TLBIMVAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbimva_write }, - { .name = "TLBIMVAAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7, - .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, - .writefn = tlbimvaa_write }, - { .name = "TLBIMVALH", .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 5, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbimva_hyp_write }, - { .name = "TLBIMVALHIS", - .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 5, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbimva_hyp_is_write }, - { .name = "TLBIIPAS2", - .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbiipas2_hyp_write }, - { .name = "TLBIIPAS2IS", - .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbiipas2is_hyp_write }, - { .name = "TLBIIPAS2L", - .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbiipas2_hyp_write }, - { .name = "TLBIIPAS2LIS", - .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbiipas2is_hyp_write }, /* 32 bit cache operations */ { .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0, .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_ticab }, @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) define_arm_cp_regs(cpu, not_v8_cp_reginfo); } + define_tlb_insn_regs(cpu); + if (arm_feature(env, ARM_FEATURE_V6)) { /* The ID registers all have impdef reset values */ ARMCPRegInfo v6_idregs[] = { @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) if (arm_feature(env, ARM_FEATURE_V6K)) { define_arm_cp_regs(cpu, v6k_cp_reginfo); } - if (arm_feature(env, ARM_FEATURE_V7MP) && - !arm_feature(env, ARM_FEATURE_PMSA)) { - define_arm_cp_regs(cpu, v7mp_cp_reginfo); - } if (arm_feature(env, ARM_FEATURE_V7VE)) { define_arm_cp_regs(cpu, pmovsset_cp_reginfo); } diff --git a/target/arm/tcg-stubs.c b/target/arm/tcg-stubs.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg-stubs.c +++ b/target/arm/tcg-stubs.c @@ -XXX,XX +XXX,XX @@ void raise_exception_ra(CPUARMState *env, uint32_t excp, uint32_t syndrome, void assert_hflags_rebuild_correctly(CPUARMState *env) { } + +/* TLBI insns are only used by TCG, so we don't need to do anything for KVM */ +void define_tlb_insn_regs(ARMCPU *cpu) +{ +} diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ +/* + * Helpers for TLBI insns + * + * This code is licensed under the GNU GPL v2 or later. + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ +#include "qemu/osdep.h" +#include "exec/exec-all.h" +#include "cpu.h" +#include "internals.h" +#include "cpu-features.h" +#include "cpregs.h" + +/* IS variants of TLB operations must affect all cores */ +static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_all_cpus_synced(cs); +} + +static void tlbiasid_is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_all_cpus_synced(cs); +} + +static void tlbimva_is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_page_all_cpus_synced(cs, value & TARGET_PAGE_MASK); +} + +static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_page_all_cpus_synced(cs, value & TARGET_PAGE_MASK); +} + +static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + /* Invalidate all (TLBIALL) */ + CPUState *cs = env_cpu(env); + + if (tlb_force_broadcast(env)) { + tlb_flush_all_cpus_synced(cs); + } else { + tlb_flush(cs); + } +} + +static void tlbimva_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + /* Invalidate single TLB entry by MVA and ASID (TLBIMVA) */ + CPUState *cs = env_cpu(env); + + value &= TARGET_PAGE_MASK; + if (tlb_force_broadcast(env)) { + tlb_flush_page_all_cpus_synced(cs, value); + } else { + tlb_flush_page(cs, value); + } +} + +static void tlbiasid_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + /* Invalidate by ASID (TLBIASID) */ + CPUState *cs = env_cpu(env); + + if (tlb_force_broadcast(env)) { + tlb_flush_all_cpus_synced(cs); + } else { + tlb_flush(cs); + } +} + +static void tlbimvaa_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + /* Invalidate single entry by MVA, all ASIDs (TLBIMVAA) */ + CPUState *cs = env_cpu(env); + + value &= TARGET_PAGE_MASK; + if (tlb_force_broadcast(env)) { + tlb_flush_page_all_cpus_synced(cs, value); + } else { + tlb_flush_page(cs, value); + } +} + +static void tlbiipas2_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + uint64_t pageaddr = (value & MAKE_64BIT_MASK(0, 28)) << 12; + + tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2); +} + +static void tlbiipas2is_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + uint64_t pageaddr = (value & MAKE_64BIT_MASK(0, 28)) << 12; + + tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, ARMMMUIdxBit_Stage2); +} + +static const ARMCPRegInfo tlbi_not_v7_cp_reginfo[] = { + /* + * MMU TLB control. Note that the wildcarding means we cover not just + * the unified TLB ops but also the dside/iside/inner-shareable variants. + */ + { .name = "TLBIALL", .cp = 15, .crn = 8, .crm = CP_ANY, + .opc1 = CP_ANY, .opc2 = 0, .access = PL1_W, .writefn = tlbiall_write, + .type = ARM_CP_NO_RAW }, + { .name = "TLBIMVA", .cp = 15, .crn = 8, .crm = CP_ANY, + .opc1 = CP_ANY, .opc2 = 1, .access = PL1_W, .writefn = tlbimva_write, + .type = ARM_CP_NO_RAW }, + { .name = "TLBIASID", .cp = 15, .crn = 8, .crm = CP_ANY, + .opc1 = CP_ANY, .opc2 = 2, .access = PL1_W, .writefn = tlbiasid_write, + .type = ARM_CP_NO_RAW }, + { .name = "TLBIMVAA", .cp = 15, .crn = 8, .crm = CP_ANY, + .opc1 = CP_ANY, .opc2 = 3, .access = PL1_W, .writefn = tlbimvaa_write, + .type = ARM_CP_NO_RAW }, +}; + +static const ARMCPRegInfo tlbi_v7_cp_reginfo[] = { + /* 32 bit ITLB invalidates */ + { .name = "ITLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 0, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbiall_write }, + { .name = "ITLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbimva_write }, + { .name = "ITLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 2, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbiasid_write }, + /* 32 bit DTLB invalidates */ + { .name = "DTLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 0, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbiall_write }, + { .name = "DTLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 1, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbimva_write }, + { .name = "DTLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 2, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbiasid_write }, + /* 32 bit TLB invalidates */ + { .name = "TLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbiall_write }, + { .name = "TLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbimva_write }, + { .name = "TLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbiasid_write }, + { .name = "TLBIMVAA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbimvaa_write }, +}; + +static const ARMCPRegInfo tlbi_v7mp_cp_reginfo[] = { + /* 32 bit TLB invalidates, Inner Shareable */ + { .name = "TLBIALLIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, + .writefn = tlbiall_is_write }, + { .name = "TLBIMVAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, + .writefn = tlbimva_is_write }, + { .name = "TLBIASIDIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, + .writefn = tlbiasid_is_write }, + { .name = "TLBIMVAAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, + .writefn = tlbimvaa_is_write }, +}; + +static const ARMCPRegInfo tlbi_v8_cp_reginfo[] = { + /* AArch32 TLB invalidate last level of translation table walk */ + { .name = "TLBIMVALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, + .writefn = tlbimva_is_write }, + { .name = "TLBIMVAALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlbis, + .writefn = tlbimvaa_is_write }, + { .name = "TLBIMVAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbimva_write }, + { .name = "TLBIMVAAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7, + .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb, + .writefn = tlbimvaa_write }, + { .name = "TLBIMVALH", .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 5, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbimva_hyp_write }, + { .name = "TLBIMVALHIS", + .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 5, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbimva_hyp_is_write }, + { .name = "TLBIIPAS2", + .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbiipas2_hyp_write }, + { .name = "TLBIIPAS2IS", + .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbiipas2is_hyp_write }, + { .name = "TLBIIPAS2L", + .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbiipas2_hyp_write }, + { .name = "TLBIIPAS2LIS", + .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbiipas2is_hyp_write }, +}; + +void define_tlb_insn_regs(ARMCPU *cpu) +{ + CPUARMState *env = &cpu->env; + + if (!arm_feature(env, ARM_FEATURE_V7)) { + define_arm_cp_regs(cpu, tlbi_not_v7_cp_reginfo); + } else { + define_arm_cp_regs(cpu, tlbi_v7_cp_reginfo); + } + if (arm_feature(env, ARM_FEATURE_V7MP) && + !arm_feature(env, ARM_FEATURE_PMSA)) { + define_arm_cp_regs(cpu, tlbi_v7mp_cp_reginfo); + } + if (arm_feature(env, ARM_FEATURE_V8)) { + define_arm_cp_regs(cpu, tlbi_v8_cp_reginfo); + } +} diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/meson.build +++ b/target/arm/tcg/meson.build @@ -XXX,XX +XXX,XX @@ arm_ss.add(files( 'op_helper.c', 'tlb_helper.c', 'vec_helper.c', + 'tlb-insns.c', )) arm_ss.add(when: 'TARGET_AARCH64', if_true: files( -- 2.34.1
Move the AArch32 TLBI insns for AArch32 EL2 to tlbi_insn_helper.c. To keep this as an obviously pure code-movement, we retain the same condition for registering tlbi_el2_cp_reginfo that we use for el2_cp_reginfo. We'll be able to simplify this condition later, since the need to define the reginfo for EL3-without-EL2 doesn't apply for the TLBI ops specifically. This move brings all the uses of tlbimva_hyp_write() and tlbimva_hyp_is_write() back into a single file, so we can move those also, and make them file-local again. The helper alle1_tlbmask() is an exception to the pattern that we only need to make these functions global temporarily, because once this refactoring is complete it will be called by both code in helper.c (vttbr_write()) and by code in tlb-insns.c. We therefore put its prototype in a permanent home in internals.h. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-3-peter.maydell@linaro.org --- target/arm/cpregs.h | 4 -- target/arm/internals.h | 6 +++ target/arm/helper.c | 74 +-------------------------------- target/arm/tcg/tlb-insns.c | 85 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 92 insertions(+), 77 deletions(-) diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpregs.h +++ b/target/arm/cpregs.h @@ -XXX,XX +XXX,XX @@ CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, bool isread); bool tlb_force_broadcast(CPUARMState *env); -void tlbimva_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value); -void tlbimva_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value); #endif /* TARGET_ARM_CPREGS_H */ diff --git a/target/arm/internals.h b/target/arm/internals.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/internals.h +++ b/target/arm/internals.h @@ -XXX,XX +XXX,XX @@ uint64_t gt_get_countervalue(CPUARMState *env); * and CNTVCT_EL0 (this will be either 0 or the value of CNTVOFF_EL2). */ uint64_t gt_virt_cnt_offset(CPUARMState *env); + +/* + * Return mask of ARMMMUIdxBit values corresponding to an "invalidate + * all EL1" scope; this covers stage 1 and stage 2. + */ +int alle1_tlbmask(CPUARMState *env); #endif diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static void contextidr_write(CPUARMState *env, const ARMCPRegInfo *ri, raw_write(env, ri, value); } -static int alle1_tlbmask(CPUARMState *env) +int alle1_tlbmask(CPUARMState *env) { /* * Note that the 'ALL' scope must invalidate both stage 1 and @@ -XXX,XX +XXX,XX @@ bool tlb_force_broadcast(CPUARMState *env) return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB); } -static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_by_mmuidx(cs, alle1_tlbmask(env)); -} - -static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_by_mmuidx_all_cpus_synced(cs, alle1_tlbmask(env)); -} - - -static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_by_mmuidx(cs, ARMMMUIdxBit_E2); -} - -static void tlbiall_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_E2); -} - -void tlbimva_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - uint64_t pageaddr = value & ~MAKE_64BIT_MASK(0, 12); - - tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_E2); -} - -void tlbimva_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - uint64_t pageaddr = value & ~MAKE_64BIT_MASK(0, 12); - - tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, - ARMMMUIdxBit_E2); -} - static const ARMCPRegInfo cp_reginfo[] = { /* * Define the secure and non-secure FCSE identifier CP registers @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = { { .name = "HTTBR", .cp = 15, .opc1 = 4, .crm = 2, .access = PL2_RW, .type = ARM_CP_64BIT | ARM_CP_ALIAS, .fieldoffset = offsetof(CPUARMState, cp15.ttbr0_el[2]) }, - { .name = "TLBIALLNSNH", - .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbiall_nsnh_write }, - { .name = "TLBIALLNSNHIS", - .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbiall_nsnh_is_write }, - { .name = "TLBIALLH", .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 0, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbiall_hyp_write }, - { .name = "TLBIALLHIS", .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 0, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbiall_hyp_is_write }, - { .name = "TLBIMVAH", .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 1, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbimva_hyp_write }, - { .name = "TLBIMVAHIS", .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 1, - .type = ARM_CP_NO_RAW, .access = PL2_W, - .writefn = tlbimva_hyp_is_write }, { .name = "TLBI_ALLE2", .state = ARM_CP_STATE_AA64, .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 0, .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/tlb-insns.c +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ static void tlbimvaa_write(CPUARMState *env, const ARMCPRegInfo *ri, } } +static void tlbimva_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + uint64_t pageaddr = value & ~MAKE_64BIT_MASK(0, 12); + + tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_E2); +} + +static void tlbimva_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + uint64_t pageaddr = value & ~MAKE_64BIT_MASK(0, 12); + + tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, + ARMMMUIdxBit_E2); +} + static void tlbiipas2_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbiipas2is_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, ARMMMUIdxBit_Stage2); } +static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_by_mmuidx(cs, alle1_tlbmask(env)); +} + +static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_by_mmuidx_all_cpus_synced(cs, alle1_tlbmask(env)); +} + + +static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_by_mmuidx(cs, ARMMMUIdxBit_E2); +} + +static void tlbiall_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_E2); +} + static const ARMCPRegInfo tlbi_not_v7_cp_reginfo[] = { /* * MMU TLB control. Note that the wildcarding means we cover not just @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbi_v8_cp_reginfo[] = { .writefn = tlbiipas2is_hyp_write }, }; +static const ARMCPRegInfo tlbi_el2_cp_reginfo[] = { + { .name = "TLBIALLNSNH", + .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbiall_nsnh_write }, + { .name = "TLBIALLNSNHIS", + .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbiall_nsnh_is_write }, + { .name = "TLBIALLH", .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 0, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbiall_hyp_write }, + { .name = "TLBIALLHIS", .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 0, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbiall_hyp_is_write }, + { .name = "TLBIMVAH", .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 1, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbimva_hyp_write }, + { .name = "TLBIMVAHIS", .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 1, + .type = ARM_CP_NO_RAW, .access = PL2_W, + .writefn = tlbimva_hyp_is_write }, +}; + void define_tlb_insn_regs(ARMCPU *cpu) { CPUARMState *env = &cpu->env; @@ -XXX,XX +XXX,XX @@ void define_tlb_insn_regs(ARMCPU *cpu) if (arm_feature(env, ARM_FEATURE_V8)) { define_arm_cp_regs(cpu, tlbi_v8_cp_reginfo); } + /* + * We retain the existing logic for when to register these TLBI + * ops (i.e. matching the condition for el2_cp_reginfo[] in + * helper.c), but we will be able to simplify this later. + */ + if (arm_feature(env, ARM_FEATURE_EL2) + || (arm_feature(env, ARM_FEATURE_EL3) + && arm_feature(env, ARM_FEATURE_V8))) { + define_arm_cp_regs(cpu, tlbi_el2_cp_reginfo); + } } -- 2.34.1
Move the AArch64 TLBI insns that are declared in v8_cp_reginfo[] into tlb-insns.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-4-peter.maydell@linaro.org --- target/arm/cpregs.h | 11 +++ target/arm/helper.c | 182 +++---------------------------------- target/arm/tcg/tlb-insns.c | 160 ++++++++++++++++++++++++++++++++ 3 files changed, 182 insertions(+), 171 deletions(-) diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpregs.h +++ b/target/arm/cpregs.h @@ -XXX,XX +XXX,XX @@ CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, bool isread); bool tlb_force_broadcast(CPUARMState *env); +int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, + uint64_t addr); +int vae1_tlbbits(CPUARMState *env, uint64_t addr); +int vae1_tlbmask(CPUARMState *env); +int ipas2e1_tlbmask(CPUARMState *env, int64_t value); +void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value); +void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value); +void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value); #endif /* TARGET_ARM_CPREGS_H */ diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tocu(CPUARMState *env, const ARMCPRegInfo *ri, * Page D4-1736 (DDI0487A.b) */ -static int vae1_tlbmask(CPUARMState *env) +int vae1_tlbmask(CPUARMState *env) { uint64_t hcr = arm_hcr_el2_eff(env); uint16_t mask; @@ -XXX,XX +XXX,XX @@ static int vae2_tlbmask(CPUARMState *env) } /* Return 56 if TBI is enabled, 64 otherwise. */ -static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, - uint64_t addr) +int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, + uint64_t addr) { uint64_t tcr = regime_tcr(env, mmu_idx); int tbi = aa64_va_parameter_tbi(tcr, mmu_idx); @@ -XXX,XX +XXX,XX @@ static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, return (tbi >> select) & 1 ? 56 : 64; } -static int vae1_tlbbits(CPUARMState *env, uint64_t addr) +int vae1_tlbbits(CPUARMState *env, uint64_t addr) { uint64_t hcr = arm_hcr_el2_eff(env); ARMMMUIdx mmu_idx; @@ -XXX,XX +XXX,XX @@ static int vae2_tlbbits(CPUARMState *env, uint64_t addr) return tlbbits_for_regime(env, mmu_idx, addr); } -static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) +void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) { CPUState *cs = env_cpu(env); int mask = vae1_tlbmask(env); @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); } -static void tlbi_aa64_vmalle1_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = vae1_tlbmask(env); - - if (tlb_force_broadcast(env)) { - tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); - } else { - tlb_flush_by_mmuidx(cs, mask); - } -} - static int e2_tlbmask(CPUARMState *env) { return (ARMMMUIdxBit_E20_0 | @@ -XXX,XX +XXX,XX @@ static int e2_tlbmask(CPUARMState *env) ARMMMUIdxBit_E2); } -static void tlbi_aa64_alle1_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = alle1_tlbmask(env); - - tlb_flush_by_mmuidx(cs, mask); -} - static void tlbi_aa64_alle2_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_alle3_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx(cs, ARMMMUIdxBit_E3); } -static void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) +void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) { CPUState *cs = env_cpu(env); int mask = alle1_tlbmask(env); @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_E3); } -static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) +void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) { CPUState *cs = env_cpu(env); int mask = vae1_tlbmask(env); @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); } -static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - /* - * Invalidate by VA, EL1&0 (AArch64 version). - * Currently handles all of VAE1, VAAE1, VAALE1 and VALE1, - * since we don't support flush-for-specific-ASID-only or - * flush-last-level-only. - */ - CPUState *cs = env_cpu(env); - int mask = vae1_tlbmask(env); - uint64_t pageaddr = sextract64(value << 12, 0, 56); - int bits = vae1_tlbbits(env, pageaddr); - - if (tlb_force_broadcast(env)) { - tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); - } else { - tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits); - } -} - static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, ARMMMUIdxBit_E3, bits); } -static int ipas2e1_tlbmask(CPUARMState *env, int64_t value) +int ipas2e1_tlbmask(CPUARMState *env, int64_t value) { /* * The MSB of value is the NS field, which only applies if SEL2 @@ -XXX,XX +XXX,XX @@ static int ipas2e1_tlbmask(CPUARMState *env, int64_t value) : ARMMMUIdxBit_Stage2); } -static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = ipas2e1_tlbmask(env, value); - uint64_t pageaddr = sextract64(value << 12, 0, 56); - - if (tlb_force_broadcast(env)) { - tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask); - } else { - tlb_flush_page_by_mmuidx(cs, pageaddr, mask); - } -} - -static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = ipas2e1_tlbmask(env, value); - uint64_t pageaddr = sextract64(value << 12, 0, 56); - - tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask); -} - #ifdef TARGET_AARCH64 typedef struct { uint64_t base; @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = { .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2, .fgt = FGT_DCCISW, .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP }, - /* TLBI operations */ - { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVMALLE1IS, - .writefn = tlbi_aa64_vmalle1is_write }, - { .name = "TLBI_VAE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVAE1IS, - .writefn = tlbi_aa64_vae1is_write }, - { .name = "TLBI_ASIDE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIASIDE1IS, - .writefn = tlbi_aa64_vmalle1is_write }, - { .name = "TLBI_VAAE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVAAE1IS, - .writefn = tlbi_aa64_vae1is_write }, - { .name = "TLBI_VALE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVALE1IS, - .writefn = tlbi_aa64_vae1is_write }, - { .name = "TLBI_VAALE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVAALE1IS, - .writefn = tlbi_aa64_vae1is_write }, - { .name = "TLBI_VMALLE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVMALLE1, - .writefn = tlbi_aa64_vmalle1_write }, - { .name = "TLBI_VAE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVAE1, - .writefn = tlbi_aa64_vae1_write }, - { .name = "TLBI_ASIDE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIASIDE1, - .writefn = tlbi_aa64_vmalle1_write }, - { .name = "TLBI_VAAE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVAAE1, - .writefn = tlbi_aa64_vae1_write }, - { .name = "TLBI_VALE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVALE1, - .writefn = tlbi_aa64_vae1_write }, - { .name = "TLBI_VAALE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVAALE1, - .writefn = tlbi_aa64_vae1_write }, - { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_ipas2e1is_write }, - { .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_ipas2e1is_write }, - { .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_alle1is_write }, - { .name = "TLBI_VMALLS12E1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 6, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_alle1is_write }, - { .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_ipas2e1_write }, - { .name = "TLBI_IPAS2LE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_ipas2e1_write }, - { .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_alle1_write }, - { .name = "TLBI_VMALLS12E1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 6, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_alle1is_write }, #ifndef CONFIG_USER_ONLY /* 64 bit address translation operations */ { .name = "AT_S1E1R", .state = ARM_CP_STATE_AA64, diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/tlb-insns.c +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ static void tlbiall_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_E2); } +static void tlbi_aa64_vmalle1_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = vae1_tlbmask(env); + + if (tlb_force_broadcast(env)) { + tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); + } else { + tlb_flush_by_mmuidx(cs, mask); + } +} + +static void tlbi_aa64_alle1_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = alle1_tlbmask(env); + + tlb_flush_by_mmuidx(cs, mask); +} + +static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + /* + * Invalidate by VA, EL1&0 (AArch64 version). + * Currently handles all of VAE1, VAAE1, VAALE1 and VALE1, + * since we don't support flush-for-specific-ASID-only or + * flush-last-level-only. + */ + CPUState *cs = env_cpu(env); + int mask = vae1_tlbmask(env); + uint64_t pageaddr = sextract64(value << 12, 0, 56); + int bits = vae1_tlbbits(env, pageaddr); + + if (tlb_force_broadcast(env)) { + tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); + } else { + tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits); + } +} + +static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = ipas2e1_tlbmask(env, value); + uint64_t pageaddr = sextract64(value << 12, 0, 56); + + if (tlb_force_broadcast(env)) { + tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask); + } else { + tlb_flush_page_by_mmuidx(cs, pageaddr, mask); + } +} + +static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = ipas2e1_tlbmask(env, value); + uint64_t pageaddr = sextract64(value << 12, 0, 56); + + tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask); +} + static const ARMCPRegInfo tlbi_not_v7_cp_reginfo[] = { /* * MMU TLB control. Note that the wildcarding means we cover not just @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbi_v8_cp_reginfo[] = { .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5, .type = ARM_CP_NO_RAW, .access = PL2_W, .writefn = tlbiipas2is_hyp_write }, + /* AArch64 TLBI operations */ + { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVMALLE1IS, + .writefn = tlbi_aa64_vmalle1is_write }, + { .name = "TLBI_VAE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVAE1IS, + .writefn = tlbi_aa64_vae1is_write }, + { .name = "TLBI_ASIDE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIASIDE1IS, + .writefn = tlbi_aa64_vmalle1is_write }, + { .name = "TLBI_VAAE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVAAE1IS, + .writefn = tlbi_aa64_vae1is_write }, + { .name = "TLBI_VALE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVALE1IS, + .writefn = tlbi_aa64_vae1is_write }, + { .name = "TLBI_VAALE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVAALE1IS, + .writefn = tlbi_aa64_vae1is_write }, + { .name = "TLBI_VMALLE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVMALLE1, + .writefn = tlbi_aa64_vmalle1_write }, + { .name = "TLBI_VAE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVAE1, + .writefn = tlbi_aa64_vae1_write }, + { .name = "TLBI_ASIDE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIASIDE1, + .writefn = tlbi_aa64_vmalle1_write }, + { .name = "TLBI_VAAE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVAAE1, + .writefn = tlbi_aa64_vae1_write }, + { .name = "TLBI_VALE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVALE1, + .writefn = tlbi_aa64_vae1_write }, + { .name = "TLBI_VAALE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVAALE1, + .writefn = tlbi_aa64_vae1_write }, + { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_ipas2e1is_write }, + { .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_ipas2e1is_write }, + { .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_alle1is_write }, + { .name = "TLBI_VMALLS12E1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 6, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_alle1is_write }, + { .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_ipas2e1_write }, + { .name = "TLBI_IPAS2LE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_ipas2e1_write }, + { .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_alle1_write }, + { .name = "TLBI_VMALLS12E1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 6, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_alle1is_write }, }; static const ARMCPRegInfo tlbi_el2_cp_reginfo[] = { -- 2.34.1
Move the AArch64 EL2 TLBI insn definitions that were in el2_cp_reginfo[] across to tlb-insns.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-5-peter.maydell@linaro.org --- target/arm/cpregs.h | 7 +++++ target/arm/helper.c | 61 ++++---------------------------------- target/arm/tcg/tlb-insns.c | 49 ++++++++++++++++++++++++++++++ 3 files changed, 62 insertions(+), 55 deletions(-) diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpregs.h +++ b/target/arm/cpregs.h @@ -XXX,XX +XXX,XX @@ bool tlb_force_broadcast(CPUARMState *env); int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, uint64_t addr); int vae1_tlbbits(CPUARMState *env, uint64_t addr); +int vae2_tlbbits(CPUARMState *env, uint64_t addr); int vae1_tlbmask(CPUARMState *env); +int vae2_tlbmask(CPUARMState *env); int ipas2e1_tlbmask(CPUARMState *env, int64_t value); +int e2_tlbmask(CPUARMState *env); void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value); void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value); void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value); +void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value); +void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value); #endif /* TARGET_ARM_CPREGS_H */ diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ int vae1_tlbmask(CPUARMState *env) return mask; } -static int vae2_tlbmask(CPUARMState *env) +int vae2_tlbmask(CPUARMState *env) { uint64_t hcr = arm_hcr_el2_eff(env); uint16_t mask; @@ -XXX,XX +XXX,XX @@ int vae1_tlbbits(CPUARMState *env, uint64_t addr) return tlbbits_for_regime(env, mmu_idx, addr); } -static int vae2_tlbbits(CPUARMState *env, uint64_t addr) +int vae2_tlbbits(CPUARMState *env, uint64_t addr) { uint64_t hcr = arm_hcr_el2_eff(env); ARMMMUIdx mmu_idx; @@ -XXX,XX +XXX,XX @@ void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); } -static int e2_tlbmask(CPUARMState *env) +int e2_tlbmask(CPUARMState *env) { return (ARMMMUIdxBit_E20_0 | ARMMMUIdxBit_E20_2 | @@ -XXX,XX +XXX,XX @@ static int e2_tlbmask(CPUARMState *env) ARMMMUIdxBit_E2); } -static void tlbi_aa64_alle2_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = e2_tlbmask(env); - - tlb_flush_by_mmuidx(cs, mask); -} - static void tlbi_aa64_alle3_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); } -static void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) +void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) { CPUState *cs = env_cpu(env); int mask = e2_tlbmask(env); @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_E3); } -static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - /* - * Invalidate by VA, EL2 - * Currently handles both VAE2 and VALE2, since we don't support - * flush-last-level-only. - */ - CPUState *cs = env_cpu(env); - int mask = vae2_tlbmask(env); - uint64_t pageaddr = sextract64(value << 12, 0, 56); - int bits = vae2_tlbbits(env, pageaddr); - - tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits); -} - static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); } -static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, +void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { CPUState *cs = env_cpu(env); @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = { { .name = "HTTBR", .cp = 15, .opc1 = 4, .crm = 2, .access = PL2_RW, .type = ARM_CP_64BIT | ARM_CP_ALIAS, .fieldoffset = offsetof(CPUARMState, cp15.ttbr0_el[2]) }, - { .name = "TLBI_ALLE2", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 0, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_alle2_write }, - { .name = "TLBI_VAE2", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 1, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_vae2_write }, - { .name = "TLBI_VALE2", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 5, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_vae2_write }, - { .name = "TLBI_ALLE2IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 0, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_alle2is_write }, - { .name = "TLBI_VAE2IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 1, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_vae2is_write }, - { .name = "TLBI_VALE2IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 5, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_vae2is_write }, #ifndef CONFIG_USER_ONLY /* * Unlike the other EL2-related AT operations, these must diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/tlb-insns.c +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_alle1_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx(cs, mask); } +static void tlbi_aa64_alle2_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = e2_tlbmask(env); + + tlb_flush_by_mmuidx(cs, mask); +} + +static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + /* + * Invalidate by VA, EL2 + * Currently handles both VAE2 and VALE2, since we don't support + * flush-last-level-only. + */ + CPUState *cs = env_cpu(env); + int mask = vae2_tlbmask(env); + uint64_t pageaddr = sextract64(value << 12, 0, 56); + int bits = vae2_tlbbits(env, pageaddr); + + tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits); +} + static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbi_el2_cp_reginfo[] = { { .name = "TLBIMVAHIS", .cp = 15, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 1, .type = ARM_CP_NO_RAW, .access = PL2_W, .writefn = tlbimva_hyp_is_write }, + { .name = "TLBI_ALLE2", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 0, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_alle2_write }, + { .name = "TLBI_VAE2", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 1, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_vae2_write }, + { .name = "TLBI_VALE2", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 5, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_vae2_write }, + { .name = "TLBI_ALLE2IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 0, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_alle2is_write }, + { .name = "TLBI_VAE2IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 1, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_vae2is_write }, + { .name = "TLBI_VALE2IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 5, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_vae2is_write }, }; void define_tlb_insn_regs(ARMCPU *cpu) -- 2.34.1
Move the AArch64 EL3 TLBI insns from el3_cp_reginfo[] across to tlb-insns.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-6-peter.maydell@linaro.org --- target/arm/cpregs.h | 4 +++ target/arm/helper.c | 56 +++----------------------------------- target/arm/tcg/tlb-insns.c | 54 ++++++++++++++++++++++++++++++++++++ 3 files changed, 62 insertions(+), 52 deletions(-) diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpregs.h +++ b/target/arm/cpregs.h @@ -XXX,XX +XXX,XX @@ void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value); void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value); +void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value); +void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value); #endif /* TARGET_ARM_CPREGS_H */ diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ int e2_tlbmask(CPUARMState *env) ARMMMUIdxBit_E2); } -static void tlbi_aa64_alle3_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - ARMCPU *cpu = env_archcpu(env); - CPUState *cs = CPU(cpu); - - tlb_flush_by_mmuidx(cs, ARMMMUIdxBit_E3); -} - void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); } -static void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) +void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) { CPUState *cs = env_cpu(env); tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_E3); } -static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - /* - * Invalidate by VA, EL3 - * Currently handles both VAE3 and VALE3, since we don't support - * flush-last-level-only. - */ - ARMCPU *cpu = env_archcpu(env); - CPUState *cs = CPU(cpu); - uint64_t pageaddr = sextract64(value << 12, 0, 56); - - tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_E3); -} - void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); } -static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) +void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) { CPUState *cs = env_cpu(env); uint64_t pageaddr = sextract64(value << 12, 0, 56); @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el3_cp_reginfo[] = { .opc0 = 3, .opc1 = 6, .crn = 5, .crm = 1, .opc2 = 1, .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 }, - { .name = "TLBI_ALLE3IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 0, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_alle3is_write }, - { .name = "TLBI_VAE3IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 1, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_vae3is_write }, - { .name = "TLBI_VALE3IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 5, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_vae3is_write }, - { .name = "TLBI_ALLE3", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 0, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_alle3_write }, - { .name = "TLBI_VAE3", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 1, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_vae3_write }, - { .name = "TLBI_VALE3", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 5, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_vae3_write }, }; #ifndef CONFIG_USER_ONLY diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/tlb-insns.c +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_alle2_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx(cs, mask); } +static void tlbi_aa64_alle3_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + ARMCPU *cpu = env_archcpu(env); + CPUState *cs = CPU(cpu); + + tlb_flush_by_mmuidx(cs, ARMMMUIdxBit_E3); +} + static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits); } +static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + /* + * Invalidate by VA, EL3 + * Currently handles both VAE3 and VALE3, since we don't support + * flush-last-level-only. + */ + ARMCPU *cpu = env_archcpu(env); + CPUState *cs = CPU(cpu); + uint64_t pageaddr = sextract64(value << 12, 0, 56); + + tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_E3); +} + static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbi_el2_cp_reginfo[] = { .writefn = tlbi_aa64_vae2is_write }, }; +static const ARMCPRegInfo tlbi_el3_cp_reginfo[] = { + { .name = "TLBI_ALLE3IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 0, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_alle3is_write }, + { .name = "TLBI_VAE3IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 1, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_vae3is_write }, + { .name = "TLBI_VALE3IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 3, .opc2 = 5, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_vae3is_write }, + { .name = "TLBI_ALLE3", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 0, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_alle3_write }, + { .name = "TLBI_VAE3", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 1, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_vae3_write }, + { .name = "TLBI_VALE3", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 5, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_vae3_write }, +}; + void define_tlb_insn_regs(ARMCPU *cpu) { CPUARMState *env = &cpu->env; @@ -XXX,XX +XXX,XX @@ void define_tlb_insn_regs(ARMCPU *cpu) && arm_feature(env, ARM_FEATURE_V8))) { define_arm_cp_regs(cpu, tlbi_el2_cp_reginfo); } + if (arm_feature(env, ARM_FEATURE_EL3)) { + define_arm_cp_regs(cpu, tlbi_el3_cp_reginfo); + } } -- 2.34.1
Move the TLBI invalidate-range insns across to tlb-insns.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-7-peter.maydell@linaro.org --- target/arm/cpregs.h | 2 + target/arm/helper.c | 330 +------------------------------------ target/arm/tcg/tlb-insns.c | 329 ++++++++++++++++++++++++++++++++++++ 3 files changed, 333 insertions(+), 328 deletions(-) diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpregs.h +++ b/target/arm/cpregs.h @@ -XXX,XX +XXX,XX @@ CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, bool isread); CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, bool isread); +CPAccessResult access_ttlbos(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread); bool tlb_force_broadcast(CPUARMState *env); int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, uint64_t addr); diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, #ifdef TARGET_AARCH64 /* Check for traps from EL1 due to HCR_EL2.TTLB or TTLBOS. */ -static CPAccessResult access_ttlbos(CPUARMState *env, const ARMCPRegInfo *ri, - bool isread) +CPAccessResult access_ttlbos(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread) { if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & (HCR_TTLB | HCR_TTLBOS))) { @@ -XXX,XX +XXX,XX @@ int ipas2e1_tlbmask(CPUARMState *env, int64_t value) : ARMMMUIdxBit_Stage2); } -#ifdef TARGET_AARCH64 -typedef struct { - uint64_t base; - uint64_t length; -} TLBIRange; - -static ARMGranuleSize tlbi_range_tg_to_gran_size(int tg) -{ - /* - * Note that the TLBI range TG field encoding differs from both - * TG0 and TG1 encodings. - */ - switch (tg) { - case 1: - return Gran4K; - case 2: - return Gran16K; - case 3: - return Gran64K; - default: - return GranInvalid; - } -} - -static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx, - uint64_t value) -{ - unsigned int page_size_granule, page_shift, num, scale, exponent; - /* Extract one bit to represent the va selector in use. */ - uint64_t select = sextract64(value, 36, 1); - ARMVAParameters param = aa64_va_parameters(env, select, mmuidx, true, false); - TLBIRange ret = { }; - ARMGranuleSize gran; - - page_size_granule = extract64(value, 46, 2); - gran = tlbi_range_tg_to_gran_size(page_size_granule); - - /* The granule encoded in value must match the granule in use. */ - if (gran != param.gran) { - qemu_log_mask(LOG_GUEST_ERROR, "Invalid tlbi page size granule %d\n", - page_size_granule); - return ret; - } - - page_shift = arm_granule_bits(gran); - num = extract64(value, 39, 5); - scale = extract64(value, 44, 2); - exponent = (5 * scale) + 1; - - ret.length = (num + 1) << (exponent + page_shift); - - if (param.select) { - ret.base = sextract64(value, 0, 37); - } else { - ret.base = extract64(value, 0, 37); - } - if (param.ds) { - /* - * With DS=1, BaseADDR is always shifted 16 so that it is able - * to address all 52 va bits. The input address is perforce - * aligned on a 64k boundary regardless of translation granule. - */ - page_shift = 16; - } - ret.base <<= page_shift; - - return ret; -} - -static void do_rvae_write(CPUARMState *env, uint64_t value, - int idxmap, bool synced) -{ - ARMMMUIdx one_idx = ARM_MMU_IDX_A | ctz32(idxmap); - TLBIRange range; - int bits; - - range = tlbi_aa64_get_range(env, one_idx, value); - bits = tlbbits_for_regime(env, one_idx, range.base); - - if (synced) { - tlb_flush_range_by_mmuidx_all_cpus_synced(env_cpu(env), - range.base, - range.length, - idxmap, - bits); - } else { - tlb_flush_range_by_mmuidx(env_cpu(env), range.base, - range.length, idxmap, bits); - } -} - -static void tlbi_aa64_rvae1_write(CPUARMState *env, - const ARMCPRegInfo *ri, - uint64_t value) -{ - /* - * Invalidate by VA range, EL1&0. - * Currently handles all of RVAE1, RVAAE1, RVAALE1 and RVALE1, - * since we don't support flush-for-specific-ASID-only or - * flush-last-level-only. - */ - - do_rvae_write(env, value, vae1_tlbmask(env), - tlb_force_broadcast(env)); -} - -static void tlbi_aa64_rvae1is_write(CPUARMState *env, - const ARMCPRegInfo *ri, - uint64_t value) -{ - /* - * Invalidate by VA range, Inner/Outer Shareable EL1&0. - * Currently handles all of RVAE1IS, RVAE1OS, RVAAE1IS, RVAAE1OS, - * RVAALE1IS, RVAALE1OS, RVALE1IS and RVALE1OS, since we don't support - * flush-for-specific-ASID-only, flush-last-level-only or inner/outer - * shareable specific flushes. - */ - - do_rvae_write(env, value, vae1_tlbmask(env), true); -} - -static void tlbi_aa64_rvae2_write(CPUARMState *env, - const ARMCPRegInfo *ri, - uint64_t value) -{ - /* - * Invalidate by VA range, EL2. - * Currently handles all of RVAE2 and RVALE2, - * since we don't support flush-for-specific-ASID-only or - * flush-last-level-only. - */ - - do_rvae_write(env, value, vae2_tlbmask(env), - tlb_force_broadcast(env)); - - -} - -static void tlbi_aa64_rvae2is_write(CPUARMState *env, - const ARMCPRegInfo *ri, - uint64_t value) -{ - /* - * Invalidate by VA range, Inner/Outer Shareable, EL2. - * Currently handles all of RVAE2IS, RVAE2OS, RVALE2IS and RVALE2OS, - * since we don't support flush-for-specific-ASID-only, - * flush-last-level-only or inner/outer shareable specific flushes. - */ - - do_rvae_write(env, value, vae2_tlbmask(env), true); - -} - -static void tlbi_aa64_rvae3_write(CPUARMState *env, - const ARMCPRegInfo *ri, - uint64_t value) -{ - /* - * Invalidate by VA range, EL3. - * Currently handles all of RVAE3 and RVALE3, - * since we don't support flush-for-specific-ASID-only or - * flush-last-level-only. - */ - - do_rvae_write(env, value, ARMMMUIdxBit_E3, tlb_force_broadcast(env)); -} - -static void tlbi_aa64_rvae3is_write(CPUARMState *env, - const ARMCPRegInfo *ri, - uint64_t value) -{ - /* - * Invalidate by VA range, EL3, Inner/Outer Shareable. - * Currently handles all of RVAE3IS, RVAE3OS, RVALE3IS and RVALE3OS, - * since we don't support flush-for-specific-ASID-only, - * flush-last-level-only or inner/outer specific flushes. - */ - - do_rvae_write(env, value, ARMMMUIdxBit_E3, true); -} - -static void tlbi_aa64_ripas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - do_rvae_write(env, value, ipas2e1_tlbmask(env, value), - tlb_force_broadcast(env)); -} - -static void tlbi_aa64_ripas2e1is_write(CPUARMState *env, - const ARMCPRegInfo *ri, - uint64_t value) -{ - do_rvae_write(env, value, ipas2e1_tlbmask(env, value), true); -} -#endif - static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri, bool isread) { @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pauth_reginfo[] = { .fieldoffset = offsetof(CPUARMState, keys.apib.hi) }, }; -static const ARMCPRegInfo tlbirange_reginfo[] = { - { .name = "TLBI_RVAE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 1, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVAE1IS, - .writefn = tlbi_aa64_rvae1is_write }, - { .name = "TLBI_RVAAE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 3, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVAAE1IS, - .writefn = tlbi_aa64_rvae1is_write }, - { .name = "TLBI_RVALE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 5, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVALE1IS, - .writefn = tlbi_aa64_rvae1is_write }, - { .name = "TLBI_RVAALE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 7, - .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVAALE1IS, - .writefn = tlbi_aa64_rvae1is_write }, - { .name = "TLBI_RVAE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVAE1OS, - .writefn = tlbi_aa64_rvae1is_write }, - { .name = "TLBI_RVAAE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 3, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVAAE1OS, - .writefn = tlbi_aa64_rvae1is_write }, - { .name = "TLBI_RVALE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 5, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVALE1OS, - .writefn = tlbi_aa64_rvae1is_write }, - { .name = "TLBI_RVAALE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 7, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVAALE1OS, - .writefn = tlbi_aa64_rvae1is_write }, - { .name = "TLBI_RVAE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 1, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVAE1, - .writefn = tlbi_aa64_rvae1_write }, - { .name = "TLBI_RVAAE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 3, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVAAE1, - .writefn = tlbi_aa64_rvae1_write }, - { .name = "TLBI_RVALE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 5, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVALE1, - .writefn = tlbi_aa64_rvae1_write }, - { .name = "TLBI_RVAALE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 7, - .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIRVAALE1, - .writefn = tlbi_aa64_rvae1_write }, - { .name = "TLBI_RIPAS2E1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 2, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_ripas2e1is_write }, - { .name = "TLBI_RIPAS2LE1IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 6, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_ripas2e1is_write }, - { .name = "TLBI_RVAE2IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 2, .opc2 = 1, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_rvae2is_write }, - { .name = "TLBI_RVALE2IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 2, .opc2 = 5, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_rvae2is_write }, - { .name = "TLBI_RIPAS2E1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 2, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_ripas2e1_write }, - { .name = "TLBI_RIPAS2LE1", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 6, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_ripas2e1_write }, - { .name = "TLBI_RVAE2OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 5, .opc2 = 1, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_rvae2is_write }, - { .name = "TLBI_RVALE2OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 5, .opc2 = 5, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_rvae2is_write }, - { .name = "TLBI_RVAE2", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 6, .opc2 = 1, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_rvae2_write }, - { .name = "TLBI_RVALE2", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 6, .opc2 = 5, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_rvae2_write }, - { .name = "TLBI_RVAE3IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 2, .opc2 = 1, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_rvae3is_write }, - { .name = "TLBI_RVALE3IS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 2, .opc2 = 5, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_rvae3is_write }, - { .name = "TLBI_RVAE3OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 5, .opc2 = 1, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_rvae3is_write }, - { .name = "TLBI_RVALE3OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 5, .opc2 = 5, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_rvae3is_write }, - { .name = "TLBI_RVAE3", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 6, .opc2 = 1, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_rvae3_write }, - { .name = "TLBI_RVALE3", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 6, .opc2 = 5, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_rvae3_write }, -}; - static const ARMCPRegInfo tlbios_reginfo[] = { { .name = "TLBI_VMALLE1OS", .state = ARM_CP_STATE_AA64, .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 0, @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) if (cpu_isar_feature(aa64_rndr, cpu)) { define_arm_cp_regs(cpu, rndr_reginfo); } - if (cpu_isar_feature(aa64_tlbirange, cpu)) { - define_arm_cp_regs(cpu, tlbirange_reginfo); - } if (cpu_isar_feature(aa64_tlbios, cpu)) { define_arm_cp_regs(cpu, tlbios_reginfo); } diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/tlb-insns.c +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ * SPDX-License-Identifier: GPL-2.0-or-later */ #include "qemu/osdep.h" +#include "qemu/log.h" #include "exec/exec-all.h" #include "cpu.h" #include "internals.h" @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbi_el3_cp_reginfo[] = { .writefn = tlbi_aa64_vae3_write }, }; +#ifdef TARGET_AARCH64 +typedef struct { + uint64_t base; + uint64_t length; +} TLBIRange; + +static ARMGranuleSize tlbi_range_tg_to_gran_size(int tg) +{ + /* + * Note that the TLBI range TG field encoding differs from both + * TG0 and TG1 encodings. + */ + switch (tg) { + case 1: + return Gran4K; + case 2: + return Gran16K; + case 3: + return Gran64K; + default: + return GranInvalid; + } +} + +static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx, + uint64_t value) +{ + unsigned int page_size_granule, page_shift, num, scale, exponent; + /* Extract one bit to represent the va selector in use. */ + uint64_t select = sextract64(value, 36, 1); + ARMVAParameters param = aa64_va_parameters(env, select, mmuidx, true, false); + TLBIRange ret = { }; + ARMGranuleSize gran; + + page_size_granule = extract64(value, 46, 2); + gran = tlbi_range_tg_to_gran_size(page_size_granule); + + /* The granule encoded in value must match the granule in use. */ + if (gran != param.gran) { + qemu_log_mask(LOG_GUEST_ERROR, "Invalid tlbi page size granule %d\n", + page_size_granule); + return ret; + } + + page_shift = arm_granule_bits(gran); + num = extract64(value, 39, 5); + scale = extract64(value, 44, 2); + exponent = (5 * scale) + 1; + + ret.length = (num + 1) << (exponent + page_shift); + + if (param.select) { + ret.base = sextract64(value, 0, 37); + } else { + ret.base = extract64(value, 0, 37); + } + if (param.ds) { + /* + * With DS=1, BaseADDR is always shifted 16 so that it is able + * to address all 52 va bits. The input address is perforce + * aligned on a 64k boundary regardless of translation granule. + */ + page_shift = 16; + } + ret.base <<= page_shift; + + return ret; +} + +static void do_rvae_write(CPUARMState *env, uint64_t value, + int idxmap, bool synced) +{ + ARMMMUIdx one_idx = ARM_MMU_IDX_A | ctz32(idxmap); + TLBIRange range; + int bits; + + range = tlbi_aa64_get_range(env, one_idx, value); + bits = tlbbits_for_regime(env, one_idx, range.base); + + if (synced) { + tlb_flush_range_by_mmuidx_all_cpus_synced(env_cpu(env), + range.base, + range.length, + idxmap, + bits); + } else { + tlb_flush_range_by_mmuidx(env_cpu(env), range.base, + range.length, idxmap, bits); + } +} + +static void tlbi_aa64_rvae1_write(CPUARMState *env, + const ARMCPRegInfo *ri, + uint64_t value) +{ + /* + * Invalidate by VA range, EL1&0. + * Currently handles all of RVAE1, RVAAE1, RVAALE1 and RVALE1, + * since we don't support flush-for-specific-ASID-only or + * flush-last-level-only. + */ + + do_rvae_write(env, value, vae1_tlbmask(env), + tlb_force_broadcast(env)); +} + +static void tlbi_aa64_rvae1is_write(CPUARMState *env, + const ARMCPRegInfo *ri, + uint64_t value) +{ + /* + * Invalidate by VA range, Inner/Outer Shareable EL1&0. + * Currently handles all of RVAE1IS, RVAE1OS, RVAAE1IS, RVAAE1OS, + * RVAALE1IS, RVAALE1OS, RVALE1IS and RVALE1OS, since we don't support + * flush-for-specific-ASID-only, flush-last-level-only or inner/outer + * shareable specific flushes. + */ + + do_rvae_write(env, value, vae1_tlbmask(env), true); +} + +static void tlbi_aa64_rvae2_write(CPUARMState *env, + const ARMCPRegInfo *ri, + uint64_t value) +{ + /* + * Invalidate by VA range, EL2. + * Currently handles all of RVAE2 and RVALE2, + * since we don't support flush-for-specific-ASID-only or + * flush-last-level-only. + */ + + do_rvae_write(env, value, vae2_tlbmask(env), + tlb_force_broadcast(env)); + + +} + +static void tlbi_aa64_rvae2is_write(CPUARMState *env, + const ARMCPRegInfo *ri, + uint64_t value) +{ + /* + * Invalidate by VA range, Inner/Outer Shareable, EL2. + * Currently handles all of RVAE2IS, RVAE2OS, RVALE2IS and RVALE2OS, + * since we don't support flush-for-specific-ASID-only, + * flush-last-level-only or inner/outer shareable specific flushes. + */ + + do_rvae_write(env, value, vae2_tlbmask(env), true); + +} + +static void tlbi_aa64_rvae3_write(CPUARMState *env, + const ARMCPRegInfo *ri, + uint64_t value) +{ + /* + * Invalidate by VA range, EL3. + * Currently handles all of RVAE3 and RVALE3, + * since we don't support flush-for-specific-ASID-only or + * flush-last-level-only. + */ + + do_rvae_write(env, value, ARMMMUIdxBit_E3, tlb_force_broadcast(env)); +} + +static void tlbi_aa64_rvae3is_write(CPUARMState *env, + const ARMCPRegInfo *ri, + uint64_t value) +{ + /* + * Invalidate by VA range, EL3, Inner/Outer Shareable. + * Currently handles all of RVAE3IS, RVAE3OS, RVALE3IS and RVALE3OS, + * since we don't support flush-for-specific-ASID-only, + * flush-last-level-only or inner/outer specific flushes. + */ + + do_rvae_write(env, value, ARMMMUIdxBit_E3, true); +} + +static void tlbi_aa64_ripas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + do_rvae_write(env, value, ipas2e1_tlbmask(env, value), + tlb_force_broadcast(env)); +} + +static void tlbi_aa64_ripas2e1is_write(CPUARMState *env, + const ARMCPRegInfo *ri, + uint64_t value) +{ + do_rvae_write(env, value, ipas2e1_tlbmask(env, value), true); +} + +static const ARMCPRegInfo tlbirange_reginfo[] = { + { .name = "TLBI_RVAE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 1, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVAE1IS, + .writefn = tlbi_aa64_rvae1is_write }, + { .name = "TLBI_RVAAE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 3, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVAAE1IS, + .writefn = tlbi_aa64_rvae1is_write }, + { .name = "TLBI_RVALE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 5, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVALE1IS, + .writefn = tlbi_aa64_rvae1is_write }, + { .name = "TLBI_RVAALE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 2, .opc2 = 7, + .access = PL1_W, .accessfn = access_ttlbis, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVAALE1IS, + .writefn = tlbi_aa64_rvae1is_write }, + { .name = "TLBI_RVAE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVAE1OS, + .writefn = tlbi_aa64_rvae1is_write }, + { .name = "TLBI_RVAAE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 3, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVAAE1OS, + .writefn = tlbi_aa64_rvae1is_write }, + { .name = "TLBI_RVALE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 5, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVALE1OS, + .writefn = tlbi_aa64_rvae1is_write }, + { .name = "TLBI_RVAALE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 7, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVAALE1OS, + .writefn = tlbi_aa64_rvae1is_write }, + { .name = "TLBI_RVAE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 1, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVAE1, + .writefn = tlbi_aa64_rvae1_write }, + { .name = "TLBI_RVAAE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 3, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVAAE1, + .writefn = tlbi_aa64_rvae1_write }, + { .name = "TLBI_RVALE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 5, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVALE1, + .writefn = tlbi_aa64_rvae1_write }, + { .name = "TLBI_RVAALE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 7, + .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIRVAALE1, + .writefn = tlbi_aa64_rvae1_write }, + { .name = "TLBI_RIPAS2E1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 2, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_ripas2e1is_write }, + { .name = "TLBI_RIPAS2LE1IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 6, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_ripas2e1is_write }, + { .name = "TLBI_RVAE2IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 2, .opc2 = 1, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_rvae2is_write }, + { .name = "TLBI_RVALE2IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 2, .opc2 = 5, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_rvae2is_write }, + { .name = "TLBI_RIPAS2E1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 2, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_ripas2e1_write }, + { .name = "TLBI_RIPAS2LE1", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 6, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_ripas2e1_write }, + { .name = "TLBI_RVAE2OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 5, .opc2 = 1, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_rvae2is_write }, + { .name = "TLBI_RVALE2OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 5, .opc2 = 5, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_rvae2is_write }, + { .name = "TLBI_RVAE2", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 6, .opc2 = 1, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_rvae2_write }, + { .name = "TLBI_RVALE2", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 6, .opc2 = 5, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_rvae2_write }, + { .name = "TLBI_RVAE3IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 2, .opc2 = 1, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_rvae3is_write }, + { .name = "TLBI_RVALE3IS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 2, .opc2 = 5, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_rvae3is_write }, + { .name = "TLBI_RVAE3OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 5, .opc2 = 1, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_rvae3is_write }, + { .name = "TLBI_RVALE3OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 5, .opc2 = 5, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_rvae3is_write }, + { .name = "TLBI_RVAE3", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 6, .opc2 = 1, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_rvae3_write }, + { .name = "TLBI_RVALE3", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 6, .opc2 = 5, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_rvae3_write }, +}; +#endif + void define_tlb_insn_regs(ARMCPU *cpu) { CPUARMState *env = &cpu->env; @@ -XXX,XX +XXX,XX @@ void define_tlb_insn_regs(ARMCPU *cpu) if (arm_feature(env, ARM_FEATURE_EL3)) { define_arm_cp_regs(cpu, tlbi_el3_cp_reginfo); } +#ifdef TARGET_AARCH64 + if (cpu_isar_feature(aa64_tlbirange, cpu)) { + define_arm_cp_regs(cpu, tlbirange_reginfo); + } +#endif } -- 2.34.1
Move the TLBI OS insns across to tlb-insns.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-8-peter.maydell@linaro.org --- target/arm/helper.c | 80 -------------------------------------- target/arm/tcg/tlb-insns.c | 80 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 80 insertions(+), 80 deletions(-) diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pauth_reginfo[] = { .fieldoffset = offsetof(CPUARMState, keys.apib.hi) }, }; -static const ARMCPRegInfo tlbios_reginfo[] = { - { .name = "TLBI_VMALLE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 0, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVMALLE1OS, - .writefn = tlbi_aa64_vmalle1is_write }, - { .name = "TLBI_VAE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 1, - .fgt = FGT_TLBIVAE1OS, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_vae1is_write }, - { .name = "TLBI_ASIDE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 2, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIASIDE1OS, - .writefn = tlbi_aa64_vmalle1is_write }, - { .name = "TLBI_VAAE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 3, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVAAE1OS, - .writefn = tlbi_aa64_vae1is_write }, - { .name = "TLBI_VALE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 5, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVALE1OS, - .writefn = tlbi_aa64_vae1is_write }, - { .name = "TLBI_VAALE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 7, - .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, - .fgt = FGT_TLBIVAALE1OS, - .writefn = tlbi_aa64_vae1is_write }, - { .name = "TLBI_ALLE2OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 0, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_alle2is_write }, - { .name = "TLBI_VAE2OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 1, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_vae2is_write }, - { .name = "TLBI_ALLE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 4, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_alle1is_write }, - { .name = "TLBI_VALE2OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 5, - .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, - .writefn = tlbi_aa64_vae2is_write }, - { .name = "TLBI_VMALLS12E1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 6, - .access = PL2_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_alle1is_write }, - { .name = "TLBI_IPAS2E1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 0, - .access = PL2_W, .type = ARM_CP_NOP }, - { .name = "TLBI_RIPAS2E1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 3, - .access = PL2_W, .type = ARM_CP_NOP }, - { .name = "TLBI_IPAS2LE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 4, - .access = PL2_W, .type = ARM_CP_NOP }, - { .name = "TLBI_RIPAS2LE1OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 7, - .access = PL2_W, .type = ARM_CP_NOP }, - { .name = "TLBI_ALLE3OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 0, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_alle3is_write }, - { .name = "TLBI_VAE3OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 1, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_vae3is_write }, - { .name = "TLBI_VALE3OS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 5, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_vae3is_write }, -}; - static uint64_t rndr_readfn(CPUARMState *env, const ARMCPRegInfo *ri) { Error *err = NULL; @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) if (cpu_isar_feature(aa64_rndr, cpu)) { define_arm_cp_regs(cpu, rndr_reginfo); } - if (cpu_isar_feature(aa64_tlbios, cpu)) { - define_arm_cp_regs(cpu, tlbios_reginfo); - } /* Data Cache clean instructions up to PoP */ if (cpu_isar_feature(aa64_dcpop, cpu)) { define_one_arm_cp_reg(cpu, dcpop_reg); diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/tlb-insns.c +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbirange_reginfo[] = { .access = PL3_W, .type = ARM_CP_NO_RAW, .writefn = tlbi_aa64_rvae3_write }, }; + +static const ARMCPRegInfo tlbios_reginfo[] = { + { .name = "TLBI_VMALLE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 0, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVMALLE1OS, + .writefn = tlbi_aa64_vmalle1is_write }, + { .name = "TLBI_VAE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 1, + .fgt = FGT_TLBIVAE1OS, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_vae1is_write }, + { .name = "TLBI_ASIDE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 2, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIASIDE1OS, + .writefn = tlbi_aa64_vmalle1is_write }, + { .name = "TLBI_VAAE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 3, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVAAE1OS, + .writefn = tlbi_aa64_vae1is_write }, + { .name = "TLBI_VALE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 5, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVALE1OS, + .writefn = tlbi_aa64_vae1is_write }, + { .name = "TLBI_VAALE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 1, .opc2 = 7, + .access = PL1_W, .accessfn = access_ttlbos, .type = ARM_CP_NO_RAW, + .fgt = FGT_TLBIVAALE1OS, + .writefn = tlbi_aa64_vae1is_write }, + { .name = "TLBI_ALLE2OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 0, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_alle2is_write }, + { .name = "TLBI_VAE2OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 1, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_vae2is_write }, + { .name = "TLBI_ALLE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 4, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_alle1is_write }, + { .name = "TLBI_VALE2OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 5, + .access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF, + .writefn = tlbi_aa64_vae2is_write }, + { .name = "TLBI_VMALLS12E1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 1, .opc2 = 6, + .access = PL2_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_alle1is_write }, + { .name = "TLBI_IPAS2E1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 0, + .access = PL2_W, .type = ARM_CP_NOP }, + { .name = "TLBI_RIPAS2E1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 3, + .access = PL2_W, .type = ARM_CP_NOP }, + { .name = "TLBI_IPAS2LE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 4, + .access = PL2_W, .type = ARM_CP_NOP }, + { .name = "TLBI_RIPAS2LE1OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 7, + .access = PL2_W, .type = ARM_CP_NOP }, + { .name = "TLBI_ALLE3OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 0, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_alle3is_write }, + { .name = "TLBI_VAE3OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 1, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_vae3is_write }, + { .name = "TLBI_VALE3OS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 5, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_vae3is_write }, +}; #endif void define_tlb_insn_regs(ARMCPU *cpu) @@ -XXX,XX +XXX,XX @@ void define_tlb_insn_regs(ARMCPU *cpu) if (cpu_isar_feature(aa64_tlbirange, cpu)) { define_arm_cp_regs(cpu, tlbirange_reginfo); } + if (cpu_isar_feature(aa64_tlbios, cpu)) { + define_arm_cp_regs(cpu, tlbios_reginfo); + } #endif } -- 2.34.1
The remaining functions that we temporarily made global are now used only from callsits in tlb-insns.c; move them across and make them file-local again. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-9-peter.maydell@linaro.org --- target/arm/cpregs.h | 34 ------ target/arm/helper.c | 220 ------------------------------------- target/arm/tcg/tlb-insns.c | 220 +++++++++++++++++++++++++++++++++++++ 3 files changed, 220 insertions(+), 254 deletions(-) diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpregs.h +++ b/target/arm/cpregs.h @@ -XXX,XX +XXX,XX @@ static inline bool arm_cpreg_traps_in_nv(const ARMCPRegInfo *ri) return ri->opc1 == 4 || ri->opc1 == 5; } -/* - * Temporary declarations of functions until the move to tlb_insn_helper.c - * is complete and we can make the functions static again - */ -CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, - bool isread); -CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, - bool isread); -CPAccessResult access_ttlbos(CPUARMState *env, const ARMCPRegInfo *ri, - bool isread); -bool tlb_force_broadcast(CPUARMState *env); -int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, - uint64_t addr); -int vae1_tlbbits(CPUARMState *env, uint64_t addr); -int vae2_tlbbits(CPUARMState *env, uint64_t addr); -int vae1_tlbmask(CPUARMState *env); -int vae2_tlbmask(CPUARMState *env); -int ipas2e1_tlbmask(CPUARMState *env, int64_t value); -int e2_tlbmask(CPUARMState *env); -void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value); -void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value); -void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value); -void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value); -void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value); -void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value); -void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value); - #endif /* TARGET_ARM_CPREGS_H */ diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tacr(CPUARMState *env, const ARMCPRegInfo *ri, return CP_ACCESS_OK; } -/* Check for traps from EL1 due to HCR_EL2.TTLB. */ -CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, - bool isread) -{ - if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TTLB)) { - return CP_ACCESS_TRAP_EL2; - } - return CP_ACCESS_OK; -} - -/* Check for traps from EL1 due to HCR_EL2.TTLB or TTLBIS. */ -CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, - bool isread) -{ - if (arm_current_el(env) == 1 && - (arm_hcr_el2_eff(env) & (HCR_TTLB | HCR_TTLBIS))) { - return CP_ACCESS_TRAP_EL2; - } - return CP_ACCESS_OK; -} - -#ifdef TARGET_AARCH64 -/* Check for traps from EL1 due to HCR_EL2.TTLB or TTLBOS. */ -CPAccessResult access_ttlbos(CPUARMState *env, const ARMCPRegInfo *ri, - bool isread) -{ - if (arm_current_el(env) == 1 && - (arm_hcr_el2_eff(env) & (HCR_TTLB | HCR_TTLBOS))) { - return CP_ACCESS_TRAP_EL2; - } - return CP_ACCESS_OK; -} -#endif - static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { ARMCPU *cpu = env_archcpu(env); @@ -XXX,XX +XXX,XX @@ int alle1_tlbmask(CPUARMState *env) ARMMMUIdxBit_Stage2_S); } -/* - * Non-IS variants of TLB operations are upgraded to - * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to - * force broadcast of these operations. - */ -bool tlb_force_broadcast(CPUARMState *env) -{ - return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB); -} - static const ARMCPRegInfo cp_reginfo[] = { /* * Define the secure and non-secure FCSE identifier CP registers @@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tocu(CPUARMState *env, const ARMCPRegInfo *ri, return do_cacheop_pou_access(env, HCR_TOCU | HCR_TPU); } -/* - * See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions - * Page D4-1736 (DDI0487A.b) - */ - -int vae1_tlbmask(CPUARMState *env) -{ - uint64_t hcr = arm_hcr_el2_eff(env); - uint16_t mask; - - assert(arm_feature(env, ARM_FEATURE_AARCH64)); - - if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) { - mask = ARMMMUIdxBit_E20_2 | - ARMMMUIdxBit_E20_2_PAN | - ARMMMUIdxBit_E20_0; - } else { - /* This is AArch64 only, so we don't need to touch the EL30_x TLBs */ - mask = ARMMMUIdxBit_E10_1 | - ARMMMUIdxBit_E10_1_PAN | - ARMMMUIdxBit_E10_0; - } - return mask; -} - -int vae2_tlbmask(CPUARMState *env) -{ - uint64_t hcr = arm_hcr_el2_eff(env); - uint16_t mask; - - if (hcr & HCR_E2H) { - mask = ARMMMUIdxBit_E20_2 | - ARMMMUIdxBit_E20_2_PAN | - ARMMMUIdxBit_E20_0; - } else { - mask = ARMMMUIdxBit_E2; - } - return mask; -} - -/* Return 56 if TBI is enabled, 64 otherwise. */ -int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, - uint64_t addr) -{ - uint64_t tcr = regime_tcr(env, mmu_idx); - int tbi = aa64_va_parameter_tbi(tcr, mmu_idx); - int select = extract64(addr, 55, 1); - - return (tbi >> select) & 1 ? 56 : 64; -} - -int vae1_tlbbits(CPUARMState *env, uint64_t addr) -{ - uint64_t hcr = arm_hcr_el2_eff(env); - ARMMMUIdx mmu_idx; - - assert(arm_feature(env, ARM_FEATURE_AARCH64)); - - /* Only the regime of the mmu_idx below is significant. */ - if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) { - mmu_idx = ARMMMUIdx_E20_0; - } else { - mmu_idx = ARMMMUIdx_E10_0; - } - - return tlbbits_for_regime(env, mmu_idx, addr); -} - -int vae2_tlbbits(CPUARMState *env, uint64_t addr) -{ - uint64_t hcr = arm_hcr_el2_eff(env); - ARMMMUIdx mmu_idx; - - /* - * Only the regime of the mmu_idx below is significant. - * Regime EL2&0 has two ranges with separate TBI configuration, while EL2 - * only has one. - */ - if (hcr & HCR_E2H) { - mmu_idx = ARMMMUIdx_E20_2; - } else { - mmu_idx = ARMMMUIdx_E2; - } - - return tlbbits_for_regime(env, mmu_idx, addr); -} - -void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = vae1_tlbmask(env); - - tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); -} - -int e2_tlbmask(CPUARMState *env) -{ - return (ARMMMUIdxBit_E20_0 | - ARMMMUIdxBit_E20_2 | - ARMMMUIdxBit_E20_2_PAN | - ARMMMUIdxBit_E2); -} - -void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = alle1_tlbmask(env); - - tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); -} - -void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = e2_tlbmask(env); - - tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); -} - -void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_E3); -} - -void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = vae1_tlbmask(env); - uint64_t pageaddr = sextract64(value << 12, 0, 56); - int bits = vae1_tlbbits(env, pageaddr); - - tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); -} - -void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - int mask = vae2_tlbmask(env); - uint64_t pageaddr = sextract64(value << 12, 0, 56); - int bits = vae2_tlbbits(env, pageaddr); - - tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); -} - -void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - uint64_t pageaddr = sextract64(value << 12, 0, 56); - int bits = tlbbits_for_regime(env, ARMMMUIdx_E3, pageaddr); - - tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, - ARMMMUIdxBit_E3, bits); -} - -int ipas2e1_tlbmask(CPUARMState *env, int64_t value) -{ - /* - * The MSB of value is the NS field, which only applies if SEL2 - * is implemented and SCR_EL3.NS is not set (i.e. in secure mode). - */ - return (value >= 0 - && cpu_isar_feature(aa64_sel2, env_archcpu(env)) - && arm_is_secure_below_el3(env) - ? ARMMMUIdxBit_Stage2_S - : ARMMMUIdxBit_Stage2); -} - static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri, bool isread) { diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/tlb-insns.c +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ #include "cpu-features.h" #include "cpregs.h" +/* Check for traps from EL1 due to HCR_EL2.TTLB. */ +static CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread) +{ + if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TTLB)) { + return CP_ACCESS_TRAP_EL2; + } + return CP_ACCESS_OK; +} + +/* Check for traps from EL1 due to HCR_EL2.TTLB or TTLBIS. */ +static CPAccessResult access_ttlbis(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread) +{ + if (arm_current_el(env) == 1 && + (arm_hcr_el2_eff(env) & (HCR_TTLB | HCR_TTLBIS))) { + return CP_ACCESS_TRAP_EL2; + } + return CP_ACCESS_OK; +} + +#ifdef TARGET_AARCH64 +/* Check for traps from EL1 due to HCR_EL2.TTLB or TTLBOS. */ +static CPAccessResult access_ttlbos(CPUARMState *env, const ARMCPRegInfo *ri, + bool isread) +{ + if (arm_current_el(env) == 1 && + (arm_hcr_el2_eff(env) & (HCR_TTLB | HCR_TTLBOS))) { + return CP_ACCESS_TRAP_EL2; + } + return CP_ACCESS_OK; +} +#endif + /* IS variants of TLB operations must affect all cores */ static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) @@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_page_all_cpus_synced(cs, value & TARGET_PAGE_MASK); } +/* + * Non-IS variants of TLB operations are upgraded to + * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to + * force broadcast of these operations. + */ +static bool tlb_force_broadcast(CPUARMState *env) +{ + return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB); +} + static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbiall_hyp_is_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_E2); } +/* + * See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions + * Page D4-1736 (DDI0487A.b) + */ + +static int vae1_tlbmask(CPUARMState *env) +{ + uint64_t hcr = arm_hcr_el2_eff(env); + uint16_t mask; + + assert(arm_feature(env, ARM_FEATURE_AARCH64)); + + if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) { + mask = ARMMMUIdxBit_E20_2 | + ARMMMUIdxBit_E20_2_PAN | + ARMMMUIdxBit_E20_0; + } else { + /* This is AArch64 only, so we don't need to touch the EL30_x TLBs */ + mask = ARMMMUIdxBit_E10_1 | + ARMMMUIdxBit_E10_1_PAN | + ARMMMUIdxBit_E10_0; + } + return mask; +} + +static int vae2_tlbmask(CPUARMState *env) +{ + uint64_t hcr = arm_hcr_el2_eff(env); + uint16_t mask; + + if (hcr & HCR_E2H) { + mask = ARMMMUIdxBit_E20_2 | + ARMMMUIdxBit_E20_2_PAN | + ARMMMUIdxBit_E20_0; + } else { + mask = ARMMMUIdxBit_E2; + } + return mask; +} + +/* Return 56 if TBI is enabled, 64 otherwise. */ +static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx, + uint64_t addr) +{ + uint64_t tcr = regime_tcr(env, mmu_idx); + int tbi = aa64_va_parameter_tbi(tcr, mmu_idx); + int select = extract64(addr, 55, 1); + + return (tbi >> select) & 1 ? 56 : 64; +} + +static int vae1_tlbbits(CPUARMState *env, uint64_t addr) +{ + uint64_t hcr = arm_hcr_el2_eff(env); + ARMMMUIdx mmu_idx; + + assert(arm_feature(env, ARM_FEATURE_AARCH64)); + + /* Only the regime of the mmu_idx below is significant. */ + if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) { + mmu_idx = ARMMMUIdx_E20_0; + } else { + mmu_idx = ARMMMUIdx_E10_0; + } + + return tlbbits_for_regime(env, mmu_idx, addr); +} + +static int vae2_tlbbits(CPUARMState *env, uint64_t addr) +{ + uint64_t hcr = arm_hcr_el2_eff(env); + ARMMMUIdx mmu_idx; + + /* + * Only the regime of the mmu_idx below is significant. + * Regime EL2&0 has two ranges with separate TBI configuration, while EL2 + * only has one. + */ + if (hcr & HCR_E2H) { + mmu_idx = ARMMMUIdx_E20_2; + } else { + mmu_idx = ARMMMUIdx_E2; + } + + return tlbbits_for_regime(env, mmu_idx, addr); +} + +static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = vae1_tlbmask(env); + + tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); +} + static void tlbi_aa64_vmalle1_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vmalle1_write(CPUARMState *env, const ARMCPRegInfo *ri, } } +static int e2_tlbmask(CPUARMState *env) +{ + return (ARMMMUIdxBit_E20_0 | + ARMMMUIdxBit_E20_2 | + ARMMMUIdxBit_E20_2_PAN | + ARMMMUIdxBit_E2); +} + static void tlbi_aa64_alle1_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_alle3_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_by_mmuidx(cs, ARMMMUIdxBit_E3); } +static void tlbi_aa64_alle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = alle1_tlbmask(env); + + tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); +} + +static void tlbi_aa64_alle2is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = e2_tlbmask(env); + + tlb_flush_by_mmuidx_all_cpus_synced(cs, mask); +} + +static void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_E3); +} + static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri, tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_E3); } +static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = vae1_tlbmask(env); + uint64_t pageaddr = sextract64(value << 12, 0, 56); + int bits = vae1_tlbbits(env, pageaddr); + + tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); +} + static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, } } +static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + int mask = vae2_tlbmask(env); + uint64_t pageaddr = sextract64(value << 12, 0, 56); + int bits = vae2_tlbbits(env, pageaddr); + + tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits); +} + +static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + uint64_t pageaddr = sextract64(value << 12, 0, 56); + int bits = tlbbits_for_regime(env, ARMMMUIdx_E3, pageaddr); + + tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, + ARMMMUIdxBit_E3, bits); +} + +static int ipas2e1_tlbmask(CPUARMState *env, int64_t value) +{ + /* + * The MSB of value is the NS field, which only applies if SEL2 + * is implemented and SCR_EL3.NS is not set (i.e. in secure mode). + */ + return (value >= 0 + && cpu_isar_feature(aa64_sel2, env_archcpu(env)) + && arm_is_secure_below_el3(env) + ? ARMMMUIdxBit_Stage2_S + : ARMMMUIdxBit_Stage2); +} + static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { -- 2.34.1
Move the FEAT_RME specific TLB insns across to tlb-insns.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-10-peter.maydell@linaro.org --- target/arm/helper.c | 38 -------------------------------- target/arm/tcg/tlb-insns.c | 45 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+), 38 deletions(-) diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo sme_reginfo[] = { .type = ARM_CP_CONST, .resetvalue = 0 }, }; -static void tlbi_aa64_paall_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush(cs); -} - static void gpccr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void gpccr_reset(CPUARMState *env, const ARMCPRegInfo *ri) env_archcpu(env)->reset_l0gptsz); } -static void tlbi_aa64_paallos_write(CPUARMState *env, const ARMCPRegInfo *ri, - uint64_t value) -{ - CPUState *cs = env_cpu(env); - - tlb_flush_all_cpus_synced(cs); -} - static const ARMCPRegInfo rme_reginfo[] = { { .name = "GPCCR_EL3", .state = ARM_CP_STATE_AA64, .opc0 = 3, .opc1 = 6, .crn = 2, .crm = 1, .opc2 = 6, @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo rme_reginfo[] = { { .name = "MFAR_EL3", .state = ARM_CP_STATE_AA64, .opc0 = 3, .opc1 = 6, .crn = 6, .crm = 0, .opc2 = 5, .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.mfar_el3) }, - { .name = "TLBI_PAALL", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 4, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_paall_write }, - { .name = "TLBI_PAALLOS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 4, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_paallos_write }, - /* - * QEMU does not have a way to invalidate by physical address, thus - * invalidating a range of physical addresses is accomplished by - * flushing all tlb entries in the outer shareable domain, - * just like PAALLOS. - */ - { .name = "TLBI_RPALOS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 7, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_paallos_write }, - { .name = "TLBI_RPAOS", .state = ARM_CP_STATE_AA64, - .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 3, - .access = PL3_W, .type = ARM_CP_NO_RAW, - .writefn = tlbi_aa64_paallos_write }, { .name = "DC_CIPAPA", .state = ARM_CP_STATE_AA64, .opc0 = 1, .opc1 = 6, .crn = 7, .crm = 14, .opc2 = 1, .access = PL3_W, .type = ARM_CP_NOP }, diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/tlb-insns.c +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo tlbios_reginfo[] = { .access = PL3_W, .type = ARM_CP_NO_RAW, .writefn = tlbi_aa64_vae3is_write }, }; + +static void tlbi_aa64_paall_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush(cs); +} + +static void tlbi_aa64_paallos_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ + CPUState *cs = env_cpu(env); + + tlb_flush_all_cpus_synced(cs); +} + +static const ARMCPRegInfo tlbi_rme_reginfo[] = { + { .name = "TLBI_PAALL", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 4, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_paall_write }, + { .name = "TLBI_PAALLOS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 4, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_paallos_write }, + /* + * QEMU does not have a way to invalidate by physical address, thus + * invalidating a range of physical addresses is accomplished by + * flushing all tlb entries in the outer shareable domain, + * just like PAALLOS. + */ + { .name = "TLBI_RPALOS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 7, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_paallos_write }, + { .name = "TLBI_RPAOS", .state = ARM_CP_STATE_AA64, + .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 3, + .access = PL3_W, .type = ARM_CP_NO_RAW, + .writefn = tlbi_aa64_paallos_write }, +}; + #endif void define_tlb_insn_regs(ARMCPU *cpu) @@ -XXX,XX +XXX,XX @@ void define_tlb_insn_regs(ARMCPU *cpu) if (cpu_isar_feature(aa64_tlbios, cpu)) { define_arm_cp_regs(cpu, tlbios_reginfo); } + if (cpu_isar_feature(aa64_rme, cpu)) { + define_arm_cp_regs(cpu, tlbi_rme_reginfo); + } #endif } -- 2.34.1
We currently register the tlbi_el2_cp_reginfo[] TLBI insns if EL2 is implemented, or if EL3 and v8 is implemented. This is a copy of the logic used for el2_cp_reginfo[], but for the specific case of the TLBI insns we can simplify it. This is because we do not need the "if EL2 does not exist but EL3 does then EL2 registers should exist and be RAZ/WI" handling here: all our cpregs are for instructions, which UNDEF when EL3 exists and EL2 does not. Simplify the condition down to just "if EL2 exists". This is not a behaviour change because: * for AArch64 insns we marked them with ARM_CP_EL3_NO_EL2_UNDEF, which meant that define_arm_cp_regs() would ignore them if EL2 wasn't present * for AArch32 insns, the .access = PL2_W meant that if EL2 was not present the only way to get at them was from AArch32 EL3; but we have no CPUs which have ARM_FEATURE_V8 but start in AArch32 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20241210160452.2427965-11-peter.maydell@linaro.org --- target/arm/tcg/tlb-insns.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/target/arm/tcg/tlb-insns.c b/target/arm/tcg/tlb-insns.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/tlb-insns.c +++ b/target/arm/tcg/tlb-insns.c @@ -XXX,XX +XXX,XX @@ void define_tlb_insn_regs(ARMCPU *cpu) * ops (i.e. matching the condition for el2_cp_reginfo[] in * helper.c), but we will be able to simplify this later. */ - if (arm_feature(env, ARM_FEATURE_EL2) - || (arm_feature(env, ARM_FEATURE_EL3) - && arm_feature(env, ARM_FEATURE_V8))) { + if (arm_feature(env, ARM_FEATURE_EL2)) { define_arm_cp_regs(cpu, tlbi_el2_cp_reginfo); } if (arm_feature(env, ARM_FEATURE_EL3)) { -- 2.34.1
Hi; here's a target-arm pullreq. Apologies for the size: this is because it has all RTH's work to enable emulation of SME2p1 and SVE2p1. thanks -- PMM The following changes since commit c77283dd5d79149f4e7e9edd00f65416c648ee59: Merge tag 'pull-request-2025-07-02' of https://gitlab.com/thuth/qemu into staging (2025-07-03 06:01:41 -0400) are available in the Git repository at: https://gitlab.com/pm215/qemu.git tags/pull-target-arm-20250704 for you to fetch changes up to 083fef73585dfa03f72055ace6de8dec4912d0b0: linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1 (2025-07-04 15:53:23 +0100) ---------------------------------------------------------------- target-arm queue: * Implement emulation of SME2p1 and SVE2p1 * Correctly enforce alignment checks for v8M loads and stores done via helper functions * Mark the "highbank" and the "midway" machine as deprecated ---------------------------------------------------------------- Peter Maydell (4): target/arm: Rename FMOPA_h to FMOPA_w_h target/arm: Rename BFMOPA to BFMOPA_w target/arm: Implement FMOPA (non-widening) for fp16 target/arm: Implement SME2 BFMOPA (non-widening) Richard Henderson (103): target/arm: Fix SME vs AdvSIMD exception priority target/arm: Fix sve_access_check for SME target/arm: Fix 128-bit element ZIP, UZP, TRN target/arm: Replace @rda_rn_rm_e0 in sve.decode target/arm: Fix FMMLA (64-bit element) for 128-bit VL target/arm: Disable FEAT_F64MM if maximum SVE vector size too small target/arm: Fix PSEL size operands to tcg_gen_gvec_ands target/arm: Fix f16_dotadd vs nan selection target/arm: Fix bfdotadd_ebf vs nan selection target/arm: Remove CPUARMState.vfp.scratch target/arm: Introduce FPST_ZA, FPST_ZA_F16 target/arm: Use FPST_ZA for sme_fmopa_[hsd] target/arm: Rename zarray to za_state.za target/arm: Add isar feature tests for SME2p1, SVE2p1 target/arm: Add ZT0 target/arm: Add zt0_excp_el to DisasContext target/arm: Implement SME2 ZERO ZT0 target/arm: Add alignment argument to gen_sve_{ldr, str} target/arm: Implement SME2 LDR/STR ZT0 target/arm: Implement SME2 MOVT target/arm: Split get_tile_rowcol argument tile_index target/arm: Rename MOVA for translate target/arm: Split out get_zarray target/arm: Introduce ARMCPU.sme_max_vq target/arm: Implement SME2 MOVA to/from tile, multiple registers target/arm: Implement SME2 MOVA to/from array, multiple registers target/arm: Implement SME2 BMOPA target/arm: Implement SME2 SMOPS, UMOPS (2-way) target/arm: Introduce gen_gvec_sve2_sqdmulh target/arm: Implement SME2 Multiple and Single SVE Destructive target/arm: Implement SME2 Multiple Vectors SVE Destructive target/arm: Implement SME2 ADD/SUB (array results, multiple and single vector) target/arm: Implement SME2 ADD/SUB (array results, multiple vectors) target/arm: Pass ZA to helper_sve2_fmlal_zz[zx]w_s target/arm: Add helper_gvec{_ah}_bfmlsl{_nx} target/arm: Implement SME2 FMLAL, BFMLAL target/arm: Implement SME2 FDOT target/arm: Implement SME2 BFDOT target/arm: Implement SME2 FVDOT, BFVDOT target/arm: Rename helper_gvec_*dot_[bh] to *_4[bh] target/arm: Implemement SME2 SDOT, UDOT, USDOT, SUDOT target/arm: Rename SVE SDOT and UDOT patterns target/arm: Tighten USDOT (vectors) decode target/arm: Implement SDOT, UDOT (2-way) for SME2/SVE2p1 target/arm: Implement SME2 SVDOT, UVDOT, SUVDOT, USVDOT target/arm: Implement SME2 SMLAL, SMLSL, UMLAL, UMLSL target/arm: Rename gvec_fml[as]_[hs] with _nf_ infix target/arm: Implement SME2 FMLA, FMLS target/arm: Implement SME2 BFMLA, BFMLS target/arm: Implement SME2 FADD, FSUB, BFADD, BFSUB target/arm: Implement SME2 ADD/SUB (array accumulator) target/arm: Implement SME2 BFCVT, BFCVTN, FCVT, FCVTN target/arm: Implement SME2 FCVT (widening), FCVTL target/arm: Implement SME2 FCVTZS, FCVTZU target/arm: Implement SME2 SCVTF, UCVTF target/arm: Implement SME2 FRINTN, FRINTP, FRINTM, FRINTA target/arm: Introduce do_[us]sat_[bhs] macros target/arm: Use do_[us]sat_[bhs] in sve_helper.c target/arm: Implement SME2 SQCVT, UQCVT, SQCVTU target/arm: Implement SQCVTN, UQCVTN, SQCVTUN for SME2/SVE2p1 target/arm: Implement SME2 SUNPK, UUNPK target/arm: Implement SME2 ZIP, UZP (four registers) target/arm: Move do_urshr, do_srshr to vec_internal.h target/arm: Implement SME2 SQRSHR, UQRSHR, SQRSHRN target/arm: Implement SME2 ZIP, UZP (two registers) target/arm: Implement SME2 FCLAMP, SCLAMP, UCLAMP target/arm: Enable SCLAMP, UCLAMP for SVE2p1 target/arm: Implement FCLAMP for SME2, SVE2p1 target/arm: Implement SME2p1 Multiple Zero target/arm: Introduce pred_count_test target/arm: Fold predtest_ones into helper_sve_brkns target/arm: Expand do_zero inline target/arm: Split out do_whilel from helper_sve_whilel target/arm: Split out do_whileg from helper_sve_whileg target/arm: Move scale by esz into helper_sve_while* target/arm: Split trans_WHILE to lt and gt target/arm: Enable PSEL for SVE2p1 target/arm: Implement SVE2p1 WHILE (predicate pair) target/arm: Implement SVE2p1 WHILE (predicate as counter) target/arm: Implement SVE2p1 PTRUE (predicate as counter) target/arm: Implement {ADD, SMIN, SMAX, UMIN, UMAX}QV for SVE2p1 target/arm: Implement SVE2p1 PEXT target/arm: Implement SME2 SEL target/arm: Implement ANDQV, ORQV, EORQV for SVE2p1 target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1 target/arm: Implement BFMLSLB{L, T} for SME2/SVE2p1 target/arm: Implement CNTP (predicate as counter) for SME2/SVE2p1 target/arm: Implement DUPQ for SME2p1/SVE2p1 target/arm: Implement EXTQ for SME2p1/SVE2p1 target/arm: Implement PMOV for SME2p1/SVE2p1 target/arm: Implement ZIPQ, UZPQ for SME2p1/SVE2p1 target/arm: Implement TBLQ, TBXQ for SME2p1/SVE2p1 target/arm: Implement SME2 counted predicate register load/store target/arm: Split the ST_zpri and ST_zprr patterns target/arm: Implement {LD1, ST1}{W, D} (128-bit element) for SVE2p1 target/arm: Move ld1qq and st1qq primitives to sve_ldst_internal.h target/arm: Implement {LD, ST}[234]Q for SME2p1/SVE2p1 target/arm: Implement LD1Q, ST1Q for SVE2p1 target/arm: Implement MOVAZ for SME2p1 target/arm: Implement LUTI2, LUTI4 for SME2/SME2p1 target/arm: Support FPCR.AH in SME FMOPS, BFMOPS target/arm: Enable FEAT_SME2p1 on -cpu max linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1 Thomas Huth (1): hw/arm/highbank: Mark the "highbank" and the "midway" machine as deprecated William Kosasih (11): target/arm: Bring VLSTM/VLLDM helper store/load closer to the ARM pseudocode target/arm: Fix BLXNS helper store alignment checks target/arm: Fix function_return helper load alignment checks target/arm: Fix VLDR helper load alignment checks target/arm: Fix VSTR helper store alignment checks target/arm: Fix VLDR_SG helper load alignment checks target/arm: Fix VSTR_SG helper store alignment checks target/arm: Fix VLD4 helper load alignment checks target/arm: Fix VLD2 helper load alignment checks target/arm: Fix VST4 helper store alignment checks target/arm: Fix VST2 helper store alignment checks docs/about/deprecated.rst | 7 + docs/system/arm/emulation.rst | 6 + target/arm/cpu-features.h | 63 ++ target/arm/cpu.h | 70 +- target/arm/syndrome.h | 1 + target/arm/tcg/helper-sme.h | 215 ++++- target/arm/tcg/helper-sve.h | 212 +++++ target/arm/tcg/helper.h | 91 +- target/arm/tcg/sve_ldst_internal.h | 89 ++ target/arm/tcg/translate-a64.h | 10 +- target/arm/tcg/translate.h | 9 + target/arm/tcg/vec_internal.h | 148 ++++ target/arm/tcg/sme.decode | 937 +++++++++++++++++++- target/arm/tcg/sve.decode | 327 +++++-- hw/arm/highbank.c | 2 + linux-user/aarch64/signal.c | 4 +- linux-user/elfload.c | 8 + target/arm/cpu.c | 11 +- target/arm/cpu64.c | 8 + target/arm/helper.c | 8 +- target/arm/machine.c | 22 +- target/arm/tcg/cpu64.c | 10 +- target/arm/tcg/gengvec64.c | 11 + target/arm/tcg/helper-a64.c | 2 + target/arm/tcg/hflags.c | 34 +- target/arm/tcg/m_helper.c | 33 +- target/arm/tcg/mve_helper.c | 183 ++-- target/arm/tcg/neon_helper.c | 30 + target/arm/tcg/sme_helper.c | 1674 +++++++++++++++++++++++++++++++++--- target/arm/tcg/sve_helper.c | 1201 +++++++++++++++++++++----- target/arm/tcg/translate-a64.c | 45 +- target/arm/tcg/translate-neon.c | 18 +- target/arm/tcg/translate-sme.c | 1480 ++++++++++++++++++++++++++++++- target/arm/tcg/translate-sve.c | 1019 +++++++++++++++++++--- target/arm/tcg/vec_helper.c | 384 +++++++-- target/arm/tcg/vfp_helper.c | 12 +- 36 files changed, 7605 insertions(+), 779 deletions(-)
From: Thomas Huth <thuth@redhat.com> We don't have any automatic regression tests for these machines and when asking the usual suspects on the mailing list we came to the conclusion that nobody tests these machines manually, too, so it seems like this is currently just completely unused code. Mark them as depre- cated to see whether anybody still speaks up during the deprecation period, otherwise we can likely remove these two machines in a couple of releases. Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Message-id: 20250702113051.46483-1-thuth@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: tweaked deprecation.rst text] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/about/deprecated.rst | 7 +++++++ hw/arm/highbank.c | 2 ++ 2 files changed, 9 insertions(+) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -XXX,XX +XXX,XX @@ they want to use and avoids confusion. Existing users of the ``spike`` machine must ensure that they're setting the ``spike`` machine in the command line (``-M spike``). +Arm ``highbank`` and ``midway`` machines (since 10.1) +''''''''''''''''''''''''''''''''''''''''''''''''''''' + +There are no known users left for these machines (if you still use it, +please write a mail to the qemu-devel mailing list). If you just want to +boot a Cortex-A15 or Cortex-A9 Linux, use the ``virt`` machine instead. + System emulator binaries ------------------------ diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/highbank.c +++ b/hw/arm/highbank.c @@ -XXX,XX +XXX,XX @@ static void highbank_class_init(ObjectClass *oc, const void *data) mc->max_cpus = 4; mc->ignore_memory_transaction_failures = true; mc->default_ram_id = "highbank.dram"; + mc->deprecation_reason = "no known users left for this machine"; } static const TypeInfo highbank_type = { @@ -XXX,XX +XXX,XX @@ static void midway_class_init(ObjectClass *oc, const void *data) mc->max_cpus = 4; mc->ignore_memory_transaction_failures = true; mc->default_ram_id = "highbank.dram"; + mc->deprecation_reason = "no known users left for this machine"; } static const TypeInfo midway_type = { -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch brings the VLSTM and VLLDM helper functions closer to the ARM pseudocode by adding MO_ALIGN to the MemOpIdx of the associated store (`cpu_stl_mmu`) operations and load (`cpu_ldl_mmu`) operations. Note that this is not a bug fix: an 8-byte alignment check already exists and remains in place, enforcing stricter alignment than the 4 bytes requirement in the individual loads and stores. This change merely makes the helper implementations closer to the ARM pseudocode. That said, as a side effect, the MMU index is now resolved once instead of on every `cpu_*_data_ra` call, reducing redundant lookups Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-2-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/m_helper.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/m_helper.c +++ b/target/arm/tcg/m_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr) bool s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK; bool lspact = env->v7m.fpccr[s] & R_V7M_FPCCR_LSPACT_MASK; uintptr_t ra = GETPC(); + ARMMMUIdx mmu_idx = arm_mmu_idx(env); + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, + arm_to_core_mmu_idx(mmu_idx)); assert(env->v7m.secure); @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr) * Note that we do not use v7m_stack_write() here, because the * accesses should not set the FSR bits for stacking errors if they * fail. (In pseudocode terms, they are AccType_NORMAL, not AccType_STACK - * or AccType_LAZYFP). Faults in cpu_stl_data_ra() will throw exceptions + * or AccType_LAZYFP). Faults in cpu_stl_mmu() will throw exceptions * and longjmp out. */ if (!(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPEN_MASK)) { @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr) if (i >= 16) { faddr += 8; /* skip the slot for the FPSCR */ } - cpu_stl_data_ra(env, faddr, slo, ra); - cpu_stl_data_ra(env, faddr + 4, shi, ra); + cpu_stl_mmu(env, faddr, slo, oi, ra); + cpu_stl_mmu(env, faddr + 4, shi, oi, ra); } - cpu_stl_data_ra(env, fptr + 0x40, vfp_get_fpscr(env), ra); + cpu_stl_mmu(env, fptr + 0x40, vfp_get_fpscr(env), oi, ra); if (cpu_isar_feature(aa32_mve, cpu)) { - cpu_stl_data_ra(env, fptr + 0x44, env->v7m.vpr, ra); + cpu_stl_mmu(env, fptr + 0x44, env->v7m.vpr, oi, ra); } /* @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr) { ARMCPU *cpu = env_archcpu(env); uintptr_t ra = GETPC(); + ARMMMUIdx mmu_idx = arm_mmu_idx(env); + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, + arm_to_core_mmu_idx(mmu_idx)); /* fptr is the value of Rn, the frame pointer we load the FP regs from */ assert(env->v7m.secure); @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr) faddr += 8; /* skip the slot for the FPSCR and VPR */ } - slo = cpu_ldl_data_ra(env, faddr, ra); - shi = cpu_ldl_data_ra(env, faddr + 4, ra); + slo = cpu_ldl_mmu(env, faddr, oi, ra); + shi = cpu_ldl_mmu(env, faddr + 4, oi, ra); dn = (uint64_t) shi << 32 | slo; *aa32_vfp_dreg(env, i / 2) = dn; } - fpscr = cpu_ldl_data_ra(env, fptr + 0x40, ra); + fpscr = cpu_ldl_mmu(env, fptr + 0x40, oi, ra); vfp_set_fpscr(env, fpscr); if (cpu_isar_feature(aa32_mve, cpu)) { - env->v7m.vpr = cpu_ldl_data_ra(env, fptr + 0x44, ra); + env->v7m.vpr = cpu_ldl_mmu(env, fptr + 0x44, oi, ra); } } -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations (when stacking the return pc and psr) in the BLXNS instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-3-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/m_helper.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/m_helper.c +++ b/target/arm/tcg/m_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest) } /* Note that these stores can throw exceptions on MPU faults */ - cpu_stl_data_ra(env, sp, nextinst, GETPC()); - cpu_stl_data_ra(env, sp + 4, saved_psr, GETPC()); + ARMMMUIdx mmu_idx = arm_mmu_idx(env); + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, + arm_to_core_mmu_idx(mmu_idx)); + cpu_stl_mmu(env, sp, nextinst, oi, GETPC()); + cpu_stl_mmu(env, sp + 4, saved_psr, oi, GETPC()); env->regs[13] = sp; env->regs[14] = 0xfeffffff; -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations (when unstacking the return pc and psr) in the FunctionReturn pseudocode. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-4-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/m_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/m_helper.c +++ b/target/arm/tcg/m_helper.c @@ -XXX,XX +XXX,XX @@ static bool do_v7m_function_return(ARMCPU *cpu) * do them as secure, so work out what MMU index that is. */ mmu_idx = arm_v7m_mmu_idx_for_secstate(env, true); - oi = make_memop_idx(MO_LEUL, arm_to_core_mmu_idx(mmu_idx)); + oi = make_memop_idx(MO_LEUL | MO_ALIGN, arm_to_core_mmu_idx(mmu_idx)); newpc = cpu_ldl_mmu(env, frameptr, oi, 0); newpsr = cpu_ldl_mmu(env, frameptr + 4, oi, 0); -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLDR instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-5-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) } /* For loads, predicated lanes are zeroed instead of keeping their old values */ -#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE) \ +#define DO_VLDR(OP, MFLAG, MSIZE, MTYPE, LDTYPE, ESIZE, TYPE) \ void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \ { \ TYPE *d = vd; \ uint16_t mask = mve_element_mask(env); \ uint16_t eci_mask = mve_eci_mask(env); \ unsigned b, e; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ /* \ * R_SXTM allows the dest reg to become UNKNOWN for abandoned \ * beats so we don't care if we update part of the dest and \ @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \ if (eci_mask & (1 << b)) { \ d[H##ESIZE(e)] = (mask & (1 << b)) ? \ - cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \ + (MTYPE)cpu_##LDTYPE##_mmu(env, addr, oi, GETPC()) : 0;\ } \ addr += MSIZE; \ } \ @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) mve_advance_vpt(env); \ } -DO_VLDR(vldrb, 1, ldub, 1, uint8_t) -DO_VLDR(vldrh, 2, lduw, 2, uint16_t) -DO_VLDR(vldrw, 4, ldl, 4, uint32_t) +DO_VLDR(vldrb, MO_UB, 1, uint8_t, ldb, 1, uint8_t) +DO_VLDR(vldrh, MO_TEUW, 2, uint16_t, ldw, 2, uint16_t) +DO_VLDR(vldrw, MO_TEUL, 4, uint32_t, ldl, 4, uint32_t) DO_VSTR(vstrb, 1, stb, 1, uint8_t) DO_VSTR(vstrh, 2, stw, 2, uint16_t) DO_VSTR(vstrw, 4, stl, 4, uint32_t) -DO_VLDR(vldrb_sh, 1, ldsb, 2, int16_t) -DO_VLDR(vldrb_sw, 1, ldsb, 4, int32_t) -DO_VLDR(vldrb_uh, 1, ldub, 2, uint16_t) -DO_VLDR(vldrb_uw, 1, ldub, 4, uint32_t) -DO_VLDR(vldrh_sw, 2, ldsw, 4, int32_t) -DO_VLDR(vldrh_uw, 2, lduw, 4, uint32_t) +DO_VLDR(vldrb_sh, MO_SB, 1, int8_t, ldb, 2, int16_t) +DO_VLDR(vldrb_sw, MO_SB, 1, int8_t, ldb, 4, int32_t) +DO_VLDR(vldrb_uh, MO_UB, 1, uint8_t, ldb, 2, uint16_t) +DO_VLDR(vldrb_uw, MO_UB, 1, uint8_t, ldb, 4, uint32_t) +DO_VLDR(vldrh_sw, MO_TESW, 2, int16_t, ldw, 4, int32_t) +DO_VLDR(vldrh_uw, MO_TEUW, 2, uint16_t, ldw, 4, uint32_t) DO_VSTR(vstrb_h, 1, stb, 2, int16_t) DO_VSTR(vstrb_w, 1, stb, 4, int32_t) -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VSTR instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-6-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) mve_advance_vpt(env); \ } -#define DO_VSTR(OP, MSIZE, STTYPE, ESIZE, TYPE) \ +#define DO_VSTR(OP, MFLAG, MSIZE, STTYPE, ESIZE, TYPE) \ void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \ { \ TYPE *d = vd; \ uint16_t mask = mve_element_mask(env); \ unsigned b, e; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \ if (mask & (1 << b)) { \ - cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \ + cpu_##STTYPE##_mmu(env, addr, d[H##ESIZE(e)], oi, GETPC()); \ } \ addr += MSIZE; \ } \ @@ -XXX,XX +XXX,XX @@ DO_VLDR(vldrb, MO_UB, 1, uint8_t, ldb, 1, uint8_t) DO_VLDR(vldrh, MO_TEUW, 2, uint16_t, ldw, 2, uint16_t) DO_VLDR(vldrw, MO_TEUL, 4, uint32_t, ldl, 4, uint32_t) -DO_VSTR(vstrb, 1, stb, 1, uint8_t) -DO_VSTR(vstrh, 2, stw, 2, uint16_t) -DO_VSTR(vstrw, 4, stl, 4, uint32_t) +DO_VSTR(vstrb, MO_UB, 1, stb, 1, uint8_t) +DO_VSTR(vstrh, MO_TEUW, 2, stw, 2, uint16_t) +DO_VSTR(vstrw, MO_TEUL, 4, stl, 4, uint32_t) DO_VLDR(vldrb_sh, MO_SB, 1, int8_t, ldb, 2, int16_t) DO_VLDR(vldrb_sw, MO_SB, 1, int8_t, ldb, 4, int32_t) @@ -XXX,XX +XXX,XX @@ DO_VLDR(vldrb_uw, MO_UB, 1, uint8_t, ldb, 4, uint32_t) DO_VLDR(vldrh_sw, MO_TESW, 2, int16_t, ldw, 4, int32_t) DO_VLDR(vldrh_uw, MO_TEUW, 2, uint16_t, ldw, 4, uint32_t) -DO_VSTR(vstrb_h, 1, stb, 2, int16_t) -DO_VSTR(vstrb_w, 1, stb, 4, int32_t) -DO_VSTR(vstrh_w, 2, stw, 4, int32_t) +DO_VSTR(vstrb_h, MO_UB, 1, stb, 2, int16_t) +DO_VSTR(vstrb_w, MO_UB, 1, stb, 4, int32_t) +DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) #undef DO_VLDR #undef DO_VSTR -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLDR_SG instructions. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-7-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 42 ++++++++++++++++++++++--------------- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) * For loads, predicated lanes are zeroed instead of retaining * their previous values. */ -#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB) \ +#define DO_VLDR_SG(OP, MFLAG, MTYPE, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB)\ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \ uint32_t base) \ { \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \ if (!(eci_mask & 1)) { \ continue; \ } \ addr = ADDRFN(base, m[H##ESIZE(e)]); \ d[H##ESIZE(e)] = (mask & 1) ? \ - cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \ + (MTYPE)cpu_##LDTYPE##_mmu(env, addr, oi, GETPC()) : 0; \ if (WB) { \ m[H##ESIZE(e)] = addr; \ } \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \ if (!(eci_mask & 1)) { \ continue; \ } \ addr = ADDRFN(base, m[H4(e & ~1)]); \ addr += 4 * (e & 1); \ - d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \ + d[H4(e)] = (mask & 1) ? cpu_ldl_mmu(env, addr, oi, GETPC()) : 0; \ if (WB && (e & 1)) { \ m[H4(e & ~1)] = addr - 4; \ } \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) #define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2)) #define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3)) -DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_sh, MO_SB, int8_t, ldb, 2, int16_t, uint16_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_sw, MO_SB, int8_t, ldb, 4, int32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrh_sg_sw, MO_TESW, int16_t, ldw, 4, int32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_ub, MO_UB, uint8_t, ldb, 1, uint8_t, uint8_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_uh, MO_UB, uint8_t, ldb, 2, uint16_t, uint16_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_uw, MO_UB, uint8_t, ldb, 4, uint32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrh_sg_uh, MO_TEUW, uint16_t, ldw, 2, uint16_t, uint16_t, ADDR_ADD, false) +DO_VLDR_SG(vldrh_sg_uw, MO_TEUW, uint16_t, ldw, 4, uint32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrw_sg_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false) DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH, false) -DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH, false) -DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH, false) -DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false) +DO_VLDR_SG(vldrh_sg_os_sw, MO_TESW, int16_t, ldw, 4, + int32_t, uint32_t, ADDR_ADD_OSH, false) +DO_VLDR_SG(vldrh_sg_os_uh, MO_TEUW, uint16_t, ldw, 2, + uint16_t, uint16_t, ADDR_ADD_OSH, false) +DO_VLDR_SG(vldrh_sg_os_uw, MO_TEUW, uint16_t, ldw, 4, + uint32_t, uint32_t, ADDR_ADD_OSH, false) +DO_VLDR_SG(vldrw_sg_os_uw, MO_TEUL, uint32_t, ldl, 4, + uint32_t, uint32_t, ADDR_ADD_OSW, false) DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false) DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false) @@ -XXX,XX +XXX,XX @@ DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false) DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false) DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false) -DO_VLDR_SG(vldrw_sg_wb_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true) +DO_VLDR_SG(vldrw_sg_wb_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true) DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true) DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true) DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VSTR_SG instructions. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-8-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) } /* We know here TYPE is unsigned so always the same as the offset type */ -#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN, WB) \ +#define DO_VSTR_SG(OP, MFLAG, STTYPE, ESIZE, TYPE, ADDRFN, WB) \ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \ uint32_t base) \ { \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \ if (!(eci_mask & 1)) { \ continue; \ } \ addr = ADDRFN(base, m[H##ESIZE(e)]); \ if (mask & 1) { \ - cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \ + cpu_##STTYPE##_mmu(env, addr, d[H##ESIZE(e)], oi, GETPC()); \ } \ if (WB) { \ m[H##ESIZE(e)] = addr; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \ if (!(eci_mask & 1)) { \ continue; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) addr = ADDRFN(base, m[H4(e & ~1)]); \ addr += 4 * (e & 1); \ if (mask & 1) { \ - cpu_stl_data_ra(env, addr, d[H4(e)], GETPC()); \ + cpu_stl_mmu(env, addr, d[H4(e)], oi, GETPC()); \ } \ if (WB && (e & 1)) { \ m[H4(e & ~1)] = addr - 4; \ @@ -XXX,XX +XXX,XX @@ DO_VLDR_SG(vldrw_sg_os_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false) DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false) -DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false) -DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD, false) -DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD, false) -DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD, false) -DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD, false) -DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD, false) +DO_VSTR_SG(vstrb_sg_ub, MO_UB, stb, 1, uint8_t, ADDR_ADD, false) +DO_VSTR_SG(vstrb_sg_uh, MO_UB, stb, 2, uint16_t, ADDR_ADD, false) +DO_VSTR_SG(vstrb_sg_uw, MO_UB, stb, 4, uint32_t, ADDR_ADD, false) +DO_VSTR_SG(vstrh_sg_uh, MO_TEUW, stw, 2, uint16_t, ADDR_ADD, false) +DO_VSTR_SG(vstrh_sg_uw, MO_TEUW, stw, 4, uint32_t, ADDR_ADD, false) +DO_VSTR_SG(vstrw_sg_uw, MO_TEUL, stl, 4, uint32_t, ADDR_ADD, false) DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD, false) -DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH, false) -DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false) -DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false) +DO_VSTR_SG(vstrh_sg_os_uh, MO_TEUW, stw, 2, uint16_t, ADDR_ADD_OSH, false) +DO_VSTR_SG(vstrh_sg_os_uw, MO_TEUW, stw, 4, uint32_t, ADDR_ADD_OSH, false) +DO_VSTR_SG(vstrw_sg_os_uw, MO_TEUL, stl, 4, uint32_t, ADDR_ADD_OSW, false) DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false) DO_VLDR_SG(vldrw_sg_wb_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true) DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true) -DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true) +DO_VSTR_SG(vstrw_sg_wb_uw, MO_TEUL, stl, 4, uint32_t, ADDR_ADD, true) DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) /* -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLD4 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-9-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) uint16_t mask = mve_eci_mask(env); \ static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ for (e = 0; e < 4; e++, data >>= 8) { \ uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \ qd[H1(off[beat])] = data; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) uint32_t addr, data; \ int y; /* y counts 0 2 0 2 */ \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 8 + (beat & 1) * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y); \ qd[H2(off[beat])] = data; \ data >>= 16; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) uint32_t addr, data; \ uint32_t *qd; \ int y; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ y = (beat + (O1 & 2)) & 3; \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \ qd[H4(off[beat] >> 2)] = data; \ -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLD2 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-10-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VLD4W(vld43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint8_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 2; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ for (e = 0; e < 4; e++, data >>= 8) { \ qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \ qd[H1(off[beat] + (e >> 1))] = data; \ @@ -XXX,XX +XXX,XX @@ DO_VLD4W(vld43w, 6, 7, 8, 9) uint32_t addr, data; \ int e; \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ for (e = 0; e < 2; e++, data >>= 16) { \ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \ qd[H2(off[beat])] = data; \ @@ -XXX,XX +XXX,XX @@ DO_VLD4W(vld43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint32_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat]; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \ qd[H4(off[beat] >> 3)] = data; \ } \ -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VST4 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-11-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint16_t mask = mve_eci_mask(env); \ static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \ data = (data << 8) | qd[H1(off[beat])]; \ } \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint32_t addr, data; \ int y; /* y counts 0 2 0 2 */ \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) data = qd[H2(off[beat])]; \ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1); \ data |= qd[H2(off[beat])] << 16; \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint32_t addr, data; \ uint32_t *qd; \ int y; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) y = (beat + (O1 & 2)) & 3; \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \ data = qd[H4(off[beat] >> 2)]; \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VST2 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-12-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint8_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \ data = (data << 8) | qd[H1(off[beat] + (e >> 1))]; \ } \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) uint32_t addr, data; \ int e; \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \ data = (data << 16) | qd[H2(off[beat])]; \ } \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint32_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) addr = base + off[beat]; \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \ data = qd[H4(off[beat] >> 3)]; \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> We failed to raise an exception when sme_excp_el == 0 and fp_excp_el == 1. Cc: qemu-stable@nongnu.org Fixes: 3d74825f4d6 ("target/arm: Add SME enablement checks") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ bool sme_enabled_check(DisasContext *s) * to be zero when fp_excp_el has priority. This is because we need * sme_excp_el by itself for cpregs access checks. */ - if (!s->fp_excp_el || s->sme_excp_el < s->fp_excp_el) { + if (s->sme_excp_el + && (!s->fp_excp_el || s->sme_excp_el <= s->fp_excp_el)) { bool ret = sme_access_check(s); s->fp_access_checked = (ret ? 1 : -1); return ret; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Do not assume SME implies SVE. Ensure that the non-streaming check is present along the SME path, since it is not implied by sme_*_enabled_check. Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool fp_access_check_only(DisasContext *s) return true; } -static bool fp_access_check(DisasContext *s) +static bool nonstreaming_check(DisasContext *s) { - if (!fp_access_check_only(s)) { - return false; - } if (s->sme_trap_nonstreaming && s->is_nonstreaming) { gen_exception_insn(s, 0, EXCP_UDEF, syn_smetrap(SME_ET_Streaming, false)); @@ -XXX,XX +XXX,XX @@ static bool fp_access_check(DisasContext *s) return true; } +static bool fp_access_check(DisasContext *s) +{ + return fp_access_check_only(s) && nonstreaming_check(s); +} + /* * Return <0 for non-supported element sizes, with MO_16 controlled by * FEAT_FP16; return 0 for fp disabled; otherwise return >0 for success. @@ -XXX,XX +XXX,XX @@ static int fp_access_check_vector_hsd(DisasContext *s, bool is_q, MemOp esz) */ bool sve_access_check(DisasContext *s) { - if (s->pstate_sm || !dc_isar_feature(aa64_sve, s)) { + if (dc_isar_feature(aa64_sme, s)) { bool ret; - assert(dc_isar_feature(aa64_sme, s)); - ret = sme_sm_enabled_check(s); + if (s->pstate_sm) { + ret = sme_enabled_check(s); + } else if (dc_isar_feature(aa64_sve, s)) { + goto continue_sve; + } else { + ret = sme_sm_enabled_check(s); + } + if (ret) { + ret = nonstreaming_check(s); + } s->sve_access_checked = (ret ? 1 : -1); return ret; } + + continue_sve: if (s->sve_excp_el) { /* Assert that we only raise one exception per instruction. */ assert(!s->sve_access_checked); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> We missed the instructions UDEF when the vector size is too small. We missed marking the instructions non-streaming with SME. Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 43 ++++++++++++++++++++++++---------- 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(PUNPKHI, aa64_sve, do_perm_pred2, a, 1, gen_helper_sve_punpk_p) *** SVE Permute - Interleaving Group */ +static bool do_interleave_q(DisasContext *s, gen_helper_gvec_3 *fn, + arg_rrr_esz *a, int data) +{ + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + if (vsz < 32) { + unallocated_encoding(s); + } else { + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + vsz, vsz, data, fn); + } + } + return true; +} + static gen_helper_gvec_3 * const zip_fns[4] = { gen_helper_sve_zip_b, gen_helper_sve_zip_h, gen_helper_sve_zip_s, gen_helper_sve_zip_d, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ZIP1_z, aa64_sve, gen_gvec_ool_arg_zzz, TRANS_FEAT(ZIP2_z, aa64_sve, gen_gvec_ool_arg_zzz, zip_fns[a->esz], a, vec_full_reg_size(s) / 2) -TRANS_FEAT(ZIP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_zip_q, a, 0) -TRANS_FEAT(ZIP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_zip_q, a, - QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2) +TRANS_FEAT_NONSTREAMING(ZIP1_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_zip_q, a, 0) +TRANS_FEAT_NONSTREAMING(ZIP2_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_zip_q, a, + QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2) static gen_helper_gvec_3 * const uzp_fns[4] = { gen_helper_sve_uzp_b, gen_helper_sve_uzp_h, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UZP1_z, aa64_sve, gen_gvec_ool_arg_zzz, TRANS_FEAT(UZP2_z, aa64_sve, gen_gvec_ool_arg_zzz, uzp_fns[a->esz], a, 1 << a->esz) -TRANS_FEAT(UZP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_uzp_q, a, 0) -TRANS_FEAT(UZP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_uzp_q, a, 16) +TRANS_FEAT_NONSTREAMING(UZP1_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_uzp_q, a, 0) +TRANS_FEAT_NONSTREAMING(UZP2_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_uzp_q, a, 16) static gen_helper_gvec_3 * const trn_fns[4] = { gen_helper_sve_trn_b, gen_helper_sve_trn_h, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(TRN1_z, aa64_sve, gen_gvec_ool_arg_zzz, TRANS_FEAT(TRN2_z, aa64_sve, gen_gvec_ool_arg_zzz, trn_fns[a->esz], a, 1 << a->esz) -TRANS_FEAT(TRN1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_trn_q, a, 0) -TRANS_FEAT(TRN2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_trn_q, a, 16) +TRANS_FEAT_NONSTREAMING(TRN1_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_trn_q, a, 0) +TRANS_FEAT_NONSTREAMING(TRN2_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_trn_q, a, 16) /* *** SVE Permute Vector - Predicated Group -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Replace @rda_rn_rm_e0 with @rda_rn_rm_ex, and require users to supply an explicit esz. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 48 +++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ @rda_rn_rm ........ esz:2 . rm:5 ... ... rn:5 rd:5 \ &rrrr_esz ra=%reg_movprfx -# Four operand with unused vector element size -@rda_rn_rm_e0 ........ ... rm:5 ... ... rn:5 rd:5 \ - &rrrr_esz esz=0 ra=%reg_movprfx -@rdn_ra_rm_e0 ........ ... rm:5 ... ... ra:5 rd:5 \ - &rrrr_esz esz=0 rn=%reg_movprfx +# Four operand with explicit vector element size +@rda_rn_rm_ex ........ ... rm:5 ... ... rn:5 rd:5 \ + &rrrr_esz ra=%reg_movprfx +@rdn_ra_rm_ex ........ ... rm:5 ... ... ra:5 rd:5 \ + &rrrr_esz rn=%reg_movprfx # Three operand with "memory" size, aka immediate left shift @rd_rn_msz_rm ........ ... rm:5 .... imm:2 rn:5 rd:5 &rrri @@ -XXX,XX +XXX,XX @@ XAR 00000100 .. 1 ..... 001 101 rm:5 rd:5 &rrri_esz \ rn=%reg_movprfx esz=%tszimm16_esz imm=%tszimm16_shr # SVE2 bitwise ternary operations -EOR3 00000100 00 1 ..... 001 110 ..... ..... @rdn_ra_rm_e0 -BSL 00000100 00 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 -BCAX 00000100 01 1 ..... 001 110 ..... ..... @rdn_ra_rm_e0 -BSL1N 00000100 01 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 -BSL2N 00000100 10 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 -NBSL 00000100 11 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 +EOR3 00000100 00 1 ..... 001 110 ..... ..... @rdn_ra_rm_ex esz=0 +BSL 00000100 00 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 +BCAX 00000100 01 1 ..... 001 110 ..... ..... @rdn_ra_rm_ex esz=0 +BSL1N 00000100 01 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 +BSL2N 00000100 10 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 +NBSL 00000100 11 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 ### SVE Index Generation Group @@ -XXX,XX +XXX,XX @@ EORTB 01000101 .. 0 ..... 10010 1 ..... ..... @rd_rn_rm ## SVE integer matrix multiply accumulate -SMMLA 01000101 00 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0 -USMMLA 01000101 10 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0 -UMMLA 01000101 11 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0 +SMMLA 01000101 00 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2 +USMMLA 01000101 10 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2 +UMMLA 01000101 11 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2 ## SVE2 bitwise permute @@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx USDOT_zzzz 01000100 .. 0 ..... 011 110 ..... ..... @rda_rn_rm ### SVE2 floating point matrix multiply accumulate -BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 -FMMLA_s 01100100 10 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 -FMMLA_d 01100100 11 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 +BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=1 +FMMLA_s 01100100 10 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=2 +FMMLA_d 01100100 11 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=3 ### SVE2 Memory Gather Load Group @@ -XXX,XX +XXX,XX @@ FCVTLT_sd 01100100 11 0010 11 101 ... ..... ..... @rd_pg_rn_e0 FLOGB 01100101 00 011 esz:2 0101 pg:3 rn:5 rd:5 &rpr_esz ### SVE2 floating-point multiply-add long (vectors) -FMLALB_zzzw 01100100 10 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 -FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0 -FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_e0 -FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_e0 +FMLALB_zzzw 01100100 10 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 +FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 +FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_ex esz=2 +FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 -BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 -BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0 +BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 +BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point bfloat16 dot-product -BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 +BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point multiply-add long (indexed) FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-6-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_ZPZZ_FP(FMINNMP, aa64_sve2, sve2_fminnmp_zpzz) DO_ZPZZ_FP(FMAXP, aa64_sve2, sve2_fmaxp_zpzz) DO_ZPZZ_FP(FMINP, aa64_sve2, sve2_fminp_zpzz) +static bool do_fmmla(DisasContext *s, arg_rrrr_esz *a, + gen_helper_gvec_4_ptr *fn) +{ + if (sve_access_check(s)) { + if (vec_full_reg_size(s) < 4 * memop_size(a->esz)) { + unallocated_encoding(s); + } else { + gen_gvec_fpst_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, 0, FPST_A64); + } + } + return true; +} + +TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, do_fmmla, a, gen_helper_fmmla_s) +TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, do_fmmla, a, gen_helper_fmmla_d) + /* * SVE Integer Multiply-Add (unpredicated) */ -TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz, - gen_helper_fmmla_s, a->rd, a->rn, a->rm, a->ra, - 0, FPST_A64) -TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz, - gen_helper_fmmla_d, a->rd, a->rn, a->rm, a->ra, - 0, FPST_A64) - static gen_helper_gvec_4 * const sqdmlal_zzzw_fns[] = { NULL, gen_helper_sve2_sqdmlal_zzzw_h, gen_helper_sve2_sqdmlal_zzzw_s, gen_helper_sve2_sqdmlal_zzzw_d, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> All F64MM instructions operate on a 256-bit vector. If only 128-bit vectors is supported by the cpu, then the cpu cannot enable F64MM. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu64.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) /* From now on sve_max_vq is the actual maximum supported length. */ cpu->sve_max_vq = max_vq; cpu->sve_vq.map = vq_map; + + /* FEAT_F64MM requires the existence of a 256-bit vector size. */ + if (max_vq < 2) { + uint64_t t = GET_IDREG(&cpu->isar, ID_AA64ZFR0); + t = FIELD_DP64(t, ID_AA64ZFR0, F64MM, 0); + SET_IDREG(&cpu->isar, ID_AA64ZFR0, t); + } } /* -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Gvec only operates on size 8 and multiples of 16. Predicates may be any multiple of 2. Round up the size using the appropriate function. Cc: qemu-stable@nongnu.org Fixes: 598ab0b24c0 ("target/arm: Implement PSEL") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-8-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 1 + 1 file changed, 1 insertion(+) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_PSEL(DisasContext *s, arg_psel *a) tcg_gen_neg_i64(tmp, tmp); /* Apply to either copy the source, or write zeros. */ + pl = size_for_gvec(pl); tcg_gen_gvec_ands(MO_64, pred_full_reg_offset(s, a->pd), pred_full_reg_offset(s, a->pn), tmp, pl, pl); return true; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Implement FPProcessNaNs4 within f16_dotadd, rather than simply letting NaNs propagate through the function. Cc: qemu-stable@nongnu.org Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-9-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme_helper.c | 62 +++++++++++++++++++++++++++---------- 1 file changed, 46 insertions(+), 16 deletions(-) diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, * - we have pre-set-up copy of s_std which is set to round-to-odd, * for the multiply (see below) */ - float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16); - float64 e1c = float16_to_float64(e1 >> 16, true, s_f16); - float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16); - float64 e2c = float16_to_float64(e2 >> 16, true, s_f16); - float64 t64; + float16 h1r = e1 & 0xffff; + float16 h1c = e1 >> 16; + float16 h2r = e2 & 0xffff; + float16 h2c = e2 >> 16; float32 t32; - /* - * The ARM pseudocode function FPDot performs both multiplies - * and the add with a single rounding operation. Emulate this - * by performing the first multiply in round-to-odd, then doing - * the second multiply as fused multiply-add, and rounding to - * float32 all in one step. - */ - t64 = float64_mul(e1r, e2r, s_odd); - t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std); + /* C.f. FPProcessNaNs4 */ + if (float16_is_any_nan(h1r) || float16_is_any_nan(h1c) || + float16_is_any_nan(h2r) || float16_is_any_nan(h2c)) { + float16 t16; - /* This conversion is exact, because we've already rounded. */ - t32 = float64_to_float32(t64, s_std); + if (float16_is_signaling_nan(h1r, s_f16)) { + t16 = h1r; + } else if (float16_is_signaling_nan(h1c, s_f16)) { + t16 = h1c; + } else if (float16_is_signaling_nan(h2r, s_f16)) { + t16 = h2r; + } else if (float16_is_signaling_nan(h2c, s_f16)) { + t16 = h2c; + } else if (float16_is_any_nan(h1r)) { + t16 = h1r; + } else if (float16_is_any_nan(h1c)) { + t16 = h1c; + } else if (float16_is_any_nan(h2r)) { + t16 = h2r; + } else { + t16 = h2c; + } + t32 = float16_to_float32(t16, true, s_f16); + } else { + float64 e1r = float16_to_float64(h1r, true, s_f16); + float64 e1c = float16_to_float64(h1c, true, s_f16); + float64 e2r = float16_to_float64(h2r, true, s_f16); + float64 e2c = float16_to_float64(h2c, true, s_f16); + float64 t64; + + /* + * The ARM pseudocode function FPDot performs both multiplies + * and the add with a single rounding operation. Emulate this + * by performing the first multiply in round-to-odd, then doing + * the second multiply as fused multiply-add, and rounding to + * float32 all in one step. + */ + t64 = float64_mul(e1r, e2r, s_odd); + t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std); + + /* This conversion is exact, because we've already rounded. */ + t32 = float64_to_float32(t64, s_std); + } /* The final accumulation step is not fused. */ return float32_add(sum, t32, s_std); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Implement FPProcessNaNs4 within bfdotadd_ebf, rather than simply letting NaNs propagate through the function. Cc: qemu-stable@nongnu.org Fixes: 0e1850182a1 ("target/arm: Implement FPCR.EBF=1 semantics for bfdotadd()") Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-10-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/vec_helper.c | 75 ++++++++++++++++++++++++++----------- 1 file changed, 53 insertions(+), 22 deletions(-) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst) float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst, float_status *fpst_odd) { - /* - * Compare f16_dotadd() in sme_helper.c, but here we have - * bfloat16 inputs. In particular that means that we do not - * want the FPCR.FZ16 flush semantics, so we use the normal - * float_status for the input handling here. - */ - float64 e1r = float32_to_float64(e1 << 16, fpst); - float64 e1c = float32_to_float64(e1 & 0xffff0000u, fpst); - float64 e2r = float32_to_float64(e2 << 16, fpst); - float64 e2c = float32_to_float64(e2 & 0xffff0000u, fpst); - float64 t64; + float32 s1r = e1 << 16; + float32 s1c = e1 & 0xffff0000u; + float32 s2r = e2 << 16; + float32 s2c = e2 & 0xffff0000u; float32 t32; - /* - * The ARM pseudocode function FPDot performs both multiplies - * and the add with a single rounding operation. Emulate this - * by performing the first multiply in round-to-odd, then doing - * the second multiply as fused multiply-add, and rounding to - * float32 all in one step. - */ - t64 = float64_mul(e1r, e2r, fpst_odd); - t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst); + /* C.f. FPProcessNaNs4 */ + if (float32_is_any_nan(s1r) || float32_is_any_nan(s1c) || + float32_is_any_nan(s2r) || float32_is_any_nan(s2c)) { + if (float32_is_signaling_nan(s1r, fpst)) { + t32 = s1r; + } else if (float32_is_signaling_nan(s1c, fpst)) { + t32 = s1c; + } else if (float32_is_signaling_nan(s2r, fpst)) { + t32 = s2r; + } else if (float32_is_signaling_nan(s2c, fpst)) { + t32 = s2c; + } else if (float32_is_any_nan(s1r)) { + t32 = s1r; + } else if (float32_is_any_nan(s1c)) { + t32 = s1c; + } else if (float32_is_any_nan(s2r)) { + t32 = s2r; + } else { + t32 = s2c; + } + /* + * FPConvertNaN(FPProcessNaN(t32)) will be done as part + * of the final addition below. + */ + } else { + /* + * Compare f16_dotadd() in sme_helper.c, but here we have + * bfloat16 inputs. In particular that means that we do not + * want the FPCR.FZ16 flush semantics, so we use the normal + * float_status for the input handling here. + */ + float64 e1r = float32_to_float64(s1r, fpst); + float64 e1c = float32_to_float64(s1c, fpst); + float64 e2r = float32_to_float64(s2r, fpst); + float64 e2c = float32_to_float64(s2c, fpst); + float64 t64; - /* This conversion is exact, because we've already rounded. */ - t32 = float64_to_float32(t64, fpst); + /* + * The ARM pseudocode function FPDot performs both multiplies + * and the add with a single rounding operation. Emulate this + * by performing the first multiply in round-to-odd, then doing + * the second multiply as fused multiply-add, and rounding to + * float32 all in one step. + */ + t64 = float64_mul(e1r, e2r, fpst_odd); + t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst); + + /* This conversion is exact, because we've already rounded. */ + t32 = float64_to_float32(t64, fpst); + } /* The final accumulation step is not fused. */ return float32_add(sum, t32, fpst); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> The last use of this field was removed in b2fc7be972b9. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-11-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint32_t xregs[16]; - /* Scratch space for aa32 neon expansion. */ - uint32_t scratch[8]; - /* There are a number of distinct float control structures. */ float_status fp_status[FPST_COUNT]; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Rather than repeatedly copying FPST_FPCR to locals and setting default nan mode, create dedicated float_status. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-12-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 12 +++++++++++- target/arm/cpu.c | 4 ++++ target/arm/tcg/vfp_helper.c | 12 +++++++++++- 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState; * when FPCR.AH == 1 (bfloat16 conversions and multiplies, * and the reciprocal and square root estimate/step insns); * for half-precision + * ZA: the "streaming sve" fp status. + * ZA_F16: likewise for half-precision. * * Half-precision operations are governed by a separate * flush-to-zero control bit in FPSCR:FZ16. We pass a separate @@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState; * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16, * which means we need an FPST_AH_F16 as well. * + * The "ZA" float_status are for Streaming SVE operations which use + * default-NaN and do not generate fp exceptions, which means that they + * do not accumulate exception bits back into FPCR. + * See e.g. FPAdd vs FPAdd_ZA pseudocode functions, and the setting + * of fpcr.DN and fpexec parameters. + * * To avoid having to transfer exception bits around, we simply * say that the FPSCR cumulative exception flags are the logical * OR of the flags in the four fp statuses. This relies on the @@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour { FPST_A64_F16, FPST_AH, FPST_AH_F16, + FPST_ZA, + FPST_ZA_F16, FPST_STD, FPST_STD_F16, } ARMFPStatusFlavour; -#define FPST_COUNT 8 +#define FPST_COUNT 10 typedef struct CPUArchState { /* Regs for current mode. */ diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type) set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]); set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]); set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]); + set_default_nan_mode(1, &env->vfp.fp_status[FPST_ZA]); + set_default_nan_mode(1, &env->vfp.fp_status[FPST_ZA_F16]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA_F16]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]); arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]); set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]); diff --git a/target/arm/tcg/vfp_helper.c b/target/arm/tcg/vfp_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vfp_helper.c +++ b/target/arm/tcg/vfp_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpsr_from_host(CPUARMState *env) a64_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A64_F16]) & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used)); /* - * We do not merge in flags from FPST_AH or FPST_AH_F16, because + * We do not merge in flags from FPST_{AH,ZA} or FPST_{AH,ZA}_F16, because * they are used for insns that must not set the cumulative exception bits. */ @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]); set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]); set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]); + set_float_rounding_mode(i, &env->vfp.fp_status[FPST_ZA]); + set_float_rounding_mode(i, &env->vfp.fp_status[FPST_ZA_F16]); } if (changed & FPCR_FZ16) { bool ftz_enabled = val & FPCR_FZ16; @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]); set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]); set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]); + set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_ZA_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]); + set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_ZA_F16]); } if (changed & FPCR_FZ) { bool ftz_enabled = val & FPCR_FZ; set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]); set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]); + set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_ZA]); /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */ set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]); } @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) bool fitz_enabled = (val & FPCR_FIZ) || (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ; set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status[FPST_A64]); + set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status[FPST_ZA]); } if (changed & FPCR_DN) { bool dnan_enabled = val & FPCR_DN; @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) /* Change behaviours for A64 FP operations */ arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64]); arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]); + arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_ZA]); + arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_ZA_F16]); } else { arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA_F16]); } } /* -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-13-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme_helper.c | 37 ++++++++-------------------------- target/arm/tcg/translate-sme.c | 4 ++-- 2 files changed, 10 insertions(+), 31 deletions(-) diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn, } void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, float_status *fpst_in, uint32_t desc) + void *vpm, float_status *fpst, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) << 31; uint16_t *pn = vpn, *pm = vpm; - float_status fpst; - - /* - * Make a copy of float_status because this operation does not - * update the cumulative fp exception status. It also produces - * default nans. - */ - fpst = *fpst_in; - set_default_nan_mode(true, &fpst); for (row = 0; row < oprsz; ) { uint16_t pa = pn[H2(row >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, if (pb & 1) { uint32_t *a = vza_row + H1_4(col); uint32_t *m = vzm + H1_4(col); - *a = float32_muladd(n, *m, *a, 0, &fpst); + *a = float32_muladd(n, *m, *a, 0, fpst); } col += 4; pb >>= 4; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, } void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, float_status *fpst_in, uint32_t desc) + void *vpm, float_status *fpst, uint32_t desc) { intptr_t row, col, oprsz = simd_oprsz(desc) / 8; uint64_t neg = (uint64_t)simd_data(desc) << 63; uint64_t *za = vza, *zn = vzn, *zm = vzm; uint8_t *pn = vpn, *pm = vpm; - float_status fpst = *fpst_in; - - set_default_nan_mode(true, &fpst); for (row = 0; row < oprsz; ++row) { if (pn[H1(row)] & 1) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, for (col = 0; col < oprsz; ++col) { if (pm[H1(col)] & 1) { uint64_t *a = &za_row[col]; - *a = float64_muladd(n, zm[col], *a, 0, &fpst); + *a = float64_muladd(n, zm[col], *a, 0, fpst); } } } @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; uint16_t *pn = vpn, *pm = vpm; - float_status fpst_odd, fpst_std, fpst_f16; + float_status fpst_odd = env->vfp.fp_status[FPST_ZA]; - /* - * Make copies of the fp status fields we use, because this operation - * does not update the cumulative fp exception status. It also - * produces default NaNs. We also need a second copy of fp_status with - * round-to-odd -- see above. - */ - fpst_f16 = env->vfp.fp_status[FPST_A64_F16]; - fpst_std = env->vfp.fp_status[FPST_A64]; - set_default_nan_mode(true, &fpst_std); - set_default_nan_mode(true, &fpst_f16); - fpst_odd = fpst_std; set_float_rounding_mode(float_round_to_odd, &fpst_odd); for (row = 0; row < oprsz; ) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, m = f16mop_adj_pair(m, pcol, 0); *a = f16_dotadd(*a, n, m, - &fpst_f16, &fpst_std, &fpst_odd); + &env->vfp.fp_status[FPST_ZA_F16], + &env->vfp.fp_status[FPST_ZA], + &fpst_odd); } col += 4; pcol >>= 4; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_fmopa_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, - MO_32, FPST_A64, gen_helper_sme_fmopa_s) + MO_32, FPST_ZA, gen_helper_sme_fmopa_s) TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, - MO_64, FPST_A64, gen_helper_sme_fmopa_d) + MO_64, FPST_ZA, gen_helper_sme_fmopa_d) TRANS_FEAT(BFMOPA, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> The whole ZA state will also contain ZT0. Make things easier in aarch64_set_svcr to zero both by wrapping them in a common structure. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-14-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 48 +++++++++++++++++++--------------- linux-user/aarch64/signal.c | 4 +-- target/arm/cpu.c | 4 +-- target/arm/helper.c | 2 +- target/arm/machine.c | 2 +- target/arm/tcg/sme_helper.c | 6 ++--- target/arm/tcg/translate-sme.c | 4 +-- 7 files changed, 38 insertions(+), 32 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint64_t scxtnum_el[4]; - /* - * SME ZA storage -- 256 x 256 byte array, with bytes in host word order, - * as we do with vfp.zregs[]. This corresponds to the architectural ZA - * array, where ZA[N] is in the least-significant bytes of env->zarray[N]. - * When SVL is less than the architectural maximum, the accessible - * storage is restricted, such that if the SVL is X bytes the guest can - * see only the bottom X elements of zarray[], and only the least - * significant X bytes of each element of the array. (In other words, - * the observable part is always square.) - * - * The ZA storage can also be considered as a set of square tiles of - * elements of different sizes. The mapping from tiles to the ZA array - * is architecturally defined, such that for tiles of elements of esz - * bytes, the Nth row (or "horizontal slice") of tile T is in - * ZA[T + N * esz]. Note that this means that each tile is not contiguous - * in the ZA storage, because its rows are striped through the ZA array. - * - * Because this is so large, keep this toward the end of the reset area, - * to keep the offsets into the rest of the structure smaller. - */ - ARMVectorReg zarray[ARM_MAX_VQ * 16]; + struct { + /* + * SME ZA storage -- 256 x 256 byte array, with bytes in host + * word order, as we do with vfp.zregs[]. This corresponds to + * the architectural ZA array, where ZA[N] is in the least + * significant bytes of env->za_state.za[N]. + * + * When SVL is less than the architectural maximum, the accessible + * storage is restricted, such that if the SVL is X bytes the guest + * can see only the bottom X elements of zarray[], and only the least + * significant X bytes of each element of the array. (In other words, + * the observable part is always square.) + * + * The ZA storage can also be considered as a set of square tiles of + * elements of different sizes. The mapping from tiles to the ZA array + * is architecturally defined, such that for tiles of elements of esz + * bytes, the Nth row (or "horizontal slice") of tile T is in + * ZA[T + N * esz]. Note that this means that each tile is not + * contiguous in the ZA storage, because its rows are striped through + * the ZA array. + * + * Because this is so large, keep this toward the end of the + * reset area, to keep the offsets into the rest of the structure + * smaller. + */ + ARMVectorReg za[ARM_MAX_VQ * 16]; + } za_state; struct CPUBreakpoint *cpu_breakpoint[16]; struct CPUWatchpoint *cpu_watchpoint[16]; diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c index XXXXXXX..XXXXXXX 100644 --- a/linux-user/aarch64/signal.c +++ b/linux-user/aarch64/signal.c @@ -XXX,XX +XXX,XX @@ static void target_setup_za_record(struct target_za_context *za, for (i = 0; i < vl; ++i) { uint64_t *z = (void *)za + TARGET_ZA_SIG_ZAV_OFFSET(vq, i); for (j = 0; j < vq * 2; ++j) { - __put_user_e(env->zarray[i].d[j], z + j, le); + __put_user_e(env->za_state.za[i].d[j], z + j, le); } } } @@ -XXX,XX +XXX,XX @@ static bool target_restore_za_record(CPUARMState *env, for (i = 0; i < vl; ++i) { uint64_t *z = (void *)za + TARGET_ZA_SIG_ZAV_OFFSET(vq, i); for (j = 0; j < vq * 2; ++j) { - __get_user_e(env->zarray[i].d[j], z + j, le); + __get_user_e(env->za_state.za[i].d[j], z + j, le); } } return true; diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags) qemu_fprintf(f, "ZA[%0*d]=", svl_lg10, i); for (j = zcr_len; j >= 0; --j) { qemu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%c", - env->zarray[i].d[2 * j + 1], - env->zarray[i].d[2 * j], + env->za_state.za[i].d[2 * j + 1], + env->za_state.za[i].d[2 * j], j ? ':' : '\n'); } } diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask) * when disabled either. */ if (change & new & R_SVCR_ZA_MASK) { - memset(env->zarray, 0, sizeof(env->zarray)); + memset(&env->za_state, 0, sizeof(env->za_state)); } if (tcg_enabled()) { diff --git a/target/arm/machine.c b/target/arm/machine.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/machine.c +++ b/target/arm/machine.c @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_za = { .minimum_version_id = 1, .needed = za_needed, .fields = (const VMStateField[]) { - VMSTATE_STRUCT_ARRAY(env.zarray, ARMCPU, ARM_MAX_VQ * 16, 0, + VMSTATE_STRUCT_ARRAY(env.za_state.za, ARMCPU, ARM_MAX_VQ * 16, 0, vmstate_vreg, ARMVectorReg), VMSTATE_END_OF_LIST() } diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl) uint32_t i; /* - * Special case clearing the entire ZA space. + * Special case clearing the entire ZArray. * This falls into the CONSTRAINED UNPREDICTABLE zeroing of any * parts of the ZA storage outside of SVL. */ if (imm == 0xff) { - memset(env->zarray, 0, sizeof(env->zarray)); + memset(env->za_state.za, 0, sizeof(env->za_state.za)); return; } @@ -XXX,XX +XXX,XX @@ void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl) */ for (i = 0; i < svl; i++) { if (imm & (1 << (i % 8))) { - memset(&env->zarray[i], 0, svl); + memset(&env->za_state.za[i], 0, svl); } } } diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, offset = tile * sizeof(ARMVectorReg); /* Include the byte offset of zarray to make this relative to env. */ - offset += offsetof(CPUARMState, zarray); + offset += offsetof(CPUARMState, za_state.za); tcg_gen_addi_i32(tmp, tmp, offset); /* Add the byte offset to env to produce the final pointer. */ @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile(DisasContext *s, int esz, int tile) TCGv_ptr addr = tcg_temp_new_ptr(); int offset; - offset = tile * sizeof(ARMVectorReg) + offsetof(CPUARMState, zarray); + offset = tile * sizeof(ARMVectorReg) + offsetof(CPUARMState, za_state.za); tcg_gen_addi_ptr(addr, tcg_env, offset); return addr; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-15-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu-features.h | 63 +++++++++++++++++++++++++++++++++++++++ target/arm/cpu.h | 1 + 2 files changed, 64 insertions(+) diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu-features.h +++ b/target/arm/cpu-features.h @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64ISAR2, RPRES); } +static inline bool isar_feature_aa64_lut(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64ISAR2, LUT); +} + static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id) { /* We always set the AdvSIMD and FP fields identically. */ @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve2(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64ZFR0, SVEVER) != 0; } +static inline bool isar_feature_aa64_sve2p1(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64ZFR0, SVEVER) >=2; +} + static inline bool isar_feature_aa64_sve2_aes(const ARMISARegisters *id) { return FIELD_EX64_IDREG(id, ID_AA64ZFR0, AES) != 0; @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve_f64mm(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64ZFR0, F64MM) != 0; } +static inline bool isar_feature_aa64_sve_b16b16(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64ZFR0, B16B16); +} + +static inline bool isar_feature_aa64_sme_b16b16(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, B16B16); +} + +static inline bool isar_feature_aa64_sme_f16f16(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, F16F16); +} + static inline bool isar_feature_aa64_sme_f64f64(const ARMISARegisters *id) { return FIELD_EX64_IDREG(id, ID_AA64SMFR0, F64F64); @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sme_fa64(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64SMFR0, FA64); } +static inline bool isar_feature_aa64_sme2(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, SMEVER) != 0; +} + +static inline bool isar_feature_aa64_sme2p1(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, SMEVER) >= 2; +} + +/* + * Combinations of feature tests, for ease of use with TRANS_FEAT. + */ +static inline bool isar_feature_aa64_sme_or_sve2p1(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme(id) || isar_feature_aa64_sve2p1(id); +} + +static inline bool isar_feature_aa64_sme2_or_sve2p1(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2(id) || isar_feature_aa64_sve2p1(id); +} + +static inline bool isar_feature_aa64_sme2p1_or_sve2p1(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2p1(id) || isar_feature_aa64_sve2p1(id); +} + +static inline bool isar_feature_aa64_sme2_i16i64(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2(id) && isar_feature_aa64_sme_i16i64(id); +} + +static inline bool isar_feature_aa64_sme2_f64f64(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2(id) && isar_feature_aa64_sme_f64f64(id); +} + /* * Feature tests for "does this exist in either 32-bit or 64-bit?" */ diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64ISAR2, SYSINSTR_128, 36, 4) FIELD(ID_AA64ISAR2, PRFMSLC, 40, 4) FIELD(ID_AA64ISAR2, RPRFM, 48, 4) FIELD(ID_AA64ISAR2, CSSC, 52, 4) +FIELD(ID_AA64ISAR2, LUT, 56, 4) FIELD(ID_AA64ISAR2, ATS1A, 60, 4) FIELD(ID_AA64PFR0, EL0, 0, 4) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> This is a 512-bit array introduced with SME2. Save it only when ZA is in use. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-16-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 3 +++ target/arm/machine.c | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint64_t scxtnum_el[4]; struct { + /* SME2 ZT0 -- 512 bit array, with data ordered like ARMVectorReg. */ + uint64_t zt0[512 / 64] QEMU_ALIGNED(16); + /* * SME ZA storage -- 256 x 256 byte array, with bytes in host * word order, as we do with vfp.zregs[]. This corresponds to diff --git a/target/arm/machine.c b/target/arm/machine.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/machine.c +++ b/target/arm/machine.c @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_za = { } }; +static bool zt0_needed(void *opaque) +{ + ARMCPU *cpu = opaque; + + return za_needed(cpu) && cpu_isar_feature(aa64_sme2, cpu); +} + +static const VMStateDescription vmstate_zt0 = { + .name = "cpu/zt0", + .version_id = 1, + .minimum_version_id = 1, + .needed = zt0_needed, + .fields = (VMStateField[]) { + VMSTATE_UINT64_ARRAY(env.za_state.zt0, ARMCPU, + ARRAY_SIZE(((CPUARMState *)0)->za_state.zt0)), + VMSTATE_END_OF_LIST() + } +}; + static bool serror_needed(void *opaque) { ARMCPU *cpu = opaque; @@ -XXX,XX +XXX,XX @@ const VMStateDescription vmstate_arm_cpu = { &vmstate_m_security, &vmstate_sve, &vmstate_za, + &vmstate_zt0, &vmstate_serror, &vmstate_irq_line_state, &vmstate_wfxt_timer, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Pipe the value through from SMCR_ELx through hflags and into the disassembly context. Enable EZT0 in smcr_write. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-17-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 2 ++ target/arm/tcg/translate.h | 1 + target/arm/cpu.c | 3 +++ target/arm/helper.c | 6 +++++- target/arm/tcg/hflags.c | 34 +++++++++++++++++++++++++++++++++- target/arm/tcg/translate-a64.c | 1 + 6 files changed, 45 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ FIELD(SVCR, ZA, 1, 1) /* Fields for SMCR_ELx. */ FIELD(SMCR, LEN, 0, 4) +FIELD(SMCR, EZT0, 30, 1) FIELD(SMCR, FA64, 31, 1) /* Write a new value to v7m.exception, thus transitioning into or out @@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1) FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1) FIELD(TBFLAG_A64, AH, 37, 1) /* FPCR.AH */ FIELD(TBFLAG_A64, NEP, 38, 1) /* FPCR.NEP */ +FIELD(TBFLAG_A64, ZT0EXC_EL, 39, 2) /* * Helpers for using the above. Note that only the A64 accessors use diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { int fp_excp_el; /* FP exception EL or 0 if enabled */ int sve_excp_el; /* SVE exception EL or 0 if enabled */ int sme_excp_el; /* SME exception EL or 0 if enabled */ + int zt0_excp_el; /* ZT0 exception EL or 0 if enabled */ int vl; /* current vector length in bytes */ int svl; /* current streaming vector length in bytes */ bool vfp_enabled; /* FP enabled via FPSCR.EN */ diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ void arm_emulate_firmware_reset(CPUState *cpustate, int target_el) env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK; env->cp15.scr_el3 |= SCR_ENTP2; env->vfp.smcr_el[3] = 0xf; + if (cpu_isar_feature(aa64_sme2, cpu)) { + env->vfp.smcr_el[3] |= R_SMCR_EZT0_MASK; + } } if (cpu_isar_feature(aa64_hcx, cpu)) { env->cp15.scr_el3 |= SCR_HXEN; diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static void smcr_write(CPUARMState *env, const ARMCPRegInfo *ri, { int cur_el = arm_current_el(env); int old_len = sve_vqm1_for_el(env, cur_el); + uint64_t valid_mask = R_SMCR_LEN_MASK | R_SMCR_FA64_MASK; int new_len; QEMU_BUILD_BUG_ON(ARM_MAX_VQ > R_SMCR_LEN_MASK + 1); - value &= R_SMCR_LEN_MASK | R_SMCR_FA64_MASK; + if (cpu_isar_feature(aa64_sme2, env_archcpu(env))) { + valid_mask |= R_SMCR_EZT0_MASK; + } + value &= valid_mask; raw_write(env, ri, value); /* diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/hflags.c +++ b/target/arm/tcg/hflags.c @@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el, return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags); } +/* + * Return the exception level to which exceptions should be taken for ZT0. + * C.f. the ARM pseudocode function CheckSMEZT0Enabled, after the ZA check. + */ +static int zt0_exception_el(CPUARMState *env, int el) +{ +#ifndef CONFIG_USER_ONLY + if (el <= 1 + && !el_is_in_host(env, el) + && !FIELD_EX64(env->vfp.smcr_el[1], SMCR, EZT0)) { + return 1; + } + if (el <= 2 + && arm_is_el2_enabled(env) + && !FIELD_EX64(env->vfp.smcr_el[2], SMCR, EZT0)) { + return 2; + } + if (arm_feature(env, ARM_FEATURE_EL3) + && !FIELD_EX64(env->vfp.smcr_el[3], SMCR, EZT0)) { + return 3; + } +#endif + return 0; +} + static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, ARMMMUIdx mmu_idx) { @@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, DP_TBFLAG_A64(flags, PSTATE_SM, 1); DP_TBFLAG_A64(flags, SME_TRAP_NONSTREAMING, !sme_fa64(env, el)); } - DP_TBFLAG_A64(flags, PSTATE_ZA, FIELD_EX64(env->svcr, SVCR, ZA)); + + if (FIELD_EX64(env->svcr, SVCR, ZA)) { + DP_TBFLAG_A64(flags, PSTATE_ZA, 1); + if (cpu_isar_feature(aa64_sme2, env_archcpu(env))) { + int zt0_el = zt0_exception_el(env, el); + DP_TBFLAG_A64(flags, ZT0EXC_EL, zt0_el); + } + } } sctlr = regime_sctlr(env, stage1); diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, dc->trap_eret = EX_TBFLAG_A64(tb_flags, TRAP_ERET); dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL); dc->sme_excp_el = EX_TBFLAG_A64(tb_flags, SMEEXC_EL); + dc->zt0_excp_el = EX_TBFLAG_A64(tb_flags, ZT0EXC_EL); dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16; dc->svl = (EX_TBFLAG_A64(tb_flags, SVL) + 1) * 16; dc->pauth_active = EX_TBFLAG_A64(tb_flags, PAUTH_ACTIVE); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-18-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/syndrome.h | 1 + target/arm/tcg/sme.decode | 1 + target/arm/tcg/translate-sme.c | 26 ++++++++++++++++++++++++++ 3 files changed, 28 insertions(+) diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/syndrome.h +++ b/target/arm/syndrome.h @@ -XXX,XX +XXX,XX @@ typedef enum { SME_ET_Streaming, SME_ET_NotStreaming, SME_ET_InactiveZA, + SME_ET_InaccessibleZT0, } SMEExceptionType; #define ARM_EL_EC_LENGTH 6 diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ### SME Misc ZERO 11000000 00 001 00000000000 imm:8 +ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ #include "decode-sme.c.inc" +static bool sme2_zt0_enabled_check(DisasContext *s) +{ + if (!sme_za_enabled_check(s)) { + return false; + } + if (s->zt0_excp_el) { + gen_exception_insn_el(s, 0, EXCP_UDEF, + syn_smetrap(SME_ET_InaccessibleZT0, false), + s->zt0_excp_el); + return false; + } + return true; +} /* * Resolve tile.size[index] to a host pointer, where tile and index @@ -XXX,XX +XXX,XX @@ static bool trans_ZERO(DisasContext *s, arg_ZERO *a) return true; } +static bool trans_ZERO_zt0(DisasContext *s, arg_ZERO_zt0 *a) +{ + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + if (sme_enabled_check(s) && sme2_zt0_enabled_check(s)) { + tcg_gen_gvec_dup_imm(MO_64, offsetof(CPUARMState, za_state.zt0), + sizeof_field(CPUARMState, za_state.zt0), + sizeof_field(CPUARMState, za_state.zt0), 0); + } + return true; +} + static bool trans_MOVA(DisasContext *s, arg_MOVA *a) { static gen_helper_gvec_4 * const h_fns[5] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Honor AlignmentEnforced() for LDR/STR (vector), (predicate), and (array vector). Within the expansion functions, clear @align when we're done emitting loads at the largest size. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-19-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.h | 6 ++-- target/arm/tcg/translate-sme.c | 5 ++-- target/arm/tcg/translate-sve.c | 50 ++++++++++++++++++++++++---------- 3 files changed, 42 insertions(+), 19 deletions(-) diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.h +++ b/target/arm/tcg/translate-a64.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz); -void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm); -void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm); +void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, + int len, int rn, int imm, MemOp align); +void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, + int len, int rn, int imm, MemOp align); #endif /* TARGET_ARM_TRANSLATE_A64_H */ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) return true; } -typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int); +typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int, MemOp); static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) { @@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) /* ZA[n] equates to ZA0H.B[n]. */ base = get_tile_rowcol(s, MO_8, a->rv, imm, false); - fn(s, base, 0, svl, a->rn, imm * svl); + fn(s, base, 0, svl, a->rn, imm * svl, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); return true; } diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz, */ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, - int len, int rn, int imm) + int len, int rn, int imm, MemOp align) { int len_align = QEMU_ALIGN_DOWN(len, 16); int len_remain = len % 16; @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, for (i = 0; i < len_align; i += 16) { tcg_gen_qemu_ld_i128(t16, clean_addr, midx, - MO_LE | MO_128 | MO_ATOM_NONE); + MO_LE | MO_128 | MO_ATOM_NONE | align); tcg_gen_extr_i128_i64(t0, t1, t16); tcg_gen_st_i64(t0, base, vofs + i); tcg_gen_st_i64(t1, base, vofs + i + 8); tcg_gen_addi_i64(clean_addr, clean_addr, 16); } + if (len_align) { + align = MO_UNALN; + } } else { TCGLabel *loop = gen_new_label(); TCGv_ptr tp, i = tcg_temp_new_ptr(); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, t16 = tcg_temp_new_i128(); tcg_gen_qemu_ld_i128(t16, clean_addr, midx, - MO_LE | MO_128 | MO_ATOM_NONE); + MO_LE | MO_128 | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 16); tp = tcg_temp_new_ptr(); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_st_i64(t1, tp, vofs + 8); tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop); + align = MO_UNALN; } /* @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, */ if (len_remain >= 8) { t0 = tcg_temp_new_i64(); - tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_qemu_ld_i64(t0, clean_addr, midx, + MO_LEUQ | MO_ATOM_NONE | align); + align = MO_UNALN; tcg_gen_st_i64(t0, base, vofs + len_align); len_remain -= 8; len_align += 8; @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, case 4: case 8: tcg_gen_qemu_ld_i64(t0, clean_addr, midx, - MO_LE | ctz32(len_remain) | MO_ATOM_NONE); + MO_LE | ctz32(len_remain) + | MO_ATOM_NONE | align); break; case 6: t1 = tcg_temp_new_i64(); - tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_qemu_ld_i64(t0, clean_addr, midx, + MO_LEUL | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 4); tcg_gen_qemu_ld_i64(t1, clean_addr, midx, MO_LEUW | MO_ATOM_NONE); tcg_gen_deposit_i64(t0, t0, t1, 32, 32); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, /* Similarly for stores. */ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, - int len, int rn, int imm) + int len, int rn, int imm, MemOp align) { int len_align = QEMU_ALIGN_DOWN(len, 16); int len_remain = len % 16; @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_ld_i64(t1, base, vofs + i + 8); tcg_gen_concat_i64_i128(t16, t0, t1); tcg_gen_qemu_st_i128(t16, clean_addr, midx, - MO_LE | MO_128 | MO_ATOM_NONE); + MO_LE | MO_128 | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 16); } + if (len_align) { + align = MO_UNALN; + } } else { TCGLabel *loop = gen_new_label(); TCGv_ptr tp, i = tcg_temp_new_ptr(); @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_addi_i64(clean_addr, clean_addr, 16); tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop); + align = MO_UNALN; } /* Predicate register stores can be any multiple of 2. */ if (len_remain >= 8) { t0 = tcg_temp_new_i64(); tcg_gen_ld_i64(t0, base, vofs + len_align); - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_qemu_st_i64(t0, clean_addr, midx, + MO_LEUQ | MO_ATOM_NONE | align); + align = MO_UNALN; len_remain -= 8; len_align += 8; if (len_remain) { @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, case 4: case 8: tcg_gen_qemu_st_i64(t0, clean_addr, midx, - MO_LE | ctz32(len_remain) | MO_ATOM_NONE); + MO_LE | ctz32(len_remain) + | MO_ATOM_NONE | align); break; case 6: - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_qemu_st_i64(t0, clean_addr, midx, + MO_LEUL | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 4); tcg_gen_shri_i64(t0, t0, 32); tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUW | MO_ATOM_NONE); @@ -XXX,XX +XXX,XX @@ static bool trans_LDR_zri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = vec_full_reg_size(s); int off = vec_full_reg_offset(s, a->rd); - gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); } return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_LDR_pri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = pred_full_reg_size(s); int off = pred_full_reg_offset(s, a->rd); - gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_2 : MO_UNALN); } return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_STR_zri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = vec_full_reg_size(s); int off = vec_full_reg_offset(s, a->rd); - gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); } return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_STR_pri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = pred_full_reg_size(s); int off = pred_full_reg_offset(s, a->rd); - gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_2 : MO_UNALN); } return true; } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-20-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 6 ++++++ target/arm/tcg/translate-sme.c | 13 +++++++++++++ 2 files changed, 19 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ LDST1 1110000 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \ LDR 1110000 100 0 000000 .. 000 ..... 0 .... @ldstr STR 1110000 100 1 000000 .. 000 ..... 0 .... @ldstr +&ldstzt0 rn +@ldstzt0 ....... ... . ...... .. ... rn:5 ..... &ldstzt0 + +LDR_zt0 1110000 100 0 111111 00 000 ..... 00000 @ldstzt0 +STR_zt0 1110000 100 1 111111 00 000 ..... 00000 @ldstzt0 + ### SME Add Vector to Array &adda zad zn pm pn diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) TRANS_FEAT(LDR, aa64_sme, do_ldst_r, a, gen_sve_ldr) TRANS_FEAT(STR, aa64_sme, do_ldst_r, a, gen_sve_str) +static bool do_ldst_zt0(DisasContext *s, arg_ldstzt0 *a, GenLdStR *fn) +{ + if (sme2_zt0_enabled_check(s)) { + fn(s, tcg_env, offsetof(CPUARMState, za_state.zt0), + sizeof_field(CPUARMState, za_state.zt0), a->rn, 0, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); + } + return true; +} + +TRANS_FEAT(LDR_zt0, aa64_sme2, do_ldst_zt0, a, gen_sve_ldr) +TRANS_FEAT(STR_zt0, aa64_sme2, do_ldst_zt0, a, gen_sve_str) + static bool do_adda(DisasContext *s, arg_adda *a, MemOp esz, gen_helper_gvec_4 *fn) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-21-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 5 +++++ target/arm/tcg/translate-sme.c | 13 +++++++++++++ 2 files changed, 18 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ MOVA 11000000 esz:2 00001 0 v:1 .. pg:3 0 za_imm:4 zr:5 \ MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za_imm:4 zr:5 \ &mova to_vec=1 rs=%mova_rs esz=4 +### SME Move into/from ZT0 + +MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 +MOVT_ztr 1100 0000 0100 1110 0 off:3 00 11111 rt:5 + ### SME Memory &ldst esz rs pg rn rm za_imm v:bool st:bool diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) return true; } +static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, + void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) +{ + if (sme2_zt0_enabled_check(s)) { + func(cpu_reg(s, a->rt), tcg_env, + offsetof(CPUARMState, za_state.zt0) + a->off * 8); + } + return true; +} + +TRANS_FEAT(MOVT_rzt, aa64_sme2, do_movt, a, tcg_gen_ld_i64) +TRANS_FEAT(MOVT_ztr, aa64_sme2, do_movt, a, tcg_gen_st_i64) + static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) { typedef void GenLdSt1(TCGv_env, TCGv_ptr, TCGv_ptr, TCGv, TCGv_i32); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Decode tile number and index offset beforehand and separately. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-22-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 46 +++++++++++++++++++++++----------- target/arm/tcg/translate-sme.c | 17 +++++-------- 2 files changed, 38 insertions(+), 25 deletions(-) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array %mova_rs 13:2 !function=plus_12 -&mova esz rs pg zr za_imm v:bool to_vec:bool +&mova esz rs pg zr za off v:bool to_vec:bool -MOVA 11000000 esz:2 00000 0 v:1 .. pg:3 zr:5 0 za_imm:4 \ - &mova to_vec=0 rs=%mova_rs -MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za_imm:4 \ - &mova to_vec=0 rs=%mova_rs esz=4 +MOVA 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ + &mova to_vec=0 rs=%mova_rs esz=0 za=0 +MOVA 11000000 01 00000 0 v:1 .. pg:3 zr:5 0 za:1 off:3 \ + &mova to_vec=0 rs=%mova_rs esz=1 +MOVA 11000000 10 00000 0 v:1 .. pg:3 zr:5 0 za:2 off:2 \ + &mova to_vec=0 rs=%mova_rs esz=2 +MOVA 11000000 11 00000 0 v:1 .. pg:3 zr:5 0 za:3 off:1 \ + &mova to_vec=0 rs=%mova_rs esz=3 +MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za:4 \ + &mova to_vec=0 rs=%mova_rs esz=4 off=0 -MOVA 11000000 esz:2 00001 0 v:1 .. pg:3 0 za_imm:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs -MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za_imm:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=4 +MOVA 11000000 00 00001 0 v:1 .. pg:3 0 off:4 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=0 za=0 +MOVA 11000000 01 00001 0 v:1 .. pg:3 0 za:1 off:3 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=1 +MOVA 11000000 10 00001 0 v:1 .. pg:3 0 za:2 off:2 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=2 +MOVA 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=3 +MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=4 off=0 ### SME Move into/from ZT0 @@ -XXX,XX +XXX,XX @@ MOVT_ztr 1100 0000 0100 1110 0 off:3 00 11111 rt:5 ### SME Memory -&ldst esz rs pg rn rm za_imm v:bool st:bool +&ldst esz rs pg rn rm za off v:bool st:bool -LDST1 1110000 0 esz:2 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \ - &ldst rs=%mova_rs -LDST1 1110000 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \ - &ldst esz=4 rs=%mova_rs +LDST1 1110000 0 00 st:1 rm:5 v:1 .. pg:3 rn:5 0 off:4 \ + &ldst rs=%mova_rs esz=0 za=0 +LDST1 1110000 0 01 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:1 off:3 \ + &ldst rs=%mova_rs esz=1 +LDST1 1110000 0 10 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:2 off:2 \ + &ldst rs=%mova_rs esz=2 +LDST1 1110000 0 11 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:3 off:1 \ + &ldst rs=%mova_rs esz=3 +LDST1 1110000 1 11 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:4 \ + &ldst rs=%mova_rs esz=4 off=0 &ldstr rv rn imm @ldstr ....... ... . ...... .. ... rn:5 . imm:4 \ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool sme2_zt0_enabled_check(DisasContext *s) return true; } -/* - * Resolve tile.size[index] to a host pointer, where tile and index - * are always decoded together, dependent on the element size. - */ +/* Resolve tile.size[rs+imm] to a host pointer. */ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, - int tile_index, bool vertical) + int tile, int imm, bool vertical) { - int tile = tile_index >> (4 - esz); - int index = esz == MO_128 ? 0 : extract32(tile_index, 0, 4 - esz); int pos, len, offset; TCGv_i32 tmp; TCGv_ptr addr; @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, /* Compute the final index, which is Rs+imm. */ tmp = tcg_temp_new_i32(); tcg_gen_trunc_tl_i32(tmp, cpu_reg(s, rs)); - tcg_gen_addi_i32(tmp, tmp, index); + tcg_gen_addi_i32(tmp, tmp, imm); /* Prepare a power-of-two modulo via extraction of @len bits. */ len = ctz32(streaming_vec_reg_size(s)) - esz; @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za_imm, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); t_zr = vec_full_reg_ptr(s, a->zr); t_pg = pred_full_reg_ptr(s, a->pg); @@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za_imm, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); t_pg = pred_full_reg_ptr(s, a->pg); addr = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) } /* ZA[n] equates to ZA0H.B[n]. */ - base = get_tile_rowcol(s, MO_8, a->rv, imm, false); + base = get_tile_rowcol(s, MO_8, a->rv, 0, imm, false); fn(s, base, 0, svl, a->rn, imm * svl, s->align_mem ? MO_ALIGN_16 : MO_UNALN); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Prepare for more kinds of MOVA from SME2 by renaming the existing SME1 MOVA to indicate tile to/from vector. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-23-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 42 +++++++++++++++++----------------- target/arm/tcg/translate-sme.c | 12 +++++----- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array %mova_rs 13:2 !function=plus_12 -&mova esz rs pg zr za off v:bool to_vec:bool +&mova_p esz rs pg zr za off v:bool -MOVA 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ - &mova to_vec=0 rs=%mova_rs esz=0 za=0 -MOVA 11000000 01 00000 0 v:1 .. pg:3 zr:5 0 za:1 off:3 \ - &mova to_vec=0 rs=%mova_rs esz=1 -MOVA 11000000 10 00000 0 v:1 .. pg:3 zr:5 0 za:2 off:2 \ - &mova to_vec=0 rs=%mova_rs esz=2 -MOVA 11000000 11 00000 0 v:1 .. pg:3 zr:5 0 za:3 off:1 \ - &mova to_vec=0 rs=%mova_rs esz=3 -MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za:4 \ - &mova to_vec=0 rs=%mova_rs esz=4 off=0 +MOVA_tz 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ + &mova_p rs=%mova_rs esz=0 za=0 +MOVA_tz 11000000 01 00000 0 v:1 .. pg:3 zr:5 0 za:1 off:3 \ + &mova_p rs=%mova_rs esz=1 +MOVA_tz 11000000 10 00000 0 v:1 .. pg:3 zr:5 0 za:2 off:2 \ + &mova_p rs=%mova_rs esz=2 +MOVA_tz 11000000 11 00000 0 v:1 .. pg:3 zr:5 0 za:3 off:1 \ + &mova_p rs=%mova_rs esz=3 +MOVA_tz 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za:4 \ + &mova_p rs=%mova_rs esz=4 off=0 -MOVA 11000000 00 00001 0 v:1 .. pg:3 0 off:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=0 za=0 -MOVA 11000000 01 00001 0 v:1 .. pg:3 0 za:1 off:3 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=1 -MOVA 11000000 10 00001 0 v:1 .. pg:3 0 za:2 off:2 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=2 -MOVA 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=3 -MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=4 off=0 +MOVA_zt 11000000 00 00001 0 v:1 .. pg:3 0 off:4 zr:5 \ + &mova_p rs=%mova_rs esz=0 za=0 +MOVA_zt 11000000 01 00001 0 v:1 .. pg:3 0 za:1 off:3 zr:5 \ + &mova_p rs=%mova_rs esz=1 +MOVA_zt 11000000 10 00001 0 v:1 .. pg:3 0 za:2 off:2 zr:5 \ + &mova_p rs=%mova_rs esz=2 +MOVA_zt 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ + &mova_p rs=%mova_rs esz=3 +MOVA_zt 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ + &mova_p rs=%mova_rs esz=4 off=0 ### SME Move into/from ZT0 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_ZERO_zt0(DisasContext *s, arg_ZERO_zt0 *a) return true; } -static bool trans_MOVA(DisasContext *s, arg_MOVA *a) +static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) { static gen_helper_gvec_4 * const h_fns[5] = { gen_helper_sve_sel_zpzz_b, gen_helper_sve_sel_zpzz_h, @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) TCGv_i32 t_desc; int svl; - if (!dc_isar_feature(aa64_sme, s)) { - return false; - } if (!sme_smza_enabled_check(s)) { return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) if (a->v) { /* Vertical slice -- use sme mova helpers. */ - if (a->to_vec) { + if (to_vec) { zc_fns[a->esz](t_zr, t_za, t_pg, t_desc); } else { cz_fns[a->esz](t_za, t_zr, t_pg, t_desc); } } else { /* Horizontal slice -- reuse sve sel helpers. */ - if (a->to_vec) { + if (to_vec) { h_fns[a->esz](t_zr, t_za, t_zr, t_pg, t_desc); } else { h_fns[a->esz](t_za, t_zr, t_za, t_pg, t_desc); @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) return true; } +TRANS_FEAT(MOVA_tz, aa64_sme, do_mova_tile, a, false) +TRANS_FEAT(MOVA_zt, aa64_sme, do_mova_tile, a, true) + static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Prepare for MOVA array to/from vector with multiple registers by adding a div_len parameter, herein always 1, and a vec_mod parameter, herein always 0. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-24-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sme.c | 47 +++++++++++++++++++++++----------- 1 file changed, 32 insertions(+), 15 deletions(-) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool sme2_zt0_enabled_check(DisasContext *s) /* Resolve tile.size[rs+imm] to a host pointer. */ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, - int tile, int imm, bool vertical) + int tile, int imm, int div_len, + int vec_mod, bool vertical) { int pos, len, offset; TCGv_i32 tmp; @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, /* Compute the final index, which is Rs+imm. */ tmp = tcg_temp_new_i32(); tcg_gen_trunc_tl_i32(tmp, cpu_reg(s, rs)); + /* + * Round the vector index down to a multiple of vec_mod if necessary. + * We do this before adding the offset, to handle cases like + * MOVA (tile to vector, 2 registers) where we want to call this + * several times in a loop with an increasing offset. We rely on + * the instruction encodings always forcing the initial offset in + * [rs + offset] to be a multiple of vec_mod. The pseudocode usually + * does the round-down after adding the offset rather than before, + * but MOVA is an exception. + */ + if (vec_mod > 1) { + tcg_gen_andc_i32(tmp, tmp, tcg_constant_i32(vec_mod - 1)); + } tcg_gen_addi_i32(tmp, tmp, imm); /* Prepare a power-of-two modulo via extraction of @len bits. */ - len = ctz32(streaming_vec_reg_size(s)) - esz; + len = ctz32(streaming_vec_reg_size(s) / div_len) - esz; if (!len) { /* @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, return addr; } +/* Resolve ZArray[rs+imm] to a host pointer. */ +static TCGv_ptr get_zarray(DisasContext *s, int rs, int imm, + int div_len, int vec_mod) +{ + /* ZA[n] equates to ZA0H.B[n]. */ + return get_tile_rowcol(s, MO_8, rs, 0, imm, div_len, vec_mod, false); +} + /* * Resolve tile.size[0] to a host pointer. * Used by e.g. outer product insns where we require the entire tile. @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, 1, 0, a->v); t_zr = vec_full_reg_ptr(s, a->zr); t_pg = pred_full_reg_ptr(s, a->pg); @@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, 1, 0, a->v); t_pg = pred_full_reg_ptr(s, a->pg); addr = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int, MemOp); static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) { - int svl = streaming_vec_reg_size(s); - int imm = a->imm; - TCGv_ptr base; + if (sme_za_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int imm = a->imm; + TCGv_ptr base = get_zarray(s, a->rv, imm, 1, 0); - if (!sme_za_enabled_check(s)) { - return true; + fn(s, base, 0, svl, a->rn, imm * svl, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); } - - /* ZA[n] equates to ZA0H.B[n]. */ - base = get_tile_rowcol(s, MO_8, a->rv, 0, imm, false); - - fn(s, base, 0, svl, a->rn, imm * svl, - s->align_mem ? MO_ALIGN_16 : MO_UNALN); return true; } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-25-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 1 + target/arm/cpu64.c | 1 + 2 files changed, 2 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ struct ArchCPU { /* Used to set the maximum vector length the cpu will support. */ uint32_t sve_max_vq; + uint32_t sme_max_vq; #ifdef CONFIG_USER_ONLY /* Used to set the default vector length at process start. */ diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -XXX,XX +XXX,XX @@ void arm_cpu_sme_finalize(ARMCPU *cpu, Error **errp) } cpu->sme_vq.map = vq_map; + cpu->sme_max_vq = 32 - clz32(vq_map); } static bool cpu_arm_get_sme(Object *obj, Error **errp) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-26-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 9 +++++ target/arm/tcg/translate.h | 1 + target/arm/tcg/sme.decode | 37 ++++++++++++++++++ target/arm/tcg/sme_helper.c | 64 ++++++++++++++++++++++++++++++ target/arm/tcg/translate-a64.c | 1 + target/arm/tcg/translate-sme.c | 71 ++++++++++++++++++++++++++++++++++ 6 files changed, 183 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme_mova_cz_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme_mova_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sme_ld1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { int zt0_excp_el; /* ZT0 exception EL or 0 if enabled */ int vl; /* current vector length in bytes */ int svl; /* current streaming vector length in bytes */ + int max_svl; /* maximum implemented streaming vector length */ bool vfp_enabled; /* FP enabled via FPSCR.EN */ int vec_len; int vec_stride; diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 %mova_rs 13:2 !function=plus_12 &mova_p esz rs pg zr za off v:bool +&mova_t esz rs zr za off v:bool MOVA_tz 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ &mova_p rs=%mova_rs esz=0 za=0 @@ -XXX,XX +XXX,XX @@ MOVA_zt 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ MOVA_zt 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ &mova_p rs=%mova_rs esz=4 off=0 +MOVA_tz2 11000000 00 00010 0 v:1 .. 000 zr:4 0 00 off:3 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_tz2 11000000 01 00010 0 v:1 .. 000 zr:4 0 00 za:1 off:2 \ + &mova_t rs=%mova_rs esz=1 +MOVA_tz2 11000000 10 00010 0 v:1 .. 000 zr:4 0 00 za:2 off:1 \ + &mova_t rs=%mova_rs esz=2 +MOVA_tz2 11000000 11 00010 0 v:1 .. 000 zr:4 0 00 za:3 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVA_zt2 11000000 00 00011 0 v:1 .. 000 00 off:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_zt2 11000000 01 00011 0 v:1 .. 000 00 za:1 off:2 zr:4 0 \ + &mova_t rs=%mova_rs esz=1 +MOVA_zt2 11000000 10 00011 0 v:1 .. 000 00 za:2 off:1 zr:4 0 \ + &mova_t rs=%mova_rs esz=2 +MOVA_zt2 11000000 11 00011 0 v:1 .. 000 00 za:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVA_tz4 11000000 00 00010 0 v:1 .. 001 zr:3 00 000 off:2 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_tz4 11000000 01 00010 0 v:1 .. 001 zr:3 00 000 za:1 off:1 \ + &mova_t rs=%mova_rs esz=1 +MOVA_tz4 11000000 10 00010 0 v:1 .. 001 zr:3 00 000 za:2 \ + &mova_t rs=%mova_rs esz=2 off=0 +MOVA_tz4 11000000 11 00010 0 v:1 .. 001 zr:3 00 00 za:3 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVA_zt4 11000000 00 00011 0 v:1 .. 001 000 off:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_zt4 11000000 01 00011 0 v:1 .. 001 000 za:1 off:1 zr:3 00 \ + &mova_t rs=%mova_rs esz=1 +MOVA_zt4 11000000 10 00011 0 v:1 .. 001 000 za:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=2 off=0 +MOVA_zt4 11000000 11 00011 0 v:1 .. 001 00 za:3 zr:3 00 \ + &mova_t rs=%mova_rs esz=3 off=0 + ### SME Move into/from ZT0 MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_mova_zc_q)(void *vd, void *za, void *vg, uint32_t desc) #undef DO_MOVA_Z +void HELPER(sme2_mova_zc_b)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint8_t *src = vsrc; + uint8_t *dst = vdst; + size_t i, n = simd_oprsz(desc); + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + +void HELPER(sme2_mova_zc_h)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint16_t *src = vsrc; + uint16_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 2; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + +void HELPER(sme2_mova_zc_s)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint32_t *src = vsrc; + uint32_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 4; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + +void HELPER(sme2_mova_zc_d)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint64_t *src = vsrc; + uint64_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 8; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + /* * Clear elements in a tile slice comprising len bytes. */ @@ -XXX,XX +XXX,XX @@ static void copy_vertical_q(void *vdst, const void *vsrc, size_t len) } } +void HELPER(sme2_mova_cz_b)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_b(vdst, vsrc, simd_oprsz(desc)); +} + +void HELPER(sme2_mova_cz_h)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_h(vdst, vsrc, simd_oprsz(desc)); +} + +void HELPER(sme2_mova_cz_s)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_s(vdst, vsrc, simd_oprsz(desc)); +} + +void HELPER(sme2_mova_cz_d)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_d(vdst, vsrc, simd_oprsz(desc)); +} + /* * Host and TLB primitives for vertical tile slice addressing. */ diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, dc->zt0_excp_el = EX_TBFLAG_A64(tb_flags, ZT0EXC_EL); dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16; dc->svl = (EX_TBFLAG_A64(tb_flags, SVL) + 1) * 16; + dc->max_svl = arm_cpu->sme_max_vq * 16; dc->pauth_active = EX_TBFLAG_A64(tb_flags, PAUTH_ACTIVE); dc->bt = EX_TBFLAG_A64(tb_flags, BT); dc->btype = EX_TBFLAG_A64(tb_flags, BTYPE); diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) TRANS_FEAT(MOVA_tz, aa64_sme, do_mova_tile, a, false) TRANS_FEAT(MOVA_zt, aa64_sme, do_mova_tile, a, true) +static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) +{ + static gen_helper_gvec_2 * const cz_fns[] = { + gen_helper_sme2_mova_cz_b, gen_helper_sme2_mova_cz_h, + gen_helper_sme2_mova_cz_s, gen_helper_sme2_mova_cz_d, + }; + static gen_helper_gvec_2 * const zc_fns[] = { + gen_helper_sme2_mova_zc_b, gen_helper_sme2_mova_zc_h, + gen_helper_sme2_mova_zc_s, gen_helper_sme2_mova_zc_d, + }; + TCGv_ptr t_za; + int svl, bytes_per_op = n << a->esz; + + /* + * The MaxImplementedSVL check happens in the decode pseudocode, + * before the SM+ZA enabled check in the operation pseudocode. + * This will (currently) only fail for NREG=4, ESZ=MO_64. + */ + if (s->max_svl < bytes_per_op) { + unallocated_encoding(s); + return true; + } + + if (!sme_smza_enabled_check(s)) { + return true; + } + + svl = streaming_vec_reg_size(s); + + /* + * The CurrentVL check happens in the operation pseudocode, + * after the SM+ZA enabled check. + */ + if (svl < bytes_per_op) { + unallocated_encoding(s); + return true; + } + + if (a->v) { + TCGv_i32 t_desc = tcg_constant_i32(simd_desc(svl, svl, 0)); + + for (int i = 0; i < n; ++i) { + TCGv_ptr t_zr = vec_full_reg_ptr(s, a->zr * n + i); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, + a->off * n + i, 1, n, a->v); + if (to_vec) { + zc_fns[a->esz](t_zr, t_za, t_desc); + } else { + cz_fns[a->esz](t_za, t_zr, t_desc); + } + } + } else { + for (int i = 0; i < n; ++i) { + int o_zr = vec_full_reg_offset(s, a->zr * n + i); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, + a->off * n + i, 1, n, a->v); + if (to_vec) { + tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, 0, svl, svl); + } else { + tcg_gen_gvec_mov_var(MO_8, t_za, 0, tcg_env, o_zr, svl, svl); + } + } + } + return true; +} + +TRANS_FEAT(MOVA_tz2, aa64_sme2, do_mova_tile_n, a, 2, false) +TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false) +TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true) +TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true) + static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-27-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 5 +++++ target/arm/tcg/sme.decode | 12 ++++++++++++ target/arm/tcg/translate-sme.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ static inline int plus_2(DisasContext *s, int x) return x + 2; } +static inline int plus_8(DisasContext *s, int x) +{ + return x + 8; +} + static inline int plus_12(DisasContext *s, int x) { return x + 12; diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array %mova_rs 13:2 !function=plus_12 +%mova_rv 13:2 !function=plus_8 +&mova_a rv zr off &mova_p esz rs pg zr za off v:bool &mova_t esz rs zr za off v:bool @@ -XXX,XX +XXX,XX @@ MOVA_zt4 11000000 10 00011 0 v:1 .. 001 000 za:2 zr:3 00 \ MOVA_zt4 11000000 11 00011 0 v:1 .. 001 00 za:3 zr:3 00 \ &mova_t rs=%mova_rs esz=3 off=0 +MOVA_az2 11000000 00 00010 00 .. 010 zr:4 000 off:3 \ + &mova_a rv=%mova_rv +MOVA_az4 11000000 00 00010 00 .. 011 zr:3 0000 off:3 \ + &mova_a rv=%mova_rv + +MOVA_za2 11000000 00 00011 00 .. 010 00 off:3 zr:4 0 \ + &mova_a rv=%mova_rv +MOVA_za4 11000000 00 00011 00 .. 011 00 off:3 zr:3 00 \ + &mova_a rv=%mova_rv + ### SME Move into/from ZT0 MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false) TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true) TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true) +static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) +{ + TCGv_ptr t_za; + int svl; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + svl = streaming_vec_reg_size(s); + t_za = get_zarray(s, a->rv, a->off, n, 0); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zr = vec_full_reg_offset(s, a->zr * n + i); + + if (to_vec) { + tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, o_za, svl, svl); + } else { + tcg_gen_gvec_mov_var(MO_8, t_za, o_za, tcg_env, o_zr, svl, svl); + } + } + return true; +} + +TRANS_FEAT(MOVA_az2, aa64_sme2, do_mova_array_n, a, 2, false) +TRANS_FEAT(MOVA_az4, aa64_sme2, do_mova_array_n, a, 4, false) +TRANS_FEAT(MOVA_za2, aa64_sme2, do_mova_array_n, a, 2, true) +TRANS_FEAT(MOVA_za4, aa64_sme2, do_mova_array_n, a, 4, true) + static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-28-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 3 +++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 34 ++++++++++++++++++++++++---------- target/arm/tcg/translate-sme.c | 2 ++ 4 files changed, 31 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme_sumopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme_usmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sme2_bmopa_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SMOPA_d 1010000 0 11 0 ..... ... ... ..... . 0 ... @op_64 SUMOPA_d 1010000 0 11 1 ..... ... ... ..... . 0 ... @op_64 USMOPA_d 1010000 1 11 0 ..... ... ... ..... . 0 ... @op_64 UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64 + +BMOPA 1000000 0 10 0 ..... ... ... ..... . 10 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DEF_IMOP_64(umopa_d, uint16_t, uint16_t) DEF_IMOP_64(sumopa_d, int16_t, uint16_t) DEF_IMOP_64(usmopa_d, uint16_t, int16_t) -#define DEF_IMOPH(NAME, S) \ - void HELPER(sme_##NAME##_##S)(void *vza, void *vzn, void *vzm, \ +#define DEF_IMOPH(P, NAME, S) \ + void HELPER(P##_##NAME##_##S)(void *vza, void *vzn, void *vzm, \ void *vpn, void *vpm, uint32_t desc) \ { do_imopa_##S(vza, vzn, vzm, vpn, vpm, desc, NAME##_##S); } -DEF_IMOPH(smopa, s) -DEF_IMOPH(umopa, s) -DEF_IMOPH(sumopa, s) -DEF_IMOPH(usmopa, s) +DEF_IMOPH(sme, smopa, s) +DEF_IMOPH(sme, umopa, s) +DEF_IMOPH(sme, sumopa, s) +DEF_IMOPH(sme, usmopa, s) -DEF_IMOPH(smopa, d) -DEF_IMOPH(umopa, d) -DEF_IMOPH(sumopa, d) -DEF_IMOPH(usmopa, d) +DEF_IMOPH(sme, smopa, d) +DEF_IMOPH(sme, umopa, d) +DEF_IMOPH(sme, sumopa, d) +DEF_IMOPH(sme, usmopa, d) + +static uint32_t bmopa_s(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) +{ + uint32_t sum = ctpop32(~(n ^ m)); + if (neg) { + sum = -sum; + } + if (!(p & 1)) { + sum = 0; + } + return a + sum; +} + +DEF_IMOPH(sme2, bmopa, s) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_smopa_ TRANS_FEAT(UMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_umopa_d) TRANS_FEAT(SUMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_sumopa_d) TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmopa_d) + +TRANS_FEAT(BMOPA, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_bmopa_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-29-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 4 ++++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 37 +++++++++++++++++++++++++--------- target/arm/tcg/translate-sme.c | 2 ++ 4 files changed, 35 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme_usmopa_d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(sme2_bmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sme2_smopa2_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sme2_umopa2_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USMOPA_d 1010000 1 11 0 ..... ... ... ..... . 0 ... @op_64 UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64 BMOPA 1000000 0 10 0 ..... ... ... ..... . 10 .. @op_32 +SMOPA2_s 1010000 0 10 0 ..... ... ... ..... . 10 .. @op_32 +UMOPA2_s 1010000 1 10 0 ..... ... ... ..... . 10 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static inline void do_imopa_d(uint64_t *za, uint64_t *zn, uint64_t *zm, } } -#define DEF_IMOP_32(NAME, NTYPE, MTYPE) \ +#define DEF_IMOP_8x4_32(NAME, NTYPE, MTYPE) \ static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \ { \ uint32_t sum = 0; \ @@ -XXX,XX +XXX,XX @@ static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \ return neg ? a - sum : a + sum; \ } -#define DEF_IMOP_64(NAME, NTYPE, MTYPE) \ +#define DEF_IMOP_16x4_64(NAME, NTYPE, MTYPE) \ static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \ { \ uint64_t sum = 0; \ @@ -XXX,XX +XXX,XX @@ static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \ return neg ? a - sum : a + sum; \ } -DEF_IMOP_32(smopa_s, int8_t, int8_t) -DEF_IMOP_32(umopa_s, uint8_t, uint8_t) -DEF_IMOP_32(sumopa_s, int8_t, uint8_t) -DEF_IMOP_32(usmopa_s, uint8_t, int8_t) +DEF_IMOP_8x4_32(smopa_s, int8_t, int8_t) +DEF_IMOP_8x4_32(umopa_s, uint8_t, uint8_t) +DEF_IMOP_8x4_32(sumopa_s, int8_t, uint8_t) +DEF_IMOP_8x4_32(usmopa_s, uint8_t, int8_t) -DEF_IMOP_64(smopa_d, int16_t, int16_t) -DEF_IMOP_64(umopa_d, uint16_t, uint16_t) -DEF_IMOP_64(sumopa_d, int16_t, uint16_t) -DEF_IMOP_64(usmopa_d, uint16_t, int16_t) +DEF_IMOP_16x4_64(smopa_d, int16_t, int16_t) +DEF_IMOP_16x4_64(umopa_d, uint16_t, uint16_t) +DEF_IMOP_16x4_64(sumopa_d, int16_t, uint16_t) +DEF_IMOP_16x4_64(usmopa_d, uint16_t, int16_t) #define DEF_IMOPH(P, NAME, S) \ void HELPER(P##_##NAME##_##S)(void *vza, void *vzn, void *vzm, \ @@ -XXX,XX +XXX,XX @@ static uint32_t bmopa_s(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) } DEF_IMOPH(sme2, bmopa, s) + +#define DEF_IMOP_16x2_32(NAME, NTYPE, MTYPE) \ +static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \ +{ \ + uint32_t sum = 0; \ + /* Apply P to N as a mask, making the inactive elements 0. */ \ + n &= expand_pred_h(p); \ + sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); \ + sum += (NTYPE)(n >> 16) * (MTYPE)(m >> 16); \ + return neg ? a - sum : a + sum; \ +} + +DEF_IMOP_16x2_32(smopa2_s, int16_t, int16_t) +DEF_IMOP_16x2_32(umopa2_s, uint16_t, uint16_t) + +DEF_IMOPH(sme2, smopa2, s) +DEF_IMOPH(sme2, umopa2, s) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_sumop TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmopa_d) TRANS_FEAT(BMOPA, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_bmopa_s) +TRANS_FEAT(SMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_smopa2_s) +TRANS_FEAT(UMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_umopa2_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> To be used by both SVE2 and SME2. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-30-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.h | 4 ++++ target/arm/tcg/gengvec64.c | 11 +++++++++++ target/arm/tcg/translate-sve.c | 8 +------- 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.h +++ b/target/arm/tcg/translate-a64.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_sve2_sqdmulh(unsigned vece, uint32_t rd_ofs, + uint32_t rn_ofs, uint32_t rm_ofs, + uint32_t opr_sz, uint32_t max_sz); + void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm, MemOp align); void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, diff --git a/target/arm/tcg/gengvec64.c b/target/arm/tcg/gengvec64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/gengvec64.c +++ b/target/arm/tcg/gengvec64.c @@ -XXX,XX +XXX,XX @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs, tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc), rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]); } + +void gen_gvec_sve2_sqdmulh(unsigned vece, uint32_t rd_ofs, + uint32_t rn_ofs, uint32_t rm_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static gen_helper_gvec_3 * const fns[4] = { + gen_helper_sve2_sqdmulh_b, gen_helper_sve2_sqdmulh_h, + gen_helper_sve2_sqdmulh_s, gen_helper_sve2_sqdmulh_d, + }; + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]); +} diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(MOVPRFX_z, aa64_sve, do_movz_zpz, a->rd, a->rn, a->pg, a->esz, false) */ TRANS_FEAT(MUL_zzz, aa64_sve2, gen_gvec_fn_arg_zzz, tcg_gen_gvec_mul, a) +TRANS_FEAT(SQDMULH_zzz, aa64_sve2, gen_gvec_fn_arg_zzz, gen_gvec_sve2_sqdmulh, a) static gen_helper_gvec_3 * const smulh_zzz_fns[4] = { gen_helper_gvec_smulh_b, gen_helper_gvec_smulh_h, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UMULH_zzz, aa64_sve2, gen_gvec_ool_arg_zzz, TRANS_FEAT(PMUL_zzz, aa64_sve2, gen_gvec_ool_arg_zzz, gen_helper_gvec_pmul_b, a, 0) -static gen_helper_gvec_3 * const sqdmulh_zzz_fns[4] = { - gen_helper_sve2_sqdmulh_b, gen_helper_sve2_sqdmulh_h, - gen_helper_sve2_sqdmulh_s, gen_helper_sve2_sqdmulh_d, -}; -TRANS_FEAT(SQDMULH_zzz, aa64_sve2, gen_gvec_ool_arg_zzz, - sqdmulh_zzz_fns[a->esz], a, 0) - static gen_helper_gvec_3 * const sqrdmulh_zzz_fns[4] = { gen_helper_sve2_sqrdmulh_b, gen_helper_sve2_sqrdmulh_h, gen_helper_sve2_sqrdmulh_s, gen_helper_sve2_sqrdmulh_d, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-31-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 13 ++++ target/arm/tcg/helper.h | 8 ++ target/arm/tcg/vec_internal.h | 4 + target/arm/tcg/sme.decode | 40 ++++++++++ target/arm/tcg/helper-a64.c | 2 + target/arm/tcg/neon_helper.c | 30 ++++++++ target/arm/tcg/translate-sme.c | 137 +++++++++++++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 7 ++ 8 files changed, 241 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme2_smopa2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme2_umopa2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(gvec_fmax_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmin_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_ah_fmax_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_ah_fmin_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmaxnum_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fminnum_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_urshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_urshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_urshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_srshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_srshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_srshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_urshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_urshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_urshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_2(neon_add_u8, i32, i32, i32) DEF_HELPER_2(neon_add_u16, i32, i32, i32) DEF_HELPER_2(neon_sub_u8, i32, i32, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah) return fpcr_ah && float64_is_any_nan(a) ? a : float64_chs(a); } +/* Not actually called directly as a helper, but uses similar machinery. */ +bfloat16 helper_sme2_ah_fmax_b16(bfloat16 a, bfloat16 b, float_status *fpst); +bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64 BMOPA 1000000 0 10 0 ..... ... ... ..... . 10 .. @op_32 SMOPA2_s 1010000 0 10 0 ..... ... ... ..... . 10 .. @op_32 UMOPA2_s 1010000 1 10 0 ..... ... ... ..... . 10 .. @op_32 + +### SME2 Multi-vector Multiple and Single SVE Destructive + +%zd_ax2 1:4 !function=times_2 +%zd_ax4 2:3 !function=times_4 + +&z2z_en zdn zm esz n +@z2z_2x1 ....... . esz:2 .. zm:4 ....0. ..... .... . \ + &z2z_en n=2 zdn=%zd_ax2 +@z2z_4x1 ....... . esz:2 .. zm:4 ....1. ..... ...0 . \ + &z2z_en n=4 zdn=%zd_ax4 + +SMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 0 @z2z_2x1 +SMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 0 @z2z_4x1 +UMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 1 @z2z_2x1 +UMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 1 @z2z_4x1 +SMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 0 @z2z_2x1 +SMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 0 @z2z_4x1 +UMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 1 @z2z_2x1 +UMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 1 @z2z_4x1 + +FMAX_n1 1100000 1 .. 10 .... 1010.0 01000 .... 0 @z2z_2x1 +FMAX_n1 1100000 1 .. 10 .... 1010.0 01000 .... 0 @z2z_4x1 +FMIN_n1 1100000 1 .. 10 .... 1010.0 01000 .... 1 @z2z_2x1 +FMIN_n1 1100000 1 .. 10 .... 1010.0 01000 .... 1 @z2z_4x1 +FMAXNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 0 @z2z_2x1 +FMAXNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 0 @z2z_4x1 +FMINNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 1 @z2z_2x1 +FMINNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 1 @z2z_4x1 + +SRSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 0 @z2z_2x1 +SRSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 0 @z2z_4x1 +URSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 1 @z2z_2x1 +URSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 1 @z2z_4x1 + +ADD_n1 1100000 1 .. 10 .... 1010.0 11000 .... 0 @z2z_2x1 +ADD_n1 1100000 1 .. 10 .... 1010.0 11000 .... 0 @z2z_4x1 + +SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_2x1 +SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_4x1 diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -XXX,XX +XXX,XX @@ AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min) AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max) AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max) AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max) +AH_MINMAX_HELPER(sme2_ah_fmax_b16, bfloat16, bfloat16, max) +AH_MINMAX_HELPER(sme2_ah_fmin_b16, bfloat16, bfloat16, min) /* 64-bit versions of the CRC helpers. Note that although the operation * (and the prototypes of crc32c() and crc32() mean that only the bottom diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/neon_helper.c +++ b/target/arm/tcg/neon_helper.c @@ -XXX,XX +XXX,XX @@ NEON_VOP(rshl_s16, neon_s16, 2) NEON_GVEC_VOP2(gvec_srshl_h, int16_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_sqrshl_bhs(src1, src2, 16, true, NULL)) +NEON_GVEC_VOP2(sme2_srshl_h, int16_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_sqrshl_bhs(src1, (int8_t)src2, 32, true, NULL)) NEON_GVEC_VOP2(gvec_srshl_s, int32_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_sqrshl_bhs(src1, src2, 32, true, NULL)) +NEON_GVEC_VOP2(sme2_srshl_s, int32_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_sqrshl_d(src1, (int8_t)src2, true, NULL)) NEON_GVEC_VOP2(gvec_srshl_d, int64_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_sqrshl_d(src1, src2, true, NULL)) +NEON_GVEC_VOP2(sme2_srshl_d, int64_t) +#undef NEON_FN + uint32_t HELPER(neon_rshl_s32)(uint32_t val, uint32_t shift) { return do_sqrshl_bhs(val, (int8_t)shift, 32, true, NULL); @@ -XXX,XX +XXX,XX @@ NEON_VOP(rshl_u16, neon_u16, 2) NEON_GVEC_VOP2(gvec_urshl_h, uint16_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_uqrshl_bhs(src1, (int16_t)src2, 16, true, NULL)) +NEON_GVEC_VOP2(sme2_urshl_h, uint16_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_uqrshl_bhs(src1, (int8_t)src2, 32, true, NULL)) NEON_GVEC_VOP2(gvec_urshl_s, int32_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_uqrshl_bhs(src1, src2, 32, true, NULL)) +NEON_GVEC_VOP2(sme2_urshl_s, int32_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_uqrshl_d(src1, (int8_t)src2, true, NULL)) NEON_GVEC_VOP2(gvec_urshl_d, int64_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_uqrshl_d(src1, src2, true, NULL)) +NEON_GVEC_VOP2(sme2_urshl_d, int64_t) +#undef NEON_FN + uint32_t HELPER(neon_rshl_u32)(uint32_t val, uint32_t shift) { return do_uqrshl_bhs(val, (int8_t)shift, 32, true, NULL); diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmop TRANS_FEAT(BMOPA, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_bmopa_s) TRANS_FEAT(SMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_smopa2_s) TRANS_FEAT(UMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_umopa2_s) + +static bool do_z2z_n1(DisasContext *s, arg_z2z_en *a, GVecGen3Fn *fn) +{ + int esz, dn, vsz, mofs, n; + bool overlap = false; + + if (!sme_sm_enabled_check(s)) { + return true; + } + + esz = a->esz; + n = a->n; + dn = a->zdn; + mofs = vec_full_reg_offset(s, a->zm); + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + if (dofs == mofs) { + overlap = true; + } else { + fn(esz, dofs, dofs, mofs, vsz, vsz); + } + } + if (overlap) { + fn(esz, mofs, mofs, mofs, vsz, vsz); + } + return true; +} + +static void gen_sme2_srshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz) +{ + static gen_helper_gvec_3 * const fns[] = { + gen_helper_gvec_srshl_b, gen_helper_sme2_srshl_h, + gen_helper_sme2_srshl_s, gen_helper_sme2_srshl_d, + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]); +} + +static void gen_sme2_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz) +{ + static gen_helper_gvec_3 * const fns[] = { + gen_helper_gvec_urshl_b, gen_helper_sme2_urshl_h, + gen_helper_sme2_urshl_s, gen_helper_sme2_urshl_d, + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]); +} + +TRANS_FEAT(ADD_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_add) +TRANS_FEAT(SMAX_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_smax) +TRANS_FEAT(SMIN_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_smin) +TRANS_FEAT(UMAX_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_umax) +TRANS_FEAT(UMIN_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_umin) +TRANS_FEAT(SRSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_srshl) +TRANS_FEAT(URSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_urshl) +TRANS_FEAT(SQDMULH_n1, aa64_sme2, do_z2z_n1, a, gen_gvec_sve2_sqdmulh) + +static bool do_z2z_n1_fpst(DisasContext *s, arg_z2z_en *a, + gen_helper_gvec_3_ptr * const fns[4]) +{ + int esz = a->esz, n, dn, vsz, mofs; + bool overlap = false; + gen_helper_gvec_3_ptr *fn; + TCGv_ptr fpst; + + /* These insns use MO_8 to encode BFloat16. */ + if (esz == MO_8 && !dc_isar_feature(aa64_sme_b16b16, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64); + fn = fns[esz]; + n = a->n; + dn = a->zdn; + mofs = vec_full_reg_offset(s, a->zm); + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + if (dofs == mofs) { + overlap = true; + } else { + tcg_gen_gvec_3_ptr(dofs, dofs, mofs, fpst, vsz, vsz, 0, fn); + } + } + if (overlap) { + tcg_gen_gvec_3_ptr(mofs, mofs, mofs, fpst, vsz, vsz, 0, fn); + } + return true; +} + +static gen_helper_gvec_3_ptr * const f_vector_fmax[2][4] = { + { gen_helper_gvec_fmax_b16, + gen_helper_gvec_fmax_h, + gen_helper_gvec_fmax_s, + gen_helper_gvec_fmax_d }, + { gen_helper_gvec_ah_fmax_b16, + gen_helper_gvec_ah_fmax_h, + gen_helper_gvec_ah_fmax_s, + gen_helper_gvec_ah_fmax_d }, +}; +TRANS_FEAT(FMAX_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmax[s->fpcr_ah]) + +static gen_helper_gvec_3_ptr * const f_vector_fmin[2][4] = { + { gen_helper_gvec_fmin_b16, + gen_helper_gvec_fmin_h, + gen_helper_gvec_fmin_s, + gen_helper_gvec_fmin_d }, + { gen_helper_gvec_ah_fmin_b16, + gen_helper_gvec_ah_fmin_h, + gen_helper_gvec_ah_fmin_s, + gen_helper_gvec_ah_fmin_d }, +}; +TRANS_FEAT(FMIN_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmin[s->fpcr_ah]) + +static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[4] = { + gen_helper_gvec_fmaxnum_b16, + gen_helper_gvec_fmaxnum_h, + gen_helper_gvec_fmaxnum_s, + gen_helper_gvec_fmaxnum_d, +}; +TRANS_FEAT(FMAXNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmaxnm) + +static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { + gen_helper_gvec_fminnum_b16, + gen_helper_gvec_fminnum_h, + gen_helper_gvec_fminnum_s, + gen_helper_gvec_fminnum_d, +}; +TRANS_FEAT(FMINNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fminnm) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_ah_fmin_h, helper_vfp_ah_minh, float16) DO_3OP(gvec_ah_fmin_s, helper_vfp_ah_mins, float32) DO_3OP(gvec_ah_fmin_d, helper_vfp_ah_mind, float64) +DO_3OP(gvec_fmax_b16, bfloat16_max, bfloat16) +DO_3OP(gvec_fmin_b16, bfloat16_min, bfloat16) +DO_3OP(gvec_fmaxnum_b16, bfloat16_maxnum, bfloat16) +DO_3OP(gvec_fminnum_b16, bfloat16_minnum, bfloat16) +DO_3OP(gvec_ah_fmax_b16, helper_sme2_ah_fmax_b16, bfloat16) +DO_3OP(gvec_ah_fmin_b16, helper_sme2_ah_fmin_b16, bfloat16) + #endif #undef DO_3OP -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-32-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 36 +++++++++++++++++++ target/arm/tcg/translate-sme.c | 65 ++++++++++++++++++++++++++++++++++ 2 files changed, 101 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ADD_n1 1100000 1 .. 10 .... 1010.0 11000 .... 0 @z2z_4x1 SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_2x1 SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_4x1 + +### SME2 Multi-vector Multiple Vectors SVE Destructive + +%zm_ax2 17:4 !function=times_2 +%zm_ax4 18:3 !function=times_4 + +@z2z_2x2 ....... . esz:2 . ....0 ....0. ..... .... . \ + &z2z_en n=2 zdn=%zd_ax2 zm=%zm_ax2 +@z2z_4x4 ....... . esz:2 . ...00 ....1. ..... ...0 . \ + &z2z_en n=4 zdn=%zd_ax4 zm=%zm_ax4 + +SMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 0 @z2z_2x2 +SMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 0 @z2z_4x4 +UMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 1 @z2z_2x2 +UMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 1 @z2z_4x4 +SMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 0 @z2z_2x2 +SMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 0 @z2z_4x4 +UMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 1 @z2z_2x2 +UMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 1 @z2z_4x4 + +FMAX_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 0 @z2z_2x2 +FMAX_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 0 @z2z_4x4 +FMIN_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 1 @z2z_2x2 +FMIN_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 1 @z2z_4x4 +FMAXNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 0 @z2z_2x2 +FMAXNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 0 @z2z_4x4 +FMINNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 1 @z2z_2x2 +FMINNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 1 @z2z_4x4 + +SRSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 0 @z2z_2x2 +SRSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 0 @z2z_4x4 +URSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 1 @z2z_2x2 +URSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 1 @z2z_4x4 + +SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_2x2 +SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_4x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SRSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_srshl) TRANS_FEAT(URSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_urshl) TRANS_FEAT(SQDMULH_n1, aa64_sme2, do_z2z_n1, a, gen_gvec_sve2_sqdmulh) +static bool do_z2z_nn(DisasContext *s, arg_z2z_en *a, GVecGen3Fn *fn) +{ + int esz, dn, dm, vsz, n; + + if (!sme_sm_enabled_check(s)) { + return true; + } + + esz = a->esz; + n = a->n; + dn = a->zdn; + dm = a->zm; + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + int mofs = vec_full_reg_offset(s, dm + i); + + fn(esz, dofs, dofs, mofs, vsz, vsz); + } + return true; +} + +TRANS_FEAT(SMAX_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_smax) +TRANS_FEAT(SMIN_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_smin) +TRANS_FEAT(UMAX_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_umax) +TRANS_FEAT(UMIN_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_umin) +TRANS_FEAT(SRSHL_nn, aa64_sme2, do_z2z_nn, a, gen_sme2_srshl) +TRANS_FEAT(URSHL_nn, aa64_sme2, do_z2z_nn, a, gen_sme2_urshl) +TRANS_FEAT(SQDMULH_nn, aa64_sme2, do_z2z_nn, a, gen_gvec_sve2_sqdmulh) + static bool do_z2z_n1_fpst(DisasContext *s, arg_z2z_en *a, gen_helper_gvec_3_ptr * const fns[4]) { @@ -XXX,XX +XXX,XX @@ static bool do_z2z_n1_fpst(DisasContext *s, arg_z2z_en *a, return true; } +static bool do_z2z_nn_fpst(DisasContext *s, arg_z2z_en *a, + gen_helper_gvec_3_ptr * const fns[4]) +{ + int esz = a->esz, n, dn, dm, vsz; + gen_helper_gvec_3_ptr *fn; + TCGv_ptr fpst; + + if (esz == MO_8 && !dc_isar_feature(aa64_sme_b16b16, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64); + fn = fns[esz]; + n = a->n; + dn = a->zdn; + dm = a->zm; + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + int mofs = vec_full_reg_offset(s, dm + i); + + tcg_gen_gvec_3_ptr(dofs, dofs, mofs, fpst, vsz, vsz, 0, fn); + } + return true; +} + static gen_helper_gvec_3_ptr * const f_vector_fmax[2][4] = { { gen_helper_gvec_fmax_b16, gen_helper_gvec_fmax_h, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmax[2][4] = { gen_helper_gvec_ah_fmax_d }, }; TRANS_FEAT(FMAX_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmax[s->fpcr_ah]) +TRANS_FEAT(FMAX_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fmax[s->fpcr_ah]) static gen_helper_gvec_3_ptr * const f_vector_fmin[2][4] = { { gen_helper_gvec_fmin_b16, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmin[2][4] = { gen_helper_gvec_ah_fmin_d }, }; TRANS_FEAT(FMIN_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmin[s->fpcr_ah]) +TRANS_FEAT(FMIN_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fmin[s->fpcr_ah]) static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[4] = { gen_helper_gvec_fmaxnum_b16, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[4] = { gen_helper_gvec_fmaxnum_d, }; TRANS_FEAT(FMAXNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmaxnm) +TRANS_FEAT(FMAXNM_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fmaxnm) static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { gen_helper_gvec_fminnum_b16, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { gen_helper_gvec_fminnum_d, }; TRANS_FEAT(FMINNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fminnm) +TRANS_FEAT(FMINNM_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fminnm) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-33-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 2 ++ target/arm/tcg/sme.decode | 15 +++++++++++++++ target/arm/tcg/translate-sme.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t); typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t); +typedef void GVecGen3FnVar(unsigned, TCGv_ptr, uint32_t, TCGv_ptr, uint32_t, + TCGv_ptr, uint32_t, uint32_t, uint32_t); /* Function prototype for gen_ functions for calling Neon helpers */ typedef void NeonGenOneOpFn(TCGv_i32, TCGv_i32); diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ URSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 1 @z2z_4x4 SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_2x2 SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_4x4 + +### SME2 Multi-vector Multiple and Single Array Vectors + +&azz_n n off rv zn zm +@azz_nx1_o3 ........ .... zm:4 ...... zn:5 .. off:3 &azz_n rv=%mova_rv + +ADD_azz_n1_s 11000001 0010 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=2 +ADD_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=4 +ADD_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=2 +ADD_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=4 + +SUB_azz_n1_s 11000001 0010 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 +SUB_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 +SUB_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 +SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { }; TRANS_FEAT(FMINNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fminnm) TRANS_FEAT(FMINNM_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fminnm) + +/* Add/Sub vector Z[m] to each Z[n*N] with result in ZA[d*N]. */ +static bool do_azz_n1(DisasContext *s, arg_azz_n *a, int esz, + GVecGen3FnVar *fn) +{ + TCGv_ptr t_za; + int svl, n, o_zm; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + n = a->n; + t_za = get_zarray(s, a->rv, a->off, n, 0); + o_zm = vec_full_reg_offset(s, a->zm); + svl = streaming_vec_reg_size(s); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zn = vec_full_reg_offset(s, (a->zn + i) % 32); + + fn(esz, t_za, o_za, tcg_env, o_zn, tcg_env, o_zm, svl, svl); + } + return true; +} + +TRANS_FEAT(ADD_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_sub_var) +TRANS_FEAT(ADD_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_sub_var) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-34-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 20 ++++++++++++++++++++ target/arm/tcg/translate-sme.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_azz_n1_s 11000001 0010 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 SUB_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 SUB_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 + +### SME2 Multi-vector Multiple Array Vectors + +%zn_ax2 6:4 !function=times_2 +%zn_ax4 7:3 !function=times_4 + +@azz_2x2_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &azz_n n=2 rv=%mova_rv zn=%zn_ax2 zm=%zm_ax2 +@azz_4x4_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 + +ADD_azz_nn_s 11000001 101 ....0 0 .. 110 ....0 10 ... @azz_2x2_o3 +ADD_azz_nn_s 11000001 101 ...01 0 .. 110 ...00 10 ... @azz_4x4_o3 +ADD_azz_nn_d 11000001 111 ....0 0 .. 110 ....0 10 ... @azz_2x2_o3 +ADD_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 10 ... @azz_4x4_o3 + +SUB_azz_nn_s 11000001 101 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 +SUB_azz_nn_s 11000001 101 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 +SUB_azz_nn_d 11000001 111 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 +SUB_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ADD_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_sub_var) TRANS_FEAT(ADD_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_sub_var) + +/* Add/Sub each vector Z[m*N] to each Z[n*N] with result in ZA[d*N]. */ +static bool do_azz_nn(DisasContext *s, arg_azz_n *a, int esz, + GVecGen3FnVar *fn) +{ + TCGv_ptr t_za; + int svl, n; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + n = a->n; + t_za = get_zarray(s, a->rv, a->off, n, 1); + svl = streaming_vec_reg_size(s); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zn = vec_full_reg_offset(s, a->zn + i); + int o_zm = vec_full_reg_offset(s, a->zm + i); + + fn(esz, t_za, o_za, tcg_env, o_zn, tcg_env, o_zm, svl, svl); + } + return true; +} + +TRANS_FEAT(ADD_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_sub_var) +TRANS_FEAT(ADD_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub_var) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Indicate whether to use FPST_FPCR or FPST_ZA via bit 2 of simd_data(desc). For SVE, this bit remains zero. For do_FMLAL_zzzw, this requires no change. For do_FMLAL_zzxw, move the index up one bit. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-35-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 2 +- target/arm/tcg/vec_helper.c | 8 +++++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_FMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sub, bool sel) { return gen_gvec_ptr_zzzz(s, gen_helper_sve2_fmlal_zzxw_s, a->rd, a->rn, a->rm, a->ra, - (a->index << 2) | (sel << 1) | sub, tcg_env); + (a->index << 3) | (sel << 1) | sub, tcg_env); } TRANS_FEAT(FMLALB_zzxw, aa64_sve2, do_FMLAL_zzxw, a, false, false) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va, intptr_t i, oprsz = simd_oprsz(desc); bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1); intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16); - float_status *status = &env->vfp.fp_status[FPST_A64]; + bool za = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + float_status *status = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; bool fz16 = env->vfp.fpcr & FPCR_FZ16; int negx = 0, negf = 0; @@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va, intptr_t i, j, oprsz = simd_oprsz(desc); bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1); intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16); - intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16); - float_status *status = &env->vfp.fp_status[FPST_A64]; + bool za = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 3, 3) * sizeof(float16); + float_status *status = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; bool fz16 = env->vfp.fpcr & FPCR_FZ16; int negx = 0, negf = 0; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-36-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 8 +++++ target/arm/tcg/vec_helper.c | 58 ++++++++++++++++++++++++++++--------- 2 files changed, 53 insertions(+), 13 deletions(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfmmla, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmlsl, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_ah_bfmlsl, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmlsl_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_ah_bfmlsl_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_sclamp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, clear_tail(d, opr_sz, simd_maxsz(desc)); } -void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va, - float_status *stat, uint32_t desc) +static void do_bfmlal(float32 *d, bfloat16 *n, bfloat16 *m, float32 *a, + float_status *stat, uint32_t desc, int negx, int negf) { intptr_t i, opr_sz = simd_oprsz(desc); - intptr_t sel = simd_data(desc); - float32 *d = vd, *a = va; - bfloat16 *n = vn, *m = vm; + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1); for (i = 0; i < opr_sz / 4; ++i) { - float32 nn = n[H2(i * 2 + sel)] << 16; + float32 nn = (negx ^ n[H2(i * 2 + sel)]) << 16; float32 mm = m[H2(i * 2 + sel)] << 16; - d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat); + d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], negf, stat); } clear_tail(d, opr_sz, simd_maxsz(desc)); } -void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm, - void *va, float_status *stat, uint32_t desc) +void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal(vd, vn, vm, va, stat, desc, 0, 0); +} + +void HELPER(gvec_bfmlsl)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal(vd, vn, vm, va, stat, desc, 0x8000, 0); +} + +void HELPER(gvec_ah_bfmlsl)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal(vd, vn, vm, va, stat, desc, 0, float_muladd_negate_product); +} + +static void do_bfmlal_idx(float32 *d, bfloat16 *n, bfloat16 *m, float32 *a, + float_status *stat, uint32_t desc, int negx, int negf) { intptr_t i, j, opr_sz = simd_oprsz(desc); intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1); intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3); intptr_t elements = opr_sz / 4; intptr_t eltspersegment = MIN(16 / 4, elements); - float32 *d = vd, *a = va; - bfloat16 *n = vn, *m = vm; for (i = 0; i < elements; i += eltspersegment) { float32 m_idx = m[H2(2 * i + index)] << 16; for (j = i; j < i + eltspersegment; j++) { - float32 n_j = n[H2(2 * j + sel)] << 16; - d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat); + float32 n_j = (negx ^ n[H2(2 * j + sel)]) << 16; + d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], negf, stat); } } clear_tail(d, opr_sz, simd_maxsz(desc)); } +void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal_idx(vd, vn, vm, va, stat, desc, 0, 0); +} + +void HELPER(gvec_bfmlsl_idx)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal_idx(vd, vn, vm, va, stat, desc, 0x8000, 0); +} + +void HELPER(gvec_ah_bfmlsl_idx)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal_idx(vd, vn, vm, va, stat, desc, 0, float_muladd_negate_product); +} + #define DO_CLAMP(NAME, TYPE) \ void HELPER(NAME)(void *d, void *n, void *m, void *a, uint32_t desc) \ { \ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-37-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 71 ++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 98 ++++++++++++++++++++++++++++++++++ 2 files changed, 169 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 SUB_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 +%off3_x2 0:3 !function=times_2 +%off2_x2 0:2 !function=times_2 + +@azz_nx1_o3x2 ........ ... . zm:4 . .. ... zn:5 .. ... \ + &azz_n off=%off3_x2 rv=%mova_rv +@azz_nx1_o2x2 ........ ... . zm:4 . .. ... zn:5 ... .. \ + &azz_n off=%off2_x2 rv=%mova_rv + +FMLAL_n1 11000001 001 0 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1 +FMLAL_n1 11000001 001 0 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=2 +FMLAL_n1 11000001 001 1 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=4 + +FMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 01 ... @azz_nx1_o3x2 n=1 +FMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=2 +FMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=4 + +BFMLAL_n1 11000001 001 0 .... 0 .. 011 ..... 10 ... @azz_nx1_o3x2 n=1 +BFMLAL_n1 11000001 001 0 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=2 +BFMLAL_n1 11000001 001 1 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=4 + +BFMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 11 ... @azz_nx1_o3x2 n=1 +BFMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=2 +BFMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ SUB_azz_nn_s 11000001 101 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 SUB_azz_nn_s 11000001 101 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 SUB_azz_nn_d 11000001 111 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 SUB_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 + +@azz_2x2_o2x2 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=2 rv=%mova_rv zn=%zn_ax2 zm=%zm_ax2 off=%off2_x2 +@azz_4x4_o2x2 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 off=%off2_x2 + +FMLAL_nn 11000001 101 ....0 0 .. 010 ....0 000 .. @azz_2x2_o2x2 +FMLAL_nn 11000001 101 ...01 0 .. 010 ...00 000 .. @azz_4x4_o2x2 + +FMLSL_nn 11000001 101 ....0 0 .. 010 ....0 010 .. @azz_2x2_o2x2 +FMLSL_nn 11000001 101 ...01 0 .. 010 ...00 010 .. @azz_4x4_o2x2 + +BFMLAL_nn 11000001 101 ....0 0 .. 010 ....0 100 .. @azz_2x2_o2x2 +BFMLAL_nn 11000001 101 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2 + +BFMLSL_nn 11000001 101 ....0 0 .. 010 ....0 110 .. @azz_2x2_o2x2 +BFMLSL_nn 11000001 101 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 + +### SME2 Multi-vector Indexed + +&azx_n n off rv zn zm idx + +%idx3_15_10 15:1 10:2 +%idx2_10_2 10:2 2:1 + +@azx_1x1_o3x2 ........ .... zm:4 . .. . .. zn:5 .. ... \ + &azx_n n=1 rv=%mova_rv off=%off3_x2 idx=%idx3_15_10 +@azx_2x1_o2x2 ........ .... zm:4 . .. . .. ..... .. ... \ + &azx_n n=2 rv=%mova_rv off=%off2_x2 zn=%zn_ax2 idx=%idx2_10_2 +@azx_4x1_o2x2 ........ .... zm:4 . .. . .. ..... .. ... \ + &azx_n n=4 rv=%mova_rv off=%off2_x2 zn=%zn_ax4 idx=%idx2_10_2 + +FMLAL_nx 11000001 1000 .... . .. 1 .. ..... 00 ... @azx_1x1_o3x2 +FMLAL_nx 11000001 1001 .... 0 .. 1 .. ....0 00 ... @azx_2x1_o2x2 +FMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 00 ... @azx_4x1_o2x2 + +FMLSL_nx 11000001 1000 .... . .. 1 .. ..... 01 ... @azx_1x1_o3x2 +FMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 01 ... @azx_2x1_o2x2 +FMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 01 ... @azx_4x1_o2x2 + +BFMLAL_nx 11000001 1000 .... . .. 1 .. ..... 10 ... @azx_1x1_o3x2 +BFMLAL_nx 11000001 1001 .... 0 .. 1 .. ....0 10 ... @azx_2x1_o2x2 +BFMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 10 ... @azx_4x1_o2x2 + +BFMLSL_nx 11000001 1000 .... . .. 1 .. ..... 11 ... @azx_1x1_o3x2 +BFMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 11 ... @azx_2x1_o2x2 +BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ADD_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_sub_var) TRANS_FEAT(ADD_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub_var) + +/* + * Expand array multi-vector single (n1), array multi-vector (nn), + * and array multi-vector indexed (nx), for floating-point accumulate. + * multi: true for nn, false for n1. + * fpst: >= 0 to set ptr argument for FPST_*, < 0 for ENV. + * data: stuff for simd_data, including any index. + */ +#define FPST_ENV -1 + +static bool do_azz_acc_fp(DisasContext *s, int nreg, int nsel, + int rv, int off, int zn, int zm, + int data, int shsel, bool multi, int fpst, + gen_helper_gvec_4_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / nreg; + TCGv_ptr t_za = get_zarray(s, rv, off, nreg, nsel); + TCGv_ptr t, ptr; + + if (fpst >= 0) { + ptr = fpstatus_ptr(fpst); + } else { + ptr = tcg_env; + } + t = tcg_temp_new_ptr(); + + for (int r = 0; r < nreg; ++r) { + TCGv_ptr t_zn = vec_full_reg_ptr(s, zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm); + + for (int i = 0; i < nsel; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, data | (i << shsel)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, t, ptr, tcg_constant_i32(desc)); + } + + /* + * For multiple-and-single vectors, Zn may wrap. + * For multiple vectors, both Zn and Zm are aligned. + */ + zn = (zn + 1) % 32; + zm += multi; + } + } + return true; +} + +static bool do_fmlal(DisasContext *s, arg_azz_n *a, bool sub, bool multi) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + (1 << 2) | sub, 1, + multi, FPST_ENV, gen_helper_sve2_fmlal_zzzw_s); +} + +TRANS_FEAT(FMLAL_n1, aa64_sme2, do_fmlal, a, false, false) +TRANS_FEAT(FMLSL_n1, aa64_sme2, do_fmlal, a, true, false) +TRANS_FEAT(FMLAL_nn, aa64_sme2, do_fmlal, a, false, true) +TRANS_FEAT(FMLSL_nn, aa64_sme2, do_fmlal, a, true, true) + +static bool do_fmlal_nx(DisasContext *s, arg_azx_n *a, bool sub) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + (a->idx << 3) | (1 << 2) | sub, 1, + false, FPST_ENV, gen_helper_sve2_fmlal_zzxw_s); +} + +TRANS_FEAT(FMLAL_nx, aa64_sme2, do_fmlal_nx, a, false) +TRANS_FEAT(FMLSL_nx, aa64_sme2, do_fmlal_nx, a, true) + +static bool do_bfmlal(DisasContext *s, arg_azz_n *a, bool sub, bool multi) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, FPST_ZA, + (!sub ? gen_helper_gvec_bfmlal + : s->fpcr_ah ? gen_helper_gvec_ah_bfmlsl + : gen_helper_gvec_bfmlsl)); +} + +TRANS_FEAT(BFMLAL_n1, aa64_sme2, do_bfmlal, a, false, false) +TRANS_FEAT(BFMLSL_n1, aa64_sme2, do_bfmlal, a, true, false) +TRANS_FEAT(BFMLAL_nn, aa64_sme2, do_bfmlal, a, false, true) +TRANS_FEAT(BFMLSL_nn, aa64_sme2, do_bfmlal, a, true, true) + +static bool do_bfmlal_nx(DisasContext *s, arg_azx_n *a, bool sub) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + a->idx << 1, 0, false, FPST_ZA, + !sub ? gen_helper_gvec_bfmlal_idx + : s->fpcr_ah ? gen_helper_gvec_ah_bfmlsl_idx + : gen_helper_gvec_bfmlsl_idx); +} + +TRANS_FEAT(BFMLAL_nx, aa64_sme2, do_bfmlal_nx, a, false) +TRANS_FEAT(BFMLSL_nx, aa64_sme2, do_bfmlal_nx, a, true) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-38-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 5 ++++ target/arm/tcg/sme.decode | 14 +++++++++++ target/arm/tcg/sve.decode | 7 ++++-- target/arm/tcg/sme_helper.c | 44 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 18 ++++++++++++++ target/arm/tcg/translate-sve.c | 5 ++++ 6 files changed, 91 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmaxnum_b16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fminnum_b16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_6(sme2_fdot_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_6(sme2_fdot_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ BFMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 11 ... @azz_nx1_o3x2 n=1 BFMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=2 BFMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 +FDOT_n1 11000001 001 0 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=2 +FDOT_n1 11000001 001 1 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ BFMLAL_nn 11000001 101 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2 BFMLSL_nn 11000001 101 ....0 0 .. 010 ....0 110 .. @azz_2x2_o2x2 BFMLSL_nn 11000001 101 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 +FDOT_nn 11000001 101 ....0 0 .. 100 ....0 00 ... @azz_2x2_o3 +FDOT_nn 11000001 101 ...01 0 .. 100 ...00 00 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ BFMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 10 ... @azx_4x1_o2x2 BFMLSL_nx 11000001 1000 .... . .. 1 .. ..... 11 ... @azx_1x1_o3x2 BFMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 11 ... @azx_2x1_o2x2 BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 + +@azx_2x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax2 +@azx_4x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax4 + +FDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_i2_o3 +FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 -### SVE2 floating-point bfloat16 dot-product +### SVE2 floating-point dot-product +FDOT_zzzz 01100100 00 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point multiply-add long (indexed) @@ -XXX,XX +XXX,XX @@ FMLSLT_zzxw 01100100 10 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 BFMLALB_zzxw 01100100 11 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 BFMLALT_zzxw 01100100 11 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 -### SVE2 floating-point bfloat16 dot-product (indexed) +### SVE2 floating-point dot-product (indexed) + +FDOT_zzxz 01100100 00 1 ..... 010000 ..... ..... @rrxr_2 esz=2 BFDOT_zzxz 01100100 01 1 ..... 010000 ..... ..... @rrxr_2 esz=2 ### SVE broadcast predicate element diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, } } +void HELPER(sme2_fdot_h)(void *vd, void *vn, void *vm, void *va, + CPUARMState *env, uint32_t desc) +{ + intptr_t i, oprsz = simd_maxsz(desc); + bool za = extract32(desc, SIMD_DATA_SHIFT, 1); + float_status *fpst_std = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; + float_status *fpst_f16 = &env->vfp.fp_status[za ? FPST_ZA_F16 : FPST_A64_F16]; + float_status fpst_odd = *fpst_std; + float32 *d = vd, *a = va; + uint32_t *n = vn, *m = vm; + + set_float_rounding_mode(float_round_to_odd, &fpst_odd); + + for (i = 0; i < oprsz / sizeof(float32); ++i) { + d[H4(i)] = f16_dotadd(a[H4(i)], n[H4(i)], m[H4(i)], + fpst_f16, fpst_std, &fpst_odd); + } +} + +void HELPER(sme2_fdot_idx_h)(void *vd, void *vn, void *vm, void *va, + CPUARMState *env, uint32_t desc) +{ + intptr_t i, j, oprsz = simd_maxsz(desc); + intptr_t elements = oprsz / sizeof(float32); + intptr_t eltspersegment = MIN(4, elements); + int idx = extract32(desc, SIMD_DATA_SHIFT, 2); + bool za = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + float_status *fpst_std = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; + float_status *fpst_f16 = &env->vfp.fp_status[za ? FPST_ZA_F16 : FPST_A64_F16]; + float_status fpst_odd = *fpst_std; + float32 *d = vd, *a = va; + uint32_t *n = vn, *m = (uint32_t *)vm + H4(idx); + + set_float_rounding_mode(float_round_to_odd, &fpst_odd); + + for (i = 0; i < elements; i += eltspersegment) { + uint32_t mm = m[i]; + for (j = 0; j < eltspersegment; ++j) { + d[H4(i + j)] = f16_dotadd(a[H4(i + j)], n[H4(i + j)], mm, + fpst_f16, fpst_std, &fpst_odd); + } + } +} + void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, void *vpm, CPUARMState *env, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_bfmlal_nx(DisasContext *s, arg_azx_n *a, bool sub) TRANS_FEAT(BFMLAL_nx, aa64_sme2, do_bfmlal_nx, a, false) TRANS_FEAT(BFMLSL_nx, aa64_sme2, do_bfmlal_nx, a, true) + +static bool do_fdot(DisasContext *s, arg_azz_n *a, bool multi) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, 1, 0, + multi, FPST_ENV, gen_helper_sme2_fdot_h); +} + +TRANS_FEAT(FDOT_n1, aa64_sme2, do_fdot, a, false) +TRANS_FEAT(FDOT_nn, aa64_sme2, do_fdot, a, true) + +static bool do_fdot_nx(DisasContext *s, arg_azx_n *a) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + a->idx | (1 << 2), 0, false, FPST_ENV, + gen_helper_sme2_fdot_idx_h); +} + +TRANS_FEAT(FDOT_nx, aa64_sme2, do_fdot_nx, a) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, TRANS_FEAT_NONSTREAMING(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, gen_helper_gvec_ummla_b, a, 0) +TRANS_FEAT(FDOT_zzzz, aa64_sme2_or_sve2p1, gen_gvec_env_arg_zzzz, + gen_helper_sme2_fdot_h, a, 0) +TRANS_FEAT(FDOT_zzxz, aa64_sme2_or_sve2p1, gen_gvec_env_arg_zzxz, + gen_helper_sme2_fdot_idx_h, a) + TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_env_arg_zzzz, gen_helper_gvec_bfdot, a, 0) TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_env_arg_zzxz, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-39-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 9 +++++++++ target/arm/tcg/translate-sme.c | 17 +++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ BFMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 FDOT_n1 11000001 001 0 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=2 FDOT_n1 11000001 001 1 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=4 +BFDOT_n1 11000001 001 0 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=2 +BFDOT_n1 11000001 001 1 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ BFMLSL_nn 11000001 101 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 FDOT_nn 11000001 101 ....0 0 .. 100 ....0 00 ... @azz_2x2_o3 FDOT_nn 11000001 101 ...01 0 .. 100 ...00 00 ... @azz_4x4_o3 +BFDOT_nn 11000001 101 ....0 0 .. 100 ....0 10 ... @azz_2x2_o3 +BFDOT_nn 11000001 101 ...01 0 .. 100 ...00 10 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 FDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_i2_o3 FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 + +BFDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_i2_o3 +BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_fdot_nx(DisasContext *s, arg_azx_n *a) } TRANS_FEAT(FDOT_nx, aa64_sme2, do_fdot_nx, a) + +static bool do_bfdot(DisasContext *s, arg_azz_n *a, bool multi) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, 0, 0, + multi, FPST_ENV, gen_helper_gvec_bfdot); +} + +TRANS_FEAT(BFDOT_n1, aa64_sme2, do_bfdot, a, false) +TRANS_FEAT(BFDOT_nn, aa64_sme2, do_bfdot, a, true) + +static bool do_bfdot_nx(DisasContext *s, arg_azx_n *a) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, a->idx, 0, + false, FPST_ENV, gen_helper_gvec_bfdot_idx); +} + +TRANS_FEAT(BFDOT_nx, aa64_sme2, do_bfdot_nx, a) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-40-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 ++ target/arm/tcg/helper.h | 2 ++ target/arm/tcg/sme.decode | 3 +++ target/arm/tcg/sme_helper.c | 30 ++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 24 +++++++++++++++++++++ target/arm/tcg/vec_helper.c | 39 ++++++++++++++++++++++++++++++++++ 6 files changed, 100 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme2_fdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(sme2_fdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_6(sme2_fvdot_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfdot, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(gvec_bfdot_idx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_6(sme2_bfvdot_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(gvec_bfmmla, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 BFDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_i2_o3 BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3 + +FVDOT 11000001 0101 .... 0 .. 0 .. ....0 01 ... @azx_2x1_i2_o3 +BFVDOT 11000001 0101 .... 0 .. 0 .. ....0 11 ... @azx_2x1_i2_o3 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fdot_idx_h)(void *vd, void *vn, void *vm, void *va, } } +void HELPER(sme2_fvdot_idx_h)(void *vd, void *vn, void *vm, void *va, + CPUARMState *env, uint32_t desc) +{ + intptr_t i, j, oprsz = simd_maxsz(desc); + intptr_t elements = oprsz / sizeof(float32); + intptr_t eltspersegment = MIN(4, elements); + int idx = extract32(desc, SIMD_DATA_SHIFT, 2); + int sel = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + float_status fpst_odd, *fpst_std, *fpst_f16; + float32 *d = vd, *a = va; + uint16_t *n0 = vn; + uint16_t *n1 = vn + sizeof(ARMVectorReg); + uint32_t *m = (uint32_t *)vm + H4(idx); + + fpst_std = &env->vfp.fp_status[FPST_ZA]; + fpst_f16 = &env->vfp.fp_status[FPST_ZA_F16]; + fpst_odd = *fpst_std; + set_float_rounding_mode(float_round_to_odd, &fpst_odd); + + for (i = 0; i < elements; i += eltspersegment) { + uint32_t mm = m[i]; + for (j = 0; j < eltspersegment; ++j) { + uint32_t nn = (n0[H2(2 * (i + j) + sel)]) + | (n1[H2(2 * (i + j) + sel)] << 16); + d[i + H4(j)] = f16_dotadd(a[i + H4(j)], nn, mm, + fpst_f16, fpst_std, &fpst_odd); + } + } +} + void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, void *vpm, CPUARMState *env, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_bfdot_nx(DisasContext *s, arg_azx_n *a) } TRANS_FEAT(BFDOT_nx, aa64_sme2, do_bfdot_nx, a) + +static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / 2; + TCGv_ptr t_za = get_zarray(s, a->rv, a->off, 2, 1); + TCGv_ptr t_zn = vec_full_reg_ptr(s, a->zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, a->zm); + TCGv_ptr t = tcg_temp_new_ptr(); + + for (int i = 0; i < 2; ++i) { + int o_za = i * vstride * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, a->idx | (i << 2)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, t, tcg_env, tcg_constant_i32(desc)); + } + } + return true; +} + +TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h) +TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm, clear_tail(d, opr_sz, simd_maxsz(desc)); } +void HELPER(sme2_bfvdot_idx)(void *vd, void *vn, void *vm, + void *va, CPUARMState *env, uint32_t desc) +{ + intptr_t i, j, opr_sz = simd_oprsz(desc); + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT, 2); + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + intptr_t elements = opr_sz / 4; + intptr_t eltspersegment = MIN(16 / 4, elements); + float32 *d = vd, *a = va; + uint16_t *n0 = vn; + uint16_t *n1 = vn + sizeof(ARMVectorReg); + uint32_t *m = vm; + float_status fpst, fpst_odd; + + if (is_ebf(env, &fpst, &fpst_odd)) { + for (i = 0; i < elements; i += eltspersegment) { + uint32_t m_idx = m[i + H4(idx)]; + + for (j = 0; j < eltspersegment; j++) { + uint32_t nn = (n0[H2(2 * (i + j) + sel)]) + | (n1[H2(2 * (i + j) + sel)] << 16); + d[i + H4(j)] = bfdotadd_ebf(a[i + H4(j)], nn, m_idx, + &fpst, &fpst_odd); + } + } + } else { + for (i = 0; i < elements; i += eltspersegment) { + uint32_t m_idx = m[i + H4(idx)]; + + for (j = 0; j < eltspersegment; j++) { + uint32_t nn = (n0[H2(2 * (i + j) + sel)]) + | (n1[H2(2 * (i + j) + sel)] << 16); + d[i + H4(j)] = bfdotadd(a[i + H4(j)], nn, m_idx, &fpst); + } + } + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} + void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, CPUARMState *env, uint32_t desc) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Emphasize that these are 4-way dot products. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-41-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 22 +++++++++++----------- target/arm/tcg/translate-a64.c | 14 +++++++------- target/arm/tcg/translate-neon.c | 14 +++++++------- target/arm/tcg/translate-sve.c | 18 +++++++++--------- target/arm/tcg/vec_helper.c | 22 +++++++++++----------- 5 files changed, 45 insertions(+), 45 deletions(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_sqrdmlah_d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_usdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_usdot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_sdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_udot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_idx_h, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_sdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_idx_h, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_udot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sudot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_sudot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_usdot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_usdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_dot_vector_env(DisasContext *s, arg_qrrr_e *a, return true; } -TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b) -TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b) -TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b) +TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_4b) +TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_4b) +TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_4b) TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector_env, a, gen_helper_gvec_bfdot) TRANS_FEAT(BFMMLA, aa64_bf16, do_dot_vector_env, a, gen_helper_gvec_bfmmla) TRANS_FEAT(SMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_smmla_b) @@ -XXX,XX +XXX,XX @@ static bool do_dot_vector_idx_env(DisasContext *s, arg_qrrx_e *a, return true; } -TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_b) -TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_b) +TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_4b) +TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_4b) TRANS_FEAT(SUDOT_vi, aa64_i8mm, do_dot_vector_idx, a, - gen_helper_gvec_sudot_idx_b) + gen_helper_gvec_sudot_idx_4b) TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a, - gen_helper_gvec_usdot_idx_b) + gen_helper_gvec_usdot_idx_4b) TRANS_FEAT(BFDOT_vi, aa64_bf16, do_dot_vector_idx_env, a, gen_helper_gvec_bfdot_idx) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool trans_VSDOT(DisasContext *s, arg_VSDOT *a) return false; } return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_sdot_b); + gen_helper_gvec_sdot_4b); } static bool trans_VUDOT(DisasContext *s, arg_VUDOT *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUDOT(DisasContext *s, arg_VUDOT *a) return false; } return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_udot_b); + gen_helper_gvec_udot_4b); } static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a) return false; } return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_usdot_b); + gen_helper_gvec_usdot_4b); } static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VSDOT_scalar(DisasContext *s, arg_VSDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_sdot_idx_b); + gen_helper_gvec_sdot_idx_4b); } static bool trans_VUDOT_scalar(DisasContext *s, arg_VUDOT_scalar *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUDOT_scalar(DisasContext *s, arg_VUDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_udot_idx_b); + gen_helper_gvec_udot_idx_4b); } static bool trans_VUSDOT_scalar(DisasContext *s, arg_VUSDOT_scalar *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT_scalar(DisasContext *s, arg_VUSDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_usdot_idx_b); + gen_helper_gvec_usdot_idx_4b); } static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_sudot_idx_b); + gen_helper_gvec_sudot_idx_4b); } static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_ZZI(UMIN, umin) #undef DO_ZZI static gen_helper_gvec_4 * const dot_fns[2][2] = { - { gen_helper_gvec_sdot_b, gen_helper_gvec_sdot_h }, - { gen_helper_gvec_udot_b, gen_helper_gvec_udot_h } + { gen_helper_gvec_sdot_4b, gen_helper_gvec_sdot_4h }, + { gen_helper_gvec_udot_4b, gen_helper_gvec_udot_4h } }; TRANS_FEAT(DOT_zzzz, aa64_sve, gen_gvec_ool_zzzz, dot_fns[a->u][a->sz], a->rd, a->rn, a->rm, a->ra, 0) @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(DOT_zzzz, aa64_sve, gen_gvec_ool_zzzz, */ TRANS_FEAT(SDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_sdot_idx_b, a) + gen_helper_gvec_sdot_idx_4b, a) TRANS_FEAT(SDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_sdot_idx_h, a) + gen_helper_gvec_sdot_idx_4h, a) TRANS_FEAT(UDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_udot_idx_b, a) + gen_helper_gvec_udot_idx_4b, a) TRANS_FEAT(UDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_udot_idx_h, a) + gen_helper_gvec_udot_idx_4h, a) TRANS_FEAT(SUDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_sudot_idx_b, a) + gen_helper_gvec_sudot_idx_4b, a) TRANS_FEAT(USDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_usdot_idx_b, a) + gen_helper_gvec_usdot_idx_4b, a) #define DO_SVE2_RRX(NAME, FUNC) \ TRANS_FEAT(NAME, aa64_sve, gen_gvec_ool_zzz, FUNC, \ @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz, sqrdcmlah_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->rot) TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, - a->esz == 2 ? gen_helper_gvec_usdot_b : NULL, a, 0) + a->esz == 2 ? gen_helper_gvec_usdot_4b : NULL, a, 0) TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz, gen_helper_crypto_aesmc, a->rd, a->rd, 0) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ clear_tail(d, opr_sz, simd_maxsz(desc)); \ } -DO_DOT(gvec_sdot_b, int32_t, int8_t, int8_t) -DO_DOT(gvec_udot_b, uint32_t, uint8_t, uint8_t) -DO_DOT(gvec_usdot_b, uint32_t, uint8_t, int8_t) -DO_DOT(gvec_sdot_h, int64_t, int16_t, int16_t) -DO_DOT(gvec_udot_h, uint64_t, uint16_t, uint16_t) +DO_DOT(gvec_sdot_4b, int32_t, int8_t, int8_t) +DO_DOT(gvec_udot_4b, uint32_t, uint8_t, uint8_t) +DO_DOT(gvec_usdot_4b, uint32_t, uint8_t, int8_t) +DO_DOT(gvec_sdot_4h, int64_t, int16_t, int16_t) +DO_DOT(gvec_udot_4h, uint64_t, uint16_t, uint16_t) #define DO_DOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD) \ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ clear_tail(d, opr_sz, simd_maxsz(desc)); \ } -DO_DOT_IDX(gvec_sdot_idx_b, int32_t, int8_t, int8_t, H4) -DO_DOT_IDX(gvec_udot_idx_b, uint32_t, uint8_t, uint8_t, H4) -DO_DOT_IDX(gvec_sudot_idx_b, int32_t, int8_t, uint8_t, H4) -DO_DOT_IDX(gvec_usdot_idx_b, int32_t, uint8_t, int8_t, H4) -DO_DOT_IDX(gvec_sdot_idx_h, int64_t, int16_t, int16_t, H8) -DO_DOT_IDX(gvec_udot_idx_h, uint64_t, uint16_t, uint16_t, H8) +DO_DOT_IDX(gvec_sdot_idx_4b, int32_t, int8_t, int8_t, H4) +DO_DOT_IDX(gvec_udot_idx_4b, uint32_t, uint8_t, uint8_t, H4) +DO_DOT_IDX(gvec_sudot_idx_4b, int32_t, int8_t, uint8_t, H4) +DO_DOT_IDX(gvec_usdot_idx_4b, int32_t, uint8_t, int8_t, H4) +DO_DOT_IDX(gvec_sdot_idx_4h, int64_t, int16_t, int16_t, H8) +DO_DOT_IDX(gvec_udot_idx_4h, uint64_t, uint16_t, uint16_t, H8) void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm, float_status *fpst, uint32_t desc) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-42-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 8 ++++ target/arm/tcg/sme.decode | 63 ++++++++++++++++++++++++- target/arm/tcg/translate-sme.c | 85 ++++++++++++++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 51 ++++++++++++++++++++ 4 files changed, 206 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_sdot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_udot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_usdot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(gvec_sdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_udot_idx_4b, TCG_CALL_NO_RWG, @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_sudot_idx_4b, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_usdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_idx_2h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_idx_2h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FDOT_n1 11000001 001 1 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=4 BFDOT_n1 11000001 001 0 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=2 BFDOT_n1 11000001 001 1 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=4 +USDOT_n1 11000001 001 0 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=2 +USDOT_n1 11000001 001 1 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=4 + +SUDOT_n1 11000001 001 0 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=2 +SUDOT_n1 11000001 001 1 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=4 + +SDOT_n1_4b 11000001 001 0 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=2 +SDOT_n1_4b 11000001 001 1 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=4 +SDOT_n1_4h 11000001 011 0 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=2 +SDOT_n1_4h 11000001 011 1 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=4 +SDOT_n1_2h 11000001 011 0 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=2 +SDOT_n1_2h 11000001 011 1 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=4 + +UDOT_n1_4b 11000001 001 0 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=2 +UDOT_n1_4b 11000001 001 1 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=4 +UDOT_n1_4h 11000001 011 0 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=2 +UDOT_n1_4h 11000001 011 1 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=4 +UDOT_n1_2h 11000001 011 0 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=2 +UDOT_n1_2h 11000001 011 1 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ FDOT_nn 11000001 101 ...01 0 .. 100 ...00 00 ... @azz_4x4_o3 BFDOT_nn 11000001 101 ....0 0 .. 100 ....0 10 ... @azz_2x2_o3 BFDOT_nn 11000001 101 ...01 0 .. 100 ...00 10 ... @azz_4x4_o3 +USDOT_nn 11000001 101 ....0 0 .. 101 ....0 01 ... @azz_2x2_o3 +USDOT_nn 11000001 101 ...01 0 .. 101 ...00 01 ... @azz_4x4_o3 + +SDOT_nn_4b 11000001 101 ....0 0 .. 101 ....0 00 ... @azz_2x2_o3 +SDOT_nn_4b 11000001 101 ...01 0 .. 101 ...00 00 ... @azz_4x4_o3 +SDOT_nn_4h 11000001 111 ....0 0 .. 101 ....0 00 ... @azz_2x2_o3 +SDOT_nn_4h 11000001 111 ...01 0 .. 101 ...00 00 ... @azz_4x4_o3 +SDOT_nn_2h 11000001 111 ....0 0 .. 101 ....0 01 ... @azz_2x2_o3 +SDOT_nn_2h 11000001 111 ...01 0 .. 101 ...00 01 ... @azz_4x4_o3 + +UDOT_nn_4b 11000001 101 ....0 0 .. 101 ....0 10 ... @azz_2x2_o3 +UDOT_nn_4b 11000001 101 ...01 0 .. 101 ...00 10 ... @azz_4x4_o3 +UDOT_nn_4h 11000001 111 ....0 0 .. 101 ....0 10 ... @azz_2x2_o3 +UDOT_nn_4h 11000001 111 ...01 0 .. 101 ...00 10 ... @azz_4x4_o3 +UDOT_nn_2h 11000001 111 ....0 0 .. 101 ....0 11 ... @azz_2x2_o3 +UDOT_nn_2h 11000001 111 ...01 0 .. 101 ...00 11 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 @azx_2x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ &azx_n n=2 rv=%mova_rv zn=%zn_ax2 @azx_4x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ - &azx_n n=2 rv=%mova_rv zn=%zn_ax4 + &azx_n n=4 rv=%mova_rv zn=%zn_ax4 +@azx_2x1_i1_o3 ........ .... zm:4 . .. .. idx:1 .... ... off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax2 +@azx_4x1_i1_o3 ........ .... zm:4 . .. .. idx:1 .... ... off:3 \ + &azx_n n=4 rv=%mova_rv zn=%zn_ax4 FDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_i2_o3 FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 @@ -XXX,XX +XXX,XX @@ BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3 FVDOT 11000001 0101 .... 0 .. 0 .. ....0 01 ... @azx_2x1_i2_o3 BFVDOT 11000001 0101 .... 0 .. 0 .. ....0 11 ... @azx_2x1_i2_o3 + +SDOT_nx_2h 11000001 0101 .... 0 .. 1 .. ....0 00 ... @azx_2x1_i2_o3 +SDOT_nx_2h 11000001 0101 .... 1 .. 1 .. ...00 00 ... @azx_4x1_i2_o3 +SDOT_nx_4b 11000001 0101 .... 0 .. 1 .. ....1 00 ... @azx_2x1_i2_o3 +SDOT_nx_4b 11000001 0101 .... 1 .. 1 .. ...01 00 ... @azx_4x1_i2_o3 +SDOT_nx_4h 11000001 1101 .... 0 .. 00 . ....0 01 ... @azx_2x1_i1_o3 +SDOT_nx_4h 11000001 1101 .... 1 .. 00 . ...00 01 ... @azx_4x1_i1_o3 + +UDOT_nx_2h 11000001 0101 .... 0 .. 1 .. ....0 10 ... @azx_2x1_i2_o3 +UDOT_nx_2h 11000001 0101 .... 1 .. 1 .. ...00 10 ... @azx_4x1_i2_o3 +UDOT_nx_4b 11000001 0101 .... 0 .. 1 .. ....1 10 ... @azx_2x1_i2_o3 +UDOT_nx_4b 11000001 0101 .... 1 .. 1 .. ...01 10 ... @azx_4x1_i2_o3 +UDOT_nx_4h 11000001 1101 .... 0 .. 00 . ....0 11 ... @azx_2x1_i1_o3 +UDOT_nx_4h 11000001 1101 .... 1 .. 00 . ...00 11 ... @azx_4x1_i1_o3 + +USDOT_nx 11000001 0101 .... 0 .. 1 .. ....1 01 ... @azx_2x1_i2_o3 +USDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 01 ... @azx_4x1_i2_o3 + +SUDOT_nx 11000001 0101 .... 0 .. 1 .. ....1 11 ... @azx_2x1_i2_o3 +SUDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 11 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn) TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h) TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx) + +/* + * Expand array multi-vector single (n1), array multi-vector (nn), + * and array multi-vector indexed (nx), for integer accumulate. + * multi: true for nn, false for n1. + * data: stuff for simd_data, including any index. + */ +static bool do_azz_acc(DisasContext *s, int nreg, int nsel, + int rv, int off, int zn, int zm, + int data, int shsel, bool multi, + gen_helper_gvec_4 *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / nreg; + TCGv_ptr t_za = get_zarray(s, rv, off, nreg, nsel); + TCGv_ptr t = tcg_temp_new_ptr(); + + for (int r = 0; r < nreg; ++r) { + TCGv_ptr t_zn = vec_full_reg_ptr(s, zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm); + + for (int i = 0; i < nsel; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, data | (i << shsel)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, t, tcg_constant_i32(desc)); + } + + /* + * For multiple-and-single vectors, Zn may wrap. + * For multiple vectors, both Zn and Zm are aligned. + */ + zn = (zn + 1) % 32; + zm += multi; + } + } + return true; +} + +static bool do_dot(DisasContext *s, arg_azz_n *a, bool multi, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fn); +} + +static void gen_helper_gvec_sudot_4b(TCGv_ptr d, TCGv_ptr n, TCGv_ptr m, + TCGv_ptr a, TCGv_i32 desc) +{ + gen_helper_gvec_usdot_4b(d, m, n, a, desc); +} + +TRANS_FEAT(USDOT_n1, aa64_sme2, do_dot, a, false, gen_helper_gvec_usdot_4b) +TRANS_FEAT(SUDOT_n1, aa64_sme2, do_dot, a, false, gen_helper_gvec_sudot_4b) +TRANS_FEAT(SDOT_n1_2h, aa64_sme2, do_dot, a, false, gen_helper_gvec_sdot_2h) +TRANS_FEAT(UDOT_n1_2h, aa64_sme2, do_dot, a, false, gen_helper_gvec_udot_2h) +TRANS_FEAT(SDOT_n1_4b, aa64_sme2, do_dot, a, false, gen_helper_gvec_sdot_4b) +TRANS_FEAT(UDOT_n1_4b, aa64_sme2, do_dot, a, false, gen_helper_gvec_udot_4b) +TRANS_FEAT(SDOT_n1_4h, aa64_sme2_i16i64, do_dot, a, false, gen_helper_gvec_sdot_4h) +TRANS_FEAT(UDOT_n1_4h, aa64_sme2_i16i64, do_dot, a, false, gen_helper_gvec_udot_4h) + +TRANS_FEAT(USDOT_nn, aa64_sme2, do_dot, a, true, gen_helper_gvec_usdot_4b) +TRANS_FEAT(SDOT_nn_2h, aa64_sme2, do_dot, a, true, gen_helper_gvec_sdot_2h) +TRANS_FEAT(UDOT_nn_2h, aa64_sme2, do_dot, a, true, gen_helper_gvec_udot_2h) +TRANS_FEAT(SDOT_nn_4b, aa64_sme2, do_dot, a, true, gen_helper_gvec_sdot_4b) +TRANS_FEAT(UDOT_nn_4b, aa64_sme2, do_dot, a, true, gen_helper_gvec_udot_4b) +TRANS_FEAT(SDOT_nn_4h, aa64_sme2_i16i64, do_dot, a, true, gen_helper_gvec_sdot_4h) +TRANS_FEAT(UDOT_nn_4h, aa64_sme2_i16i64, do_dot, a, true, gen_helper_gvec_udot_4h) + +static bool do_dot_nx(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + a->idx, 0, false, fn); +} + +TRANS_FEAT(USDOT_nx, aa64_sme2, do_dot_nx, a, gen_helper_gvec_usdot_idx_4b) +TRANS_FEAT(SUDOT_nx, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sudot_idx_4b) +TRANS_FEAT(SDOT_nx_2h, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sdot_idx_2h) +TRANS_FEAT(UDOT_nx_2h, aa64_sme2, do_dot_nx, a, gen_helper_gvec_udot_idx_2h) +TRANS_FEAT(SDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sdot_idx_4b) +TRANS_FEAT(UDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_udot_idx_4b) +TRANS_FEAT(SDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_sdot_idx_4h) +TRANS_FEAT(UDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_udot_idx_4h) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_DOT_IDX(gvec_usdot_idx_4b, int32_t, uint8_t, int8_t, H4) DO_DOT_IDX(gvec_sdot_idx_4h, int64_t, int16_t, int16_t, H8) DO_DOT_IDX(gvec_udot_idx_4h, uint64_t, uint16_t, uint16_t, H8) +#undef DO_DOT +#undef DO_DOT_IDX + +/* Similar for 2-way dot product */ +#define DO_DOT(NAME, TYPED, TYPEN, TYPEM) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t i, opr_sz = simd_oprsz(desc); \ + TYPED *d = vd, *a = va; \ + TYPEN *n = vn; \ + TYPEM *m = vm; \ + for (i = 0; i < opr_sz / sizeof(TYPED); ++i) { \ + d[i] = (a[i] + \ + (TYPED)n[i * 2 + 0] * m[i * 2 + 0] + \ + (TYPED)n[i * 2 + 1] * m[i * 2 + 1]); \ + } \ + clear_tail(d, opr_sz, simd_maxsz(desc)); \ +} + +#define DO_DOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t i = 0, opr_sz = simd_oprsz(desc); \ + intptr_t opr_sz_n = opr_sz / sizeof(TYPED); \ + intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n); \ + intptr_t index = simd_data(desc); \ + TYPED *d = vd, *a = va; \ + TYPEN *n = vn; \ + TYPEM *m_indexed = (TYPEM *)vm + HD(index) * 2; \ + do { \ + TYPED m0 = m_indexed[i * 2 + 0]; \ + TYPED m1 = m_indexed[i * 2 + 1]; \ + do { \ + d[i] = (a[i] + \ + n[i * 2 + 0] * m0 + \ + n[i * 2 + 1] * m1); \ + } while (++i < segend); \ + segend = i + (16 / sizeof(TYPED)); \ + } while (i < opr_sz_n); \ + clear_tail(d, opr_sz, simd_maxsz(desc)); \ +} + +DO_DOT(gvec_sdot_2h, int32_t, int16_t, int16_t) +DO_DOT(gvec_udot_2h, uint32_t, uint16_t, uint16_t) + +DO_DOT_IDX(gvec_sdot_idx_2h, int32_t, int16_t, int16_t, H4) +DO_DOT_IDX(gvec_udot_idx_2h, uint32_t, uint16_t, uint16_t, H4) + +#undef DO_DOT +#undef DO_DOT_IDX + void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm, float_status *fpst, uint32_t desc) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Emphasize the 4-way nature of these dot products. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-43-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 12 ++++++------ target/arm/tcg/translate-sve.c | 12 ++++++------ 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ CDOT_zzzz 01000100 esz:2 0 rm:5 0001 rot:2 rn:5 rd:5 ra=%reg_movprfx #### SVE Multiply - Indexed # SVE integer dot product (indexed) -SDOT_zzxw_s 01000100 10 1 ..... 000000 ..... ..... @rrxr_2 esz=2 -SDOT_zzxw_d 01000100 11 1 ..... 000000 ..... ..... @rrxr_1 esz=3 -UDOT_zzxw_s 01000100 10 1 ..... 000001 ..... ..... @rrxr_2 esz=2 -UDOT_zzxw_d 01000100 11 1 ..... 000001 ..... ..... @rrxr_1 esz=3 +SDOT_zzxw_4s 01000100 10 1 ..... 000000 ..... ..... @rrxr_2 esz=2 +SDOT_zzxw_4d 01000100 11 1 ..... 000000 ..... ..... @rrxr_1 esz=3 +UDOT_zzxw_4s 01000100 10 1 ..... 000001 ..... ..... @rrxr_2 esz=2 +UDOT_zzxw_4d 01000100 11 1 ..... 000001 ..... ..... @rrxr_1 esz=3 # SVE2 integer multiply-add (indexed) MLA_zzxz_h 01000100 0. 1 ..... 000010 ..... ..... @rrxr_3 esz=1 @@ -XXX,XX +XXX,XX @@ SQRDMLSH_zzxz_s 01000100 10 1 ..... 000101 ..... ..... @rrxr_2 esz=2 SQRDMLSH_zzxz_d 01000100 11 1 ..... 000101 ..... ..... @rrxr_1 esz=3 # SVE mixed sign dot product (indexed) -USDOT_zzxw_s 01000100 10 1 ..... 000110 ..... ..... @rrxr_2 esz=2 -SUDOT_zzxw_s 01000100 10 1 ..... 000111 ..... ..... @rrxr_2 esz=2 +USDOT_zzxw_4s 01000100 10 1 ..... 000110 ..... ..... @rrxr_2 esz=2 +SUDOT_zzxw_4s 01000100 10 1 ..... 000111 ..... ..... @rrxr_2 esz=2 # SVE2 saturating multiply-add (indexed) SQDMLALB_zzxw_s 01000100 10 1 ..... 0010.0 ..... ..... @rrxr_3a esz=2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(DOT_zzzz, aa64_sve, gen_gvec_ool_zzzz, * SVE Multiply - Indexed */ -TRANS_FEAT(SDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(SDOT_zzxw_4s, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_sdot_idx_4b, a) -TRANS_FEAT(SDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(SDOT_zzxw_4d, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_sdot_idx_4h, a) -TRANS_FEAT(UDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(UDOT_zzxw_4s, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_udot_idx_4b, a) -TRANS_FEAT(UDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(UDOT_zzxw_4d, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_udot_idx_4h, a) -TRANS_FEAT(SUDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(SUDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, gen_helper_gvec_sudot_idx_4b, a) -TRANS_FEAT(USDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(USDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, gen_helper_gvec_usdot_idx_4b, a) #define DO_SVE2_RRX(NAME, FUNC) \ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Rename to USDOT_zzzz_4s and force size=2 during decode. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-44-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 2 +- target/arm/tcg/translate-sve.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx ## SVE mixed sign dot product -USDOT_zzzz 01000100 .. 0 ..... 011 110 ..... ..... @rda_rn_rm +USDOT_zzzz_4s 01000100 10 0 ..... 011 110 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating point matrix multiply accumulate BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=1 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4 * const sqrdcmlah_fns[] = { TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz, sqrdcmlah_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->rot) -TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, - a->esz == 2 ? gen_helper_gvec_usdot_4b : NULL, a, 0) +TRANS_FEAT(USDOT_zzzz_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, + gen_helper_gvec_usdot_4b, a, 0) TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz, gen_helper_crypto_aesmc, a->rd, a->rd, 0) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-45-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 8 +++++++- target/arm/tcg/translate-sve.c | 10 ++++++++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SDOT_zzxw_4d 01000100 11 1 ..... 000000 ..... ..... @rrxr_1 esz=3 UDOT_zzxw_4s 01000100 10 1 ..... 000001 ..... ..... @rrxr_2 esz=2 UDOT_zzxw_4d 01000100 11 1 ..... 000001 ..... ..... @rrxr_1 esz=3 +SDOT_zzxw_2s 01000100 10 0 ..... 110010 ..... ..... @rrxr_2 esz=2 +UDOT_zzxw_2s 01000100 10 0 ..... 110011 ..... ..... @rrxr_2 esz=2 + # SVE2 integer multiply-add (indexed) MLA_zzxz_h 01000100 0. 1 ..... 000010 ..... ..... @rrxr_3 esz=1 MLA_zzxz_s 01000100 10 1 ..... 000010 ..... ..... @rrxr_2 esz=2 @@ -XXX,XX +XXX,XX @@ UMLSLT_zzzw 01000100 .. 0 ..... 010 111 ..... ..... @rda_rn_rm CMLA_zzzz 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx -## SVE mixed sign dot product +## SVE dot product + +SDOT_zzzz_2s 01000100 00 0 ..... 110 010 ..... ..... @rda_rn_rm_ex esz=2 +UDOT_zzzz_2s 01000100 00 0 ..... 110 011 ..... ..... @rda_rn_rm_ex esz=2 USDOT_zzzz_4s 01000100 10 0 ..... 011 110 ..... ..... @rda_rn_rm_ex esz=2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, TRANS_FEAT(USDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, gen_helper_gvec_usdot_idx_4b, a) +TRANS_FEAT(SDOT_zzxw_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzxz, + gen_helper_gvec_sdot_idx_2h, a) +TRANS_FEAT(UDOT_zzxw_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzxz, + gen_helper_gvec_udot_idx_2h, a) + #define DO_SVE2_RRX(NAME, FUNC) \ TRANS_FEAT(NAME, aa64_sve, gen_gvec_ool_zzz, FUNC, \ a->rd, a->rn, a->rm, a->index) @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz, TRANS_FEAT(USDOT_zzzz_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, gen_helper_gvec_usdot_4b, a, 0) +TRANS_FEAT(SDOT_zzzz_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzzz, + gen_helper_gvec_sdot_2h, a, 0) +TRANS_FEAT(UDOT_zzzz_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzzz, + gen_helper_gvec_udot_2h, a, 0) + TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz, gen_helper_crypto_aesmc, a->rd, a->rd, 0) TRANS_FEAT_NONSTREAMING(AESIMC, aa64_sve2_aes, gen_gvec_ool_zz, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-46-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 11 +++++++++ target/arm/tcg/sme.decode | 11 +++++++++ target/arm/tcg/sme_helper.c | 42 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 23 +++++++++++++++++++ 4 files changed, 87 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme2_fdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(sme2_fvdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_svdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uvdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_suvdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_usvdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_svdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uvdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_svdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uvdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 01 ... @azx_4x1_i2_o3 SUDOT_nx 11000001 0101 .... 0 .. 1 .. ....1 11 ... @azx_2x1_i2_o3 SUDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 11 ... @azx_4x1_i2_o3 + +SVDOT_nx_2h 11000001 0101 .... 0 .. 0 .. ....1 00 ... @azx_2x1_i2_o3 +SVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 00 ... @azx_4x1_i2_o3 +SVDOT_nx_4h 11000001 1101 .... 1 .. 01 . ...00 01 ... @azx_4x1_i1_o3 + +UVDOT_nx_2h 11000001 0101 .... 0 .. 0 .. ....1 10 ... @azx_2x1_i2_o3 +UVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 10 ... @azx_4x1_i2_o3 +UVDOT_nx_4h 11000001 1101 .... 1 .. 01 . ...00 11 ... @azx_4x1_i1_o3 + +SUVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 11 ... @azx_4x1_i2_o3 +USVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 01 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DEF_IMOP_16x2_32(umopa2_s, uint16_t, uint16_t) DEF_IMOPH(sme2, smopa2, s) DEF_IMOPH(sme2, umopa2, s) + +#define DO_VDOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD, HN) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + intptr_t svl = simd_oprsz(desc); \ + intptr_t elements = svl / sizeof(TYPED); \ + intptr_t eltperseg = 16 / sizeof(TYPED); \ + intptr_t nreg = sizeof(TYPED) / sizeof(TYPEN); \ + intptr_t vstride = (svl / nreg) * sizeof(ARMVectorReg); \ + intptr_t zstride = sizeof(ARMVectorReg) / sizeof(TYPEN); \ + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT, 2); \ + TYPEN *n = vn; \ + TYPEM *m = vm; \ + for (intptr_t r = 0; r < nreg; r++) { \ + TYPED *d = vd + r * vstride; \ + for (intptr_t seg = 0; seg < elements; seg += eltperseg) { \ + intptr_t s = seg + idx; \ + for (intptr_t e = seg; e < seg + eltperseg; e++) { \ + TYPED sum = d[HD(e)]; \ + for (intptr_t i = 0; i < nreg; i++) { \ + TYPED nn = n[i * zstride + HN(nreg * e + r)]; \ + TYPED mm = m[HN(nreg * s + i)]; \ + sum += nn * mm; \ + } \ + d[HD(e)] = sum; \ + } \ + } \ + } \ +} + +DO_VDOT_IDX(sme2_svdot_idx_4b, int32_t, int8_t, int8_t, H4, H1) +DO_VDOT_IDX(sme2_uvdot_idx_4b, uint32_t, uint8_t, uint8_t, H4, H1) +DO_VDOT_IDX(sme2_suvdot_idx_4b, int32_t, int8_t, uint8_t, H4, H1) +DO_VDOT_IDX(sme2_usvdot_idx_4b, int32_t, uint8_t, int8_t, H4, H1) + +DO_VDOT_IDX(sme2_svdot_idx_4h, int64_t, int16_t, int16_t, H8, H2) +DO_VDOT_IDX(sme2_uvdot_idx_4h, uint64_t, uint16_t, uint16_t, H8, H2) + +DO_VDOT_IDX(sme2_svdot_idx_2h, int32_t, int16_t, int16_t, H4, H2) +DO_VDOT_IDX(sme2_uvdot_idx_2h, uint32_t, uint16_t, uint16_t, H4, H2) + +#undef DO_VDOT_IDX diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sdot_idx_4b) TRANS_FEAT(UDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_udot_idx_4b) TRANS_FEAT(SDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_sdot_idx_4h) TRANS_FEAT(UDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_udot_idx_4h) + +static bool do_vdot_nx(DisasContext *s, arg_azx_n *a, gen_helper_gvec_3 *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + fn(get_zarray(s, a->rv, a->off, a->n, 0), + vec_full_reg_ptr(s, a->zn), + vec_full_reg_ptr(s, a->zm), + tcg_constant_i32(simd_desc(svl, svl, a->idx))); + } + return true; +} + +TRANS_FEAT(SVDOT_nx_2h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_svdot_idx_2h) +TRANS_FEAT(SVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_svdot_idx_4b) +TRANS_FEAT(SVDOT_nx_4h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_svdot_idx_4h) + +TRANS_FEAT(UVDOT_nx_2h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_2h) +TRANS_FEAT(UVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_4b) +TRANS_FEAT(UVDOT_nx_4h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_4h) + +TRANS_FEAT(SUVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_suvdot_idx_4b) +TRANS_FEAT(USVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_usvdot_idx_4b) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-47-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 21 +++++ target/arm/tcg/sme.decode | 168 +++++++++++++++++++++++++++++++++ target/arm/tcg/sme_helper.c | 59 ++++++++++++ target/arm/tcg/translate-sme.c | 84 +++++++++++++++++ 4 files changed, 332 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_uvdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme2_svdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme2_uvdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sme2_smlall_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlall_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_usmlall_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sme2_smlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlall_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_usmlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_sumlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UDOT_n1_4h 11000001 011 1 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=4 UDOT_n1_2h 11000001 011 0 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=2 UDOT_n1_2h 11000001 011 1 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=4 +SMLAL_n1 11000001 011 0 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1 +SMLAL_n1 11000001 011 0 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=2 +SMLAL_n1 11000001 011 1 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=4 + +SMLSL_n1 11000001 011 0 .... 0 .. 011 ..... 01 ... @azz_nx1_o3x2 n=1 +SMLSL_n1 11000001 011 0 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=2 +SMLSL_n1 11000001 011 1 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=4 + +UMLAL_n1 11000001 011 0 .... 0 .. 011 ..... 10 ... @azz_nx1_o3x2 n=1 +UMLAL_n1 11000001 011 0 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=2 +UMLAL_n1 11000001 011 1 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=4 + +UMLSL_n1 11000001 011 0 .... 0 .. 011 ..... 11 ... @azz_nx1_o3x2 n=1 +UMLSL_n1 11000001 011 0 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=2 +UMLSL_n1 11000001 011 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 + +%off2_x4 0:2 !function=times_4 +%off1_x4 0:1 !function=times_4 + +@azz_nx1_o2x4 ........ ... . zm:4 . .. ... zn:5 ... .. \ + &azz_n off=%off2_x4 rv=%mova_rv +@azz_nx1_o1x4 ........ ... . zm:4 . .. ... zn:5 .... . \ + &azz_n off=%off1_x4 rv=%mova_rv + +SMLALL_n1_s 11000001 001 0 .... 0 .. 001 ..... 000 .. @azz_nx1_o2x4 n=1 +SMLALL_n1_d 11000001 011 0 .... 0 .. 001 ..... 000 .. @azz_nx1_o2x4 n=1 +SMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=2 +SMLALL_n1_d 11000001 011 0 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=2 +SMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=4 +SMLALL_n1_d 11000001 011 1 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=4 + +SMLSLL_n1_s 11000001 001 0 .... 0 .. 001 ..... 010 .. @azz_nx1_o2x4 n=1 +SMLSLL_n1_d 11000001 011 0 .... 0 .. 001 ..... 010 .. @azz_nx1_o2x4 n=1 +SMLSLL_n1_s 11000001 001 0 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=2 +SMLSLL_n1_d 11000001 011 0 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=2 +SMLSLL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=4 +SMLSLL_n1_d 11000001 011 1 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=4 + +UMLALL_n1_s 11000001 001 0 .... 0 .. 001 ..... 100 .. @azz_nx1_o2x4 n=1 +UMLALL_n1_d 11000001 011 0 .... 0 .. 001 ..... 100 .. @azz_nx1_o2x4 n=1 +UMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=2 +UMLALL_n1_d 11000001 011 0 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=2 +UMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=4 +UMLALL_n1_d 11000001 011 1 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=4 + +UMLSLL_n1_s 11000001 001 0 .... 0 .. 001 ..... 110 .. @azz_nx1_o2x4 n=1 +UMLSLL_n1_d 11000001 011 0 .... 0 .. 001 ..... 110 .. @azz_nx1_o2x4 n=1 +UMLSLL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=2 +UMLSLL_n1_d 11000001 011 0 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=2 +UMLSLL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=4 +UMLSLL_n1_d 11000001 011 1 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=4 + +USMLALL_n1_s 11000001 001 0 .... 0 .. 001 ..... 001 .. @azz_nx1_o2x4 n=1 +USMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=2 +USMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=4 + +SUMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=2 +SUMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ UDOT_nn_4h 11000001 111 ...01 0 .. 101 ...00 10 ... @azz_4x4_o3 UDOT_nn_2h 11000001 111 ....0 0 .. 101 ....0 11 ... @azz_2x2_o3 UDOT_nn_2h 11000001 111 ...01 0 .. 101 ...00 11 ... @azz_4x4_o3 +SMLAL_nn 11000001 111 ....0 0 .. 010 ....0 000 .. @azz_2x2_o2x2 +SMLAL_nn 11000001 111 ...01 0 .. 010 ...00 000 .. @azz_4x4_o2x2 + +SMLSL_nn 11000001 111 ....0 0 .. 010 ....0 010 .. @azz_2x2_o2x2 +SMLSL_nn 11000001 111 ...01 0 .. 010 ...00 010 .. @azz_4x4_o2x2 + +UMLAL_nn 11000001 111 ....0 0 .. 010 ....0 100 .. @azz_2x2_o2x2 +UMLAL_nn 11000001 111 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2 + +UMLSL_nn 11000001 111 ....0 0 .. 010 ....0 110 .. @azz_2x2_o2x2 +UMLSL_nn 11000001 111 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 + +@azz_2x2_o1x4 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=2 rv=%mova_rv zn=%zn_ax2 zm=%zm_ax2 off=%off1_x4 +@azz_4x4_o1x4 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 off=%off1_x4 + +SMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0000 . @azz_2x2_o1x4 +SMLALL_nn_d 11000001 111 ....0 0 .. 000 ....0 0000 . @azz_2x2_o1x4 +SMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0000 . @azz_4x4_o1x4 +SMLALL_nn_d 11000001 111 ...01 0 .. 000 ...00 0000 . @azz_4x4_o1x4 + +SMLSLL_nn_s 11000001 101 ....0 0 .. 000 ....0 0100 . @azz_2x2_o1x4 +SMLSLL_nn_d 11000001 111 ....0 0 .. 000 ....0 0100 . @azz_2x2_o1x4 +SMLSLL_nn_s 11000001 101 ...01 0 .. 000 ...00 0100 . @azz_4x4_o1x4 +SMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 0100 . @azz_4x4_o1x4 + +UMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 1000 . @azz_2x2_o1x4 +UMLALL_nn_d 11000001 111 ....0 0 .. 000 ....0 1000 . @azz_2x2_o1x4 +UMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 1000 . @azz_4x4_o1x4 +UMLALL_nn_d 11000001 111 ...01 0 .. 000 ...00 1000 . @azz_4x4_o1x4 + +UMLSLL_nn_s 11000001 101 ....0 0 .. 000 ....0 1100 . @azz_2x2_o1x4 +UMLSLL_nn_d 11000001 111 ....0 0 .. 000 ....0 1100 . @azz_2x2_o1x4 +UMLSLL_nn_s 11000001 101 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 +UMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 + +USMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0010 . @azz_2x2_o1x4 +USMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0010 . @azz_4x4_o1x4 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ UVDOT_nx_4h 11000001 1101 .... 1 .. 01 . ...00 11 ... @azx_4x1_i1_o3 SUVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 11 ... @azx_4x1_i2_o3 USVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 01 ... @azx_4x1_i2_o3 + +SMLAL_nx 11000001 1100 .... . .. 1 .. ..... 00 ... @azx_1x1_o3x2 +SMLAL_nx 11000001 1101 .... 0 .. 1 .. ....0 00 ... @azx_2x1_o2x2 +SMLAL_nx 11000001 1101 .... 1 .. 1 .. ...00 00 ... @azx_4x1_o2x2 + +SMLSL_nx 11000001 1100 .... . .. 1 .. ..... 01 ... @azx_1x1_o3x2 +SMLSL_nx 11000001 1101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_o2x2 +SMLSL_nx 11000001 1101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_o2x2 + +UMLAL_nx 11000001 1100 .... . .. 1 .. ..... 10 ... @azx_1x1_o3x2 +UMLAL_nx 11000001 1101 .... 0 .. 1 .. ....0 10 ... @azx_2x1_o2x2 +UMLAL_nx 11000001 1101 .... 1 .. 1 .. ...00 10 ... @azx_4x1_o2x2 + +UMLSL_nx 11000001 1100 .... . .. 1 .. ..... 11 ... @azx_1x1_o3x2 +UMLSL_nx 11000001 1101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_o2x2 +UMLSL_nx 11000001 1101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 + +%idx4_15_10 15:1 10:3 +%idx4_10_1 10:2 1:2 +%idx3_10_1 10:1 1:2 + +@azx_1x1_i4_o2 ........ .... zm:4 . .. ... zn:5 ... .. \ + &azx_n n=1 rv=%mova_rv off=%off2_x4 idx=%idx4_15_10 +@azx_1x1_i3_o2 ........ .... zm:4 . .. ... zn:5 ... .. \ + &azx_n n=1 rv=%mova_rv off=%off2_x4 idx=%idx3_15_10 +@azx_2x1_i4_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=2 rv=%mova_rv off=%off1_x4 zn=%zn_ax2 idx=%idx4_10_1 +@azx_2x1_i3_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=2 rv=%mova_rv off=%off1_x4 zn=%zn_ax2 idx=%idx3_10_1 +@azx_4x1_i4_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=4 rv=%mova_rv off=%off1_x4 zn=%zn_ax4 idx=%idx4_10_1 +@azx_4x1_i3_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=4 rv=%mova_rv off=%off1_x4 zn=%zn_ax4 idx=%idx3_10_1 + +SMLALL_nx_s 11000001 0000 .... . .. ... ..... 000 .. @azx_1x1_i4_o2 +SMLALL_nx_d 11000001 1000 .... . .. 0.. ..... 000 .. @azx_1x1_i3_o2 +SMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....0 00 ... @azx_2x1_i4_o1 +SMLALL_nx_d 11000001 1001 .... 0 .. 00. ....0 00 ... @azx_2x1_i3_o1 +SMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...00 00 ... @azx_4x1_i4_o1 +SMLALL_nx_d 11000001 1001 .... 1 .. 00. ...00 00 ... @azx_4x1_i3_o1 + +SMLSLL_nx_s 11000001 0000 .... . .. ... ..... 010 .. @azx_1x1_i4_o2 +SMLSLL_nx_d 11000001 1000 .... . .. 0.. ..... 010 .. @azx_1x1_i3_o2 +SMLSLL_nx_s 11000001 0001 .... 0 .. 0.. ....0 01 ... @azx_2x1_i4_o1 +SMLSLL_nx_d 11000001 1001 .... 0 .. 00. ....0 01 ... @azx_2x1_i3_o1 +SMLSLL_nx_s 11000001 0001 .... 1 .. 0.. ...00 01 ... @azx_4x1_i4_o1 +SMLSLL_nx_d 11000001 1001 .... 1 .. 00. ...00 01 ... @azx_4x1_i3_o1 + +UMLALL_nx_s 11000001 0000 .... . .. ... ..... 100 .. @azx_1x1_i4_o2 +UMLALL_nx_d 11000001 1000 .... . .. 0.. ..... 100 .. @azx_1x1_i3_o2 +UMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....0 10 ... @azx_2x1_i4_o1 +UMLALL_nx_d 11000001 1001 .... 0 .. 00. ....0 10 ... @azx_2x1_i3_o1 +UMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...00 10 ... @azx_4x1_i4_o1 +UMLALL_nx_d 11000001 1001 .... 1 .. 00. ...00 10 ... @azx_4x1_i3_o1 + +UMLSLL_nx_s 11000001 0000 .... . .. ... ..... 110 .. @azx_1x1_i4_o2 +UMLSLL_nx_d 11000001 1000 .... . .. 0.. ..... 110 .. @azx_1x1_i3_o2 +UMLSLL_nx_s 11000001 0001 .... 0 .. 0.. ....0 11 ... @azx_2x1_i4_o1 +UMLSLL_nx_d 11000001 1001 .... 0 .. 00. ....0 11 ... @azx_2x1_i3_o1 +UMLSLL_nx_s 11000001 0001 .... 1 .. 0.. ...00 11 ... @azx_4x1_i4_o1 +UMLSLL_nx_d 11000001 1001 .... 1 .. 00. ...00 11 ... @azx_4x1_i3_o1 + +USMLALL_nx_s 11000001 0000 .... . .. ... ..... 001 .. @azx_1x1_i4_o2 +USMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....1 00 ... @azx_2x1_i4_o1 +USMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 00 ... @azx_4x1_i4_o1 + +SUMLALL_nx_s 11000001 0000 .... . .. ... ..... 101 .. @azx_1x1_i4_o2 +SUMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....1 10 ... @azx_2x1_i4_o1 +SUMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 10 ... @azx_4x1_i4_o1 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DO_VDOT_IDX(sme2_svdot_idx_2h, int32_t, int16_t, int16_t, H4, H2) DO_VDOT_IDX(sme2_uvdot_idx_2h, uint32_t, uint16_t, uint16_t, H4, H2) #undef DO_VDOT_IDX + +#define DO_MLALL(NAME, TYPEW, TYPEN, TYPEM, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t elements = simd_oprsz(desc) / sizeof(TYPEW); \ + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 2); \ + TYPEW *d = vd, *a = va; TYPEN *n = vn; TYPEM *m = vm; \ + for (intptr_t i = 0; i < elements; ++i) { \ + TYPEW nn = n[HN(i * 4 + sel)]; \ + TYPEM mm = m[HN(i * 4 + sel)]; \ + d[HW(i)] = a[HW(i)] OP (nn * mm); \ + } \ +} + +DO_MLALL(sme2_smlall_s, int32_t, int8_t, int8_t, H4, H1, +) +DO_MLALL(sme2_smlall_d, int64_t, int16_t, int16_t, H8, H2, +) +DO_MLALL(sme2_smlsll_s, int32_t, int8_t, int8_t, H4, H1, -) +DO_MLALL(sme2_smlsll_d, int64_t, int16_t, int16_t, H8, H2, -) + +DO_MLALL(sme2_umlall_s, uint32_t, uint8_t, uint8_t, H4, H1, +) +DO_MLALL(sme2_umlall_d, uint64_t, uint16_t, uint16_t, H8, H2, +) +DO_MLALL(sme2_umlsll_s, uint32_t, uint8_t, uint8_t, H4, H1, -) +DO_MLALL(sme2_umlsll_d, uint64_t, uint16_t, uint16_t, H8, H2, -) + +DO_MLALL(sme2_usmlall_s, uint32_t, uint8_t, int8_t, H4, H1, +) + +#undef DO_MLALL + +#define DO_MLALL_IDX(NAME, TYPEW, TYPEN, TYPEM, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t elements = simd_oprsz(desc) / sizeof(TYPEW); \ + intptr_t eltspersegment = 16 / sizeof(TYPEW); \ + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 2); \ + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 4); \ + TYPEW *d = vd, *a = va; TYPEN *n = vn; TYPEM *m = vm; \ + for (intptr_t i = 0; i < elements; i += eltspersegment) { \ + TYPEW mm = m[HN(i * 4 + idx)]; \ + for (intptr_t j = 0; j < eltspersegment; ++j) { \ + TYPEN nn = n[HN((i + j) * 4 + sel)]; \ + d[HW(i + j)] = a[HW(i + j)] OP (nn * mm); \ + } \ + } \ +} + +DO_MLALL_IDX(sme2_smlall_idx_s, int32_t, int8_t, int8_t, H4, H1, +) +DO_MLALL_IDX(sme2_smlall_idx_d, int64_t, int16_t, int16_t, H8, H2, +) +DO_MLALL_IDX(sme2_smlsll_idx_s, int32_t, int8_t, int8_t, H4, H1, -) +DO_MLALL_IDX(sme2_smlsll_idx_d, int64_t, int16_t, int16_t, H8, H2, -) + +DO_MLALL_IDX(sme2_umlall_idx_s, uint32_t, uint8_t, uint8_t, H4, H1, +) +DO_MLALL_IDX(sme2_umlall_idx_d, uint64_t, uint16_t, uint16_t, H8, H2, +) +DO_MLALL_IDX(sme2_umlsll_idx_s, uint32_t, uint8_t, uint8_t, H4, H1, -) +DO_MLALL_IDX(sme2_umlsll_idx_d, uint64_t, uint16_t, uint16_t, H8, H2, -) + +DO_MLALL_IDX(sme2_usmlall_idx_s, uint32_t, uint8_t, int8_t, H4, H1, +) +DO_MLALL_IDX(sme2_sumlall_idx_s, uint32_t, int8_t, uint8_t, H4, H1, +) + +#undef DO_MLALL_IDX diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UVDOT_nx_4h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_4h) TRANS_FEAT(SUVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_suvdot_idx_4b) TRANS_FEAT(USVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_usvdot_idx_4b) + +static bool do_smlal(DisasContext *s, arg_azz_n *a, bool multi, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fn); +} + +TRANS_FEAT(SMLAL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_smlal_zzzw_s) +TRANS_FEAT(SMLSL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_smlsl_zzzw_s) +TRANS_FEAT(UMLAL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_umlal_zzzw_s) +TRANS_FEAT(UMLSL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_umlsl_zzzw_s) + +TRANS_FEAT(SMLAL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_smlal_zzzw_s) +TRANS_FEAT(SMLSL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_smlsl_zzzw_s) +TRANS_FEAT(UMLAL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_umlal_zzzw_s) +TRANS_FEAT(UMLSL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_umlsl_zzzw_s) + +static bool do_smlal_nx(DisasContext *s, arg_azx_n *a, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + a->idx << 1, 0, false, fn); +} + +TRANS_FEAT(SMLAL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_smlal_idx_s) +TRANS_FEAT(SMLSL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_smlsl_idx_s) +TRANS_FEAT(UMLAL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_umlal_idx_s) +TRANS_FEAT(UMLSL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_umlsl_idx_s) + +static bool do_smlall(DisasContext *s, arg_azz_n *a, bool multi, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 4, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fn); +} + +static void gen_helper_sme2_sumlall_s(TCGv_ptr d, TCGv_ptr n, TCGv_ptr m, + TCGv_ptr a, TCGv_i32 desc) +{ + gen_helper_sme2_usmlall_s(d, m, n, a, desc); +} + +TRANS_FEAT(SMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_smlall_s) +TRANS_FEAT(SMLSLL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_smlsll_s) +TRANS_FEAT(UMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_umlall_s) +TRANS_FEAT(UMLSLL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_umlsll_s) +TRANS_FEAT(USMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_usmlall_s) +TRANS_FEAT(SUMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_sumlall_s) + +TRANS_FEAT(SMLALL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_smlall_d) +TRANS_FEAT(SMLSLL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_smlsll_d) +TRANS_FEAT(UMLALL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_umlall_d) +TRANS_FEAT(UMLSLL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_umlsll_d) + +TRANS_FEAT(SMLALL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_smlall_s) +TRANS_FEAT(SMLSLL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_smlsll_s) +TRANS_FEAT(UMLALL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_umlall_s) +TRANS_FEAT(UMLSLL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_umlsll_s) +TRANS_FEAT(USMLALL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_usmlall_s) + +TRANS_FEAT(SMLALL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_smlall_d) +TRANS_FEAT(SMLSLL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_smlsll_d) +TRANS_FEAT(UMLALL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_umlall_d) +TRANS_FEAT(UMLSLL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_umlsll_d) + +static bool do_smlall_nx(DisasContext *s, arg_azx_n *a, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 4, a->rv, a->off, a->zn, a->zm, + a->idx << 2, 0, false, fn); +} + +TRANS_FEAT(SMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_smlall_idx_s) +TRANS_FEAT(SMLSLL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_smlsll_idx_s) +TRANS_FEAT(UMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_umlall_idx_s) +TRANS_FEAT(UMLSLL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_umlsll_idx_s) +TRANS_FEAT(USMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_usmlall_idx_s) +TRANS_FEAT(SUMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_sumlall_idx_s) + +TRANS_FEAT(SMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlall_idx_d) +TRANS_FEAT(SMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlsll_idx_d) +TRANS_FEAT(UMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlall_idx_d) +TRANS_FEAT(UMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlsll_idx_d) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Emphasize the non-fused nature of these multiply-add. Matches other helpers such as gvec_rsqrts_nf_[hs]. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-48-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 8 ++++---- target/arm/tcg/translate-neon.c | 4 ++-- target/arm/tcg/vec_helper.c | 8 ++++---- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmla_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmla_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmls_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmls_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VACGE, gen_helper_gvec_facge_s, gen_helper_gvec_facge_h) DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h) DO_3S_FP_GVEC(VMAX, gen_helper_gvec_fmax_s, gen_helper_gvec_fmax_h) DO_3S_FP_GVEC(VMIN, gen_helper_gvec_fmin_s, gen_helper_gvec_fmin_h) -DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h) -DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h) +DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_nf_s, gen_helper_gvec_fmla_nf_h) +DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_nf_s, gen_helper_gvec_fmls_nf_h) DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h) DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h) DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, \ clear_tail(d, oprsz, simd_maxsz(desc)); \ } -DO_MULADD(gvec_fmla_h, float16_muladd_nf, float16) -DO_MULADD(gvec_fmla_s, float32_muladd_nf, float32) +DO_MULADD(gvec_fmla_nf_h, float16_muladd_nf, float16) +DO_MULADD(gvec_fmla_nf_s, float32_muladd_nf, float32) -DO_MULADD(gvec_fmls_h, float16_mulsub_nf, float16) -DO_MULADD(gvec_fmls_s, float32_mulsub_nf, float32) +DO_MULADD(gvec_fmls_nf_h, float16_mulsub_nf, float16) +DO_MULADD(gvec_fmls_nf_s, float32_mulsub_nf, float32) DO_MULADD(gvec_vfma_h, float16_muladd_f, float16) DO_MULADD(gvec_vfma_s, float32_muladd_f, float32) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-49-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 48 +++++++++++++++++ target/arm/tcg/translate-sme.c | 95 ++++++++++++++++++++++++++++++++++ 2 files changed, 143 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=4 SUMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=2 SUMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=4 +FMLA_n1_h 11000001 001 0 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=2 +FMLA_n1_s 11000001 001 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 +FMLA_n1_d 11000001 011 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 +FMLA_n1_h 11000001 001 1 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=4 +FMLA_n1_s 11000001 001 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 +FMLA_n1_d 11000001 011 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 + +FMLS_n1_h 11000001 001 0 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=2 +FMLS_n1_s 11000001 001 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 +FMLS_n1_d 11000001 011 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 +FMLS_n1_h 11000001 001 1 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=4 +FMLS_n1_s 11000001 001 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 +FMLS_n1_d 11000001 011 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ UMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 USMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0010 . @azz_2x2_o1x4 USMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0010 . @azz_4x4_o1x4 +FMLA_nn_h 11000001 101 ....0 0 .. 100 ....0 01 ... @azz_2x2_o3 +FMLA_nn_s 11000001 101 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 +FMLA_nn_d 11000001 111 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 +FMLA_nn_h 11000001 101 ...01 0 .. 100 ...00 01 ... @azz_4x4_o3 +FMLA_nn_s 11000001 101 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 +FMLA_nn_d 11000001 111 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 + +FMLS_nn_h 11000001 101 ....0 0 .. 100 ....0 11 ... @azz_2x2_o3 +FMLS_nn_s 11000001 101 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 +FMLS_nn_d 11000001 111 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 +FMLS_nn_h 11000001 101 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 +FMLS_nn_s 11000001 101 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 +FMLS_nn_d 11000001 111 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ USMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 00 ... @azx_4x1_i4_o1 SUMLALL_nx_s 11000001 0000 .... . .. ... ..... 101 .. @azx_1x1_i4_o2 SUMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....1 10 ... @azx_2x1_i4_o1 SUMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 10 ... @azx_4x1_i4_o1 + +%idx3_10_3 10:2 3:1 +@azx_2x1_i3_o3 ........ .... zm:4 . .. ... ..... .. off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax2 idx=%idx3_10_3 +@azx_4x1_i3_o3 ........ .... zm:4 . .. ... ..... .. off:3 \ + &azx_n n=4 rv=%mova_rv zn=%zn_ax4 idx=%idx3_10_3 + +FMLA_nx_h 11000001 0001 .... 0 .. 1.. ....0 0 .... @azx_2x1_i3_o3 +FMLA_nx_s 11000001 0101 .... 0 .. 0.. ....0 00 ... @azx_2x1_i2_o3 +FMLA_nx_d 11000001 1101 .... 0 .. 00. ....0 00 ... @azx_2x1_i1_o3 +FMLA_nx_h 11000001 0001 .... 1 .. 1.. ...00 0 .... @azx_4x1_i3_o3 +FMLA_nx_s 11000001 0101 .... 1 .. 0.. ...00 00 ... @azx_4x1_i2_o3 +FMLA_nx_d 11000001 1101 .... 1 .. 00. ...00 00 ... @azx_4x1_i1_o3 + +FMLS_nx_h 11000001 0001 .... 0 .. 1.. ....0 1 .... @azx_2x1_i3_o3 +FMLS_nx_s 11000001 0101 .... 0 .. 0.. ....0 10 ... @azx_2x1_i2_o3 +FMLS_nx_d 11000001 1101 .... 0 .. 00. ....0 10 ... @azx_2x1_i1_o3 +FMLS_nx_h 11000001 0001 .... 1 .. 1.. ...00 1 .... @azx_4x1_i3_o3 +FMLS_nx_s 11000001 0101 .... 1 .. 0.. ...00 10 ... @azx_4x1_i2_o3 +FMLS_nx_d 11000001 1101 .... 1 .. 00. ...00 10 ... @azx_4x1_i1_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub */ #define FPST_ENV -1 +static bool do_azz_fp(DisasContext *s, int nreg, int nsel, + int rv, int off, int zn, int zm, + int data, int shsel, bool multi, int fpst, + gen_helper_gvec_3_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / nreg; + TCGv_ptr t_za = get_zarray(s, rv, off, nreg, nsel); + TCGv_ptr t, ptr; + + if (fpst >= 0) { + ptr = fpstatus_ptr(fpst); + } else { + ptr = tcg_env; + } + t = tcg_temp_new_ptr(); + + for (int r = 0; r < nreg; ++r) { + TCGv_ptr t_zn = vec_full_reg_ptr(s, zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm); + + for (int i = 0; i < nsel; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, data | (i << shsel)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, ptr, tcg_constant_i32(desc)); + } + + /* + * For multiple-and-single vectors, Zn may wrap. + * For multiple vectors, both Zn and Zm are aligned. + */ + zn = (zn + 1) % 32; + zm += multi; + } + } + return true; +} + static bool do_azz_acc_fp(DisasContext *s, int nreg, int nsel, int rv, int off, int zn, int zm, int data, int shsel, bool multi, int fpst, @@ -XXX,XX +XXX,XX @@ static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn) TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h) TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx) +static bool do_fmla(DisasContext *s, arg_azz_n *a, bool multi, + ARMFPStatusFlavour fpst, gen_helper_gvec_3_ptr *fn) +{ + return do_azz_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fpst, fn); +} + +TRANS_FEAT(FMLA_n1_h, aa64_sme_f16f16, do_fmla, a, false, FPST_ZA_F16, + gen_helper_gvec_vfma_h) +TRANS_FEAT(FMLS_n1_h, aa64_sme_f16f16, do_fmla, a, false, FPST_ZA_F16, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_h : gen_helper_gvec_vfms_h) +TRANS_FEAT(FMLA_nn_h, aa64_sme_f16f16, do_fmla, a, true, FPST_ZA_F16, + gen_helper_gvec_vfma_h) +TRANS_FEAT(FMLS_nn_h, aa64_sme_f16f16, do_fmla, a, true, FPST_ZA_F16, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_h : gen_helper_gvec_vfms_h) + +TRANS_FEAT(FMLA_n1_s, aa64_sme2, do_fmla, a, false, FPST_ZA, + gen_helper_gvec_vfma_s) +TRANS_FEAT(FMLS_n1_s, aa64_sme2, do_fmla, a, false, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_s : gen_helper_gvec_vfms_s) +TRANS_FEAT(FMLA_nn_s, aa64_sme2, do_fmla, a, true, FPST_ZA, + gen_helper_gvec_vfma_s) +TRANS_FEAT(FMLS_nn_s, aa64_sme2, do_fmla, a, true, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_s : gen_helper_gvec_vfms_s) + +TRANS_FEAT(FMLA_n1_d, aa64_sme2_f64f64, do_fmla, a, false, FPST_ZA, + gen_helper_gvec_vfma_d) +TRANS_FEAT(FMLS_n1_d, aa64_sme2_f64f64, do_fmla, a, false, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_d : gen_helper_gvec_vfms_d) +TRANS_FEAT(FMLA_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, + gen_helper_gvec_vfma_d) +TRANS_FEAT(FMLS_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_d : gen_helper_gvec_vfms_d) + +static bool do_fmla_nx(DisasContext *s, arg_azx_n *a, + ARMFPStatusFlavour fpst, gen_helper_gvec_4_ptr *fn) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + a->idx, 0, false, fpst, fn); +} + +TRANS_FEAT(FMLA_nx_h, aa64_sme_f16f16, do_fmla_nx, a, FPST_ZA_F16, + gen_helper_gvec_fmla_idx_h) +TRANS_FEAT(FMLS_nx_h, aa64_sme_f16f16, do_fmla_nx, a, FPST_ZA_F16, + s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_h : gen_helper_gvec_fmls_idx_h) +TRANS_FEAT(FMLA_nx_s, aa64_sme2, do_fmla_nx, a, FPST_ZA, + gen_helper_gvec_fmla_idx_s) +TRANS_FEAT(FMLS_nx_s, aa64_sme2, do_fmla_nx, a, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_s : gen_helper_gvec_fmls_idx_s) +TRANS_FEAT(FMLA_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, + gen_helper_gvec_fmla_idx_d) +TRANS_FEAT(FMLS_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_d : gen_helper_gvec_fmls_idx_d) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for integer accumulate. -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-50-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 9 +++++++++ target/arm/tcg/sme.decode | 18 ++++++++++++++++++ target/arm/tcg/translate-sme.c | 14 ++++++++++++++ target/arm/tcg/vec_helper.c | 26 ++++++++++++++++++++++++++ 4 files changed, 67 insertions(+) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmls_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i DEF_HELPER_FLAGS_5(gvec_vfma_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfmla, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfmls, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ah_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ah_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ah_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_ah_bfmls, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmla_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_fmls_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmls_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_fmls_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmls_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_ah_bfmls_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=4 SUMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=2 SUMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=4 +BFMLA_n1 11000001 011 0 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=2 FMLA_n1_h 11000001 001 0 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=2 FMLA_n1_s 11000001 001 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 FMLA_n1_d 11000001 011 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 + +BFMLA_n1 11000001 011 1 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=4 FMLA_n1_h 11000001 001 1 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=4 FMLA_n1_s 11000001 001 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 FMLA_n1_d 11000001 011 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 +BFMLS_n1 11000001 011 0 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=2 FMLS_n1_h 11000001 001 0 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=2 FMLS_n1_s 11000001 001 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 FMLS_n1_d 11000001 011 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 + +BFMLS_n1 11000001 011 1 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=4 FMLS_n1_h 11000001 001 1 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=4 FMLS_n1_s 11000001 001 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 FMLS_n1_d 11000001 011 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 @@ -XXX,XX +XXX,XX @@ UMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 USMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0010 . @azz_2x2_o1x4 USMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0010 . @azz_4x4_o1x4 +BFMLA_nn 11000001 111 ....0 0 .. 100 ....0 01 ... @azz_2x2_o3 FMLA_nn_h 11000001 101 ....0 0 .. 100 ....0 01 ... @azz_2x2_o3 FMLA_nn_s 11000001 101 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 FMLA_nn_d 11000001 111 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 + +BFMLA_nn 11000001 111 ...01 0 .. 100 ...00 01 ... @azz_4x4_o3 FMLA_nn_h 11000001 101 ...01 0 .. 100 ...00 01 ... @azz_4x4_o3 FMLA_nn_s 11000001 101 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 FMLA_nn_d 11000001 111 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 +BFMLS_nn 11000001 111 ....0 0 .. 100 ....0 11 ... @azz_2x2_o3 FMLS_nn_h 11000001 101 ....0 0 .. 100 ....0 11 ... @azz_2x2_o3 FMLS_nn_s 11000001 101 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 FMLS_nn_d 11000001 111 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 + +BFMLS_nn 11000001 111 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 FMLS_nn_h 11000001 101 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 FMLS_nn_s 11000001 101 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 FMLS_nn_d 11000001 111 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 @@ -XXX,XX +XXX,XX @@ SUMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 10 ... @azx_4x1_i4_o1 @azx_4x1_i3_o3 ........ .... zm:4 . .. ... ..... .. off:3 \ &azx_n n=4 rv=%mova_rv zn=%zn_ax4 idx=%idx3_10_3 +BFMLA_nx 11000001 0001 .... 0 .. 1.. ....1 0 .... @azx_2x1_i3_o3 FMLA_nx_h 11000001 0001 .... 0 .. 1.. ....0 0 .... @azx_2x1_i3_o3 FMLA_nx_s 11000001 0101 .... 0 .. 0.. ....0 00 ... @azx_2x1_i2_o3 FMLA_nx_d 11000001 1101 .... 0 .. 00. ....0 00 ... @azx_2x1_i1_o3 + +BFMLA_nx 11000001 0001 .... 1 .. 1.. ...01 0 .... @azx_4x1_i3_o3 FMLA_nx_h 11000001 0001 .... 1 .. 1.. ...00 0 .... @azx_4x1_i3_o3 FMLA_nx_s 11000001 0101 .... 1 .. 0.. ...00 00 ... @azx_4x1_i2_o3 FMLA_nx_d 11000001 1101 .... 1 .. 00. ...00 00 ... @azx_4x1_i1_o3 +BFMLS_nx 11000001 0001 .... 0 .. 1.. ....1 1 .... @azx_2x1_i3_o3 FMLS_nx_h 11000001 0001 .... 0 .. 1.. ....0 1 .... @azx_2x1_i3_o3 FMLS_nx_s 11000001 0101 .... 0 .. 0.. ....0 10 ... @azx_2x1_i2_o3 FMLS_nx_d 11000001 1101 .... 0 .. 00. ....0 10 ... @azx_2x1_i1_o3 + +BFMLS_nx 11000001 0001 .... 1 .. 1.. ...01 1 .... @azx_4x1_i3_o3 FMLS_nx_h 11000001 0001 .... 1 .. 1.. ...00 1 .... @azx_4x1_i3_o3 FMLS_nx_s 11000001 0101 .... 1 .. 0.. ...00 10 ... @azx_4x1_i2_o3 FMLS_nx_d 11000001 1101 .... 1 .. 00. ...00 10 ... @azx_4x1_i1_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMLA_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, TRANS_FEAT(FMLS_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, s->fpcr_ah ? gen_helper_gvec_ah_vfms_d : gen_helper_gvec_vfms_d) +TRANS_FEAT(BFMLA_n1, aa64_sme_b16b16, do_fmla, a, false, FPST_ZA, + gen_helper_gvec_bfmla) +TRANS_FEAT(BFMLS_n1, aa64_sme_b16b16, do_fmla, a, false, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_bfmls : gen_helper_gvec_bfmls) +TRANS_FEAT(BFMLA_nn, aa64_sme_b16b16, do_fmla, a, true, FPST_ZA, + gen_helper_gvec_bfmla) +TRANS_FEAT(BFMLS_nn, aa64_sme_b16b16, do_fmla, a, true, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_bfmls : gen_helper_gvec_bfmls) + static bool do_fmla_nx(DisasContext *s, arg_azx_n *a, ARMFPStatusFlavour fpst, gen_helper_gvec_4_ptr *fn) { @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMLA_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, TRANS_FEAT(FMLS_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_d : gen_helper_gvec_fmls_idx_d) +TRANS_FEAT(BFMLA_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, + gen_helper_gvec_bfmla_idx) +TRANS_FEAT(BFMLS_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_bfmls_idx : gen_helper_gvec_bfmls_idx) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for integer accumulate. diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ static float16 float16_muladd_f(float16 dest, float16 op1, float16 op2, return float16_muladd(op1, op2, dest, 0, stat); } +static bfloat16 bfloat16_muladd_f(bfloat16 dest, bfloat16 op1, bfloat16 op2, + float_status *stat) +{ + return bfloat16_muladd(op1, op2, dest, 0, stat); +} + static float32 float32_muladd_f(float32 dest, float32 op1, float32 op2, float_status *stat) { @@ -XXX,XX +XXX,XX @@ static float16 float16_mulsub_f(float16 dest, float16 op1, float16 op2, return float16_muladd(float16_chs(op1), op2, dest, 0, stat); } +static bfloat16 bfloat16_mulsub_f(bfloat16 dest, bfloat16 op1, bfloat16 op2, + float_status *stat) +{ + return bfloat16_muladd(bfloat16_chs(op1), op2, dest, 0, stat); +} + static float32 float32_mulsub_f(float32 dest, float32 op1, float32 op2, float_status *stat) { @@ -XXX,XX +XXX,XX @@ static float16 float16_ah_mulsub_f(float16 dest, float16 op1, float16 op2, return float16_muladd(op1, op2, dest, float_muladd_negate_product, stat); } +static bfloat16 bfloat16_ah_mulsub_f(bfloat16 dest, bfloat16 op1, bfloat16 op2, + float_status *stat) +{ + return bfloat16_muladd(op1, op2, dest, float_muladd_negate_product, stat); +} + static float32 float32_ah_mulsub_f(float32 dest, float32 op1, float32 op2, float_status *stat) { @@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_fmls_nf_s, float32_mulsub_nf, float32) DO_MULADD(gvec_vfma_h, float16_muladd_f, float16) DO_MULADD(gvec_vfma_s, float32_muladd_f, float32) DO_MULADD(gvec_vfma_d, float64_muladd_f, float64) +DO_MULADD(gvec_bfmla, bfloat16_muladd_f, bfloat16) DO_MULADD(gvec_vfms_h, float16_mulsub_f, float16) DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32) DO_MULADD(gvec_vfms_d, float64_mulsub_f, float64) +DO_MULADD(gvec_bfmls, bfloat16_mulsub_f, bfloat16) DO_MULADD(gvec_ah_vfms_h, float16_ah_mulsub_f, float16) DO_MULADD(gvec_ah_vfms_s, float32_ah_mulsub_f, float32) DO_MULADD(gvec_ah_vfms_d, float64_ah_mulsub_f, float64) +DO_MULADD(gvec_ah_bfmls, bfloat16_ah_mulsub_f, bfloat16) + +#undef DO_MULADD /* For the indexed ops, SVE applies the index per 128-bit vector segment. * For AdvSIMD, there is of course only one such vector segment. @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, \ DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2, 0, 0) DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4, 0, 0) DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8, 0, 0) +DO_FMLA_IDX(gvec_bfmla_idx, bfloat16, H2, 0, 0) DO_FMLA_IDX(gvec_fmls_idx_h, float16, H2, INT16_MIN, 0) DO_FMLA_IDX(gvec_fmls_idx_s, float32, H4, INT32_MIN, 0) DO_FMLA_IDX(gvec_fmls_idx_d, float64, H8, INT64_MIN, 0) +DO_FMLA_IDX(gvec_bfmls_idx, bfloat16, H2, INT16_MIN, 0) DO_FMLA_IDX(gvec_ah_fmls_idx_h, float16, H2, 0, float_muladd_negate_product) DO_FMLA_IDX(gvec_ah_fmls_idx_s, float32, H4, 0, float_muladd_negate_product) DO_FMLA_IDX(gvec_ah_fmls_idx_d, float64, H8, 0, float_muladd_negate_product) +DO_FMLA_IDX(gvec_ah_bfmls_idx, bfloat16, H2, 0, float_muladd_negate_product) #undef DO_FMLA_IDX -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-51-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 2 ++ target/arm/tcg/sme.decode | 25 +++++++++++++++++++ target/arm/tcg/translate-sme.c | 44 ++++++++++++++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 2 ++ 4 files changed, 73 insertions(+) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_fclt0_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfadd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fsub_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fsub_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfsub, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fmul_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FMLS_nn_h 11000001 101 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 FMLS_nn_s 11000001 101 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 FMLS_nn_d 11000001 111 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 +&az_n n off rv zm +@az_2x2_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &az_n n=2 rv=%mova_rv zm=%zn_ax2 +@az_4x4_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &az_n n=4 rv=%mova_rv zm=%zn_ax4 + +FADD_nn_h 11000001 101 00100 0 .. 111 ....0 00 ... @az_2x2_o3 +FADD_nn_s 11000001 101 00000 0 .. 111 ....0 00 ... @az_2x2_o3 +FADD_nn_d 11000001 111 00000 0 .. 111 ....0 00 ... @az_2x2_o3 +FADD_nn_h 11000001 101 00101 0 .. 111 ...00 00 ... @az_4x4_o3 +FADD_nn_s 11000001 101 00001 0 .. 111 ...00 00 ... @az_4x4_o3 +FADD_nn_d 11000001 111 00001 0 .. 111 ...00 00 ... @az_4x4_o3 + +FSUB_nn_h 11000001 101 00100 0 .. 111 ....0 01 ... @az_2x2_o3 +FSUB_nn_s 11000001 101 00000 0 .. 111 ....0 01 ... @az_2x2_o3 +FSUB_nn_d 11000001 111 00000 0 .. 111 ....0 01 ... @az_2x2_o3 +FSUB_nn_h 11000001 101 00101 0 .. 111 ...00 01 ... @az_4x4_o3 +FSUB_nn_s 11000001 101 00001 0 .. 111 ...00 01 ... @az_4x4_o3 +FSUB_nn_d 11000001 111 00001 0 .. 111 ...00 01 ... @az_4x4_o3 + +BFADD_nn 11000001 111 00100 0 .. 111 ....0 00 ... @az_2x2_o3 +BFADD_nn 11000001 111 00101 0 .. 111 ...00 00 ... @az_4x4_o3 +BFSUB_nn 11000001 111 00100 0 .. 111 ....0 01 ... @az_2x2_o3 +BFSUB_nn 11000001 111 00101 0 .. 111 ...00 01 ... @az_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(BFMLA_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, TRANS_FEAT(BFMLS_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, s->fpcr_ah ? gen_helper_gvec_ah_bfmls_idx : gen_helper_gvec_bfmls_idx) +static bool do_faddsub(DisasContext *s, arg_az_n *a, ARMFPStatusFlavour fpst, + gen_helper_gvec_3_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int n = a->n; + int zm = a->zm; + int vstride = svl / n; + TCGv_ptr t_za = get_zarray(s, a->rv, a->off, n, 0); + TCGv_ptr ptr = fpstatus_ptr(fpst); + TCGv_ptr t = tcg_temp_new_ptr(); + + for (int r = 0; r < n; ++r) { + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm + r); + int o_za = r * vstride * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, 0); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t, t_zm, ptr, tcg_constant_i32(desc)); + } + } + return true; +} + +TRANS_FEAT(FADD_nn_h, aa64_sme_f16f16, do_faddsub, a, + FPST_ZA_F16, gen_helper_gvec_fadd_h) +TRANS_FEAT(FSUB_nn_h, aa64_sme_f16f16, do_faddsub, a, + FPST_ZA_F16, gen_helper_gvec_fsub_h) + +TRANS_FEAT(FADD_nn_s, aa64_sme2, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fadd_s) +TRANS_FEAT(FSUB_nn_s, aa64_sme2, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fsub_s) + +TRANS_FEAT(FADD_nn_d, aa64_sme2_f64f64, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fadd_d) +TRANS_FEAT(FSUB_nn_d, aa64_sme2_f64f64, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fsub_d) + +TRANS_FEAT(BFADD_nn, aa64_sme_b16b16, do_faddsub, a, + FPST_ZA, gen_helper_gvec_bfadd) +TRANS_FEAT(BFSUB_nn, aa64_sme_b16b16, do_faddsub, a, + FPST_ZA, gen_helper_gvec_bfsub) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for integer accumulate. diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, \ DO_3OP(gvec_fadd_h, float16_add, float16) DO_3OP(gvec_fadd_s, float32_add, float32) DO_3OP(gvec_fadd_d, float64_add, float64) +DO_3OP(gvec_bfadd, bfloat16_add, bfloat16) DO_3OP(gvec_fsub_h, float16_sub, float16) DO_3OP(gvec_fsub_s, float32_sub, float32) DO_3OP(gvec_fsub_d, float64_sub, float64) +DO_3OP(gvec_bfsub, bfloat16_sub, bfloat16) DO_3OP(gvec_fmul_h, float16_mul, float16) DO_3OP(gvec_fmul_s, float32_mul, float32) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-52-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 12 ++++++++++++ target/arm/tcg/translate-sme.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ BFMLS_nx 11000001 0001 .... 1 .. 1.. ...01 1 .... @azx_4x1_i3_o3 FMLS_nx_h 11000001 0001 .... 1 .. 1.. ...00 1 .... @azx_4x1_i3_o3 FMLS_nx_s 11000001 0101 .... 1 .. 0.. ...00 10 ... @azx_4x1_i2_o3 FMLS_nx_d 11000001 1101 .... 1 .. 00. ...00 10 ... @azx_4x1_i1_o3 + +### SME2 Add / Sub array accumulators + +ADD_aaz_s 11000001 101 000000 .. 111 ....0 10 ... @az_2x2_o3 +ADD_aaz_s 11000001 101 000010 .. 111 ...00 10 ... @az_4x4_o3 +ADD_aaz_d 11000001 111 000000 .. 111 ....0 10 ... @az_2x2_o3 +ADD_aaz_d 11000001 111 000010 .. 111 ...00 10 ... @az_4x4_o3 + +SUB_aaz_s 11000001 101 000000 .. 111 ....0 11 ... @az_2x2_o3 +SUB_aaz_s 11000001 101 000010 .. 111 ...00 11 ... @az_4x4_o3 +SUB_aaz_d 11000001 111 000000 .. 111 ....0 11 ... @az_2x2_o3 +SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUB_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_sub_var) TRANS_FEAT(ADD_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub_var) +/* Add/Sub each ZA[d*N] += Z[m*N] */ +static bool do_aaz(DisasContext *s, arg_az_n *a, int esz, GVecGen3FnVar *fn) +{ + TCGv_ptr t_za; + int svl, n; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + n = a->n; + t_za = get_zarray(s, a->rv, a->off, n, 0); + svl = streaming_vec_reg_size(s); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zm = vec_full_reg_offset(s, a->zm + i); + + fn(esz, t_za, o_za, t_za, o_za, tcg_env, o_zm, svl, svl); + } + return true; +} + +TRANS_FEAT(ADD_aaz_s, aa64_sme2, do_aaz, a, MO_32, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_aaz_s, aa64_sme2, do_aaz, a, MO_32, tcg_gen_gvec_sub_var) +TRANS_FEAT(ADD_aaz_d, aa64_sme2_i16i64, do_aaz, a, MO_64, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_aaz_d, aa64_sme2_i16i64, do_aaz, a, MO_64, tcg_gen_gvec_sub_var) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for floating-point accumulate. -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-53-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 5 +++ target/arm/tcg/vec_internal.h | 2 + target/arm/tcg/sme.decode | 12 ++++++ target/arm/tcg/sme_helper.c | 74 ++++++++++++++++++++++++++++++++++ target/arm/tcg/sve_helper.c | 2 +- target/arm/tcg/translate-sme.c | 25 ++++++++++++ 6 files changed, 119 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme2_umlsll_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, DEF_HELPER_FLAGS_5(sme2_umlsll_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme2_usmlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme2_sumlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_bfcvt, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_bfcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvt_n, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah) bfloat16 helper_sme2_ah_fmax_b16(bfloat16 a, bfloat16 b, float_status *fpst); bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); +float16 sve_f32_to_f16(float32 f, float_status *fpst); + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_s 11000001 101 000000 .. 111 ....0 11 ... @az_2x2_o3 SUB_aaz_s 11000001 101 000010 .. 111 ...00 11 ... @az_4x4_o3 SUB_aaz_d 11000001 111 000000 .. 111 ....0 11 ... @az_2x2_o3 SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 + +### SME2 Multi-vector SVE Constructive Unary + +&zz_n zd zn n +@zz_1x2 ........ ... ..... ...... ..... zd:5 \ + &zz_n n=1 zn=%zn_ax2 + +BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 +BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 + +FCVT_n 11000001 001 00000 111000 ....0 ..... @zz_1x2 +FCVTN 11000001 001 00000 111000 ....1 ..... @zz_1x2 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DO_MLALL_IDX(sme2_usmlall_idx_s, uint32_t, uint8_t, int8_t, H4, H1, +) DO_MLALL_IDX(sme2_sumlall_idx_s, uint32_t, int8_t, uint8_t, H4, H1, +) #undef DO_MLALL_IDX + +/* Convert and compress */ +void HELPER(sme2_bfcvt)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + ARMVectorReg scratch; + size_t oprsz = simd_oprsz(desc); + size_t i, n = oprsz / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + bfloat16 *d = vd; + + if (vd == s1) { + s1 = memcpy(&scratch, s1, oprsz); + } + + for (i = 0; i < n; ++i) { + d[H2(i)] = float32_to_bfloat16(s0[H4(i)], fpst); + } + for (i = 0; i < n; ++i) { + d[H2(i) + n] = float32_to_bfloat16(s1[H4(i)], fpst); + } +} + +void HELPER(sme2_fcvt_n)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + ARMVectorReg scratch; + size_t oprsz = simd_oprsz(desc); + size_t i, n = oprsz / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + float16 *d = vd; + + if (vd == s1) { + s1 = memcpy(&scratch, s1, oprsz); + } + + for (i = 0; i < n; ++i) { + d[H2(i)] = sve_f32_to_f16(s0[H4(i)], fpst); + } + for (i = 0; i < n; ++i) { + d[H2(i) + n] = sve_f32_to_f16(s1[H4(i)], fpst); + } +} + +/* Convert and interleave */ +void HELPER(sme2_bfcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + bfloat16 *d = vd; + + for (i = 0; i < n; ++i) { + bfloat16 d0 = float32_to_bfloat16(s0[H4(i)], fpst); + bfloat16 d1 = float32_to_bfloat16(s1[H4(i)], fpst); + d[H2(i * 2 + 0)] = d0; + d[H2(i * 2 + 1)] = d1; + } +} + +void HELPER(sme2_fcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + bfloat16 *d = vd; + + for (i = 0; i < n; ++i) { + bfloat16 d0 = sve_f32_to_f16(s0[H4(i)], fpst); + bfloat16 d1 = sve_f32_to_f16(s1[H4(i)], fpst); + d[H2(i * 2 + 0)] = d0; + d[H2(i * 2 + 1)] = d1; + } +} diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static inline float64 sve_f16_to_f64(float16 f, float_status *fpst) return ret; } -static inline float16 sve_f32_to_f16(float32 f, float_status *fpst) +float16 sve_f32_to_f16(float32 f, float_status *fpst) { bool save = get_flush_to_zero(fpst); float16 ret; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlal TRANS_FEAT(SMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlsll_idx_d) TRANS_FEAT(UMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlall_idx_d) TRANS_FEAT(UMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlsll_idx_d) + +static bool do_zz_fpst(DisasContext *s, arg_zz_n *a, int data, + ARMFPStatusFlavour type, gen_helper_gvec_2_ptr *fn) +{ + if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + TCGv_ptr fpst = fpstatus_ptr(type); + + for (int i = 0, n = a->n; i < n; ++i) { + tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->zd + i), + vec_full_reg_offset(s, a->zn + i), + fpst, svl, svl, data, fn); + } + } + return true; +} + +TRANS_FEAT(BFCVT, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_bfcvt) +TRANS_FEAT(BFCVTN, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_bfcvtn) +TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_fcvt_n) +TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_fcvtn) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-54-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 ++ target/arm/tcg/vec_internal.h | 1 + target/arm/tcg/sme.decode | 5 ++++ target/arm/tcg/sme_helper.c | 45 ++++++++++++++++++++++++++++++++++ target/arm/tcg/sve_helper.c | 2 +- target/arm/tcg/translate-sme.c | 5 ++++ 6 files changed, 59 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_bfcvt, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_bfcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvt_n, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvt_w, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvtl, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah) bfloat16 helper_sme2_ah_fmax_b16(bfloat16 a, bfloat16 b, float_status *fpst); bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); +float32 sve_f16_to_f32(float16 f, float_status *fpst); float16 sve_f32_to_f16(float32 f, float_status *fpst); #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n zd zn n @zz_1x2 ........ ... ..... ...... ..... zd:5 \ &zz_n n=1 zn=%zn_ax2 +@zz_2x1 ........ ... ..... ...... zn:5 ..... \ + &zz_n n=1 zd=%zd_ax2 BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 FCVT_n 11000001 001 00000 111000 ....0 ..... @zz_1x2 FCVTN 11000001 001 00000 111000 ....1 ..... @zz_1x2 + +FCVT_w 11000001 101 00000 111000 ..... ....0 @zz_2x1 +FCVTL 11000001 101 00000 111000 ..... ....1 @zz_2x1 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ #include "vec_internal.h" #include "sve_ldst_internal.h" + +static bool vectors_overlap(ARMVectorReg *x, unsigned nx, + ARMVectorReg *y, unsigned ny) +{ + return !(x + nx <= y || y + ny <= x); +} + void helper_set_svcr(CPUARMState *env, uint32_t val, uint32_t mask) { aarch64_set_svcr(env, val, mask); @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) d[H2(i * 2 + 1)] = d1; } } + +/* Expand and convert */ +void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + ARMVectorReg scratch; + size_t oprsz = simd_oprsz(desc); + size_t i, n = oprsz / 4; + float16 *s = vs; + float32 *d0 = vd; + float32 *d1 = vd + sizeof(ARMVectorReg); + + if (vectors_overlap(vd, 1, vs, 2)) { + s = memcpy(&scratch, s, oprsz); + } + + for (i = 0; i < n; ++i) { + d0[H4(i)] = sve_f16_to_f32(s[H2(i)], fpst); + } + for (i = 0; i < n; ++i) { + d1[H4(i)] = sve_f16_to_f32(s[H2(n + i)], fpst); + } +} + +/* Deinterleave and convert. */ +void HELPER(sme2_fcvtl)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + float16 *s = vs; + float32 *d0 = vd; + float32 *d1 = vd + sizeof(ARMVectorReg); + + for (i = 0; i < n; ++i) { + float32 v0 = sve_f16_to_f32(s[H2(i * 2 + 0)], fpst); + float32 v1 = sve_f16_to_f32(s[H2(i * 2 + 1)], fpst); + d0[H4(i)] = v0; + d1[H4(i)] = v1; + } +} diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, \ * FZ16. When converting from fp16, this affects flushing input denormals; * when converting to fp16, this affects flushing output denormals. */ -static inline float32 sve_f16_to_f32(float16 f, float_status *fpst) +float32 sve_f16_to_f32(float16 f, float_status *fpst) { bool save = get_flush_inputs_to_zero(fpst); float32 ret; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_fcvt_n) TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_fcvtn) + +TRANS_FEAT(FCVT_w, aa64_sme_f16f16, do_zz_fpst, a, 0, + FPST_A64_F16, gen_helper_sme2_fcvt_w) +TRANS_FEAT(FCVTL, aa64_sme_f16f16, do_zz_fpst, a, 0, + FPST_A64_F16, gen_helper_sme2_fcvtl) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-55-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 9 +++++++++ target/arm/tcg/translate-sme.c | 5 +++++ 2 files changed, 14 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n n=1 zn=%zn_ax2 @zz_2x1 ........ ... ..... ...... zn:5 ..... \ &zz_n n=1 zd=%zd_ax2 +@zz_2x2 ........ ... ..... ...... .... . ..... \ + &zz_n n=2 zd=%zd_ax2 zn=%zn_ax2 +@zz_4x4 ........ ... ..... ...... .... . ..... \ + &zz_n n=4 zd=%zd_ax4 zn=%zn_ax4 BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 @@ -XXX,XX +XXX,XX @@ FCVTN 11000001 001 00000 111000 ....1 ..... @zz_1x2 FCVT_w 11000001 101 00000 111000 ..... ....0 @zz_2x1 FCVTL 11000001 101 00000 111000 ..... ....1 @zz_2x1 + +FCVTZS 11000001 001 00001 111000 ....0 ....0 @zz_2x2 +FCVTZS 11000001 001 10001 111000 ...00 ...00 @zz_4x4 +FCVTZU 11000001 001 00001 111000 ....1 ....0 @zz_2x2 +FCVTZU 11000001 001 10001 111000 ...01 ...00 @zz_4x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_w, aa64_sme_f16f16, do_zz_fpst, a, 0, FPST_A64_F16, gen_helper_sme2_fcvt_w) TRANS_FEAT(FCVTL, aa64_sme_f16f16, do_zz_fpst, a, 0, FPST_A64_F16, gen_helper_sme2_fcvtl) + +TRANS_FEAT(FCVTZS, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_gvec_vcvt_rz_fs) +TRANS_FEAT(FCVTZU, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_gvec_vcvt_rz_fu) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-56-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 ++ target/arm/tcg/sme.decode | 5 +++++ target/arm/tcg/sme_helper.c | 22 ++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 5 +++++ 4 files changed, 34 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_fcvt_n, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvt_w, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtl, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_scvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_ucvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FCVTZS 11000001 001 00001 111000 ....0 ....0 @zz_2x2 FCVTZS 11000001 001 10001 111000 ...00 ...00 @zz_4x4 FCVTZU 11000001 001 00001 111000 ....1 ....0 @zz_2x2 FCVTZU 11000001 001 10001 111000 ...01 ...00 @zz_4x4 + +SCVTF 11000001 001 00010 111000 ....0 ....0 @zz_2x2 +SCVTF 11000001 001 10010 111000 ...00 ...00 @zz_4x4 +UCVTF 11000001 001 00010 111000 ....1 ....0 @zz_2x2 +UCVTF 11000001 001 10010 111000 ...01 ...00 @zz_4x4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvtl)(void *vd, void *vs, float_status *fpst, uint32_t desc) d1[H4(i)] = v1; } } + +void HELPER(sme2_scvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + int32_t *d = vd; + float32 *s = vs; + + for (i = 0; i < n; ++i) { + d[i] = int32_to_float32(s[i], fpst); + } +} + +void HELPER(sme2_ucvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + uint32_t *d = vd; + float32 *s = vs; + + for (i = 0; i < n; ++i) { + d[i] = uint32_to_float32(s[i], fpst); + } +} diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVTZS, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_gvec_vcvt_rz_fs) TRANS_FEAT(FCVTZU, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_gvec_vcvt_rz_fu) + +TRANS_FEAT(SCVTF, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_scvtf) +TRANS_FEAT(UCVTF, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_ucvtf) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-57-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 9 +++++++++ target/arm/tcg/translate-sme.c | 9 +++++++++ 2 files changed, 18 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SCVTF 11000001 001 00010 111000 ....0 ....0 @zz_2x2 SCVTF 11000001 001 10010 111000 ...00 ...00 @zz_4x4 UCVTF 11000001 001 00010 111000 ....1 ....0 @zz_2x2 UCVTF 11000001 001 10010 111000 ...01 ...00 @zz_4x4 + +FRINTN 11000001 101 01000 111000 ....0 ....0 @zz_2x2 +FRINTN 11000001 101 11000 111000 ...00 ...00 @zz_4x4 +FRINTP 11000001 101 01001 111000 ....0 ....0 @zz_2x2 +FRINTP 11000001 101 11001 111000 ...00 ...00 @zz_4x4 +FRINTM 11000001 101 01010 111000 ....0 ....0 @zz_2x2 +FRINTM 11000001 101 11010 111000 ...00 ...00 @zz_4x4 +FRINTA 11000001 101 01100 111000 ....0 ....0 @zz_2x2 +FRINTA 11000001 101 11100 111000 ...00 ...00 @zz_4x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SCVTF, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_scvtf) TRANS_FEAT(UCVTF, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_ucvtf) + +TRANS_FEAT(FRINTN, aa64_sme2, do_zz_fpst, a, float_round_nearest_even, + FPST_A64, gen_helper_gvec_vrint_rm_s) +TRANS_FEAT(FRINTP, aa64_sme2, do_zz_fpst, a, float_round_up, + FPST_A64, gen_helper_gvec_vrint_rm_s) +TRANS_FEAT(FRINTM, aa64_sme2, do_zz_fpst, a, float_round_down, + FPST_A64, gen_helper_gvec_vrint_rm_s) +TRANS_FEAT(FRINTA, aa64_sme2, do_zz_fpst, a, float_round_ties_away, + FPST_A64, gen_helper_gvec_vrint_rm_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Inputs are a wider type of indeterminate sign. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-58-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/vec_internal.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *); int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *); int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool); +#define do_ssat_b(val) MIN(MAX(val, INT8_MIN), INT8_MAX) +#define do_ssat_h(val) MIN(MAX(val, INT16_MIN), INT16_MAX) +#define do_ssat_s(val) MIN(MAX(val, INT32_MIN), INT32_MAX) +#define do_usat_b(val) MIN(MAX(val, 0), UINT8_MAX) +#define do_usat_h(val) MIN(MAX(val, 0), UINT16_MAX) +#define do_usat_s(val) MIN(MAX(val, 0), UINT32_MAX) + /** * bfdotadd: * @sum: addend -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Replace and remove do_sat_bhs. This avoids multiple repetitions of INT*_MIN/MAX. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-59-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 116 +++++++++++++++--------------------- 1 file changed, 48 insertions(+), 68 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_uhsub_zpzz_h, uint16_t, H1_2, DO_HSUB_BHS) DO_ZPZZ(sve2_uhsub_zpzz_s, uint32_t, H1_4, DO_HSUB_BHS) DO_ZPZZ_D(sve2_uhsub_zpzz_d, uint64_t, DO_HSUB_D) -static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max) -{ - return val >= max ? max : val <= min ? min : val; -} - -#define DO_SQADD_B(n, m) do_sat_bhs((int64_t)n + m, INT8_MIN, INT8_MAX) -#define DO_SQADD_H(n, m) do_sat_bhs((int64_t)n + m, INT16_MIN, INT16_MAX) -#define DO_SQADD_S(n, m) do_sat_bhs((int64_t)n + m, INT32_MIN, INT32_MAX) +#define DO_SQADD_B(n, m) do_ssat_b((int64_t)n + m) +#define DO_SQADD_H(n, m) do_ssat_h((int64_t)n + m) +#define DO_SQADD_S(n, m) do_ssat_s((int64_t)n + m) static inline int64_t do_sqadd_d(int64_t n, int64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_sqadd_zpzz_h, int16_t, H1_2, DO_SQADD_H) DO_ZPZZ(sve2_sqadd_zpzz_s, int32_t, H1_4, DO_SQADD_S) DO_ZPZZ_D(sve2_sqadd_zpzz_d, int64_t, do_sqadd_d) -#define DO_UQADD_B(n, m) do_sat_bhs((int64_t)n + m, 0, UINT8_MAX) -#define DO_UQADD_H(n, m) do_sat_bhs((int64_t)n + m, 0, UINT16_MAX) -#define DO_UQADD_S(n, m) do_sat_bhs((int64_t)n + m, 0, UINT32_MAX) +#define DO_UQADD_B(n, m) do_usat_b((int64_t)n + m) +#define DO_UQADD_H(n, m) do_usat_h((int64_t)n + m) +#define DO_UQADD_S(n, m) do_usat_s((int64_t)n + m) static inline uint64_t do_uqadd_d(uint64_t n, uint64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_uqadd_zpzz_h, uint16_t, H1_2, DO_UQADD_H) DO_ZPZZ(sve2_uqadd_zpzz_s, uint32_t, H1_4, DO_UQADD_S) DO_ZPZZ_D(sve2_uqadd_zpzz_d, uint64_t, do_uqadd_d) -#define DO_SQSUB_B(n, m) do_sat_bhs((int64_t)n - m, INT8_MIN, INT8_MAX) -#define DO_SQSUB_H(n, m) do_sat_bhs((int64_t)n - m, INT16_MIN, INT16_MAX) -#define DO_SQSUB_S(n, m) do_sat_bhs((int64_t)n - m, INT32_MIN, INT32_MAX) +#define DO_SQSUB_B(n, m) do_ssat_b((int64_t)n - m) +#define DO_SQSUB_H(n, m) do_ssat_h((int64_t)n - m) +#define DO_SQSUB_S(n, m) do_ssat_s((int64_t)n - m) static inline int64_t do_sqsub_d(int64_t n, int64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_sqsub_zpzz_h, int16_t, H1_2, DO_SQSUB_H) DO_ZPZZ(sve2_sqsub_zpzz_s, int32_t, H1_4, DO_SQSUB_S) DO_ZPZZ_D(sve2_sqsub_zpzz_d, int64_t, do_sqsub_d) -#define DO_UQSUB_B(n, m) do_sat_bhs((int64_t)n - m, 0, UINT8_MAX) -#define DO_UQSUB_H(n, m) do_sat_bhs((int64_t)n - m, 0, UINT16_MAX) -#define DO_UQSUB_S(n, m) do_sat_bhs((int64_t)n - m, 0, UINT32_MAX) +#define DO_UQSUB_B(n, m) do_usat_b((int64_t)n - m) +#define DO_UQSUB_H(n, m) do_usat_h((int64_t)n - m) +#define DO_UQSUB_S(n, m) do_usat_s((int64_t)n - m) static inline uint64_t do_uqsub_d(uint64_t n, uint64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_uqsub_zpzz_h, uint16_t, H1_2, DO_UQSUB_H) DO_ZPZZ(sve2_uqsub_zpzz_s, uint32_t, H1_4, DO_UQSUB_S) DO_ZPZZ_D(sve2_uqsub_zpzz_d, uint64_t, do_uqsub_d) -#define DO_SUQADD_B(n, m) \ - do_sat_bhs((int64_t)(int8_t)n + m, INT8_MIN, INT8_MAX) -#define DO_SUQADD_H(n, m) \ - do_sat_bhs((int64_t)(int16_t)n + m, INT16_MIN, INT16_MAX) -#define DO_SUQADD_S(n, m) \ - do_sat_bhs((int64_t)(int32_t)n + m, INT32_MIN, INT32_MAX) +#define DO_SUQADD_B(n, m) do_ssat_b((int64_t)(int8_t)n + m) +#define DO_SUQADD_H(n, m) do_ssat_h((int64_t)(int16_t)n + m) +#define DO_SUQADD_S(n, m) do_ssat_s((int64_t)(int32_t)n + m) static inline int64_t do_suqadd_d(int64_t n, uint64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_suqadd_zpzz_h, uint16_t, H1_2, DO_SUQADD_H) DO_ZPZZ(sve2_suqadd_zpzz_s, uint32_t, H1_4, DO_SUQADD_S) DO_ZPZZ_D(sve2_suqadd_zpzz_d, uint64_t, do_suqadd_d) -#define DO_USQADD_B(n, m) \ - do_sat_bhs((int64_t)n + (int8_t)m, 0, UINT8_MAX) -#define DO_USQADD_H(n, m) \ - do_sat_bhs((int64_t)n + (int16_t)m, 0, UINT16_MAX) -#define DO_USQADD_S(n, m) \ - do_sat_bhs((int64_t)n + (int32_t)m, 0, UINT32_MAX) +#define DO_USQADD_B(n, m) do_usat_b((int64_t)n + (int8_t)m) +#define DO_USQADD_H(n, m) do_usat_h((int64_t)n + (int16_t)m) +#define DO_USQADD_S(n, m) do_usat_s((int64_t)n + (int32_t)m) static inline uint64_t do_usqadd_d(uint64_t n, int64_t m) { @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ } \ } -#define DO_SQXTN_H(n) do_sat_bhs(n, INT8_MIN, INT8_MAX) -#define DO_SQXTN_S(n) do_sat_bhs(n, INT16_MIN, INT16_MAX) -#define DO_SQXTN_D(n) do_sat_bhs(n, INT32_MIN, INT32_MAX) +DO_XTNB(sve2_sqxtnb_h, int16_t, do_ssat_b) +DO_XTNB(sve2_sqxtnb_s, int32_t, do_ssat_h) +DO_XTNB(sve2_sqxtnb_d, int64_t, do_ssat_s) -DO_XTNB(sve2_sqxtnb_h, int16_t, DO_SQXTN_H) -DO_XTNB(sve2_sqxtnb_s, int32_t, DO_SQXTN_S) -DO_XTNB(sve2_sqxtnb_d, int64_t, DO_SQXTN_D) +DO_XTNT(sve2_sqxtnt_h, int16_t, int8_t, H1, do_ssat_b) +DO_XTNT(sve2_sqxtnt_s, int32_t, int16_t, H1_2, do_ssat_h) +DO_XTNT(sve2_sqxtnt_d, int64_t, int32_t, H1_4, do_ssat_s) -DO_XTNT(sve2_sqxtnt_h, int16_t, int8_t, H1, DO_SQXTN_H) -DO_XTNT(sve2_sqxtnt_s, int32_t, int16_t, H1_2, DO_SQXTN_S) -DO_XTNT(sve2_sqxtnt_d, int64_t, int32_t, H1_4, DO_SQXTN_D) +DO_XTNB(sve2_uqxtnb_h, uint16_t, do_usat_b) +DO_XTNB(sve2_uqxtnb_s, uint32_t, do_usat_h) +DO_XTNB(sve2_uqxtnb_d, uint64_t, do_usat_s) -#define DO_UQXTN_H(n) do_sat_bhs(n, 0, UINT8_MAX) -#define DO_UQXTN_S(n) do_sat_bhs(n, 0, UINT16_MAX) -#define DO_UQXTN_D(n) do_sat_bhs(n, 0, UINT32_MAX) +DO_XTNT(sve2_uqxtnt_h, uint16_t, uint8_t, H1, do_usat_b) +DO_XTNT(sve2_uqxtnt_s, uint32_t, uint16_t, H1_2, do_usat_h) +DO_XTNT(sve2_uqxtnt_d, uint64_t, uint32_t, H1_4, do_usat_s) -DO_XTNB(sve2_uqxtnb_h, uint16_t, DO_UQXTN_H) -DO_XTNB(sve2_uqxtnb_s, uint32_t, DO_UQXTN_S) -DO_XTNB(sve2_uqxtnb_d, uint64_t, DO_UQXTN_D) +DO_XTNB(sve2_sqxtunb_h, int16_t, do_usat_b) +DO_XTNB(sve2_sqxtunb_s, int32_t, do_usat_h) +DO_XTNB(sve2_sqxtunb_d, int64_t, do_usat_s) -DO_XTNT(sve2_uqxtnt_h, uint16_t, uint8_t, H1, DO_UQXTN_H) -DO_XTNT(sve2_uqxtnt_s, uint32_t, uint16_t, H1_2, DO_UQXTN_S) -DO_XTNT(sve2_uqxtnt_d, uint64_t, uint32_t, H1_4, DO_UQXTN_D) - -DO_XTNB(sve2_sqxtunb_h, int16_t, DO_UQXTN_H) -DO_XTNB(sve2_sqxtunb_s, int32_t, DO_UQXTN_S) -DO_XTNB(sve2_sqxtunb_d, int64_t, DO_UQXTN_D) - -DO_XTNT(sve2_sqxtunt_h, int16_t, int8_t, H1, DO_UQXTN_H) -DO_XTNT(sve2_sqxtunt_s, int32_t, int16_t, H1_2, DO_UQXTN_S) -DO_XTNT(sve2_sqxtunt_d, int64_t, int32_t, H1_4, DO_UQXTN_D) +DO_XTNT(sve2_sqxtunt_h, int16_t, int8_t, H1, do_usat_b) +DO_XTNT(sve2_sqxtunt_s, int32_t, int16_t, H1_2, do_usat_h) +DO_XTNT(sve2_sqxtunt_d, int64_t, int32_t, H1_4, do_usat_s) #undef DO_XTNB #undef DO_XTNT @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_rshrnt_h, uint16_t, uint8_t, H1_2, H1, do_urshr) DO_SHRNT(sve2_rshrnt_s, uint32_t, uint16_t, H1_4, H1_2, do_urshr) DO_SHRNT(sve2_rshrnt_d, uint64_t, uint32_t, H1_8, H1_4, do_urshr) -#define DO_SQSHRUN_H(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT8_MAX) -#define DO_SQSHRUN_S(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT16_MAX) -#define DO_SQSHRUN_D(x, sh) \ - do_sat_bhs((int64_t)(x) >> (sh < 64 ? sh : 63), 0, UINT32_MAX) +#define DO_SQSHRUN_H(x, sh) do_usat_b((int64_t)(x) >> sh) +#define DO_SQSHRUN_S(x, sh) do_usat_h((int64_t)(x) >> sh) +#define DO_SQSHRUN_D(x, sh) do_usat_s((int64_t)(x) >> (sh < 64 ? sh : 63)) DO_SHRNB(sve2_sqshrunb_h, int16_t, uint8_t, DO_SQSHRUN_H) DO_SHRNB(sve2_sqshrunb_s, int32_t, uint16_t, DO_SQSHRUN_S) @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_sqshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRUN_H) DO_SHRNT(sve2_sqshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRUN_S) DO_SHRNT(sve2_sqshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRUN_D) -#define DO_SQRSHRUN_H(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT8_MAX) -#define DO_SQRSHRUN_S(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT16_MAX) -#define DO_SQRSHRUN_D(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT32_MAX) +#define DO_SQRSHRUN_H(x, sh) do_usat_b(do_srshr(x, sh)) +#define DO_SQRSHRUN_S(x, sh) do_usat_h(do_srshr(x, sh)) +#define DO_SQRSHRUN_D(x, sh) do_usat_s(do_srshr(x, sh)) DO_SHRNB(sve2_sqrshrunb_h, int16_t, uint8_t, DO_SQRSHRUN_H) DO_SHRNB(sve2_sqrshrunb_s, int32_t, uint16_t, DO_SQRSHRUN_S) @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_sqrshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRUN_H) DO_SHRNT(sve2_sqrshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRUN_S) DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQRSHRUN_D) -#define DO_SQSHRN_H(x, sh) do_sat_bhs(x >> sh, INT8_MIN, INT8_MAX) -#define DO_SQSHRN_S(x, sh) do_sat_bhs(x >> sh, INT16_MIN, INT16_MAX) -#define DO_SQSHRN_D(x, sh) do_sat_bhs(x >> sh, INT32_MIN, INT32_MAX) +#define DO_SQSHRN_H(x, sh) do_ssat_b(x >> sh) +#define DO_SQSHRN_S(x, sh) do_ssat_h(x >> sh) +#define DO_SQSHRN_D(x, sh) do_ssat_s(x >> sh) DO_SHRNB(sve2_sqshrnb_h, int16_t, uint8_t, DO_SQSHRN_H) DO_SHRNB(sve2_sqshrnb_s, int32_t, uint16_t, DO_SQSHRN_S) @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_sqshrnt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRN_H) DO_SHRNT(sve2_sqshrnt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRN_S) DO_SHRNT(sve2_sqshrnt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRN_D) -#define DO_SQRSHRN_H(x, sh) do_sat_bhs(do_srshr(x, sh), INT8_MIN, INT8_MAX) -#define DO_SQRSHRN_S(x, sh) do_sat_bhs(do_srshr(x, sh), INT16_MIN, INT16_MAX) -#define DO_SQRSHRN_D(x, sh) do_sat_bhs(do_srshr(x, sh), INT32_MIN, INT32_MAX) +#define DO_SQRSHRN_H(x, sh) do_ssat_b(do_srshr(x, sh)) +#define DO_SQRSHRN_S(x, sh) do_ssat_h(do_srshr(x, sh)) +#define DO_SQRSHRN_D(x, sh) do_ssat_s(do_srshr(x, sh)) DO_SHRNB(sve2_sqrshrnb_h, int16_t, uint8_t, DO_SQRSHRN_H) DO_SHRNB(sve2_sqrshrnb_s, int32_t, uint16_t, DO_SQRSHRN_S) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-60-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 20 ++++++ target/arm/tcg/sme.decode | 22 +++++++ target/arm/tcg/sme_helper.c | 116 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 35 ++++++++++ 4 files changed, 193 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_fcvt_w, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtl, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_scvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_ucvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_3(sme2_sqcvt_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvt_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtu_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtu_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvt_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvt_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtu_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sqcvtn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvtn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtun_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvtn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtun_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n zd zn n @zz_1x2 ........ ... ..... ...... ..... zd:5 \ &zz_n n=1 zn=%zn_ax2 +@zz_1x4 ........ ... ..... ...... ..... zd:5 \ + &zz_n n=1 zn=%zn_ax4 @zz_2x1 ........ ... ..... ...... zn:5 ..... \ &zz_n n=1 zd=%zd_ax2 @zz_2x2 ........ ... ..... ...... .... . ..... \ @@ -XXX,XX +XXX,XX @@ FRINTM 11000001 101 01010 111000 ....0 ....0 @zz_2x2 FRINTM 11000001 101 11010 111000 ...00 ...00 @zz_4x4 FRINTA 11000001 101 01100 111000 ....0 ....0 @zz_2x2 FRINTA 11000001 101 11100 111000 ...00 ...00 @zz_4x4 + +SQCVT_sh 11000001 001 00011 111000 ....0 ..... @zz_1x2 +UQCVT_sh 11000001 001 00011 111000 ....1 ..... @zz_1x2 +SQCVTU_sh 11000001 011 00011 111000 ....0 ..... @zz_1x2 + +SQCVT_sb 11000001 001 10011 111000 ...00 ..... @zz_1x4 +UQCVT_sb 11000001 001 10011 111000 ...01 ..... @zz_1x4 +SQCVTU_sb 11000001 011 10011 111000 ...00 ..... @zz_1x4 + +SQCVT_dh 11000001 101 10011 111000 ...00 ..... @zz_1x4 +UQCVT_dh 11000001 101 10011 111000 ...01 ..... @zz_1x4 +SQCVTU_dh 11000001 111 10011 111000 ...00 ..... @zz_1x4 + +SQCVTN_sb 11000001 001 10011 111000 ...10 ..... @zz_1x4 +UQCVTN_sb 11000001 001 10011 111000 ...11 ..... @zz_1x4 +SQCVTUN_sb 11000001 011 10011 111000 ...10 ..... @zz_1x4 + +SQCVTN_dh 11000001 101 10011 111000 ...10 ..... @zz_1x4 +UQCVTN_dh 11000001 101 10011 111000 ...11 ..... @zz_1x4 +SQCVTUN_dh 11000001 111 10011 111000 ...10 ..... @zz_1x4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvt_n)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define SQCVT2(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(s0[HW(i)]); \ + d[HN(i + n)] = SAT(s1[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVT2(sme2_sqcvt_sh, int32_t, int16_t, H4, H2, do_ssat_h) +SQCVT2(sme2_uqcvt_sh, uint32_t, uint16_t, H4, H2, do_usat_h) +SQCVT2(sme2_sqcvtu_sh, int32_t, uint16_t, H4, H2, do_usat_h) + +#undef SQCVT2 + +#define SQCVT4(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(s0[HW(i)]); \ + d[HN(i + n)] = SAT(s1[HW(i)]); \ + d[HN(i + 2 * n)] = SAT(s2[HW(i)]); \ + d[HN(i + 3 * n)] = SAT(s3[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVT4(sme2_sqcvt_sb, int32_t, int8_t, H4, H2, do_ssat_b) +SQCVT4(sme2_uqcvt_sb, uint32_t, uint8_t, H4, H2, do_usat_b) +SQCVT4(sme2_sqcvtu_sb, int32_t, uint8_t, H4, H2, do_usat_b) + +SQCVT4(sme2_sqcvt_dh, int64_t, int16_t, H8, H2, do_ssat_h) +SQCVT4(sme2_uqcvt_dh, uint64_t, uint16_t, H8, H2, do_usat_h) +SQCVT4(sme2_sqcvtu_dh, int64_t, uint16_t, H8, H2, do_usat_h) + +#undef SQCVT4 + /* Convert and interleave */ void HELPER(sme2_bfcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define SQCVTN2(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(2 * i + 0)] = SAT(s0[HW(i)]); \ + d[HN(2 * i + 1)] = SAT(s1[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVTN2(sme2_sqcvtn_sh, int32_t, int16_t, H4, H2, do_ssat_h) +SQCVTN2(sme2_uqcvtn_sh, uint32_t, uint16_t, H4, H2, do_usat_h) +SQCVTN2(sme2_sqcvtun_sh, int32_t, uint16_t, H4, H2, do_usat_h) + +#undef SQCVTN2 + +#define SQCVTN4(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(4 * i + 0)] = SAT(s0[HW(i)]); \ + d[HN(4 * i + 1)] = SAT(s1[HW(i)]); \ + d[HN(4 * i + 2)] = SAT(s2[HW(i)]); \ + d[HN(4 * i + 3)] = SAT(s3[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVTN4(sme2_sqcvtn_sb, int32_t, int8_t, H4, H1, do_ssat_b) +SQCVTN4(sme2_uqcvtn_sb, uint32_t, uint8_t, H4, H1, do_usat_b) +SQCVTN4(sme2_sqcvtun_sb, int32_t, uint8_t, H4, H1, do_usat_b) + +SQCVTN4(sme2_sqcvtn_dh, int64_t, int16_t, H8, H2, do_ssat_h) +SQCVTN4(sme2_uqcvtn_dh, uint64_t, uint16_t, H8, H2, do_usat_h) +SQCVTN4(sme2_sqcvtun_dh, int64_t, uint16_t, H8, H2, do_usat_h) + +#undef SQCVTN4 + /* Expand and convert */ void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINTM, aa64_sme2, do_zz_fpst, a, float_round_down, FPST_A64, gen_helper_gvec_vrint_rm_s) TRANS_FEAT(FRINTA, aa64_sme2, do_zz_fpst, a, float_round_ties_away, FPST_A64, gen_helper_gvec_vrint_rm_s) + +static bool do_zz(DisasContext *s, arg_zz_n *a, int data, + gen_helper_gvec_2 *fn) +{ + if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + + for (int i = 0, n = a->n; i < n; ++i) { + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->zd + i), + vec_full_reg_offset(s, a->zn + i), + svl, svl, data, fn); + } + } + return true; +} + +TRANS_FEAT(SQCVT_sh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvt_sh) +TRANS_FEAT(UQCVT_sh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvt_sh) +TRANS_FEAT(SQCVTU_sh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtu_sh) + +TRANS_FEAT(SQCVT_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvt_sb) +TRANS_FEAT(UQCVT_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvt_sb) +TRANS_FEAT(SQCVTU_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtu_sb) + +TRANS_FEAT(SQCVT_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvt_dh) +TRANS_FEAT(UQCVT_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvt_dh) +TRANS_FEAT(SQCVTU_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtu_dh) + +TRANS_FEAT(SQCVTN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtn_sb) +TRANS_FEAT(UQCVTN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvtn_sb) +TRANS_FEAT(SQCVTUN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_sb) + +TRANS_FEAT(SQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtn_dh) +TRANS_FEAT(UQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvtn_dh) +TRANS_FEAT(SQCVTUN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_dh) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-61-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 20 ++++++++++++++++---- target/arm/tcg/translate-sve.c | 7 +++++++ 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ # as propagated via the MOVPRFX instruction. %reg_movprfx 0:5 +%rn_ax2 6:4 !function=times_2 + ########################################################################### # Named attribute sets. These are used to make nice(er) names # when creating helpers common to those for the individual @@ -XXX,XX +XXX,XX @@ # Two operand @pd_pn ........ esz:2 .. .... ....... rn:4 . rd:4 &rr_esz @rd_rn ........ esz:2 ...... ...... rn:5 rd:5 &rr_esz +@rd_rnx2 ........ ... ..... ...... ..... rd:5 &rr_esz rn=%rn_ax2 # Two operand with governing predicate, flags setting @pd_pg_pn_s ........ . s:1 ...... .. pg:4 . rn:4 . rd:4 &rpr_s @@ -XXX,XX +XXX,XX @@ UABA 01000101 .. 0 ..... 11111 1 ..... ..... @rd_rn_rm #### SVE2 Narrowing ## SVE2 saturating extract narrow - # Bits 23, 18-16 are zero, limited in the translator via esz < 3 & imm == 0. -SQXTNB 01000101 .. 1 ..... 010 000 ..... ..... @rd_rn_tszimm_shl + +{ + SQCVTN_sh 01000101 00 1 10001 010 000 ....0 ..... @rd_rnx2 esz=1 + SQXTNB 01000101 .. 1 ..... 010 000 ..... ..... @rd_rn_tszimm_shl +} SQXTNT 01000101 .. 1 ..... 010 001 ..... ..... @rd_rn_tszimm_shl -UQXTNB 01000101 .. 1 ..... 010 010 ..... ..... @rd_rn_tszimm_shl +{ + UQCVTN_sh 01000101 00 1 10001 010 010 ....0 ..... @rd_rnx2 esz=1 + UQXTNB 01000101 .. 1 ..... 010 010 ..... ..... @rd_rn_tszimm_shl +} UQXTNT 01000101 .. 1 ..... 010 011 ..... ..... @rd_rn_tszimm_shl -SQXTUNB 01000101 .. 1 ..... 010 100 ..... ..... @rd_rn_tszimm_shl +{ + SQCVTUN_sh 01000101 00 1 10001 010 100 ....0 ..... @rd_rnx2 esz=1 + SQXTUNB 01000101 .. 1 ..... 010 100 ..... ..... @rd_rn_tszimm_shl +} SQXTUNT 01000101 .. 1 ..... 010 101 ..... ..... @rd_rn_tszimm_shl ## SVE2 bitwise shift right narrow diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, } TRANS_FEAT(UCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_uclamp, a) + +TRANS_FEAT(SQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, + gen_helper_sme2_sqcvtn_sh, a->rd, a->rn, 0) +TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, + gen_helper_sme2_uqcvtn_sh, a->rd, a->rn, 0) +TRANS_FEAT(SQCVTUN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, + gen_helper_sme2_sqcvtun_sh, a->rd, a->rn, 0) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-62-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 13 ++++++++++++ target/arm/tcg/sme.decode | 18 ++++++++++++++++ target/arm/tcg/sme_helper.c | 38 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 16 ++++++++++++++ 4 files changed, 85 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_sqcvtun_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqcvtun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sunpk2_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk2_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk2_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk2_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk2_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk2_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n n=2 zd=%zd_ax2 zn=%zn_ax2 @zz_4x4 ........ ... ..... ...... .... . ..... \ &zz_n n=4 zd=%zd_ax4 zn=%zn_ax4 +@zz_4x2_n1 ........ ... ..... ...... .... . ..... \ + &zz_n n=1 zd=%zd_ax4 zn=%zn_ax2 BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 @@ -XXX,XX +XXX,XX @@ SQCVTUN_sb 11000001 011 10011 111000 ...10 ..... @zz_1x4 SQCVTN_dh 11000001 101 10011 111000 ...10 ..... @zz_1x4 UQCVTN_dh 11000001 101 10011 111000 ...11 ..... @zz_1x4 SQCVTUN_dh 11000001 111 10011 111000 ...10 ..... @zz_1x4 + +SUNPK_2bh 11000001 011 00101 111000 ..... ....0 @zz_2x1 +SUNPK_2hs 11000001 101 00101 111000 ..... ....0 @zz_2x1 +SUNPK_2sd 11000001 111 00101 111000 ..... ....0 @zz_2x1 + +UUNPK_2bh 11000001 011 00101 111000 ..... ....1 @zz_2x1 +UUNPK_2hs 11000001 101 00101 111000 ..... ....1 @zz_2x1 +UUNPK_2sd 11000001 111 00101 111000 ..... ....1 @zz_2x1 + +SUNPK_4bh 11000001 011 10101 111000 ....0 ...00 @zz_4x2_n1 +SUNPK_4hs 11000001 101 10101 111000 ....0 ...00 @zz_4x2_n1 +SUNPK_4sd 11000001 111 10101 111000 ....0 ...00 @zz_4x2_n1 + +UUNPK_4bh 11000001 011 10101 111000 ....0 ...01 @zz_4x2_n1 +UUNPK_4hs 11000001 101 10101 111000 ....0 ...01 @zz_4x2_n1 +UUNPK_4sd 11000001 111 10101 111000 ....0 ...01 @zz_4x2_n1 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define UNPK(NAME, SREG, TW, TN, HW, HN) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch[SREG]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t n = oprsz / sizeof(TW); \ + if (vectors_overlap(vd, 2 * SREG, vs, SREG)) { \ + vs = memcpy(scratch, vs, sizeof(scratch)); \ + } \ + for (size_t r = 0; r < SREG; ++r) { \ + TN *s = vs + r * sizeof(ARMVectorReg); \ + for (size_t i = 0; i < 2; ++i) { \ + TW *d = vd + (2 * r + i) * sizeof(ARMVectorReg); \ + for (size_t e = 0; e < n; ++e) { \ + d[HW(e)] = s[HN(i * n + e)]; \ + } \ + } \ + } \ +} + +UNPK(sme2_sunpk2_bh, 1, int16_t, int8_t, H2, H1) +UNPK(sme2_sunpk2_hs, 1, int32_t, int16_t, H4, H2) +UNPK(sme2_sunpk2_sd, 1, int64_t, int32_t, H8, H4) + +UNPK(sme2_sunpk4_bh, 2, int16_t, int8_t, H2, H1) +UNPK(sme2_sunpk4_hs, 2, int32_t, int16_t, H4, H2) +UNPK(sme2_sunpk4_sd, 2, int64_t, int32_t, H8, H4) + +UNPK(sme2_uunpk2_bh, 1, uint16_t, uint8_t, H2, H1) +UNPK(sme2_uunpk2_hs, 1, uint32_t, uint16_t, H4, H2) +UNPK(sme2_uunpk2_sd, 1, uint64_t, uint32_t, H8, H4) + +UNPK(sme2_uunpk4_bh, 2, uint16_t, uint8_t, H2, H1) +UNPK(sme2_uunpk4_hs, 2, uint32_t, uint16_t, H4, H2) +UNPK(sme2_uunpk4_sd, 2, uint64_t, uint32_t, H8, H4) + +#undef UNPK + /* Deinterleave and convert. */ void HELPER(sme2_fcvtl)(void *vd, void *vs, float_status *fpst, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQCVTUN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_sb) TRANS_FEAT(SQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtn_dh) TRANS_FEAT(UQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvtn_dh) TRANS_FEAT(SQCVTUN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_dh) + +TRANS_FEAT(SUNPK_2bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk2_bh) +TRANS_FEAT(SUNPK_2hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk2_hs) +TRANS_FEAT(SUNPK_2sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk2_sd) + +TRANS_FEAT(SUNPK_4bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk4_bh) +TRANS_FEAT(SUNPK_4hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk4_hs) +TRANS_FEAT(SUNPK_4sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk4_sd) + +TRANS_FEAT(UUNPK_2bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_bh) +TRANS_FEAT(UUNPK_2hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_hs) +TRANS_FEAT(UUNPK_2sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_sd) + +TRANS_FEAT(UUNPK_4bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_bh) +TRANS_FEAT(UUNPK_4hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_hs) +TRANS_FEAT(UUNPK_4sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_sd) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-63-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 12 ++++++ target/arm/tcg/sme.decode | 11 ++++++ target/arm/tcg/sme_helper.c | 68 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 39 +++++++++++++++++++ 4 files changed, 130 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_uunpk2_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_zip4_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_uzp4_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 ### SME2 Multi-vector SVE Constructive Unary +&zz_e zd zn esz &zz_n zd zn n @zz_1x2 ........ ... ..... ...... ..... zd:5 \ &zz_n n=1 zn=%zn_ax2 @@ -XXX,XX +XXX,XX @@ SUNPK_4sd 11000001 111 10101 111000 ....0 ...00 @zz_4x2_n1 UUNPK_4bh 11000001 011 10101 111000 ....0 ...01 @zz_4x2_n1 UUNPK_4hs 11000001 101 10101 111000 ....0 ...01 @zz_4x2_n1 UUNPK_4sd 11000001 111 10101 111000 ....0 ...01 @zz_4x2_n1 + +ZIP_4 11000001 esz:2 1 10110 111000 ...00 ... 00 \ + &zz_e zd=%zd_ax4 zn=%zn_ax4 +ZIP_4 11000001 001 10111 111000 ...00 ... 00 \ + &zz_e esz=4 zd=%zd_ax4 zn=%zn_ax4 + +UZP_4 11000001 esz:2 1 10110 111000 ...00 ... 10 \ + &zz_e zd=%zd_ax4 zn=%zn_ax4 +UZP_4 11000001 001 10111 111000 ...00 ... 10 \ + &zz_e esz=4 zd=%zd_ax4 zn=%zn_ax4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_ucvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) d[i] = uint32_to_float32(s[i], fpst); } } + +#define ZIP4(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch[4]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t quads = oprsz / (sizeof(TYPE) * 4); \ + TYPE *s0, *s1, *s2, *s3; \ + if (vs == vd) { \ + vs = memcpy(scratch, vs, sizeof(scratch)); \ + } \ + s0 = vs; \ + s1 = vs + sizeof(ARMVectorReg); \ + s2 = vs + 2 * sizeof(ARMVectorReg); \ + s3 = vs + 3 * sizeof(ARMVectorReg); \ + for (size_t r = 0; r < 4; ++r) { \ + TYPE *d = vd + r * sizeof(ARMVectorReg); \ + size_t base = r * quads; \ + for (size_t q = 0; q < quads; ++q) { \ + d[H(4 * q + 0)] = s0[base + H(q)]; \ + d[H(4 * q + 1)] = s1[base + H(q)]; \ + d[H(4 * q + 2)] = s2[base + H(q)]; \ + d[H(4 * q + 3)] = s3[base + H(q)]; \ + } \ + } \ +} + +ZIP4(sme2_zip4_b, uint8_t, H1) +ZIP4(sme2_zip4_h, uint16_t, H2) +ZIP4(sme2_zip4_s, uint32_t, H4) +ZIP4(sme2_zip4_d, uint64_t, ) +ZIP4(sme2_zip4_q, Int128, ) + +#undef ZIP4 + +#define UZP4(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch[4]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t quads = oprsz / (sizeof(TYPE) * 4); \ + TYPE *d0, *d1, *d2, *d3; \ + if (vs == vd) { \ + vs = memcpy(scratch, vs, sizeof(scratch)); \ + } \ + d0 = vd; \ + d1 = vd + sizeof(ARMVectorReg); \ + d2 = vd + 2 * sizeof(ARMVectorReg); \ + d3 = vd + 3 * sizeof(ARMVectorReg); \ + for (size_t r = 0; r < 4; ++r) { \ + TYPE *s = vs + r * sizeof(ARMVectorReg); \ + size_t base = r * quads; \ + for (size_t q = 0; q < quads; ++q) { \ + d0[base + H(q)] = s[H(4 * q + 0)]; \ + d1[base + H(q)] = s[H(4 * q + 1)]; \ + d2[base + H(q)] = s[H(4 * q + 2)]; \ + d3[base + H(q)] = s[H(4 * q + 3)]; \ + } \ + } \ +} + +UZP4(sme2_uzp4_b, uint8_t, H1) +UZP4(sme2_uzp4_h, uint16_t, H2) +UZP4(sme2_uzp4_s, uint32_t, H4) +UZP4(sme2_uzp4_d, uint64_t, ) +UZP4(sme2_uzp4_q, Int128, ) + +#undef UZP4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UUNPK_2sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_sd) TRANS_FEAT(UUNPK_4bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_bh) TRANS_FEAT(UUNPK_4hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_hs) TRANS_FEAT(UUNPK_4sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_sd) + +static bool do_zipuzp_4(DisasContext *s, arg_zz_e *a, + gen_helper_gvec_2 * const fn[5]) +{ + int bytes_per_op = 4 << a->esz; + + /* Both MO_64 and MO_128 can fail the size test. */ + if (s->max_svl < bytes_per_op) { + unallocated_encoding(s); + } else if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + if (svl < bytes_per_op) { + unallocated_encoding(s); + } else { + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + svl, svl, 0, fn[a->esz]); + } + } + return true; +} + +static gen_helper_gvec_2 * const zip4_fns[] = { + gen_helper_sme2_zip4_b, + gen_helper_sme2_zip4_h, + gen_helper_sme2_zip4_s, + gen_helper_sme2_zip4_d, + gen_helper_sme2_zip4_q, +}; +TRANS_FEAT(ZIP_4, aa64_sme2, do_zipuzp_4, a, zip4_fns) + +static gen_helper_gvec_2 * const uzp4_fns[] = { + gen_helper_sme2_uzp4_b, + gen_helper_sme2_uzp4_h, + gen_helper_sme2_uzp4_s, + gen_helper_sme2_uzp4_d, + gen_helper_sme2_uzp4_q, +}; +TRANS_FEAT(UZP_4, aa64_sme2, do_zipuzp_4, a, uzp4_fns) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Unify two copies of these inline functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-64-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/vec_internal.h | 21 +++++++++++++++++++++ target/arm/tcg/mve_helper.c | 21 --------------------- target/arm/tcg/sve_helper.c | 21 --------------------- 3 files changed, 21 insertions(+), 42 deletions(-) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool); #define do_usat_h(val) MIN(MAX(val, 0), UINT16_MAX) #define do_usat_s(val) MIN(MAX(val, 0), UINT32_MAX) +static inline uint64_t do_urshr(uint64_t x, unsigned sh) +{ + if (likely(sh < 64)) { + return (x >> sh) + ((x >> (sh - 1)) & 1); + } else if (sh == 64) { + return x >> 63; + } else { + return 0; + } +} + +static inline int64_t do_srshr(int64_t x, unsigned sh) +{ + if (likely(sh < 64)) { + return (x >> sh) + ((x >> (sh - 1)) & 1); + } else { + /* Rounding the sign bit always produces 0. */ + return 0; + } +} + /** * bfdotadd: * @sum: addend diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSHLL_ALL(vshllt, true) DO_VSHRN(OP##tb, true, 1, uint8_t, 2, uint16_t, FN) \ DO_VSHRN(OP##th, true, 2, uint16_t, 4, uint32_t, FN) -static inline uint64_t do_urshr(uint64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else if (sh == 64) { - return x >> 63; - } else { - return 0; - } -} - -static inline int64_t do_srshr(int64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else { - /* Rounding the sign bit always produces 0. */ - return 0; - } -} - DO_VSHRN_ALL(vshrn, DO_SHR) DO_VSHRN_ALL(vrshrn, do_urshr) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc) \ when N is negative, add 2**M-1. */ #define DO_ASRD(N, M) ((N + (N < 0 ? ((__typeof(N))1 << M) - 1 : 0)) >> M) -static inline uint64_t do_urshr(uint64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else if (sh == 64) { - return x >> 63; - } else { - return 0; - } -} - -static inline int64_t do_srshr(int64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else { - /* Rounding the sign bit always produces 0. */ - return 0; - } -} - DO_ZPZI(sve_asr_zpzi_b, int8_t, H1, DO_SHR) DO_ZPZI(sve_asr_zpzi_h, int16_t, H1_2, DO_SHR) DO_ZPZI(sve_asr_zpzi_s, int32_t, H1_4, DO_SHR) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-65-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 20 ++++++ target/arm/tcg/sme.decode | 37 ++++++++++ target/arm/tcg/sme_helper.c | 120 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 33 +++++++++ 4 files changed, 210 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_uzp4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uzp4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uzp4_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uzp4_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sqrshr_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshr_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshru_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshr_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshr_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshru_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshr_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshr_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshru_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sqrshrn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshrn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrun_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshrn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrun_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UZP_4 11000001 esz:2 1 10110 111000 ...00 ... 10 \ &zz_e zd=%zd_ax4 zn=%zn_ax4 UZP_4 11000001 001 10111 111000 ...00 ... 10 \ &zz_e esz=4 zd=%zd_ax4 zn=%zn_ax4 + +### SME2 Multi-vector SVE Constructive Binary + +&rshr zd zn shift + +%rshr_sh_shift 16:4 !function=rsub_16 +%rshr_sb_shift 16:5 !function=rsub_32 +%rshr_dh_shift 22:1 16:5 !function=rsub_64 + +@rshr_sh ........ .... .... ...... ..... zd:5 \ + &rshr zn=%zn_ax2 shift=%rshr_sh_shift +@rshr_sb ........ ... ..... ...... ..... zd:5 \ + &rshr zn=%zn_ax4 shift=%rshr_sb_shift +@rshr_dh ........ ... ..... ...... ..... zd:5 \ + &rshr zn=%zn_ax4 shift=%rshr_dh_shift + +SQRSHR_sh 11000001 1110 .... 110101 ....0 ..... @rshr_sh +UQRSHR_sh 11000001 1110 .... 110101 ....1 ..... @rshr_sh +SQRSHRU_sh 11000001 1111 .... 110101 ....0 ..... @rshr_sh + +SQRSHR_sb 11000001 011 ..... 110110 ...00 ..... @rshr_sb +SQRSHR_dh 11000001 1.1 ..... 110110 ...00 ..... @rshr_dh +UQRSHR_sb 11000001 011 ..... 110110 ...01 ..... @rshr_sb +UQRSHR_dh 11000001 1.1 ..... 110110 ...01 ..... @rshr_dh +SQRSHRU_sb 11000001 011 ..... 110110 ...10 ..... @rshr_sb +SQRSHRU_dh 11000001 1.1 ..... 110110 ...10 ..... @rshr_dh + +SQRSHRN_sh 01000101 1011 .... 001010 ....0 ..... @rshr_sh +UQRSHRN_sh 01000101 1011 .... 001110 ....0 ..... @rshr_sh +SQRSHRUN_sh 01000101 1011 .... 000010 ....0 ..... @rshr_sh + +SQRSHRN_sb 11000001 011 ..... 110111 ...00 ..... @rshr_sb +SQRSHRN_dh 11000001 1.1 ..... 110111 ...00 ..... @rshr_dh +UQRSHRN_sb 11000001 011 ..... 110111 ...01 ..... @rshr_sb +UQRSHRN_dh 11000001 1.1 ..... 110111 ...01 ..... @rshr_dh +SQRSHRUN_sb 11000001 011 ..... 110111 ...10 ..... @rshr_sb +SQRSHRUN_dh 11000001 1.1 ..... 110111 ...10 ..... @rshr_dh diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ SQCVT4(sme2_sqcvtu_dh, int64_t, uint16_t, H8, H2, do_usat_h) #undef SQCVT4 +#define SQRSHR2(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(i + n)] = SAT(RSHR(s1[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHR2(sme2_sqrshr_sh, int32_t, int16_t, H4, H2, do_srshr, do_ssat_h) +SQRSHR2(sme2_uqrshr_sh, uint32_t, uint16_t, H4, H2, do_urshr, do_usat_h) +SQRSHR2(sme2_sqrshru_sh, int32_t, uint16_t, H4, H2, do_srshr, do_usat_h) + +#undef SQRSHR2 + +#define SQRSHR4(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(i + n)] = SAT(RSHR(s1[HW(i)], shift)); \ + d[HN(i + 2 * n)] = SAT(RSHR(s2[HW(i)], shift)); \ + d[HN(i + 3 * n)] = SAT(RSHR(s3[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHR4(sme2_sqrshr_sb, int32_t, int8_t, H4, H2, do_srshr, do_ssat_b) +SQRSHR4(sme2_uqrshr_sb, uint32_t, uint8_t, H4, H2, do_urshr, do_usat_b) +SQRSHR4(sme2_sqrshru_sb, int32_t, uint8_t, H4, H2, do_srshr, do_usat_b) + +SQRSHR4(sme2_sqrshr_dh, int64_t, int16_t, H8, H2, do_srshr, do_ssat_h) +SQRSHR4(sme2_uqrshr_dh, uint64_t, uint16_t, H8, H2, do_urshr, do_usat_h) +SQRSHR4(sme2_sqrshru_dh, int64_t, uint16_t, H8, H2, do_srshr, do_usat_h) + +#undef SQRSHR4 + /* Convert and interleave */ void HELPER(sme2_bfcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) { @@ -XXX,XX +XXX,XX @@ SQCVTN4(sme2_sqcvtun_dh, int64_t, uint16_t, H8, H2, do_usat_h) #undef SQCVTN4 +#define SQRSHRN2(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(2 * i + 0)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(2 * i + 1)] = SAT(RSHR(s1[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHRN2(sme2_sqrshrn_sh, int32_t, int16_t, H4, H2, do_srshr, do_ssat_h) +SQRSHRN2(sme2_uqrshrn_sh, uint32_t, uint16_t, H4, H2, do_urshr, do_usat_h) +SQRSHRN2(sme2_sqrshrun_sh, int32_t, uint16_t, H4, H2, do_srshr, do_usat_h) + +#undef SQRSHRN2 + +#define SQRSHRN4(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(4 * i + 0)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(4 * i + 1)] = SAT(RSHR(s1[HW(i)], shift)); \ + d[HN(4 * i + 2)] = SAT(RSHR(s2[HW(i)], shift)); \ + d[HN(4 * i + 3)] = SAT(RSHR(s3[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHRN4(sme2_sqrshrn_sb, int32_t, int8_t, H4, H1, do_srshr, do_ssat_b) +SQRSHRN4(sme2_uqrshrn_sb, uint32_t, uint8_t, H4, H1, do_urshr, do_usat_b) +SQRSHRN4(sme2_sqrshrun_sb, int32_t, uint8_t, H4, H1, do_srshr, do_usat_b) + +SQRSHRN4(sme2_sqrshrn_dh, int64_t, int16_t, H8, H2, do_srshr, do_ssat_h) +SQRSHRN4(sme2_uqrshrn_dh, uint64_t, uint16_t, H8, H2, do_urshr, do_usat_h) +SQRSHRN4(sme2_sqrshrun_dh, int64_t, uint16_t, H8, H2, do_srshr, do_usat_h) + +#undef SQRSHRN4 + /* Expand and convert */ void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2 * const uzp4_fns[] = { gen_helper_sme2_uzp4_q, }; TRANS_FEAT(UZP_4, aa64_sme2, do_zipuzp_4, a, uzp4_fns) + +static bool do_zz_rshr(DisasContext *s, arg_rshr *a, gen_helper_gvec_2 *fn) +{ + if (sve_access_check(s)) { + int vl = vec_full_reg_size(s); + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vl, vl, a->shift, fn); + } + return true; +} + +TRANS_FEAT(SQRSHR_sh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshr_sh) +TRANS_FEAT(UQRSHR_sh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshr_sh) +TRANS_FEAT(SQRSHRU_sh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshru_sh) + +TRANS_FEAT(SQRSHR_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshr_sb) +TRANS_FEAT(SQRSHR_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshr_dh) +TRANS_FEAT(UQRSHR_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshr_sb) +TRANS_FEAT(UQRSHR_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshr_dh) +TRANS_FEAT(SQRSHRU_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshru_sb) +TRANS_FEAT(SQRSHRU_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshru_dh) + +TRANS_FEAT(SQRSHRN_sh, aa64_sme2_or_sve2p1, do_zz_rshr, a, gen_helper_sme2_sqrshrn_sh) +TRANS_FEAT(UQRSHRN_sh, aa64_sme2_or_sve2p1, do_zz_rshr, a, gen_helper_sme2_uqrshrn_sh) +TRANS_FEAT(SQRSHRUN_sh, aa64_sme2_or_sve2p1, do_zz_rshr, a, gen_helper_sme2_sqrshrun_sh) + +TRANS_FEAT(SQRSHRN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrn_sb) +TRANS_FEAT(SQRSHRN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrn_dh) +TRANS_FEAT(UQRSHRN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_sb) +TRANS_FEAT(UQRSHRN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_dh) +TRANS_FEAT(SQRSHRUN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_sb) +TRANS_FEAT(SQRSHRUN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_dh) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-66-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 12 +++++++ target/arm/tcg/sme.decode | 12 +++++++ target/arm/tcg/sme_helper.c | 62 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 40 ++++++++++++++++++++++ 4 files changed, 126 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_uunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_uzp2_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(sme2_zip4_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_zip4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_zip4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UQRSHRN_sb 11000001 011 ..... 110111 ...01 ..... @rshr_sb UQRSHRN_dh 11000001 1.1 ..... 110111 ...01 ..... @rshr_dh SQRSHRUN_sb 11000001 011 ..... 110111 ...10 ..... @rshr_sb SQRSHRUN_dh 11000001 1.1 ..... 110111 ...10 ..... @rshr_dh + +&zzz_e zd zn zm esz + +ZIP_2 11000001 esz:2 1 zm:5 110100 zn:5 .... 0 \ + &zzz_e zd=%zd_ax2 +ZIP_2 11000001 00 1 zm:5 110101 zn:5 .... 0 \ + &zzz_e zd=%zd_ax2 esz=4 + +UZP_2 11000001 esz:2 1 zm:5 110100 zn:5 .... 1 \ + &zzz_e zd=%zd_ax2 +UZP_2 11000001 00 1 zm:5 110101 zn:5 .... 1 \ + &zzz_e zd=%zd_ax2 esz=4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_ucvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define ZIP2(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + ARMVectorReg scratch[2]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t pairs = oprsz / (sizeof(TYPE) * 2); \ + TYPE *n = vn, *m = vm; \ + if (vectors_overlap(vd, 2, vn, 1)) { \ + n = memcpy(&scratch[0], vn, oprsz); \ + } \ + if (vectors_overlap(vd, 2, vm, 1)) { \ + m = memcpy(&scratch[1], vm, oprsz); \ + } \ + for (size_t r = 0; r < 2; ++r) { \ + TYPE *d = vd + r * sizeof(ARMVectorReg); \ + size_t base = r * pairs; \ + for (size_t p = 0; p < pairs; ++p) { \ + d[H(2 * p + 0)] = n[base + H(p)]; \ + d[H(2 * p + 1)] = m[base + H(p)]; \ + } \ + } \ +} + +ZIP2(sme2_zip2_b, uint8_t, H1) +ZIP2(sme2_zip2_h, uint16_t, H2) +ZIP2(sme2_zip2_s, uint32_t, H4) +ZIP2(sme2_zip2_d, uint64_t, ) +ZIP2(sme2_zip2_q, Int128, ) + +#undef ZIP2 + #define ZIP4(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ { \ @@ -XXX,XX +XXX,XX @@ ZIP4(sme2_zip4_q, Int128, ) #undef ZIP4 +#define UZP2(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + ARMVectorReg scratch[2]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t pairs = oprsz / (sizeof(TYPE) * 2); \ + TYPE *d0 = vd, *d1 = vd + sizeof(ARMVectorReg); \ + if (vectors_overlap(vd, 2, vn, 1)) { \ + vn = memcpy(&scratch[0], vn, oprsz); \ + } \ + if (vectors_overlap(vd, 2, vm, 1)) { \ + vm = memcpy(&scratch[1], vm, oprsz); \ + } \ + for (size_t r = 0; r < 2; ++r) { \ + TYPE *s = r ? vm : vn; \ + size_t base = r * pairs; \ + for (size_t p = 0; p < pairs; ++p) { \ + d0[base + H(p)] = s[H(2 * p + 0)]; \ + d1[base + H(p)] = s[H(2 * p + 1)]; \ + } \ + } \ +} + +UZP2(sme2_uzp2_b, uint8_t, H1) +UZP2(sme2_uzp2_h, uint16_t, H2) +UZP2(sme2_uzp2_s, uint32_t, H4) +UZP2(sme2_uzp2_d, uint64_t, ) +UZP2(sme2_uzp2_q, Int128, ) + +#undef UZP2 + #define UZP4(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ { \ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UQRSHRN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_sb) TRANS_FEAT(UQRSHRN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_dh) TRANS_FEAT(SQRSHRUN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_sb) TRANS_FEAT(SQRSHRUN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_dh) + +static bool do_zipuzp_2(DisasContext *s, arg_zzz_e *a, + gen_helper_gvec_3 * const fn[5]) +{ + int bytes_per_op = 2 << a->esz; + + /* MO_128 can fail the size test. */ + if (s->max_svl < bytes_per_op) { + unallocated_encoding(s); + } else if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + if (svl < bytes_per_op) { + unallocated_encoding(s); + } else { + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vec_full_reg_offset(s, a->zm), + svl, svl, 0, fn[a->esz]); + } + } + return true; +} + +static gen_helper_gvec_3 * const zip2_fns[] = { + gen_helper_sme2_zip2_b, + gen_helper_sme2_zip2_h, + gen_helper_sme2_zip2_s, + gen_helper_sme2_zip2_d, + gen_helper_sme2_zip2_q, +}; +TRANS_FEAT(ZIP_2, aa64_sme2, do_zipuzp_2, a, zip2_fns) + +static gen_helper_gvec_3 * const uzp2_fns[] = { + gen_helper_sme2_uzp2_b, + gen_helper_sme2_uzp2_h, + gen_helper_sme2_uzp2_s, + gen_helper_sme2_uzp2_d, + gen_helper_sme2_uzp2_q, +}; +TRANS_FEAT(UZP_2, aa64_sme2, do_zipuzp_2, a, uzp2_fns) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-67-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 15 +++++++ target/arm/tcg/sme.decode | 17 ++++++++ target/arm/tcg/sme_helper.c | 56 +++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 75 ++++++++++++++++++++++++++++++++++ 4 files changed, 163 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_sqrshrun_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqrshrun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_sclamp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_sclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_sclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_sclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_uclamp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sme2_fclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sme2_fclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sme2_fclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sme2_bfclamp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UZP_2 11000001 esz:2 1 zm:5 110100 zn:5 .... 1 \ &zzz_e zd=%zd_ax2 UZP_2 11000001 00 1 zm:5 110101 zn:5 .... 1 \ &zzz_e zd=%zd_ax2 esz=4 + +&zzz_en zd zn zm esz n + +FCLAMP 11000001 esz:2 1 zm:5 110000 zn:5 .... 0 \ + &zzz_en zd=%zd_ax2 n=2 +FCLAMP 11000001 esz:2 1 zm:5 110010 zn:5 ...0 0 \ + &zzz_en zd=%zd_ax4 n=4 + +SCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 0 \ + &zzz_en zd=%zd_ax2 n=2 +SCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 0 \ + &zzz_en zd=%zd_ax4 n=4 + +UCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 1 \ + &zzz_en zd=%zd_ax2 n=2 +UCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 1 \ + &zzz_en zd=%zd_ax4 n=4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ UZP4(sme2_uzp4_d, uint64_t, ) UZP4(sme2_uzp4_q, Int128, ) #undef UZP4 + +#define ICLAMP(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + size_t stride = sizeof(ARMVectorReg) / sizeof(TYPE); \ + size_t elements = simd_oprsz(desc) / sizeof(TYPE); \ + size_t nreg = simd_data(desc); \ + TYPE *d = vd, *n = vn, *m = vm; \ + for (size_t e = 0; e < elements; e++) { \ + TYPE nn = n[H(e)], mm = m[H(e)]; \ + for (size_t r = 0; r < nreg; r++) { \ + TYPE *dd = &d[r * stride + H(e)]; \ + *dd = MIN(MAX(*dd, nn), mm); \ + } \ + } \ +} + +ICLAMP(sme2_sclamp_b, int8_t, H1) +ICLAMP(sme2_sclamp_h, int16_t, H2) +ICLAMP(sme2_sclamp_s, int32_t, H4) +ICLAMP(sme2_sclamp_d, int64_t, H8) + +ICLAMP(sme2_uclamp_b, uint8_t, H1) +ICLAMP(sme2_uclamp_h, uint16_t, H2) +ICLAMP(sme2_uclamp_s, uint32_t, H4) +ICLAMP(sme2_uclamp_d, uint64_t, H8) + +#undef ICLAMP + +/* + * Note the argument ordering to minnum and maxnum must match + * the ARM pseudocode so that NaNs are propagated properly. + */ +#define FCLAMP(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, \ + float_status *fpst, uint32_t desc) \ +{ \ + size_t stride = sizeof(ARMVectorReg) / sizeof(TYPE); \ + size_t elements = simd_oprsz(desc) / sizeof(TYPE); \ + size_t nreg = simd_data(desc); \ + TYPE *d = vd, *n = vn, *m = vm; \ + for (size_t e = 0; e < elements; e++) { \ + TYPE nn = n[H(e)], mm = m[H(e)]; \ + for (size_t r = 0; r < nreg; r++) { \ + TYPE *dd = &d[r * stride + H(e)]; \ + *dd = TYPE##_minnum(TYPE##_maxnum(nn, *dd, fpst), mm, fpst); \ + } \ + } \ +} + +FCLAMP(sme2_fclamp_h, float16, H2) +FCLAMP(sme2_fclamp_s, float32, H4) +FCLAMP(sme2_fclamp_d, float64, H8) +FCLAMP(sme2_bfclamp, bfloat16, H2) + +#undef FCLAMP diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const uzp2_fns[] = { gen_helper_sme2_uzp2_q, }; TRANS_FEAT(UZP_2, aa64_sme2, do_zipuzp_2, a, uzp2_fns) + +static bool trans_FCLAMP(DisasContext *s, arg_zzz_en *a) +{ + static gen_helper_gvec_3_ptr * const fn[] = { + gen_helper_sme2_bfclamp, + gen_helper_sme2_fclamp_h, + gen_helper_sme2_fclamp_s, + gen_helper_sme2_fclamp_d, + }; + TCGv_ptr fpst; + int vl; + + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + /* This insn uses MO_8 to encode BFloat16. */ + if (a->esz == MO_8 && !dc_isar_feature(aa64_sme_b16b16, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_A64_F16 : FPST_A64); + vl = vec_full_reg_size(s); + + tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vec_full_reg_offset(s, a->zm), + fpst, vl, vl, a->n, fn[a->esz]); + return true; +} + +static bool do_clamp(DisasContext *s, arg_zzz_en *a, + gen_helper_gvec_3 * const fn[4]) +{ + int vl; + + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + /* + * Clamp is just a min+max, easily supported by most host + * vector operations -- we already have such an expansion in + * translate-sve.c for a single output. + * TODO: Add support in gvec for multiple simultaneous output, + * and/or copy to temporary upon overlap. + */ + vl = vec_full_reg_size(s); + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vec_full_reg_offset(s, a->zm), + vl, vl, a->n, fn[a->esz]); + return true; +} + +static gen_helper_gvec_3 * const sclamp_fns[] = { + gen_helper_sme2_sclamp_b, + gen_helper_sme2_sclamp_h, + gen_helper_sme2_sclamp_s, + gen_helper_sme2_sclamp_d, +}; +TRANS(SCLAMP, do_clamp, a, sclamp_fns) + +static gen_helper_gvec_3 * const uclamp_fns[] = { + gen_helper_sme2_uclamp_b, + gen_helper_sme2_uclamp_h, + gen_helper_sme2_uclamp_s, + gen_helper_sme2_uclamp_d, +}; +TRANS(UCLAMP, do_clamp, a, uclamp_fns) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> These instructions are present in both SME(1) and SVE2.1 extensions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-68-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static void gen_sclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &ops[vece]); } -TRANS_FEAT(SCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_sclamp, a) +TRANS_FEAT(SCLAMP, aa64_sme_or_sve2p1, gen_gvec_fn_arg_zzzz, gen_sclamp, a) static void gen_uclamp_i32(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_i32 a) { @@ -XXX,XX +XXX,XX @@ static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &ops[vece]); } -TRANS_FEAT(UCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_uclamp, a) +TRANS_FEAT(UCLAMP, aa64_sme_or_sve2p1, gen_gvec_fn_arg_zzzz, gen_uclamp, a) TRANS_FEAT(SQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_sqcvtn_sh, a->rd, a->rn, 0) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> This is the single vector version within SVE decode space. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-69-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 2 ++ target/arm/tcg/translate-sve.c | 22 ++++++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ PSEL 00100101 .1 1 000 .. 01 .... 0 .... 0 .... \ SCLAMP 01000100 .. 0 ..... 110000 ..... ..... @rda_rn_rm UCLAMP 01000100 .. 0 ..... 110001 ..... ..... @rda_rn_rm + +FCLAMP 01100100 .. 1 ..... 001001 ..... ..... @rda_rn_rm diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, TRANS_FEAT(UCLAMP, aa64_sme_or_sve2p1, gen_gvec_fn_arg_zzzz, gen_uclamp, a) +static bool trans_FCLAMP(DisasContext *s, arg_FCLAMP *a) +{ + static gen_helper_gvec_3_ptr * const fn[] = { + gen_helper_sme2_bfclamp, + gen_helper_sme2_fclamp_h, + gen_helper_sme2_fclamp_s, + gen_helper_sme2_fclamp_d, + }; + + /* This insn uses MO_8 to encode BFloat16. */ + if (a->esz == MO_8 + ? !dc_isar_feature(aa64_sve_b16b16, s) + : !dc_isar_feature(aa64_sme2_or_sve2p1, s)) { + return false; + } + + /* So far we never optimize rda with MOVPRFX */ + assert(a->rd == a->ra); + return gen_gvec_fpst_zzz(s, fn[a->esz], a->rd, a->rn, a->rm, 1, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64); +} + TRANS_FEAT(SQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_sqcvtn_sh, a->rd, a->rn, 0) TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-70-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 23 +++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 20 ++++++++++++++++++++ 2 files changed, 43 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 1 \ &zzz_en zd=%zd_ax2 n=2 UCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 1 \ &zzz_en zd=%zd_ax4 n=4 + +### SME Multiple Zero + +&zero_za rv off ngrp nvec + +ZERO_za 11000000 000011 000 .. 0000000000 off:3 \ + &zero_za ngrp=2 nvec=1 rv=%mova_rv +ZERO_za 11000000 000011 100 .. 0000000000 off:3 \ + &zero_za ngrp=4 nvec=1 rv=%mova_rv + +ZERO_za 11000000 000011 001 .. 0000000000 ... \ + &zero_za ngrp=1 nvec=2 rv=%mova_rv off=%off3_x2 +ZERO_za 11000000 000011 010 .. 0000000000 0.. \ + &zero_za ngrp=2 nvec=2 rv=%mova_rv off=%off2_x2 +ZERO_za 11000000 000011 011 .. 0000000000 0.. \ + &zero_za ngrp=4 nvec=2 rv=%mova_rv off=%off2_x2 + +ZERO_za 11000000 000011 101 .. 0000000000 0.. \ + &zero_za ngrp=1 nvec=4 rv=%mova_rv off=%off2_x4 +ZERO_za 11000000 000011 110 .. 0000000000 00. \ + &zero_za ngrp=2 nvec=4 rv=%mova_rv off=%off1_x4 +ZERO_za 11000000 000011 111 .. 0000000000 00. \ + &zero_za ngrp=4 nvec=4 rv=%mova_rv off=%off1_x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_ZERO_zt0(DisasContext *s, arg_ZERO_zt0 *a) return true; } +static bool trans_ZERO_za(DisasContext *s, arg_ZERO_za *a) +{ + if (!dc_isar_feature(aa64_sme2p1, s)) { + return false; + } + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / a->ngrp; + TCGv_ptr t_za = get_zarray(s, a->rv, a->off, a->ngrp, a->nvec); + + for (int r = 0; r < a->ngrp; ++r) { + for (int i = 0; i < a->nvec; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + tcg_gen_gvec_dup_imm_var(MO_64, t_za, o_za, svl, svl, 0); + } + } + } + return true; +} + static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) { static gen_helper_gvec_4 * const h_fns[5] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> For WHILE, we have the count of enabled predicates, so we don't need to search to compute the PredTest result. Reuse the logic that will shortly be required for counted predicates. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-71-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 81 +++++++++++++++++++++---------------- 1 file changed, 46 insertions(+), 35 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } +/* C.f. Arm pseudocode PredCountTest */ +static uint32_t pred_count_test(uint32_t elements, uint32_t count, bool invert) +{ + uint32_t flags; + + if (count == 0) { + flags = 1; /* !N, Z, C */ + } else if (!invert) { + flags = (1u << 31) | 2; /* N, !Z */ + flags |= count != elements; /* C */ + } else { + flags = 2; /* !Z, !C */ + flags |= (count == elements) << 31; /* N */ + } + return flags; +} + uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) { intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; - uint32_t flags; - intptr_t i; + intptr_t i, oprbits = oprsz * 8; + + tcg_debug_assert(count <= oprbits); /* Begin with a zero predicate register. */ - flags = do_zero(d, oprsz); - if (count == 0) { - return flags; + do_zero(d, oprsz); + if (count) { + /* Set all of the requested bits. */ + for (i = 0; i < count / 64; ++i) { + d->p[i] = esz_mask; + } + if (count & 63) { + d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask; + } } - /* Set all of the requested bits. */ - for (i = 0; i < count / 64; ++i) { - d->p[i] = esz_mask; - } - if (count & 63) { - d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask; - } - - return predtest_ones(d, oprsz, esz_mask); + return pred_count_test(oprbits, count, false); } uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; - intptr_t i, invcount, oprbits; + intptr_t i, invcount, oprbits = oprsz * 8; uint64_t bits; - if (count == 0) { - return do_zero(d, oprsz); - } - - oprbits = oprsz * 8; tcg_debug_assert(count <= oprbits); - bits = esz_mask; - if (oprbits & 63) { - bits &= MAKE_64BIT_MASK(0, oprbits & 63); - } - - invcount = oprbits - count; - for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { - d->p[i] = bits; + /* Begin with a zero predicate register. */ + do_zero(d, oprsz); + if (count) { + /* Set all of the requested bits. */ bits = esz_mask; + if (oprbits & 63) { + bits &= MAKE_64BIT_MASK(0, oprbits & 63); + } + + invcount = oprbits - count; + for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { + d->p[i] = bits; + bits = esz_mask; + } + d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); } - d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); - - while (--i >= 0) { - d->p[i] = 0; - } - - return predtest_ones(d, oprsz, esz_mask); + return pred_count_test(oprbits, count, true); } /* Recursive reduction on a function; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Merge predtest_ones into its only caller. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-72-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 34 ++++++++++++++-------------------- 1 file changed, 14 insertions(+), 20 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkn)(void *vd, void *vn, void *vg, uint32_t pred_desc) } } -/* As if PredTest(Ones(PL), D, esz). */ -static uint32_t predtest_ones(ARMPredicateReg *d, intptr_t oprsz, - uint64_t esz_mask) -{ - uint32_t flags = PREDTEST_INIT; - intptr_t i; - - for (i = 0; i < oprsz / 8; i++) { - flags = iter_predtest_fwd(d->p[i], esz_mask, flags); - } - if (oprsz & 7) { - uint64_t mask = ~(-1ULL << (8 * (oprsz & 7))); - flags = iter_predtest_fwd(d->p[i], esz_mask & mask, flags); - } - return flags; -} - uint32_t HELPER(sve_brkns)(void *vd, void *vn, void *vg, uint32_t pred_desc) { intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); if (last_active_pred(vn, vg, oprsz)) { - return predtest_ones(vd, oprsz, -1); - } else { - return do_zero(vd, oprsz); + ARMPredicateReg *d = vd; + uint32_t flags = PREDTEST_INIT; + intptr_t i; + + /* As if PredTest(Ones(PL), D, MO_8). */ + for (i = 0; i < oprsz / 8; i++) { + flags = iter_predtest_fwd(d->p[i], -1, flags); + } + if (oprsz & 7) { + uint64_t mask = ~(-1ULL << (8 * (oprsz & 7))); + flags = iter_predtest_fwd(d->p[i], mask, flags); + } + return flags; } + return do_zero(vd, oprsz); } uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Expand to memset plus the return value, when used. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-73-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static uint32_t compute_brks_m(uint64_t *d, uint64_t *n, uint64_t *g, return flags; } -static uint32_t do_zero(ARMPredicateReg *d, intptr_t oprsz) -{ - /* It is quicker to zero the whole predicate than loop on OPRSZ. - * The compiler should turn this into 4 64-bit integer stores. - */ - memset(d, 0, sizeof(ARMPredicateReg)); - return PREDTEST_INIT; -} - void HELPER(sve_brkpa)(void *vd, void *vn, void *vm, void *vg, uint32_t pred_desc) { @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkpa)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { compute_brk_z(vd, vm, vg, oprsz, true); } else { - do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); } } @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_brkpas)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { return compute_brks_z(vd, vm, vg, oprsz, true); } else { - return do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); + return PREDTEST_INIT; } } @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkpb)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { compute_brk_z(vd, vm, vg, oprsz, false); } else { - do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); } } @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_brkpbs)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { return compute_brks_z(vd, vm, vg, oprsz, false); } else { - return do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); + return PREDTEST_INIT; } } @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkn)(void *vd, void *vn, void *vg, uint32_t pred_desc) { intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); if (!last_active_pred(vn, vg, oprsz)) { - do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); } } @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_brkns)(void *vd, void *vn, void *vg, uint32_t pred_desc) } return flags; } - return do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); + return PREDTEST_INIT; } uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) tcg_debug_assert(count <= oprbits); /* Begin with a zero predicate register. */ - do_zero(d, oprsz); + memset(d, 0, sizeof(*d)); if (count) { /* Set all of the requested bits. */ for (i = 0; i < count / 64; ++i) { @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) tcg_debug_assert(count <= oprbits); /* Begin with a zero predicate register. */ - do_zero(d, oprsz); + memset(d, 0, sizeof(*d)); if (count) { /* Set all of the requested bits. */ bits = esz_mask; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-74-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static uint32_t pred_count_test(uint32_t elements, uint32_t count, bool invert) return flags; } -uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) +/* D must be cleared on entry. */ +static void do_whilel(ARMPredicateReg *d, uint64_t esz_mask, + uint32_t count, uint32_t oprbits) { - intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); - intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); - uint64_t esz_mask = pred_esz_masks[esz]; - ARMPredicateReg *d = vd; - intptr_t i, oprbits = oprsz * 8; - tcg_debug_assert(count <= oprbits); - - /* Begin with a zero predicate register. */ - memset(d, 0, sizeof(*d)); if (count) { + uint32_t i; + /* Set all of the requested bits. */ for (i = 0; i < count / 64; ++i) { d->p[i] = esz_mask; @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask; } } +} +uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + memset(d, 0, sizeof(*d)); + do_whilel(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, false); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-75-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 39 +++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(oprbits, count, false); } -uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) +/* D must be cleared on entry. */ +static void do_whileg(ARMPredicateReg *d, uint64_t esz_mask, + uint32_t count, uint32_t oprbits) { - intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); - intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); - uint64_t esz_mask = pred_esz_masks[esz]; - ARMPredicateReg *d = vd; - intptr_t i, invcount, oprbits = oprsz * 8; - uint64_t bits; - tcg_debug_assert(count <= oprbits); - - /* Begin with a zero predicate register. */ - memset(d, 0, sizeof(*d)); if (count) { - /* Set all of the requested bits. */ - bits = esz_mask; - if (oprbits & 63) { - bits &= MAKE_64BIT_MASK(0, oprbits & 63); - } + uint32_t i, invcount = oprbits - count; + uint64_t bits = esz_mask & MAKE_64BIT_MASK(invcount & 63, 64); - invcount = oprbits - count; - for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { + for (i = invcount / 64; i < oprbits / 64; ++i) { d->p[i] = bits; bits = esz_mask; } - d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); + if (oprbits & 63) { + d->p[i] = bits & MAKE_64BIT_MASK(0, oprbits & 63); + } } +} +uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + memset(d, 0, sizeof(*d)); + do_whileg(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, true); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Change the API to pass element count rather than bit count. This will be helpful later for predicate as counter. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-76-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 2 ++ target/arm/tcg/translate-sve.c | 13 +++++-------- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; + count <<= esz; memset(d, 0, sizeof(*d)); do_whilel(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, false); @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; + count <<= esz; memset(d, 0, sizeof(*d)); do_whileg(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, true); diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) t2 = tcg_temp_new_i32(); tcg_gen_extrl_i64_i32(t2, t0); - /* Scale elements to bits. */ - tcg_gen_shli_i32(t2, t2, a->esz); - desc = FIELD_DP32(desc, PREDDESC, OPRSZ, vsz / 8); desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) op0 = read_cpu_reg(s, a->rn, 1); op1 = read_cpu_reg(s, a->rm, 1); - tmax = tcg_constant_i64(vsz); + tmax = tcg_constant_i64(vsz >> a->esz); diff = tcg_temp_new_i64(); if (a->rw) { @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) tcg_gen_sub_i64(diff, op0, op1); tcg_gen_sub_i64(t1, op1, op0); tcg_gen_movcond_i64(TCG_COND_GEU, diff, op0, op1, diff, t1); - /* Round down to a multiple of ESIZE. */ - tcg_gen_andi_i64(diff, diff, -1 << a->esz); + /* Divide, rounding down, by ESIZE. */ + tcg_gen_shri_i64(diff, diff, a->esz); /* If op1 == op0, diff == 0, and the condition is always true. */ tcg_gen_movcond_i64(TCG_COND_EQ, diff, op0, op1, tmax, diff); } else { /* WHILEWR */ tcg_gen_sub_i64(diff, op1, op0); - /* Round down to a multiple of ESIZE. */ - tcg_gen_andi_i64(diff, diff, -1 << a->esz); + /* Divide, rounding down, by ESIZE. */ + tcg_gen_shri_i64(diff, diff, a->esz); /* If op0 >= op1, diff <= 0, the condition is always true. */ tcg_gen_movcond_i64(TCG_COND_GEU, diff, op0, op1, tmax, diff); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Use TRANS_FEAT to select the correct predicate. Pass the helper and a boolean to do_WHILE. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-77-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 4 +++- target/arm/tcg/translate-sve.c | 23 +++++++++-------------- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SINCDECP_z 00100101 .. 1010 d:1 u:1 10000 00 .... ..... @incdec2_pred CTERM 00100101 1 sf:1 1 rm:5 001000 rn:5 ne:1 0000 # SVE integer compare scalar count and limit -WHILE 00100101 esz:2 1 rm:5 000 sf:1 u:1 lt:1 rn:5 eq:1 rd:4 +&while esz rd rn rm sf u eq +WHILE_lt 00100101 esz:2 1 rm:5 000 sf:1 u:1 1 rn:5 eq:1 rd:4 &while +WHILE_gt 00100101 esz:2 1 rm:5 000 sf:1 u:1 0 rn:5 eq:1 rd:4 &while # SVE2 pointer conflict compare WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a) return true; } -static bool trans_WHILE(DisasContext *s, arg_WHILE *a) +typedef void gen_while_fn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32); +static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) { TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2; @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) TCGCond cond; uint64_t maxval; /* Note that GE/HS has a->eq == 0 and GT/HI has a->eq == 1. */ - bool eq = a->eq == a->lt; + bool eq = a->eq == lt; - /* The greater-than conditions are all SVE2. */ - if (a->lt - ? !dc_isar_feature(aa64_sve, s) - : !dc_isar_feature(aa64_sve2, s)) { - return false; - } if (!sve_access_check(s)) { return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) t0 = tcg_temp_new_i64(); t1 = tcg_temp_new_i64(); - if (a->lt) { + if (lt) { tcg_gen_sub_i64(t0, op1, op0); if (a->u) { maxval = a->sf ? UINT64_MAX : UINT32_MAX; @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) ptr = tcg_temp_new_ptr(); tcg_gen_addi_ptr(ptr, tcg_env, pred_full_reg_offset(s, a->rd)); - if (a->lt) { - gen_helper_sve_whilel(t2, ptr, t2, tcg_constant_i32(desc)); - } else { - gen_helper_sve_whileg(t2, ptr, t2, tcg_constant_i32(desc)); - } + fn(t2, ptr, t2, tcg_constant_i32(desc)); + do_pred_flags(t2); return true; } +TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, gen_helper_sve_whilel) +TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, gen_helper_sve_whileg) + static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) { TCGv_i64 op0, op1, diff, t1, tmax; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-78-richard.henderson@linaro.org This instruction is present in both SME(1) and SVE2.1 extensions. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_PSEL(DisasContext *s, arg_psel *a) TCGv_i64 tmp, didx, dbit; TCGv_ptr ptr; - if (!dc_isar_feature(aa64_sme, s)) { + if (!dc_isar_feature(aa64_sme_or_sve2p1, s)) { return false; } if (!sve_access_check(s)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-79-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 3 +++ target/arm/tcg/sve.decode | 8 +++++++ target/arm/tcg/sve_helper.c | 40 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 14 ++++++++---- 4 files changed, 61 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_whilel, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_while2l, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_while2g, TCG_CALL_NO_RWG, i32, ptr, i32, i32) + DEF_HELPER_FLAGS_4(sve_subri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ WHILE_gt 00100101 esz:2 1 rm:5 000 sf:1 u:1 0 rn:5 eq:1 rd:4 &while # SVE2 pointer conflict compare WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 +# SVE2.1 predicate pair +%pd_pair 1:3 !function=times_2 +@while_pair ........ esz:2 . rm:5 .... u:1 . rn:5 . ... eq:1 \ + &while rd=%pd_pair sf=1 + +WHILE_lt_pair 00100101 .. 1 ..... 0101 . 1 ..... 1 ... . @while_pair +WHILE_gt_pair 00100101 .. 1 ..... 0101 . 0 ..... 1 ... . @while_pair + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(oprbits, count, false); } +uint32_t HELPER(sve_while2l)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + count <<= esz; + memset(d, 0, 2 * sizeof(*d)); + if (count <= oprbits) { + do_whilel(&d[0], esz_mask, count, oprbits); + } else { + do_whilel(&d[0], esz_mask, oprbits, oprbits); + do_whilel(&d[1], esz_mask, count - oprbits, oprbits); + } + + return pred_count_test(2 * oprbits, count, false); +} + /* D must be cleared on entry. */ static void do_whileg(ARMPredicateReg *d, uint64_t esz_mask, uint32_t count, uint32_t oprbits) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(oprbits, count, true); } +uint32_t HELPER(sve_while2g)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + count <<= esz; + memset(d, 0, 2 * sizeof(*d)); + if (count <= oprbits) { + do_whileg(&d[1], esz_mask, count, oprbits); + } else { + do_whilel(&d[1], esz_mask, oprbits, oprbits); + do_whileg(&d[0], esz_mask, count - oprbits, oprbits); + } + + return pred_count_test(2 * oprbits, count, true); +} + /* Recursive reduction on a function; * C.f. the ARM ARM function ReducePredicated. * diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a) } typedef void gen_while_fn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32); -static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) +static bool do_WHILE(DisasContext *s, arg_while *a, + bool lt, int scale, gen_while_fn *fn) { TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2; @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) } } - tmax = tcg_constant_i64(vsz >> a->esz); + tmax = tcg_constant_i64((vsz << scale) >> a->esz); if (eq) { /* Equality means one more iteration. */ tcg_gen_addi_i64(t0, t0, 1); @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) return true; } -TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, gen_helper_sve_whilel) -TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, gen_helper_sve_whileg) +TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, 0, gen_helper_sve_whilel) +TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, 0, gen_helper_sve_whileg) + +TRANS_FEAT(WHILE_lt_pair, aa64_sme2_or_sve2p1, do_WHILE, + a, true, 1, gen_helper_sve_while2l) +TRANS_FEAT(WHILE_gt_pair, aa64_sme2_or_sve2p1, do_WHILE, + a, false, 1, gen_helper_sve_while2g) static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-80-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 3 ++ target/arm/tcg/sve.decode | 11 +++++++ target/arm/tcg/sve_helper.c | 53 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 22 ++++++++++---- 4 files changed, 84 insertions(+), 5 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_while2l, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_while2g, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_whilecl, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_whilecg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) + DEF_HELPER_FLAGS_4(sve_subri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ %rn_ax2 6:4 !function=times_2 +%pnd 0:3 !function=plus_8 + ########################################################################### # Named attribute sets. These are used to make nice(er) names # when creating helpers common to those for the individual @@ -XXX,XX +XXX,XX @@ WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 WHILE_lt_pair 00100101 .. 1 ..... 0101 . 1 ..... 1 ... . @while_pair WHILE_gt_pair 00100101 .. 1 ..... 0101 . 0 ..... 1 ... . @while_pair +# SVE2.1 predicate as count +@while_cnt ........ esz:2 . rm:5 .... u:1 . rn:5 . eq:1 ... \ + &while rd=%pnd sf=1 + +WHILE_lt_cnt2 00100101 .. 1 ..... 0100 . 1 ..... 1 . ... @while_cnt +WHILE_lt_cnt4 00100101 .. 1 ..... 0110 . 1 ..... 1 . ... @while_cnt +WHILE_gt_cnt2 00100101 .. 1 ..... 0100 . 0 ..... 1 . ... @while_cnt +WHILE_gt_cnt4 00100101 .. 1 ..... 0110 . 0 ..... 1 . ... @while_cnt + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } +/* C.f. Arm pseudocode EncodePredCount */ +static uint64_t encode_pred_count(uint32_t elements, uint32_t count, + uint32_t esz, bool invert) +{ + uint32_t pred; + + if (count == 0) { + return 0; + } + if (invert) { + count = elements - count; + } else if (count == elements) { + count = 0; + invert = true; + } + + pred = (count << 1) | 1; + pred <<= esz; + pred |= invert << 15; + + return pred; +} + /* C.f. Arm pseudocode PredCountTest */ static uint32_t pred_count_test(uint32_t elements, uint32_t count, bool invert) { @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_while2l)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(2 * oprbits, count, false); } +uint32_t HELPER(sve_whilecl)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t pl = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t scale = FIELD_EX32(pred_desc, PREDDESC, DATA); + uint32_t vl = pl * 8; + uint32_t elements = (vl >> esz) << scale; + ARMPredicateReg *d = vd; + + *d = (ARMPredicateReg) { + .p[0] = encode_pred_count(elements, count, esz, false) + }; + return pred_count_test(elements, count, false); +} + /* D must be cleared on entry. */ static void do_whileg(ARMPredicateReg *d, uint64_t esz_mask, uint32_t count, uint32_t oprbits) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_while2g)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(2 * oprbits, count, true); } +uint32_t HELPER(sve_whilecg)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t pl = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t scale = FIELD_EX32(pred_desc, PREDDESC, DATA); + uint32_t vl = pl * 8; + uint32_t elements = (vl >> esz) << scale; + ARMPredicateReg *d = vd; + + *d = (ARMPredicateReg) { + .p[0] = encode_pred_count(elements, count, esz, true) + }; + return pred_count_test(elements, count, true); +} + /* Recursive reduction on a function; * C.f. the ARM ARM function ReducePredicated. * diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a) typedef void gen_while_fn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32); static bool do_WHILE(DisasContext *s, arg_while *a, - bool lt, int scale, gen_while_fn *fn) + bool lt, int scale, int data, gen_while_fn *fn) { TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2; @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, desc = FIELD_DP32(desc, PREDDESC, OPRSZ, vsz / 8); desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); + desc = FIELD_DP32(desc, PREDDESC, DATA, data); ptr = tcg_temp_new_ptr(); tcg_gen_addi_ptr(ptr, tcg_env, pred_full_reg_offset(s, a->rd)); @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, return true; } -TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, 0, gen_helper_sve_whilel) -TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, 0, gen_helper_sve_whileg) +TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, + a, true, 0, 0, gen_helper_sve_whilel) +TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, + a, false, 0, 0, gen_helper_sve_whileg) TRANS_FEAT(WHILE_lt_pair, aa64_sme2_or_sve2p1, do_WHILE, - a, true, 1, gen_helper_sve_while2l) + a, true, 1, 0, gen_helper_sve_while2l) TRANS_FEAT(WHILE_gt_pair, aa64_sme2_or_sve2p1, do_WHILE, - a, false, 1, gen_helper_sve_while2g) + a, false, 1, 0, gen_helper_sve_while2g) + +TRANS_FEAT(WHILE_lt_cnt2, aa64_sme2_or_sve2p1, do_WHILE, + a, true, 1, 1, gen_helper_sve_whilecl) +TRANS_FEAT(WHILE_lt_cnt4, aa64_sme2_or_sve2p1, do_WHILE, + a, true, 2, 2, gen_helper_sve_whilecl) +TRANS_FEAT(WHILE_gt_cnt2, aa64_sme2_or_sve2p1, do_WHILE, + a, false, 1, 1, gen_helper_sve_whilecg) +TRANS_FEAT(WHILE_gt_cnt4, aa64_sme2_or_sve2p1, do_WHILE, + a, false, 2, 2, gen_helper_sve_whilecg) static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-81-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 1 + target/arm/tcg/translate-sve.c | 16 ++++++++++++++++ 2 files changed, 17 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ PTEST 00100101 01 010000 11 pg:4 0 rn:4 0 0000 # SVE predicate initialize PTRUE 00100101 esz:2 01100 s:1 111000 pat:5 0 rd:4 +PTRUE_cnt 00100101 esz:2 1000000111100000010 ... rd=%pnd # SVE initialize FFR SETFFR 00100101 0010 1100 1001 0000 0000 0000 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag) TRANS_FEAT(PTRUE, aa64_sve, do_predset, a->esz, a->rd, a->pat, a->s) +static bool trans_PTRUE_cnt(DisasContext *s, arg_PTRUE_cnt *a) +{ + if (!dc_isar_feature(aa64_sme2_or_sve2p1, s)) { + return false; + } + if (sve_access_check(s)) { + /* Canonical TRUE is 0 count, invert bit, plus element size. */ + int val = (1 << 15) | (1 << a->esz); + + /* Write val to the first uint64_t; clear all of the rest. */ + tcg_gen_gvec_dup_imm(MO_64, pred_full_reg_offset(s, a->rd), + 8, size_for_gvec(pred_full_reg_size(s)), val); + } + return true; +} + /* Note pat == 31 is #all, to set all elements. */ TRANS_FEAT_NONSTREAMING(SETFFR, aa64_sve, do_predset, 0, FFR_PRED_NUM, 31, false) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-82-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 25 ++++++++++++++++++ target/arm/tcg/sve.decode | 7 ++++++ target/arm/tcg/sve_helper.c | 46 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 35 ++++++++++++++++++++++++++ 4 files changed, 113 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2_sqshlu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqshlu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqshlu_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqshlu_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_addqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_addqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_addqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_addqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_sminqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_sminqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_sminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_sminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_uminqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uminqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ UMAXV 00000100 .. 001 001 001 ... ..... ..... @rd_pg_rn SMINV 00000100 .. 001 010 001 ... ..... ..... @rd_pg_rn UMINV 00000100 .. 001 011 001 ... ..... ..... @rd_pg_rn +# SVE2.1 segment reduction +ADDQV 00000100 .. 000 101 001 ... ..... ..... @rd_pg_rn +SMAXQV 00000100 .. 001 100 001 ... ..... ..... @rd_pg_rn +SMINQV 00000100 .. 001 110 001 ... ..... ..... @rd_pg_rn +UMAXQV 00000100 .. 001 101 001 ... ..... ..... @rd_pg_rn +UMINQV 00000100 .. 001 111 001 ... ..... ..... @rd_pg_rn + ### SVE Shift by Immediate - Predicated Group # SVE bitwise shift by immediate (predicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN) #undef DO_VPZ #undef DO_VPZ_D +#define DO_VPQ(NAME, TYPE, H, INIT, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc) \ +{ \ + TYPE tmp[16 / sizeof(TYPE)] = { [0 ... 16 / sizeof(TYPE) - 1] = INIT }; \ + TYPE *n = vn; uint16_t *g = vg; \ + uintptr_t oprsz = simd_oprsz(desc); \ + uintptr_t nseg = oprsz / 16, nsegelt = 16 / sizeof(TYPE); \ + for (uintptr_t s = 0; s < nseg; s++) { \ + uint16_t pg = g[H2(s)]; \ + for (uintptr_t e = 0; e < nsegelt; e++, pg >>= sizeof(TYPE)) { \ + if (pg & 1) { \ + tmp[e] = OP(tmp[H(e)], n[s * nsegelt + H(e)]); \ + } \ + } \ + } \ + memcpy(vd, tmp, 16); \ + clear_tail(vd, 16, simd_maxsz(desc)); \ +} + +DO_VPQ(sve2p1_addqv_b, uint8_t, H1, 0, DO_ADD) +DO_VPQ(sve2p1_addqv_h, uint16_t, H2, 0, DO_ADD) +DO_VPQ(sve2p1_addqv_s, uint32_t, H4, 0, DO_ADD) +DO_VPQ(sve2p1_addqv_d, uint64_t, H8, 0, DO_ADD) + +DO_VPQ(sve2p1_smaxqv_b, int8_t, H1, INT8_MIN, DO_MAX) +DO_VPQ(sve2p1_smaxqv_h, int16_t, H2, INT16_MIN, DO_MAX) +DO_VPQ(sve2p1_smaxqv_s, int32_t, H4, INT32_MIN, DO_MAX) +DO_VPQ(sve2p1_smaxqv_d, int64_t, H8, INT64_MIN, DO_MAX) + +DO_VPQ(sve2p1_sminqv_b, int8_t, H1, INT8_MAX, DO_MIN) +DO_VPQ(sve2p1_sminqv_h, int16_t, H2, INT16_MAX, DO_MIN) +DO_VPQ(sve2p1_sminqv_s, int32_t, H4, INT32_MAX, DO_MIN) +DO_VPQ(sve2p1_sminqv_d, int64_t, H8, INT64_MAX, DO_MIN) + +DO_VPQ(sve2p1_umaxqv_b, uint8_t, H1, 0, DO_MAX) +DO_VPQ(sve2p1_umaxqv_h, uint16_t, H2, 0, DO_MAX) +DO_VPQ(sve2p1_umaxqv_s, uint32_t, H4, 0, DO_MAX) +DO_VPQ(sve2p1_umaxqv_d, uint64_t, H8, 0, DO_MAX) + +DO_VPQ(sve2p1_uminqv_b, uint8_t, H1, -1, DO_MIN) +DO_VPQ(sve2p1_uminqv_h, uint16_t, H2, -1, DO_MIN) +DO_VPQ(sve2p1_uminqv_s, uint32_t, H4, -1, DO_MIN) +DO_VPQ(sve2p1_uminqv_d, uint64_t, H8, -1, DO_MIN) + +#undef DO_VPQ + /* Two vector operand, one scalar operand, unpredicated. */ #define DO_ZZI(NAME, TYPE, OP) \ void HELPER(NAME)(void *vd, void *vn, uint64_t s64, uint32_t desc) \ diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SXTW, aa64_sve, gen_gvec_ool_arg_zpz, TRANS_FEAT(UXTW, aa64_sve, gen_gvec_ool_arg_zpz, a->esz == 3 ? gen_helper_sve_uxtw_d : NULL, a, 0) +static gen_helper_gvec_3 * const addqv_fns[4] = { + gen_helper_sve2p1_addqv_b, gen_helper_sve2p1_addqv_h, + gen_helper_sve2p1_addqv_s, gen_helper_sve2p1_addqv_d, +}; +TRANS_FEAT(ADDQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, addqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const smaxqv_fns[4] = { + gen_helper_sve2p1_smaxqv_b, gen_helper_sve2p1_smaxqv_h, + gen_helper_sve2p1_smaxqv_s, gen_helper_sve2p1_smaxqv_d, +}; +TRANS_FEAT(SMAXQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, smaxqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const sminqv_fns[4] = { + gen_helper_sve2p1_sminqv_b, gen_helper_sve2p1_sminqv_h, + gen_helper_sve2p1_sminqv_s, gen_helper_sve2p1_sminqv_d, +}; +TRANS_FEAT(SMINQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, sminqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const umaxqv_fns[4] = { + gen_helper_sve2p1_umaxqv_b, gen_helper_sve2p1_umaxqv_h, + gen_helper_sve2p1_umaxqv_s, gen_helper_sve2p1_umaxqv_d, +}; +TRANS_FEAT(UMAXQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, umaxqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const uminqv_fns[4] = { + gen_helper_sve2p1_uminqv_b, gen_helper_sve2p1_uminqv_h, + gen_helper_sve2p1_uminqv_s, gen_helper_sve2p1_uminqv_d, +}; +TRANS_FEAT(UMINQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, uminqv_fns[a->esz], a, 0) + /* *** SVE Integer Reduction Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-83-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 2 + target/arm/tcg/vec_internal.h | 74 ++++++++++++++++++++++++++++++++++ target/arm/tcg/sve.decode | 6 +++ target/arm/tcg/sve_helper.c | 28 +++++++++++++ target/arm/tcg/translate-sve.c | 36 +++++++++++++++++ 5 files changed, 146 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2p1_uminqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(pext, TCG_CALL_NO_RWG, void, ptr, i32, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); float32 sve_f16_to_f32(float16 f, float_status *fpst); float16 sve_f32_to_f16(float32 f, float_status *fpst); +/* + * Decode helper functions for predicate as counter. + */ + +typedef struct { + unsigned count; + unsigned lg2_stride; + bool invert; +} DecodeCounter; + +static inline DecodeCounter +decode_counter(unsigned png, unsigned vl, unsigned v_esz) +{ + DecodeCounter ret = { }; + + /* C.f. Arm pseudocode CounterToPredicate. */ + if (likely(png & 0xf)) { + unsigned p_esz = ctz32(png); + + /* + * maxbit = log2(pl(bits) * 4) + * = log2(vl(bytes) * 4) + * = log2(vl) + 2 + * maxbit_mask = ones<maxbit:0> + * = (1 << (maxbit + 1)) - 1 + * = (1 << (log2(vl) + 2 + 1)) - 1 + * = (1 << (log2(vl) + 3)) - 1 + * = (pow2ceil(vl) << 3) - 1 + */ + ret.count = png & (((unsigned)pow2ceil(vl) << 3) - 1); + ret.count >>= p_esz + 1; + + ret.invert = (png >> 15) & 1; + + /* + * The Arm pseudocode for CounterToPredicate expands the count to + * a set of bits, and then the operation proceeds as for the original + * interpretation of predicates as a set of bits. + * + * We can avoid the expansion by adjusting the count and supplying + * an element stride. + */ + if (unlikely(p_esz != v_esz)) { + if (p_esz < v_esz) { + /* + * For predicate esz < vector esz, the expanded predicate + * will have more bits set than will be consumed. + * Adjust the count down, rounding up. + * Consider p_esz = MO_8, v_esz = MO_64, count 14: + * The expanded predicate would be + * 0011 1111 1111 1111 + * The significant bits are + * ...1 ...1 ...1 ...1 + */ + unsigned shift = v_esz - p_esz; + unsigned trunc = ret.count >> shift; + ret.count = trunc + (ret.count != (trunc << shift)); + } else { + /* + * For predicate esz > vector esz, the expanded predicate + * will have bits set only at power-of-two multiples of + * the vector esz. Bits at other multiples will all be + * false. Adjust the count up, and supply the caller + * with a stride of elements to skip. + */ + unsigned shift = p_esz - v_esz; + ret.count <<= shift; + ret.lg2_stride = shift; + } + } + } + return ret; +} + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ %rn_ax2 6:4 !function=times_2 %pnd 0:3 !function=plus_8 +%pnn 5:3 !function=plus_8 ########################################################################### # Named attribute sets. These are used to make nice(er) names @@ -XXX,XX +XXX,XX @@ WHILE_lt_cnt4 00100101 .. 1 ..... 0110 . 1 ..... 1 . ... @while_cnt WHILE_gt_cnt2 00100101 .. 1 ..... 0100 . 0 ..... 1 . ... @while_cnt WHILE_gt_cnt4 00100101 .. 1 ..... 0110 . 0 ..... 1 . ... @while_cnt +# SVE2.1 extract mask predicate from predicate-as-counter +&pext rd rn esz imm +PEXT_1 00100101 esz:2 1 00000 0111 00 imm:2 ... 1 rd:4 &pext rn=%pnn +PEXT_2 00100101 esz:2 1 00000 0111 010 imm:1 ... 1 rd:4 &pext rn=%pnn + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_FCVTLT(sve2_fcvtlt_sd, uint64_t, uint32_t, H1_8, H1_4, float32_to_float64) #undef DO_FCVTLT #undef DO_FCVTNT + +void HELPER(pext)(void *vd, uint32_t png, uint32_t desc) +{ + int pl = FIELD_EX32(desc, PREDDESC, OPRSZ); + int vl = pl * 8; + unsigned v_esz = FIELD_EX32(desc, PREDDESC, ESZ); + int part = FIELD_EX32(desc, PREDDESC, DATA); + DecodeCounter p = decode_counter(png, vl, v_esz); + uint64_t mask = pred_esz_masks[v_esz + p.lg2_stride]; + ARMPredicateReg *d = vd; + + /* + * Convert from element count to byte count and adjust + * for the portion of the 4*VL counter to be extracted. + */ + int b_count = (p.count << v_esz) - vl * part; + + memset(d, 0, sizeof(*d)); + if (p.invert) { + if (b_count <= 0) { + do_whilel(vd, mask, vl, vl); + } else if (b_count < vl) { + do_whileg(vd, mask, vl - b_count, vl); + } + } else if (b_count > 0) { + do_whilel(vd, mask, MIN(b_count, vl), vl); + } +} diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) return true; } +static bool do_pext(DisasContext *s, arg_pext *a, int n) +{ + TCGv_i32 t_png; + TCGv_ptr t_pd; + int pl; + + if (!sve_access_check(s)) { + return true; + } + + t_png = tcg_temp_new_i32(); + tcg_gen_ld16u_i32(t_png, tcg_env, + pred_full_reg_offset(s, a->rn) ^ + (HOST_BIG_ENDIAN ? 6 : 0)); + + t_pd = tcg_temp_new_ptr(); + pl = pred_full_reg_size(s); + + for (int i = 0; i < n; ++i) { + int rd = (a->rd + i) % 16; + int part = a->imm * n + i; + unsigned desc = 0; + + desc = FIELD_DP32(desc, PREDDESC, OPRSZ, pl); + desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); + desc = FIELD_DP32(desc, PREDDESC, DATA, part); + + tcg_gen_addi_ptr(t_pd, tcg_env, pred_full_reg_offset(s, rd)); + gen_helper_pext(t_pd, t_png, tcg_constant_i32(desc)); + } + return true; +} + +TRANS_FEAT(PEXT_1, aa64_sme2_or_sve2p1, do_pext, a, 1) +TRANS_FEAT(PEXT_2, aa64_sme2_or_sve2p1, do_pext, a, 2) + /* *** SVE Integer Wide Immediate - Unpredicated Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-84-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 5 + target/arm/tcg/sme.decode | 9 + target/arm/tcg/sme_helper.c | 317 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 31 ++++ 4 files changed, 362 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme2_fclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i3 DEF_HELPER_FLAGS_5(sme2_fclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(sme2_fclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(sme2_bfclamp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sme2_sel_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) +DEF_HELPER_FLAGS_5(sme2_sel_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) +DEF_HELPER_FLAGS_5(sme2_sel_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) +DEF_HELPER_FLAGS_5(sme2_sel_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 1 \ UCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 1 \ &zzz_en zd=%zd_ax4 n=4 +### SME2 Multi-vector SVE Select + +%sel_pg 10:3 !function=plus_8 + +SEL 11000001 esz:2 1 ....0 100 ... ....0 ....0 \ + n=2 zd=%zd_ax2 zn=%zn_ax2 zm=%zm_ax2 pg=%sel_pg +SEL 11000001 esz:2 1 ...01 100 ... ...00 ...00 \ + n=4 zd=%zd_ax4 zn=%zn_ax4 zm=%zm_ax4 pg=%sel_pg + ### SME Multiple Zero &zero_za rv off ngrp nvec diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ FCLAMP(sme2_fclamp_d, float64, H8) FCLAMP(sme2_bfclamp, bfloat16, H2) #undef FCLAMP + +void HELPER(sme2_sel_b)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint8_t); + DecodeCounter p = decode_counter(png, vl, MO_8); + + if (p.lg2_stride == 0) { + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + for (int e = 0; e < split; e++) { + d[H1(e)] = m[H1(e)]; + } + for (int e = split; e < elements; e++) { + d[H1(e)] = n[H1(e)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + for (int e = 0; e < split; e++) { + d[H1(e)] = n[H1(e)]; + } + for (int e = split; e < elements; e++) { + d[H1(e)] = m[H1(e)]; + } + } + } + } + } else { + int estride = 1 << p.lg2_stride; + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e++) { + d[H1(e)] = m[H1(e)]; + } + for (; e < elements; e += estride) { + d[H1(e)] = n[H1(e)]; + for (int i = 1; i < estride; i++) { + d[H1(e + i)] = m[H1(e + i)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e += estride) { + d[H1(e)] = n[H1(e)]; + for (int i = 1; i < estride; i++) { + d[H1(e + i)] = m[H1(e + i)]; + } + } + for (; e < elements; e++) { + d[H1(e)] = m[H1(e)]; + } + } + } + } +} + +void HELPER(sme2_sel_h)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint16_t); + DecodeCounter p = decode_counter(png, vl, MO_16); + + if (p.lg2_stride == 0) { + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + for (int e = 0; e < split; e++) { + d[H2(e)] = m[H2(e)]; + } + for (int e = split; e < elements; e++) { + d[H2(e)] = n[H2(e)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + for (int e = 0; e < split; e++) { + d[H2(e)] = n[H2(e)]; + } + for (int e = split; e < elements; e++) { + d[H2(e)] = m[H2(e)]; + } + } + } + } + } else { + int estride = 1 << p.lg2_stride; + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e++) { + d[H2(e)] = m[H2(e)]; + } + for (; e < elements; e += estride) { + d[H2(e)] = n[H2(e)]; + for (int i = 1; i < estride; i++) { + d[H2(e + i)] = m[H2(e + i)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e += estride) { + d[H2(e)] = n[H2(e)]; + for (int i = 1; i < estride; i++) { + d[H2(e + i)] = m[H2(e + i)]; + } + } + for (; e < elements; e++) { + d[H2(e)] = m[H2(e)]; + } + } + } + } +} + +void HELPER(sme2_sel_s)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint32_t); + DecodeCounter p = decode_counter(png, vl, MO_32); + + if (p.lg2_stride == 0) { + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + for (int e = 0; e < split; e++) { + d[H4(e)] = m[H4(e)]; + } + for (int e = split; e < elements; e++) { + d[H4(e)] = n[H4(e)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + for (int e = 0; e < split; e++) { + d[H4(e)] = n[H4(e)]; + } + for (int e = split; e < elements; e++) { + d[H4(e)] = m[H4(e)]; + } + } + } + } + } else { + /* p.esz must be MO_64, so stride must be 2. */ + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e++) { + d[H4(e)] = m[H4(e)]; + } + for (; e < elements; e += 2) { + d[H4(e)] = n[H4(e)]; + d[H4(e + 1)] = m[H4(e + 1)]; + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e += 2) { + d[H4(e)] = n[H4(e)]; + d[H4(e + 1)] = m[H4(e + 1)]; + } + for (; e < elements; e++) { + d[H4(e)] = m[H4(e)]; + } + } + } + } +} + +void HELPER(sme2_sel_d)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint64_t); + DecodeCounter p = decode_counter(png, vl, MO_64); + + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint64_t *d = vd + r * sizeof(ARMVectorReg); + uint64_t *n = vn + r * sizeof(ARMVectorReg); + uint64_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + memcpy(d, m, split * sizeof(uint64_t)); + memcpy(d + split, n + split, + (elements - split) * sizeof(uint64_t)); + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint64_t *d = vd + r * sizeof(ARMVectorReg); + uint64_t *n = vn + r * sizeof(ARMVectorReg); + uint64_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + memcpy(d, n, split * sizeof(uint64_t)); + memcpy(d + split, m + split, + (elements - split) * sizeof(uint64_t)); + } + } + } +} diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const uclamp_fns[] = { gen_helper_sme2_uclamp_d, }; TRANS(UCLAMP, do_clamp, a, uclamp_fns) + +static bool trans_SEL(DisasContext *s, arg_SEL *a) +{ + typedef void sme_sel_fn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32); + static sme_sel_fn * const fns[4] = { + gen_helper_sme2_sel_b, gen_helper_sme2_sel_h, + gen_helper_sme2_sel_s, gen_helper_sme2_sel_d + }; + + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + uint32_t desc = simd_desc(svl, svl, a->n); + TCGv_ptr t_d = tcg_temp_new_ptr(); + TCGv_ptr t_n = tcg_temp_new_ptr(); + TCGv_ptr t_m = tcg_temp_new_ptr(); + TCGv_i32 png = tcg_temp_new_i32(); + + tcg_gen_addi_ptr(t_d, tcg_env, vec_full_reg_offset(s, a->zd)); + tcg_gen_addi_ptr(t_n, tcg_env, vec_full_reg_offset(s, a->zn)); + tcg_gen_addi_ptr(t_m, tcg_env, vec_full_reg_offset(s, a->zm)); + + tcg_gen_ld16u_i32(png, tcg_env, pred_full_reg_offset(s, a->pg) + ^ (HOST_BIG_ENDIAN ? 6 : 0)); + + fns[a->esz](t_d, t_n, t_m, png, tcg_constant_i32(desc)); + } + return true; +} -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-85-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 15 ++++++++++++ target/arm/tcg/sve.decode | 5 ++++ target/arm/tcg/sve_helper.c | 42 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 3 +++ 4 files changed, 65 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2p1_uminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pext, TCG_CALL_NO_RWG, void, ptr, i32, i32) + +DEF_HELPER_FLAGS_4(sve2p1_orqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_orqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_orqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_orqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_eorqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_eorqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_eorqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_eorqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_andqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_andqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_andqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_andqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ ORV 00000100 .. 011 000 001 ... ..... ..... @rd_pg_rn EORV 00000100 .. 011 001 001 ... ..... ..... @rd_pg_rn ANDV 00000100 .. 011 010 001 ... ..... ..... @rd_pg_rn +# SVE2.1 bitwise logical reduction (quadwords) +ORQV 00000100 .. 011 100 001 ... ..... ..... @rd_pg_rn +EORQV 00000100 .. 011 101 001 ... ..... ..... @rd_pg_rn +ANDQV 00000100 .. 011 110 001 ... ..... ..... @rd_pg_rn + # SVE constructive prefix (predicated) MOVPRFX_z 00000100 .. 010 000 001 ... ..... ..... @rd_pg_rn MOVPRFX_m 00000100 .. 010 001 001 ... ..... ..... @rd_pg_rn diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static inline uint64_t expand_pred_s(uint8_t byte) return word[byte & 0x11]; } +static inline uint64_t expand_pred_d(uint8_t byte) +{ + return -(uint64_t)(byte & 1); +} + #define LOGICAL_PPPP(NAME, FUNC) \ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \ { \ @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \ #define DO_EOR(N, M) (N ^ M) #define DO_ORR(N, M) (N | M) #define DO_BIC(N, M) (N & ~M) +#define DO_ORC(N, M) (N | ~M) #define DO_ADD(N, M) (N + M) #define DO_SUB(N, M) (N - M) #define DO_MAX(N, M) ((N) >= (M) ? (N) : (M)) @@ -XXX,XX +XXX,XX @@ DO_ZZI(sve_umini_d, uint64_t, DO_MIN) #undef DO_ZZI +#define DO_LOGIC_QV(NAME, SUFF, INIT, VOP, POP) \ +void HELPER(NAME ## _ ## SUFF)(void *vd, void *vn, void *vg, uint32_t desc) \ +{ \ + unsigned seg = simd_oprsz(desc) / 16; \ + uint64_t r0 = INIT, r1 = INIT; \ + for (unsigned s = 0; s < seg; s++) { \ + uint64_t p0 = expand_pred_##SUFF(*(uint8_t *)(vg + H1(s * 2))); \ + uint64_t p1 = expand_pred_##SUFF(*(uint8_t *)(vg + H1(s * 2 + 1))); \ + uint64_t v0 = *(uint64_t *)(vn + s * 16); \ + uint64_t v1 = *(uint64_t *)(vn + s * 16 + 8); \ + v0 = POP(v0, p0), v1 = POP(v1, p1); \ + r0 = VOP(r0, v0), r1 = VOP(r1, v1); \ + } \ + *(uint64_t *)(vd + 0) = r0; \ + *(uint64_t *)(vd + 8) = r1; \ + clear_tail(vd, 16, simd_maxsz(desc)); \ +} + +DO_LOGIC_QV(sve2p1_orqv, b, 0, DO_ORR, DO_AND) +DO_LOGIC_QV(sve2p1_orqv, h, 0, DO_ORR, DO_AND) +DO_LOGIC_QV(sve2p1_orqv, s, 0, DO_ORR, DO_AND) +DO_LOGIC_QV(sve2p1_orqv, d, 0, DO_ORR, DO_AND) + +DO_LOGIC_QV(sve2p1_eorqv, b, 0, DO_EOR, DO_AND) +DO_LOGIC_QV(sve2p1_eorqv, h, 0, DO_EOR, DO_AND) +DO_LOGIC_QV(sve2p1_eorqv, s, 0, DO_EOR, DO_AND) +DO_LOGIC_QV(sve2p1_eorqv, d, 0, DO_EOR, DO_AND) + +DO_LOGIC_QV(sve2p1_andqv, b, -1, DO_AND, DO_ORC) +DO_LOGIC_QV(sve2p1_andqv, h, -1, DO_AND, DO_ORC) +DO_LOGIC_QV(sve2p1_andqv, s, -1, DO_AND, DO_ORC) +DO_LOGIC_QV(sve2p1_andqv, d, -1, DO_AND, DO_ORC) + +#undef DO_LOGIC_QV + #undef DO_AND #undef DO_ORR #undef DO_EOR #undef DO_BIC +#undef DO_ORC #undef DO_ADD #undef DO_SUB #undef DO_MAX diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_ZPZ(NOT_zpz, aa64_sve, sve_not_zpz) DO_ZPZ(ABS, aa64_sve, sve_abs) DO_ZPZ(NEG, aa64_sve, sve_neg) DO_ZPZ(RBIT, aa64_sve, sve_rbit) +DO_ZPZ(ORQV, aa64_sme2p1_or_sve2p1, sve2p1_orqv) +DO_ZPZ(EORQV, aa64_sme2p1_or_sve2p1, sve2p1_eorqv) +DO_ZPZ(ANDQV, aa64_sme2p1_or_sve2p1, sve2p1_andqv) static gen_helper_gvec_3 * const fabs_fns[4] = { NULL, gen_helper_sve_fabs_h, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-86-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 49 ++++++++++++++++++++++++ target/arm/tcg/sve.decode | 8 ++++ target/arm/tcg/sve_helper.c | 70 +++++++++++++++++++++------------- target/arm/tcg/translate-sve.c | 48 +++++++++++++++++++++++ 4 files changed, 148 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ah_fminv_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_4(sve_ah_fminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_faddqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_faddqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_faddqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fmaxnmqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxnmqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxnmqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fminnmqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminnmqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminnmqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fmaxqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fminqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_ah_fmaxqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fmaxqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fmaxqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_ah_fminqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fminqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fminqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG, i64, i64, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ FMINNMV 01100101 .. 000 101 001 ... ..... ..... @rd_pg_rn FMAXV 01100101 .. 000 110 001 ... ..... ..... @rd_pg_rn FMINV 01100101 .. 000 111 001 ... ..... ..... @rd_pg_rn +### SVE FP recursive reduction (quadwords) + +FADDQV 01100100 .. 010 000 101 ... ..... ..... @rd_pg_rn +FMAXNMQV 01100100 .. 010 100 101 ... ..... ..... @rd_pg_rn +FMINNMQV 01100100 .. 010 101 101 ... ..... ..... @rd_pg_rn +FMAXQV 01100100 .. 010 110 101 ... ..... ..... @rd_pg_rn +FMINQV 01100100 .. 010 111 101 ... ..... ..... @rd_pg_rn + ## SVE Floating Point Unary Operations - Unpredicated Group FRECPE 01100101 .. 001 110 001100 ..... ..... @rd_rn diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilecg)(void *vd, uint32_t count, uint32_t pred_desc) * The recursion is bounded to depth 7 (128 fp16 elements), so there's * little to gain with a more complex non-recursive form. */ -#define DO_REDUCE(NAME, TYPE, H, FUNC, IDENT) \ -static TYPE NAME##_reduce(TYPE *data, float_status *status, uintptr_t n) \ +#define DO_REDUCE(NAME, SUF, TYPE, H, FUNC, IDENT) \ +static TYPE FUNC##_reduce(TYPE *data, float_status *status, uintptr_t n) \ { \ if (n == 1) { \ return *data; \ } else { \ uintptr_t half = n / 2; \ - TYPE lo = NAME##_reduce(data, status, half); \ - TYPE hi = NAME##_reduce(data + half, status, half); \ + TYPE lo = FUNC##_reduce(data, status, half); \ + TYPE hi = FUNC##_reduce(data + half, status, half); \ return FUNC(lo, hi, status); \ } \ } \ -uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \ +uint64_t helper_sve_##NAME##v_##SUF(void *vn, void *vg, \ + float_status *s, uint32_t desc) \ { \ uintptr_t i, oprsz = simd_oprsz(desc), maxsz = simd_data(desc); \ TYPE data[sizeof(ARMVectorReg) / sizeof(TYPE)]; \ @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \ for (; i < maxsz; i += sizeof(TYPE)) { \ *(TYPE *)((void *)data + i) = IDENT; \ } \ - return NAME##_reduce(data, s, maxsz / sizeof(TYPE)); \ + return FUNC##_reduce(data, s, maxsz / sizeof(TYPE)); \ +} \ +void helper_sve2p1_##NAME##qv_##SUF(void *vd, void *vn, void *vg, \ + float_status *status, uint32_t desc) \ +{ \ + unsigned oprsz = simd_oprsz(desc), segments = oprsz / 16; \ + for (unsigned e = 0; e < 16; e += sizeof(TYPE)) { \ + TYPE data[ARM_MAX_VQ]; \ + for (unsigned s = 0; s < segments; s++) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(s * 2)); \ + TYPE nn = *(TYPE *)(vn + H(s * 16 + H(e))); \ + data[s] = (pg >> e) & 1 ? nn : IDENT; \ + } \ + *(TYPE *)(vd + H(e)) = FUNC##_reduce(data, status, segments); \ + } \ + clear_tail(vd, 16, simd_maxsz(desc)); \ } -DO_REDUCE(sve_faddv_h, float16, H1_2, float16_add, float16_zero) -DO_REDUCE(sve_faddv_s, float32, H1_4, float32_add, float32_zero) -DO_REDUCE(sve_faddv_d, float64, H1_8, float64_add, float64_zero) +DO_REDUCE(fadd,h, float16, H1_2, float16_add, float16_zero) +DO_REDUCE(fadd,s, float32, H1_4, float32_add, float32_zero) +DO_REDUCE(fadd,d, float64, H1_8, float64_add, float64_zero) /* Identity is floatN_default_nan, without the function call. */ -DO_REDUCE(sve_fminnmv_h, float16, H1_2, float16_minnum, 0x7E00) -DO_REDUCE(sve_fminnmv_s, float32, H1_4, float32_minnum, 0x7FC00000) -DO_REDUCE(sve_fminnmv_d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL) +DO_REDUCE(fminnm,h, float16, H1_2, float16_minnum, 0x7E00) +DO_REDUCE(fminnm,s, float32, H1_4, float32_minnum, 0x7FC00000) +DO_REDUCE(fminnm,d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL) -DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, float16_maxnum, 0x7E00) -DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, float32_maxnum, 0x7FC00000) -DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL) +DO_REDUCE(fmaxnm,h, float16, H1_2, float16_maxnum, 0x7E00) +DO_REDUCE(fmaxnm,s, float32, H1_4, float32_maxnum, 0x7FC00000) +DO_REDUCE(fmaxnm,d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL) -DO_REDUCE(sve_fminv_h, float16, H1_2, float16_min, float16_infinity) -DO_REDUCE(sve_fminv_s, float32, H1_4, float32_min, float32_infinity) -DO_REDUCE(sve_fminv_d, float64, H1_8, float64_min, float64_infinity) +DO_REDUCE(fmin,h, float16, H1_2, float16_min, float16_infinity) +DO_REDUCE(fmin,s, float32, H1_4, float32_min, float32_infinity) +DO_REDUCE(fmin,d, float64, H1_8, float64_min, float64_infinity) -DO_REDUCE(sve_fmaxv_h, float16, H1_2, float16_max, float16_chs(float16_infinity)) -DO_REDUCE(sve_fmaxv_s, float32, H1_4, float32_max, float32_chs(float32_infinity)) -DO_REDUCE(sve_fmaxv_d, float64, H1_8, float64_max, float64_chs(float64_infinity)) +DO_REDUCE(fmax,h, float16, H1_2, float16_max, float16_chs(float16_infinity)) +DO_REDUCE(fmax,s, float32, H1_4, float32_max, float32_chs(float32_infinity)) +DO_REDUCE(fmax,d, float64, H1_8, float64_max, float64_chs(float64_infinity)) -DO_REDUCE(sve_ah_fminv_h, float16, H1_2, helper_vfp_ah_minh, float16_infinity) -DO_REDUCE(sve_ah_fminv_s, float32, H1_4, helper_vfp_ah_mins, float32_infinity) -DO_REDUCE(sve_ah_fminv_d, float64, H1_8, helper_vfp_ah_mind, float64_infinity) +DO_REDUCE(ah_fmin,h, float16, H1_2, helper_vfp_ah_minh, float16_infinity) +DO_REDUCE(ah_fmin,s, float32, H1_4, helper_vfp_ah_mins, float32_infinity) +DO_REDUCE(ah_fmin,d, float64, H1_8, helper_vfp_ah_mind, float64_infinity) -DO_REDUCE(sve_ah_fmaxv_h, float16, H1_2, helper_vfp_ah_maxh, +DO_REDUCE(ah_fmax,h, float16, H1_2, helper_vfp_ah_maxh, float16_chs(float16_infinity)) -DO_REDUCE(sve_ah_fmaxv_s, float32, H1_4, helper_vfp_ah_maxs, +DO_REDUCE(ah_fmax,s, float32, H1_4, helper_vfp_ah_maxs, float32_chs(float32_infinity)) -DO_REDUCE(sve_ah_fmaxv_d, float64, H1_8, helper_vfp_ah_maxd, +DO_REDUCE(ah_fmax,d, float64, H1_8, helper_vfp_ah_maxd, float64_chs(float64_infinity)) #undef DO_REDUCE diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_VPZ_AH(FMAXV, fmaxv) #undef DO_VPZ +static gen_helper_gvec_3_ptr * const faddqv_fns[4] = { + NULL, gen_helper_sve2p1_faddqv_h, + gen_helper_sve2p1_faddqv_s, gen_helper_sve2p1_faddqv_d, +}; +TRANS_FEAT(FADDQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + faddqv_fns[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fmaxnmqv_fns[4] = { + NULL, gen_helper_sve2p1_fmaxnmqv_h, + gen_helper_sve2p1_fmaxnmqv_s, gen_helper_sve2p1_fmaxnmqv_d, +}; +TRANS_FEAT(FMAXNMQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + fmaxnmqv_fns[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fminnmqv_fns[4] = { + NULL, gen_helper_sve2p1_fminnmqv_h, + gen_helper_sve2p1_fminnmqv_s, gen_helper_sve2p1_fminnmqv_d, +}; +TRANS_FEAT(FMINNMQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + fminnmqv_fns[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fmaxqv_fns[4] = { + NULL, gen_helper_sve2p1_fmaxqv_h, + gen_helper_sve2p1_fmaxqv_s, gen_helper_sve2p1_fmaxqv_d, +}; +static gen_helper_gvec_3_ptr * const fmaxqv_ah_fns[4] = { + NULL, gen_helper_sve2p1_ah_fmaxqv_h, + gen_helper_sve2p1_ah_fmaxqv_s, gen_helper_sve2p1_ah_fmaxqv_d, +}; +TRANS_FEAT(FMAXQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + (s->fpcr_ah ? fmaxqv_fns : fmaxqv_ah_fns)[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fminqv_fns[4] = { + NULL, gen_helper_sve2p1_fminqv_h, + gen_helper_sve2p1_fminqv_s, gen_helper_sve2p1_fminqv_d, +}; +static gen_helper_gvec_3_ptr * const fminqv_ah_fns[4] = { + NULL, gen_helper_sve2p1_ah_fminqv_h, + gen_helper_sve2p1_ah_fminqv_s, gen_helper_sve2p1_ah_fminqv_d, +}; +TRANS_FEAT(FMINQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + (s->fpcr_ah ? fminqv_fns : fminqv_ah_fns)[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + /* *** SVE Floating Point Unary Operations - Unpredicated Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-87-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 6 ++++++ target/arm/tcg/translate-sve.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 +BFMLSLB_zzzw 01100100 11 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_ex esz=2 +BFMLSLT_zzzw 01100100 11 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point dot-product FDOT_zzzz 01100100 00 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point multiply-add long (indexed) + FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 FMLALT_zzxw 01100100 10 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 FMLSLB_zzxw 01100100 10 1 ..... 0110.0 ..... ..... @rrxr_3a esz=2 FMLSLT_zzxw 01100100 10 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 + BFMLALB_zzxw 01100100 11 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 BFMLALT_zzxw 01100100 11 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 +BFMLSLB_zzxw 01100100 11 1 ..... 0110.0 ..... ..... @rrxr_3a esz=2 +BFMLSLT_zzxw 01100100 11 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 ### SVE2 floating-point dot-product (indexed) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel) TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false) TRANS_FEAT(BFMLALT_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, true) +static bool do_BFMLSL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel) +{ + if (s->fpcr_ah) { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_ah_bfmlsl, + a->rd, a->rn, a->rm, a->ra, sel, FPST_AH); + } else { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlsl, + a->rd, a->rn, a->rm, a->ra, sel, FPST_A64); + } +} + +TRANS_FEAT(BFMLSLB_zzzw, aa64_sme2_or_sve2p1, do_BFMLSL_zzzw, a, false) +TRANS_FEAT(BFMLSLT_zzzw, aa64_sme2_or_sve2p1, do_BFMLSL_zzzw, a, true) + +static bool do_BFMLSL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel) +{ + if (s->fpcr_ah) { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_ah_bfmlsl_idx, + a->rd, a->rn, a->rm, a->ra, + (a->index << 1) | sel, FPST_AH); + } else { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlsl_idx, + a->rd, a->rn, a->rm, a->ra, + (a->index << 1) | sel, FPST_A64); + } +} + +TRANS_FEAT(BFMLSLB_zzxw, aa64_sme2_or_sve2p1, do_BFMLSL_zzxw, a, false) +TRANS_FEAT(BFMLSLT_zzxw, aa64_sme2_or_sve2p1, do_BFMLSL_zzxw, a, true) + static bool trans_PSEL(DisasContext *s, arg_psel *a) { int vl = vec_full_reg_size(s); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-88-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 1 + target/arm/tcg/sve.decode | 3 ++- target/arm/tcg/sve_helper.c | 21 +++++++++++++++++++++ target/arm/tcg/translate-sve.c | 30 ++++++++++++++++++++++++++++++ 4 files changed, 54 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_brkn, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_brkns, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32) +DEF_HELPER_FLAGS_2(sve2p1_cntp_c, TCG_CALL_NO_RWG_SE, i64, i32, i32) DEF_HELPER_FLAGS_3(sve_whilel, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ BRKN 00100101 0. 01100001 .... 0 .... 0 .... @pd_pg_pn_s ### SVE Predicate Count Group # SVE predicate count -CNTP 00100101 .. 100 000 10 .... 0 .... ..... @rd_pg4_pn +CNTP 00100101 .. 100 000 10 .... 0 .... ..... @rd_pg4_pn +CNTP_c 00100101 esz:2 100 000 10 000 vl:1 1 rn:4 rd:5 # SVE inc/dec register by predicate count INCDECP_r 00100101 .. 10110 d:1 10001 00 .... ..... @incdec_pred u=1 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } +uint64_t HELPER(sve2p1_cntp_c)(uint32_t png, uint32_t desc) +{ + int pl = FIELD_EX32(desc, PREDDESC, OPRSZ); + int vl = pl * 8; + unsigned v_esz = FIELD_EX32(desc, PREDDESC, ESZ); + int lg2_width = FIELD_EX32(desc, PREDDESC, DATA) + 1; + DecodeCounter p = decode_counter(png, vl, v_esz); + unsigned maxelem = (vl << lg2_width) >> v_esz; + unsigned count = p.count; + + if (p.invert) { + if (count >= maxelem) { + return 0; + } + count = maxelem - count; + } else { + count = MIN(count, maxelem); + } + return count >> p.lg2_stride; +} + /* C.f. Arm pseudocode EncodePredCount */ static uint64_t encode_pred_count(uint32_t elements, uint32_t count, uint32_t esz, bool invert) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CNTP(DisasContext *s, arg_CNTP *a) return true; } +static bool trans_CNTP_c(DisasContext *s, arg_CNTP_c *a) +{ + TCGv_i32 t_png; + uint32_t desc = 0; + + if (dc_isar_feature(aa64_sve2p1, s)) { + if (!sve_access_check(s)) { + return true; + } + } else if (dc_isar_feature(aa64_sme2, s)) { + if (!sme_sm_enabled_check(s)) { + return true; + } + } else { + return false; + } + + t_png = tcg_temp_new_i32(); + tcg_gen_ld16u_i32(t_png, tcg_env, + pred_full_reg_offset(s, a->rn) ^ + (HOST_BIG_ENDIAN ? 6 : 0)); + + desc = FIELD_DP32(desc, PREDDESC, OPRSZ, pred_full_reg_size(s)); + desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); + desc = FIELD_DP32(desc, PREDDESC, DATA, a->vl); + + gen_helper_sve2p1_cntp_c(cpu_reg(s, a->rd), t_png, tcg_constant_i32(desc)); + return true; +} + static bool trans_INCDECP_r(DisasContext *s, arg_incdec_pred *a) { if (!dc_isar_feature(aa64_sve, s)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-89-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 6 ++++++ target/arm/tcg/translate-sve.c | 21 +++++++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ DUP_s 00000101 .. 1 00000 001110 ..... ..... @rd_rn DUP_x 00000101 .. 1 ..... 001000 rn:5 rd:5 \ &rri imm=%imm7_22_16 +# SVE Permute Vector - one source quadwords +DUPQ 00000101 001 imm:4 1 001001 rn:5 rd:5 &rri_esz esz=0 +DUPQ 00000101 001 imm:3 10 001001 rn:5 rd:5 &rri_esz esz=1 +DUPQ 00000101 001 imm:2 100 001001 rn:5 rd:5 &rri_esz esz=2 +DUPQ 00000101 001 imm:1 1000 001001 rn:5 rd:5 &rri_esz esz=3 + # SVE insert SIMD&FP scalar register INSR_f 00000101 .. 1 10100 001110 ..... ..... @rdn_rm diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_DUP_x(DisasContext *s, arg_DUP_x *a) return true; } +static bool trans_DUPQ(DisasContext *s, arg_DUPQ *a) +{ + unsigned vl, dofs, nofs; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + vl = vec_full_reg_size(s); + dofs = vec_full_reg_offset(s, a->rd); + nofs = vec_reg_offset(s, a->rn, a->imm, a->esz); + + for (unsigned i = 0; i < vl; i += 16) { + tcg_gen_gvec_dup_mem(a->esz, dofs + i, nofs + i, 16, 16); + } + return true; +} + static void do_insr_i64(DisasContext *s, arg_rrr_esz *a, TCGv_i64 val) { typedef void gen_insr(TCGv_ptr, TCGv_ptr, TCGv_i64, TCGv_i32); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-90-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 2 ++ target/arm/tcg/translate-sve.c | 49 ++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ DUPQ 00000101 001 imm:3 10 001001 rn:5 rd:5 &rri_esz esz=1 DUPQ 00000101 001 imm:2 100 001001 rn:5 rd:5 &rri_esz esz=2 DUPQ 00000101 001 imm:1 1000 001001 rn:5 rd:5 &rri_esz esz=3 +EXTQ 00000101 0110 imm:4 001001 rn:5 rd:5 &rri + # SVE insert SIMD&FP scalar register INSR_f 00000101 .. 1 10100 001110 ..... ..... @rdn_rm diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_EXT(DisasContext *s, int rd, int rn, int rm, int imm) TRANS_FEAT(EXT, aa64_sve, do_EXT, a->rd, a->rn, a->rm, a->imm) TRANS_FEAT(EXT_sve2, aa64_sve2, do_EXT, a->rd, a->rn, (a->rn + 1) % 32, a->imm) +static bool trans_EXTQ(DisasContext *s, arg_EXTQ *a) +{ + unsigned vl, dofs, sofs0, sofs1, sofs2, imm; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + imm = a->imm; + if (imm == 0) { + /* So far we never optimize Zdn with MOVPRFX, so zd = zn is a nop. */ + return true; + } + + vl = vec_full_reg_size(s); + dofs = vec_full_reg_offset(s, a->rd); + sofs2 = vec_full_reg_offset(s, a->rn); + + if (imm & 8) { + sofs0 = dofs + 8; + sofs1 = sofs2; + sofs2 += 8; + } else { + sofs0 = dofs; + sofs1 = dofs + 8; + } + imm = (imm & 7) << 3; + + for (unsigned i = 0; i < vl; i += 16) { + TCGv_i64 s0 = tcg_temp_new_i64(); + TCGv_i64 s1 = tcg_temp_new_i64(); + TCGv_i64 s2 = tcg_temp_new_i64(); + + tcg_gen_ld_i64(s0, tcg_env, sofs0 + i); + tcg_gen_ld_i64(s1, tcg_env, sofs1 + i); + tcg_gen_ld_i64(s2, tcg_env, sofs2 + i); + + tcg_gen_extract2_i64(s0, s0, s1, imm); + tcg_gen_extract2_i64(s1, s1, s2, imm); + + tcg_gen_st_i64(s0, tcg_env, dofs + i); + tcg_gen_st_i64(s1, tcg_env, dofs + i + 8); + } + return true; +} + /* *** SVE Permute - Unpredicated Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-91-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 8 +++ target/arm/tcg/vec_internal.h | 34 ++++++++++++ target/arm/tcg/sve.decode | 17 ++++++ target/arm/tcg/sve_helper.c | 50 +++++++++++++++++ target/arm/tcg/translate-sve.c | 98 ++++++++++++++++++++++++++++++++++ 5 files changed, 207 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2p1_andqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_andqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_andqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_andqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(pmov_pv_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_pv_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_pv_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(pmov_vp_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_vp_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_vp_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ decode_counter(unsigned png, unsigned vl, unsigned v_esz) return ret; } +/* Extract @len bits from an array of uint64_t at offset @pos bits. */ +static inline uint64_t extractn(uint64_t *p, unsigned pos, unsigned len) +{ + uint64_t x; + + p += pos / 64; + pos = pos % 64; + + x = p[0]; + if (pos + len > 64) { + x = (x >> pos) | (p[1] << (-pos & 63)); + pos = 0; + } + return extract64(x, pos, len); +} + +/* Deposit @len bits into an array of uint64_t at offset @pos bits. */ +static inline void depositn(uint64_t *p, unsigned pos, + unsigned len, uint64_t val) +{ + p += pos / 64; + pos = pos % 64; + + if (pos + len <= 64) { + p[0] = deposit64(p[0], pos, len, val); + } else { + unsigned len0 = 64 - pos; + unsigned len1 = len - len0; + + p[0] = deposit64(p[0], pos, len0, val); + p[1] = deposit64(p[1], 0, len1, val >> len0); + } +} + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ %size_23 23:2 %dtype_23_13 23:2 13:2 %index3_22_19 22:1 19:2 +%index3_22_17 22:1 17:2 %index3_19_11 19:2 11:1 %index2_20_11 20:1 11:1 @@ -XXX,XX +XXX,XX @@ INSR_r 00000101 .. 1 00100 001110 ..... ..... @rdn_rm # SVE reverse vector elements REV_v 00000101 .. 1 11000 001110 ..... ..... @rd_rn +# SVE move predicate to/from vector + +PMOV_pv 00000101 00 101 01 0001110 rn:5 0 rd:4 \ + &rri_esz esz=0 imm=0 +PMOV_pv 00000101 00 101 1 imm:1 0001110 rn:5 0 rd:4 &rri_esz esz=1 +PMOV_pv 00000101 01 101 imm:2 0001110 rn:5 0 rd:4 &rri_esz esz=2 +PMOV_pv 00000101 1. 101 .. 0001110 rn:5 0 rd:4 \ + &rri_esz esz=3 imm=%index3_22_17 + +PMOV_vp 00000101 00 101 01 1001110 0 rn:4 rd:5 \ + &rri_esz esz=0 imm=0 +PMOV_vp 00000101 00 101 1 imm:1 1001110 0 rn:4 rd:5 &rri_esz esz=1 +PMOV_vp 00000101 01 101 imm:2 1001110 0 rn:4 rd:5 &rri_esz esz=2 +PMOV_vp 00000101 1. 101 .. 1001110 0 rn:4 rd:5 \ + &rri_esz esz=3 imm=%index3_22_17 + # SVE vector table lookup TBL 00000101 .. 1 ..... 001100 ..... ..... @rd_rn_rm diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sve_rev_d)(void *vd, void *vn, uint32_t desc) } } +/* + * TODO: This could use half_shuffle64 and similar bit tricks to + * expand blocks of bits at once. + */ +#define DO_PMOV_PV(NAME, ESIZE) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + unsigned vl = simd_oprsz(desc); \ + unsigned idx = simd_data(desc); \ + unsigned elements = vl / ESIZE; \ + ARMPredicateReg *d = vd; \ + ARMVectorReg *s = vs; \ + memset(d, 0, sizeof(*d)); \ + for (unsigned e = 0; e < elements; ++e) { \ + depositn(d->p, e * ESIZE, 1, extractn(s->d, elements * idx + e, 1)); \ + } \ +} + +DO_PMOV_PV(pmov_pv_h, 2) +DO_PMOV_PV(pmov_pv_s, 4) +DO_PMOV_PV(pmov_pv_d, 8) + +#undef DO_PMOV_PV + +/* + * TODO: This could use half_unshuffle64 and similar bit tricks to + * compress blocks of bits at once. + */ +#define DO_PMOV_VP(NAME, ESIZE) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + unsigned vl = simd_oprsz(desc); \ + unsigned idx = simd_data(desc); \ + unsigned elements = vl / ESIZE; \ + ARMVectorReg *d = vd; \ + ARMPredicateReg *s = vs; \ + if (idx == 0) { \ + memset(d, 0, vl); \ + } \ + for (unsigned e = 0; e < elements; ++e) { \ + depositn(d->d, elements * idx + e, 1, extractn(s->p, e * ESIZE, 1)); \ + } \ +} + +DO_PMOV_VP(pmov_vp_h, 2) +DO_PMOV_VP(pmov_vp_s, 4) +DO_PMOV_VP(pmov_vp_d, 8) + +#undef DO_PMOV_VP + typedef void tb_impl_fn(void *, void *, void *, void *, uintptr_t, bool); static inline void do_tbl1(void *vd, void *vn, void *vm, uint32_t desc, diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const tbx_fns[4] = { }; TRANS_FEAT(TBX, aa64_sve2, gen_gvec_ool_arg_zzz, tbx_fns[a->esz], a, 0) +static bool trans_PMOV_pv(DisasContext *s, arg_PMOV_pv *a) +{ + static gen_helper_gvec_2 * const fns[4] = { + NULL, gen_helper_pmov_pv_h, + gen_helper_pmov_pv_s, gen_helper_pmov_pv_d + }; + unsigned vl, pl, vofs, pofs; + TCGv_i64 tmp; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + vl = vec_full_reg_size(s); + if (a->esz != MO_8) { + tcg_gen_gvec_2_ool(pred_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vl, vl, a->imm, fns[a->esz]); + return true; + } + + /* + * Copy the low PL bytes from vector Zn, zero-extending to a + * multiple of 8 bytes, so that Pd is properly cleared. + */ + + pl = vl / 8; + pofs = pred_full_reg_offset(s, a->rd); + vofs = vec_full_reg_offset(s, a->rn); + + QEMU_BUILD_BUG_ON(sizeof(ARMPredicateReg) != 32); + for (unsigned i = 32; i >= 8; i >>= 1) { + if (pl & i) { + tcg_gen_gvec_mov(MO_64, pofs, vofs, i, i); + pofs += i; + vofs += i; + } + } + switch (pl & 7) { + case 0: + return true; + case 2: + tmp = tcg_temp_new_i64(); + tcg_gen_ld16u_i64(tmp, tcg_env, vofs + (HOST_BIG_ENDIAN ? 6 : 0)); + break; + case 4: + tmp = tcg_temp_new_i64(); + tcg_gen_ld32u_i64(tmp, tcg_env, vofs + (HOST_BIG_ENDIAN ? 4 : 0)); + break; + case 6: + tmp = tcg_temp_new_i64(); + tcg_gen_ld_i64(tmp, tcg_env, vofs); + tcg_gen_extract_i64(tmp, tmp, 0, 48); + break; + default: + g_assert_not_reached(); + } + tcg_gen_st_i64(tmp, tcg_env, pofs); + return true; +} + +static bool trans_PMOV_vp(DisasContext *s, arg_PMOV_pv *a) +{ + static gen_helper_gvec_2 * const fns[4] = { + NULL, gen_helper_pmov_vp_h, + gen_helper_pmov_vp_s, gen_helper_pmov_vp_d + }; + unsigned vl; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + vl = vec_full_reg_size(s); + + if (a->esz == MO_8) { + /* + * The low PL bytes are copied from Pn to Zd unchanged. + * We know that the unused portion of Pn is zero, and + * that imm == 0, so the balance of Zd must be zeroed. + */ + tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, a->rd), + pred_full_reg_offset(s, a->rn), + size_for_gvec(vl / 8), vl); + } else { + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd), + pred_full_reg_offset(s, a->rn), + vl, vl, a->imm, fns[a->esz]); + } + return true; +} + static bool trans_UNPK(DisasContext *s, arg_UNPK *a) { static gen_helper_gvec_2 * const fns[4][2] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-92-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 10 ++++++++++ target/arm/tcg/sve.decode | 6 ++++++ target/arm/tcg/sve_helper.c | 29 +++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 19 ++++++++++++++++++- 4 files changed, 63 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_zip_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_zip_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_zip_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve_uzp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_uzp_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve_trn_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ UZP2_q 00000101 10 1 ..... 000 011 ..... ..... @rd_rn_rm_e0 TRN1_q 00000101 10 1 ..... 000 110 ..... ..... @rd_rn_rm_e0 TRN2_q 00000101 10 1 ..... 000 111 ..... ..... @rd_rn_rm_e0 +# SVE2.1 permute vector elements (quadwords) +ZIPQ1 01000100 .. 0 ..... 111 000 ..... ..... @rd_rn_rm +ZIPQ2 01000100 .. 0 ..... 111 001 ..... ..... @rd_rn_rm +UZPQ1 01000100 .. 0 ..... 111 010 ..... ..... @rd_rn_rm +UZPQ2 01000100 .. 0 ..... 111 011 ..... ..... @rd_rn_rm + ### SVE Permute - Predicated Group # SVE compress active elements diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_UZP(sve_uzp_s, uint32_t, H1_4) DO_UZP(sve_uzp_d, uint64_t, H1_8) DO_UZP(sve2_uzp_q, Int128, ) +typedef void perseg_zzz_fn(void *vd, void *vn, void *vm, uint32_t desc); + +static void do_perseg_zzz(void *vd, void *vn, void *vm, + uint32_t desc, perseg_zzz_fn *fn) +{ + intptr_t oprsz = simd_oprsz(desc); + + desc = simd_desc(16, 16, simd_data(desc)); + for (intptr_t i = 0; i < oprsz; i += 16) { + fn(vd + i, vn + i, vm + i, desc); + } +} + +#define DO_PERSEG_ZZZ(NAME, FUNC) \ + void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ + { do_perseg_zzz(vd, vn, vm, desc, FUNC); } + +DO_PERSEG_ZZZ(sve2p1_uzpq_b, helper_sve_uzp_b) +DO_PERSEG_ZZZ(sve2p1_uzpq_h, helper_sve_uzp_h) +DO_PERSEG_ZZZ(sve2p1_uzpq_s, helper_sve_uzp_s) +DO_PERSEG_ZZZ(sve2p1_uzpq_d, helper_sve_uzp_d) + +DO_PERSEG_ZZZ(sve2p1_zipq_b, helper_sve_zip_b) +DO_PERSEG_ZZZ(sve2p1_zipq_h, helper_sve_zip_h) +DO_PERSEG_ZZZ(sve2p1_zipq_s, helper_sve_zip_s) +DO_PERSEG_ZZZ(sve2p1_zipq_d, helper_sve_zip_d) + +#undef DO_PERSEG_ZZZ + #define DO_TRN(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ { \ diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(ZIP2_q, aa64_sve_f64mm, do_interleave_q, gen_helper_sve2_zip_q, a, QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2) +static gen_helper_gvec_3 * const zipq_fns[4] = { + gen_helper_sve2p1_zipq_b, gen_helper_sve2p1_zipq_h, + gen_helper_sve2p1_zipq_s, gen_helper_sve2p1_zipq_d, +}; +TRANS_FEAT(ZIPQ1, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + zipq_fns[a->esz], a, 0) +TRANS_FEAT(ZIPQ2, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + zipq_fns[a->esz], a, 16 / 2) + static gen_helper_gvec_3 * const uzp_fns[4] = { gen_helper_sve_uzp_b, gen_helper_sve_uzp_h, gen_helper_sve_uzp_s, gen_helper_sve_uzp_d, }; - TRANS_FEAT(UZP1_z, aa64_sve, gen_gvec_ool_arg_zzz, uzp_fns[a->esz], a, 0) TRANS_FEAT(UZP2_z, aa64_sve, gen_gvec_ool_arg_zzz, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(UZP1_q, aa64_sve_f64mm, do_interleave_q, TRANS_FEAT_NONSTREAMING(UZP2_q, aa64_sve_f64mm, do_interleave_q, gen_helper_sve2_uzp_q, a, 16) +static gen_helper_gvec_3 * const uzpq_fns[4] = { + gen_helper_sve2p1_uzpq_b, gen_helper_sve2p1_uzpq_h, + gen_helper_sve2p1_uzpq_s, gen_helper_sve2p1_uzpq_d, +}; +TRANS_FEAT(UZPQ1, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + uzpq_fns[a->esz], a, 0) +TRANS_FEAT(UZPQ2, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + uzpq_fns[a->esz], a, 1 << a->esz) + static gen_helper_gvec_3 * const trn_fns[4] = { gen_helper_sve_trn_b, gen_helper_sve_trn_h, gen_helper_sve_trn_s, gen_helper_sve_trn_d, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-93-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 10 ++++++++++ target/arm/tcg/sve.decode | 3 +++ target/arm/tcg/sve_helper.c | 10 ++++++++++ target/arm/tcg/translate-sve.c | 14 ++++++++++++++ 4 files changed, 37 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(sve_sunpk_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_sunpk_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_sunpk_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ ZIPQ2 01000100 .. 0 ..... 111 001 ..... ..... @rd_rn_rm UZPQ1 01000100 .. 0 ..... 111 010 ..... ..... @rd_rn_rm UZPQ2 01000100 .. 0 ..... 111 011 ..... ..... @rd_rn_rm +TBLQ 01000100 .. 0 ..... 111 110 ..... ..... @rd_rn_rm +TBXQ 00000101 .. 1 ..... 001 101 ..... ..... @rd_rn_rm + ### SVE Permute - Predicated Group # SVE compress active elements diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_PERSEG_ZZZ(sve2p1_zipq_h, helper_sve_zip_h) DO_PERSEG_ZZZ(sve2p1_zipq_s, helper_sve_zip_s) DO_PERSEG_ZZZ(sve2p1_zipq_d, helper_sve_zip_d) +DO_PERSEG_ZZZ(sve2p1_tblq_b, helper_sve_tbl_b) +DO_PERSEG_ZZZ(sve2p1_tblq_h, helper_sve_tbl_h) +DO_PERSEG_ZZZ(sve2p1_tblq_s, helper_sve_tbl_s) +DO_PERSEG_ZZZ(sve2p1_tblq_d, helper_sve_tbl_d) + +DO_PERSEG_ZZZ(sve2p1_tbxq_b, helper_sve2_tbx_b) +DO_PERSEG_ZZZ(sve2p1_tbxq_h, helper_sve2_tbx_h) +DO_PERSEG_ZZZ(sve2p1_tbxq_s, helper_sve2_tbx_s) +DO_PERSEG_ZZZ(sve2p1_tbxq_d, helper_sve2_tbx_d) + #undef DO_PERSEG_ZZZ #define DO_TRN(NAME, TYPE, H) \ diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4 * const sve2_tbl_fns[4] = { TRANS_FEAT(TBL_sve2, aa64_sve2, gen_gvec_ool_zzzz, sve2_tbl_fns[a->esz], a->rd, a->rn, (a->rn + 1) % 32, a->rm, 0) +static gen_helper_gvec_3 * const tblq_fns[4] = { + gen_helper_sve2p1_tblq_b, gen_helper_sve2p1_tblq_h, + gen_helper_sve2p1_tblq_s, gen_helper_sve2p1_tblq_d +}; +TRANS_FEAT(TBLQ, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + tblq_fns[a->esz], a, 0) + static gen_helper_gvec_3 * const tbx_fns[4] = { gen_helper_sve2_tbx_b, gen_helper_sve2_tbx_h, gen_helper_sve2_tbx_s, gen_helper_sve2_tbx_d }; TRANS_FEAT(TBX, aa64_sve2, gen_gvec_ool_arg_zzz, tbx_fns[a->esz], a, 0) +static gen_helper_gvec_3 * const tbxq_fns[4] = { + gen_helper_sve2p1_tbxq_b, gen_helper_sve2p1_tbxq_h, + gen_helper_sve2p1_tbxq_s, gen_helper_sve2p1_tbxq_d +}; +TRANS_FEAT(TBXQ, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + tbxq_fns[a->esz], a, 0) + static bool trans_PMOV_pv(DisasContext *s, arg_PMOV_pv *a) { static gen_helper_gvec_2 * const fns[4] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Implement the SVE2p1 consecutive register LD1/ST1, and the SME2 strided register LD1/ST1. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-94-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 16 ++ target/arm/tcg/sve.decode | 50 ++++ target/arm/tcg/sve_helper.c | 493 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 103 +++++++ 4 files changed, 662 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(pmov_pv_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pmov_vp_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pmov_vp_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pmov_vp_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2p1_ld1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) + +DEF_HELPER_FLAGS_5(sve2p1_st1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SCLAMP 01000100 .. 0 ..... 110000 ..... ..... @rda_rn_rm UCLAMP 01000100 .. 0 ..... 110001 ..... ..... @rda_rn_rm FCLAMP 01100100 .. 1 ..... 001001 ..... ..... @rda_rn_rm + +### SVE2p1 multi-vec contiguous load + +&zcrr_ldst rd png rn rm esz nreg +&zcri_ldst rd png rn imm esz nreg +%png 10:3 !function=plus_8 +%zd_ax2 1:4 !function=times_2 +%zd_ax4 2:3 !function=times_4 + +LD1_zcrr 10100000000 rm:5 0 esz:2 ... rn:5 .... - \ + &zcrr_ldst %png rd=%zd_ax2 nreg=2 +LD1_zcrr 10100000000 rm:5 1 esz:2 ... rn:5 ... 0- \ + &zcrr_ldst %png rd=%zd_ax4 nreg=4 + +ST1_zcrr 10100000001 rm:5 0 esz:2 ... rn:5 .... - \ + &zcrr_ldst %png rd=%zd_ax2 nreg=2 +ST1_zcrr 10100000001 rm:5 1 esz:2 ... rn:5 ... 0- \ + &zcrr_ldst %png rd=%zd_ax4 nreg=4 + +LD1_zcri 101000000100 imm:s4 0 esz:2 ... rn:5 .... - \ + &zcri_ldst %png rd=%zd_ax2 nreg=2 +LD1_zcri 101000000100 imm:s4 1 esz:2 ... rn:5 ... 0- \ + &zcri_ldst %png rd=%zd_ax4 nreg=4 + +ST1_zcri 101000000110 imm:s4 0 esz:2 ... rn:5 .... - \ + &zcri_ldst %png rd=%zd_ax2 nreg=2 +ST1_zcri 101000000110 imm:s4 1 esz:2 ... rn:5 ... 0- \ + &zcri_ldst %png rd=%zd_ax4 nreg=4 + +# Note: N bit and 0 bit (for nreg4) still mashed in rd. +# This is handled within gen_ldst_c(). +LD1_zcrr_stride 10100001000 rm:5 0 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=2 +LD1_zcrr_stride 10100001000 rm:5 1 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=4 + +ST1_zcrr_stride 10100001001 rm:5 0 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=2 +ST1_zcrr_stride 10100001001 rm:5 1 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=4 + +LD1_zcri_stride 101000010100 imm:s4 0 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=2 +LD1_zcri_stride 101000010100 imm:s4 1 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=4 + +ST1_zcri_stride 101000010110 imm:s4 0 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=2 +ST1_zcri_stride 101000010110 imm:s4 1 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=4 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_ST1_ZPZ_D(dd_be, zd, MO_64) #undef DO_ST1_ZPZ_S #undef DO_ST1_ZPZ_D +/* + * SVE2.1 consecutive register load/store + */ + +static unsigned sve2p1_cont_ldst_elements(SVEContLdSt *info, vaddr addr, + uint32_t png, intptr_t reg_max, + int N, int v_esz) +{ + const int esize = 1 << v_esz; + intptr_t reg_off_first = -1, reg_off_last = -1, reg_off_split; + DecodeCounter p = decode_counter(png, reg_max, v_esz); + unsigned b_count = p.count << v_esz; + unsigned b_stride = 1 << (v_esz + p.lg2_stride); + intptr_t page_split; + + /* Set all of the element indices to -1, and the TLB data to 0. */ + memset(info, -1, offsetof(SVEContLdSt, page)); + memset(info->page, 0, sizeof(info->page)); + + if (p.invert) { + if (b_count >= reg_max * N) { + return 0; + } + reg_off_first = b_count; + reg_off_last = reg_max * N - b_stride; + } else { + if (b_count == 0) { + return 0; + } + reg_off_first = 0; + reg_off_last = MIN(b_count - esize, reg_max * N - b_stride); + } + + info->reg_off_first[0] = reg_off_first; + info->mem_off_first[0] = reg_off_first; + + page_split = -(addr | TARGET_PAGE_MASK); + if (reg_off_last + esize <= page_split || reg_off_first >= page_split) { + /* The entire operation fits within a single page. */ + info->reg_off_last[0] = reg_off_last; + return b_stride; + } + + info->page_split = page_split; + reg_off_split = ROUND_DOWN(page_split, esize); + + /* + * This is the last full element on the first page, but it is not + * necessarily active. If there is no full element, i.e. the first + * active element is the one that's split, this value remains -1. + * It is useful as iteration bounds. + */ + if (reg_off_split != 0) { + info->reg_off_last[0] = ROUND_DOWN(reg_off_split - esize, b_stride); + } + + /* Determine if an unaligned element spans the pages. */ + if (page_split & (esize - 1)) { + /* It is helpful to know if the split element is active. */ + if ((reg_off_split & (b_stride - 1)) == 0) { + info->reg_off_split = reg_off_split; + info->mem_off_split = reg_off_split; + } + reg_off_split += esize; + } + + /* + * We do want the first active element on the second page, because + * this may affect the address reported in an exception. + */ + reg_off_split = ROUND_UP(reg_off_split, b_stride); + if (reg_off_split <= reg_off_last) { + info->reg_off_first[1] = reg_off_split; + info->mem_off_first[1] = reg_off_split; + info->reg_off_last[1] = reg_off_last; + } + return b_stride; +} + +static void sve2p1_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env, + target_ulong addr, unsigned estride, + int esize, int wp_access, uintptr_t ra) +{ +#ifndef CONFIG_USER_ONLY + intptr_t count_off, count_last; + int flags0 = info->page[0].flags; + int flags1 = info->page[1].flags; + + if (likely(!((flags0 | flags1) & TLB_WATCHPOINT))) { + return; + } + + /* Indicate that watchpoints are handled. */ + info->page[0].flags = flags0 & ~TLB_WATCHPOINT; + info->page[1].flags = flags1 & ~TLB_WATCHPOINT; + + if (flags0 & TLB_WATCHPOINT) { + count_off = info->reg_off_first[0]; + count_last = info->reg_off_split; + if (count_last < 0) { + count_last = info->reg_off_last[0]; + } + do { + cpu_check_watchpoint(env_cpu(env), addr + count_off, + esize, info->page[0].attrs, wp_access, ra); + count_off += estride; + } while (count_off <= count_last); + } + + count_off = info->reg_off_first[1]; + if ((flags1 & TLB_WATCHPOINT) && count_off >= 0) { + count_last = info->reg_off_last[1]; + do { + cpu_check_watchpoint(env_cpu(env), addr + count_off, + esize, info->page[1].attrs, + wp_access, ra); + count_off += estride; + } while (count_off <= count_last); + } +#endif +} + +static void sve2p1_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env, + target_ulong addr, unsigned estride, + int esize, uint32_t mtedesc, + uintptr_t ra) +{ + intptr_t count_off, count_last; + + /* + * TODO: estride is always a small power of two, <= 8. + * Manipulate the stride within the loops such that + * - first iteration hits addr + off, as required, + * - second iteration hits ALIGN_UP(addr, 16), + * - other iterations advance addr by 16. + * This will minimize the probing to once per MTE granule. + */ + + /* Process the page only if MemAttr == Tagged. */ + if (info->page[0].tagged) { + count_off = info->reg_off_first[0]; + count_last = info->reg_off_split; + if (count_last < 0) { + count_last = info->reg_off_last[0]; + } + + do { + mte_check(env, mtedesc, addr + count_off, ra); + count_off += estride; + } while (count_off <= count_last); + } + + count_off = info->reg_off_first[1]; + if (count_off >= 0 && info->page[1].tagged) { + count_last = info->reg_off_last[1]; + do { + mte_check(env, mtedesc, addr + count_off, ra); + count_off += estride; + } while (count_off <= count_last); + } +} + +static inline QEMU_ALWAYS_INLINE +void sve2p1_ld1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr, + uint32_t png, uint32_t desc, + const uintptr_t ra, const MemOp esz, + sve_ldst1_host_fn *host_fn, + sve_ldst1_tlb_fn *tlb_fn) +{ + const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2; + const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4); + uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT); + const intptr_t reg_max = simd_oprsz(desc); + const unsigned esize = 1 << esz; + intptr_t count_off, count_last; + intptr_t reg_off, reg_last, reg_n; + SVEContLdSt info; + unsigned estride, flags; + void *host; + + estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz); + if (estride == 0) { + /* The entire predicate was false; no load occurs. */ + for (unsigned n = 0; n < N; n++) { + memset(zd + n * rstride, 0, reg_max); + } + return; + } + + /* Probe the page(s). Exit with exception for any invalid page. */ + sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_LOAD, ra); + + /* Handle watchpoints for all active elements. */ + sve2p1_cont_ldst_watchpoints(&info, env, addr, estride, + esize, BP_MEM_READ, ra); + + /* + * Handle mte checks for all active elements. + * Since TBI must be set for MTE, !mtedesc => !mte_active. + */ + if (mtedesc) { + sve2p1_cont_ldst_mte_check(&info, env, estride, addr, + esize, mtedesc, ra); + } + + flags = info.page[0].flags | info.page[1].flags; + if (unlikely(flags != 0)) { + /* + * At least one page includes MMIO. + * Any bus operation can fail with cpu_transaction_failed, + * which for ARM will raise SyncExternal. Perform the load + * into scratch memory to preserve register state until the end. + */ + ARMVectorReg scratch[4] = { }; + + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[1]; + if (count_last < 0) { + count_last = info.reg_off_split; + if (count_last < 0) { + count_last = info.reg_off_last[0]; + } + } + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + + do { + reg_last = MIN(count_last - count_off, reg_max - esize); + do { + tlb_fn(env, &scratch[reg_n], reg_off, addr + count_off, ra); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + for (unsigned n = 0; n < N; ++n) { + memcpy(&zd[n * rstride], &scratch[n], reg_max); + } + return; + } + + /* The entire operation is in RAM, on valid pages. */ + + for (unsigned n = 0; n < N; ++n) { + memset(&zd[n * rstride], 0, reg_max); + } + + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[0]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[0].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + + /* + * Use the slow path to manage the cross-page misalignment. + * But we know this is RAM and cannot trap. + */ + count_off = info.reg_off_split; + if (unlikely(count_off >= 0)) { + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra); + } + + count_off = info.reg_off_first[1]; + if (unlikely(count_off >= 0)) { + count_last = info.reg_off_last[1]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[1].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + } +} + +void HELPER(sve2p1_ld1bb_c)(CPUARMState *env, void *vd, target_ulong addr, + uint32_t png, uint32_t desc) +{ + sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), MO_8, + sve_ld1bb_host, sve_ld1bb_tlb); +} + +#define DO_LD1_2(NAME, ESZ) \ +void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_le_host, sve_##NAME##_le_tlb); \ +} \ +void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_be_host, sve_##NAME##_be_tlb); \ +} + +DO_LD1_2(ld1hh, MO_16) +DO_LD1_2(ld1ss, MO_32) +DO_LD1_2(ld1dd, MO_64) + +#undef DO_LD1_2 + +static inline QEMU_ALWAYS_INLINE +void sve2p1_st1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr, + uint32_t png, uint32_t desc, + const uintptr_t ra, const int esz, + sve_ldst1_host_fn *host_fn, + sve_ldst1_tlb_fn *tlb_fn) +{ + const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2; + const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4); + uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT); + const intptr_t reg_max = simd_oprsz(desc); + const unsigned esize = 1 << esz; + intptr_t count_off, count_last; + intptr_t reg_off, reg_last, reg_n; + SVEContLdSt info; + unsigned estride, flags; + void *host; + + estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz); + if (estride == 0) { + /* The entire predicate was false; no store occurs. */ + return; + } + + /* Probe the page(s). Exit with exception for any invalid page. */ + sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_STORE, ra); + + /* Handle watchpoints for all active elements. */ + sve2p1_cont_ldst_watchpoints(&info, env, addr, estride, + esize, BP_MEM_WRITE, ra); + + /* + * Handle mte checks for all active elements. + * Since TBI must be set for MTE, !mtedesc => !mte_active. + */ + if (mtedesc) { + sve2p1_cont_ldst_mte_check(&info, env, estride, addr, + esize, mtedesc, ra); + } + + flags = info.page[0].flags | info.page[1].flags; + if (unlikely(flags != 0)) { + /* + * At least one page includes MMIO. + * Any bus operation can fail with cpu_transaction_failed, + * which for ARM will raise SyncExternal. Perform the load + * into scratch memory to preserve register state until the end. + */ + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[1]; + if (count_last < 0) { + count_last = info.reg_off_split; + if (count_last < 0) { + count_last = info.reg_off_last[0]; + } + } + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + + do { + reg_last = MIN(count_last - count_off, reg_max - esize); + do { + tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + return; + } + + /* The entire operation is in RAM, on valid pages. */ + + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[0]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[0].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + + /* + * Use the slow path to manage the cross-page misalignment. + * But we know this is RAM and cannot trap. + */ + count_off = info.reg_off_split; + if (unlikely(count_off >= 0)) { + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra); + } + + count_off = info.reg_off_first[1]; + if (unlikely(count_off >= 0)) { + count_last = info.reg_off_last[1]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[1].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + } +} + +void HELPER(sve2p1_st1bb_c)(CPUARMState *env, void *vd, target_ulong addr, + uint32_t png, uint32_t desc) +{ + sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), MO_8, + sve_st1bb_host, sve_st1bb_tlb); +} + +#define DO_ST1_2(NAME, ESZ) \ +void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_le_host, sve_##NAME##_le_tlb); \ +} \ +void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_be_host, sve_##NAME##_be_tlb); \ +} + +DO_ST1_2(st1hh, MO_16) +DO_ST1_2(st1ss, MO_32) +DO_ST1_2(st1dd, MO_64) + +#undef DO_ST1_2 + void HELPER(sve2_eor3)(void *vd, void *vn, void *vm, void *vk, uint32_t desc) { intptr_t i, opr_sz = simd_oprsz(desc) / 8; diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_uqcvtn_sh, a->rd, a->rn, 0) TRANS_FEAT(SQCVTUN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_sqcvtun_sh, a->rd, a->rn, 0) + +static bool gen_ldst_c(DisasContext *s, TCGv_i64 addr, int zd, int png, + MemOp esz, bool is_write, int n, bool strided) +{ + typedef void ldst_c_fn(TCGv_env, TCGv_ptr, TCGv_i64, + TCGv_i32, TCGv_i32); + static ldst_c_fn * const f_ldst[2][2][4] = { + { { gen_helper_sve2p1_ld1bb_c, + gen_helper_sve2p1_ld1hh_le_c, + gen_helper_sve2p1_ld1ss_le_c, + gen_helper_sve2p1_ld1dd_le_c, }, + { gen_helper_sve2p1_ld1bb_c, + gen_helper_sve2p1_ld1hh_be_c, + gen_helper_sve2p1_ld1ss_be_c, + gen_helper_sve2p1_ld1dd_be_c, } }, + + { { gen_helper_sve2p1_st1bb_c, + gen_helper_sve2p1_st1hh_le_c, + gen_helper_sve2p1_st1ss_le_c, + gen_helper_sve2p1_st1dd_le_c, }, + { gen_helper_sve2p1_st1bb_c, + gen_helper_sve2p1_st1hh_be_c, + gen_helper_sve2p1_st1ss_be_c, + gen_helper_sve2p1_st1dd_be_c, } } + }; + + TCGv_i32 t_png, t_desc; + TCGv_ptr t_zd; + uint32_t desc, lg2_rstride = 0; + bool be = s->be_data == MO_BE; + + assert(n == 2 || n == 4); + if (strided) { + lg2_rstride = 3; + if (n == 4) { + /* Validate ZD alignment. */ + if (zd & 4) { + return false; + } + lg2_rstride = 2; + } + /* Ignore non-temporal bit */ + zd &= ~8; + } + + if (strided || !dc_isar_feature(aa64_sve2p1, s) + ? !sme_sm_enabled_check(s) + : !sve_access_check(s)) { + return true; + } + + if (!s->mte_active[0]) { + addr = clean_data_tbi(s, addr); + } + + desc = n == 2 ? 0 : 1; + desc = desc | (lg2_rstride << 1); + desc = make_svemte_desc(s, vec_full_reg_size(s), 1, esz, is_write, desc); + t_desc = tcg_constant_i32(desc); + + t_png = tcg_temp_new_i32(); + tcg_gen_ld16u_i32(t_png, tcg_env, + pred_full_reg_offset(s, png) ^ + (HOST_BIG_ENDIAN ? 6 : 0)); + + t_zd = tcg_temp_new_ptr(); + tcg_gen_addi_ptr(t_zd, tcg_env, vec_full_reg_offset(s, zd)); + + f_ldst[is_write][be][esz](tcg_env, t_zd, addr, t_png, t_desc); + return true; +} + +static bool gen_ldst_zcrr_c(DisasContext *s, arg_zcrr_ldst *a, + bool is_write, bool strided) +{ + TCGv_i64 addr = tcg_temp_new_i64(); + + tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz); + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn)); + return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write, + a->nreg, strided); +} + +static bool gen_ldst_zcri_c(DisasContext *s, arg_zcri_ldst *a, + bool is_write, bool strided) +{ + TCGv_i64 addr = tcg_temp_new_i64(); + + tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), + a->imm * a->nreg * vec_full_reg_size(s)); + return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write, + a->nreg, strided); +} + +TRANS_FEAT(LD1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, false, false) +TRANS_FEAT(LD1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, false, false) +TRANS_FEAT(ST1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, true, false) +TRANS_FEAT(ST1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, true, false) + +TRANS_FEAT(LD1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, false, true) +TRANS_FEAT(LD1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, false, true) +TRANS_FEAT(ST1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, true, true) +TRANS_FEAT(ST1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, true, true) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> The msz > esz encodings are reserved, and some of them are about to be reused. Split these patterns so that the new insns do not overlap. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-95-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ # Stores; user must fill in ESZ, MSZ, NREG as needed. @rprr_store ....... .. .. rm:5 ... pg:3 rn:5 rd:5 &rprr_store -@rpri_store_msz ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5 &rpri_store +@rpri_store ....... .. .. . imm:s4 ... pg:3 rn:5 rd:5 &rpri_store @rprr_store_esz_n0 ....... .. esz:2 rm:5 ... pg:3 rn:5 rd:5 \ &rprr_store nreg=0 @rprr_scatter_store ....... msz:2 .. rm:5 ... pg:3 rn:5 rd:5 \ @@ -XXX,XX +XXX,XX @@ STR_zri 1110010 11 0. ..... 010 ... ..... ..... @rd_rn_i9 # SVE contiguous store (scalar plus immediate) # ST1B, ST1H, ST1W, ST1D; require msz <= esz -ST_zpri 1110010 .. esz:2 0.... 111 ... ..... ..... \ - @rpri_store_msz nreg=0 +ST_zpri 1110010 00 esz:2 0.... 111 ... ..... ..... \ + @rpri_store msz=0 nreg=0 +ST_zpri 1110010 01 esz:2 0.... 111 ... ..... ..... \ + @rpri_store msz=1 nreg=0 +ST_zpri 1110010 10 10 0.... 111 ... ..... ..... \ + @rpri_store msz=2 esz=2 nreg=0 +ST_zpri 1110010 10 11 0.... 111 ... ..... ..... \ + @rpri_store msz=2 esz=3 nreg=0 +ST_zpri 1110010 11 11 0.... 111 ... ..... ..... \ + @rpri_store msz=3 esz=3 nreg=0 # SVE contiguous store (scalar plus scalar) # ST1B, ST1H, ST1W, ST1D; require msz <= esz @@ -XXX,XX +XXX,XX @@ ST_zprr 1110010 00 .. ..... 010 ... ..... ..... \ @rprr_store_esz_n0 msz=0 ST_zprr 1110010 01 .. ..... 010 ... ..... ..... \ @rprr_store_esz_n0 msz=1 -ST_zprr 1110010 10 .. ..... 010 ... ..... ..... \ - @rprr_store_esz_n0 msz=2 +ST_zprr 1110010 10 10 ..... 010 ... ..... ..... \ + @rprr_store msz=2 esz=2 nreg=0 +ST_zprr 1110010 10 11 ..... 010 ... ..... ..... \ + @rprr_store msz=2 esz=3 nreg=0 ST_zprr 1110010 11 11 ..... 010 ... ..... ..... \ @rprr_store msz=3 esz=3 nreg=0 # SVE contiguous non-temporal store (scalar plus immediate) (nreg == 0) # SVE store multiple structures (scalar plus immediate) (nreg != 0) ST_zpri 1110010 .. nreg:2 1.... 111 ... ..... ..... \ - @rpri_store_msz esz=%size_23 + @rpri_store msz=%size_23 esz=%size_23 # SVE contiguous non-temporal store (scalar plus scalar) (nreg == 0) # SVE store multiple structures (scalar plus scalar) (nreg != 0) -ST_zprr 1110010 msz:2 nreg:2 ..... 011 ... ..... ..... \ - @rprr_store esz=%size_23 +ST_zprr 1110010 .. nreg:2 ..... 011 ... ..... ..... \ + @rprr_store msz=%size_23 esz=%size_23 # SVE 32-bit scatter store (scalar plus 32-bit scaled offsets) # Require msz > 0 && msz <= esz. -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-96-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 22 +++++ target/arm/tcg/sve_ldst_internal.h | 26 ++++++ target/arm/tcg/sve.decode | 20 +++++ target/arm/tcg/sve_helper.c | 6 ++ target/arm/tcg/translate-sve.c | 136 +++++++++++++++++++++++------ 5 files changed, 183 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld2bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld1hds_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ldff1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ldff1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ldff1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st1hd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_st1bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st2bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st3bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st1hd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldhsu_le_zsu, TCG_CALL_NO_WG, diff --git a/target/arm/tcg/sve_ldst_internal.h b/target/arm/tcg/sve_ldst_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_ldst_internal.h +++ b/target/arm/tcg/sve_ldst_internal.h @@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl) DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq) DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq) +#define DO_LD_PRIM_3(NAME, FUNC) \ + static inline void sve_##NAME##_host(void *vd, \ + intptr_t reg_off, void *host) \ + { sve_##FUNC##_host(vd, reg_off, host); \ + *(uint64_t *)(vd + reg_off + 8) = 0; } \ + static inline void sve_##NAME##_tlb(CPUARMState *env, void *vd, \ + intptr_t reg_off, target_ulong addr, uintptr_t ra) \ + { sve_##FUNC##_tlb(env, vd, reg_off, addr, ra); \ + *(uint64_t *)(vd + reg_off + 8) = 0; } + +DO_LD_PRIM_3(ld1squ_be, ld1sdu_be) +DO_LD_PRIM_3(ld1squ_le, ld1sdu_le) +DO_LD_PRIM_3(ld1dqu_be, ld1dd_be) +DO_LD_PRIM_3(ld1dqu_le, ld1dd_le) + +#define sve_st1sq_be_host sve_st1sd_be_host +#define sve_st1sq_le_host sve_st1sd_le_host +#define sve_st1sq_be_tlb sve_st1sd_be_tlb +#define sve_st1sq_le_tlb sve_st1sd_le_tlb + +#define sve_st1dq_be_host sve_st1dd_be_host +#define sve_st1dq_le_host sve_st1dd_le_host +#define sve_st1dq_be_tlb sve_st1dd_be_tlb +#define sve_st1dq_le_tlb sve_st1dd_le_tlb + #undef DO_LD_TLB #undef DO_ST_TLB #undef DO_LD_HOST @@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq) #undef DO_ST_PRIM_1 #undef DO_LD_PRIM_2 #undef DO_ST_PRIM_2 +#undef DO_LD_PRIM_3 /* * Resolve the guest virtual address to info->host and info->flags. diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ LD1_zpiz 1000010 .. 01 ..... 1.. ... ..... ..... \ # SVE contiguous load (scalar plus scalar) LD_zprr 1010010 .... ..... 010 ... ..... ..... @rprr_load_dt nreg=0 +# LD1W (128-bit element) +LD_zprr 1010010 1000 rm:5 100 pg:3 rn:5 rd:5 \ + &rprr_load dtype=16 nreg=0 +# LD1D (128-bit element) +LD_zprr 1010010 1100 rm:5 100 pg:3 rn:5 rd:5 \ + &rprr_load dtype=17 nreg=0 # SVE contiguous first-fault load (scalar plus scalar) LDFF1_zprr 1010010 .... ..... 011 ... ..... ..... @rprr_load_dt nreg=0 # SVE contiguous load (scalar plus immediate) LD_zpri 1010010 .... 0.... 101 ... ..... ..... @rpri_load_dt nreg=0 +# LD1W (128-bit element) +LD_zpri 1010010 1000 1 imm:s4 001 pg:3 rn:5 rd:5 \ + &rpri_load dtype=16 nreg=0 +# LD1D (128-bit element) +LD_zpri 1010010 1100 1 imm:s4 001 pg:3 rn:5 rd:5 \ + &rpri_load dtype=17 nreg=0 # SVE contiguous non-fault load (scalar plus immediate) LDNF1_zpri 1010010 .... 1.... 101 ... ..... ..... @rpri_load_dt nreg=0 @@ -XXX,XX +XXX,XX @@ ST_zpri 1110010 10 11 0.... 111 ... ..... ..... \ @rpri_store msz=2 esz=3 nreg=0 ST_zpri 1110010 11 11 0.... 111 ... ..... ..... \ @rpri_store msz=3 esz=3 nreg=0 +ST_zpri 1110010 10 00 0.... 111 ... ..... ..... \ + @rpri_store msz=2 esz=4 nreg=0 +ST_zpri 1110010 11 10 0.... 111 ... ..... ..... \ + @rpri_store msz=3 esz=4 nreg=0 # SVE contiguous store (scalar plus scalar) # ST1B, ST1H, ST1W, ST1D; require msz <= esz @@ -XXX,XX +XXX,XX @@ ST_zprr 1110010 10 11 ..... 010 ... ..... ..... \ @rprr_store msz=2 esz=3 nreg=0 ST_zprr 1110010 11 11 ..... 010 ... ..... ..... \ @rprr_store msz=3 esz=3 nreg=0 +ST_zprr 1110010 10 00 ..... 010 ... ..... ..... \ + @rprr_store msz=2 esz=4 nreg=0 +ST_zprr 1110010 11 10 ..... 010 ... ..... ..... \ + @rprr_store msz=3 esz=4 nreg=0 # SVE contiguous non-temporal store (scalar plus immediate) (nreg == 0) # SVE store multiple structures (scalar plus immediate) (nreg != 0) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_LD1_2(ld1sds, MO_64, MO_32) DO_LD1_2(ld1dd, MO_64, MO_64) +DO_LD1_2(ld1squ, MO_32, MO_128) +DO_LD1_2(ld1dqu, MO_64, MO_128) + #undef DO_LD1_1 #undef DO_LD1_2 @@ -XXX,XX +XXX,XX @@ DO_STN_2(2, dd, MO_64, MO_64) DO_STN_2(3, dd, MO_64, MO_64) DO_STN_2(4, dd, MO_64, MO_64) +DO_STN_2(1, sq, MO_128, MO_32) +DO_STN_2(1, dq, MO_128, MO_64) + #undef DO_STN_1 #undef DO_STN_2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_STR_pri(DisasContext *s, arg_rri *a) */ /* The memory mode of the dtype. */ -static const MemOp dtype_mop[16] = { +static const MemOp dtype_mop[19] = { MO_UB, MO_UB, MO_UB, MO_UB, MO_SL, MO_UW, MO_UW, MO_UW, MO_SW, MO_SW, MO_UL, MO_UL, - MO_SB, MO_SB, MO_SB, MO_UQ + MO_SB, MO_SB, MO_SB, MO_UQ, + /* Artificial values used by decode */ + MO_UL, MO_UQ, MO_128, }; #define dtype_msz(x) (dtype_mop[x] & MO_SIZE) /* The vector element size of dtype. */ -static const uint8_t dtype_esz[16] = { +static const uint8_t dtype_esz[19] = { 0, 1, 2, 3, 3, 1, 2, 3, 3, 2, 2, 3, - 3, 2, 1, 3 + 3, 2, 1, 3, + /* Artificial values used by decode */ + 4, 4, 4, }; uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs, @@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, } /* Indexed by [mte][be][dtype][nreg] */ -static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { +static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { /* mte inactive, little-endian */ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r }, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1bss_r, NULL, NULL, NULL }, { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r, - gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } }, + gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r }, + + { gen_helper_sve_ld1squ_le_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_le_r, NULL, NULL, NULL }, + }, /* mte inactive, big-endian */ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1bss_r, NULL, NULL, NULL }, { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r, - gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } } }, + gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r }, + + { gen_helper_sve_ld1squ_be_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_be_r, NULL, NULL, NULL }, + }, + }, { /* mte active, little-endian */ { { gen_helper_sve_ld1bb_r_mte, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1dd_le_r_mte, gen_helper_sve_ld2dd_le_r_mte, gen_helper_sve_ld3dd_le_r_mte, - gen_helper_sve_ld4dd_le_r_mte } }, + gen_helper_sve_ld4dd_le_r_mte }, + + { gen_helper_sve_ld1squ_le_r_mte, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_le_r_mte, NULL, NULL, NULL }, + }, /* mte active, big-endian */ { { gen_helper_sve_ld1bb_r_mte, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1dd_be_r_mte, gen_helper_sve_ld2dd_be_r_mte, gen_helper_sve_ld3dd_be_r_mte, - gen_helper_sve_ld4dd_be_r_mte } } }, + gen_helper_sve_ld4dd_be_r_mte }, + + { gen_helper_sve_ld1squ_be_r_mte, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_be_r_mte, NULL, NULL, NULL }, + }, + }, }; static void do_ld_zpa(DisasContext *s, int zt, int pg, @@ -XXX,XX +XXX,XX @@ static void do_ld_zpa(DisasContext *s, int zt, int pg, static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) { - if (a->rm == 31 || !dc_isar_feature(aa64_sve, s)) { + if (a->rm == 31) { return false; } + + /* dtypes 16 and 17 are artificial, representing 128-bit element */ + if (a->dtype < 16) { + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + } else { + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + } + if (sve_access_check(s)) { TCGv_i64 addr = tcg_temp_new_i64(); tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype)); @@ -XXX,XX +XXX,XX @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a) { - if (!dc_isar_feature(aa64_sve, s)) { - return false; + /* dtypes 16 and 17 are artificial, representing 128-bit element */ + if (a->dtype < 16) { + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + } else { + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; } + if (sve_access_check(s)) { int vsz = vec_full_reg_size(s); int elements = vsz >> dtype_esz[a->dtype]; @@ -XXX,XX +XXX,XX @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a) static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz, int esz, int nreg) { - static gen_helper_gvec_mem * const fn_single[2][2][4][4] = { + static gen_helper_gvec_mem * const fn_single[2][2][4][5] = { { { { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r, gen_helper_sve_st1bs_r, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_le_r }, { NULL, NULL, gen_helper_sve_st1ss_le_r, - gen_helper_sve_st1sd_le_r }, + gen_helper_sve_st1sd_le_r, + gen_helper_sve_st1sq_le_r, }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_le_r } }, + gen_helper_sve_st1dd_le_r, + gen_helper_sve_st1dq_le_r, } }, { { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r, gen_helper_sve_st1bs_r, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_be_r }, { NULL, NULL, gen_helper_sve_st1ss_be_r, - gen_helper_sve_st1sd_be_r }, + gen_helper_sve_st1sd_be_r, + gen_helper_sve_st1sq_be_r }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_be_r } } }, + gen_helper_sve_st1dd_be_r, + gen_helper_sve_st1dq_be_r } } }, { { { gen_helper_sve_st1bb_r_mte, gen_helper_sve_st1bh_r_mte, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_le_r_mte }, { NULL, NULL, gen_helper_sve_st1ss_le_r_mte, - gen_helper_sve_st1sd_le_r_mte }, + gen_helper_sve_st1sd_le_r_mte, + gen_helper_sve_st1sq_le_r_mte }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_le_r_mte } }, + gen_helper_sve_st1dd_le_r_mte, + gen_helper_sve_st1dq_le_r_mte } }, { { gen_helper_sve_st1bb_r_mte, gen_helper_sve_st1bh_r_mte, gen_helper_sve_st1bs_r_mte, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_be_r_mte }, { NULL, NULL, gen_helper_sve_st1ss_be_r_mte, - gen_helper_sve_st1sd_be_r_mte }, + gen_helper_sve_st1sd_be_r_mte, + gen_helper_sve_st1sq_be_r_mte }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_be_r_mte } } }, + gen_helper_sve_st1dd_be_r_mte, + gen_helper_sve_st1dq_be_r_mte } } }, }; static gen_helper_gvec_mem * const fn_multiple[2][2][3][4] = { { { { gen_helper_sve_st2bb_r, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a) { - if (!dc_isar_feature(aa64_sve, s)) { - return false; - } if (a->rm == 31 || a->msz > a->esz) { return false; } + switch (a->esz) { + case MO_8 ... MO_64: + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + break; + case MO_128: + assert(a->msz < a->esz); + assert(a->nreg == 0); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + break; + default: + g_assert_not_reached(); + } + if (sve_access_check(s)) { TCGv_i64 addr = tcg_temp_new_i64(); tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->msz); @@ -XXX,XX +XXX,XX @@ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a) static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a) { - if (!dc_isar_feature(aa64_sve, s)) { - return false; - } if (a->msz > a->esz) { return false; } + switch (a->esz) { + case MO_8 ... MO_64: + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + break; + case MO_128: + assert(a->msz < a->esz); + assert(a->nreg == 0); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + break; + default: + g_assert_not_reached(); + } + if (sve_access_check(s)) { int vsz = vec_full_reg_size(s); int elements = vsz >> a->esz; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Move from sme_helper.c to the shared header. Add a comment noting the lack of atomicity. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-97-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_ldst_internal.h | 63 ++++++++++++++++++++++++++++++ target/arm/tcg/sme_helper.c | 44 +++------------------ 2 files changed, 69 insertions(+), 38 deletions(-) diff --git a/target/arm/tcg/sve_ldst_internal.h b/target/arm/tcg/sve_ldst_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_ldst_internal.h +++ b/target/arm/tcg/sve_ldst_internal.h @@ -XXX,XX +XXX,XX @@ DO_LD_PRIM_3(ld1dqu_le, ld1dd_le) #define sve_st1dq_be_tlb sve_st1dd_be_tlb #define sve_st1dq_le_tlb sve_st1dd_le_tlb +/* + * The ARMVectorReg elements are stored in host-endian 64-bit units. + * For 128-bit quantities, the sequence defined by the Elem[] pseudocode + * corresponds to storing the two 64-bit pieces in little-endian order. + */ +/* FIXME: Nothing in this file makes any effort at atomicity. */ + +static inline void sve_ld1qq_be_host(void *vd, intptr_t reg_off, void *host) +{ + sve_ld1dd_be_host(vd, reg_off + 8, host); + sve_ld1dd_be_host(vd, reg_off, host + 8); +} + +static inline void sve_ld1qq_le_host(void *vd, intptr_t reg_off, void *host) +{ + sve_ld1dd_le_host(vd, reg_off, host); + sve_ld1dd_le_host(vd, reg_off + 8, host + 8); +} + +static inline void +sve_ld1qq_be_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_ld1dd_be_tlb(env, vd, reg_off + 8, addr, ra); + sve_ld1dd_be_tlb(env, vd, reg_off, addr + 8, ra); +} + +static inline void +sve_ld1qq_le_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_ld1dd_le_tlb(env, vd, reg_off, addr, ra); + sve_ld1dd_le_tlb(env, vd, reg_off + 8, addr + 8, ra); +} + +static inline void sve_st1qq_be_host(void *vd, intptr_t reg_off, void *host) +{ + sve_st1dd_be_host(vd, reg_off + 8, host); + sve_st1dd_be_host(vd, reg_off, host + 8); +} + +static inline void sve_st1qq_le_host(void *vd, intptr_t reg_off, void *host) +{ + sve_st1dd_le_host(vd, reg_off, host); + sve_st1dd_le_host(vd, reg_off + 8, host + 8); +} + +static inline void +sve_st1qq_be_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_st1dd_be_tlb(env, vd, reg_off + 8, addr, ra); + sve_st1dd_be_tlb(env, vd, reg_off, addr + 8, ra); +} + +static inline void +sve_st1qq_le_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_st1dd_le_tlb(env, vd, reg_off, addr, ra); + sve_st1dd_le_tlb(env, vd, reg_off + 8, addr + 8, ra); +} + #undef DO_LD_TLB #undef DO_ST_TLB #undef DO_LD_HOST diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static inline void sme_##NAME##_v_tlb(CPUARMState *env, void *za, \ TLB(env, useronly_clean_ptr(addr), val, ra); \ } -/* - * The ARMVectorReg elements are stored in host-endian 64-bit units. - * For 128-bit quantities, the sequence defined by the Elem[] pseudocode - * corresponds to storing the two 64-bit pieces in little-endian order. - */ -#define DO_LDQ(HNAME, VNAME, BE, HOST, TLB) \ -static inline void HNAME##_host(void *za, intptr_t off, void *host) \ -{ \ - uint64_t val0 = HOST(host), val1 = HOST(host + 8); \ - uint64_t *ptr = za + off; \ - ptr[0] = BE ? val1 : val0, ptr[1] = BE ? val0 : val1; \ -} \ +#define DO_LDQ(HNAME, VNAME) \ static inline void VNAME##_v_host(void *za, intptr_t off, void *host) \ { \ HNAME##_host(za, tile_vslice_offset(off), host); \ } \ -static inline void HNAME##_tlb(CPUARMState *env, void *za, intptr_t off, \ - target_ulong addr, uintptr_t ra) \ -{ \ - uint64_t val0 = TLB(env, useronly_clean_ptr(addr), ra); \ - uint64_t val1 = TLB(env, useronly_clean_ptr(addr + 8), ra); \ - uint64_t *ptr = za + off; \ - ptr[0] = BE ? val1 : val0, ptr[1] = BE ? val0 : val1; \ -} \ static inline void VNAME##_v_tlb(CPUARMState *env, void *za, intptr_t off, \ target_ulong addr, uintptr_t ra) \ { \ HNAME##_tlb(env, za, tile_vslice_offset(off), addr, ra); \ } -#define DO_STQ(HNAME, VNAME, BE, HOST, TLB) \ -static inline void HNAME##_host(void *za, intptr_t off, void *host) \ -{ \ - uint64_t *ptr = za + off; \ - HOST(host, ptr[BE]); \ - HOST(host + 8, ptr[!BE]); \ -} \ +#define DO_STQ(HNAME, VNAME) \ static inline void VNAME##_v_host(void *za, intptr_t off, void *host) \ { \ HNAME##_host(za, tile_vslice_offset(off), host); \ } \ -static inline void HNAME##_tlb(CPUARMState *env, void *za, intptr_t off, \ - target_ulong addr, uintptr_t ra) \ -{ \ - uint64_t *ptr = za + off; \ - TLB(env, useronly_clean_ptr(addr), ptr[BE], ra); \ - TLB(env, useronly_clean_ptr(addr + 8), ptr[!BE], ra); \ -} \ static inline void VNAME##_v_tlb(CPUARMState *env, void *za, intptr_t off, \ target_ulong addr, uintptr_t ra) \ { \ @@ -XXX,XX +XXX,XX @@ DO_LD(ld1s_le, uint32_t, ldl_le_p, cpu_ldl_le_data_ra) DO_LD(ld1d_be, uint64_t, ldq_be_p, cpu_ldq_be_data_ra) DO_LD(ld1d_le, uint64_t, ldq_le_p, cpu_ldq_le_data_ra) -DO_LDQ(sve_ld1qq_be, sme_ld1q_be, 1, ldq_be_p, cpu_ldq_be_data_ra) -DO_LDQ(sve_ld1qq_le, sme_ld1q_le, 0, ldq_le_p, cpu_ldq_le_data_ra) +DO_LDQ(sve_ld1qq_be, sme_ld1q_be) +DO_LDQ(sve_ld1qq_le, sme_ld1q_le) DO_ST(st1b, uint8_t, stb_p, cpu_stb_data_ra) DO_ST(st1h_be, uint16_t, stw_be_p, cpu_stw_be_data_ra) @@ -XXX,XX +XXX,XX @@ DO_ST(st1s_le, uint32_t, stl_le_p, cpu_stl_le_data_ra) DO_ST(st1d_be, uint64_t, stq_be_p, cpu_stq_be_data_ra) DO_ST(st1d_le, uint64_t, stq_le_p, cpu_stq_le_data_ra) -DO_STQ(sve_st1qq_be, sme_st1q_be, 1, stq_be_p, cpu_stq_be_data_ra) -DO_STQ(sve_st1qq_le, sme_st1q_le, 0, stq_le_p, cpu_stq_le_data_ra) +DO_STQ(sve_st1qq_be, sme_st1q_be) +DO_STQ(sve_st1qq_le, sme_st1q_le) #undef DO_LD #undef DO_ST -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-98-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 32 +++++++++ target/arm/tcg/sve.decode | 31 +++++++++ target/arm/tcg/sve_helper.c | 8 +++ target/arm/tcg/translate-sve.c | 116 ++++++++++++++++++++++++--------- 4 files changed, 156 insertions(+), 31 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld2qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld2dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld4dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld2qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1bhu_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bsu_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bdu_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st2qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_st1bh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st2dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st3dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st4dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st2qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_st1bh_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bs_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bd_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ @rprr_load_dt ....... dtype:4 rm:5 ... pg:3 rn:5 rd:5 &rprr_load @rpri_load_dt ....... dtype:4 . imm:s4 ... pg:3 rn:5 rd:5 &rpri_load +@rprr_load ....... .... rm:5 ... pg:3 rn:5 rd:5 &rprr_load +@rpri_load ....... .... . imm:s4 ... pg:3 rn:5 rd:5 &rpri_load + @rprr_load_msz ....... .... rm:5 ... pg:3 rn:5 rd:5 \ &rprr_load dtype=%msz_dtype @rpri_load_msz ....... .... . imm:s4 ... pg:3 rn:5 rd:5 \ @@ -XXX,XX +XXX,XX @@ LDNF1_zpri 1010010 .... 1.... 101 ... ..... ..... @rpri_load_dt nreg=0 # SVE load multiple structures (scalar plus scalar) # LD2B, LD2H, LD2W, LD2D; etc. LD_zprr 1010010 .. nreg:2 ..... 110 ... ..... ..... @rprr_load_msz +# LD[234]Q +LD_zprr 1010010 01 01 ..... 100 ... ..... ..... \ + @rprr_load dtype=18 nreg=1 +LD_zprr 1010010 10 01 ..... 100 ... ..... ..... \ + @rprr_load dtype=18 nreg=2 +LD_zprr 1010010 11 01 ..... 100 ... ..... ..... \ + @rprr_load dtype=18 nreg=3 # SVE contiguous non-temporal load (scalar plus immediate) # LDNT1B, LDNT1H, LDNT1W, LDNT1D # SVE load multiple structures (scalar plus immediate) # LD2B, LD2H, LD2W, LD2D; etc. LD_zpri 1010010 .. nreg:2 0.... 111 ... ..... ..... @rpri_load_msz +# LD[234]Q +LD_zpri 1010010 01 001 .... 111 ... ..... ..... \ + @rpri_load dtype=18 nreg=1 +LD_zpri 1010010 10 001 .... 111 ... ..... ..... \ + @rpri_load dtype=18 nreg=2 +LD_zpri 1010010 11 001 .... 111 ... ..... ..... \ + @rpri_load dtype=18 nreg=3 # SVE load and broadcast quadword (scalar plus scalar) LD1RQ_zprr 1010010 .. 00 ..... 000 ... ..... ..... \ @@ -XXX,XX +XXX,XX @@ ST_zprr 1110010 11 10 ..... 010 ... ..... ..... \ # SVE store multiple structures (scalar plus immediate) (nreg != 0) ST_zpri 1110010 .. nreg:2 1.... 111 ... ..... ..... \ @rpri_store msz=%size_23 esz=%size_23 +# ST[234]Q +ST_zpri 11100100 01 00 .... 000 ... ..... ..... \ + @rpri_store msz=4 esz=4 nreg=1 +ST_zpri 11100100 10 00 .... 000 ... ..... ..... \ + @rpri_store msz=4 esz=4 nreg=2 +ST_zpri 11100100 11 00 .... 000 ... ..... ..... \ + @rpri_store msz=4 esz=4 nreg=3 # SVE contiguous non-temporal store (scalar plus scalar) (nreg == 0) # SVE store multiple structures (scalar plus scalar) (nreg != 0) ST_zprr 1110010 .. nreg:2 ..... 011 ... ..... ..... \ @rprr_store msz=%size_23 esz=%size_23 +# ST[234]Q +ST_zprr 11100100 01 1 ..... 000 ... ..... ..... \ + @rprr_store msz=4 esz=4 nreg=1 +ST_zprr 11100100 10 1 ..... 000 ... ..... ..... \ + @rprr_store msz=4 esz=4 nreg=2 +ST_zprr 11100100 11 1 ..... 000 ... ..... ..... \ + @rprr_store msz=4 esz=4 nreg=3 # SVE 32-bit scatter store (scalar plus 32-bit scaled offsets) # Require msz > 0 && msz <= esz. diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_LDN_2(2, dd, MO_64) DO_LDN_2(3, dd, MO_64) DO_LDN_2(4, dd, MO_64) +DO_LDN_2(2, qq, MO_128) +DO_LDN_2(3, qq, MO_128) +DO_LDN_2(4, qq, MO_128) + #undef DO_LDN_1 #undef DO_LDN_2 @@ -XXX,XX +XXX,XX @@ DO_STN_2(4, dd, MO_64, MO_64) DO_STN_2(1, sq, MO_128, MO_32) DO_STN_2(1, dq, MO_128, MO_64) +DO_STN_2(2, qq, MO_128, MO_128) +DO_STN_2(3, qq, MO_128, MO_128) +DO_STN_2(4, qq, MO_128, MO_128) + #undef DO_STN_1 #undef DO_STN_2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static inline int expand_imm_sh8u(DisasContext *s, int x) */ static inline int msz_dtype(DisasContext *s, int msz) { - static const uint8_t dtype[4] = { 0, 5, 10, 15 }; + static const uint8_t dtype[5] = { 0, 5, 10, 15, 18 }; return dtype[msz]; } @@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, } /* Indexed by [mte][be][dtype][nreg] */ -static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { +static gen_helper_gvec_mem * const ldr_fns[2][2][19][4] = { { /* mte inactive, little-endian */ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r }, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_le_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_le_r, NULL, NULL, NULL }, + { NULL, gen_helper_sve_ld2qq_le_r, + gen_helper_sve_ld3qq_le_r, gen_helper_sve_ld4qq_le_r }, }, /* mte inactive, big-endian */ @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_be_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_be_r, NULL, NULL, NULL }, + { NULL, gen_helper_sve_ld2qq_be_r, + gen_helper_sve_ld3qq_be_r, gen_helper_sve_ld4qq_be_r }, }, }, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_le_r_mte, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_le_r_mte, NULL, NULL, NULL }, + { NULL, + gen_helper_sve_ld2qq_le_r_mte, + gen_helper_sve_ld3qq_le_r_mte, + gen_helper_sve_ld4qq_le_r_mte }, }, /* mte active, big-endian */ @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_be_r_mte, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_be_r_mte, NULL, NULL, NULL }, + { NULL, + gen_helper_sve_ld2qq_be_r_mte, + gen_helper_sve_ld3qq_be_r_mte, + gen_helper_sve_ld4qq_be_r_mte }, }, }, }; @@ -XXX,XX +XXX,XX @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) return false; } - /* dtypes 16 and 17 are artificial, representing 128-bit element */ - if (a->dtype < 16) { + /* dtypes 16-18 are artificial, representing 128-bit element */ + switch (a->dtype) { + case 0 ... 15: if (!dc_isar_feature(aa64_sve, s)) { return false; } - } else { + break; + case 16: case 17: if (!dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; + break; + case 18: + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + break; + default: + g_assert_not_reached(); } if (sve_access_check(s)) { @@ -XXX,XX +XXX,XX @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a) { - /* dtypes 16 and 17 are artificial, representing 128-bit element */ - if (a->dtype < 16) { + /* dtypes 16-18 are artificial, representing 128-bit element */ + switch (a->dtype) { + case 0 ... 15: if (!dc_isar_feature(aa64_sve, s)) { return false; } - } else { + break; + case 16: case 17: if (!dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; + break; + case 18: + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + break; + default: + g_assert_not_reached(); } if (sve_access_check(s)) { @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1dd_be_r_mte, gen_helper_sve_st1dq_be_r_mte } } }, }; - static gen_helper_gvec_mem * const fn_multiple[2][2][3][4] = { + static gen_helper_gvec_mem * const fn_multiple[2][2][3][5] = { { { { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_le_r, gen_helper_sve_st2ss_le_r, - gen_helper_sve_st2dd_le_r }, + gen_helper_sve_st2dd_le_r, + gen_helper_sve_st2qq_le_r }, { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_le_r, gen_helper_sve_st3ss_le_r, - gen_helper_sve_st3dd_le_r }, + gen_helper_sve_st3dd_le_r, + gen_helper_sve_st3qq_le_r }, { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_le_r, gen_helper_sve_st4ss_le_r, - gen_helper_sve_st4dd_le_r } }, + gen_helper_sve_st4dd_le_r, + gen_helper_sve_st4qq_le_r } }, { { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_be_r, gen_helper_sve_st2ss_be_r, - gen_helper_sve_st2dd_be_r }, + gen_helper_sve_st2dd_be_r, + gen_helper_sve_st2qq_be_r }, { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_be_r, gen_helper_sve_st3ss_be_r, - gen_helper_sve_st3dd_be_r }, + gen_helper_sve_st3dd_be_r, + gen_helper_sve_st3qq_be_r }, { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_be_r, gen_helper_sve_st4ss_be_r, - gen_helper_sve_st4dd_be_r } } }, + gen_helper_sve_st4dd_be_r, + gen_helper_sve_st4qq_be_r } } }, { { { gen_helper_sve_st2bb_r_mte, gen_helper_sve_st2hh_le_r_mte, gen_helper_sve_st2ss_le_r_mte, - gen_helper_sve_st2dd_le_r_mte }, + gen_helper_sve_st2dd_le_r_mte, + gen_helper_sve_st2qq_le_r_mte }, { gen_helper_sve_st3bb_r_mte, gen_helper_sve_st3hh_le_r_mte, gen_helper_sve_st3ss_le_r_mte, - gen_helper_sve_st3dd_le_r_mte }, + gen_helper_sve_st3dd_le_r_mte, + gen_helper_sve_st3qq_le_r_mte }, { gen_helper_sve_st4bb_r_mte, gen_helper_sve_st4hh_le_r_mte, gen_helper_sve_st4ss_le_r_mte, - gen_helper_sve_st4dd_le_r_mte } }, + gen_helper_sve_st4dd_le_r_mte, + gen_helper_sve_st4qq_le_r_mte } }, { { gen_helper_sve_st2bb_r_mte, gen_helper_sve_st2hh_be_r_mte, gen_helper_sve_st2ss_be_r_mte, - gen_helper_sve_st2dd_be_r_mte }, + gen_helper_sve_st2dd_be_r_mte, + gen_helper_sve_st2qq_be_r_mte }, { gen_helper_sve_st3bb_r_mte, gen_helper_sve_st3hh_be_r_mte, gen_helper_sve_st3ss_be_r_mte, - gen_helper_sve_st3dd_be_r_mte }, + gen_helper_sve_st3dd_be_r_mte, + gen_helper_sve_st3qq_be_r_mte }, { gen_helper_sve_st4bb_r_mte, gen_helper_sve_st4hh_be_r_mte, gen_helper_sve_st4ss_be_r_mte, - gen_helper_sve_st4dd_be_r_mte } } }, + gen_helper_sve_st4dd_be_r_mte, + gen_helper_sve_st4qq_be_r_mte } } }, }; gen_helper_gvec_mem *fn; int be = s->be_data == MO_BE; @@ -XXX,XX +XXX,XX @@ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a) } break; case MO_128: - assert(a->msz < a->esz); - assert(a->nreg == 0); - if (!dc_isar_feature(aa64_sve2p1, s)) { - return false; + if (a->nreg == 0) { + assert(a->msz < a->esz); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + } else { + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } } - s->is_nonstreaming = true; break; default: g_assert_not_reached(); @@ -XXX,XX +XXX,XX @@ static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a) } break; case MO_128: - assert(a->msz < a->esz); - assert(a->nreg == 0); - if (!dc_isar_feature(aa64_sve2p1, s)) { - return false; + if (a->nreg == 0) { + assert(a->msz < a->esz); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + } else { + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } } - s->is_nonstreaming = true; break; default: g_assert_not_reached(); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-99-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 16 ++++++++++++++++ target/arm/tcg/sve.decode | 8 ++++++++ target/arm/tcg/sve_helper.c | 6 ++++++ target/arm/tcg/translate-sve.c | 34 ++++++++++++++++++++++++++++++++-- 4 files changed, 62 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_ldsds_le_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldsds_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_le_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_be_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldbsu_zsu_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_ldsds_le_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldsds_be_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_le_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_be_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldffbsu_zsu, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_le_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_be_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stbs_zsu_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stdd_be_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_le_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_be_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve2_sqdmull_zzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ LD1_zprz 1100010 10 1. ..... 1.. ... ..... ..... \ LD1_zprz 1100010 11 1. ..... 11. ... ..... ..... \ @rprr_g_load_sc esz=3 msz=3 u=1 +# LD1Q +LD1_zprz 1100 0100 000 rm:5 101 pg:3 rn:5 rd:5 \ + &rprr_gather_load u=0 ff=0 xs=2 esz=4 msz=4 scale=0 + # SVE 64-bit gather load (vector plus immediate) LD1_zpiz 1100010 .. 01 ..... 1.. ... ..... ..... \ @rpri_g_load esz=3 @@ -XXX,XX +XXX,XX @@ ST1_zprz 1110010 .. 01 ..... 101 ... ..... ..... \ ST1_zprz 1110010 .. 00 ..... 101 ... ..... ..... \ @rprr_scatter_store xs=2 esz=3 scale=0 +# ST1Q +ST1_zprz 1110 0100 001 rm:5 001 pg:3 rn:5 rd:5 \ + &rprr_scatter_store xs=2 msz=4 esz=4 scale=0 + # SVE 64-bit scatter store (vector plus immediate) ST1_zpiz 1110010 .. 10 ..... 101 ... ..... ..... \ @rpri_scatter_store esz=3 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_LD1_ZPZ_D(dd_be, zsu, MO_64) DO_LD1_ZPZ_D(dd_be, zss, MO_64) DO_LD1_ZPZ_D(dd_be, zd, MO_64) +DO_LD1_ZPZ_D(qq_le, zd, MO_128) +DO_LD1_ZPZ_D(qq_be, zd, MO_128) + #undef DO_LD1_ZPZ_S #undef DO_LD1_ZPZ_D @@ -XXX,XX +XXX,XX @@ DO_ST1_ZPZ_D(sd_be, zd, MO_32) DO_ST1_ZPZ_D(dd_le, zd, MO_64) DO_ST1_ZPZ_D(dd_be, zd, MO_64) +DO_ST1_ZPZ_D(qq_le, zd, MO_128) +DO_ST1_ZPZ_D(qq_be, zd, MO_128) + #undef DO_ST1_ZPZ_S #undef DO_ST1_ZPZ_D diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ gather_load_fn64[2][2][2][3][2][4] = { gen_helper_sve_ldffdd_be_zd_mte, } } } } }, }; +static gen_helper_gvec_mem_scatter * const +gather_load_fn128[2][2] = { + { gen_helper_sve_ldqq_le_zd, + gen_helper_sve_ldqq_be_zd }, + { gen_helper_sve_ldqq_le_zd_mte, + gen_helper_sve_ldqq_be_zd_mte } +}; + static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a) { gen_helper_gvec_mem_scatter *fn = NULL; bool be = s->be_data == MO_BE; bool mte = s->mte_active[0]; - if (!dc_isar_feature(aa64_sve, s)) { + if (a->esz < MO_128 + ? !dc_isar_feature(aa64_sve, s) + : !dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; @@ -XXX,XX +XXX,XX @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a) case MO_64: fn = gather_load_fn64[mte][be][a->ff][a->xs][a->u][a->msz]; break; + case MO_128: + assert(!a->ff && a->u && a->xs == 2 && a->msz == MO_128); + fn = gather_load_fn128[mte][be]; + break; + default: + g_assert_not_reached(); } assert(fn != NULL); @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem_scatter * const scatter_store_fn64[2][2][3][4] = { gen_helper_sve_stdd_be_zd_mte, } } }, }; +static gen_helper_gvec_mem_scatter * const +scatter_store_fn128[2][2] = { + { gen_helper_sve_stqq_le_zd, + gen_helper_sve_stqq_be_zd }, + { gen_helper_sve_stqq_le_zd_mte, + gen_helper_sve_stqq_be_zd_mte } +}; + static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a) { gen_helper_gvec_mem_scatter *fn; @@ -XXX,XX +XXX,XX @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a) if (a->esz < a->msz || (a->msz == 0 && a->scale)) { return false; } - if (!dc_isar_feature(aa64_sve, s)) { + if (a->esz < MO_128 + ? !dc_isar_feature(aa64_sve, s) + : !dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; @@ -XXX,XX +XXX,XX @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a) case MO_64: fn = scatter_store_fn64[mte][be][a->xs][a->msz]; break; + case MO_128: + assert(a->xs == 2 && a->msz == MO_128); + fn = scatter_store_fn128[mte][be]; + break; default: g_assert_not_reached(); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-100-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 6 ++++ target/arm/tcg/sme.decode | 36 ++++++++++++++++++++ target/arm/tcg/sme_helper.c | 60 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 46 +++++++++++++++++++------- 4 files changed, 137 insertions(+), 11 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_mova_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_mova_cz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sme_ld1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ MOVA_za2 11000000 00 00011 00 .. 010 00 off:3 zr:4 0 \ MOVA_za4 11000000 00 00011 00 .. 011 00 off:3 zr:3 00 \ &mova_a rv=%mova_rv +### SME Move and Zero + +MOVAZ_za2 11000000 00000110 0 .. 01010 off:3 zr:4 0 \ + &mova_a rv=%mova_rv +MOVAZ_za4 11000000 00000110 0 .. 01110 off:3 zr:3 00 \ + &mova_a rv=%mova_rv + +MOVAZ_zt 11000000 00 00001 0 v:1 .. 0001 off:4 zr:5 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVAZ_zt 11000000 01 00001 0 v:1 .. 0001 za:1 off:3 zr:5 \ + &mova_t rs=%mova_rs esz=1 +MOVAZ_zt 11000000 10 00001 0 v:1 .. 0001 za:2 off:2 zr:5 \ + &mova_t rs=%mova_rs esz=2 +MOVAZ_zt 11000000 11 00001 0 v:1 .. 0001 za:3 off:1 zr:5 \ + &mova_t rs=%mova_rs esz=3 +MOVAZ_zt 11000000 11 00001 1 v:1 .. 0001 za:4 zr:5 \ + &mova_t rs=%mova_rs esz=4 off=0 + +MOVAZ_zt2 11000000 00 00011 0 v:1 .. 00010 off:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVAZ_zt2 11000000 01 00011 0 v:1 .. 00010 za:1 off:2 zr:4 0 \ + &mova_t rs=%mova_rs esz=1 +MOVAZ_zt2 11000000 10 00011 0 v:1 .. 00010 za:2 off:1 zr:4 0 \ + &mova_t rs=%mova_rs esz=2 +MOVAZ_zt2 11000000 11 00011 0 v:1 .. 00010 za:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVAZ_zt4 11000000 00 00011 0 v:1 .. 001100 off:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVAZ_zt4 11000000 01 00011 0 v:1 .. 001100 za:1 off:1 zr:3 00 \ + &mova_t rs=%mova_rs esz=1 +MOVAZ_zt4 11000000 10 00011 0 v:1 .. 001100 za:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=2 off=0 +MOVAZ_zt4 11000000 11 00011 0 v:1 .. 00110 za:3 zr:3 00 \ + &mova_t rs=%mova_rs esz=3 off=0 + ### SME Move into/from ZT0 MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_mova_zc_d)(void *vdst, void *vsrc, uint32_t desc) } } +void HELPER(sme2p1_movaz_zc_b)(void *vdst, void *vsrc, uint32_t desc) +{ + uint8_t *src = vsrc; + uint8_t *dst = vdst; + size_t i, n = simd_oprsz(desc); + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_h)(void *vdst, void *vsrc, uint32_t desc) +{ + uint16_t *src = vsrc; + uint16_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 2; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_s)(void *vdst, void *vsrc, uint32_t desc) +{ + uint32_t *src = vsrc; + uint32_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 4; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_d)(void *vdst, void *vsrc, uint32_t desc) +{ + uint64_t *src = vsrc; + uint64_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 8; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_q)(void *vdst, void *vsrc, uint32_t desc) +{ + Int128 *src = vsrc; + Int128 *dst = vdst; + size_t i, n = simd_oprsz(desc) / 16; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + memset(&src[tile_vslice_index(i)], 0, 16); + } +} + /* * Clear elements in a tile slice comprising len bytes. */ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) TRANS_FEAT(MOVA_tz, aa64_sme, do_mova_tile, a, false) TRANS_FEAT(MOVA_zt, aa64_sme, do_mova_tile, a, true) -static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) +static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, + bool to_vec, bool zero) { static gen_helper_gvec_2 * const cz_fns[] = { gen_helper_sme2_mova_cz_b, gen_helper_sme2_mova_cz_h, @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) gen_helper_sme2_mova_zc_b, gen_helper_sme2_mova_zc_h, gen_helper_sme2_mova_zc_s, gen_helper_sme2_mova_zc_d, }; + static gen_helper_gvec_2 * const zc_z_fns[] = { + gen_helper_sme2p1_movaz_zc_b, gen_helper_sme2p1_movaz_zc_h, + gen_helper_sme2p1_movaz_zc_s, gen_helper_sme2p1_movaz_zc_d, + gen_helper_sme2p1_movaz_zc_q, + }; TCGv_ptr t_za; int svl, bytes_per_op = n << a->esz; @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) return true; } + assert(a->esz <= MO_64 + zero); + if (!sme_smza_enabled_check(s)) { return true; } @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) TCGv_ptr t_zr = vec_full_reg_ptr(s, a->zr * n + i); t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off * n + i, 1, n, a->v); - if (to_vec) { + if (zero) { + zc_z_fns[a->esz](t_zr, t_za, t_desc); + } else if (to_vec) { zc_fns[a->esz](t_zr, t_za, t_desc); } else { cz_fns[a->esz](t_za, t_zr, t_desc); @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) a->off * n + i, 1, n, a->v); if (to_vec) { tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, 0, svl, svl); + if (zero) { + tcg_gen_gvec_dup_imm_var(MO_8, t_za, 0, svl, svl, 0); + } } else { tcg_gen_gvec_mov_var(MO_8, t_za, 0, tcg_env, o_zr, svl, svl); } @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) return true; } -TRANS_FEAT(MOVA_tz2, aa64_sme2, do_mova_tile_n, a, 2, false) -TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false) -TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true) -TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true) +TRANS_FEAT(MOVA_tz2, aa64_sme2, do_mova_tile_n, a, 2, false, false) +TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false, false) +TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true, false) +TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true, false) -static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) +TRANS_FEAT(MOVAZ_zt, aa64_sme2p1, do_mova_tile_n, a, 1, true, true) +TRANS_FEAT(MOVAZ_zt2, aa64_sme2p1, do_mova_tile_n, a, 2, true, true) +TRANS_FEAT(MOVAZ_zt4, aa64_sme2p1, do_mova_tile_n, a, 4, true, true) + +static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, + bool to_vec, bool zero) { TCGv_ptr t_za; int svl; @@ -XXX,XX +XXX,XX @@ static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) if (to_vec) { tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, o_za, svl, svl); + if (zero) { + tcg_gen_gvec_dup_imm_var(MO_8, t_za, o_za, svl, svl, 0); + } } else { tcg_gen_gvec_mov_var(MO_8, t_za, o_za, tcg_env, o_zr, svl, svl); } @@ -XXX,XX +XXX,XX @@ static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) return true; } -TRANS_FEAT(MOVA_az2, aa64_sme2, do_mova_array_n, a, 2, false) -TRANS_FEAT(MOVA_az4, aa64_sme2, do_mova_array_n, a, 4, false) -TRANS_FEAT(MOVA_za2, aa64_sme2, do_mova_array_n, a, 2, true) -TRANS_FEAT(MOVA_za4, aa64_sme2, do_mova_array_n, a, 4, true) +TRANS_FEAT(MOVA_az2, aa64_sme2, do_mova_array_n, a, 2, false, false) +TRANS_FEAT(MOVA_az4, aa64_sme2, do_mova_array_n, a, 4, false, false) +TRANS_FEAT(MOVA_za2, aa64_sme2, do_mova_array_n, a, 2, true, false) +TRANS_FEAT(MOVA_za4, aa64_sme2, do_mova_array_n, a, 4, true, false) + +TRANS_FEAT(MOVAZ_za2, aa64_sme2p1, do_mova_array_n, a, 2, true, true) +TRANS_FEAT(MOVAZ_za4, aa64_sme2p1, do_mova_array_n, a, 4, true, true) static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-101-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 24 ++++++++++ target/arm/tcg/sme.decode | 42 ++++++++++++++++ target/arm/tcg/translate-sme.c | 56 ++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 88 ++++++++++++++++++++++++++++++++++ 4 files changed, 210 insertions(+) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_uminp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_urecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_ursqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_luti2_1b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_1h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_1s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti2_2b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_2h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_2s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti2_4b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_4h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_4s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti4_1b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_1h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_1s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti4_2b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_2h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_2s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti4_4b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_4h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_4s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_za 11000000 000011 110 .. 0000000000 00. \ &zero_za ngrp=2 nvec=4 rv=%mova_rv off=%off1_x4 ZERO_za 11000000 000011 111 .. 0000000000 00. \ &zero_za ngrp=4 nvec=4 rv=%mova_rv off=%off1_x4 + +### SME Lookup Table Read + +&lut zd zn idx + +# LUTI2, consecutive +LUTI2_c_1b 1100 0000 1100 11 idx:4 00 00 zn:5 zd:5 &lut +LUTI2_c_1h 1100 0000 1100 11 idx:4 01 00 zn:5 zd:5 &lut +LUTI2_c_1s 1100 0000 1100 11 idx:4 10 00 zn:5 zd:5 &lut + +LUTI2_c_2b 1100 0000 1000 11 idx:3 1 00 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI2_c_2h 1100 0000 1000 11 idx:3 1 01 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI2_c_2s 1100 0000 1000 11 idx:3 1 10 00 zn:5 .... 0 &lut zd=%zd_ax2 + +LUTI2_c_4b 1100 0000 1000 11 idx:2 10 00 00 zn:5 ... 00 &lut zd=%zd_ax4 +LUTI2_c_4h 1100 0000 1000 11 idx:2 10 01 00 zn:5 ... 00 &lut zd=%zd_ax4 +LUTI2_c_4s 1100 0000 1000 11 idx:2 10 10 00 zn:5 ... 00 &lut zd=%zd_ax4 + +# LUTI2, strided (must check zd alignment) +LUTI2_s_2b 1100 0000 1001 11 idx:3 1 00 00 zn:5 zd:5 &lut +LUTI2_s_2h 1100 0000 1001 11 idx:3 1 01 00 zn:5 zd:5 &lut + +LUTI2_s_4b 1100 0000 1001 11 idx:2 10 00 00 zn:5 zd:5 &lut +LUTI2_s_4h 1100 0000 1001 11 idx:2 10 01 00 zn:5 zd:5 &lut + +# LUTI4, consecutive +LUTI4_c_1b 1100 0000 1100 101 idx:3 00 00 zn:5 zd:5 &lut +LUTI4_c_1h 1100 0000 1100 101 idx:3 01 00 zn:5 zd:5 &lut +LUTI4_c_1s 1100 0000 1100 101 idx:3 10 00 zn:5 zd:5 &lut + +LUTI4_c_2b 1100 0000 1000 101 idx:2 1 00 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI4_c_2h 1100 0000 1000 101 idx:2 1 01 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI4_c_2s 1100 0000 1000 101 idx:2 1 10 00 zn:5 .... 0 &lut zd=%zd_ax2 + +LUTI4_c_4h 1100 0000 1000 101 idx:1 10 01 00 zn:5 ... 00 &lut zd=%zd_ax4 +LUTI4_c_4s 1100 0000 1000 101 idx:1 10 10 00 zn:5 ... 00 &lut zd=%zd_ax4 + +# LUTI4, strided (must check zd alignment) +LUTI4_s_2b 1100 0000 1001 101 idx:2 1 00 00 zn:5 zd:5 &lut +LUTI4_s_2h 1100 0000 1001 101 idx:2 1 01 00 zn:5 zd:5 &lut + +LUTI4_s_4h 1100 0000 1001 101 idx:1 10 01 00 zn:5 zd:5 &lut diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_SEL(DisasContext *s, arg_SEL *a) } return true; } + +static bool do_lut(DisasContext *s, arg_lut *a, + gen_helper_gvec_2_ptr *fn, bool strided) +{ + if (sme_sm_enabled_check(s) && sme2_zt0_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + tcg_env, svl, svl, strided | (a->idx << 1), fn); + } + return true; +} + +TRANS_FEAT(LUTI2_c_1b, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_1b, false) +TRANS_FEAT(LUTI2_c_1h, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_1h, false) +TRANS_FEAT(LUTI2_c_1s, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_1s, false) + +TRANS_FEAT(LUTI2_c_2b, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_2b, false) +TRANS_FEAT(LUTI2_c_2h, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_2h, false) +TRANS_FEAT(LUTI2_c_2s, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_2s, false) + +TRANS_FEAT(LUTI2_c_4b, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_4b, false) +TRANS_FEAT(LUTI2_c_4h, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_4h, false) +TRANS_FEAT(LUTI2_c_4s, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_4s, false) + +TRANS_FEAT(LUTI4_c_1b, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_1b, false) +TRANS_FEAT(LUTI4_c_1h, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_1h, false) +TRANS_FEAT(LUTI4_c_1s, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_1s, false) + +TRANS_FEAT(LUTI4_c_2b, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_2b, false) +TRANS_FEAT(LUTI4_c_2h, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_2h, false) +TRANS_FEAT(LUTI4_c_2s, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_2s, false) + +TRANS_FEAT(LUTI4_c_4h, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_4h, false) +TRANS_FEAT(LUTI4_c_4s, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_4s, false) + +static bool do_lut_s4(DisasContext *s, arg_lut *a, gen_helper_gvec_2_ptr *fn) +{ + return !(a->zd & 0b01100) && do_lut(s, a, fn, true); +} + +static bool do_lut_s8(DisasContext *s, arg_lut *a, gen_helper_gvec_2_ptr *fn) +{ + return !(a->zd & 0b01000) && do_lut(s, a, fn, true); +} + +TRANS_FEAT(LUTI2_s_2b, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti2_2b) +TRANS_FEAT(LUTI2_s_2h, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti2_2h) + +TRANS_FEAT(LUTI2_s_4b, aa64_sme2p1, do_lut_s4, a, gen_helper_sme2_luti2_4b) +TRANS_FEAT(LUTI2_s_4h, aa64_sme2p1, do_lut_s4, a, gen_helper_sme2_luti2_4h) + +TRANS_FEAT(LUTI4_s_2b, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti4_2b) +TRANS_FEAT(LUTI4_s_2h, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti4_2h) + +TRANS_FEAT(LUTI4_s_4h, aa64_sme2p1, do_lut_s4, a, gen_helper_sme2_luti4_4h) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_ursqrte_s)(void *vd, void *vn, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +static inline void do_lut_b(void *zd, uint64_t *indexes, uint64_t *table, + unsigned elements, unsigned segbase, + unsigned dstride, unsigned isize, + unsigned tsize, unsigned nreg) +{ + for (unsigned r = 0; r < nreg; ++r) { + uint8_t *dst = zd + dstride * r; + unsigned base = segbase + r * elements; + + for (unsigned e = 0; e < elements; ++e) { + unsigned index = extractn(indexes, (base + e) * isize, isize); + dst[H1(e)] = extractn(table, index * tsize, 8); + } + } +} + +static inline void do_lut_h(void *zd, uint64_t *indexes, uint64_t *table, + unsigned elements, unsigned segbase, + unsigned dstride, unsigned isize, + unsigned tsize, unsigned nreg) +{ + for (unsigned r = 0; r < nreg; ++r) { + uint16_t *dst = zd + dstride * r; + unsigned base = segbase + r * elements; + + for (unsigned e = 0; e < elements; ++e) { + unsigned index = extractn(indexes, (base + e) * isize, isize); + dst[H2(e)] = extractn(table, index * tsize, 16); + } + } +} + +static inline void do_lut_s(void *zd, uint64_t *indexes, uint32_t *table, + unsigned elements, unsigned segbase, + unsigned dstride, unsigned isize, + unsigned tsize, unsigned nreg) +{ + for (unsigned r = 0; r < nreg; ++r) { + uint32_t *dst = zd + dstride * r; + unsigned base = segbase + r * elements; + + for (unsigned e = 0; e < elements; ++e) { + unsigned index = extractn(indexes, (base + e) * isize, isize); + dst[H4(e)] = table[H4(index)]; + } + } +} + +#define DO_SME2_LUT(ISIZE, NREG, SUFF, ESIZE) \ +void helper_sme2_luti##ISIZE##_##NREG##SUFF \ + (void *zd, void *zn, CPUARMState *env, uint32_t desc) \ +{ \ + unsigned vl = simd_oprsz(desc); \ + unsigned strided = extract32(desc, SIMD_DATA_SHIFT, 1); \ + unsigned idx = extract32(desc, SIMD_DATA_SHIFT + 1, 4); \ + unsigned elements = vl / ESIZE; \ + unsigned dstride = (!strided ? 1 : NREG == 4 ? 4 : 8); \ + unsigned segments = (ESIZE * 8) / (ISIZE * NREG); \ + unsigned segment = idx & (segments - 1); \ + ARMVectorReg indexes; \ + memcpy(&indexes, zn, vl); \ + do_lut_##SUFF(zd, indexes.d, (void *)env->za_state.zt0, elements, \ + segment * NREG * elements, \ + dstride * sizeof(ARMVectorReg), ISIZE, 32, NREG); \ +} + +DO_SME2_LUT(2,1,b, 1) +DO_SME2_LUT(2,1,h, 2) +DO_SME2_LUT(2,1,s, 4) +DO_SME2_LUT(2,2,b, 1) +DO_SME2_LUT(2,2,h, 2) +DO_SME2_LUT(2,2,s, 4) +DO_SME2_LUT(2,4,b, 1) +DO_SME2_LUT(2,4,h, 2) +DO_SME2_LUT(2,4,s, 4) + +DO_SME2_LUT(4,1,b, 1) +DO_SME2_LUT(4,1,h, 2) +DO_SME2_LUT(4,1,s, 4) +DO_SME2_LUT(4,2,b, 1) +DO_SME2_LUT(4,2,h, 2) +DO_SME2_LUT(4,2,s, 4) +DO_SME2_LUT(4,4,b, 1) +DO_SME2_LUT(4,4,h, 2) +DO_SME2_LUT(4,4,s, 4) + +#undef DO_SME2_LUT -- 2.43.0
The pattern we currently have as FMOPA_h is the "widening" insn that takes fp16 inputs and produces single-precision outputs. This is unlike FMOPA_s and FMOPA_d, which are non-widening produce outputs the same size as their inputs. SME2 introduces a non-widening fp16 FMOPA operation; rename FMOPA_h to FMOPA_w_h (for 'widening'), so we can use FMOPA_h for the non-widening version, giving it a name in line with the other non-widening ops FMOPA_s and FMOPA_d. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-102-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 +- target/arm/tcg/sme.decode | 2 +- target/arm/tcg/sme_helper.c | 4 ++-- target/arm/tcg/translate-sme.c | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_7(sme_fmopa_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 BFMOPA 10000001 100 ..... ... ... ..... . 00 .. @op_32 -FMOPA_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 +FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 SMOPA_s 1010000 0 10 0 ..... ... ... ..... . 00 .. @op_32 SUMOPA_s 1010000 0 10 1 ..... ... ... ..... . 00 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, return float32_add(sum, t32, s_std); } -void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, CPUARMState *env, uint32_t desc) +void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, return true; } -TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a, - MO_32, gen_helper_sme_fmopa_h) +TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, + MO_32, gen_helper_sme_fmopa_w_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_ZA, gen_helper_sme_fmopa_s) TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, -- 2.43.0
Our current BFMOPA opcode pattern is the widening version of the insn. Rename it to BFMOPA_w, to make way for the non-widening version added in SME2. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-103-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 +- target/arm/tcg/sme.decode | 2 +- target/arm/tcg/sme_helper.c | 4 ++-- target/arm/tcg/translate-sme.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_7(sme_bfmopa, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ADDVA_d 11000000 11 01000 1 ... ... ..... 00 ... @adda_64 FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 -BFMOPA 10000001 100 ..... ... ... ..... . 00 .. @op_32 +BFMOPA_w 10000001 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 SMOPA_s 1010000 0 10 0 ..... ... ... ..... . 00 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fvdot_idx_h)(void *vd, void *vn, void *vm, void *va, } } -void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, - void *vpn, void *vpm, CPUARMState *env, uint32_t desc) +void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, + void *vpn, void *vpm, CPUARMState *env, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_ZA, gen_helper_sme_fmopa_d) -TRANS_FEAT(BFMOPA, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa) +TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa_w) TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s) TRANS_FEAT(UMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_umopa_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> For non-widening, we can use float_muladd_negate_product, For widening, which uses dot-product, we need to handle the negation explicitly. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-104-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 19 +++++ target/arm/tcg/vec_internal.h | 5 ++ target/arm/tcg/sme_helper.c | 141 +++++++++++++++++++++++++++------ target/arm/tcg/translate-sme.c | 27 ++++--- 4 files changed, 160 insertions(+), 32 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_7(sme_fmops_w_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_fmops_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_fmops_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_bfmops_w, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_7(sme_ah_fmops_w_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_ah_fmops_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_ah_fmops_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_ah_bfmops_w, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) + DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme_umopa_s, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp); /* * Negate as for FPCR.AH=1 -- do not negate NaNs. */ +static inline float16 bfloat16_ah_chs(float16 a) +{ + return bfloat16_is_any_nan(a) ? a : bfloat16_chs(a); +} + static inline float16 float16_ah_chs(float16 a) { return float16_is_any_nan(a) ? a : float16_chs(a); diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn, } } -void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, float_status *fpst, uint32_t desc) +static void do_fmopa_s(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, float_status *fpst, uint32_t desc, + uint32_t negx, int negf) { intptr_t row, col, oprsz = simd_maxsz(desc); - uint32_t neg = simd_data(desc) << 31; - uint16_t *pn = vpn, *pm = vpm; for (row = 0; row < oprsz; ) { uint16_t pa = pn[H2(row >> 4)]; do { if (pa & 1) { void *vza_row = vza + tile_vslice_offset(row); - uint32_t n = *(uint32_t *)(vzn + H1_4(row)) ^ neg; + uint32_t n = *(uint32_t *)(vzn + H1_4(row)) ^ negx; for (col = 0; col < oprsz; ) { uint16_t pb = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, if (pb & 1) { uint32_t *a = vza_row + H1_4(col); uint32_t *m = vzm + H1_4(col); - *a = float32_muladd(n, *m, *a, 0, fpst); + *a = float32_muladd(n, *m, *a, negf, fpst); } col += 4; pb >>= 4; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, } } -void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, +void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_s(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_fmops_s)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_s(vza, vzn, vzm, vpn, vpm, fpst, desc, 1u << 31, 0); +} + +void HELPER(sme_ah_fmops_s)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_s(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + +static void do_fmopa_d(uint64_t *za, uint64_t *zn, uint64_t *zm, uint8_t *pn, + uint8_t *pm, float_status *fpst, uint32_t desc, + uint64_t negx, int negf) { intptr_t row, col, oprsz = simd_oprsz(desc) / 8; - uint64_t neg = (uint64_t)simd_data(desc) << 63; - uint64_t *za = vza, *zn = vzn, *zm = vzm; - uint8_t *pn = vpn, *pm = vpm; for (row = 0; row < oprsz; ++row) { if (pn[H1(row)] & 1) { uint64_t *za_row = &za[tile_vslice_index(row)]; - uint64_t n = zn[row] ^ neg; + uint64_t n = zn[row] ^ negx; for (col = 0; col < oprsz; ++col) { if (pm[H1(col)] & 1) { uint64_t *a = &za_row[col]; - *a = float64_muladd(n, zm[col], *a, 0, fpst); + *a = float64_muladd(n, zm[col], *a, negf, fpst); } } } } } +void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_d(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_fmops_d)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_d(vza, vzn, vzm, vpn, vpm, fpst, desc, 1ull << 63, 0); +} + +void HELPER(sme_ah_fmops_d)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_d(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + /* * Alter PAIR as needed for controlling predicates being false, * and for NEG on an enabled row element. @@ -XXX,XX +XXX,XX @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg) return pair; } +static inline uint32_t f16mop_ah_neg_adj_pair(uint32_t pair, uint32_t pg) +{ + uint32_t l = pg & 1 ? float16_ah_chs(pair) : 0; + uint32_t h = pg & 4 ? float16_ah_chs(pair >> 16) : 0; + return l | (h << 16); +} + +static inline uint32_t bf16mop_ah_neg_adj_pair(uint32_t pair, uint32_t pg) +{ + uint32_t l = pg & 1 ? bfloat16_ah_chs(pair) : 0; + uint32_t h = pg & 4 ? bfloat16_ah_chs(pair >> 16) : 0; + return l | (h << 16); +} + static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *s_f16, float_status *s_std, float_status *s_odd) @@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, return float32_add(sum, t32, s_std); } -void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, CPUARMState *env, uint32_t desc) +static void do_fmopa_w_h(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, CPUARMState *env, uint32_t desc, + uint32_t negx, bool ah_neg) { intptr_t row, col, oprsz = simd_maxsz(desc); - uint32_t neg = simd_data(desc) * 0x80008000u; - uint16_t *pn = vpn, *pm = vpm; float_status fpst_odd = env->vfp.fp_status[FPST_ZA]; set_float_rounding_mode(float_round_to_odd, &fpst_odd); @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, void *vza_row = vza + tile_vslice_offset(row); uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + if (ah_neg) { + n = f16mop_ah_neg_adj_pair(n, prow); + } else { + n = f16mop_adj_pair(n, prow, negx); + } for (col = 0; col < oprsz; ) { uint16_t pcol = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, } } +void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_fmopa_w_h(vza, vzn, vzm, vpn, vpm, env, desc, 0, false); +} + +void HELPER(sme_fmops_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_fmopa_w_h(vza, vzn, vzm, vpn, vpm, env, desc, 0x80008000u, false); +} + +void HELPER(sme_ah_fmops_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_fmopa_w_h(vza, vzn, vzm, vpn, vpm, env, desc, 0, true); +} + void HELPER(sme2_fdot_h)(void *vd, void *vn, void *vm, void *va, CPUARMState *env, uint32_t desc) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fvdot_idx_h)(void *vd, void *vn, void *vm, void *va, } } -void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, - void *vpn, void *vpm, CPUARMState *env, uint32_t desc) +static void do_bfmopa_w(void *vza, void *vzn, void *vzm, + uint16_t *pn, uint16_t *pm, CPUARMState *env, + uint32_t desc, uint32_t negx, bool ah_neg) { intptr_t row, col, oprsz = simd_maxsz(desc); - uint32_t neg = simd_data(desc) * 0x80008000u; - uint16_t *pn = vpn, *pm = vpm; float_status fpst, fpst_odd; if (is_ebf(env, &fpst, &fpst_odd)) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, void *vza_row = vza + tile_vslice_offset(row); uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + if (ah_neg) { + n = bf16mop_ah_neg_adj_pair(n, prow); + } else { + n = f16mop_adj_pair(n, prow, negx); + } for (col = 0; col < oprsz; ) { uint16_t pcol = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, void *vza_row = vza + tile_vslice_offset(row); uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + if (ah_neg) { + n = bf16mop_ah_neg_adj_pair(n, prow); + } else { + n = f16mop_adj_pair(n, prow, negx); + } for (col = 0; col < oprsz; ) { uint16_t pcol = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, } } +void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_bfmopa_w(vza, vzn, vzm, vpn, vpm, env, desc, 0, false); +} + +void HELPER(sme_bfmops_w)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_bfmopa_w(vza, vzn, vzm, vpn, vpm, env, desc, 0x80008000u, false); +} + +void HELPER(sme_ah_bfmops_w)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_bfmopa_w(vza, vzn, vzm, vpn, vpm, env, desc, 0, true); +} + typedef uint32_t IMOPFn32(uint32_t, uint32_t, uint32_t, uint8_t, bool); static inline void do_imopa_s(uint32_t *za, uint32_t *zn, uint32_t *zm, uint8_t *pn, uint8_t *pm, diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, gen_helper_gvec_5_ptr *fn) { int svl = streaming_vec_reg_size(s); - uint32_t desc = simd_desc(svl, svl, a->sub); + uint32_t desc = simd_desc(svl, svl, 0); TCGv_ptr za, zn, zm, pn, pm, fpst; if (!sme_smza_enabled_check(s)) { @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, gen_helper_gvec_5_ptr *fn) { int svl = streaming_vec_reg_size(s); - uint32_t desc = simd_desc(svl, svl, a->sub); + uint32_t desc = simd_desc(svl, svl, 0); TCGv_ptr za, zn, zm, pn, pm; if (!sme_smza_enabled_check(s)) { @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, return true; } -TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, - MO_32, gen_helper_sme_fmopa_w_h) -TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, - MO_32, FPST_ZA, gen_helper_sme_fmopa_s) -TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, - MO_64, FPST_ZA, gen_helper_sme_fmopa_d) +TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, MO_32, + !a->sub ? gen_helper_sme_fmopa_w_h + : !s->fpcr_ah ? gen_helper_sme_fmops_w_h + : gen_helper_sme_ah_fmops_w_h) +TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_ZA, + !a->sub ? gen_helper_sme_fmopa_s + : !s->fpcr_ah ? gen_helper_sme_fmops_s + : gen_helper_sme_ah_fmops_s) +TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_ZA, + !a->sub ? gen_helper_sme_fmopa_d + : !s->fpcr_ah ? gen_helper_sme_fmops_d + : gen_helper_sme_ah_fmops_d) -TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa_w) +TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, + !a->sub ? gen_helper_sme_bfmopa_w + : !s->fpcr_ah ? gen_helper_sme_bfmops_w + : gen_helper_sme_ah_bfmops_w) TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s) TRANS_FEAT(UMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_umopa_s) -- 2.43.0
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-105-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 6 ++++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 51 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 4 +++ 4 files changed, 63 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_7(sme_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_fmops_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmops_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmops_d, TCG_CALL_NO_RWG, @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_bfmops_w, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_7(sme_ah_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_ah_fmops_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_fmops_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_fmops_d, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ADDVA_d 11000000 11 01000 1 ... ... ..... 00 ... @adda_64 ### SME Outer Product &op zad zn zm pm pn sub:bool +@op_16 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 ... zad:1 &op @op_32 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 .. zad:2 &op @op_64 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 . zad:3 &op +FMOPA_h 10000001 100 ..... ... ... ..... . 100 . @op_16 FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn, } } +static void do_fmopa_h(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, float_status *fpst, uint32_t desc, + uint16_t negx, int negf) +{ + intptr_t row, col, oprsz = simd_maxsz(desc); + + for (row = 0; row < oprsz; ) { + uint16_t pa = pn[H2(row >> 4)]; + do { + if (pa & 1) { + void *vza_row = vza + tile_vslice_offset(row); + uint16_t n = *(uint32_t *)(vzn + H1_2(row)) ^ negx; + + for (col = 0; col < oprsz; ) { + uint16_t pb = pm[H2(col >> 4)]; + do { + if (pb & 1) { + uint16_t *a = vza_row + H1_2(col); + uint16_t *m = vzm + H1_2(col); + *a = float16_muladd(n, *m, *a, negf, fpst); + } + col += 2; + pb >>= 2; + } while (col & 15); + } + } + row += 2; + pa >>= 2; + } while (row & 15); + } +} + +void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_h(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_fmops_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_h(vza, vzn, vzm, vpn, vpm, fpst, desc, 1u << 15, 0); +} + +void HELPER(sme_ah_fmops_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_h(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + static void do_fmopa_s(void *vza, void *vzn, void *vzm, uint16_t *pn, uint16_t *pm, float_status *fpst, uint32_t desc, uint32_t negx, int negf) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, MO_32, !a->sub ? gen_helper_sme_fmopa_w_h : !s->fpcr_ah ? gen_helper_sme_fmops_w_h : gen_helper_sme_ah_fmops_w_h) +TRANS_FEAT(FMOPA_h, aa64_sme_f16f16, do_outprod_fpst, a, MO_16, FPST_ZA_F16, + !a->sub ? gen_helper_sme_fmopa_h + : !s->fpcr_ah ? gen_helper_sme_fmops_h + : gen_helper_sme_ah_fmops_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_ZA, !a->sub ? gen_helper_sme_fmopa_s : !s->fpcr_ah ? gen_helper_sme_fmops_s -- 2.43.0
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-106-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 6 ++++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 51 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 5 ++++ 4 files changed, 64 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_bfmopa, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmops_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_bfmops_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_bfmops, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_ah_fmops_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_bfmops_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_ah_bfmops, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FMOPA_h 10000001 100 ..... ... ... ..... . 100 . @op_16 FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 +BFMOPA 10000001 101 ..... ... ... ..... . 100 . @op_16 + BFMOPA_w 10000001 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_ah_fmops_d)(void *vza, void *vzn, void *vzm, void *vpn, float_muladd_negate_product); } +static void do_bfmopa(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, float_status *fpst, uint32_t desc, + uint16_t negx, int negf) +{ + intptr_t row, col, oprsz = simd_maxsz(desc); + + for (row = 0; row < oprsz; ) { + uint16_t pa = pn[H2(row >> 4)]; + do { + if (pa & 1) { + void *vza_row = vza + tile_vslice_offset(row); + uint16_t n = *(uint32_t *)(vzn + H1_2(row)) ^ negx; + + for (col = 0; col < oprsz; ) { + uint16_t pb = pm[H2(col >> 4)]; + do { + if (pb & 1) { + uint16_t *a = vza_row + H1_2(col); + uint16_t *m = vzm + H1_2(col); + *a = bfloat16_muladd(n, *m, *a, negf, fpst); + } + col += 2; + pb >>= 2; + } while (col & 15); + } + } + row += 2; + pa >>= 2; + } while (row & 15); + } +} + +void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_bfmopa(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_bfmops)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_bfmopa(vza, vzn, vzm, vpn, vpm, fpst, desc, 1u << 15, 0); +} + +void HELPER(sme_ah_bfmops)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_bfmopa(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + /* * Alter PAIR as needed for controlling predicates being false, * and for NEG on an enabled row element. diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_ZA, : !s->fpcr_ah ? gen_helper_sme_fmops_d : gen_helper_sme_ah_fmops_d) +TRANS_FEAT(BFMOPA, aa64_sme_b16b16, do_outprod_fpst, a, MO_16, FPST_ZA, + !a->sub ? gen_helper_sme_bfmopa + : !s->fpcr_ah ? gen_helper_sme_bfmops + : gen_helper_sme_ah_bfmops) + TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, !a->sub ? gen_helper_sme_bfmopa_w : !s->fpcr_ah ? gen_helper_sme_bfmops_w -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-107-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/emulation.rst | 6 ++++++ target/arm/tcg/cpu64.c | 10 ++++++++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/arm/emulation.rst +++ b/docs/system/arm/emulation.rst @@ -XXX,XX +XXX,XX @@ the following architecture extensions: - FEAT_SM3 (Advanced SIMD SM3 instructions) - FEAT_SM4 (Advanced SIMD SM4 instructions) - FEAT_SME (Scalable Matrix Extension) +- FEAT_SME2 (Scalable Matrix Extension version 2) +- FEAT_SME2p1 (Scalable Matrix Extension version 2.1) +- FEAT_SME_B16B16 (Non-widening BFloat16 arithmetic for SME2) - FEAT_SME_FA64 (Full A64 instruction set in Streaming SVE mode) +- FEAT_SME_F16F16 (Non-widening half-precision FP16 arithmetic for SME2) - FEAT_SME_F64F64 (Double-precision floating-point outer product instructions) - FEAT_SME_I16I64 (16-bit to 64-bit integer widening outer product instructions) - FEAT_SVE (Scalable Vector Extension) - FEAT_SVE_AES (Scalable Vector AES instructions) +- FEAT_SVE_B16B16 (Non-widening BFloat16 arithmetic for SVE2) - FEAT_SVE_BitPerm (Scalable Vector Bit Permutes instructions) - FEAT_SVE_PMULL128 (Scalable Vector PMULL instructions) - FEAT_SVE_SHA3 (Scalable Vector SHA3 instructions) - FEAT_SVE_SM4 (Scalable Vector SM4 instructions) - FEAT_SVE2 (Scalable Vector Extension version 2) +- FEAT_SVE2p1 (Scalable Vector Extension version 2.1) - FEAT_SPECRES (Speculation restriction instructions) - FEAT_SSBS (Speculative Store Bypass Safe) - FEAT_SSBS2 (MRS and MSR instructions for SSBS version 2) diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/cpu64.c +++ b/target/arm/tcg/cpu64.c @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) */ t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3); /* FEAT_MTE3 */ t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 0); /* FEAT_RASv1p1 + FEAT_DoubleFault */ - t = FIELD_DP64(t, ID_AA64PFR1, SME, 1); /* FEAT_SME */ + t = FIELD_DP64(t, ID_AA64PFR1, SME, 2); /* FEAT_SME2 */ t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_3 */ t = FIELD_DP64(t, ID_AA64PFR1, NMI, 1); /* FEAT_NMI */ SET_IDREG(isar, ID_AA64PFR1, t); @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) FIELD_DP64_IDREG(isar, ID_AA64MMFR3, SPEC_FPACC, 1); /* FEAT_FPACC_SPEC */ t = GET_IDREG(isar, ID_AA64ZFR0); - t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1); + t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 2); /* FEAT_SVE2p1 */ t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2); /* FEAT_SVE_PMULL128 */ t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1); /* FEAT_SVE_BitPerm */ t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 2); /* FEAT_BF16, FEAT_EBF16 */ + t = FIELD_DP64(t, ID_AA64ZFR0, B16B16, 1); /* FEAT_SVE_B16B16 */ t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1); /* FEAT_SVE_SHA3 */ t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1); /* FEAT_SVE_SM4 */ t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1); /* FEAT_I8MM */ @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) t = GET_IDREG(isar, ID_AA64SMFR0); t = FIELD_DP64(t, ID_AA64SMFR0, F32F32, 1); /* FEAT_SME */ + t = FIELD_DP64(t, ID_AA64SMFR0, BI32I32, 1); /* FEAT_SME2 */ t = FIELD_DP64(t, ID_AA64SMFR0, B16F32, 1); /* FEAT_SME */ t = FIELD_DP64(t, ID_AA64SMFR0, F16F32, 1); /* FEAT_SME */ t = FIELD_DP64(t, ID_AA64SMFR0, I8I32, 0xf); /* FEAT_SME */ + t = FIELD_DP64(t, ID_AA64SMFR0, F16F16, 1); /* FEAT_SME_F16F16 */ + t = FIELD_DP64(t, ID_AA64SMFR0, B16B16, 1); /* FEAT_SME_B16B16 */ + t = FIELD_DP64(t, ID_AA64SMFR0, I16I32, 5); /* FEAT_SME2 */ t = FIELD_DP64(t, ID_AA64SMFR0, F64F64, 1); /* FEAT_SME_F64F64 */ t = FIELD_DP64(t, ID_AA64SMFR0, I16I64, 0xf); /* FEAT_SME_I16I64 */ + t = FIELD_DP64(t, ID_AA64SMFR0, SMEVER, 2); /* FEAT_SME2p1 */ t = FIELD_DP64(t, ID_AA64SMFR0, FA64, 1); /* FEAT_SME_FA64 */ SET_IDREG(isar, ID_AA64SMFR0, t); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-108-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- linux-user/elfload.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index XXXXXXX..XXXXXXX 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -XXX,XX +XXX,XX @@ uint64_t get_elf_hwcap2(void) GET_FEATURE_ID(aa64_sme_fa64, ARM_HWCAP2_A64_SME_FA64); GET_FEATURE_ID(aa64_hbc, ARM_HWCAP2_A64_HBC); GET_FEATURE_ID(aa64_mops, ARM_HWCAP2_A64_MOPS); + GET_FEATURE_ID(aa64_sve2p1, ARM_HWCAP2_A64_SVE2P1); + GET_FEATURE_ID(aa64_sme2, (ARM_HWCAP2_A64_SME2 | + ARM_HWCAP2_A64_SME_I16I32 | + ARM_HWCAP2_A64_SME_BI32I32)); + GET_FEATURE_ID(aa64_sme2p1, ARM_HWCAP2_A64_SME2P1); + GET_FEATURE_ID(aa64_sme_b16b16, ARM_HWCAP2_A64_SME_B16B16); + GET_FEATURE_ID(aa64_sme_f16f16, ARM_HWCAP2_A64_SME_F16F16); + GET_FEATURE_ID(aa64_sve_b16b16, ARM_HWCAP2_A64_SVE_B16B16); return hwcaps; } -- 2.43.0