:p
atchew
Login
Hi; this pullreq includes FEAT_LSE2 support, the new bpim2u board, and some other smaller patchsets. thanks -- PMM The following changes since commit 369081c4558e7e940fa36ce59bf17b2e390f55d3: Merge tag 'pull-tcg-20230605' of https://gitlab.com/rth7680/qemu into staging (2023-06-05 13:16:56 -0700) are available in the Git repository at: https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230606 for you to fetch changes up to f9ac778898cb28307e0f91421aba34d43c34b679: target/arm: trap DCC access in user mode emulation (2023-06-06 10:19:40 +0100) ---------------------------------------------------------------- target-arm queue: * Support gdbstub (guest debug) in HVF * xnlx-versal: Support CANFD controller * bpim2u: New board model: Banana Pi BPI-M2 Ultra * Emulate FEAT_LSE2 * allow DC CVA[D]P in user mode emulation * trap DCC access in user mode emulation ---------------------------------------------------------------- Francesco Cagnin (4): arm: move KVM breakpoints helpers hvf: handle access for more registers hvf: add breakpoint handlers hvf: add guest debugging handlers for Apple Silicon hosts Richard Henderson (20): target/arm: Add commentary for CPUARMState.exclusive_high target/arm: Add feature test for FEAT_LSE2 target/arm: Introduce finalize_memop_{atom,pair} target/arm: Use tcg_gen_qemu_ld_i128 for LDXP target/arm: Use tcg_gen_qemu_{st, ld}_i128 for do_fp_{st, ld} target/arm: Use tcg_gen_qemu_st_i128 for STZG, STZ2G target/arm: Use tcg_gen_qemu_{ld, st}_i128 in gen_sve_{ld, st}r target/arm: Sink gen_mte_check1 into load/store_exclusive target/arm: Load/store integer pair with one tcg operation target/arm: Hoist finalize_memop out of do_gpr_{ld, st} target/arm: Hoist finalize_memop out of do_fp_{ld, st} target/arm: Pass memop to gen_mte_check1* target/arm: Pass single_memop to gen_mte_checkN target/arm: Check alignment in helper_mte_check target/arm: Add SCTLR.nAA to TBFLAG_A64 target/arm: Relax ordered/atomic alignment checks for LSE2 target/arm: Move mte check for store-exclusive tests/tcg/aarch64: Use stz2g in mte-7.c tests/tcg/multiarch: Adjust sigbus.c target/arm: Enable FEAT_LSE2 for -cpu max Vikram Garhwal (4): hw/net/can: Introduce Xilinx Versal CANFD controller xlnx-versal: Connect Xilinx VERSAL CANFD controllers MAINTAINERS: Include canfd tests under Xilinx CAN tests/qtest: Introduce tests for Xilinx VERSAL CANFD controller Zhuojia Shen (3): target/arm: allow DC CVA[D]P in user mode emulation tests/tcg/aarch64: add DC CVA[D]P tests target/arm: trap DCC access in user mode emulation qianfan Zhao (11): hw: arm: Add bananapi M2-Ultra and allwinner-r40 support hw/arm/allwinner-r40: add Clock Control Unit hw: allwinner-r40: Complete uart devices hw: arm: allwinner-r40: Add i2c0 device hw/misc: Rename axp209 to axp22x and add support AXP221 PMU hw/arm/allwinner-r40: add SDRAM controller device hw: sd: allwinner-sdhost: Add sun50i-a64 SoC support hw: arm: allwinner-r40: Add emac and gmac support hw: arm: allwinner-sramc: Add SRAM Controller support for R40 tests: avocado: boot_linux_console: Add test case for bpim2u docs: system: arm: Introduce bananapi_m2u MAINTAINERS | 2 +- docs/system/arm/bananapi_m2u.rst | 139 +++ docs/system/arm/emulation.rst | 1 + docs/system/arm/xlnx-versal-virt.rst | 31 + docs/system/target-arm.rst | 1 + include/hw/arm/allwinner-r40.h | 143 +++ include/hw/arm/xlnx-versal.h | 12 + include/hw/misc/allwinner-r40-ccu.h | 65 + include/hw/misc/allwinner-r40-dramc.h | 108 ++ include/hw/misc/allwinner-sramc.h | 69 ++ include/hw/net/xlnx-versal-canfd.h | 87 ++ include/hw/sd/allwinner-sdhost.h | 9 + include/sysemu/hvf.h | 37 + include/sysemu/hvf_int.h | 2 + target/arm/cpu.h | 16 +- target/arm/hvf_arm.h | 7 + target/arm/internals.h | 53 +- target/arm/tcg/helper-a64.h | 3 + target/arm/tcg/translate-a64.h | 4 +- target/arm/tcg/translate.h | 65 +- accel/hvf/hvf-accel-ops.c | 119 ++ accel/hvf/hvf-all.c | 23 + hw/arm/allwinner-r40.c | 526 ++++++++ hw/arm/bananapi_m2u.c | 145 +++ hw/arm/xlnx-versal-virt.c | 53 + hw/arm/xlnx-versal.c | 37 + hw/misc/allwinner-r40-ccu.c | 209 ++++ hw/misc/allwinner-r40-dramc.c | 513 ++++++++ hw/misc/allwinner-sramc.c | 184 +++ hw/misc/axp209.c | 238 ---- hw/misc/axp2xx.c | 283 +++++ hw/net/can/xlnx-versal-canfd.c | 2107 +++++++++++++++++++++++++++++++++ hw/sd/allwinner-sdhost.c | 72 +- target/arm/cpu.c | 2 + target/arm/debug_helper.c | 5 + target/arm/helper.c | 6 +- target/arm/hvf/hvf.c | 750 +++++++++++- target/arm/hyp_gdbstub.c | 253 ++++ target/arm/kvm64.c | 276 ----- target/arm/tcg/cpu64.c | 1 + target/arm/tcg/helper-a64.c | 7 + target/arm/tcg/hflags.c | 6 + target/arm/tcg/mte_helper.c | 18 + target/arm/tcg/translate-a64.c | 477 +++++--- target/arm/tcg/translate-sve.c | 106 +- target/arm/tcg/translate.c | 1 + target/i386/hvf/hvf.c | 33 + tests/qtest/xlnx-canfd-test.c | 423 +++++++ tests/tcg/aarch64/dcpodp.c | 63 + tests/tcg/aarch64/dcpop.c | 63 + tests/tcg/aarch64/mte-7.c | 3 +- tests/tcg/multiarch/sigbus.c | 13 +- hw/arm/Kconfig | 14 +- hw/arm/meson.build | 1 + hw/misc/Kconfig | 5 +- hw/misc/meson.build | 5 +- hw/misc/trace-events | 26 +- hw/net/can/meson.build | 1 + hw/net/can/trace-events | 7 + target/arm/meson.build | 3 +- tests/avocado/boot_linux_console.py | 176 +++ tests/qtest/meson.build | 1 + tests/tcg/aarch64/Makefile.target | 11 + 63 files changed, 7386 insertions(+), 733 deletions(-) create mode 100644 docs/system/arm/bananapi_m2u.rst create mode 100644 include/hw/arm/allwinner-r40.h create mode 100644 include/hw/misc/allwinner-r40-ccu.h create mode 100644 include/hw/misc/allwinner-r40-dramc.h create mode 100644 include/hw/misc/allwinner-sramc.h create mode 100644 include/hw/net/xlnx-versal-canfd.h create mode 100644 hw/arm/allwinner-r40.c create mode 100644 hw/arm/bananapi_m2u.c create mode 100644 hw/misc/allwinner-r40-ccu.c create mode 100644 hw/misc/allwinner-r40-dramc.c create mode 100644 hw/misc/allwinner-sramc.c delete mode 100644 hw/misc/axp209.c create mode 100644 hw/misc/axp2xx.c create mode 100644 hw/net/can/xlnx-versal-canfd.c create mode 100644 target/arm/hyp_gdbstub.c create mode 100644 tests/qtest/xlnx-canfd-test.c create mode 100644 tests/tcg/aarch64/dcpodp.c create mode 100644 tests/tcg/aarch64/dcpop.c
From: Francesco Cagnin <fcagnin@quarkslab.com> These helpers will be also used for HVF. Aside from reformatting a couple of comments for 'checkpatch.pl' and updating meson to compile 'hyp_gdbstub.c', this is just code motion. Signed-off-by: Francesco Cagnin <fcagnin@quarkslab.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20230601153107.81955-2-fcagnin@quarkslab.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/internals.h | 50 +++++++ target/arm/hyp_gdbstub.c | 253 +++++++++++++++++++++++++++++++++++ target/arm/kvm64.c | 276 --------------------------------------- target/arm/meson.build | 3 +- 4 files changed, 305 insertions(+), 277 deletions(-) create mode 100644 target/arm/hyp_gdbstub.c diff --git a/target/arm/internals.h b/target/arm/internals.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/internals.h +++ b/target/arm/internals.h @@ -XXX,XX +XXX,XX @@ static inline bool arm_fgt_active(CPUARMState *env, int el) } void assert_hflags_rebuild_correctly(CPUARMState *env); + +/* + * Although the ARM implementation of hardware assisted debugging + * allows for different breakpoints per-core, the current GDB + * interface treats them as a global pool of registers (which seems to + * be the case for x86, ppc and s390). As a result we store one copy + * of registers which is used for all active cores. + * + * Write access is serialised by virtue of the GDB protocol which + * updates things. Read access (i.e. when the values are copied to the + * vCPU) is also gated by GDB's run control. + * + * This is not unreasonable as most of the time debugging kernels you + * never know which core will eventually execute your function. + */ + +typedef struct { + uint64_t bcr; + uint64_t bvr; +} HWBreakpoint; + +/* + * The watchpoint registers can cover more area than the requested + * watchpoint so we need to store the additional information + * somewhere. We also need to supply a CPUWatchpoint to the GDB stub + * when the watchpoint is hit. + */ +typedef struct { + uint64_t wcr; + uint64_t wvr; + CPUWatchpoint details; +} HWWatchpoint; + +/* Maximum and current break/watch point counts */ +extern int max_hw_bps, max_hw_wps; +extern GArray *hw_breakpoints, *hw_watchpoints; + +#define cur_hw_wps (hw_watchpoints->len) +#define cur_hw_bps (hw_breakpoints->len) +#define get_hw_bp(i) (&g_array_index(hw_breakpoints, HWBreakpoint, i)) +#define get_hw_wp(i) (&g_array_index(hw_watchpoints, HWWatchpoint, i)) + +bool find_hw_breakpoint(CPUState *cpu, target_ulong pc); +int insert_hw_breakpoint(target_ulong pc); +int delete_hw_breakpoint(target_ulong pc); + +bool check_watchpoint_in_range(int i, target_ulong addr); +CPUWatchpoint *find_hw_watchpoint(CPUState *cpu, target_ulong addr); +int insert_hw_watchpoint(target_ulong addr, target_ulong len, int type); +int delete_hw_watchpoint(target_ulong addr, target_ulong len, int type); #endif diff --git a/target/arm/hyp_gdbstub.c b/target/arm/hyp_gdbstub.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/target/arm/hyp_gdbstub.c @@ -XXX,XX +XXX,XX @@ +/* + * ARM implementation of KVM and HVF hooks, 64 bit specific code + * + * Copyright Mian-M. Hamayun 2013, Virtual Open Systems + * Copyright Alex Bennée 2014, Linaro + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + */ + +#include "qemu/osdep.h" +#include "cpu.h" +#include "internals.h" +#include "exec/gdbstub.h" + +/* Maximum and current break/watch point counts */ +int max_hw_bps, max_hw_wps; +GArray *hw_breakpoints, *hw_watchpoints; + +/** + * insert_hw_breakpoint() + * @addr: address of breakpoint + * + * See ARM ARM D2.9.1 for details but here we are only going to create + * simple un-linked breakpoints (i.e. we don't chain breakpoints + * together to match address and context or vmid). The hardware is + * capable of fancier matching but that will require exposing that + * fanciness to GDB's interface + * + * DBGBCR<n>_EL1, Debug Breakpoint Control Registers + * + * 31 24 23 20 19 16 15 14 13 12 9 8 5 4 3 2 1 0 + * +------+------+-------+-----+----+------+-----+------+-----+---+ + * | RES0 | BT | LBN | SSC | HMC| RES0 | BAS | RES0 | PMC | E | + * +------+------+-------+-----+----+------+-----+------+-----+---+ + * + * BT: Breakpoint type (0 = unlinked address match) + * LBN: Linked BP number (0 = unused) + * SSC/HMC/PMC: Security, Higher and Priv access control (Table D-12) + * BAS: Byte Address Select (RES1 for AArch64) + * E: Enable bit + * + * DBGBVR<n>_EL1, Debug Breakpoint Value Registers + * + * 63 53 52 49 48 2 1 0 + * +------+-----------+----------+-----+ + * | RESS | VA[52:49] | VA[48:2] | 0 0 | + * +------+-----------+----------+-----+ + * + * Depending on the addressing mode bits the top bits of the register + * are a sign extension of the highest applicable VA bit. Some + * versions of GDB don't do it correctly so we ensure they are correct + * here so future PC comparisons will work properly. + */ + +int insert_hw_breakpoint(target_ulong addr) +{ + HWBreakpoint brk = { + .bcr = 0x1, /* BCR E=1, enable */ + .bvr = sextract64(addr, 0, 53) + }; + + if (cur_hw_bps >= max_hw_bps) { + return -ENOBUFS; + } + + brk.bcr = deposit32(brk.bcr, 1, 2, 0x3); /* PMC = 11 */ + brk.bcr = deposit32(brk.bcr, 5, 4, 0xf); /* BAS = RES1 */ + + g_array_append_val(hw_breakpoints, brk); + + return 0; +} + +/** + * delete_hw_breakpoint() + * @pc: address of breakpoint + * + * Delete a breakpoint and shuffle any above down + */ + +int delete_hw_breakpoint(target_ulong pc) +{ + int i; + for (i = 0; i < hw_breakpoints->len; i++) { + HWBreakpoint *brk = get_hw_bp(i); + if (brk->bvr == pc) { + g_array_remove_index(hw_breakpoints, i); + return 0; + } + } + return -ENOENT; +} + +/** + * insert_hw_watchpoint() + * @addr: address of watch point + * @len: size of area + * @type: type of watch point + * + * See ARM ARM D2.10. As with the breakpoints we can do some advanced + * stuff if we want to. The watch points can be linked with the break + * points above to make them context aware. However for simplicity + * currently we only deal with simple read/write watch points. + * + * D7.3.11 DBGWCR<n>_EL1, Debug Watchpoint Control Registers + * + * 31 29 28 24 23 21 20 19 16 15 14 13 12 5 4 3 2 1 0 + * +------+-------+------+----+-----+-----+-----+-----+-----+-----+---+ + * | RES0 | MASK | RES0 | WT | LBN | SSC | HMC | BAS | LSC | PAC | E | + * +------+-------+------+----+-----+-----+-----+-----+-----+-----+---+ + * + * MASK: num bits addr mask (0=none,01/10=res,11=3 bits (8 bytes)) + * WT: 0 - unlinked, 1 - linked (not currently used) + * LBN: Linked BP number (not currently used) + * SSC/HMC/PAC: Security, Higher and Priv access control (Table D2-11) + * BAS: Byte Address Select + * LSC: Load/Store control (01: load, 10: store, 11: both) + * E: Enable + * + * The bottom 2 bits of the value register are masked. Therefore to + * break on any sizes smaller than an unaligned word you need to set + * MASK=0, BAS=bit per byte in question. For larger regions (^2) you + * need to ensure you mask the address as required and set BAS=0xff + */ + +int insert_hw_watchpoint(target_ulong addr, target_ulong len, int type) +{ + HWWatchpoint wp = { + .wcr = R_DBGWCR_E_MASK, /* E=1, enable */ + .wvr = addr & (~0x7ULL), + .details = { .vaddr = addr, .len = len } + }; + + if (cur_hw_wps >= max_hw_wps) { + return -ENOBUFS; + } + + /* + * HMC=0 SSC=0 PAC=3 will hit EL0 or EL1, any security state, + * valid whether EL3 is implemented or not + */ + wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, PAC, 3); + + switch (type) { + case GDB_WATCHPOINT_READ: + wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, LSC, 1); + wp.details.flags = BP_MEM_READ; + break; + case GDB_WATCHPOINT_WRITE: + wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, LSC, 2); + wp.details.flags = BP_MEM_WRITE; + break; + case GDB_WATCHPOINT_ACCESS: + wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, LSC, 3); + wp.details.flags = BP_MEM_ACCESS; + break; + default: + g_assert_not_reached(); + break; + } + if (len <= 8) { + /* we align the address and set the bits in BAS */ + int off = addr & 0x7; + int bas = (1 << len) - 1; + + wp.wcr = deposit32(wp.wcr, 5 + off, 8 - off, bas); + } else { + /* For ranges above 8 bytes we need to be a power of 2 */ + if (is_power_of_2(len)) { + int bits = ctz64(len); + + wp.wvr &= ~((1 << bits) - 1); + wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, MASK, bits); + wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, BAS, 0xff); + } else { + return -ENOBUFS; + } + } + + g_array_append_val(hw_watchpoints, wp); + return 0; +} + +bool check_watchpoint_in_range(int i, target_ulong addr) +{ + HWWatchpoint *wp = get_hw_wp(i); + uint64_t addr_top, addr_bottom = wp->wvr; + int bas = extract32(wp->wcr, 5, 8); + int mask = extract32(wp->wcr, 24, 4); + + if (mask) { + addr_top = addr_bottom + (1 << mask); + } else { + /* + * BAS must be contiguous but can offset against the base + * address in DBGWVR + */ + addr_bottom = addr_bottom + ctz32(bas); + addr_top = addr_bottom + clo32(bas); + } + + if (addr >= addr_bottom && addr <= addr_top) { + return true; + } + + return false; +} + +/** + * delete_hw_watchpoint() + * @addr: address of breakpoint + * + * Delete a breakpoint and shuffle any above down + */ + +int delete_hw_watchpoint(target_ulong addr, target_ulong len, int type) +{ + int i; + for (i = 0; i < cur_hw_wps; i++) { + if (check_watchpoint_in_range(i, addr)) { + g_array_remove_index(hw_watchpoints, i); + return 0; + } + } + return -ENOENT; +} + +bool find_hw_breakpoint(CPUState *cpu, target_ulong pc) +{ + int i; + + for (i = 0; i < cur_hw_bps; i++) { + HWBreakpoint *bp = get_hw_bp(i); + if (bp->bvr == pc) { + return true; + } + } + return false; +} + +CPUWatchpoint *find_hw_watchpoint(CPUState *cpu, target_ulong addr) +{ + int i; + + for (i = 0; i < cur_hw_wps; i++) { + if (check_watchpoint_in_range(i, addr)) { + return &get_hw_wp(i)->details; + } + } + return NULL; +} diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/kvm64.c +++ b/target/arm/kvm64.c @@ -XXX,XX +XXX,XX @@ static bool have_guest_debug; -/* - * Although the ARM implementation of hardware assisted debugging - * allows for different breakpoints per-core, the current GDB - * interface treats them as a global pool of registers (which seems to - * be the case for x86, ppc and s390). As a result we store one copy - * of registers which is used for all active cores. - * - * Write access is serialised by virtue of the GDB protocol which - * updates things. Read access (i.e. when the values are copied to the - * vCPU) is also gated by GDB's run control. - * - * This is not unreasonable as most of the time debugging kernels you - * never know which core will eventually execute your function. - */ - -typedef struct { - uint64_t bcr; - uint64_t bvr; -} HWBreakpoint; - -/* The watchpoint registers can cover more area than the requested - * watchpoint so we need to store the additional information - * somewhere. We also need to supply a CPUWatchpoint to the GDB stub - * when the watchpoint is hit. - */ -typedef struct { - uint64_t wcr; - uint64_t wvr; - CPUWatchpoint details; -} HWWatchpoint; - -/* Maximum and current break/watch point counts */ -int max_hw_bps, max_hw_wps; -GArray *hw_breakpoints, *hw_watchpoints; - -#define cur_hw_wps (hw_watchpoints->len) -#define cur_hw_bps (hw_breakpoints->len) -#define get_hw_bp(i) (&g_array_index(hw_breakpoints, HWBreakpoint, i)) -#define get_hw_wp(i) (&g_array_index(hw_watchpoints, HWWatchpoint, i)) - void kvm_arm_init_debug(KVMState *s) { have_guest_debug = kvm_check_extension(s, @@ -XXX,XX +XXX,XX @@ void kvm_arm_init_debug(KVMState *s) return; } -/** - * insert_hw_breakpoint() - * @addr: address of breakpoint - * - * See ARM ARM D2.9.1 for details but here we are only going to create - * simple un-linked breakpoints (i.e. we don't chain breakpoints - * together to match address and context or vmid). The hardware is - * capable of fancier matching but that will require exposing that - * fanciness to GDB's interface - * - * DBGBCR<n>_EL1, Debug Breakpoint Control Registers - * - * 31 24 23 20 19 16 15 14 13 12 9 8 5 4 3 2 1 0 - * +------+------+-------+-----+----+------+-----+------+-----+---+ - * | RES0 | BT | LBN | SSC | HMC| RES0 | BAS | RES0 | PMC | E | - * +------+------+-------+-----+----+------+-----+------+-----+---+ - * - * BT: Breakpoint type (0 = unlinked address match) - * LBN: Linked BP number (0 = unused) - * SSC/HMC/PMC: Security, Higher and Priv access control (Table D-12) - * BAS: Byte Address Select (RES1 for AArch64) - * E: Enable bit - * - * DBGBVR<n>_EL1, Debug Breakpoint Value Registers - * - * 63 53 52 49 48 2 1 0 - * +------+-----------+----------+-----+ - * | RESS | VA[52:49] | VA[48:2] | 0 0 | - * +------+-----------+----------+-----+ - * - * Depending on the addressing mode bits the top bits of the register - * are a sign extension of the highest applicable VA bit. Some - * versions of GDB don't do it correctly so we ensure they are correct - * here so future PC comparisons will work properly. - */ - -static int insert_hw_breakpoint(target_ulong addr) -{ - HWBreakpoint brk = { - .bcr = 0x1, /* BCR E=1, enable */ - .bvr = sextract64(addr, 0, 53) - }; - - if (cur_hw_bps >= max_hw_bps) { - return -ENOBUFS; - } - - brk.bcr = deposit32(brk.bcr, 1, 2, 0x3); /* PMC = 11 */ - brk.bcr = deposit32(brk.bcr, 5, 4, 0xf); /* BAS = RES1 */ - - g_array_append_val(hw_breakpoints, brk); - - return 0; -} - -/** - * delete_hw_breakpoint() - * @pc: address of breakpoint - * - * Delete a breakpoint and shuffle any above down - */ - -static int delete_hw_breakpoint(target_ulong pc) -{ - int i; - for (i = 0; i < hw_breakpoints->len; i++) { - HWBreakpoint *brk = get_hw_bp(i); - if (brk->bvr == pc) { - g_array_remove_index(hw_breakpoints, i); - return 0; - } - } - return -ENOENT; -} - -/** - * insert_hw_watchpoint() - * @addr: address of watch point - * @len: size of area - * @type: type of watch point - * - * See ARM ARM D2.10. As with the breakpoints we can do some advanced - * stuff if we want to. The watch points can be linked with the break - * points above to make them context aware. However for simplicity - * currently we only deal with simple read/write watch points. - * - * D7.3.11 DBGWCR<n>_EL1, Debug Watchpoint Control Registers - * - * 31 29 28 24 23 21 20 19 16 15 14 13 12 5 4 3 2 1 0 - * +------+-------+------+----+-----+-----+-----+-----+-----+-----+---+ - * | RES0 | MASK | RES0 | WT | LBN | SSC | HMC | BAS | LSC | PAC | E | - * +------+-------+------+----+-----+-----+-----+-----+-----+-----+---+ - * - * MASK: num bits addr mask (0=none,01/10=res,11=3 bits (8 bytes)) - * WT: 0 - unlinked, 1 - linked (not currently used) - * LBN: Linked BP number (not currently used) - * SSC/HMC/PAC: Security, Higher and Priv access control (Table D2-11) - * BAS: Byte Address Select - * LSC: Load/Store control (01: load, 10: store, 11: both) - * E: Enable - * - * The bottom 2 bits of the value register are masked. Therefore to - * break on any sizes smaller than an unaligned word you need to set - * MASK=0, BAS=bit per byte in question. For larger regions (^2) you - * need to ensure you mask the address as required and set BAS=0xff - */ - -static int insert_hw_watchpoint(target_ulong addr, - target_ulong len, int type) -{ - HWWatchpoint wp = { - .wcr = R_DBGWCR_E_MASK, /* E=1, enable */ - .wvr = addr & (~0x7ULL), - .details = { .vaddr = addr, .len = len } - }; - - if (cur_hw_wps >= max_hw_wps) { - return -ENOBUFS; - } - - /* - * HMC=0 SSC=0 PAC=3 will hit EL0 or EL1, any security state, - * valid whether EL3 is implemented or not - */ - wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, PAC, 3); - - switch (type) { - case GDB_WATCHPOINT_READ: - wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, LSC, 1); - wp.details.flags = BP_MEM_READ; - break; - case GDB_WATCHPOINT_WRITE: - wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, LSC, 2); - wp.details.flags = BP_MEM_WRITE; - break; - case GDB_WATCHPOINT_ACCESS: - wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, LSC, 3); - wp.details.flags = BP_MEM_ACCESS; - break; - default: - g_assert_not_reached(); - break; - } - if (len <= 8) { - /* we align the address and set the bits in BAS */ - int off = addr & 0x7; - int bas = (1 << len) - 1; - - wp.wcr = deposit32(wp.wcr, 5 + off, 8 - off, bas); - } else { - /* For ranges above 8 bytes we need to be a power of 2 */ - if (is_power_of_2(len)) { - int bits = ctz64(len); - - wp.wvr &= ~((1 << bits) - 1); - wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, MASK, bits); - wp.wcr = FIELD_DP64(wp.wcr, DBGWCR, BAS, 0xff); - } else { - return -ENOBUFS; - } - } - - g_array_append_val(hw_watchpoints, wp); - return 0; -} - - -static bool check_watchpoint_in_range(int i, target_ulong addr) -{ - HWWatchpoint *wp = get_hw_wp(i); - uint64_t addr_top, addr_bottom = wp->wvr; - int bas = extract32(wp->wcr, 5, 8); - int mask = extract32(wp->wcr, 24, 4); - - if (mask) { - addr_top = addr_bottom + (1 << mask); - } else { - /* BAS must be contiguous but can offset against the base - * address in DBGWVR */ - addr_bottom = addr_bottom + ctz32(bas); - addr_top = addr_bottom + clo32(bas); - } - - if (addr >= addr_bottom && addr <= addr_top) { - return true; - } - - return false; -} - -/** - * delete_hw_watchpoint() - * @addr: address of breakpoint - * - * Delete a breakpoint and shuffle any above down - */ - -static int delete_hw_watchpoint(target_ulong addr, - target_ulong len, int type) -{ - int i; - for (i = 0; i < cur_hw_wps; i++) { - if (check_watchpoint_in_range(i, addr)) { - g_array_remove_index(hw_watchpoints, i); - return 0; - } - } - return -ENOENT; -} - - int kvm_arch_insert_hw_breakpoint(target_ulong addr, target_ulong len, int type) { @@ -XXX,XX +XXX,XX @@ bool kvm_arm_hw_debug_active(CPUState *cs) return ((cur_hw_wps > 0) || (cur_hw_bps > 0)); } -static bool find_hw_breakpoint(CPUState *cpu, target_ulong pc) -{ - int i; - - for (i = 0; i < cur_hw_bps; i++) { - HWBreakpoint *bp = get_hw_bp(i); - if (bp->bvr == pc) { - return true; - } - } - return false; -} - -static CPUWatchpoint *find_hw_watchpoint(CPUState *cpu, target_ulong addr) -{ - int i; - - for (i = 0; i < cur_hw_wps; i++) { - if (check_watchpoint_in_range(i, addr)) { - return &get_hw_wp(i)->details; - } - } - return NULL; -} - static bool kvm_arm_set_device_attr(CPUState *cs, struct kvm_device_attr *attr, const char *name) { diff --git a/target/arm/meson.build b/target/arm/meson.build index XXXXXXX..XXXXXXX 100644 --- a/target/arm/meson.build +++ b/target/arm/meson.build @@ -XXX,XX +XXX,XX @@ arm_ss.add(files( )) arm_ss.add(zlib) -arm_ss.add(when: 'CONFIG_KVM', if_true: files('kvm.c', 'kvm64.c'), if_false: files('kvm-stub.c')) +arm_ss.add(when: 'CONFIG_KVM', if_true: files('hyp_gdbstub.c', 'kvm.c', 'kvm64.c'), if_false: files('kvm-stub.c')) +arm_ss.add(when: 'CONFIG_HVF', if_true: files('hyp_gdbstub.c')) arm_ss.add(when: 'TARGET_AARCH64', if_true: files( 'cpu64.c', -- 2.34.1
From: Francesco Cagnin <fcagnin@quarkslab.com> Required for guest debugging. Signed-off-by: Francesco Cagnin <fcagnin@quarkslab.com> Message-id: 20230601153107.81955-3-fcagnin@quarkslab.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/hvf/hvf.c | 213 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 213 insertions(+) diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/hvf/hvf.c +++ b/target/arm/hvf/hvf.c @@ -XXX,XX +XXX,XX @@ #define SYSREG_ICC_SGI1R_EL1 SYSREG(3, 0, 12, 11, 5) #define SYSREG_ICC_SRE_EL1 SYSREG(3, 0, 12, 12, 5) +#define SYSREG_MDSCR_EL1 SYSREG(2, 0, 0, 2, 2) +#define SYSREG_DBGBVR0_EL1 SYSREG(2, 0, 0, 0, 4) +#define SYSREG_DBGBCR0_EL1 SYSREG(2, 0, 0, 0, 5) +#define SYSREG_DBGWVR0_EL1 SYSREG(2, 0, 0, 0, 6) +#define SYSREG_DBGWCR0_EL1 SYSREG(2, 0, 0, 0, 7) +#define SYSREG_DBGBVR1_EL1 SYSREG(2, 0, 0, 1, 4) +#define SYSREG_DBGBCR1_EL1 SYSREG(2, 0, 0, 1, 5) +#define SYSREG_DBGWVR1_EL1 SYSREG(2, 0, 0, 1, 6) +#define SYSREG_DBGWCR1_EL1 SYSREG(2, 0, 0, 1, 7) +#define SYSREG_DBGBVR2_EL1 SYSREG(2, 0, 0, 2, 4) +#define SYSREG_DBGBCR2_EL1 SYSREG(2, 0, 0, 2, 5) +#define SYSREG_DBGWVR2_EL1 SYSREG(2, 0, 0, 2, 6) +#define SYSREG_DBGWCR2_EL1 SYSREG(2, 0, 0, 2, 7) +#define SYSREG_DBGBVR3_EL1 SYSREG(2, 0, 0, 3, 4) +#define SYSREG_DBGBCR3_EL1 SYSREG(2, 0, 0, 3, 5) +#define SYSREG_DBGWVR3_EL1 SYSREG(2, 0, 0, 3, 6) +#define SYSREG_DBGWCR3_EL1 SYSREG(2, 0, 0, 3, 7) +#define SYSREG_DBGBVR4_EL1 SYSREG(2, 0, 0, 4, 4) +#define SYSREG_DBGBCR4_EL1 SYSREG(2, 0, 0, 4, 5) +#define SYSREG_DBGWVR4_EL1 SYSREG(2, 0, 0, 4, 6) +#define SYSREG_DBGWCR4_EL1 SYSREG(2, 0, 0, 4, 7) +#define SYSREG_DBGBVR5_EL1 SYSREG(2, 0, 0, 5, 4) +#define SYSREG_DBGBCR5_EL1 SYSREG(2, 0, 0, 5, 5) +#define SYSREG_DBGWVR5_EL1 SYSREG(2, 0, 0, 5, 6) +#define SYSREG_DBGWCR5_EL1 SYSREG(2, 0, 0, 5, 7) +#define SYSREG_DBGBVR6_EL1 SYSREG(2, 0, 0, 6, 4) +#define SYSREG_DBGBCR6_EL1 SYSREG(2, 0, 0, 6, 5) +#define SYSREG_DBGWVR6_EL1 SYSREG(2, 0, 0, 6, 6) +#define SYSREG_DBGWCR6_EL1 SYSREG(2, 0, 0, 6, 7) +#define SYSREG_DBGBVR7_EL1 SYSREG(2, 0, 0, 7, 4) +#define SYSREG_DBGBCR7_EL1 SYSREG(2, 0, 0, 7, 5) +#define SYSREG_DBGWVR7_EL1 SYSREG(2, 0, 0, 7, 6) +#define SYSREG_DBGWCR7_EL1 SYSREG(2, 0, 0, 7, 7) +#define SYSREG_DBGBVR8_EL1 SYSREG(2, 0, 0, 8, 4) +#define SYSREG_DBGBCR8_EL1 SYSREG(2, 0, 0, 8, 5) +#define SYSREG_DBGWVR8_EL1 SYSREG(2, 0, 0, 8, 6) +#define SYSREG_DBGWCR8_EL1 SYSREG(2, 0, 0, 8, 7) +#define SYSREG_DBGBVR9_EL1 SYSREG(2, 0, 0, 9, 4) +#define SYSREG_DBGBCR9_EL1 SYSREG(2, 0, 0, 9, 5) +#define SYSREG_DBGWVR9_EL1 SYSREG(2, 0, 0, 9, 6) +#define SYSREG_DBGWCR9_EL1 SYSREG(2, 0, 0, 9, 7) +#define SYSREG_DBGBVR10_EL1 SYSREG(2, 0, 0, 10, 4) +#define SYSREG_DBGBCR10_EL1 SYSREG(2, 0, 0, 10, 5) +#define SYSREG_DBGWVR10_EL1 SYSREG(2, 0, 0, 10, 6) +#define SYSREG_DBGWCR10_EL1 SYSREG(2, 0, 0, 10, 7) +#define SYSREG_DBGBVR11_EL1 SYSREG(2, 0, 0, 11, 4) +#define SYSREG_DBGBCR11_EL1 SYSREG(2, 0, 0, 11, 5) +#define SYSREG_DBGWVR11_EL1 SYSREG(2, 0, 0, 11, 6) +#define SYSREG_DBGWCR11_EL1 SYSREG(2, 0, 0, 11, 7) +#define SYSREG_DBGBVR12_EL1 SYSREG(2, 0, 0, 12, 4) +#define SYSREG_DBGBCR12_EL1 SYSREG(2, 0, 0, 12, 5) +#define SYSREG_DBGWVR12_EL1 SYSREG(2, 0, 0, 12, 6) +#define SYSREG_DBGWCR12_EL1 SYSREG(2, 0, 0, 12, 7) +#define SYSREG_DBGBVR13_EL1 SYSREG(2, 0, 0, 13, 4) +#define SYSREG_DBGBCR13_EL1 SYSREG(2, 0, 0, 13, 5) +#define SYSREG_DBGWVR13_EL1 SYSREG(2, 0, 0, 13, 6) +#define SYSREG_DBGWCR13_EL1 SYSREG(2, 0, 0, 13, 7) +#define SYSREG_DBGBVR14_EL1 SYSREG(2, 0, 0, 14, 4) +#define SYSREG_DBGBCR14_EL1 SYSREG(2, 0, 0, 14, 5) +#define SYSREG_DBGWVR14_EL1 SYSREG(2, 0, 0, 14, 6) +#define SYSREG_DBGWCR14_EL1 SYSREG(2, 0, 0, 14, 7) +#define SYSREG_DBGBVR15_EL1 SYSREG(2, 0, 0, 15, 4) +#define SYSREG_DBGBCR15_EL1 SYSREG(2, 0, 0, 15, 5) +#define SYSREG_DBGWVR15_EL1 SYSREG(2, 0, 0, 15, 6) +#define SYSREG_DBGWCR15_EL1 SYSREG(2, 0, 0, 15, 7) + #define WFX_IS_WFE (1 << 0) #define TMR_CTL_ENABLE (1 << 0) @@ -XXX,XX +XXX,XX @@ static int hvf_sysreg_read(CPUState *cpu, uint32_t reg, uint32_t rt) hvf_raise_exception(cpu, EXCP_UDEF, syn_uncategorized()); } break; + case SYSREG_DBGBVR0_EL1: + case SYSREG_DBGBVR1_EL1: + case SYSREG_DBGBVR2_EL1: + case SYSREG_DBGBVR3_EL1: + case SYSREG_DBGBVR4_EL1: + case SYSREG_DBGBVR5_EL1: + case SYSREG_DBGBVR6_EL1: + case SYSREG_DBGBVR7_EL1: + case SYSREG_DBGBVR8_EL1: + case SYSREG_DBGBVR9_EL1: + case SYSREG_DBGBVR10_EL1: + case SYSREG_DBGBVR11_EL1: + case SYSREG_DBGBVR12_EL1: + case SYSREG_DBGBVR13_EL1: + case SYSREG_DBGBVR14_EL1: + case SYSREG_DBGBVR15_EL1: + val = env->cp15.dbgbvr[SYSREG_CRM(reg)]; + break; + case SYSREG_DBGBCR0_EL1: + case SYSREG_DBGBCR1_EL1: + case SYSREG_DBGBCR2_EL1: + case SYSREG_DBGBCR3_EL1: + case SYSREG_DBGBCR4_EL1: + case SYSREG_DBGBCR5_EL1: + case SYSREG_DBGBCR6_EL1: + case SYSREG_DBGBCR7_EL1: + case SYSREG_DBGBCR8_EL1: + case SYSREG_DBGBCR9_EL1: + case SYSREG_DBGBCR10_EL1: + case SYSREG_DBGBCR11_EL1: + case SYSREG_DBGBCR12_EL1: + case SYSREG_DBGBCR13_EL1: + case SYSREG_DBGBCR14_EL1: + case SYSREG_DBGBCR15_EL1: + val = env->cp15.dbgbcr[SYSREG_CRM(reg)]; + break; + case SYSREG_DBGWVR0_EL1: + case SYSREG_DBGWVR1_EL1: + case SYSREG_DBGWVR2_EL1: + case SYSREG_DBGWVR3_EL1: + case SYSREG_DBGWVR4_EL1: + case SYSREG_DBGWVR5_EL1: + case SYSREG_DBGWVR6_EL1: + case SYSREG_DBGWVR7_EL1: + case SYSREG_DBGWVR8_EL1: + case SYSREG_DBGWVR9_EL1: + case SYSREG_DBGWVR10_EL1: + case SYSREG_DBGWVR11_EL1: + case SYSREG_DBGWVR12_EL1: + case SYSREG_DBGWVR13_EL1: + case SYSREG_DBGWVR14_EL1: + case SYSREG_DBGWVR15_EL1: + val = env->cp15.dbgwvr[SYSREG_CRM(reg)]; + break; + case SYSREG_DBGWCR0_EL1: + case SYSREG_DBGWCR1_EL1: + case SYSREG_DBGWCR2_EL1: + case SYSREG_DBGWCR3_EL1: + case SYSREG_DBGWCR4_EL1: + case SYSREG_DBGWCR5_EL1: + case SYSREG_DBGWCR6_EL1: + case SYSREG_DBGWCR7_EL1: + case SYSREG_DBGWCR8_EL1: + case SYSREG_DBGWCR9_EL1: + case SYSREG_DBGWCR10_EL1: + case SYSREG_DBGWCR11_EL1: + case SYSREG_DBGWCR12_EL1: + case SYSREG_DBGWCR13_EL1: + case SYSREG_DBGWCR14_EL1: + case SYSREG_DBGWCR15_EL1: + val = env->cp15.dbgwcr[SYSREG_CRM(reg)]; + break; default: if (is_id_sysreg(reg)) { /* ID system registers read as RES0 */ @@ -XXX,XX +XXX,XX @@ static int hvf_sysreg_write(CPUState *cpu, uint32_t reg, uint64_t val) hvf_raise_exception(cpu, EXCP_UDEF, syn_uncategorized()); } break; + case SYSREG_MDSCR_EL1: + env->cp15.mdscr_el1 = val; + break; + case SYSREG_DBGBVR0_EL1: + case SYSREG_DBGBVR1_EL1: + case SYSREG_DBGBVR2_EL1: + case SYSREG_DBGBVR3_EL1: + case SYSREG_DBGBVR4_EL1: + case SYSREG_DBGBVR5_EL1: + case SYSREG_DBGBVR6_EL1: + case SYSREG_DBGBVR7_EL1: + case SYSREG_DBGBVR8_EL1: + case SYSREG_DBGBVR9_EL1: + case SYSREG_DBGBVR10_EL1: + case SYSREG_DBGBVR11_EL1: + case SYSREG_DBGBVR12_EL1: + case SYSREG_DBGBVR13_EL1: + case SYSREG_DBGBVR14_EL1: + case SYSREG_DBGBVR15_EL1: + env->cp15.dbgbvr[SYSREG_CRM(reg)] = val; + break; + case SYSREG_DBGBCR0_EL1: + case SYSREG_DBGBCR1_EL1: + case SYSREG_DBGBCR2_EL1: + case SYSREG_DBGBCR3_EL1: + case SYSREG_DBGBCR4_EL1: + case SYSREG_DBGBCR5_EL1: + case SYSREG_DBGBCR6_EL1: + case SYSREG_DBGBCR7_EL1: + case SYSREG_DBGBCR8_EL1: + case SYSREG_DBGBCR9_EL1: + case SYSREG_DBGBCR10_EL1: + case SYSREG_DBGBCR11_EL1: + case SYSREG_DBGBCR12_EL1: + case SYSREG_DBGBCR13_EL1: + case SYSREG_DBGBCR14_EL1: + case SYSREG_DBGBCR15_EL1: + env->cp15.dbgbcr[SYSREG_CRM(reg)] = val; + break; + case SYSREG_DBGWVR0_EL1: + case SYSREG_DBGWVR1_EL1: + case SYSREG_DBGWVR2_EL1: + case SYSREG_DBGWVR3_EL1: + case SYSREG_DBGWVR4_EL1: + case SYSREG_DBGWVR5_EL1: + case SYSREG_DBGWVR6_EL1: + case SYSREG_DBGWVR7_EL1: + case SYSREG_DBGWVR8_EL1: + case SYSREG_DBGWVR9_EL1: + case SYSREG_DBGWVR10_EL1: + case SYSREG_DBGWVR11_EL1: + case SYSREG_DBGWVR12_EL1: + case SYSREG_DBGWVR13_EL1: + case SYSREG_DBGWVR14_EL1: + case SYSREG_DBGWVR15_EL1: + env->cp15.dbgwvr[SYSREG_CRM(reg)] = val; + break; + case SYSREG_DBGWCR0_EL1: + case SYSREG_DBGWCR1_EL1: + case SYSREG_DBGWCR2_EL1: + case SYSREG_DBGWCR3_EL1: + case SYSREG_DBGWCR4_EL1: + case SYSREG_DBGWCR5_EL1: + case SYSREG_DBGWCR6_EL1: + case SYSREG_DBGWCR7_EL1: + case SYSREG_DBGWCR8_EL1: + case SYSREG_DBGWCR9_EL1: + case SYSREG_DBGWCR10_EL1: + case SYSREG_DBGWCR11_EL1: + case SYSREG_DBGWCR12_EL1: + case SYSREG_DBGWCR13_EL1: + case SYSREG_DBGWCR14_EL1: + case SYSREG_DBGWCR15_EL1: + env->cp15.dbgwcr[SYSREG_CRM(reg)] = val; + break; default: cpu_synchronize_state(cpu); trace_hvf_unhandled_sysreg_write(env->pc, reg, -- 2.34.1
From: Francesco Cagnin <fcagnin@quarkslab.com> Required for guest debugging. The code has been structured like the KVM counterpart. Signed-off-by: Francesco Cagnin <fcagnin@quarkslab.com> Message-id: 20230601153107.81955-4-fcagnin@quarkslab.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/sysemu/hvf.h | 22 ++++++++ include/sysemu/hvf_int.h | 1 + accel/hvf/hvf-accel-ops.c | 109 ++++++++++++++++++++++++++++++++++++++ accel/hvf/hvf-all.c | 17 ++++++ target/arm/hvf/hvf.c | 63 ++++++++++++++++++++++ target/i386/hvf/hvf.c | 24 +++++++++ 6 files changed, 236 insertions(+) diff --git a/include/sysemu/hvf.h b/include/sysemu/hvf.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/hvf.h +++ b/include/sysemu/hvf.h @@ -XXX,XX +XXX,XX @@ #include "qom/object.h" #ifdef NEED_CPU_H +#include "cpu.h" #ifdef CONFIG_HVF uint32_t hvf_get_supported_cpuid(uint32_t func, uint32_t idx, @@ -XXX,XX +XXX,XX @@ typedef struct HVFState HVFState; DECLARE_INSTANCE_CHECKER(HVFState, HVF_STATE, TYPE_HVF_ACCEL) +#ifdef NEED_CPU_H +struct hvf_sw_breakpoint { + target_ulong pc; + target_ulong saved_insn; + int use_count; + QTAILQ_ENTRY(hvf_sw_breakpoint) entry; +}; + +struct hvf_sw_breakpoint *hvf_find_sw_breakpoint(CPUState *cpu, + target_ulong pc); +int hvf_sw_breakpoints_active(CPUState *cpu); + +int hvf_arch_insert_sw_breakpoint(CPUState *cpu, struct hvf_sw_breakpoint *bp); +int hvf_arch_remove_sw_breakpoint(CPUState *cpu, struct hvf_sw_breakpoint *bp); +int hvf_arch_insert_hw_breakpoint(target_ulong addr, target_ulong len, + int type); +int hvf_arch_remove_hw_breakpoint(target_ulong addr, target_ulong len, + int type); +void hvf_arch_remove_all_hw_breakpoints(void); +#endif /* NEED_CPU_H */ + #endif diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/hvf_int.h +++ b/include/sysemu/hvf_int.h @@ -XXX,XX +XXX,XX @@ struct HVFState { hvf_vcpu_caps *hvf_caps; uint64_t vtimer_offset; + QTAILQ_HEAD(, hvf_sw_breakpoint) hvf_sw_breakpoints; }; extern HVFState *hvf_state; diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c index XXXXXXX..XXXXXXX 100644 --- a/accel/hvf/hvf-accel-ops.c +++ b/accel/hvf/hvf-accel-ops.c @@ -XXX,XX +XXX,XX @@ #include "qemu/main-loop.h" #include "exec/address-spaces.h" #include "exec/exec-all.h" +#include "exec/gdbstub.h" #include "sysemu/cpus.h" #include "sysemu/hvf.h" #include "sysemu/hvf_int.h" @@ -XXX,XX +XXX,XX @@ static int hvf_accel_init(MachineState *ms) s->slots[x].slot_id = x; } + QTAILQ_INIT(&s->hvf_sw_breakpoints); + hvf_state = s; memory_listener_register(&hvf_memory_listener, &address_space_memory); @@ -XXX,XX +XXX,XX @@ static void hvf_start_vcpu_thread(CPUState *cpu) cpu, QEMU_THREAD_JOINABLE); } +static int hvf_insert_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr len) +{ + struct hvf_sw_breakpoint *bp; + int err; + + if (type == GDB_BREAKPOINT_SW) { + bp = hvf_find_sw_breakpoint(cpu, addr); + if (bp) { + bp->use_count++; + return 0; + } + + bp = g_new(struct hvf_sw_breakpoint, 1); + bp->pc = addr; + bp->use_count = 1; + err = hvf_arch_insert_sw_breakpoint(cpu, bp); + if (err) { + g_free(bp); + return err; + } + + QTAILQ_INSERT_HEAD(&hvf_state->hvf_sw_breakpoints, bp, entry); + } else { + err = hvf_arch_insert_hw_breakpoint(addr, len, type); + if (err) { + return err; + } + } + + CPU_FOREACH(cpu) { + err = hvf_update_guest_debug(cpu); + if (err) { + return err; + } + } + return 0; +} + +static int hvf_remove_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr len) +{ + struct hvf_sw_breakpoint *bp; + int err; + + if (type == GDB_BREAKPOINT_SW) { + bp = hvf_find_sw_breakpoint(cpu, addr); + if (!bp) { + return -ENOENT; + } + + if (bp->use_count > 1) { + bp->use_count--; + return 0; + } + + err = hvf_arch_remove_sw_breakpoint(cpu, bp); + if (err) { + return err; + } + + QTAILQ_REMOVE(&hvf_state->hvf_sw_breakpoints, bp, entry); + g_free(bp); + } else { + err = hvf_arch_remove_hw_breakpoint(addr, len, type); + if (err) { + return err; + } + } + + CPU_FOREACH(cpu) { + err = hvf_update_guest_debug(cpu); + if (err) { + return err; + } + } + return 0; +} + +static void hvf_remove_all_breakpoints(CPUState *cpu) +{ + struct hvf_sw_breakpoint *bp, *next; + CPUState *tmpcpu; + + QTAILQ_FOREACH_SAFE(bp, &hvf_state->hvf_sw_breakpoints, entry, next) { + if (hvf_arch_remove_sw_breakpoint(cpu, bp) != 0) { + /* Try harder to find a CPU that currently sees the breakpoint. */ + CPU_FOREACH(tmpcpu) + { + if (hvf_arch_remove_sw_breakpoint(tmpcpu, bp) == 0) { + break; + } + } + } + QTAILQ_REMOVE(&hvf_state->hvf_sw_breakpoints, bp, entry); + g_free(bp); + } + hvf_arch_remove_all_hw_breakpoints(); + + CPU_FOREACH(cpu) { + hvf_update_guest_debug(cpu); + } +} + static void hvf_accel_ops_class_init(ObjectClass *oc, void *data) { AccelOpsClass *ops = ACCEL_OPS_CLASS(oc); @@ -XXX,XX +XXX,XX @@ static void hvf_accel_ops_class_init(ObjectClass *oc, void *data) ops->synchronize_post_init = hvf_cpu_synchronize_post_init; ops->synchronize_state = hvf_cpu_synchronize_state; ops->synchronize_pre_loadvm = hvf_cpu_synchronize_pre_loadvm; + + ops->insert_breakpoint = hvf_insert_breakpoint; + ops->remove_breakpoint = hvf_remove_breakpoint; + ops->remove_all_breakpoints = hvf_remove_all_breakpoints; }; static const TypeInfo hvf_accel_ops_type = { .name = ACCEL_OPS_NAME("hvf"), diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c index XXXXXXX..XXXXXXX 100644 --- a/accel/hvf/hvf-all.c +++ b/accel/hvf/hvf-all.c @@ -XXX,XX +XXX,XX @@ void assert_hvf_ok(hv_return_t ret) abort(); } + +struct hvf_sw_breakpoint *hvf_find_sw_breakpoint(CPUState *cpu, target_ulong pc) +{ + struct hvf_sw_breakpoint *bp; + + QTAILQ_FOREACH(bp, &hvf_state->hvf_sw_breakpoints, entry) { + if (bp->pc == pc) { + return bp; + } + } + return NULL; +} + +int hvf_sw_breakpoints_active(CPUState *cpu) +{ + return !QTAILQ_EMPTY(&hvf_state->hvf_sw_breakpoints); +} diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/hvf/hvf.c +++ b/target/arm/hvf/hvf.c @@ -XXX,XX +XXX,XX @@ #include "trace/trace-target_arm_hvf.h" #include "migration/vmstate.h" +#include "exec/gdbstub.h" + #define HVF_SYSREG(crn, crm, op0, op1, op2) \ ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP, crn, crm, op0, op1, op2) #define PL1_WRITE_MASK 0x4 @@ -XXX,XX +XXX,XX @@ int hvf_arch_init(void) qemu_add_vm_change_state_handler(hvf_vm_state_change, &vtimer); return 0; } + +static const uint32_t brk_insn = 0xd4200000; + +int hvf_arch_insert_sw_breakpoint(CPUState *cpu, struct hvf_sw_breakpoint *bp) +{ + if (cpu_memory_rw_debug(cpu, bp->pc, (uint8_t *)&bp->saved_insn, 4, 0) || + cpu_memory_rw_debug(cpu, bp->pc, (uint8_t *)&brk_insn, 4, 1)) { + return -EINVAL; + } + return 0; +} + +int hvf_arch_remove_sw_breakpoint(CPUState *cpu, struct hvf_sw_breakpoint *bp) +{ + static uint32_t brk; + + if (cpu_memory_rw_debug(cpu, bp->pc, (uint8_t *)&brk, 4, 0) || + brk != brk_insn || + cpu_memory_rw_debug(cpu, bp->pc, (uint8_t *)&bp->saved_insn, 4, 1)) { + return -EINVAL; + } + return 0; +} + +int hvf_arch_insert_hw_breakpoint(target_ulong addr, target_ulong len, int type) +{ + switch (type) { + case GDB_BREAKPOINT_HW: + return insert_hw_breakpoint(addr); + case GDB_WATCHPOINT_READ: + case GDB_WATCHPOINT_WRITE: + case GDB_WATCHPOINT_ACCESS: + return insert_hw_watchpoint(addr, len, type); + default: + return -ENOSYS; + } +} + +int hvf_arch_remove_hw_breakpoint(target_ulong addr, target_ulong len, int type) +{ + switch (type) { + case GDB_BREAKPOINT_HW: + return delete_hw_breakpoint(addr); + case GDB_WATCHPOINT_READ: + case GDB_WATCHPOINT_WRITE: + case GDB_WATCHPOINT_ACCESS: + return delete_hw_watchpoint(addr, len, type); + default: + return -ENOSYS; + } +} + +void hvf_arch_remove_all_hw_breakpoints(void) +{ + if (cur_hw_wps > 0) { + g_array_remove_range(hw_watchpoints, 0, cur_hw_wps); + } + if (cur_hw_bps > 0) { + g_array_remove_range(hw_breakpoints, 0, cur_hw_bps); + } +} diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c index XXXXXXX..XXXXXXX 100644 --- a/target/i386/hvf/hvf.c +++ b/target/i386/hvf/hvf.c @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) return ret; } + +int hvf_arch_insert_sw_breakpoint(CPUState *cpu, struct hvf_sw_breakpoint *bp) +{ + return -ENOSYS; +} + +int hvf_arch_remove_sw_breakpoint(CPUState *cpu, struct hvf_sw_breakpoint *bp) +{ + return -ENOSYS; +} + +int hvf_arch_insert_hw_breakpoint(target_ulong addr, target_ulong len, int type) +{ + return -ENOSYS; +} + +int hvf_arch_remove_hw_breakpoint(target_ulong addr, target_ulong len, int type) +{ + return -ENOSYS; +} + +void hvf_arch_remove_all_hw_breakpoints(void) +{ +} -- 2.34.1
From: Francesco Cagnin <fcagnin@quarkslab.com> Guests can now be debugged through the gdbstub. Support is added for single-stepping, software breakpoints, hardware breakpoints and watchpoints. The code has been structured like the KVM counterpart. While guest debugging is enabled, the guest can still read and write the DBG*_EL1 registers but they don't have any effect. Signed-off-by: Francesco Cagnin <fcagnin@quarkslab.com> Message-id: 20230601153107.81955-5-fcagnin@quarkslab.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/sysemu/hvf.h | 15 ++ include/sysemu/hvf_int.h | 1 + target/arm/hvf_arm.h | 7 + accel/hvf/hvf-accel-ops.c | 10 + accel/hvf/hvf-all.c | 6 + target/arm/hvf/hvf.c | 474 +++++++++++++++++++++++++++++++++++++- target/i386/hvf/hvf.c | 9 + 7 files changed, 520 insertions(+), 2 deletions(-) diff --git a/include/sysemu/hvf.h b/include/sysemu/hvf.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/hvf.h +++ b/include/sysemu/hvf.h @@ -XXX,XX +XXX,XX @@ int hvf_arch_insert_hw_breakpoint(target_ulong addr, target_ulong len, int hvf_arch_remove_hw_breakpoint(target_ulong addr, target_ulong len, int type); void hvf_arch_remove_all_hw_breakpoints(void); + +/* + * hvf_update_guest_debug: + * @cs: CPUState for the CPU to update + * + * Update guest to enable or disable debugging. Per-arch specifics will be + * handled by calling down to hvf_arch_update_guest_debug. + */ +int hvf_update_guest_debug(CPUState *cpu); +void hvf_arch_update_guest_debug(CPUState *cpu); + +/* + * Return whether the guest supports debugging. + */ +bool hvf_arch_supports_guest_debug(void); #endif /* NEED_CPU_H */ #endif diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/hvf_int.h +++ b/include/sysemu/hvf_int.h @@ -XXX,XX +XXX,XX @@ struct hvf_vcpu_state { void *exit; bool vtimer_masked; sigset_t unblock_ipi_mask; + bool guest_debug_enabled; }; void assert_hvf_ok(hv_return_t ret); diff --git a/target/arm/hvf_arm.h b/target/arm/hvf_arm.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/hvf_arm.h +++ b/target/arm/hvf_arm.h @@ -XXX,XX +XXX,XX @@ #include "cpu.h" +/** + * hvf_arm_init_debug() - initialize guest debug capabilities + * + * Should be called only once before using guest debug capabilities. + */ +void hvf_arm_init_debug(void); + void hvf_arm_set_cpu_features_from_host(ARMCPU *cpu); #endif diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c index XXXXXXX..XXXXXXX 100644 --- a/accel/hvf/hvf-accel-ops.c +++ b/accel/hvf/hvf-accel-ops.c @@ -XXX,XX +XXX,XX @@ static int hvf_accel_init(MachineState *ms) return hvf_arch_init(); } +static inline int hvf_gdbstub_sstep_flags(void) +{ + return SSTEP_ENABLE | SSTEP_NOIRQ; +} + static void hvf_accel_class_init(ObjectClass *oc, void *data) { AccelClass *ac = ACCEL_CLASS(oc); ac->name = "HVF"; ac->init_machine = hvf_accel_init; ac->allowed = &hvf_allowed; + ac->gdbstub_supported_sstep_flags = hvf_gdbstub_sstep_flags; } static const TypeInfo hvf_accel_type = { @@ -XXX,XX +XXX,XX @@ static int hvf_init_vcpu(CPUState *cpu) cpu->vcpu_dirty = 1; assert_hvf_ok(r); + cpu->hvf->guest_debug_enabled = false; + return hvf_arch_init_vcpu(cpu); } @@ -XXX,XX +XXX,XX @@ static void hvf_accel_ops_class_init(ObjectClass *oc, void *data) ops->insert_breakpoint = hvf_insert_breakpoint; ops->remove_breakpoint = hvf_remove_breakpoint; ops->remove_all_breakpoints = hvf_remove_all_breakpoints; + ops->update_guest_debug = hvf_update_guest_debug; + ops->supports_guest_debug = hvf_arch_supports_guest_debug; }; static const TypeInfo hvf_accel_ops_type = { .name = ACCEL_OPS_NAME("hvf"), diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c index XXXXXXX..XXXXXXX 100644 --- a/accel/hvf/hvf-all.c +++ b/accel/hvf/hvf-all.c @@ -XXX,XX +XXX,XX @@ int hvf_sw_breakpoints_active(CPUState *cpu) { return !QTAILQ_EMPTY(&hvf_state->hvf_sw_breakpoints); } + +int hvf_update_guest_debug(CPUState *cpu) +{ + hvf_arch_update_guest_debug(cpu); + return 0; +} diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/hvf/hvf.c +++ b/target/arm/hvf/hvf.c @@ -XXX,XX +XXX,XX @@ #include "exec/gdbstub.h" +#define MDSCR_EL1_SS_SHIFT 0 +#define MDSCR_EL1_MDE_SHIFT 15 + +static uint16_t dbgbcr_regs[] = { + HV_SYS_REG_DBGBCR0_EL1, + HV_SYS_REG_DBGBCR1_EL1, + HV_SYS_REG_DBGBCR2_EL1, + HV_SYS_REG_DBGBCR3_EL1, + HV_SYS_REG_DBGBCR4_EL1, + HV_SYS_REG_DBGBCR5_EL1, + HV_SYS_REG_DBGBCR6_EL1, + HV_SYS_REG_DBGBCR7_EL1, + HV_SYS_REG_DBGBCR8_EL1, + HV_SYS_REG_DBGBCR9_EL1, + HV_SYS_REG_DBGBCR10_EL1, + HV_SYS_REG_DBGBCR11_EL1, + HV_SYS_REG_DBGBCR12_EL1, + HV_SYS_REG_DBGBCR13_EL1, + HV_SYS_REG_DBGBCR14_EL1, + HV_SYS_REG_DBGBCR15_EL1, +}; +static uint16_t dbgbvr_regs[] = { + HV_SYS_REG_DBGBVR0_EL1, + HV_SYS_REG_DBGBVR1_EL1, + HV_SYS_REG_DBGBVR2_EL1, + HV_SYS_REG_DBGBVR3_EL1, + HV_SYS_REG_DBGBVR4_EL1, + HV_SYS_REG_DBGBVR5_EL1, + HV_SYS_REG_DBGBVR6_EL1, + HV_SYS_REG_DBGBVR7_EL1, + HV_SYS_REG_DBGBVR8_EL1, + HV_SYS_REG_DBGBVR9_EL1, + HV_SYS_REG_DBGBVR10_EL1, + HV_SYS_REG_DBGBVR11_EL1, + HV_SYS_REG_DBGBVR12_EL1, + HV_SYS_REG_DBGBVR13_EL1, + HV_SYS_REG_DBGBVR14_EL1, + HV_SYS_REG_DBGBVR15_EL1, +}; +static uint16_t dbgwcr_regs[] = { + HV_SYS_REG_DBGWCR0_EL1, + HV_SYS_REG_DBGWCR1_EL1, + HV_SYS_REG_DBGWCR2_EL1, + HV_SYS_REG_DBGWCR3_EL1, + HV_SYS_REG_DBGWCR4_EL1, + HV_SYS_REG_DBGWCR5_EL1, + HV_SYS_REG_DBGWCR6_EL1, + HV_SYS_REG_DBGWCR7_EL1, + HV_SYS_REG_DBGWCR8_EL1, + HV_SYS_REG_DBGWCR9_EL1, + HV_SYS_REG_DBGWCR10_EL1, + HV_SYS_REG_DBGWCR11_EL1, + HV_SYS_REG_DBGWCR12_EL1, + HV_SYS_REG_DBGWCR13_EL1, + HV_SYS_REG_DBGWCR14_EL1, + HV_SYS_REG_DBGWCR15_EL1, +}; +static uint16_t dbgwvr_regs[] = { + HV_SYS_REG_DBGWVR0_EL1, + HV_SYS_REG_DBGWVR1_EL1, + HV_SYS_REG_DBGWVR2_EL1, + HV_SYS_REG_DBGWVR3_EL1, + HV_SYS_REG_DBGWVR4_EL1, + HV_SYS_REG_DBGWVR5_EL1, + HV_SYS_REG_DBGWVR6_EL1, + HV_SYS_REG_DBGWVR7_EL1, + HV_SYS_REG_DBGWVR8_EL1, + HV_SYS_REG_DBGWVR9_EL1, + HV_SYS_REG_DBGWVR10_EL1, + HV_SYS_REG_DBGWVR11_EL1, + HV_SYS_REG_DBGWVR12_EL1, + HV_SYS_REG_DBGWVR13_EL1, + HV_SYS_REG_DBGWVR14_EL1, + HV_SYS_REG_DBGWVR15_EL1, +}; + +static inline int hvf_arm_num_brps(hv_vcpu_config_t config) +{ + uint64_t val; + hv_return_t ret; + ret = hv_vcpu_config_get_feature_reg(config, HV_FEATURE_REG_ID_AA64DFR0_EL1, + &val); + assert_hvf_ok(ret); + return FIELD_EX64(val, ID_AA64DFR0, BRPS) + 1; +} + +static inline int hvf_arm_num_wrps(hv_vcpu_config_t config) +{ + uint64_t val; + hv_return_t ret; + ret = hv_vcpu_config_get_feature_reg(config, HV_FEATURE_REG_ID_AA64DFR0_EL1, + &val); + assert_hvf_ok(ret); + return FIELD_EX64(val, ID_AA64DFR0, WRPS) + 1; +} + +void hvf_arm_init_debug(void) +{ + hv_vcpu_config_t config; + config = hv_vcpu_config_create(); + + max_hw_bps = hvf_arm_num_brps(config); + hw_breakpoints = + g_array_sized_new(true, true, sizeof(HWBreakpoint), max_hw_bps); + + max_hw_wps = hvf_arm_num_wrps(config); + hw_watchpoints = + g_array_sized_new(true, true, sizeof(HWWatchpoint), max_hw_wps); +} + #define HVF_SYSREG(crn, crm, op0, op1, op2) \ ENCODE_AA64_CP_REG(CP_REG_ARM64_SYSREG_CP, crn, crm, op0, op1, op2) #define PL1_WRITE_MASK 0x4 @@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu) continue; } + if (cpu->hvf->guest_debug_enabled) { + /* Handle debug registers */ + switch (hvf_sreg_match[i].reg) { + case HV_SYS_REG_DBGBVR0_EL1: + case HV_SYS_REG_DBGBCR0_EL1: + case HV_SYS_REG_DBGWVR0_EL1: + case HV_SYS_REG_DBGWCR0_EL1: + case HV_SYS_REG_DBGBVR1_EL1: + case HV_SYS_REG_DBGBCR1_EL1: + case HV_SYS_REG_DBGWVR1_EL1: + case HV_SYS_REG_DBGWCR1_EL1: + case HV_SYS_REG_DBGBVR2_EL1: + case HV_SYS_REG_DBGBCR2_EL1: + case HV_SYS_REG_DBGWVR2_EL1: + case HV_SYS_REG_DBGWCR2_EL1: + case HV_SYS_REG_DBGBVR3_EL1: + case HV_SYS_REG_DBGBCR3_EL1: + case HV_SYS_REG_DBGWVR3_EL1: + case HV_SYS_REG_DBGWCR3_EL1: + case HV_SYS_REG_DBGBVR4_EL1: + case HV_SYS_REG_DBGBCR4_EL1: + case HV_SYS_REG_DBGWVR4_EL1: + case HV_SYS_REG_DBGWCR4_EL1: + case HV_SYS_REG_DBGBVR5_EL1: + case HV_SYS_REG_DBGBCR5_EL1: + case HV_SYS_REG_DBGWVR5_EL1: + case HV_SYS_REG_DBGWCR5_EL1: + case HV_SYS_REG_DBGBVR6_EL1: + case HV_SYS_REG_DBGBCR6_EL1: + case HV_SYS_REG_DBGWVR6_EL1: + case HV_SYS_REG_DBGWCR6_EL1: + case HV_SYS_REG_DBGBVR7_EL1: + case HV_SYS_REG_DBGBCR7_EL1: + case HV_SYS_REG_DBGWVR7_EL1: + case HV_SYS_REG_DBGWCR7_EL1: + case HV_SYS_REG_DBGBVR8_EL1: + case HV_SYS_REG_DBGBCR8_EL1: + case HV_SYS_REG_DBGWVR8_EL1: + case HV_SYS_REG_DBGWCR8_EL1: + case HV_SYS_REG_DBGBVR9_EL1: + case HV_SYS_REG_DBGBCR9_EL1: + case HV_SYS_REG_DBGWVR9_EL1: + case HV_SYS_REG_DBGWCR9_EL1: + case HV_SYS_REG_DBGBVR10_EL1: + case HV_SYS_REG_DBGBCR10_EL1: + case HV_SYS_REG_DBGWVR10_EL1: + case HV_SYS_REG_DBGWCR10_EL1: + case HV_SYS_REG_DBGBVR11_EL1: + case HV_SYS_REG_DBGBCR11_EL1: + case HV_SYS_REG_DBGWVR11_EL1: + case HV_SYS_REG_DBGWCR11_EL1: + case HV_SYS_REG_DBGBVR12_EL1: + case HV_SYS_REG_DBGBCR12_EL1: + case HV_SYS_REG_DBGWVR12_EL1: + case HV_SYS_REG_DBGWCR12_EL1: + case HV_SYS_REG_DBGBVR13_EL1: + case HV_SYS_REG_DBGBCR13_EL1: + case HV_SYS_REG_DBGWVR13_EL1: + case HV_SYS_REG_DBGWCR13_EL1: + case HV_SYS_REG_DBGBVR14_EL1: + case HV_SYS_REG_DBGBCR14_EL1: + case HV_SYS_REG_DBGWVR14_EL1: + case HV_SYS_REG_DBGWCR14_EL1: + case HV_SYS_REG_DBGBVR15_EL1: + case HV_SYS_REG_DBGBCR15_EL1: + case HV_SYS_REG_DBGWVR15_EL1: + case HV_SYS_REG_DBGWCR15_EL1: { + /* + * If the guest is being debugged, the vCPU's debug registers + * are holding the gdbstub's view of the registers (set in + * hvf_arch_update_guest_debug()). + * Since the environment is used to store only the guest's view + * of the registers, don't update it with the values from the + * vCPU but simply keep the values from the previous + * environment. + */ + const ARMCPRegInfo *ri; + ri = get_arm_cp_reginfo(arm_cpu->cp_regs, hvf_sreg_match[i].key); + val = read_raw_cp_reg(env, ri); + + arm_cpu->cpreg_values[hvf_sreg_match[i].cp_idx] = val; + continue; + } + } + } + ret = hv_vcpu_get_sys_reg(cpu->hvf->fd, hvf_sreg_match[i].reg, &val); assert_hvf_ok(ret); @@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu) continue; } + if (cpu->hvf->guest_debug_enabled) { + /* Handle debug registers */ + switch (hvf_sreg_match[i].reg) { + case HV_SYS_REG_DBGBVR0_EL1: + case HV_SYS_REG_DBGBCR0_EL1: + case HV_SYS_REG_DBGWVR0_EL1: + case HV_SYS_REG_DBGWCR0_EL1: + case HV_SYS_REG_DBGBVR1_EL1: + case HV_SYS_REG_DBGBCR1_EL1: + case HV_SYS_REG_DBGWVR1_EL1: + case HV_SYS_REG_DBGWCR1_EL1: + case HV_SYS_REG_DBGBVR2_EL1: + case HV_SYS_REG_DBGBCR2_EL1: + case HV_SYS_REG_DBGWVR2_EL1: + case HV_SYS_REG_DBGWCR2_EL1: + case HV_SYS_REG_DBGBVR3_EL1: + case HV_SYS_REG_DBGBCR3_EL1: + case HV_SYS_REG_DBGWVR3_EL1: + case HV_SYS_REG_DBGWCR3_EL1: + case HV_SYS_REG_DBGBVR4_EL1: + case HV_SYS_REG_DBGBCR4_EL1: + case HV_SYS_REG_DBGWVR4_EL1: + case HV_SYS_REG_DBGWCR4_EL1: + case HV_SYS_REG_DBGBVR5_EL1: + case HV_SYS_REG_DBGBCR5_EL1: + case HV_SYS_REG_DBGWVR5_EL1: + case HV_SYS_REG_DBGWCR5_EL1: + case HV_SYS_REG_DBGBVR6_EL1: + case HV_SYS_REG_DBGBCR6_EL1: + case HV_SYS_REG_DBGWVR6_EL1: + case HV_SYS_REG_DBGWCR6_EL1: + case HV_SYS_REG_DBGBVR7_EL1: + case HV_SYS_REG_DBGBCR7_EL1: + case HV_SYS_REG_DBGWVR7_EL1: + case HV_SYS_REG_DBGWCR7_EL1: + case HV_SYS_REG_DBGBVR8_EL1: + case HV_SYS_REG_DBGBCR8_EL1: + case HV_SYS_REG_DBGWVR8_EL1: + case HV_SYS_REG_DBGWCR8_EL1: + case HV_SYS_REG_DBGBVR9_EL1: + case HV_SYS_REG_DBGBCR9_EL1: + case HV_SYS_REG_DBGWVR9_EL1: + case HV_SYS_REG_DBGWCR9_EL1: + case HV_SYS_REG_DBGBVR10_EL1: + case HV_SYS_REG_DBGBCR10_EL1: + case HV_SYS_REG_DBGWVR10_EL1: + case HV_SYS_REG_DBGWCR10_EL1: + case HV_SYS_REG_DBGBVR11_EL1: + case HV_SYS_REG_DBGBCR11_EL1: + case HV_SYS_REG_DBGWVR11_EL1: + case HV_SYS_REG_DBGWCR11_EL1: + case HV_SYS_REG_DBGBVR12_EL1: + case HV_SYS_REG_DBGBCR12_EL1: + case HV_SYS_REG_DBGWVR12_EL1: + case HV_SYS_REG_DBGWCR12_EL1: + case HV_SYS_REG_DBGBVR13_EL1: + case HV_SYS_REG_DBGBCR13_EL1: + case HV_SYS_REG_DBGWVR13_EL1: + case HV_SYS_REG_DBGWCR13_EL1: + case HV_SYS_REG_DBGBVR14_EL1: + case HV_SYS_REG_DBGBCR14_EL1: + case HV_SYS_REG_DBGWVR14_EL1: + case HV_SYS_REG_DBGWCR14_EL1: + case HV_SYS_REG_DBGBVR15_EL1: + case HV_SYS_REG_DBGBCR15_EL1: + case HV_SYS_REG_DBGWVR15_EL1: + case HV_SYS_REG_DBGWCR15_EL1: + /* + * If the guest is being debugged, the vCPU's debug registers + * are already holding the gdbstub's view of the registers (set + * in hvf_arch_update_guest_debug()). + */ + continue; + } + } + val = arm_cpu->cpreg_values[hvf_sreg_match[i].cp_idx]; ret = hv_vcpu_set_sys_reg(cpu->hvf->fd, hvf_sreg_match[i].reg, val); assert_hvf_ok(ret); @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) { ARMCPU *arm_cpu = ARM_CPU(cpu); CPUARMState *env = &arm_cpu->env; + int ret; hv_vcpu_exit_t *hvf_exit = cpu->hvf->exit; hv_return_t r; bool advance_pc = false; - if (hvf_inject_interrupts(cpu)) { + if (!(cpu->singlestep_enabled & SSTEP_NOIRQ) && + hvf_inject_interrupts(cpu)) { return EXCP_INTERRUPT; } @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) uint64_t syndrome = hvf_exit->exception.syndrome; uint32_t ec = syn_get_ec(syndrome); + ret = 0; qemu_mutex_lock_iothread(); switch (exit_reason) { case HV_EXIT_REASON_EXCEPTION: @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) hvf_sync_vtimer(cpu); switch (ec) { + case EC_SOFTWARESTEP: { + ret = EXCP_DEBUG; + + if (!cpu->singlestep_enabled) { + error_report("EC_SOFTWARESTEP but single-stepping not enabled"); + } + break; + } + case EC_AA64_BKPT: { + ret = EXCP_DEBUG; + + cpu_synchronize_state(cpu); + + if (!hvf_find_sw_breakpoint(cpu, env->pc)) { + /* Re-inject into the guest */ + ret = 0; + hvf_raise_exception(cpu, EXCP_BKPT, syn_aa64_bkpt(0)); + } + break; + } + case EC_BREAKPOINT: { + ret = EXCP_DEBUG; + + cpu_synchronize_state(cpu); + + if (!find_hw_breakpoint(cpu, env->pc)) { + error_report("EC_BREAKPOINT but unknown hw breakpoint"); + } + break; + } + case EC_WATCHPOINT: { + ret = EXCP_DEBUG; + + cpu_synchronize_state(cpu); + + CPUWatchpoint *wp = + find_hw_watchpoint(cpu, hvf_exit->exception.virtual_address); + if (!wp) { + error_report("EXCP_DEBUG but unknown hw watchpoint"); + } + cpu->watchpoint_hit = wp; + break; + } case EC_DATAABORT: { bool isv = syndrome & ARM_EL_ISV; bool iswrite = (syndrome >> 6) & 1; @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) pc += 4; r = hv_vcpu_set_reg(cpu->hvf->fd, HV_REG_PC, pc); assert_hvf_ok(r); + + /* Handle single-stepping over instructions which trigger a VM exit */ + if (cpu->singlestep_enabled) { + ret = EXCP_DEBUG; + } } - return 0; + return ret; } static const VMStateDescription vmstate_hvf_vtimer = { @@ -XXX,XX +XXX,XX @@ int hvf_arch_init(void) hvf_state->vtimer_offset = mach_absolute_time(); vmstate_register(NULL, 0, &vmstate_hvf_vtimer, &vtimer); qemu_add_vm_change_state_handler(hvf_vm_state_change, &vtimer); + + hvf_arm_init_debug(); + return 0; } @@ -XXX,XX +XXX,XX @@ void hvf_arch_remove_all_hw_breakpoints(void) g_array_remove_range(hw_breakpoints, 0, cur_hw_bps); } } + +/* + * Update the vCPU with the gdbstub's view of debug registers. This view + * consists of all hardware breakpoints and watchpoints inserted so far while + * debugging the guest. + */ +static void hvf_put_gdbstub_debug_registers(CPUState *cpu) +{ + hv_return_t r = HV_SUCCESS; + int i; + + for (i = 0; i < cur_hw_bps; i++) { + HWBreakpoint *bp = get_hw_bp(i); + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgbcr_regs[i], bp->bcr); + assert_hvf_ok(r); + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgbvr_regs[i], bp->bvr); + assert_hvf_ok(r); + } + for (i = cur_hw_bps; i < max_hw_bps; i++) { + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgbcr_regs[i], 0); + assert_hvf_ok(r); + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgbvr_regs[i], 0); + assert_hvf_ok(r); + } + + for (i = 0; i < cur_hw_wps; i++) { + HWWatchpoint *wp = get_hw_wp(i); + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgwcr_regs[i], wp->wcr); + assert_hvf_ok(r); + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgwvr_regs[i], wp->wvr); + assert_hvf_ok(r); + } + for (i = cur_hw_wps; i < max_hw_wps; i++) { + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgwcr_regs[i], 0); + assert_hvf_ok(r); + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgwvr_regs[i], 0); + assert_hvf_ok(r); + } +} + +/* + * Update the vCPU with the guest's view of debug registers. This view is kept + * in the environment at all times. + */ +static void hvf_put_guest_debug_registers(CPUState *cpu) +{ + ARMCPU *arm_cpu = ARM_CPU(cpu); + CPUARMState *env = &arm_cpu->env; + hv_return_t r = HV_SUCCESS; + int i; + + for (i = 0; i < max_hw_bps; i++) { + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgbcr_regs[i], + env->cp15.dbgbcr[i]); + assert_hvf_ok(r); + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgbvr_regs[i], + env->cp15.dbgbvr[i]); + assert_hvf_ok(r); + } + + for (i = 0; i < max_hw_wps; i++) { + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgwcr_regs[i], + env->cp15.dbgwcr[i]); + assert_hvf_ok(r); + r = hv_vcpu_set_sys_reg(cpu->hvf->fd, dbgwvr_regs[i], + env->cp15.dbgwvr[i]); + assert_hvf_ok(r); + } +} + +static inline bool hvf_arm_hw_debug_active(CPUState *cpu) +{ + return ((cur_hw_wps > 0) || (cur_hw_bps > 0)); +} + +static void hvf_arch_set_traps(void) +{ + CPUState *cpu; + bool should_enable_traps = false; + hv_return_t r = HV_SUCCESS; + + /* Check whether guest debugging is enabled for at least one vCPU; if it + * is, enable exiting the guest on all vCPUs */ + CPU_FOREACH(cpu) { + should_enable_traps |= cpu->hvf->guest_debug_enabled; + } + CPU_FOREACH(cpu) { + /* Set whether debug exceptions exit the guest */ + r = hv_vcpu_set_trap_debug_exceptions(cpu->hvf->fd, + should_enable_traps); + assert_hvf_ok(r); + + /* Set whether accesses to debug registers exit the guest */ + r = hv_vcpu_set_trap_debug_reg_accesses(cpu->hvf->fd, + should_enable_traps); + assert_hvf_ok(r); + } +} + +void hvf_arch_update_guest_debug(CPUState *cpu) +{ + ARMCPU *arm_cpu = ARM_CPU(cpu); + CPUARMState *env = &arm_cpu->env; + + /* Check whether guest debugging is enabled */ + cpu->hvf->guest_debug_enabled = cpu->singlestep_enabled || + hvf_sw_breakpoints_active(cpu) || + hvf_arm_hw_debug_active(cpu); + + /* Update debug registers */ + if (cpu->hvf->guest_debug_enabled) { + hvf_put_gdbstub_debug_registers(cpu); + } else { + hvf_put_guest_debug_registers(cpu); + } + + cpu_synchronize_state(cpu); + + /* Enable/disable single-stepping */ + if (cpu->singlestep_enabled) { + env->cp15.mdscr_el1 = + deposit64(env->cp15.mdscr_el1, MDSCR_EL1_SS_SHIFT, 1, 1); + pstate_write(env, pstate_read(env) | PSTATE_SS); + } else { + env->cp15.mdscr_el1 = + deposit64(env->cp15.mdscr_el1, MDSCR_EL1_SS_SHIFT, 1, 0); + } + + /* Enable/disable Breakpoint exceptions */ + if (hvf_arm_hw_debug_active(cpu)) { + env->cp15.mdscr_el1 = + deposit64(env->cp15.mdscr_el1, MDSCR_EL1_MDE_SHIFT, 1, 1); + } else { + env->cp15.mdscr_el1 = + deposit64(env->cp15.mdscr_el1, MDSCR_EL1_MDE_SHIFT, 1, 0); + } + + hvf_arch_set_traps(); +} + +inline bool hvf_arch_supports_guest_debug(void) +{ + return true; +} diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c index XXXXXXX..XXXXXXX 100644 --- a/target/i386/hvf/hvf.c +++ b/target/i386/hvf/hvf.c @@ -XXX,XX +XXX,XX @@ int hvf_arch_remove_hw_breakpoint(target_ulong addr, target_ulong len, int type) void hvf_arch_remove_all_hw_breakpoints(void) { } + +void hvf_arch_update_guest_debug(CPUState *cpu) +{ +} + +inline bool hvf_arch_supports_guest_debug(void) +{ + return false; +} -- 2.34.1
From: Vikram Garhwal <vikram.garhwal@amd.com> The Xilinx Versal CANFD controller is developed based on SocketCAN, QEMU CAN bus implementation. Bus connection and socketCAN connection for each CAN module can be set through command lines. Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com> Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/net/xlnx-versal-canfd.h | 87 ++ hw/net/can/xlnx-versal-canfd.c | 2107 ++++++++++++++++++++++++++++ hw/net/can/meson.build | 1 + hw/net/can/trace-events | 7 + 4 files changed, 2202 insertions(+) create mode 100644 include/hw/net/xlnx-versal-canfd.h create mode 100644 hw/net/can/xlnx-versal-canfd.c diff --git a/include/hw/net/xlnx-versal-canfd.h b/include/hw/net/xlnx-versal-canfd.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/hw/net/xlnx-versal-canfd.h @@ -XXX,XX +XXX,XX @@ +/* + * QEMU model of the Xilinx Versal CANFD Controller. + * + * Copyright (c) 2023 Advanced Micro Devices, Inc. + * + * Written-by: Vikram Garhwal<vikram.garhwal@amd.com> + * Based on QEMU CANFD Device emulation implemented by Jin Yang, Deniz Eren and + * Pavel Pisa. + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#ifndef HW_CANFD_XILINX_H +#define HW_CANFD_XILINX_H + +#include "hw/register.h" +#include "hw/ptimer.h" +#include "net/can_emu.h" +#include "hw/qdev-clock.h" + +#define TYPE_XILINX_CANFD "xlnx.versal-canfd" + +OBJECT_DECLARE_SIMPLE_TYPE(XlnxVersalCANFDState, XILINX_CANFD) + +#define NUM_REGS_PER_MSG_SPACE 18 /* 1 ID + 1 DLC + 16 Data(DW0 - DW15) regs. */ +#define MAX_NUM_RX 64 +#define OFFSET_RX1_DW15 (0x4144 / 4) +#define CANFD_TIMER_MAX 0xFFFFUL +#define CANFD_DEFAULT_CLOCK (25 * 1000 * 1000) + +#define XLNX_VERSAL_CANFD_R_MAX (OFFSET_RX1_DW15 + \ + ((MAX_NUM_RX - 1) * NUM_REGS_PER_MSG_SPACE) + 1) + +typedef struct XlnxVersalCANFDState { + SysBusDevice parent_obj; + MemoryRegion iomem; + + qemu_irq irq_canfd_int; + qemu_irq irq_addr_err; + + RegisterInfo reg_info[XLNX_VERSAL_CANFD_R_MAX]; + RegisterAccessInfo *tx_regs; + RegisterAccessInfo *rx0_regs; + RegisterAccessInfo *rx1_regs; + RegisterAccessInfo *af_regs; + RegisterAccessInfo *txe_regs; + RegisterAccessInfo *rx_mailbox_regs; + RegisterAccessInfo *af_mask_regs_mailbox; + + uint32_t regs[XLNX_VERSAL_CANFD_R_MAX]; + + ptimer_state *canfd_timer; + + CanBusClientState bus_client; + CanBusState *canfdbus; + + struct { + uint8_t rx0_fifo; + uint8_t rx1_fifo; + uint8_t tx_fifo; + bool enable_rx_fifo1; + uint32_t ext_clk_freq; + } cfg; + +} XlnxVersalCANFDState; + +typedef struct tx_ready_reg_info { + uint32_t can_id; + uint32_t reg_num; +} tx_ready_reg_info; + +#endif diff --git a/hw/net/can/xlnx-versal-canfd.c b/hw/net/can/xlnx-versal-canfd.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/net/can/xlnx-versal-canfd.c @@ -XXX,XX +XXX,XX @@ +/* + * QEMU model of the Xilinx Versal CANFD device. + * + * This implementation is based on the following datasheet: + * https://docs.xilinx.com/v/u/2.0-English/pg223-canfd + * + * Copyright (c) 2023 Advanced Micro Devices, Inc. + * + * Written-by: Vikram Garhwal <vikram.garhwal@amd.com> + * + * Based on QEMU CANFD Device emulation implemented by Jin Yang, Deniz Eren and + * Pavel Pisa + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "hw/sysbus.h" +#include "hw/irq.h" +#include "hw/register.h" +#include "qapi/error.h" +#include "qemu/bitops.h" +#include "qemu/log.h" +#include "qemu/cutils.h" +#include "qemu/event_notifier.h" +#include "hw/qdev-properties.h" +#include "qom/object_interfaces.h" +#include "migration/vmstate.h" +#include "hw/net/xlnx-versal-canfd.h" +#include "trace.h" + +REG32(SOFTWARE_RESET_REGISTER, 0x0) + FIELD(SOFTWARE_RESET_REGISTER, CEN, 1, 1) + FIELD(SOFTWARE_RESET_REGISTER, SRST, 0, 1) +REG32(MODE_SELECT_REGISTER, 0x4) + FIELD(MODE_SELECT_REGISTER, ITO, 8, 8) + FIELD(MODE_SELECT_REGISTER, ABR, 7, 1) + FIELD(MODE_SELECT_REGISTER, SBR, 6, 1) + FIELD(MODE_SELECT_REGISTER, DPEE, 5, 1) + FIELD(MODE_SELECT_REGISTER, DAR, 4, 1) + FIELD(MODE_SELECT_REGISTER, BRSD, 3, 1) + FIELD(MODE_SELECT_REGISTER, SNOOP, 2, 1) + FIELD(MODE_SELECT_REGISTER, LBACK, 1, 1) + FIELD(MODE_SELECT_REGISTER, SLEEP, 0, 1) +REG32(ARBITRATION_PHASE_BAUD_RATE_PRESCALER_REGISTER, 0x8) + FIELD(ARBITRATION_PHASE_BAUD_RATE_PRESCALER_REGISTER, BRP, 0, 8) +REG32(ARBITRATION_PHASE_BIT_TIMING_REGISTER, 0xc) + FIELD(ARBITRATION_PHASE_BIT_TIMING_REGISTER, SJW, 16, 7) + FIELD(ARBITRATION_PHASE_BIT_TIMING_REGISTER, TS2, 8, 7) + FIELD(ARBITRATION_PHASE_BIT_TIMING_REGISTER, TS1, 0, 8) +REG32(ERROR_COUNTER_REGISTER, 0x10) + FIELD(ERROR_COUNTER_REGISTER, REC, 8, 8) + FIELD(ERROR_COUNTER_REGISTER, TEC, 0, 8) +REG32(ERROR_STATUS_REGISTER, 0x14) + FIELD(ERROR_STATUS_REGISTER, F_BERR, 11, 1) + FIELD(ERROR_STATUS_REGISTER, F_STER, 10, 1) + FIELD(ERROR_STATUS_REGISTER, F_FMER, 9, 1) + FIELD(ERROR_STATUS_REGISTER, F_CRCER, 8, 1) + FIELD(ERROR_STATUS_REGISTER, ACKER, 4, 1) + FIELD(ERROR_STATUS_REGISTER, BERR, 3, 1) + FIELD(ERROR_STATUS_REGISTER, STER, 2, 1) + FIELD(ERROR_STATUS_REGISTER, FMER, 1, 1) + FIELD(ERROR_STATUS_REGISTER, CRCER, 0, 1) +REG32(STATUS_REGISTER, 0x18) + FIELD(STATUS_REGISTER, TDCV, 16, 7) + FIELD(STATUS_REGISTER, SNOOP, 12, 1) + FIELD(STATUS_REGISTER, BSFR_CONFIG, 10, 1) + FIELD(STATUS_REGISTER, PEE_CONFIG, 9, 1) + FIELD(STATUS_REGISTER, ESTAT, 7, 2) + FIELD(STATUS_REGISTER, ERRWRN, 6, 1) + FIELD(STATUS_REGISTER, BBSY, 5, 1) + FIELD(STATUS_REGISTER, BIDLE, 4, 1) + FIELD(STATUS_REGISTER, NORMAL, 3, 1) + FIELD(STATUS_REGISTER, SLEEP, 2, 1) + FIELD(STATUS_REGISTER, LBACK, 1, 1) + FIELD(STATUS_REGISTER, CONFIG, 0, 1) +REG32(INTERRUPT_STATUS_REGISTER, 0x1c) + FIELD(INTERRUPT_STATUS_REGISTER, TXEWMFLL, 31, 1) + FIELD(INTERRUPT_STATUS_REGISTER, TXEOFLW, 30, 1) + FIELD(INTERRUPT_STATUS_REGISTER, RXBOFLW_BI, 24, 6) + FIELD(INTERRUPT_STATUS_REGISTER, RXLRM_BI, 18, 6) + FIELD(INTERRUPT_STATUS_REGISTER, RXMNF, 17, 1) + FIELD(INTERRUPT_STATUS_REGISTER, RXFWMFLL_1, 16, 1) + FIELD(INTERRUPT_STATUS_REGISTER, RXFOFLW_1, 15, 1) + FIELD(INTERRUPT_STATUS_REGISTER, TXCRS, 14, 1) + FIELD(INTERRUPT_STATUS_REGISTER, TXRRS, 13, 1) + FIELD(INTERRUPT_STATUS_REGISTER, RXFWMFLL, 12, 1) + FIELD(INTERRUPT_STATUS_REGISTER, WKUP, 11, 1) + FIELD(INTERRUPT_STATUS_REGISTER, SLP, 10, 1) + FIELD(INTERRUPT_STATUS_REGISTER, BSOFF, 9, 1) + /* + * In the original HW description below bit is named as ERROR but an ERROR + * field name collides with a macro in Windows build. To avoid Windows build + * failures, the bit is renamed to ERROR_BIT. + */ + FIELD(INTERRUPT_STATUS_REGISTER, ERROR_BIT, 8, 1) + FIELD(INTERRUPT_STATUS_REGISTER, RXFOFLW, 6, 1) + FIELD(INTERRUPT_STATUS_REGISTER, TSCNT_OFLW, 5, 1) + FIELD(INTERRUPT_STATUS_REGISTER, RXOK, 4, 1) + FIELD(INTERRUPT_STATUS_REGISTER, BSFRD, 3, 1) + FIELD(INTERRUPT_STATUS_REGISTER, PEE, 2, 1) + FIELD(INTERRUPT_STATUS_REGISTER, TXOK, 1, 1) + FIELD(INTERRUPT_STATUS_REGISTER, ARBLST, 0, 1) +REG32(INTERRUPT_ENABLE_REGISTER, 0x20) + FIELD(INTERRUPT_ENABLE_REGISTER, ETXEWMFLL, 31, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ETXEOFLW, 30, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ERXMNF, 17, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ERXFWMFLL_1, 16, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ERXFOFLW_1, 15, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ETXCRS, 14, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ETXRRS, 13, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ERXFWMFLL, 12, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, EWKUP, 11, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ESLP, 10, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, EBSOFF, 9, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, EERROR, 8, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ERFXOFLW, 6, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ETSCNT_OFLW, 5, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ERXOK, 4, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, EBSFRD, 3, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, EPEE, 2, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, ETXOK, 1, 1) + FIELD(INTERRUPT_ENABLE_REGISTER, EARBLOST, 0, 1) +REG32(INTERRUPT_CLEAR_REGISTER, 0x24) + FIELD(INTERRUPT_CLEAR_REGISTER, CTXEWMFLL, 31, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CTXEOFLW, 30, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CRXMNF, 17, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CRXFWMFLL_1, 16, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CRXFOFLW_1, 15, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CTXCRS, 14, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CTXRRS, 13, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CRXFWMFLL, 12, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CWKUP, 11, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CSLP, 10, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CBSOFF, 9, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CERROR, 8, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CRFXOFLW, 6, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CTSCNT_OFLW, 5, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CRXOK, 4, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CBSFRD, 3, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CPEE, 2, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CTXOK, 1, 1) + FIELD(INTERRUPT_CLEAR_REGISTER, CARBLOST, 0, 1) +REG32(TIMESTAMP_REGISTER, 0x28) + FIELD(TIMESTAMP_REGISTER, TIMESTAMP_CNT, 16, 16) + FIELD(TIMESTAMP_REGISTER, CTS, 0, 1) +REG32(DATA_PHASE_BAUD_RATE_PRESCALER_REGISTER, 0x88) + FIELD(DATA_PHASE_BAUD_RATE_PRESCALER_REGISTER, TDC, 16, 1) + FIELD(DATA_PHASE_BAUD_RATE_PRESCALER_REGISTER, TDCOFF, 8, 6) + FIELD(DATA_PHASE_BAUD_RATE_PRESCALER_REGISTER, DP_BRP, 0, 8) +REG32(DATA_PHASE_BIT_TIMING_REGISTER, 0x8c) + FIELD(DATA_PHASE_BIT_TIMING_REGISTER, DP_SJW, 16, 4) + FIELD(DATA_PHASE_BIT_TIMING_REGISTER, DP_TS2, 8, 4) + FIELD(DATA_PHASE_BIT_TIMING_REGISTER, DP_TS1, 0, 5) +REG32(TX_BUFFER_READY_REQUEST_REGISTER, 0x90) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR31, 31, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR30, 30, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR29, 29, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR28, 28, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR27, 27, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR26, 26, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR25, 25, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR24, 24, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR23, 23, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR22, 22, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR21, 21, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR20, 20, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR19, 19, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR18, 18, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR17, 17, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR16, 16, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR15, 15, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR14, 14, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR13, 13, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR12, 12, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR11, 11, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR10, 10, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR9, 9, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR8, 8, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR7, 7, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR6, 6, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR5, 5, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR4, 4, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR3, 3, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR2, 2, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR1, 1, 1) + FIELD(TX_BUFFER_READY_REQUEST_REGISTER, RR0, 0, 1) +REG32(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, 0x94) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS31, 31, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS30, 30, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS29, 29, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS28, 28, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS27, 27, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS26, 26, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS25, 25, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS24, 24, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS23, 23, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS22, 22, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS21, 21, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS20, 20, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS19, 19, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS18, 18, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS17, 17, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS16, 16, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS15, 15, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS14, 14, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS13, 13, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS12, 12, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS11, 11, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS10, 10, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS9, 9, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS8, 8, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS7, 7, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS6, 6, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS5, 5, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS4, 4, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS3, 3, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS2, 2, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS1, 1, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, ERRS0, 0, 1) +REG32(TX_BUFFER_CANCEL_REQUEST_REGISTER, 0x98) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR31, 31, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR30, 30, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR29, 29, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR28, 28, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR27, 27, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR26, 26, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR25, 25, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR24, 24, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR23, 23, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR22, 22, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR21, 21, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR20, 20, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR19, 19, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR18, 18, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR17, 17, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR16, 16, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR15, 15, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR14, 14, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR13, 13, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR12, 12, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR11, 11, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR10, 10, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR9, 9, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR8, 8, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR7, 7, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR6, 6, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR5, 5, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR4, 4, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR3, 3, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR2, 2, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR1, 1, 1) + FIELD(TX_BUFFER_CANCEL_REQUEST_REGISTER, CR0, 0, 1) +REG32(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, 0x9c) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS31, 31, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS30, 30, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS29, 29, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS28, 28, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS27, 27, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS26, 26, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS25, 25, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS24, 24, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS23, 23, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS22, 22, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS21, 21, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS20, 20, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS19, 19, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS18, 18, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS17, 17, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS16, 16, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS15, 15, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS14, 14, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS13, 13, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS12, 12, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS11, 11, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS10, 10, + 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS9, 9, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS8, 8, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS7, 7, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS6, 6, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS5, 5, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS4, 4, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS3, 3, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS2, 2, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS1, 1, 1) + FIELD(INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, ECRS0, 0, 1) +REG32(TX_EVENT_FIFO_STATUS_REGISTER, 0xa0) + FIELD(TX_EVENT_FIFO_STATUS_REGISTER, TXE_FL, 8, 6) + FIELD(TX_EVENT_FIFO_STATUS_REGISTER, TXE_IRI, 7, 1) + FIELD(TX_EVENT_FIFO_STATUS_REGISTER, TXE_RI, 0, 5) +REG32(TX_EVENT_FIFO_WATERMARK_REGISTER, 0xa4) + FIELD(TX_EVENT_FIFO_WATERMARK_REGISTER, TXE_FWM, 0, 5) +REG32(ACCEPTANCE_FILTER_CONTROL_REGISTER, 0xe0) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF31, 31, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF30, 30, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF29, 29, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF28, 28, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF27, 27, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF26, 26, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF25, 25, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF24, 24, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF23, 23, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF22, 22, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF21, 21, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF20, 20, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF19, 19, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF18, 18, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF17, 17, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF16, 16, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF15, 15, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF14, 14, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF13, 13, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF12, 12, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF11, 11, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF10, 10, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF9, 9, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF8, 8, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF7, 7, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF6, 6, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF5, 5, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF4, 4, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF3, 3, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF2, 2, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF1, 1, 1) + FIELD(ACCEPTANCE_FILTER_CONTROL_REGISTER, UAF0, 0, 1) +REG32(RX_FIFO_STATUS_REGISTER, 0xe8) + FIELD(RX_FIFO_STATUS_REGISTER, FL_1, 24, 7) + FIELD(RX_FIFO_STATUS_REGISTER, IRI_1, 23, 1) + FIELD(RX_FIFO_STATUS_REGISTER, RI_1, 16, 6) + FIELD(RX_FIFO_STATUS_REGISTER, FL, 8, 7) + FIELD(RX_FIFO_STATUS_REGISTER, IRI, 7, 1) + FIELD(RX_FIFO_STATUS_REGISTER, RI, 0, 6) +REG32(RX_FIFO_WATERMARK_REGISTER, 0xec) + FIELD(RX_FIFO_WATERMARK_REGISTER, RXFP, 16, 5) + FIELD(RX_FIFO_WATERMARK_REGISTER, RXFWM_1, 8, 6) + FIELD(RX_FIFO_WATERMARK_REGISTER, RXFWM, 0, 6) +REG32(TB_ID_REGISTER, 0x100) + FIELD(TB_ID_REGISTER, ID, 21, 11) + FIELD(TB_ID_REGISTER, SRR_RTR_RRS, 20, 1) + FIELD(TB_ID_REGISTER, IDE, 19, 1) + FIELD(TB_ID_REGISTER, ID_EXT, 1, 18) + FIELD(TB_ID_REGISTER, RTR_RRS, 0, 1) +REG32(TB0_DLC_REGISTER, 0x104) + FIELD(TB0_DLC_REGISTER, DLC, 28, 4) + FIELD(TB0_DLC_REGISTER, FDF, 27, 1) + FIELD(TB0_DLC_REGISTER, BRS, 26, 1) + FIELD(TB0_DLC_REGISTER, RSVD2, 25, 1) + FIELD(TB0_DLC_REGISTER, EFC, 24, 1) + FIELD(TB0_DLC_REGISTER, MM, 16, 8) + FIELD(TB0_DLC_REGISTER, RSVD1, 0, 16) +REG32(TB_DW0_REGISTER, 0x108) + FIELD(TB_DW0_REGISTER, DATA_BYTES0, 24, 8) + FIELD(TB_DW0_REGISTER, DATA_BYTES1, 16, 8) + FIELD(TB_DW0_REGISTER, DATA_BYTES2, 8, 8) + FIELD(TB_DW0_REGISTER, DATA_BYTES3, 0, 8) +REG32(TB_DW1_REGISTER, 0x10c) + FIELD(TB_DW1_REGISTER, DATA_BYTES4, 24, 8) + FIELD(TB_DW1_REGISTER, DATA_BYTES5, 16, 8) + FIELD(TB_DW1_REGISTER, DATA_BYTES6, 8, 8) + FIELD(TB_DW1_REGISTER, DATA_BYTES7, 0, 8) +REG32(TB_DW2_REGISTER, 0x110) + FIELD(TB_DW2_REGISTER, DATA_BYTES8, 24, 8) + FIELD(TB_DW2_REGISTER, DATA_BYTES9, 16, 8) + FIELD(TB_DW2_REGISTER, DATA_BYTES10, 8, 8) + FIELD(TB_DW2_REGISTER, DATA_BYTES11, 0, 8) +REG32(TB_DW3_REGISTER, 0x114) + FIELD(TB_DW3_REGISTER, DATA_BYTES12, 24, 8) + FIELD(TB_DW3_REGISTER, DATA_BYTES13, 16, 8) + FIELD(TB_DW3_REGISTER, DATA_BYTES14, 8, 8) + FIELD(TB_DW3_REGISTER, DATA_BYTES15, 0, 8) +REG32(TB_DW4_REGISTER, 0x118) + FIELD(TB_DW4_REGISTER, DATA_BYTES16, 24, 8) + FIELD(TB_DW4_REGISTER, DATA_BYTES17, 16, 8) + FIELD(TB_DW4_REGISTER, DATA_BYTES18, 8, 8) + FIELD(TB_DW4_REGISTER, DATA_BYTES19, 0, 8) +REG32(TB_DW5_REGISTER, 0x11c) + FIELD(TB_DW5_REGISTER, DATA_BYTES20, 24, 8) + FIELD(TB_DW5_REGISTER, DATA_BYTES21, 16, 8) + FIELD(TB_DW5_REGISTER, DATA_BYTES22, 8, 8) + FIELD(TB_DW5_REGISTER, DATA_BYTES23, 0, 8) +REG32(TB_DW6_REGISTER, 0x120) + FIELD(TB_DW6_REGISTER, DATA_BYTES24, 24, 8) + FIELD(TB_DW6_REGISTER, DATA_BYTES25, 16, 8) + FIELD(TB_DW6_REGISTER, DATA_BYTES26, 8, 8) + FIELD(TB_DW6_REGISTER, DATA_BYTES27, 0, 8) +REG32(TB_DW7_REGISTER, 0x124) + FIELD(TB_DW7_REGISTER, DATA_BYTES28, 24, 8) + FIELD(TB_DW7_REGISTER, DATA_BYTES29, 16, 8) + FIELD(TB_DW7_REGISTER, DATA_BYTES30, 8, 8) + FIELD(TB_DW7_REGISTER, DATA_BYTES31, 0, 8) +REG32(TB_DW8_REGISTER, 0x128) + FIELD(TB_DW8_REGISTER, DATA_BYTES32, 24, 8) + FIELD(TB_DW8_REGISTER, DATA_BYTES33, 16, 8) + FIELD(TB_DW8_REGISTER, DATA_BYTES34, 8, 8) + FIELD(TB_DW8_REGISTER, DATA_BYTES35, 0, 8) +REG32(TB_DW9_REGISTER, 0x12c) + FIELD(TB_DW9_REGISTER, DATA_BYTES36, 24, 8) + FIELD(TB_DW9_REGISTER, DATA_BYTES37, 16, 8) + FIELD(TB_DW9_REGISTER, DATA_BYTES38, 8, 8) + FIELD(TB_DW9_REGISTER, DATA_BYTES39, 0, 8) +REG32(TB_DW10_REGISTER, 0x130) + FIELD(TB_DW10_REGISTER, DATA_BYTES40, 24, 8) + FIELD(TB_DW10_REGISTER, DATA_BYTES41, 16, 8) + FIELD(TB_DW10_REGISTER, DATA_BYTES42, 8, 8) + FIELD(TB_DW10_REGISTER, DATA_BYTES43, 0, 8) +REG32(TB_DW11_REGISTER, 0x134) + FIELD(TB_DW11_REGISTER, DATA_BYTES44, 24, 8) + FIELD(TB_DW11_REGISTER, DATA_BYTES45, 16, 8) + FIELD(TB_DW11_REGISTER, DATA_BYTES46, 8, 8) + FIELD(TB_DW11_REGISTER, DATA_BYTES47, 0, 8) +REG32(TB_DW12_REGISTER, 0x138) + FIELD(TB_DW12_REGISTER, DATA_BYTES48, 24, 8) + FIELD(TB_DW12_REGISTER, DATA_BYTES49, 16, 8) + FIELD(TB_DW12_REGISTER, DATA_BYTES50, 8, 8) + FIELD(TB_DW12_REGISTER, DATA_BYTES51, 0, 8) +REG32(TB_DW13_REGISTER, 0x13c) + FIELD(TB_DW13_REGISTER, DATA_BYTES52, 24, 8) + FIELD(TB_DW13_REGISTER, DATA_BYTES53, 16, 8) + FIELD(TB_DW13_REGISTER, DATA_BYTES54, 8, 8) + FIELD(TB_DW13_REGISTER, DATA_BYTES55, 0, 8) +REG32(TB_DW14_REGISTER, 0x140) + FIELD(TB_DW14_REGISTER, DATA_BYTES56, 24, 8) + FIELD(TB_DW14_REGISTER, DATA_BYTES57, 16, 8) + FIELD(TB_DW14_REGISTER, DATA_BYTES58, 8, 8) + FIELD(TB_DW14_REGISTER, DATA_BYTES59, 0, 8) +REG32(TB_DW15_REGISTER, 0x144) + FIELD(TB_DW15_REGISTER, DATA_BYTES60, 24, 8) + FIELD(TB_DW15_REGISTER, DATA_BYTES61, 16, 8) + FIELD(TB_DW15_REGISTER, DATA_BYTES62, 8, 8) + FIELD(TB_DW15_REGISTER, DATA_BYTES63, 0, 8) +REG32(AFMR_REGISTER, 0xa00) + FIELD(AFMR_REGISTER, AMID, 21, 11) + FIELD(AFMR_REGISTER, AMSRR, 20, 1) + FIELD(AFMR_REGISTER, AMIDE, 19, 1) + FIELD(AFMR_REGISTER, AMID_EXT, 1, 18) + FIELD(AFMR_REGISTER, AMRTR, 0, 1) +REG32(AFIR_REGISTER, 0xa04) + FIELD(AFIR_REGISTER, AIID, 21, 11) + FIELD(AFIR_REGISTER, AISRR, 20, 1) + FIELD(AFIR_REGISTER, AIIDE, 19, 1) + FIELD(AFIR_REGISTER, AIID_EXT, 1, 18) + FIELD(AFIR_REGISTER, AIRTR, 0, 1) +REG32(TXE_FIFO_TB_ID_REGISTER, 0x2000) + FIELD(TXE_FIFO_TB_ID_REGISTER, ID, 21, 11) + FIELD(TXE_FIFO_TB_ID_REGISTER, SRR_RTR_RRS, 20, 1) + FIELD(TXE_FIFO_TB_ID_REGISTER, IDE, 19, 1) + FIELD(TXE_FIFO_TB_ID_REGISTER, ID_EXT, 1, 18) + FIELD(TXE_FIFO_TB_ID_REGISTER, RTR_RRS, 0, 1) +REG32(TXE_FIFO_TB_DLC_REGISTER, 0x2004) + FIELD(TXE_FIFO_TB_DLC_REGISTER, DLC, 28, 4) + FIELD(TXE_FIFO_TB_DLC_REGISTER, FDF, 27, 1) + FIELD(TXE_FIFO_TB_DLC_REGISTER, BRS, 26, 1) + FIELD(TXE_FIFO_TB_DLC_REGISTER, ET, 24, 2) + FIELD(TXE_FIFO_TB_DLC_REGISTER, MM, 16, 8) + FIELD(TXE_FIFO_TB_DLC_REGISTER, TIMESTAMP, 0, 16) +REG32(RB_ID_REGISTER, 0x2100) + FIELD(RB_ID_REGISTER, ID, 21, 11) + FIELD(RB_ID_REGISTER, SRR_RTR_RRS, 20, 1) + FIELD(RB_ID_REGISTER, IDE, 19, 1) + FIELD(RB_ID_REGISTER, ID_EXT, 1, 18) + FIELD(RB_ID_REGISTER, RTR_RRS, 0, 1) +REG32(RB_DLC_REGISTER, 0x2104) + FIELD(RB_DLC_REGISTER, DLC, 28, 4) + FIELD(RB_DLC_REGISTER, FDF, 27, 1) + FIELD(RB_DLC_REGISTER, BRS, 26, 1) + FIELD(RB_DLC_REGISTER, ESI, 25, 1) + FIELD(RB_DLC_REGISTER, MATCHED_FILTER_INDEX, 16, 5) + FIELD(RB_DLC_REGISTER, TIMESTAMP, 0, 16) +REG32(RB_DW0_REGISTER, 0x2108) + FIELD(RB_DW0_REGISTER, DATA_BYTES0, 24, 8) + FIELD(RB_DW0_REGISTER, DATA_BYTES1, 16, 8) + FIELD(RB_DW0_REGISTER, DATA_BYTES2, 8, 8) + FIELD(RB_DW0_REGISTER, DATA_BYTES3, 0, 8) +REG32(RB_DW1_REGISTER, 0x210c) + FIELD(RB_DW1_REGISTER, DATA_BYTES4, 24, 8) + FIELD(RB_DW1_REGISTER, DATA_BYTES5, 16, 8) + FIELD(RB_DW1_REGISTER, DATA_BYTES6, 8, 8) + FIELD(RB_DW1_REGISTER, DATA_BYTES7, 0, 8) +REG32(RB_DW2_REGISTER, 0x2110) + FIELD(RB_DW2_REGISTER, DATA_BYTES8, 24, 8) + FIELD(RB_DW2_REGISTER, DATA_BYTES9, 16, 8) + FIELD(RB_DW2_REGISTER, DATA_BYTES10, 8, 8) + FIELD(RB_DW2_REGISTER, DATA_BYTES11, 0, 8) +REG32(RB_DW3_REGISTER, 0x2114) + FIELD(RB_DW3_REGISTER, DATA_BYTES12, 24, 8) + FIELD(RB_DW3_REGISTER, DATA_BYTES13, 16, 8) + FIELD(RB_DW3_REGISTER, DATA_BYTES14, 8, 8) + FIELD(RB_DW3_REGISTER, DATA_BYTES15, 0, 8) +REG32(RB_DW4_REGISTER, 0x2118) + FIELD(RB_DW4_REGISTER, DATA_BYTES16, 24, 8) + FIELD(RB_DW4_REGISTER, DATA_BYTES17, 16, 8) + FIELD(RB_DW4_REGISTER, DATA_BYTES18, 8, 8) + FIELD(RB_DW4_REGISTER, DATA_BYTES19, 0, 8) +REG32(RB_DW5_REGISTER, 0x211c) + FIELD(RB_DW5_REGISTER, DATA_BYTES20, 24, 8) + FIELD(RB_DW5_REGISTER, DATA_BYTES21, 16, 8) + FIELD(RB_DW5_REGISTER, DATA_BYTES22, 8, 8) + FIELD(RB_DW5_REGISTER, DATA_BYTES23, 0, 8) +REG32(RB_DW6_REGISTER, 0x2120) + FIELD(RB_DW6_REGISTER, DATA_BYTES24, 24, 8) + FIELD(RB_DW6_REGISTER, DATA_BYTES25, 16, 8) + FIELD(RB_DW6_REGISTER, DATA_BYTES26, 8, 8) + FIELD(RB_DW6_REGISTER, DATA_BYTES27, 0, 8) +REG32(RB_DW7_REGISTER, 0x2124) + FIELD(RB_DW7_REGISTER, DATA_BYTES28, 24, 8) + FIELD(RB_DW7_REGISTER, DATA_BYTES29, 16, 8) + FIELD(RB_DW7_REGISTER, DATA_BYTES30, 8, 8) + FIELD(RB_DW7_REGISTER, DATA_BYTES31, 0, 8) +REG32(RB_DW8_REGISTER, 0x2128) + FIELD(RB_DW8_REGISTER, DATA_BYTES32, 24, 8) + FIELD(RB_DW8_REGISTER, DATA_BYTES33, 16, 8) + FIELD(RB_DW8_REGISTER, DATA_BYTES34, 8, 8) + FIELD(RB_DW8_REGISTER, DATA_BYTES35, 0, 8) +REG32(RB_DW9_REGISTER, 0x212c) + FIELD(RB_DW9_REGISTER, DATA_BYTES36, 24, 8) + FIELD(RB_DW9_REGISTER, DATA_BYTES37, 16, 8) + FIELD(RB_DW9_REGISTER, DATA_BYTES38, 8, 8) + FIELD(RB_DW9_REGISTER, DATA_BYTES39, 0, 8) +REG32(RB_DW10_REGISTER, 0x2130) + FIELD(RB_DW10_REGISTER, DATA_BYTES40, 24, 8) + FIELD(RB_DW10_REGISTER, DATA_BYTES41, 16, 8) + FIELD(RB_DW10_REGISTER, DATA_BYTES42, 8, 8) + FIELD(RB_DW10_REGISTER, DATA_BYTES43, 0, 8) +REG32(RB_DW11_REGISTER, 0x2134) + FIELD(RB_DW11_REGISTER, DATA_BYTES44, 24, 8) + FIELD(RB_DW11_REGISTER, DATA_BYTES45, 16, 8) + FIELD(RB_DW11_REGISTER, DATA_BYTES46, 8, 8) + FIELD(RB_DW11_REGISTER, DATA_BYTES47, 0, 8) +REG32(RB_DW12_REGISTER, 0x2138) + FIELD(RB_DW12_REGISTER, DATA_BYTES48, 24, 8) + FIELD(RB_DW12_REGISTER, DATA_BYTES49, 16, 8) + FIELD(RB_DW12_REGISTER, DATA_BYTES50, 8, 8) + FIELD(RB_DW12_REGISTER, DATA_BYTES51, 0, 8) +REG32(RB_DW13_REGISTER, 0x213c) + FIELD(RB_DW13_REGISTER, DATA_BYTES52, 24, 8) + FIELD(RB_DW13_REGISTER, DATA_BYTES53, 16, 8) + FIELD(RB_DW13_REGISTER, DATA_BYTES54, 8, 8) + FIELD(RB_DW13_REGISTER, DATA_BYTES55, 0, 8) +REG32(RB_DW14_REGISTER, 0x2140) + FIELD(RB_DW14_REGISTER, DATA_BYTES56, 24, 8) + FIELD(RB_DW14_REGISTER, DATA_BYTES57, 16, 8) + FIELD(RB_DW14_REGISTER, DATA_BYTES58, 8, 8) + FIELD(RB_DW14_REGISTER, DATA_BYTES59, 0, 8) +REG32(RB_DW15_REGISTER, 0x2144) + FIELD(RB_DW15_REGISTER, DATA_BYTES60, 24, 8) + FIELD(RB_DW15_REGISTER, DATA_BYTES61, 16, 8) + FIELD(RB_DW15_REGISTER, DATA_BYTES62, 8, 8) + FIELD(RB_DW15_REGISTER, DATA_BYTES63, 0, 8) +REG32(RB_ID_REGISTER_1, 0x4100) + FIELD(RB_ID_REGISTER_1, ID, 21, 11) + FIELD(RB_ID_REGISTER_1, SRR_RTR_RRS, 20, 1) + FIELD(RB_ID_REGISTER_1, IDE, 19, 1) + FIELD(RB_ID_REGISTER_1, ID_EXT, 1, 18) + FIELD(RB_ID_REGISTER_1, RTR_RRS, 0, 1) +REG32(RB_DLC_REGISTER_1, 0x4104) + FIELD(RB_DLC_REGISTER_1, DLC, 28, 4) + FIELD(RB_DLC_REGISTER_1, FDF, 27, 1) + FIELD(RB_DLC_REGISTER_1, BRS, 26, 1) + FIELD(RB_DLC_REGISTER_1, ESI, 25, 1) + FIELD(RB_DLC_REGISTER_1, MATCHED_FILTER_INDEX, 16, 5) + FIELD(RB_DLC_REGISTER_1, TIMESTAMP, 0, 16) +REG32(RB0_DW0_REGISTER_1, 0x4108) + FIELD(RB0_DW0_REGISTER_1, DATA_BYTES0, 24, 8) + FIELD(RB0_DW0_REGISTER_1, DATA_BYTES1, 16, 8) + FIELD(RB0_DW0_REGISTER_1, DATA_BYTES2, 8, 8) + FIELD(RB0_DW0_REGISTER_1, DATA_BYTES3, 0, 8) +REG32(RB_DW1_REGISTER_1, 0x410c) + FIELD(RB_DW1_REGISTER_1, DATA_BYTES4, 24, 8) + FIELD(RB_DW1_REGISTER_1, DATA_BYTES5, 16, 8) + FIELD(RB_DW1_REGISTER_1, DATA_BYTES6, 8, 8) + FIELD(RB_DW1_REGISTER_1, DATA_BYTES7, 0, 8) +REG32(RB_DW2_REGISTER_1, 0x4110) + FIELD(RB_DW2_REGISTER_1, DATA_BYTES8, 24, 8) + FIELD(RB_DW2_REGISTER_1, DATA_BYTES9, 16, 8) + FIELD(RB_DW2_REGISTER_1, DATA_BYTES10, 8, 8) + FIELD(RB_DW2_REGISTER_1, DATA_BYTES11, 0, 8) +REG32(RB_DW3_REGISTER_1, 0x4114) + FIELD(RB_DW3_REGISTER_1, DATA_BYTES12, 24, 8) + FIELD(RB_DW3_REGISTER_1, DATA_BYTES13, 16, 8) + FIELD(RB_DW3_REGISTER_1, DATA_BYTES14, 8, 8) + FIELD(RB_DW3_REGISTER_1, DATA_BYTES15, 0, 8) +REG32(RB_DW4_REGISTER_1, 0x4118) + FIELD(RB_DW4_REGISTER_1, DATA_BYTES16, 24, 8) + FIELD(RB_DW4_REGISTER_1, DATA_BYTES17, 16, 8) + FIELD(RB_DW4_REGISTER_1, DATA_BYTES18, 8, 8) + FIELD(RB_DW4_REGISTER_1, DATA_BYTES19, 0, 8) +REG32(RB_DW5_REGISTER_1, 0x411c) + FIELD(RB_DW5_REGISTER_1, DATA_BYTES20, 24, 8) + FIELD(RB_DW5_REGISTER_1, DATA_BYTES21, 16, 8) + FIELD(RB_DW5_REGISTER_1, DATA_BYTES22, 8, 8) + FIELD(RB_DW5_REGISTER_1, DATA_BYTES23, 0, 8) +REG32(RB_DW6_REGISTER_1, 0x4120) + FIELD(RB_DW6_REGISTER_1, DATA_BYTES24, 24, 8) + FIELD(RB_DW6_REGISTER_1, DATA_BYTES25, 16, 8) + FIELD(RB_DW6_REGISTER_1, DATA_BYTES26, 8, 8) + FIELD(RB_DW6_REGISTER_1, DATA_BYTES27, 0, 8) +REG32(RB_DW7_REGISTER_1, 0x4124) + FIELD(RB_DW7_REGISTER_1, DATA_BYTES28, 24, 8) + FIELD(RB_DW7_REGISTER_1, DATA_BYTES29, 16, 8) + FIELD(RB_DW7_REGISTER_1, DATA_BYTES30, 8, 8) + FIELD(RB_DW7_REGISTER_1, DATA_BYTES31, 0, 8) +REG32(RB_DW8_REGISTER_1, 0x4128) + FIELD(RB_DW8_REGISTER_1, DATA_BYTES32, 24, 8) + FIELD(RB_DW8_REGISTER_1, DATA_BYTES33, 16, 8) + FIELD(RB_DW8_REGISTER_1, DATA_BYTES34, 8, 8) + FIELD(RB_DW8_REGISTER_1, DATA_BYTES35, 0, 8) +REG32(RB_DW9_REGISTER_1, 0x412c) + FIELD(RB_DW9_REGISTER_1, DATA_BYTES36, 24, 8) + FIELD(RB_DW9_REGISTER_1, DATA_BYTES37, 16, 8) + FIELD(RB_DW9_REGISTER_1, DATA_BYTES38, 8, 8) + FIELD(RB_DW9_REGISTER_1, DATA_BYTES39, 0, 8) +REG32(RB_DW10_REGISTER_1, 0x4130) + FIELD(RB_DW10_REGISTER_1, DATA_BYTES40, 24, 8) + FIELD(RB_DW10_REGISTER_1, DATA_BYTES41, 16, 8) + FIELD(RB_DW10_REGISTER_1, DATA_BYTES42, 8, 8) + FIELD(RB_DW10_REGISTER_1, DATA_BYTES43, 0, 8) +REG32(RB_DW11_REGISTER_1, 0x4134) + FIELD(RB_DW11_REGISTER_1, DATA_BYTES44, 24, 8) + FIELD(RB_DW11_REGISTER_1, DATA_BYTES45, 16, 8) + FIELD(RB_DW11_REGISTER_1, DATA_BYTES46, 8, 8) + FIELD(RB_DW11_REGISTER_1, DATA_BYTES47, 0, 8) +REG32(RB_DW12_REGISTER_1, 0x4138) + FIELD(RB_DW12_REGISTER_1, DATA_BYTES48, 24, 8) + FIELD(RB_DW12_REGISTER_1, DATA_BYTES49, 16, 8) + FIELD(RB_DW12_REGISTER_1, DATA_BYTES50, 8, 8) + FIELD(RB_DW12_REGISTER_1, DATA_BYTES51, 0, 8) +REG32(RB_DW13_REGISTER_1, 0x413c) + FIELD(RB_DW13_REGISTER_1, DATA_BYTES52, 24, 8) + FIELD(RB_DW13_REGISTER_1, DATA_BYTES53, 16, 8) + FIELD(RB_DW13_REGISTER_1, DATA_BYTES54, 8, 8) + FIELD(RB_DW13_REGISTER_1, DATA_BYTES55, 0, 8) +REG32(RB_DW14_REGISTER_1, 0x4140) + FIELD(RB_DW14_REGISTER_1, DATA_BYTES56, 24, 8) + FIELD(RB_DW14_REGISTER_1, DATA_BYTES57, 16, 8) + FIELD(RB_DW14_REGISTER_1, DATA_BYTES58, 8, 8) + FIELD(RB_DW14_REGISTER_1, DATA_BYTES59, 0, 8) +REG32(RB_DW15_REGISTER_1, 0x4144) + FIELD(RB_DW15_REGISTER_1, DATA_BYTES60, 24, 8) + FIELD(RB_DW15_REGISTER_1, DATA_BYTES61, 16, 8) + FIELD(RB_DW15_REGISTER_1, DATA_BYTES62, 8, 8) + FIELD(RB_DW15_REGISTER_1, DATA_BYTES63, 0, 8) + +static uint8_t canfd_dlc_array[8] = {8, 12, 16, 20, 24, 32, 48, 64}; + +static void canfd_update_irq(XlnxVersalCANFDState *s) +{ + unsigned int irq = s->regs[R_INTERRUPT_STATUS_REGISTER] & + s->regs[R_INTERRUPT_ENABLE_REGISTER]; + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + /* RX watermark interrupts. */ + if (ARRAY_FIELD_EX32(s->regs, RX_FIFO_STATUS_REGISTER, FL) > + ARRAY_FIELD_EX32(s->regs, RX_FIFO_WATERMARK_REGISTER, RXFWM)) { + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXFWMFLL, 1); + } + + if (ARRAY_FIELD_EX32(s->regs, RX_FIFO_STATUS_REGISTER, FL_1) > + ARRAY_FIELD_EX32(s->regs, RX_FIFO_WATERMARK_REGISTER, RXFWM_1)) { + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXFWMFLL_1, 1); + } + + /* TX watermark interrupt. */ + if (ARRAY_FIELD_EX32(s->regs, TX_EVENT_FIFO_STATUS_REGISTER, TXE_FL) > + ARRAY_FIELD_EX32(s->regs, TX_EVENT_FIFO_WATERMARK_REGISTER, TXE_FWM)) { + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, TXEWMFLL, 1); + } + + trace_xlnx_canfd_update_irq(path, s->regs[R_INTERRUPT_STATUS_REGISTER], + s->regs[R_INTERRUPT_ENABLE_REGISTER], irq); + + qemu_set_irq(s->irq_canfd_int, irq); +} + +static void canfd_ier_post_write(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + + canfd_update_irq(s); +} + +static uint64_t canfd_icr_pre_write(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + uint32_t val = val64; + + s->regs[R_INTERRUPT_STATUS_REGISTER] &= ~val; + + /* + * RXBOFLW_BI field is automatically cleared to default if RXBOFLW bit is + * cleared in ISR. + */ + if (ARRAY_FIELD_EX32(s->regs, INTERRUPT_STATUS_REGISTER, RXFWMFLL_1)) { + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXBOFLW_BI, 0); + } + + canfd_update_irq(s); + + return 0; +} + +static void canfd_config_reset(XlnxVersalCANFDState *s) +{ + + unsigned int i; + + /* Reset all the configuration registers. */ + for (i = 0; i < R_RX_FIFO_WATERMARK_REGISTER; ++i) { + register_reset(&s->reg_info[i]); + } + + canfd_update_irq(s); +} + +static void canfd_config_mode(XlnxVersalCANFDState *s) +{ + register_reset(&s->reg_info[R_ERROR_COUNTER_REGISTER]); + register_reset(&s->reg_info[R_ERROR_STATUS_REGISTER]); + register_reset(&s->reg_info[R_STATUS_REGISTER]); + + /* Put XlnxVersalCANFDState in configuration mode. */ + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, CONFIG, 1); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, WKUP, 0); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, SLP, 0); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, BSOFF, 0); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, ERROR_BIT, 0); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXFOFLW, 0); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXFOFLW_1, 0); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXOK, 0); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, TXOK, 0); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, ARBLST, 0); + + /* Clear the time stamp. */ + ptimer_transaction_begin(s->canfd_timer); + ptimer_set_count(s->canfd_timer, 0); + ptimer_transaction_commit(s->canfd_timer); + + canfd_update_irq(s); +} + +static void update_status_register_mode_bits(XlnxVersalCANFDState *s) +{ + bool sleep_status = ARRAY_FIELD_EX32(s->regs, STATUS_REGISTER, SLEEP); + bool sleep_mode = ARRAY_FIELD_EX32(s->regs, MODE_SELECT_REGISTER, SLEEP); + /* Wake up interrupt bit. */ + bool wakeup_irq_val = !sleep_mode && sleep_status; + /* Sleep interrupt bit. */ + bool sleep_irq_val = sleep_mode && !sleep_status; + + /* Clear previous core mode status bits. */ + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, LBACK, 0); + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, SLEEP, 0); + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, SNOOP, 0); + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, NORMAL, 0); + + /* set current mode bit and generate irqs accordingly. */ + if (ARRAY_FIELD_EX32(s->regs, MODE_SELECT_REGISTER, LBACK)) { + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, LBACK, 1); + } else if (ARRAY_FIELD_EX32(s->regs, MODE_SELECT_REGISTER, SLEEP)) { + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, SLEEP, 1); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, SLP, + sleep_irq_val); + } else if (ARRAY_FIELD_EX32(s->regs, MODE_SELECT_REGISTER, SNOOP)) { + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, SNOOP, 1); + } else { + /* If all bits are zero, XlnxVersalCANFDState is set in normal mode. */ + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, NORMAL, 1); + /* Set wakeup interrupt bit. */ + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, WKUP, + wakeup_irq_val); + } + + /* Put the CANFD in error active state. */ + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, ESTAT, 1); + + canfd_update_irq(s); +} + +static uint64_t canfd_msr_pre_write(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + uint32_t val = val64; + uint8_t multi_mode = 0; + + /* + * Multiple mode set check. This is done to make sure user doesn't set + * multiple modes. + */ + multi_mode = FIELD_EX32(val, MODE_SELECT_REGISTER, LBACK) + + FIELD_EX32(val, MODE_SELECT_REGISTER, SLEEP) + + FIELD_EX32(val, MODE_SELECT_REGISTER, SNOOP); + + if (multi_mode > 1) { + qemu_log_mask(LOG_GUEST_ERROR, "Attempting to configure several modes" + " simultaneously. One mode will be selected according to" + " their priority: LBACK > SLEEP > SNOOP.\n"); + } + + if (ARRAY_FIELD_EX32(s->regs, SOFTWARE_RESET_REGISTER, CEN) == 0) { + /* In configuration mode, any mode can be selected. */ + s->regs[R_MODE_SELECT_REGISTER] = val; + } else { + bool sleep_mode_bit = FIELD_EX32(val, MODE_SELECT_REGISTER, SLEEP); + + ARRAY_FIELD_DP32(s->regs, MODE_SELECT_REGISTER, SLEEP, sleep_mode_bit); + + if (FIELD_EX32(val, MODE_SELECT_REGISTER, LBACK)) { + qemu_log_mask(LOG_GUEST_ERROR, "Attempting to set LBACK mode" + " without setting CEN bit as 0\n"); + } else if (FIELD_EX32(val, MODE_SELECT_REGISTER, SNOOP)) { + qemu_log_mask(LOG_GUEST_ERROR, "Attempting to set SNOOP mode" + " without setting CEN bit as 0\n"); + } + + update_status_register_mode_bits(s); + } + + return s->regs[R_MODE_SELECT_REGISTER]; +} + +static void canfd_exit_sleep_mode(XlnxVersalCANFDState *s) +{ + ARRAY_FIELD_DP32(s->regs, MODE_SELECT_REGISTER, SLEEP, 0); + update_status_register_mode_bits(s); +} + +static void regs2frame(XlnxVersalCANFDState *s, qemu_can_frame *frame, + uint32_t reg_num) +{ + uint32_t i = 0; + uint32_t j = 0; + uint32_t val = 0; + uint32_t dlc_reg_val = 0; + uint32_t dlc_value = 0; + + /* Check that reg_num should be within TX register space. */ + assert(reg_num <= R_TB_ID_REGISTER + (NUM_REGS_PER_MSG_SPACE * + s->cfg.tx_fifo)); + + dlc_reg_val = s->regs[reg_num + 1]; + dlc_value = FIELD_EX32(dlc_reg_val, TB0_DLC_REGISTER, DLC); + + frame->can_id = s->regs[reg_num]; + + if (FIELD_EX32(dlc_reg_val, TB0_DLC_REGISTER, FDF)) { + /* + * CANFD frame. + * Converting dlc(0 to 15) 4 Byte data to plain length(i.e. 0 to 64) + * 1 Byte data. This is done to make it work with SocketCAN. + * On actual CANFD frame, this value can't be more than 0xF. + * Conversion table for DLC to plain length: + * + * DLC Plain Length + * 0 - 8 0 - 8 + * 9 9 - 12 + * 10 13 - 16 + * 11 17 - 20 + * 12 21 - 24 + * 13 25 - 32 + * 14 33 - 48 + * 15 49 - 64 + */ + + frame->flags = QEMU_CAN_FRMF_TYPE_FD; + + if (dlc_value < 8) { + frame->can_dlc = dlc_value; + } else { + assert((dlc_value - 8) < ARRAY_SIZE(canfd_dlc_array)); + frame->can_dlc = canfd_dlc_array[dlc_value - 8]; + } + } else { + /* + * FD Format bit not set that means it is a CAN Frame. + * Conversion table for classic CAN: + * + * DLC Plain Length + * 0 - 7 0 - 7 + * 8 - 15 8 + */ + + if (dlc_value > 8) { + frame->can_dlc = 8; + qemu_log_mask(LOG_GUEST_ERROR, "Maximum DLC value for Classic CAN" + " frame is 8. Only 8 byte data will be sent.\n"); + } else { + frame->can_dlc = dlc_value; + } + } + + for (j = 0; j < frame->can_dlc; j++) { + val = 8 * i; + + frame->data[j] = extract32(s->regs[reg_num + 2 + (j / 4)], val, 8); + i++; + + if (i % 4 == 0) { + i = 0; + } + } +} + +static void process_cancellation_requests(XlnxVersalCANFDState *s) +{ + uint32_t clear_mask = s->regs[R_TX_BUFFER_READY_REQUEST_REGISTER] & + s->regs[R_TX_BUFFER_CANCEL_REQUEST_REGISTER]; + + s->regs[R_TX_BUFFER_READY_REQUEST_REGISTER] &= ~clear_mask; + s->regs[R_TX_BUFFER_CANCEL_REQUEST_REGISTER] &= ~clear_mask; + + canfd_update_irq(s); +} + +static void store_rx_sequential(XlnxVersalCANFDState *s, + const qemu_can_frame *frame, + uint32_t fill_level, uint32_t read_index, + uint32_t store_location, uint8_t rx_fifo, + bool rx_fifo_id, uint8_t filter_index) +{ + int i; + bool is_canfd_frame; + uint8_t dlc = frame->can_dlc; + uint8_t rx_reg_num = 0; + uint32_t dlc_reg_val = 0; + uint32_t data_reg_val = 0; + + /* Getting RX0/1 fill level */ + if ((fill_level) > rx_fifo - 1) { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + qemu_log_mask(LOG_GUEST_ERROR, "%s: RX%d Buffer is full. Discarding the" + " message\n", path, rx_fifo_id); + + /* Set the corresponding RF buffer overflow interrupt. */ + if (rx_fifo_id == 0) { + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXFOFLW, 1); + } else { + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXFOFLW_1, 1); + } + } else { + uint16_t rx_timestamp = CANFD_TIMER_MAX - + ptimer_get_count(s->canfd_timer); + + if (rx_timestamp == 0xFFFF) { + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, TSCNT_OFLW, 1); + } else { + ARRAY_FIELD_DP32(s->regs, TIMESTAMP_REGISTER, TIMESTAMP_CNT, + rx_timestamp); + } + + if (rx_fifo_id == 0) { + ARRAY_FIELD_DP32(s->regs, RX_FIFO_STATUS_REGISTER, FL, + fill_level + 1); + assert(store_location <= + R_RB_ID_REGISTER + (s->cfg.rx0_fifo * + NUM_REGS_PER_MSG_SPACE)); + } else { + ARRAY_FIELD_DP32(s->regs, RX_FIFO_STATUS_REGISTER, FL_1, + fill_level + 1); + assert(store_location <= + R_RB_ID_REGISTER_1 + (s->cfg.rx1_fifo * + NUM_REGS_PER_MSG_SPACE)); + } + + s->regs[store_location] = frame->can_id; + + dlc = frame->can_dlc; + + if (frame->flags == QEMU_CAN_FRMF_TYPE_FD) { + is_canfd_frame = true; + + /* Store dlc value in Xilinx specific format. */ + for (i = 0; i < ARRAY_SIZE(canfd_dlc_array); i++) { + if (canfd_dlc_array[i] == frame->can_dlc) { + dlc_reg_val = FIELD_DP32(0, RB_DLC_REGISTER, DLC, 8 + i); + } + } + } else { + is_canfd_frame = false; + + if (frame->can_dlc > 8) { + dlc = 8; + } + + dlc_reg_val = FIELD_DP32(0, RB_DLC_REGISTER, DLC, dlc); + } + + dlc_reg_val |= FIELD_DP32(0, RB_DLC_REGISTER, FDF, is_canfd_frame); + dlc_reg_val |= FIELD_DP32(0, RB_DLC_REGISTER, TIMESTAMP, rx_timestamp); + dlc_reg_val |= FIELD_DP32(0, RB_DLC_REGISTER, MATCHED_FILTER_INDEX, + filter_index); + s->regs[store_location + 1] = dlc_reg_val; + + for (i = 0; i < dlc; i++) { + /* Register size is 4 byte but frame->data each is 1 byte. */ + switch (i % 4) { + case 0: + rx_reg_num = i / 4; + + data_reg_val = FIELD_DP32(0, RB_DW0_REGISTER, DATA_BYTES3, + frame->data[i]); + break; + case 1: + data_reg_val |= FIELD_DP32(0, RB_DW0_REGISTER, DATA_BYTES2, + frame->data[i]); + break; + case 2: + data_reg_val |= FIELD_DP32(0, RB_DW0_REGISTER, DATA_BYTES1, + frame->data[i]); + break; + case 3: + data_reg_val |= FIELD_DP32(0, RB_DW0_REGISTER, DATA_BYTES0, + frame->data[i]); + /* + * Last Bytes data which means we have all 4 bytes ready to + * store in one rx regs. + */ + s->regs[store_location + rx_reg_num + 2] = data_reg_val; + break; + } + } + + if (i % 4) { + /* + * In case DLC is not multiplier of 4, data is not saved to RX FIFO + * in above switch case. Store the remaining bytes here. + */ + s->regs[store_location + rx_reg_num + 2] = data_reg_val; + } + + /* set the interrupt as RXOK. */ + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXOK, 1); + } +} + +static void update_rx_sequential(XlnxVersalCANFDState *s, + const qemu_can_frame *frame) +{ + bool filter_pass = false; + uint8_t filter_index = 0; + int i; + int filter_partition = ARRAY_FIELD_EX32(s->regs, + RX_FIFO_WATERMARK_REGISTER, RXFP); + uint32_t store_location; + uint32_t fill_level; + uint32_t read_index; + uint8_t store_index = 0; + g_autofree char *path = NULL; + /* + * If all UAF bits are set to 0, then received messages are not stored + * in the RX buffers. + */ + if (s->regs[R_ACCEPTANCE_FILTER_CONTROL_REGISTER]) { + uint32_t acceptance_filter_status = + s->regs[R_ACCEPTANCE_FILTER_CONTROL_REGISTER]; + + for (i = 0; i < 32; i++) { + if (acceptance_filter_status & 0x1) { + uint32_t msg_id_masked = s->regs[R_AFMR_REGISTER + 2 * i] & + frame->can_id; + uint32_t afir_id_masked = s->regs[R_AFIR_REGISTER + 2 * i] & + s->regs[R_AFMR_REGISTER + 2 * i]; + uint16_t std_msg_id_masked = FIELD_EX32(msg_id_masked, + AFIR_REGISTER, AIID); + uint16_t std_afir_id_masked = FIELD_EX32(afir_id_masked, + AFIR_REGISTER, AIID); + uint32_t ext_msg_id_masked = FIELD_EX32(msg_id_masked, + AFIR_REGISTER, + AIID_EXT); + uint32_t ext_afir_id_masked = FIELD_EX32(afir_id_masked, + AFIR_REGISTER, + AIID_EXT); + bool ext_ide = FIELD_EX32(s->regs[R_AFMR_REGISTER + 2 * i], + AFMR_REGISTER, AMIDE); + + if (std_msg_id_masked == std_afir_id_masked) { + if (ext_ide) { + /* Extended message ID message. */ + if (ext_msg_id_masked == ext_afir_id_masked) { + filter_pass = true; + filter_index = i; + + break; + } + } else { + /* Standard message ID. */ + filter_pass = true; + filter_index = i; + + break; + } + } + } + acceptance_filter_status >>= 1; + } + } + + if (!filter_pass) { + path = object_get_canonical_path(OBJECT(s)); + + trace_xlnx_canfd_rx_fifo_filter_reject(path, frame->can_id, + frame->can_dlc); + } else { + if (filter_index <= filter_partition) { + fill_level = ARRAY_FIELD_EX32(s->regs, RX_FIFO_STATUS_REGISTER, FL); + read_index = ARRAY_FIELD_EX32(s->regs, RX_FIFO_STATUS_REGISTER, RI); + store_index = read_index + fill_level; + + if (read_index == s->cfg.rx0_fifo - 1) { + /* + * When ri is s->cfg.rx0_fifo - 1 i.e. max, it goes cyclic that + * means we reset the ri to 0x0. + */ + read_index = 0; + ARRAY_FIELD_DP32(s->regs, RX_FIFO_STATUS_REGISTER, RI, + read_index); + } + + if (store_index > s->cfg.rx0_fifo - 1) { + store_index -= s->cfg.rx0_fifo - 1; + } + + store_location = R_RB_ID_REGISTER + + (store_index * NUM_REGS_PER_MSG_SPACE); + + store_rx_sequential(s, frame, fill_level, read_index, + store_location, s->cfg.rx0_fifo, 0, + filter_index); + } else { + /* RX 1 fill level message */ + fill_level = ARRAY_FIELD_EX32(s->regs, RX_FIFO_STATUS_REGISTER, + FL_1); + read_index = ARRAY_FIELD_EX32(s->regs, RX_FIFO_STATUS_REGISTER, + RI_1); + store_index = read_index + fill_level; + + if (read_index == s->cfg.rx1_fifo - 1) { + /* + * When ri is s->cfg.rx1_fifo - 1 i.e. max, it goes cyclic that + * means we reset the ri to 0x0. + */ + read_index = 0; + ARRAY_FIELD_DP32(s->regs, RX_FIFO_STATUS_REGISTER, RI_1, + read_index); + } + + if (store_index > s->cfg.rx1_fifo - 1) { + store_index -= s->cfg.rx1_fifo - 1; + } + + store_location = R_RB_ID_REGISTER_1 + + (store_index * NUM_REGS_PER_MSG_SPACE); + + store_rx_sequential(s, frame, fill_level, read_index, + store_location, s->cfg.rx1_fifo, 1, + filter_index); + } + + path = object_get_canonical_path(OBJECT(s)); + + trace_xlnx_canfd_rx_data(path, frame->can_id, frame->can_dlc, + frame->flags); + canfd_update_irq(s); + } +} + +static bool tx_ready_check(XlnxVersalCANFDState *s) +{ + if (ARRAY_FIELD_EX32(s->regs, SOFTWARE_RESET_REGISTER, SRST)) { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + qemu_log_mask(LOG_GUEST_ERROR, "%s: Attempting to transfer data while" + " XlnxVersalCANFDState is in reset mode\n", path); + + return false; + } + + if (ARRAY_FIELD_EX32(s->regs, SOFTWARE_RESET_REGISTER, CEN) == 0) { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + qemu_log_mask(LOG_GUEST_ERROR, "%s: Attempting to transfer data while" + " XlnxVersalCANFDState is in configuration mode." + " Reset the core so operations can start fresh\n", + path); + return false; + } + + if (ARRAY_FIELD_EX32(s->regs, MODE_SELECT_REGISTER, SNOOP)) { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + qemu_log_mask(LOG_GUEST_ERROR, "%s: Attempting to transfer data while" + " XlnxVersalCANFDState is in SNOOP MODE\n", + path); + return false; + } + + return true; +} + +static void tx_fifo_stamp(XlnxVersalCANFDState *s, uint32_t tb0_regid) +{ + /* + * If EFC bit in DLC message is set, this means we will store the + * event of this transmitted message with time stamp. + */ + uint32_t dlc_reg_val = 0; + + if (FIELD_EX32(s->regs[tb0_regid + 1], TB0_DLC_REGISTER, EFC)) { + uint8_t dlc_val = FIELD_EX32(s->regs[tb0_regid + 1], TB0_DLC_REGISTER, + DLC); + bool fdf_val = FIELD_EX32(s->regs[tb0_regid + 1], TB0_DLC_REGISTER, + FDF); + bool brs_val = FIELD_EX32(s->regs[tb0_regid + 1], TB0_DLC_REGISTER, + BRS); + uint8_t mm_val = FIELD_EX32(s->regs[tb0_regid + 1], TB0_DLC_REGISTER, + MM); + uint8_t fill_level = ARRAY_FIELD_EX32(s->regs, + TX_EVENT_FIFO_STATUS_REGISTER, + TXE_FL); + uint8_t read_index = ARRAY_FIELD_EX32(s->regs, + TX_EVENT_FIFO_STATUS_REGISTER, + TXE_RI); + uint8_t store_index = fill_level + read_index; + + if ((fill_level) > s->cfg.tx_fifo - 1) { + qemu_log_mask(LOG_GUEST_ERROR, "TX Event Buffer is full." + " Discarding the message\n"); + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, TXEOFLW, 1); + } else { + if (read_index == s->cfg.tx_fifo - 1) { + /* + * When ri is s->cfg.tx_fifo - 1 i.e. max, it goes cyclic that + * means we reset the ri to 0x0. + */ + read_index = 0; + ARRAY_FIELD_DP32(s->regs, TX_EVENT_FIFO_STATUS_REGISTER, TXE_RI, + read_index); + } + + if (store_index > s->cfg.tx_fifo - 1) { + store_index -= s->cfg.tx_fifo - 1; + } + + assert(store_index < s->cfg.tx_fifo); + + uint32_t tx_event_reg0_id = R_TXE_FIFO_TB_ID_REGISTER + + (store_index * 2); + + /* Store message ID in TX event register. */ + s->regs[tx_event_reg0_id] = s->regs[tb0_regid]; + + uint16_t tx_timestamp = CANFD_TIMER_MAX - + ptimer_get_count(s->canfd_timer); + + /* Store DLC with time stamp in DLC regs. */ + dlc_reg_val = FIELD_DP32(0, TXE_FIFO_TB_DLC_REGISTER, DLC, dlc_val); + dlc_reg_val |= FIELD_DP32(0, TXE_FIFO_TB_DLC_REGISTER, FDF, + fdf_val); + dlc_reg_val |= FIELD_DP32(0, TXE_FIFO_TB_DLC_REGISTER, BRS, + brs_val); + dlc_reg_val |= FIELD_DP32(0, TXE_FIFO_TB_DLC_REGISTER, ET, 0x3); + dlc_reg_val |= FIELD_DP32(0, TXE_FIFO_TB_DLC_REGISTER, MM, mm_val); + dlc_reg_val |= FIELD_DP32(0, TXE_FIFO_TB_DLC_REGISTER, TIMESTAMP, + tx_timestamp); + s->regs[tx_event_reg0_id + 1] = dlc_reg_val; + + ARRAY_FIELD_DP32(s->regs, TX_EVENT_FIFO_STATUS_REGISTER, TXE_FL, + fill_level + 1); + } + } +} + +static gint g_cmp_ids(gconstpointer data1, gconstpointer data2) +{ + tx_ready_reg_info *tx_reg_1 = (tx_ready_reg_info *) data1; + tx_ready_reg_info *tx_reg_2 = (tx_ready_reg_info *) data2; + + return tx_reg_1->can_id - tx_reg_2->can_id; +} + +static void free_list(GSList *list) +{ + GSList *iterator = NULL; + + for (iterator = list; iterator != NULL; iterator = iterator->next) { + g_free((tx_ready_reg_info *)iterator->data); + } + + g_slist_free(list); + + return; +} + +static GSList *prepare_tx_data(XlnxVersalCANFDState *s) +{ + uint8_t i = 0; + GSList *list = NULL; + uint32_t reg_num = 0; + uint32_t reg_ready = s->regs[R_TX_BUFFER_READY_REQUEST_REGISTER]; + + /* First find the messages which are ready for transmission. */ + for (i = 0; i < s->cfg.tx_fifo; i++) { + if (reg_ready & 1) { + reg_num = R_TB_ID_REGISTER + (NUM_REGS_PER_MSG_SPACE * i); + tx_ready_reg_info *temp = g_new(tx_ready_reg_info, 1); + + temp->can_id = s->regs[reg_num]; + temp->reg_num = reg_num; + list = g_slist_prepend(list, temp); + list = g_slist_sort(list, g_cmp_ids); + } + + reg_ready >>= 1; + } + + s->regs[R_TX_BUFFER_READY_REQUEST_REGISTER] = 0; + s->regs[R_TX_BUFFER_CANCEL_REQUEST_REGISTER] = 0; + + return list; +} + +static void transfer_data(XlnxVersalCANFDState *s) +{ + bool canfd_tx = tx_ready_check(s); + GSList *list, *iterator = NULL; + qemu_can_frame frame; + + if (!canfd_tx) { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + qemu_log_mask(LOG_GUEST_ERROR, "%s: Controller not enabled for data" + " transfer\n", path); + return; + } + + list = prepare_tx_data(s); + if (list == NULL) { + return; + } + + for (iterator = list; iterator != NULL; iterator = iterator->next) { + regs2frame(s, &frame, + ((tx_ready_reg_info *)iterator->data)->reg_num); + + if (ARRAY_FIELD_EX32(s->regs, STATUS_REGISTER, LBACK)) { + update_rx_sequential(s, &frame); + tx_fifo_stamp(s, ((tx_ready_reg_info *)iterator->data)->reg_num); + + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, RXOK, 1); + } else { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + trace_xlnx_canfd_tx_data(path, frame.can_id, frame.can_dlc, + frame.flags); + can_bus_client_send(&s->bus_client, &frame, 1); + tx_fifo_stamp(s, + ((tx_ready_reg_info *)iterator->data)->reg_num); + + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, TXRRS, 1); + + if (ARRAY_FIELD_EX32(s->regs, STATUS_REGISTER, SLEEP)) { + canfd_exit_sleep_mode(s); + } + } + } + + ARRAY_FIELD_DP32(s->regs, INTERRUPT_STATUS_REGISTER, TXOK, 1); + free_list(list); + + canfd_update_irq(s); +} + +static uint64_t canfd_srr_pre_write(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + uint32_t val = val64; + + ARRAY_FIELD_DP32(s->regs, SOFTWARE_RESET_REGISTER, CEN, + FIELD_EX32(val, SOFTWARE_RESET_REGISTER, CEN)); + + if (FIELD_EX32(val, SOFTWARE_RESET_REGISTER, SRST)) { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + trace_xlnx_canfd_reset(path, val64); + + /* First, core will do software reset then will enter in config mode. */ + canfd_config_reset(s); + } else if (ARRAY_FIELD_EX32(s->regs, SOFTWARE_RESET_REGISTER, CEN) == 0) { + canfd_config_mode(s); + } else { + /* + * Leave config mode. Now XlnxVersalCANFD core will enter Normal, Sleep, + * snoop or Loopback mode depending upon LBACK, SLEEP, SNOOP register + * states. + */ + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, CONFIG, 0); + + ptimer_transaction_begin(s->canfd_timer); + ptimer_set_count(s->canfd_timer, 0); + ptimer_transaction_commit(s->canfd_timer); + update_status_register_mode_bits(s); + transfer_data(s); + } + + return s->regs[R_SOFTWARE_RESET_REGISTER]; +} + +static uint64_t filter_mask(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + uint32_t reg_idx = (reg->access->addr) / 4; + uint32_t val = val64; + uint32_t filter_offset = (reg_idx - R_AFMR_REGISTER) / 2; + + if (!(s->regs[R_ACCEPTANCE_FILTER_CONTROL_REGISTER] & + (1 << filter_offset))) { + s->regs[reg_idx] = val; + } else { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + qemu_log_mask(LOG_GUEST_ERROR, "%s: Acceptance filter %d not enabled\n", + path, filter_offset + 1); + } + + return s->regs[reg_idx]; +} + +static uint64_t filter_id(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + hwaddr reg_idx = (reg->access->addr) / 4; + uint32_t val = val64; + uint32_t filter_offset = (reg_idx - R_AFIR_REGISTER) / 2; + + if (!(s->regs[R_ACCEPTANCE_FILTER_CONTROL_REGISTER] & + (1 << filter_offset))) { + s->regs[reg_idx] = val; + } else { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + qemu_log_mask(LOG_GUEST_ERROR, "%s: Acceptance filter %d not enabled\n", + path, filter_offset + 1); + } + + return s->regs[reg_idx]; +} + +static uint64_t canfd_tx_fifo_status_prew(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + uint32_t val = val64; + uint8_t read_ind = 0; + uint8_t fill_ind = ARRAY_FIELD_EX32(s->regs, TX_EVENT_FIFO_STATUS_REGISTER, + TXE_FL); + + if (FIELD_EX32(val, TX_EVENT_FIFO_STATUS_REGISTER, TXE_IRI) && fill_ind) { + read_ind = ARRAY_FIELD_EX32(s->regs, TX_EVENT_FIFO_STATUS_REGISTER, + TXE_RI) + 1; + + if (read_ind > s->cfg.tx_fifo - 1) { + read_ind = 0; + } + + /* + * Increase the read index by 1 and decrease the fill level by 1. + */ + ARRAY_FIELD_DP32(s->regs, TX_EVENT_FIFO_STATUS_REGISTER, TXE_RI, + read_ind); + ARRAY_FIELD_DP32(s->regs, TX_EVENT_FIFO_STATUS_REGISTER, TXE_FL, + fill_ind - 1); + } + + return s->regs[R_TX_EVENT_FIFO_STATUS_REGISTER]; +} + +static uint64_t canfd_rx_fifo_status_prew(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + uint32_t val = val64; + uint8_t read_ind = 0; + uint8_t fill_ind = 0; + + if (FIELD_EX32(val, RX_FIFO_STATUS_REGISTER, IRI)) { + /* FL index is zero, setting IRI bit has no effect. */ + if (FIELD_EX32(val, RX_FIFO_STATUS_REGISTER, FL) != 0) { + read_ind = FIELD_EX32(val, RX_FIFO_STATUS_REGISTER, RI) + 1; + + if (read_ind > s->cfg.rx0_fifo - 1) { + read_ind = 0; + } + + fill_ind = FIELD_EX32(val, RX_FIFO_STATUS_REGISTER, FL) - 1; + + ARRAY_FIELD_DP32(s->regs, RX_FIFO_STATUS_REGISTER, RI, read_ind); + ARRAY_FIELD_DP32(s->regs, RX_FIFO_STATUS_REGISTER, FL, fill_ind); + } + } + + if (FIELD_EX32(val, RX_FIFO_STATUS_REGISTER, IRI_1)) { + /* FL_1 index is zero, setting IRI_1 bit has no effect. */ + if (FIELD_EX32(val, RX_FIFO_STATUS_REGISTER, FL_1) != 0) { + read_ind = FIELD_EX32(val, RX_FIFO_STATUS_REGISTER, RI_1) + 1; + + if (read_ind > s->cfg.rx1_fifo - 1) { + read_ind = 0; + } + + fill_ind = FIELD_EX32(val, RX_FIFO_STATUS_REGISTER, FL_1) - 1; + + ARRAY_FIELD_DP32(s->regs, RX_FIFO_STATUS_REGISTER, RI_1, read_ind); + ARRAY_FIELD_DP32(s->regs, RX_FIFO_STATUS_REGISTER, FL_1, fill_ind); + } + } + + return s->regs[R_RX_FIFO_STATUS_REGISTER]; +} + +static uint64_t canfd_tsr_pre_write(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + uint32_t val = val64; + + if (FIELD_EX32(val, TIMESTAMP_REGISTER, CTS)) { + ARRAY_FIELD_DP32(s->regs, TIMESTAMP_REGISTER, TIMESTAMP_CNT, 0); + ptimer_transaction_begin(s->canfd_timer); + ptimer_set_count(s->canfd_timer, 0); + ptimer_transaction_commit(s->canfd_timer); + } + + return 0; +} + +static uint64_t canfd_trr_reg_prew(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + + if (ARRAY_FIELD_EX32(s->regs, MODE_SELECT_REGISTER, SNOOP)) { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + qemu_log_mask(LOG_GUEST_ERROR, "%s: Controller is in SNOOP mode." + " tx_ready_register will stay in reset mode\n", path); + return 0; + } else { + return val64; + } +} + +static void canfd_trr_reg_postw(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + + transfer_data(s); +} + +static void canfd_cancel_reg_postw(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + + process_cancellation_requests(s); +} + +static uint64_t canfd_write_check_prew(RegisterInfo *reg, uint64_t val64) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(reg->opaque); + uint32_t val = val64; + + if (ARRAY_FIELD_EX32(s->regs, SOFTWARE_RESET_REGISTER, CEN) == 0) { + return val; + } + return 0; +} + +static const RegisterAccessInfo canfd_tx_regs[] = { + { .name = "TB_ID_REGISTER", .addr = A_TB_ID_REGISTER, + },{ .name = "TB0_DLC_REGISTER", .addr = A_TB0_DLC_REGISTER, + },{ .name = "TB_DW0_REGISTER", .addr = A_TB_DW0_REGISTER, + },{ .name = "TB_DW1_REGISTER", .addr = A_TB_DW1_REGISTER, + },{ .name = "TB_DW2_REGISTER", .addr = A_TB_DW2_REGISTER, + },{ .name = "TB_DW3_REGISTER", .addr = A_TB_DW3_REGISTER, + },{ .name = "TB_DW4_REGISTER", .addr = A_TB_DW4_REGISTER, + },{ .name = "TB_DW5_REGISTER", .addr = A_TB_DW5_REGISTER, + },{ .name = "TB_DW6_REGISTER", .addr = A_TB_DW6_REGISTER, + },{ .name = "TB_DW7_REGISTER", .addr = A_TB_DW7_REGISTER, + },{ .name = "TB_DW8_REGISTER", .addr = A_TB_DW8_REGISTER, + },{ .name = "TB_DW9_REGISTER", .addr = A_TB_DW9_REGISTER, + },{ .name = "TB_DW10_REGISTER", .addr = A_TB_DW10_REGISTER, + },{ .name = "TB_DW11_REGISTER", .addr = A_TB_DW11_REGISTER, + },{ .name = "TB_DW12_REGISTER", .addr = A_TB_DW12_REGISTER, + },{ .name = "TB_DW13_REGISTER", .addr = A_TB_DW13_REGISTER, + },{ .name = "TB_DW14_REGISTER", .addr = A_TB_DW14_REGISTER, + },{ .name = "TB_DW15_REGISTER", .addr = A_TB_DW15_REGISTER, + } +}; + +static const RegisterAccessInfo canfd_rx0_regs[] = { + { .name = "RB_ID_REGISTER", .addr = A_RB_ID_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DLC_REGISTER", .addr = A_RB_DLC_REGISTER, + .ro = 0xfe1fffff, + },{ .name = "RB_DW0_REGISTER", .addr = A_RB_DW0_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW1_REGISTER", .addr = A_RB_DW1_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW2_REGISTER", .addr = A_RB_DW2_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW3_REGISTER", .addr = A_RB_DW3_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW4_REGISTER", .addr = A_RB_DW4_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW5_REGISTER", .addr = A_RB_DW5_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW6_REGISTER", .addr = A_RB_DW6_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW7_REGISTER", .addr = A_RB_DW7_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW8_REGISTER", .addr = A_RB_DW8_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW9_REGISTER", .addr = A_RB_DW9_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW10_REGISTER", .addr = A_RB_DW10_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW11_REGISTER", .addr = A_RB_DW11_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW12_REGISTER", .addr = A_RB_DW12_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW13_REGISTER", .addr = A_RB_DW13_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW14_REGISTER", .addr = A_RB_DW14_REGISTER, + .ro = 0xffffffff, + },{ .name = "RB_DW15_REGISTER", .addr = A_RB_DW15_REGISTER, + .ro = 0xffffffff, + } +}; + +static const RegisterAccessInfo canfd_rx1_regs[] = { + { .name = "RB_ID_REGISTER_1", .addr = A_RB_ID_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DLC_REGISTER_1", .addr = A_RB_DLC_REGISTER_1, + .ro = 0xfe1fffff, + },{ .name = "RB0_DW0_REGISTER_1", .addr = A_RB0_DW0_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW1_REGISTER_1", .addr = A_RB_DW1_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW2_REGISTER_1", .addr = A_RB_DW2_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW3_REGISTER_1", .addr = A_RB_DW3_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW4_REGISTER_1", .addr = A_RB_DW4_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW5_REGISTER_1", .addr = A_RB_DW5_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW6_REGISTER_1", .addr = A_RB_DW6_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW7_REGISTER_1", .addr = A_RB_DW7_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW8_REGISTER_1", .addr = A_RB_DW8_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW9_REGISTER_1", .addr = A_RB_DW9_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW10_REGISTER_1", .addr = A_RB_DW10_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW11_REGISTER_1", .addr = A_RB_DW11_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW12_REGISTER_1", .addr = A_RB_DW12_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW13_REGISTER_1", .addr = A_RB_DW13_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW14_REGISTER_1", .addr = A_RB_DW14_REGISTER_1, + .ro = 0xffffffff, + },{ .name = "RB_DW15_REGISTER_1", .addr = A_RB_DW15_REGISTER_1, + .ro = 0xffffffff, + } +}; + +/* Acceptance filter registers. */ +static const RegisterAccessInfo canfd_af_regs[] = { + { .name = "AFMR_REGISTER", .addr = A_AFMR_REGISTER, + .pre_write = filter_mask, + },{ .name = "AFIR_REGISTER", .addr = A_AFIR_REGISTER, + .pre_write = filter_id, + } +}; + +static const RegisterAccessInfo canfd_txe_regs[] = { + { .name = "TXE_FIFO_TB_ID_REGISTER", .addr = A_TXE_FIFO_TB_ID_REGISTER, + .ro = 0xffffffff, + },{ .name = "TXE_FIFO_TB_DLC_REGISTER", .addr = A_TXE_FIFO_TB_DLC_REGISTER, + .ro = 0xffffffff, + } +}; + +static const RegisterAccessInfo canfd_regs_info[] = { + { .name = "SOFTWARE_RESET_REGISTER", .addr = A_SOFTWARE_RESET_REGISTER, + .pre_write = canfd_srr_pre_write, + },{ .name = "MODE_SELECT_REGISTER", .addr = A_MODE_SELECT_REGISTER, + .pre_write = canfd_msr_pre_write, + },{ .name = "ARBITRATION_PHASE_BAUD_RATE_PRESCALER_REGISTER", + .addr = A_ARBITRATION_PHASE_BAUD_RATE_PRESCALER_REGISTER, + .pre_write = canfd_write_check_prew, + },{ .name = "ARBITRATION_PHASE_BIT_TIMING_REGISTER", + .addr = A_ARBITRATION_PHASE_BIT_TIMING_REGISTER, + .pre_write = canfd_write_check_prew, + },{ .name = "ERROR_COUNTER_REGISTER", .addr = A_ERROR_COUNTER_REGISTER, + .ro = 0xffff, + },{ .name = "ERROR_STATUS_REGISTER", .addr = A_ERROR_STATUS_REGISTER, + .w1c = 0xf1f, + },{ .name = "STATUS_REGISTER", .addr = A_STATUS_REGISTER, + .reset = 0x1, + .ro = 0x7f17ff, + },{ .name = "INTERRUPT_STATUS_REGISTER", + .addr = A_INTERRUPT_STATUS_REGISTER, + .ro = 0xffffff7f, + },{ .name = "INTERRUPT_ENABLE_REGISTER", + .addr = A_INTERRUPT_ENABLE_REGISTER, + .post_write = canfd_ier_post_write, + },{ .name = "INTERRUPT_CLEAR_REGISTER", + .addr = A_INTERRUPT_CLEAR_REGISTER, .pre_write = canfd_icr_pre_write, + },{ .name = "TIMESTAMP_REGISTER", .addr = A_TIMESTAMP_REGISTER, + .ro = 0xffff0000, + .pre_write = canfd_tsr_pre_write, + },{ .name = "DATA_PHASE_BAUD_RATE_PRESCALER_REGISTER", + .addr = A_DATA_PHASE_BAUD_RATE_PRESCALER_REGISTER, + .pre_write = canfd_write_check_prew, + },{ .name = "DATA_PHASE_BIT_TIMING_REGISTER", + .addr = A_DATA_PHASE_BIT_TIMING_REGISTER, + .pre_write = canfd_write_check_prew, + },{ .name = "TX_BUFFER_READY_REQUEST_REGISTER", + .addr = A_TX_BUFFER_READY_REQUEST_REGISTER, + .pre_write = canfd_trr_reg_prew, + .post_write = canfd_trr_reg_postw, + },{ .name = "INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER", + .addr = A_INTERRUPT_ENABLE_TX_BUFFER_READY_REQUEST_REGISTER, + },{ .name = "TX_BUFFER_CANCEL_REQUEST_REGISTER", + .addr = A_TX_BUFFER_CANCEL_REQUEST_REGISTER, + .post_write = canfd_cancel_reg_postw, + },{ .name = "INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER", + .addr = A_INTERRUPT_ENABLE_TX_BUFFER_CANCELLATION_REQUEST_REGISTER, + },{ .name = "TX_EVENT_FIFO_STATUS_REGISTER", + .addr = A_TX_EVENT_FIFO_STATUS_REGISTER, + .ro = 0x3f1f, .pre_write = canfd_tx_fifo_status_prew, + },{ .name = "TX_EVENT_FIFO_WATERMARK_REGISTER", + .addr = A_TX_EVENT_FIFO_WATERMARK_REGISTER, + .reset = 0xf, + .pre_write = canfd_write_check_prew, + },{ .name = "ACCEPTANCE_FILTER_CONTROL_REGISTER", + .addr = A_ACCEPTANCE_FILTER_CONTROL_REGISTER, + },{ .name = "RX_FIFO_STATUS_REGISTER", .addr = A_RX_FIFO_STATUS_REGISTER, + .ro = 0x7f3f7f3f, .pre_write = canfd_rx_fifo_status_prew, + },{ .name = "RX_FIFO_WATERMARK_REGISTER", + .addr = A_RX_FIFO_WATERMARK_REGISTER, + .reset = 0x1f0f0f, + .pre_write = canfd_write_check_prew, + } +}; + +static void xlnx_versal_canfd_ptimer_cb(void *opaque) +{ + /* No action required on the timer rollover. */ +} + +static const MemoryRegionOps canfd_ops = { + .read = register_read_memory, + .write = register_write_memory, + .endianness = DEVICE_LITTLE_ENDIAN, + .valid = { + .min_access_size = 4, + .max_access_size = 4, + }, +}; + +static void canfd_reset(DeviceState *dev) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(dev); + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(s->reg_info); ++i) { + register_reset(&s->reg_info[i]); + } + + ptimer_transaction_begin(s->canfd_timer); + ptimer_set_count(s->canfd_timer, 0); + ptimer_transaction_commit(s->canfd_timer); +} + +static bool can_xilinx_canfd_receive(CanBusClientState *client) +{ + XlnxVersalCANFDState *s = container_of(client, XlnxVersalCANFDState, + bus_client); + + bool reset_state = ARRAY_FIELD_EX32(s->regs, SOFTWARE_RESET_REGISTER, SRST); + bool can_enabled = ARRAY_FIELD_EX32(s->regs, SOFTWARE_RESET_REGISTER, CEN); + + return !reset_state && can_enabled; +} + +static ssize_t canfd_xilinx_receive(CanBusClientState *client, + const qemu_can_frame *buf, + size_t buf_size) +{ + XlnxVersalCANFDState *s = container_of(client, XlnxVersalCANFDState, + bus_client); + const qemu_can_frame *frame = buf; + + assert(buf_size > 0); + + if (ARRAY_FIELD_EX32(s->regs, STATUS_REGISTER, LBACK)) { + /* + * XlnxVersalCANFDState will not participate in normal bus communication + * and does not receive any messages transmitted by other CAN nodes. + */ + return 1; + } + + /* Update the status register that we are receiving message. */ + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, BBSY, 1); + + if (ARRAY_FIELD_EX32(s->regs, STATUS_REGISTER, SNOOP)) { + /* Snoop Mode: Just keep the data. no response back. */ + update_rx_sequential(s, frame); + } else { + if ((ARRAY_FIELD_EX32(s->regs, STATUS_REGISTER, SLEEP))) { + /* + * XlnxVersalCANFDState is in sleep mode. Any data on bus will bring + * it to the wake up state. + */ + canfd_exit_sleep_mode(s); + } + + update_rx_sequential(s, frame); + } + + /* Message processing done. Update the status back to !busy */ + ARRAY_FIELD_DP32(s->regs, STATUS_REGISTER, BBSY, 0); + return 1; +} + +static CanBusClientInfo canfd_xilinx_bus_client_info = { + .can_receive = can_xilinx_canfd_receive, + .receive = canfd_xilinx_receive, +}; + +static int xlnx_canfd_connect_to_bus(XlnxVersalCANFDState *s, + CanBusState *bus) +{ + s->bus_client.info = &canfd_xilinx_bus_client_info; + + return can_bus_insert_client(bus, &s->bus_client); +} + +#define NUM_REG_PER_AF ARRAY_SIZE(canfd_af_regs) +#define NUM_AF 32 +#define NUM_REG_PER_TXE ARRAY_SIZE(canfd_txe_regs) +#define NUM_TXE 32 + +static int canfd_populate_regarray(XlnxVersalCANFDState *s, + RegisterInfoArray *r_array, int pos, + const RegisterAccessInfo *rae, + int num_rae) +{ + int i; + + for (i = 0; i < num_rae; i++) { + int index = rae[i].addr / 4; + RegisterInfo *r = &s->reg_info[index]; + + object_initialize(r, sizeof(*r), TYPE_REGISTER); + + *r = (RegisterInfo) { + .data = &s->regs[index], + .data_size = sizeof(uint32_t), + .access = &rae[i], + .opaque = OBJECT(s), + }; + + r_array->r[i + pos] = r; + } + return i + pos; +} + +static void canfd_create_rai(RegisterAccessInfo *rai_array, + const RegisterAccessInfo *canfd_regs, + int template_rai_array_sz, + int num_template_to_copy) +{ + int i; + int reg_num; + + for (reg_num = 0; reg_num < num_template_to_copy; reg_num++) { + int pos = reg_num * template_rai_array_sz; + + memcpy(rai_array + pos, canfd_regs, + template_rai_array_sz * sizeof(RegisterAccessInfo)); + + for (i = 0; i < template_rai_array_sz; i++) { + const char *name = canfd_regs[i].name; + uint64_t addr = canfd_regs[i].addr; + rai_array[i + pos].name = g_strdup_printf("%s%d", name, reg_num); + rai_array[i + pos].addr = addr + pos * 4; + } + } +} + +static RegisterInfoArray *canfd_create_regarray(XlnxVersalCANFDState *s) +{ + const char *device_prefix = object_get_typename(OBJECT(s)); + uint64_t memory_size = XLNX_VERSAL_CANFD_R_MAX * 4; + int num_regs; + int pos = 0; + RegisterInfoArray *r_array; + + num_regs = ARRAY_SIZE(canfd_regs_info) + + s->cfg.tx_fifo * NUM_REGS_PER_MSG_SPACE + + s->cfg.rx0_fifo * NUM_REGS_PER_MSG_SPACE + + NUM_AF * NUM_REG_PER_AF + + NUM_TXE * NUM_REG_PER_TXE; + + s->tx_regs = g_new0(RegisterAccessInfo, + s->cfg.tx_fifo * ARRAY_SIZE(canfd_tx_regs)); + + canfd_create_rai(s->tx_regs, canfd_tx_regs, + ARRAY_SIZE(canfd_tx_regs), s->cfg.tx_fifo); + + s->rx0_regs = g_new0(RegisterAccessInfo, + s->cfg.rx0_fifo * ARRAY_SIZE(canfd_rx0_regs)); + + canfd_create_rai(s->rx0_regs, canfd_rx0_regs, + ARRAY_SIZE(canfd_rx0_regs), s->cfg.rx0_fifo); + + s->af_regs = g_new0(RegisterAccessInfo, + NUM_AF * ARRAY_SIZE(canfd_af_regs)); + + canfd_create_rai(s->af_regs, canfd_af_regs, + ARRAY_SIZE(canfd_af_regs), NUM_AF); + + s->txe_regs = g_new0(RegisterAccessInfo, + NUM_TXE * ARRAY_SIZE(canfd_txe_regs)); + + canfd_create_rai(s->txe_regs, canfd_txe_regs, + ARRAY_SIZE(canfd_txe_regs), NUM_TXE); + + if (s->cfg.enable_rx_fifo1) { + num_regs += s->cfg.rx1_fifo * NUM_REGS_PER_MSG_SPACE; + + s->rx1_regs = g_new0(RegisterAccessInfo, + s->cfg.rx1_fifo * ARRAY_SIZE(canfd_rx1_regs)); + + canfd_create_rai(s->rx1_regs, canfd_rx1_regs, + ARRAY_SIZE(canfd_rx1_regs), s->cfg.rx1_fifo); + } + + r_array = g_new0(RegisterInfoArray, 1); + r_array->r = g_new0(RegisterInfo * , num_regs); + r_array->num_elements = num_regs; + r_array->prefix = device_prefix; + + pos = canfd_populate_regarray(s, r_array, pos, + canfd_regs_info, + ARRAY_SIZE(canfd_regs_info)); + pos = canfd_populate_regarray(s, r_array, pos, + s->tx_regs, s->cfg.tx_fifo * + NUM_REGS_PER_MSG_SPACE); + pos = canfd_populate_regarray(s, r_array, pos, + s->rx0_regs, s->cfg.rx0_fifo * + NUM_REGS_PER_MSG_SPACE); + if (s->cfg.enable_rx_fifo1) { + pos = canfd_populate_regarray(s, r_array, pos, + s->rx1_regs, s->cfg.rx1_fifo * + NUM_REGS_PER_MSG_SPACE); + } + pos = canfd_populate_regarray(s, r_array, pos, + s->af_regs, NUM_AF * NUM_REG_PER_AF); + pos = canfd_populate_regarray(s, r_array, pos, + s->txe_regs, NUM_TXE * NUM_REG_PER_TXE); + + memory_region_init_io(&r_array->mem, OBJECT(s), &canfd_ops, r_array, + device_prefix, memory_size); + return r_array; +} + +static void canfd_realize(DeviceState *dev, Error **errp) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(dev); + RegisterInfoArray *reg_array; + + reg_array = canfd_create_regarray(s); + memory_region_add_subregion(&s->iomem, 0x00, ®_array->mem); + sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem); + sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq_canfd_int); + + if (s->canfdbus) { + if (xlnx_canfd_connect_to_bus(s, s->canfdbus) < 0) { + g_autofree char *path = object_get_canonical_path(OBJECT(s)); + + error_setg(errp, "%s: xlnx_canfd_connect_to_bus failed", path); + return; + } + + } + + /* Allocate a new timer. */ + s->canfd_timer = ptimer_init(xlnx_versal_canfd_ptimer_cb, s, + PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | + PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT | + PTIMER_POLICY_NO_IMMEDIATE_RELOAD); + + ptimer_transaction_begin(s->canfd_timer); + + ptimer_set_freq(s->canfd_timer, s->cfg.ext_clk_freq); + ptimer_set_limit(s->canfd_timer, CANFD_TIMER_MAX, 1); + ptimer_run(s->canfd_timer, 0); + ptimer_transaction_commit(s->canfd_timer); +} + +static void canfd_init(Object *obj) +{ + XlnxVersalCANFDState *s = XILINX_CANFD(obj); + + memory_region_init(&s->iomem, obj, TYPE_XILINX_CANFD, + XLNX_VERSAL_CANFD_R_MAX * 4); +} + +static const VMStateDescription vmstate_canfd = { + .name = TYPE_XILINX_CANFD, + .version_id = 1, + .minimum_version_id = 1, + .fields = (VMStateField[]) { + VMSTATE_UINT32_ARRAY(regs, XlnxVersalCANFDState, + XLNX_VERSAL_CANFD_R_MAX), + VMSTATE_PTIMER(canfd_timer, XlnxVersalCANFDState), + VMSTATE_END_OF_LIST(), + } +}; + +static Property canfd_core_properties[] = { + DEFINE_PROP_UINT8("rx-fifo0", XlnxVersalCANFDState, cfg.rx0_fifo, 0x40), + DEFINE_PROP_UINT8("rx-fifo1", XlnxVersalCANFDState, cfg.rx1_fifo, 0x40), + DEFINE_PROP_UINT8("tx-fifo", XlnxVersalCANFDState, cfg.tx_fifo, 0x20), + DEFINE_PROP_BOOL("enable-rx-fifo1", XlnxVersalCANFDState, + cfg.enable_rx_fifo1, true), + DEFINE_PROP_UINT32("ext_clk_freq", XlnxVersalCANFDState, cfg.ext_clk_freq, + CANFD_DEFAULT_CLOCK), + DEFINE_PROP_LINK("canfdbus", XlnxVersalCANFDState, canfdbus, TYPE_CAN_BUS, + CanBusState *), + DEFINE_PROP_END_OF_LIST(), +}; + +static void canfd_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + dc->reset = canfd_reset; + dc->realize = canfd_realize; + device_class_set_props(dc, canfd_core_properties); + dc->vmsd = &vmstate_canfd; +} + +static const TypeInfo canfd_info = { + .name = TYPE_XILINX_CANFD, + .parent = TYPE_SYS_BUS_DEVICE, + .instance_size = sizeof(XlnxVersalCANFDState), + .class_init = canfd_class_init, + .instance_init = canfd_init, +}; + +static void canfd_register_types(void) +{ + type_register_static(&canfd_info); +} + +type_init(canfd_register_types) diff --git a/hw/net/can/meson.build b/hw/net/can/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/net/can/meson.build +++ b/hw/net/can/meson.build @@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_CAN_PCI', if_true: files('can_mioe3680_pci.c')) softmmu_ss.add(when: 'CONFIG_CAN_CTUCANFD', if_true: files('ctucan_core.c')) softmmu_ss.add(when: 'CONFIG_CAN_CTUCANFD_PCI', if_true: files('ctucan_pci.c')) softmmu_ss.add(when: 'CONFIG_XLNX_ZYNQMP', if_true: files('xlnx-zynqmp-can.c')) +softmmu_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files('xlnx-versal-canfd.c')) diff --git a/hw/net/can/trace-events b/hw/net/can/trace-events index XXXXXXX..XXXXXXX 100644 --- a/hw/net/can/trace-events +++ b/hw/net/can/trace-events @@ -XXX,XX +XXX,XX @@ xlnx_can_filter_mask_pre_write(uint8_t filter_num, uint32_t value) "Filter%d MAS xlnx_can_tx_data(uint32_t id, uint8_t dlc, uint8_t db0, uint8_t db1, uint8_t db2, uint8_t db3, uint8_t db4, uint8_t db5, uint8_t db6, uint8_t db7) "Frame: ID: 0x%08x DLC: 0x%02x DATA: 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x" xlnx_can_rx_data(uint32_t id, uint32_t dlc, uint8_t db0, uint8_t db1, uint8_t db2, uint8_t db3, uint8_t db4, uint8_t db5, uint8_t db6, uint8_t db7) "Frame: ID: 0x%08x DLC: 0x%02x DATA: 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x 0x%02x" xlnx_can_rx_discard(uint32_t status) "Controller is not enabled for bus communication. Status Register: 0x%08x" + +# xlnx-versal-canfd.c +xlnx_canfd_update_irq(char *path, uint32_t isr, uint32_t ier, uint32_t irq) "%s: ISR: 0x%08x IER: 0x%08x IRQ: 0x%08x" +xlnx_canfd_rx_fifo_filter_reject(char *path, uint32_t id, uint8_t dlc) "%s: Frame: ID: 0x%08x DLC: 0x%02x" +xlnx_canfd_rx_data(char *path, uint32_t id, uint8_t dlc, uint8_t flags) "%s: Frame: ID: 0x%08x DLC: 0x%02x CANFD Flag: 0x%02x" +xlnx_canfd_tx_data(char *path, uint32_t id, uint8_t dlc, uint8_t flgas) "%s: Frame: ID: 0x%08x DLC: 0x%02x CANFD Flag: 0x%02x" +xlnx_canfd_reset(char *path, uint32_t val) "%s: Resetting controller with value = 0x%08x" -- 2.34.1
From: Vikram Garhwal <vikram.garhwal@amd.com> Connect CANFD0 and CANFD1 on the Versal-virt machine and update xlnx-versal-virt document with CANFD command line examples. Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/xlnx-versal-virt.rst | 31 ++++++++++++++++ include/hw/arm/xlnx-versal.h | 12 +++++++ hw/arm/xlnx-versal-virt.c | 53 ++++++++++++++++++++++++++++ hw/arm/xlnx-versal.c | 37 +++++++++++++++++++ 4 files changed, 133 insertions(+) diff --git a/docs/system/arm/xlnx-versal-virt.rst b/docs/system/arm/xlnx-versal-virt.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/arm/xlnx-versal-virt.rst +++ b/docs/system/arm/xlnx-versal-virt.rst @@ -XXX,XX +XXX,XX @@ Implemented devices: - DDR memory - BBRAM (36 bytes of Battery-backed RAM) - eFUSE (3072 bytes of one-time field-programmable bit array) +- 2 CANFDs QEMU does not yet model any other devices, including the PL and the AI Engine. @@ -XXX,XX +XXX,XX @@ To use a different index value, N, from default of 1, add: Better yet, do not use actual product data when running guest image on this Xilinx Versal Virt board. + +Using CANFDs for Versal Virt +"""""""""""""""""""""""""""" +Versal CANFD controller is developed based on SocketCAN and QEMU CAN bus +implementation. Bus connection and socketCAN connection for each CAN module +can be set through command lines. + +To connect both CANFD0 and CANFD1 on the same bus: + +.. code-block:: bash + + -object can-bus,id=canbus -machine canbus0=canbus -machine canbus1=canbus + +To connect CANFD0 and CANFD1 to separate buses: + +.. code-block:: bash + + -object can-bus,id=canbus0 -object can-bus,id=canbus1 \ + -machine canbus0=canbus0 -machine canbus1=canbus1 + +The SocketCAN interface can connect to a Physical or a Virtual CAN interfaces on +the host machine. Please check this document to learn about CAN interface on +Linux: docs/system/devices/can.rst + +To connect CANFD0 and CANFD1 to host machine's CAN interface can0: + +.. code-block:: bash + + -object can-bus,id=canbus -machine canbus0=canbus -machine canbus1=canbus + -object can-host-socketcan,id=canhost0,if=can0,canbus=canbus diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/xlnx-versal.h +++ b/include/hw/arm/xlnx-versal.h @@ -XXX,XX +XXX,XX @@ #include "hw/dma/xlnx_csu_dma.h" #include "hw/misc/xlnx-versal-crl.h" #include "hw/misc/xlnx-versal-pmc-iou-slcr.h" +#include "hw/net/xlnx-versal-canfd.h" #define TYPE_XLNX_VERSAL "xlnx-versal" OBJECT_DECLARE_SIMPLE_TYPE(Versal, XLNX_VERSAL) @@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(Versal, XLNX_VERSAL) #define XLNX_VERSAL_NR_SDS 2 #define XLNX_VERSAL_NR_XRAM 4 #define XLNX_VERSAL_NR_IRQS 192 +#define XLNX_VERSAL_NR_CANFD 2 +#define XLNX_VERSAL_CANFD_REF_CLK (24 * 1000 * 1000) struct Versal { /*< private >*/ @@ -XXX,XX +XXX,XX @@ struct Versal { CadenceGEMState gem[XLNX_VERSAL_NR_GEMS]; XlnxZDMA adma[XLNX_VERSAL_NR_ADMAS]; VersalUsb2 usb; + CanBusState *canbus[XLNX_VERSAL_NR_CANFD]; + XlnxVersalCANFDState canfd[XLNX_VERSAL_NR_CANFD]; } iou; /* Real-time Processing Unit. */ @@ -XXX,XX +XXX,XX @@ struct Versal { #define VERSAL_CRL_IRQ 10 #define VERSAL_UART0_IRQ_0 18 #define VERSAL_UART1_IRQ_0 19 +#define VERSAL_CANFD0_IRQ_0 20 +#define VERSAL_CANFD1_IRQ_0 21 #define VERSAL_USB0_IRQ_0 22 #define VERSAL_GEM0_IRQ_0 56 #define VERSAL_GEM0_WAKE_IRQ_0 57 @@ -XXX,XX +XXX,XX @@ struct Versal { #define MM_UART1 0xff010000U #define MM_UART1_SIZE 0x10000 +#define MM_CANFD0 0xff060000U +#define MM_CANFD0_SIZE 0x10000 +#define MM_CANFD1 0xff070000U +#define MM_CANFD1_SIZE 0x10000 + #define MM_GEM0 0xff0c0000U #define MM_GEM0_SIZE 0x10000 #define MM_GEM1 0xff0d0000U diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/xlnx-versal-virt.c +++ b/hw/arm/xlnx-versal-virt.c @@ -XXX,XX +XXX,XX @@ struct VersalVirt { uint32_t clk_25Mhz; uint32_t usb; uint32_t dwc; + uint32_t canfd[2]; } phandle; struct arm_boot_info binfo; + CanBusState *canbus[XLNX_VERSAL_NR_CANFD]; struct { bool secure; } cfg; @@ -XXX,XX +XXX,XX @@ static void fdt_add_uart_nodes(VersalVirt *s) } } +static void fdt_add_canfd_nodes(VersalVirt *s) +{ + uint64_t addrs[] = { MM_CANFD1, MM_CANFD0 }; + uint32_t size[] = { MM_CANFD1_SIZE, MM_CANFD0_SIZE }; + unsigned int irqs[] = { VERSAL_CANFD1_IRQ_0, VERSAL_CANFD0_IRQ_0 }; + const char clocknames[] = "can_clk\0s_axi_aclk"; + int i; + + /* Create and connect CANFD0 and CANFD1 nodes to canbus0. */ + for (i = 0; i < ARRAY_SIZE(addrs); i++) { + char *name = g_strdup_printf("/canfd@%" PRIx64, addrs[i]); + qemu_fdt_add_subnode(s->fdt, name); + + qemu_fdt_setprop_cell(s->fdt, name, "rx-fifo-depth", 0x40); + qemu_fdt_setprop_cell(s->fdt, name, "tx-mailbox-count", 0x20); + + qemu_fdt_setprop_cells(s->fdt, name, "clocks", + s->phandle.clk_25Mhz, s->phandle.clk_25Mhz); + qemu_fdt_setprop(s->fdt, name, "clock-names", + clocknames, sizeof(clocknames)); + qemu_fdt_setprop_cells(s->fdt, name, "interrupts", + GIC_FDT_IRQ_TYPE_SPI, irqs[i], + GIC_FDT_IRQ_FLAGS_LEVEL_HI); + qemu_fdt_setprop_sized_cells(s->fdt, name, "reg", + 2, addrs[i], 2, size[i]); + qemu_fdt_setprop_string(s->fdt, name, "compatible", + "xlnx,canfd-2.0"); + + g_free(name); + } +} + static void fdt_add_fixed_link_nodes(VersalVirt *s, char *gemname, uint32_t phandle) { @@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine) TYPE_XLNX_VERSAL); object_property_set_link(OBJECT(&s->soc), "ddr", OBJECT(machine->ram), &error_abort); + object_property_set_link(OBJECT(&s->soc), "canbus0", OBJECT(s->canbus[0]), + &error_abort); + object_property_set_link(OBJECT(&s->soc), "canbus1", OBJECT(s->canbus[1]), + &error_abort); sysbus_realize(SYS_BUS_DEVICE(&s->soc), &error_fatal); fdt_create(s); create_virtio_regions(s); fdt_add_gem_nodes(s); fdt_add_uart_nodes(s); + fdt_add_canfd_nodes(s); fdt_add_gic_nodes(s); fdt_add_timer_nodes(s); fdt_add_zdma_nodes(s); @@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine) static void versal_virt_machine_instance_init(Object *obj) { + VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(obj); + + /* + * User can set canbus0 and canbus1 properties to can-bus object and connect + * to socketcan(optional) interface via command line. + */ + object_property_add_link(obj, "canbus0", TYPE_CAN_BUS, + (Object **)&s->canbus[0], + object_property_allow_set_link, + 0); + object_property_add_link(obj, "canbus1", TYPE_CAN_BUS, + (Object **)&s->canbus[1], + object_property_allow_set_link, + 0); } static void versal_virt_machine_class_init(ObjectClass *oc, void *data) diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/xlnx-versal.c +++ b/hw/arm/xlnx-versal.c @@ -XXX,XX +XXX,XX @@ static void versal_create_uarts(Versal *s, qemu_irq *pic) } } +static void versal_create_canfds(Versal *s, qemu_irq *pic) +{ + int i; + uint32_t irqs[] = { VERSAL_CANFD0_IRQ_0, VERSAL_CANFD1_IRQ_0}; + uint64_t addrs[] = { MM_CANFD0, MM_CANFD1 }; + + for (i = 0; i < ARRAY_SIZE(s->lpd.iou.canfd); i++) { + char *name = g_strdup_printf("canfd%d", i); + SysBusDevice *sbd; + MemoryRegion *mr; + + object_initialize_child(OBJECT(s), name, &s->lpd.iou.canfd[i], + TYPE_XILINX_CANFD); + sbd = SYS_BUS_DEVICE(&s->lpd.iou.canfd[i]); + + object_property_set_int(OBJECT(&s->lpd.iou.canfd[i]), "ext_clk_freq", + XLNX_VERSAL_CANFD_REF_CLK , &error_abort); + + object_property_set_link(OBJECT(&s->lpd.iou.canfd[i]), "canfdbus", + OBJECT(s->lpd.iou.canbus[i]), + &error_abort); + + sysbus_realize(sbd, &error_fatal); + + mr = sysbus_mmio_get_region(sbd, 0); + memory_region_add_subregion(&s->mr_ps, addrs[i], mr); + + sysbus_connect_irq(sbd, 0, pic[irqs[i]]); + g_free(name); + } +} + static void versal_create_usbs(Versal *s, qemu_irq *pic) { DeviceState *dev; @@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp) versal_create_apu_gic(s, pic); versal_create_rpu_cpus(s); versal_create_uarts(s, pic); + versal_create_canfds(s, pic); versal_create_usbs(s, pic); versal_create_gems(s, pic); versal_create_admas(s, pic); @@ -XXX,XX +XXX,XX @@ static void versal_init(Object *obj) static Property versal_properties[] = { DEFINE_PROP_LINK("ddr", Versal, cfg.mr_ddr, TYPE_MEMORY_REGION, MemoryRegion *), + DEFINE_PROP_LINK("canbus0", Versal, lpd.iou.canbus[0], + TYPE_CAN_BUS, CanBusState *), + DEFINE_PROP_LINK("canbus1", Versal, lpd.iou.canbus[1], + TYPE_CAN_BUS, CanBusState *), DEFINE_PROP_END_OF_LIST() }; -- 2.34.1
From: Vikram Garhwal <vikram.garhwal@amd.com> Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index XXXXXXX..XXXXXXX 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -XXX,XX +XXX,XX @@ M: Francisco Iglesias <francisco.iglesias@amd.com> S: Maintained F: hw/net/can/xlnx-* F: include/hw/net/xlnx-* -F: tests/qtest/xlnx-can-test* +F: tests/qtest/xlnx-can*-test* EDU M: Jiri Slaby <jslaby@suse.cz> -- 2.34.1
From: Vikram Garhwal <vikram.garhwal@amd.com> The QTests perform three tests on the Xilinx VERSAL CANFD controller: Tests the CANFD controllers in loopback. Tests the CANFD controllers in normal mode with CAN frame. Tests the CANFD controllers in normal mode with CANFD frame. Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com> Acked-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- tests/qtest/xlnx-canfd-test.c | 423 ++++++++++++++++++++++++++++++++++ tests/qtest/meson.build | 1 + 2 files changed, 424 insertions(+) create mode 100644 tests/qtest/xlnx-canfd-test.c diff --git a/tests/qtest/xlnx-canfd-test.c b/tests/qtest/xlnx-canfd-test.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/qtest/xlnx-canfd-test.c @@ -XXX,XX +XXX,XX @@ +/* + * SPDX-License-Identifier: MIT + * + * QTests for the Xilinx Versal CANFD controller. + * + * Copyright (c) 2022 AMD Inc. + * + * Written-by: Vikram Garhwal<vikram.garhwal@amd.com> + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "libqtest.h" + +/* Base address. */ +#define CANFD0_BASE_ADDR 0xff060000 +#define CANFD1_BASE_ADDR 0xff070000 + +/* Register addresses. */ +#define R_SRR_OFFSET 0x00 +#define R_MSR_OFFSET 0x04 +#define R_FILTER_CONTROL_REGISTER 0xe0 +#define R_SR_OFFSET 0x18 +#define R_ISR_OFFSET 0x1c +#define R_IER_OFFSET 0x20 +#define R_ICR_OFFSET 0x24 +#define R_TX_READY_REQ_REGISTER 0x90 +#define RX_FIFO_STATUS_REGISTER 0xe8 +#define R_TXID_OFFSET 0x100 +#define R_TXDLC_OFFSET 0x104 +#define R_TXDATA1_OFFSET 0x108 +#define R_TXDATA2_OFFSET 0x10c +#define R_AFMR_REGISTER0 0xa00 +#define R_AFIR_REGISTER0 0xa04 +#define R_RX0_ID_OFFSET 0x2100 +#define R_RX0_DLC_OFFSET 0x2104 +#define R_RX0_DATA1_OFFSET 0x2108 +#define R_RX0_DATA2_OFFSET 0x210c + +/* CANFD modes. */ +#define SRR_CONFIG_MODE 0x00 +#define MSR_NORMAL_MODE 0x00 +#define MSR_LOOPBACK_MODE (1 << 1) +#define ENABLE_CANFD (1 << 1) + +/* CANFD status. */ +#define STATUS_CONFIG_MODE (1 << 0) +#define STATUS_NORMAL_MODE (1 << 3) +#define STATUS_LOOPBACK_MODE (1 << 1) +#define ISR_TXOK (1 << 1) +#define ISR_RXOK (1 << 4) + +#define ENABLE_ALL_FILTERS 0xffffffff +#define ENABLE_ALL_INTERRUPTS 0xffffffff + +/* We are sending one canfd message. */ +#define TX_READY_REG_VAL 0x1 + +#define FIRST_RX_STORE_INDEX 0x1 +#define STATUS_REG_MASK 0xf +#define DLC_FD_BIT_SHIFT 0x1b +#define DLC_FD_BIT_MASK 0xf8000000 +#define FIFO_STATUS_READ_INDEX_MASK 0x3f +#define FIFO_STATUS_FILL_LEVEL_MASK 0x7f00 +#define FILL_LEVEL_SHIFT 0x8 + +/* CANFD frame size ID, DLC and 16 DATA word. */ +#define CANFD_FRAME_SIZE 18 +/* CAN frame size ID, DLC and 2 DATA word. */ +#define CAN_FRAME_SIZE 4 + +/* Set the filters for CANFD controller. */ +static void enable_filters(QTestState *qts) +{ + const uint32_t arr_afmr[32] = { 0xb423deaa, 0xa2a40bdc, 0x1b64f486, + 0x95c0d4ee, 0xe0c44528, 0x4b407904, + 0xd2673f46, 0x9fc638d6, 0x8844f3d8, + 0xa607d1e8, 0x67871bf4, 0xc2557dc, + 0x9ea5b53e, 0x3643c0cc, 0x5a05ea8e, + 0x83a46d84, 0x4a25c2b8, 0x93a66008, + 0x2e467470, 0xedc66118, 0x9086f9f2, + 0xfa23dd36, 0xb6654b90, 0xb221b8ca, + 0x3467d1e2, 0xa3a55542, 0x5b26a012, + 0x2281ea7e, 0xcea0ece8, 0xdc61e588, + 0x2e5676a, 0x16821320 }; + + const uint32_t arr_afir[32] = { 0xa833dfa1, 0x255a477e, 0x3a4bb1c5, + 0x8f560a6c, 0x27f38903, 0x2fecec4d, + 0xa014c66d, 0xec289b8, 0x7e52dead, + 0x82e94f3c, 0xcf3e3c5c, 0x66059871, + 0x3f213df4, 0x25ac3959, 0xa12e9bef, + 0xa3ad3af, 0xbafd7fe, 0xb3cb40fd, + 0x5d9caa81, 0x2ed61902, 0x7cd64a0, + 0x4b1fa538, 0x9b5ced8c, 0x150de059, + 0xd2794227, 0x635e820a, 0xbb6b02cf, + 0xbb58176, 0x570025bb, 0xa78d9658, + 0x49d735df, 0xe5399d2f }; + + /* Passing the respective array values to all the AFMR and AFIR pairs. */ + for (int i = 0; i < 32; i++) { + /* For CANFD0. */ + qtest_writel(qts, CANFD0_BASE_ADDR + R_AFMR_REGISTER0 + 8 * i, + arr_afmr[i]); + qtest_writel(qts, CANFD0_BASE_ADDR + R_AFIR_REGISTER0 + 8 * i, + arr_afir[i]); + + /* For CANFD1. */ + qtest_writel(qts, CANFD1_BASE_ADDR + R_AFMR_REGISTER0 + 8 * i, + arr_afmr[i]); + qtest_writel(qts, CANFD1_BASE_ADDR + R_AFIR_REGISTER0 + 8 * i, + arr_afir[i]); + } + + /* Enable all the pairs from AFR register. */ + qtest_writel(qts, CANFD0_BASE_ADDR + R_FILTER_CONTROL_REGISTER, + ENABLE_ALL_FILTERS); + qtest_writel(qts, CANFD1_BASE_ADDR + R_FILTER_CONTROL_REGISTER, + ENABLE_ALL_FILTERS); +} + +static void configure_canfd(QTestState *qts, uint8_t mode) +{ + uint32_t status = 0; + + /* Put CANFD0 and CANFD1 in config mode. */ + qtest_writel(qts, CANFD0_BASE_ADDR + R_SRR_OFFSET, SRR_CONFIG_MODE); + qtest_writel(qts, CANFD1_BASE_ADDR + R_SRR_OFFSET, SRR_CONFIG_MODE); + + /* Write mode of operation in Mode select register. */ + qtest_writel(qts, CANFD0_BASE_ADDR + R_MSR_OFFSET, mode); + qtest_writel(qts, CANFD1_BASE_ADDR + R_MSR_OFFSET, mode); + + enable_filters(qts); + + /* Check here if CANFD0 and CANFD1 are in config mode. */ + status = qtest_readl(qts, CANFD0_BASE_ADDR + R_SR_OFFSET); + status = status & STATUS_REG_MASK; + g_assert_cmpint(status, ==, STATUS_CONFIG_MODE); + + status = qtest_readl(qts, CANFD1_BASE_ADDR + R_SR_OFFSET); + status = status & STATUS_REG_MASK; + g_assert_cmpint(status, ==, STATUS_CONFIG_MODE); + + qtest_writel(qts, CANFD1_BASE_ADDR + R_IER_OFFSET, ENABLE_ALL_INTERRUPTS); + qtest_writel(qts, CANFD1_BASE_ADDR + R_IER_OFFSET, ENABLE_ALL_INTERRUPTS); + + qtest_writel(qts, CANFD0_BASE_ADDR + R_SRR_OFFSET, ENABLE_CANFD); + qtest_writel(qts, CANFD1_BASE_ADDR + R_SRR_OFFSET, ENABLE_CANFD); +} + +static void generate_random_data(uint32_t *buf_tx, bool is_canfd_frame) +{ + /* Generate random TX data for CANFD frame. */ + if (is_canfd_frame) { + for (int i = 0; i < CANFD_FRAME_SIZE - 2; i++) { + buf_tx[2 + i] = rand(); + } + } else { + /* Generate random TX data for CAN frame. */ + for (int i = 0; i < CAN_FRAME_SIZE - 2; i++) { + buf_tx[2 + i] = rand(); + } + } +} + +static void read_data(QTestState *qts, uint64_t can_base_addr, uint32_t *buf_rx) +{ + uint32_t int_status; + uint32_t fifo_status_reg_value; + /* At which RX FIFO the received data is stored. */ + uint8_t store_ind = 0; + bool is_canfd_frame = false; + + /* Read the interrupt on CANFD rx. */ + int_status = qtest_readl(qts, can_base_addr + R_ISR_OFFSET) & ISR_RXOK; + + g_assert_cmpint(int_status, ==, ISR_RXOK); + + /* Find the fill level and read index. */ + fifo_status_reg_value = qtest_readl(qts, can_base_addr + + RX_FIFO_STATUS_REGISTER); + + store_ind = (fifo_status_reg_value & FIFO_STATUS_READ_INDEX_MASK) + + ((fifo_status_reg_value & FIFO_STATUS_FILL_LEVEL_MASK) >> + FILL_LEVEL_SHIFT); + + g_assert_cmpint(store_ind, ==, FIRST_RX_STORE_INDEX); + + /* Read the RX register data for CANFD. */ + buf_rx[0] = qtest_readl(qts, can_base_addr + R_RX0_ID_OFFSET); + buf_rx[1] = qtest_readl(qts, can_base_addr + R_RX0_DLC_OFFSET); + + is_canfd_frame = (buf_rx[1] >> DLC_FD_BIT_SHIFT) & 1; + + if (is_canfd_frame) { + for (int i = 0; i < CANFD_FRAME_SIZE - 2; i++) { + buf_rx[i + 2] = qtest_readl(qts, + can_base_addr + R_RX0_DATA1_OFFSET + 4 * i); + } + } else { + buf_rx[2] = qtest_readl(qts, can_base_addr + R_RX0_DATA1_OFFSET); + buf_rx[3] = qtest_readl(qts, can_base_addr + R_RX0_DATA2_OFFSET); + } + + /* Clear the RX interrupt. */ + qtest_writel(qts, CANFD1_BASE_ADDR + R_ICR_OFFSET, ISR_RXOK); +} + +static void write_data(QTestState *qts, uint64_t can_base_addr, + const uint32_t *buf_tx, bool is_canfd_frame) +{ + /* Write the TX register data for CANFD. */ + qtest_writel(qts, can_base_addr + R_TXID_OFFSET, buf_tx[0]); + qtest_writel(qts, can_base_addr + R_TXDLC_OFFSET, buf_tx[1]); + + if (is_canfd_frame) { + for (int i = 0; i < CANFD_FRAME_SIZE - 2; i++) { + qtest_writel(qts, can_base_addr + R_TXDATA1_OFFSET + 4 * i, + buf_tx[2 + i]); + } + } else { + qtest_writel(qts, can_base_addr + R_TXDATA1_OFFSET, buf_tx[2]); + qtest_writel(qts, can_base_addr + R_TXDATA2_OFFSET, buf_tx[3]); + } +} + +static void send_data(QTestState *qts, uint64_t can_base_addr) +{ + uint32_t int_status; + + qtest_writel(qts, can_base_addr + R_TX_READY_REQ_REGISTER, + TX_READY_REG_VAL); + + /* Read the interrupt on CANFD for tx. */ + int_status = qtest_readl(qts, can_base_addr + R_ISR_OFFSET) & ISR_TXOK; + + g_assert_cmpint(int_status, ==, ISR_TXOK); + + /* Clear the interrupt for tx. */ + qtest_writel(qts, CANFD0_BASE_ADDR + R_ICR_OFFSET, ISR_TXOK); +} + +static void match_rx_tx_data(const uint32_t *buf_tx, const uint32_t *buf_rx, + bool is_canfd_frame) +{ + uint16_t size = 0; + uint8_t len = CAN_FRAME_SIZE; + + if (is_canfd_frame) { + len = CANFD_FRAME_SIZE; + } + + while (size < len) { + if (R_RX0_ID_OFFSET + 4 * size == R_RX0_DLC_OFFSET) { + g_assert_cmpint((buf_rx[size] & DLC_FD_BIT_MASK), ==, + (buf_tx[size] & DLC_FD_BIT_MASK)); + } else { + if (!is_canfd_frame && size == 4) { + break; + } + + g_assert_cmpint(buf_rx[size], ==, buf_tx[size]); + } + + size++; + } +} +/* + * Xilinx CANFD supports both CAN and CANFD frames. This test will be + * transferring CAN frame i.e. 8 bytes of data from CANFD0 and CANFD1 through + * canbus. CANFD0 initiate the data transfer to can-bus, CANFD1 receives the + * data. Test compares the can frame data sent from CANFD0 and received on + * CANFD1. + */ +static void test_can_data_transfer(void) +{ + uint32_t buf_tx[CAN_FRAME_SIZE] = { 0x5a5bb9a4, 0x80000000, + 0x12345678, 0x87654321 }; + uint32_t buf_rx[CAN_FRAME_SIZE] = { 0x00, 0x00, 0x00, 0x00 }; + uint32_t status = 0; + + generate_random_data(buf_tx, false); + + QTestState *qts = qtest_init("-machine xlnx-versal-virt" + " -object can-bus,id=canbus" + " -machine canbus0=canbus" + " -machine canbus1=canbus" + ); + + configure_canfd(qts, MSR_NORMAL_MODE); + + /* Check if CANFD0 and CANFD1 are in Normal mode. */ + status = qtest_readl(qts, CANFD0_BASE_ADDR + R_SR_OFFSET); + status = status & STATUS_REG_MASK; + g_assert_cmpint(status, ==, STATUS_NORMAL_MODE); + + status = qtest_readl(qts, CANFD1_BASE_ADDR + R_SR_OFFSET); + status = status & STATUS_REG_MASK; + g_assert_cmpint(status, ==, STATUS_NORMAL_MODE); + + write_data(qts, CANFD0_BASE_ADDR, buf_tx, false); + + send_data(qts, CANFD0_BASE_ADDR); + read_data(qts, CANFD1_BASE_ADDR, buf_rx); + match_rx_tx_data(buf_tx, buf_rx, false); + + qtest_quit(qts); +} + +/* + * This test will be transferring CANFD frame i.e. 64 bytes of data from CANFD0 + * and CANFD1 through canbus. CANFD0 initiate the data transfer to can-bus, + * CANFD1 receives the data. Test compares the CANFD frame data sent from CANFD0 + * with received on CANFD1. + */ +static void test_canfd_data_transfer(void) +{ + uint32_t buf_tx[CANFD_FRAME_SIZE] = { 0x5a5bb9a4, 0xf8000000 }; + uint32_t buf_rx[CANFD_FRAME_SIZE] = { 0x00, 0x00, 0x00, 0x00 }; + uint32_t status = 0; + + generate_random_data(buf_tx, true); + + QTestState *qts = qtest_init("-machine xlnx-versal-virt" + " -object can-bus,id=canbus" + " -machine canbus0=canbus" + " -machine canbus1=canbus" + ); + + configure_canfd(qts, MSR_NORMAL_MODE); + + /* Check if CANFD0 and CANFD1 are in Normal mode. */ + status = qtest_readl(qts, CANFD0_BASE_ADDR + R_SR_OFFSET); + status = status & STATUS_REG_MASK; + g_assert_cmpint(status, ==, STATUS_NORMAL_MODE); + + status = qtest_readl(qts, CANFD1_BASE_ADDR + R_SR_OFFSET); + status = status & STATUS_REG_MASK; + g_assert_cmpint(status, ==, STATUS_NORMAL_MODE); + + write_data(qts, CANFD0_BASE_ADDR, buf_tx, true); + + send_data(qts, CANFD0_BASE_ADDR); + read_data(qts, CANFD1_BASE_ADDR, buf_rx); + match_rx_tx_data(buf_tx, buf_rx, true); + + qtest_quit(qts); +} + +/* + * This test is performing loopback mode on CANFD0 and CANFD1. Data sent from + * TX of each CANFD0 and CANFD1 are compared with RX register data for + * respective CANFD Controller. + */ +static void test_can_loopback(void) +{ + uint32_t buf_tx[CANFD_FRAME_SIZE] = { 0x5a5bb9a4, 0xf8000000 }; + uint32_t buf_rx[CANFD_FRAME_SIZE] = { 0x00, 0x00, 0x00, 0x00 }; + uint32_t status = 0; + + generate_random_data(buf_tx, true); + + QTestState *qts = qtest_init("-machine xlnx-versal-virt" + " -object can-bus,id=canbus" + " -machine canbus0=canbus" + " -machine canbus1=canbus" + ); + + configure_canfd(qts, MSR_LOOPBACK_MODE); + + /* Check if CANFD0 and CANFD1 are set in correct loopback mode. */ + status = qtest_readl(qts, CANFD0_BASE_ADDR + R_SR_OFFSET); + status = status & STATUS_REG_MASK; + g_assert_cmpint(status, ==, STATUS_LOOPBACK_MODE); + + status = qtest_readl(qts, CANFD1_BASE_ADDR + R_SR_OFFSET); + status = status & STATUS_REG_MASK; + g_assert_cmpint(status, ==, STATUS_LOOPBACK_MODE); + + write_data(qts, CANFD0_BASE_ADDR, buf_tx, true); + + send_data(qts, CANFD0_BASE_ADDR); + read_data(qts, CANFD0_BASE_ADDR, buf_rx); + match_rx_tx_data(buf_tx, buf_rx, true); + + generate_random_data(buf_tx, true); + + write_data(qts, CANFD1_BASE_ADDR, buf_tx, true); + + send_data(qts, CANFD1_BASE_ADDR); + read_data(qts, CANFD1_BASE_ADDR, buf_rx); + match_rx_tx_data(buf_tx, buf_rx, true); + + qtest_quit(qts); +} + +int main(int argc, char **argv) +{ + g_test_init(&argc, &argv, NULL); + + qtest_add_func("/net/canfd/can_data_transfer", test_can_data_transfer); + qtest_add_func("/net/canfd/canfd_data_transfer", test_canfd_data_transfer); + qtest_add_func("/net/canfd/can_loopback", test_can_loopback); + + return g_test_run(); +} diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build index XXXXXXX..XXXXXXX 100644 --- a/tests/qtest/meson.build +++ b/tests/qtest/meson.build @@ -XXX,XX +XXX,XX @@ qtests_aarch64 = \ (config_all.has_key('CONFIG_TCG') and config_all_devices.has_key('CONFIG_TPM_TIS_SYSBUS') ? \ ['tpm-tis-device-test', 'tpm-tis-device-swtpm-test'] : []) + \ (config_all_devices.has_key('CONFIG_XLNX_ZYNQMP_ARM') ? ['xlnx-can-test', 'fuzz-xlnx-dp-test'] : []) + \ + (config_all_devices.has_key('CONFIG_XLNX_VERSAL') ? ['xlnx-canfd-test'] : []) + \ (config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) + \ (config_all.has_key('CONFIG_TCG') and \ config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : []) + \ -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> Allwinner R40 (sun8i) SoC features a Quad-Core Cortex-A7 ARM CPU, and a Mali400 MP2 GPU from ARM. It's also known as the Allwinner T3 for In-Car Entertainment usage, A40i and A40pro are variants that differ in applicable temperatures range (industrial and military). Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/allwinner-r40.h | 110 +++++++++ hw/arm/allwinner-r40.c | 415 +++++++++++++++++++++++++++++++++ hw/arm/bananapi_m2u.c | 129 ++++++++++ hw/arm/Kconfig | 10 + hw/arm/meson.build | 1 + 5 files changed, 665 insertions(+) create mode 100644 include/hw/arm/allwinner-r40.h create mode 100644 hw/arm/allwinner-r40.c create mode 100644 hw/arm/bananapi_m2u.c diff --git a/include/hw/arm/allwinner-r40.h b/include/hw/arm/allwinner-r40.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/hw/arm/allwinner-r40.h @@ -XXX,XX +XXX,XX @@ +/* + * Allwinner R40/A40i/T3 System on Chip emulation + * + * Copyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef HW_ARM_ALLWINNER_R40_H +#define HW_ARM_ALLWINNER_R40_H + +#include "qom/object.h" +#include "hw/arm/boot.h" +#include "hw/timer/allwinner-a10-pit.h" +#include "hw/intc/arm_gic.h" +#include "hw/sd/allwinner-sdhost.h" +#include "target/arm/cpu.h" +#include "sysemu/block-backend.h" + +enum { + AW_R40_DEV_SRAM_A1, + AW_R40_DEV_SRAM_A2, + AW_R40_DEV_SRAM_A3, + AW_R40_DEV_SRAM_A4, + AW_R40_DEV_MMC0, + AW_R40_DEV_MMC1, + AW_R40_DEV_MMC2, + AW_R40_DEV_MMC3, + AW_R40_DEV_CCU, + AW_R40_DEV_PIT, + AW_R40_DEV_UART0, + AW_R40_DEV_GIC_DIST, + AW_R40_DEV_GIC_CPU, + AW_R40_DEV_GIC_HYP, + AW_R40_DEV_GIC_VCPU, + AW_R40_DEV_SDRAM +}; + +#define AW_R40_NUM_CPUS (4) + +/** + * Allwinner R40 object model + * @{ + */ + +/** Object type for the Allwinner R40 SoC */ +#define TYPE_AW_R40 "allwinner-r40" + +/** Convert input object to Allwinner R40 state object */ +OBJECT_DECLARE_SIMPLE_TYPE(AwR40State, AW_R40) + +/** @} */ + +/** + * Allwinner R40 object + * + * This struct contains the state of all the devices + * which are currently emulated by the R40 SoC code. + */ +#define AW_R40_NUM_MMCS 4 + +struct AwR40State { + /*< private >*/ + DeviceState parent_obj; + /*< public >*/ + + ARMCPU cpus[AW_R40_NUM_CPUS]; + const hwaddr *memmap; + AwA10PITState timer; + AwSdHostState mmc[AW_R40_NUM_MMCS]; + GICState gic; + MemoryRegion sram_a1; + MemoryRegion sram_a2; + MemoryRegion sram_a3; + MemoryRegion sram_a4; +}; + +/** + * Emulate Boot ROM firmware setup functionality. + * + * A real Allwinner R40 SoC contains a Boot ROM + * which is the first code that runs right after + * the SoC is powered on. The Boot ROM is responsible + * for loading user code (e.g. a bootloader) from any + * of the supported external devices and writing the + * downloaded code to internal SRAM. After loading the SoC + * begins executing the code written to SRAM. + * + * This function emulates the Boot ROM by copying 32 KiB + * of data from the given block device and writes it to + * the start of the first internal SRAM memory. + * + * @s: Allwinner R40 state object pointer + * @blk: Block backend device object pointer + * @unit: the mmc control's unit + */ +bool allwinner_r40_bootrom_setup(AwR40State *s, BlockBackend *blk, int unit); + +#endif /* HW_ARM_ALLWINNER_R40_H */ diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/arm/allwinner-r40.c @@ -XXX,XX +XXX,XX @@ +/* + * Allwinner R40/A40i/T3 System on Chip emulation + * + * Copyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "qemu/error-report.h" +#include "qemu/bswap.h" +#include "qemu/module.h" +#include "qemu/units.h" +#include "hw/qdev-core.h" +#include "hw/sysbus.h" +#include "hw/char/serial.h" +#include "hw/misc/unimp.h" +#include "hw/usb/hcd-ehci.h" +#include "hw/loader.h" +#include "sysemu/sysemu.h" +#include "hw/arm/allwinner-r40.h" + +/* Memory map */ +const hwaddr allwinner_r40_memmap[] = { + [AW_R40_DEV_SRAM_A1] = 0x00000000, + [AW_R40_DEV_SRAM_A2] = 0x00004000, + [AW_R40_DEV_SRAM_A3] = 0x00008000, + [AW_R40_DEV_SRAM_A4] = 0x0000b400, + [AW_R40_DEV_MMC0] = 0x01c0f000, + [AW_R40_DEV_MMC1] = 0x01c10000, + [AW_R40_DEV_MMC2] = 0x01c11000, + [AW_R40_DEV_MMC3] = 0x01c12000, + [AW_R40_DEV_PIT] = 0x01c20c00, + [AW_R40_DEV_UART0] = 0x01c28000, + [AW_R40_DEV_GIC_DIST] = 0x01c81000, + [AW_R40_DEV_GIC_CPU] = 0x01c82000, + [AW_R40_DEV_GIC_HYP] = 0x01c84000, + [AW_R40_DEV_GIC_VCPU] = 0x01c86000, + [AW_R40_DEV_SDRAM] = 0x40000000 +}; + +/* List of unimplemented devices */ +struct AwR40Unimplemented { + const char *device_name; + hwaddr base; + hwaddr size; +}; + +static struct AwR40Unimplemented r40_unimplemented[] = { + { "d-engine", 0x01000000, 4 * MiB }, + { "d-inter", 0x01400000, 128 * KiB }, + { "sram-c", 0x01c00000, 4 * KiB }, + { "dma", 0x01c02000, 4 * KiB }, + { "nfdc", 0x01c03000, 4 * KiB }, + { "ts", 0x01c04000, 4 * KiB }, + { "spi0", 0x01c05000, 4 * KiB }, + { "spi1", 0x01c06000, 4 * KiB }, + { "cs0", 0x01c09000, 4 * KiB }, + { "keymem", 0x01c0a000, 4 * KiB }, + { "emac", 0x01c0b000, 4 * KiB }, + { "usb0-otg", 0x01c13000, 4 * KiB }, + { "usb0-host", 0x01c14000, 4 * KiB }, + { "crypto", 0x01c15000, 4 * KiB }, + { "spi2", 0x01c17000, 4 * KiB }, + { "sata", 0x01c18000, 4 * KiB }, + { "usb1-host", 0x01c19000, 4 * KiB }, + { "sid", 0x01c1b000, 4 * KiB }, + { "usb2-host", 0x01c1c000, 4 * KiB }, + { "cs1", 0x01c1d000, 4 * KiB }, + { "spi3", 0x01c1f000, 4 * KiB }, + { "ccu", 0x01c20000, 1 * KiB }, + { "rtc", 0x01c20400, 1 * KiB }, + { "pio", 0x01c20800, 1 * KiB }, + { "owa", 0x01c21000, 1 * KiB }, + { "ac97", 0x01c21400, 1 * KiB }, + { "cir0", 0x01c21800, 1 * KiB }, + { "cir1", 0x01c21c00, 1 * KiB }, + { "pcm0", 0x01c22000, 1 * KiB }, + { "pcm1", 0x01c22400, 1 * KiB }, + { "pcm2", 0x01c22800, 1 * KiB }, + { "audio", 0x01c22c00, 1 * KiB }, + { "keypad", 0x01c23000, 1 * KiB }, + { "pwm", 0x01c23400, 1 * KiB }, + { "keyadc", 0x01c24400, 1 * KiB }, + { "ths", 0x01c24c00, 1 * KiB }, + { "rtp", 0x01c25000, 1 * KiB }, + { "pmu", 0x01c25400, 1 * KiB }, + { "cpu-cfg", 0x01c25c00, 1 * KiB }, + { "uart0", 0x01c28000, 1 * KiB }, + { "uart1", 0x01c28400, 1 * KiB }, + { "uart2", 0x01c28800, 1 * KiB }, + { "uart3", 0x01c28c00, 1 * KiB }, + { "uart4", 0x01c29000, 1 * KiB }, + { "uart5", 0x01c29400, 1 * KiB }, + { "uart6", 0x01c29800, 1 * KiB }, + { "uart7", 0x01c29c00, 1 * KiB }, + { "ps20", 0x01c2a000, 1 * KiB }, + { "ps21", 0x01c2a400, 1 * KiB }, + { "twi0", 0x01c2ac00, 1 * KiB }, + { "twi1", 0x01c2b000, 1 * KiB }, + { "twi2", 0x01c2b400, 1 * KiB }, + { "twi3", 0x01c2b800, 1 * KiB }, + { "twi4", 0x01c2c000, 1 * KiB }, + { "scr", 0x01c2c400, 1 * KiB }, + { "tvd-top", 0x01c30000, 4 * KiB }, + { "tvd0", 0x01c31000, 4 * KiB }, + { "tvd1", 0x01c32000, 4 * KiB }, + { "tvd2", 0x01c33000, 4 * KiB }, + { "tvd3", 0x01c34000, 4 * KiB }, + { "gpu", 0x01c40000, 64 * KiB }, + { "gmac", 0x01c50000, 64 * KiB }, + { "hstmr", 0x01c60000, 4 * KiB }, + { "dram-com", 0x01c62000, 4 * KiB }, + { "dram-ctl", 0x01c63000, 4 * KiB }, + { "tcon-top", 0x01c70000, 4 * KiB }, + { "lcd0", 0x01c71000, 4 * KiB }, + { "lcd1", 0x01c72000, 4 * KiB }, + { "tv0", 0x01c73000, 4 * KiB }, + { "tv1", 0x01c74000, 4 * KiB }, + { "tve-top", 0x01c90000, 16 * KiB }, + { "tve0", 0x01c94000, 16 * KiB }, + { "tve1", 0x01c98000, 16 * KiB }, + { "mipi_dsi", 0x01ca0000, 4 * KiB }, + { "mipi_dphy", 0x01ca1000, 4 * KiB }, + { "ve", 0x01d00000, 1024 * KiB }, + { "mp", 0x01e80000, 128 * KiB }, + { "hdmi", 0x01ee0000, 128 * KiB }, + { "prcm", 0x01f01400, 1 * KiB }, + { "debug", 0x3f500000, 64 * KiB }, + { "cpubist", 0x3f501000, 4 * KiB }, + { "dcu", 0x3fff0000, 64 * KiB }, + { "hstmr", 0x01c60000, 4 * KiB }, + { "brom", 0xffff0000, 36 * KiB } +}; + +/* Per Processor Interrupts */ +enum { + AW_R40_GIC_PPI_MAINT = 9, + AW_R40_GIC_PPI_HYPTIMER = 10, + AW_R40_GIC_PPI_VIRTTIMER = 11, + AW_R40_GIC_PPI_SECTIMER = 13, + AW_R40_GIC_PPI_PHYSTIMER = 14 +}; + +/* Shared Processor Interrupts */ +enum { + AW_R40_GIC_SPI_UART0 = 1, + AW_R40_GIC_SPI_TIMER0 = 22, + AW_R40_GIC_SPI_TIMER1 = 23, + AW_R40_GIC_SPI_MMC0 = 32, + AW_R40_GIC_SPI_MMC1 = 33, + AW_R40_GIC_SPI_MMC2 = 34, + AW_R40_GIC_SPI_MMC3 = 35, +}; + +/* Allwinner R40 general constants */ +enum { + AW_R40_GIC_NUM_SPI = 128 +}; + +#define BOOT0_MAGIC "eGON.BT0" + +/* The low 8-bits of the 'boot_media' field in the SPL header */ +#define SUNXI_BOOTED_FROM_MMC0 0 +#define SUNXI_BOOTED_FROM_NAND 1 +#define SUNXI_BOOTED_FROM_MMC2 2 +#define SUNXI_BOOTED_FROM_SPI 3 + +struct boot_file_head { + uint32_t b_instruction; + uint8_t magic[8]; + uint32_t check_sum; + uint32_t length; + uint32_t pub_head_size; + uint32_t fel_script_address; + uint32_t fel_uEnv_length; + uint32_t dt_name_offset; + uint32_t dram_size; + uint32_t boot_media; + uint32_t string_pool[13]; +}; + +bool allwinner_r40_bootrom_setup(AwR40State *s, BlockBackend *blk, int unit) +{ + const int64_t rom_size = 32 * KiB; + g_autofree uint8_t *buffer = g_new0(uint8_t, rom_size); + struct boot_file_head *head = (struct boot_file_head *)buffer; + + if (blk_pread(blk, 8 * KiB, rom_size, buffer, 0) < 0) { + error_setg(&error_fatal, "%s: failed to read BlockBackend data", + __func__); + return false; + } + + /* we only check the magic string here. */ + if (memcmp(head->magic, BOOT0_MAGIC, sizeof(head->magic))) { + return false; + } + + /* + * Simulate the behavior of the bootROM, it will change the boot_media + * flag to indicate where the chip is booting from. R40 can boot from + * mmc0 or mmc2, the default value of boot_media is zero + * (SUNXI_BOOTED_FROM_MMC0), let's fix this flag when it is booting from + * the others. + */ + if (unit == 2) { + head->boot_media = cpu_to_le32(SUNXI_BOOTED_FROM_MMC2); + } else { + head->boot_media = cpu_to_le32(SUNXI_BOOTED_FROM_MMC0); + } + + rom_add_blob("allwinner-r40.bootrom", buffer, rom_size, + rom_size, s->memmap[AW_R40_DEV_SRAM_A1], + NULL, NULL, NULL, NULL, false); + return true; +} + +static void allwinner_r40_init(Object *obj) +{ + static const char *mmc_names[AW_R40_NUM_MMCS] = { + "mmc0", "mmc1", "mmc2", "mmc3" + }; + AwR40State *s = AW_R40(obj); + + s->memmap = allwinner_r40_memmap; + + for (int i = 0; i < AW_R40_NUM_CPUS; i++) { + object_initialize_child(obj, "cpu[*]", &s->cpus[i], + ARM_CPU_TYPE_NAME("cortex-a7")); + } + + object_initialize_child(obj, "gic", &s->gic, TYPE_ARM_GIC); + + object_initialize_child(obj, "timer", &s->timer, TYPE_AW_A10_PIT); + object_property_add_alias(obj, "clk0-freq", OBJECT(&s->timer), + "clk0-freq"); + object_property_add_alias(obj, "clk1-freq", OBJECT(&s->timer), + "clk1-freq"); + + for (int i = 0; i < AW_R40_NUM_MMCS; i++) { + object_initialize_child(obj, mmc_names[i], &s->mmc[i], + TYPE_AW_SDHOST_SUN5I); + } +} + +static void allwinner_r40_realize(DeviceState *dev, Error **errp) +{ + AwR40State *s = AW_R40(dev); + unsigned i; + + /* CPUs */ + for (i = 0; i < AW_R40_NUM_CPUS; i++) { + + /* + * Disable secondary CPUs. Guest EL3 firmware will start + * them via CPU reset control registers. + */ + qdev_prop_set_bit(DEVICE(&s->cpus[i]), "start-powered-off", + i > 0); + + /* All exception levels required */ + qdev_prop_set_bit(DEVICE(&s->cpus[i]), "has_el3", true); + qdev_prop_set_bit(DEVICE(&s->cpus[i]), "has_el2", true); + + /* Mark realized */ + qdev_realize(DEVICE(&s->cpus[i]), NULL, &error_fatal); + } + + /* Generic Interrupt Controller */ + qdev_prop_set_uint32(DEVICE(&s->gic), "num-irq", AW_R40_GIC_NUM_SPI + + GIC_INTERNAL); + qdev_prop_set_uint32(DEVICE(&s->gic), "revision", 2); + qdev_prop_set_uint32(DEVICE(&s->gic), "num-cpu", AW_R40_NUM_CPUS); + qdev_prop_set_bit(DEVICE(&s->gic), "has-security-extensions", false); + qdev_prop_set_bit(DEVICE(&s->gic), "has-virtualization-extensions", true); + sysbus_realize(SYS_BUS_DEVICE(&s->gic), &error_fatal); + + sysbus_mmio_map(SYS_BUS_DEVICE(&s->gic), 0, s->memmap[AW_R40_DEV_GIC_DIST]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->gic), 1, s->memmap[AW_R40_DEV_GIC_CPU]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->gic), 2, s->memmap[AW_R40_DEV_GIC_HYP]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->gic), 3, s->memmap[AW_R40_DEV_GIC_VCPU]); + + /* + * Wire the outputs from each CPU's generic timer and the GICv2 + * maintenance interrupt signal to the appropriate GIC PPI inputs, + * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's inputs. + */ + for (i = 0; i < AW_R40_NUM_CPUS; i++) { + DeviceState *cpudev = DEVICE(&s->cpus[i]); + int ppibase = AW_R40_GIC_NUM_SPI + i * GIC_INTERNAL + GIC_NR_SGIS; + int irq; + /* + * Mapping from the output timer irq lines from the CPU to the + * GIC PPI inputs used for this board. + */ + const int timer_irq[] = { + [GTIMER_PHYS] = AW_R40_GIC_PPI_PHYSTIMER, + [GTIMER_VIRT] = AW_R40_GIC_PPI_VIRTTIMER, + [GTIMER_HYP] = AW_R40_GIC_PPI_HYPTIMER, + [GTIMER_SEC] = AW_R40_GIC_PPI_SECTIMER, + }; + + /* Connect CPU timer outputs to GIC PPI inputs */ + for (irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) { + qdev_connect_gpio_out(cpudev, irq, + qdev_get_gpio_in(DEVICE(&s->gic), + ppibase + timer_irq[irq])); + } + + /* Connect GIC outputs to CPU interrupt inputs */ + sysbus_connect_irq(SYS_BUS_DEVICE(&s->gic), i, + qdev_get_gpio_in(cpudev, ARM_CPU_IRQ)); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->gic), i + AW_R40_NUM_CPUS, + qdev_get_gpio_in(cpudev, ARM_CPU_FIQ)); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->gic), i + (2 * AW_R40_NUM_CPUS), + qdev_get_gpio_in(cpudev, ARM_CPU_VIRQ)); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->gic), i + (3 * AW_R40_NUM_CPUS), + qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ)); + + /* GIC maintenance signal */ + sysbus_connect_irq(SYS_BUS_DEVICE(&s->gic), i + (4 * AW_R40_NUM_CPUS), + qdev_get_gpio_in(DEVICE(&s->gic), + ppibase + AW_R40_GIC_PPI_MAINT)); + } + + /* Timer */ + sysbus_realize(SYS_BUS_DEVICE(&s->timer), &error_fatal); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->timer), 0, s->memmap[AW_R40_DEV_PIT]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->timer), 0, + qdev_get_gpio_in(DEVICE(&s->gic), + AW_R40_GIC_SPI_TIMER0)); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->timer), 1, + qdev_get_gpio_in(DEVICE(&s->gic), + AW_R40_GIC_SPI_TIMER1)); + + /* SRAM */ + memory_region_init_ram(&s->sram_a1, OBJECT(dev), "sram A1", + 16 * KiB, &error_abort); + memory_region_init_ram(&s->sram_a2, OBJECT(dev), "sram A2", + 16 * KiB, &error_abort); + memory_region_init_ram(&s->sram_a3, OBJECT(dev), "sram A3", + 13 * KiB, &error_abort); + memory_region_init_ram(&s->sram_a4, OBJECT(dev), "sram A4", + 3 * KiB, &error_abort); + memory_region_add_subregion(get_system_memory(), + s->memmap[AW_R40_DEV_SRAM_A1], &s->sram_a1); + memory_region_add_subregion(get_system_memory(), + s->memmap[AW_R40_DEV_SRAM_A2], &s->sram_a2); + memory_region_add_subregion(get_system_memory(), + s->memmap[AW_R40_DEV_SRAM_A3], &s->sram_a3); + memory_region_add_subregion(get_system_memory(), + s->memmap[AW_R40_DEV_SRAM_A4], &s->sram_a4); + + /* SD/MMC */ + for (int i = 0; i < AW_R40_NUM_MMCS; i++) { + qemu_irq irq = qdev_get_gpio_in(DEVICE(&s->gic), + AW_R40_GIC_SPI_MMC0 + i); + const hwaddr addr = s->memmap[AW_R40_DEV_MMC0 + i]; + + object_property_set_link(OBJECT(&s->mmc[i]), "dma-memory", + OBJECT(get_system_memory()), &error_fatal); + sysbus_realize(SYS_BUS_DEVICE(&s->mmc[i]), &error_fatal); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->mmc[i]), 0, addr); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->mmc[i]), 0, irq); + } + + /* UART0. For future clocktree API: All UARTS are connected to APB2_CLK. */ + serial_mm_init(get_system_memory(), s->memmap[AW_R40_DEV_UART0], 2, + qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_UART0), + 115200, serial_hd(0), DEVICE_NATIVE_ENDIAN); + + /* Unimplemented devices */ + for (i = 0; i < ARRAY_SIZE(r40_unimplemented); i++) { + create_unimplemented_device(r40_unimplemented[i].device_name, + r40_unimplemented[i].base, + r40_unimplemented[i].size); + } +} + +static void allwinner_r40_class_init(ObjectClass *oc, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(oc); + + dc->realize = allwinner_r40_realize; + /* Reason: uses serial_hd() in realize function */ + dc->user_creatable = false; +} + +static const TypeInfo allwinner_r40_type_info = { + .name = TYPE_AW_R40, + .parent = TYPE_DEVICE, + .instance_size = sizeof(AwR40State), + .instance_init = allwinner_r40_init, + .class_init = allwinner_r40_class_init, +}; + +static void allwinner_r40_register_types(void) +{ + type_register_static(&allwinner_r40_type_info); +} + +type_init(allwinner_r40_register_types) diff --git a/hw/arm/bananapi_m2u.c b/hw/arm/bananapi_m2u.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/arm/bananapi_m2u.c @@ -XXX,XX +XXX,XX @@ +/* + * Bananapi M2U emulation + * + * Copyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include "qemu/osdep.h" +#include "qemu/units.h" +#include "exec/address-spaces.h" +#include "qapi/error.h" +#include "qemu/error-report.h" +#include "hw/boards.h" +#include "hw/qdev-properties.h" +#include "hw/arm/allwinner-r40.h" + +static struct arm_boot_info bpim2u_binfo; + +/* + * R40 can boot from mmc0 and mmc2, and bpim2u has two mmc interface, one is + * connected to sdcard and another mount an emmc media. + * Attach the mmc driver and try loading bootloader. + */ +static void mmc_attach_drive(AwR40State *s, AwSdHostState *mmc, int unit, + bool load_bootroom, bool *bootroom_loaded) +{ + DriveInfo *di = drive_get(IF_SD, 0, unit); + BlockBackend *blk = di ? blk_by_legacy_dinfo(di) : NULL; + BusState *bus; + DeviceState *carddev; + + bus = qdev_get_child_bus(DEVICE(mmc), "sd-bus"); + if (bus == NULL) { + error_report("No SD bus found in SOC object"); + exit(1); + } + + carddev = qdev_new(TYPE_SD_CARD); + qdev_prop_set_drive_err(carddev, "drive", blk, &error_fatal); + qdev_realize_and_unref(carddev, bus, &error_fatal); + + if (load_bootroom && blk && blk_is_available(blk)) { + /* Use Boot ROM to copy data from SD card to SRAM */ + *bootroom_loaded = allwinner_r40_bootrom_setup(s, blk, unit); + } +} + +static void bpim2u_init(MachineState *machine) +{ + bool bootroom_loaded = false; + AwR40State *r40; + + /* BIOS is not supported by this board */ + if (machine->firmware) { + error_report("BIOS not supported for this machine"); + exit(1); + } + + /* Only allow Cortex-A7 for this board */ + if (strcmp(machine->cpu_type, ARM_CPU_TYPE_NAME("cortex-a7")) != 0) { + error_report("This board can only be used with cortex-a7 CPU"); + exit(1); + } + + r40 = AW_R40(object_new(TYPE_AW_R40)); + object_property_add_child(OBJECT(machine), "soc", OBJECT(r40)); + object_unref(OBJECT(r40)); + + /* Setup timer properties */ + object_property_set_int(OBJECT(r40), "clk0-freq", 32768, &error_abort); + object_property_set_int(OBJECT(r40), "clk1-freq", 24 * 1000 * 1000, + &error_abort); + + /* Mark R40 object realized */ + qdev_realize(DEVICE(r40), NULL, &error_abort); + + /* + * Plug in SD card and try load bootrom, R40 has 4 mmc controllers but can + * only booting from mmc0 and mmc2. + */ + for (int i = 0; i < AW_R40_NUM_MMCS; i++) { + switch (i) { + case 0: + case 2: + mmc_attach_drive(r40, &r40->mmc[i], i, + !machine->kernel_filename && !bootroom_loaded, + &bootroom_loaded); + break; + default: + mmc_attach_drive(r40, &r40->mmc[i], i, false, NULL); + break; + } + } + + /* SDRAM */ + memory_region_add_subregion(get_system_memory(), + r40->memmap[AW_R40_DEV_SDRAM], machine->ram); + + bpim2u_binfo.loader_start = r40->memmap[AW_R40_DEV_SDRAM]; + bpim2u_binfo.ram_size = machine->ram_size; + bpim2u_binfo.psci_conduit = QEMU_PSCI_CONDUIT_SMC; + arm_load_kernel(ARM_CPU(first_cpu), machine, &bpim2u_binfo); +} + +static void bpim2u_machine_init(MachineClass *mc) +{ + mc->desc = "Bananapi M2U (Cortex-A7)"; + mc->init = bpim2u_init; + mc->min_cpus = AW_R40_NUM_CPUS; + mc->max_cpus = AW_R40_NUM_CPUS; + mc->default_cpus = AW_R40_NUM_CPUS; + mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a7"); + mc->default_ram_size = 1 * GiB; + mc->default_ram_id = "bpim2u.ram"; +} + +DEFINE_MACHINE("bpim2u", bpim2u_machine_init) diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -XXX,XX +XXX,XX @@ config ALLWINNER_H3 select USB_EHCI_SYSBUS select SD +config ALLWINNER_R40 + bool + default y if TCG && ARM + select ALLWINNER_A10_PIT + select SERIAL + select ARM_TIMER + select ARM_GIC + select UNIMP + select SD + config RASPI bool default y diff --git a/hw/arm/meson.build b/hw/arm/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/meson.build +++ b/hw/arm/meson.build @@ -XXX,XX +XXX,XX @@ arm_ss.add(when: 'CONFIG_OMAP', if_true: files('omap1.c', 'omap2.c')) arm_ss.add(when: 'CONFIG_STRONGARM', if_true: files('strongarm.c')) arm_ss.add(when: 'CONFIG_ALLWINNER_A10', if_true: files('allwinner-a10.c', 'cubieboard.c')) arm_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-h3.c', 'orangepi.c')) +arm_ss.add(when: 'CONFIG_ALLWINNER_R40', if_true: files('allwinner-r40.c', 'bananapi_m2u.c')) arm_ss.add(when: 'CONFIG_RASPI', if_true: files('bcm2836.c', 'raspi.c')) arm_ss.add(when: 'CONFIG_STM32F100_SOC', if_true: files('stm32f100_soc.c')) arm_ss.add(when: 'CONFIG_STM32F205_SOC', if_true: files('stm32f205_soc.c')) -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> The CCU provides the registers to program the PLLs and the controls most of the clock generation, division, distribution, synchronization and gating. This commit adds support for the Clock Control Unit which emulates a simple read/write register interface. Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/allwinner-r40.h | 2 + include/hw/misc/allwinner-r40-ccu.h | 65 +++++++++ hw/arm/allwinner-r40.c | 8 +- hw/misc/allwinner-r40-ccu.c | 209 ++++++++++++++++++++++++++++ hw/misc/meson.build | 1 + 5 files changed, 284 insertions(+), 1 deletion(-) create mode 100644 include/hw/misc/allwinner-r40-ccu.h create mode 100644 hw/misc/allwinner-r40-ccu.c diff --git a/include/hw/arm/allwinner-r40.h b/include/hw/arm/allwinner-r40.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/allwinner-r40.h +++ b/include/hw/arm/allwinner-r40.h @@ -XXX,XX +XXX,XX @@ #include "hw/timer/allwinner-a10-pit.h" #include "hw/intc/arm_gic.h" #include "hw/sd/allwinner-sdhost.h" +#include "hw/misc/allwinner-r40-ccu.h" #include "target/arm/cpu.h" #include "sysemu/block-backend.h" @@ -XXX,XX +XXX,XX @@ struct AwR40State { const hwaddr *memmap; AwA10PITState timer; AwSdHostState mmc[AW_R40_NUM_MMCS]; + AwR40ClockCtlState ccu; GICState gic; MemoryRegion sram_a1; MemoryRegion sram_a2; diff --git a/include/hw/misc/allwinner-r40-ccu.h b/include/hw/misc/allwinner-r40-ccu.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/hw/misc/allwinner-r40-ccu.h @@ -XXX,XX +XXX,XX @@ +/* + * Allwinner R40 Clock Control Unit emulation + * + * Copyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef HW_MISC_ALLWINNER_R40_CCU_H +#define HW_MISC_ALLWINNER_R40_CCU_H + +#include "qom/object.h" +#include "hw/sysbus.h" + +/** + * @name Constants + * @{ + */ + +/** Size of register I/O address space used by CCU device */ +#define AW_R40_CCU_IOSIZE (0x400) + +/** Total number of known registers */ +#define AW_R40_CCU_REGS_NUM (AW_R40_CCU_IOSIZE / sizeof(uint32_t)) + +/** @} */ + +/** + * @name Object model + * @{ + */ + +#define TYPE_AW_R40_CCU "allwinner-r40-ccu" +OBJECT_DECLARE_SIMPLE_TYPE(AwR40ClockCtlState, AW_R40_CCU) + +/** @} */ + +/** + * Allwinner R40 CCU object instance state. + */ +struct AwR40ClockCtlState { + /*< private >*/ + SysBusDevice parent_obj; + /*< public >*/ + + /** Maps I/O registers in physical memory */ + MemoryRegion iomem; + + /** Array of hardware registers */ + uint32_t regs[AW_R40_CCU_REGS_NUM]; + +}; + +#endif /* HW_MISC_ALLWINNER_R40_CCU_H */ diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/allwinner-r40.c +++ b/hw/arm/allwinner-r40.c @@ -XXX,XX +XXX,XX @@ const hwaddr allwinner_r40_memmap[] = { [AW_R40_DEV_MMC1] = 0x01c10000, [AW_R40_DEV_MMC2] = 0x01c11000, [AW_R40_DEV_MMC3] = 0x01c12000, + [AW_R40_DEV_CCU] = 0x01c20000, [AW_R40_DEV_PIT] = 0x01c20c00, [AW_R40_DEV_UART0] = 0x01c28000, [AW_R40_DEV_GIC_DIST] = 0x01c81000, @@ -XXX,XX +XXX,XX @@ static struct AwR40Unimplemented r40_unimplemented[] = { { "usb2-host", 0x01c1c000, 4 * KiB }, { "cs1", 0x01c1d000, 4 * KiB }, { "spi3", 0x01c1f000, 4 * KiB }, - { "ccu", 0x01c20000, 1 * KiB }, { "rtc", 0x01c20400, 1 * KiB }, { "pio", 0x01c20800, 1 * KiB }, { "owa", 0x01c21000, 1 * KiB }, @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_init(Object *obj) object_property_add_alias(obj, "clk1-freq", OBJECT(&s->timer), "clk1-freq"); + object_initialize_child(obj, "ccu", &s->ccu, TYPE_AW_R40_CCU); + for (int i = 0; i < AW_R40_NUM_MMCS; i++) { object_initialize_child(obj, mmc_names[i], &s->mmc[i], TYPE_AW_SDHOST_SUN5I); @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp) memory_region_add_subregion(get_system_memory(), s->memmap[AW_R40_DEV_SRAM_A4], &s->sram_a4); + /* Clock Control Unit */ + sysbus_realize(SYS_BUS_DEVICE(&s->ccu), &error_fatal); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->ccu), 0, s->memmap[AW_R40_DEV_CCU]); + /* SD/MMC */ for (int i = 0; i < AW_R40_NUM_MMCS; i++) { qemu_irq irq = qdev_get_gpio_in(DEVICE(&s->gic), diff --git a/hw/misc/allwinner-r40-ccu.c b/hw/misc/allwinner-r40-ccu.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/misc/allwinner-r40-ccu.c @@ -XXX,XX +XXX,XX @@ +/* + * Allwinner R40 Clock Control Unit emulation + * + * Copyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include "qemu/osdep.h" +#include "qemu/units.h" +#include "hw/sysbus.h" +#include "migration/vmstate.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "hw/misc/allwinner-r40-ccu.h" + +/* CCU register offsets */ +enum { + REG_PLL_CPUX_CTRL = 0x0000, + REG_PLL_AUDIO_CTRL = 0x0008, + REG_PLL_VIDEO0_CTRL = 0x0010, + REG_PLL_VE_CTRL = 0x0018, + REG_PLL_DDR0_CTRL = 0x0020, + REG_PLL_PERIPH0_CTRL = 0x0028, + REG_PLL_PERIPH1_CTRL = 0x002c, + REG_PLL_VIDEO1_CTRL = 0x0030, + REG_PLL_SATA_CTRL = 0x0034, + REG_PLL_GPU_CTRL = 0x0038, + REG_PLL_MIPI_CTRL = 0x0040, + REG_PLL_DE_CTRL = 0x0048, + REG_PLL_DDR1_CTRL = 0x004c, + REG_AHB1_APB1_CFG = 0x0054, + REG_APB2_CFG = 0x0058, + REG_MMC0_CLK = 0x0088, + REG_MMC1_CLK = 0x008c, + REG_MMC2_CLK = 0x0090, + REG_MMC3_CLK = 0x0094, + REG_USBPHY_CFG = 0x00cc, + REG_PLL_DDR_AUX = 0x00f0, + REG_DRAM_CFG = 0x00f4, + REG_PLL_DDR1_CFG = 0x00f8, + REG_DRAM_CLK_GATING = 0x0100, + REG_GMAC_CLK = 0x0164, + REG_SYS_32K_CLK = 0x0310, + REG_PLL_LOCK_CTRL = 0x0320, +}; + +#define REG_INDEX(offset) (offset / sizeof(uint32_t)) + +/* CCU register flags */ +enum { + REG_PLL_ENABLE = (1 << 31), + REG_PLL_LOCK = (1 << 28), +}; + +static uint64_t allwinner_r40_ccu_read(void *opaque, hwaddr offset, + unsigned size) +{ + const AwR40ClockCtlState *s = AW_R40_CCU(opaque); + const uint32_t idx = REG_INDEX(offset); + + switch (offset) { + case 0x324 ... AW_R40_CCU_IOSIZE: + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + return 0; + } + + return s->regs[idx]; +} + +static void allwinner_r40_ccu_write(void *opaque, hwaddr offset, + uint64_t val, unsigned size) +{ + AwR40ClockCtlState *s = AW_R40_CCU(opaque); + + switch (offset) { + case REG_DRAM_CFG: /* DRAM Configuration(for DDR0) */ + /* bit16: SDRCLK_UPD (SDRCLK configuration 0 update) */ + val &= ~(1 << 16); + break; + case REG_PLL_DDR1_CTRL: /* DDR1 Control register */ + /* bit30: SDRPLL_UPD */ + val &= ~(1 << 30); + if (val & REG_PLL_ENABLE) { + val |= REG_PLL_LOCK; + } + break; + case REG_PLL_CPUX_CTRL: + case REG_PLL_AUDIO_CTRL: + case REG_PLL_VE_CTRL: + case REG_PLL_VIDEO0_CTRL: + case REG_PLL_DDR0_CTRL: + case REG_PLL_PERIPH0_CTRL: + case REG_PLL_PERIPH1_CTRL: + case REG_PLL_VIDEO1_CTRL: + case REG_PLL_SATA_CTRL: + case REG_PLL_GPU_CTRL: + case REG_PLL_MIPI_CTRL: + case REG_PLL_DE_CTRL: + if (val & REG_PLL_ENABLE) { + val |= REG_PLL_LOCK; + } + break; + case 0x324 ... AW_R40_CCU_IOSIZE: + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + break; + default: + qemu_log_mask(LOG_UNIMP, "%s: unimplemented write offset 0x%04x\n", + __func__, (uint32_t)offset); + break; + } + + s->regs[REG_INDEX(offset)] = (uint32_t) val; +} + +static const MemoryRegionOps allwinner_r40_ccu_ops = { + .read = allwinner_r40_ccu_read, + .write = allwinner_r40_ccu_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid = { + .min_access_size = 4, + .max_access_size = 4, + }, + .impl.min_access_size = 4, +}; + +static void allwinner_r40_ccu_reset(DeviceState *dev) +{ + AwR40ClockCtlState *s = AW_R40_CCU(dev); + + memset(s->regs, 0, sizeof(s->regs)); + + /* Set default values for registers */ + s->regs[REG_INDEX(REG_PLL_CPUX_CTRL)] = 0x00001000; + s->regs[REG_INDEX(REG_PLL_AUDIO_CTRL)] = 0x00035514; + s->regs[REG_INDEX(REG_PLL_VIDEO0_CTRL)] = 0x03006207; + s->regs[REG_INDEX(REG_PLL_VE_CTRL)] = 0x03006207; + s->regs[REG_INDEX(REG_PLL_DDR0_CTRL)] = 0x00001000, + s->regs[REG_INDEX(REG_PLL_PERIPH0_CTRL)] = 0x00041811; + s->regs[REG_INDEX(REG_PLL_PERIPH1_CTRL)] = 0x00041811; + s->regs[REG_INDEX(REG_PLL_VIDEO1_CTRL)] = 0x03006207; + s->regs[REG_INDEX(REG_PLL_SATA_CTRL)] = 0x00001811; + s->regs[REG_INDEX(REG_PLL_GPU_CTRL)] = 0x03006207; + s->regs[REG_INDEX(REG_PLL_MIPI_CTRL)] = 0x00000515; + s->regs[REG_INDEX(REG_PLL_DE_CTRL)] = 0x03006207; + s->regs[REG_INDEX(REG_PLL_DDR1_CTRL)] = 0x00001800; + s->regs[REG_INDEX(REG_AHB1_APB1_CFG)] = 0x00001010; + s->regs[REG_INDEX(REG_APB2_CFG)] = 0x01000000; + s->regs[REG_INDEX(REG_PLL_DDR_AUX)] = 0x00000001; + s->regs[REG_INDEX(REG_PLL_DDR1_CFG)] = 0x0ccca000; + s->regs[REG_INDEX(REG_SYS_32K_CLK)] = 0x0000000f; +} + +static void allwinner_r40_ccu_init(Object *obj) +{ + SysBusDevice *sbd = SYS_BUS_DEVICE(obj); + AwR40ClockCtlState *s = AW_R40_CCU(obj); + + /* Memory mapping */ + memory_region_init_io(&s->iomem, OBJECT(s), &allwinner_r40_ccu_ops, s, + TYPE_AW_R40_CCU, AW_R40_CCU_IOSIZE); + sysbus_init_mmio(sbd, &s->iomem); +} + +static const VMStateDescription allwinner_r40_ccu_vmstate = { + .name = "allwinner-r40-ccu", + .version_id = 1, + .minimum_version_id = 1, + .fields = (VMStateField[]) { + VMSTATE_UINT32_ARRAY(regs, AwR40ClockCtlState, AW_R40_CCU_REGS_NUM), + VMSTATE_END_OF_LIST() + } +}; + +static void allwinner_r40_ccu_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + dc->reset = allwinner_r40_ccu_reset; + dc->vmsd = &allwinner_r40_ccu_vmstate; +} + +static const TypeInfo allwinner_r40_ccu_info = { + .name = TYPE_AW_R40_CCU, + .parent = TYPE_SYS_BUS_DEVICE, + .instance_init = allwinner_r40_ccu_init, + .instance_size = sizeof(AwR40ClockCtlState), + .class_init = allwinner_r40_ccu_class_init, +}; + +static void allwinner_r40_ccu_register(void) +{ + type_register_static(&allwinner_r40_ccu_info); +} + +type_init(allwinner_r40_ccu_register) diff --git a/hw/misc/meson.build b/hw/misc/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -XXX,XX +XXX,XX @@ specific_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-cpucfg.c' softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-h3-dramc.c')) softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-h3-sysctrl.c')) softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-sid.c')) +softmmu_ss.add(when: 'CONFIG_ALLWINNER_R40', if_true: files('allwinner-r40-ccu.c')) softmmu_ss.add(when: 'CONFIG_AXP209_PMU', if_true: files('axp209.c')) softmmu_ss.add(when: 'CONFIG_REALVIEW', if_true: files('arm_sysctl.c')) softmmu_ss.add(when: 'CONFIG_NSERIES', if_true: files('cbus.c')) -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> R40 has eight UARTs, support both 16450 and 16550 compatible modes. Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/allwinner-r40.h | 8 ++++++++ hw/arm/allwinner-r40.c | 34 +++++++++++++++++++++++++++++++--- 2 files changed, 39 insertions(+), 3 deletions(-) diff --git a/include/hw/arm/allwinner-r40.h b/include/hw/arm/allwinner-r40.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/allwinner-r40.h +++ b/include/hw/arm/allwinner-r40.h @@ -XXX,XX +XXX,XX @@ enum { AW_R40_DEV_CCU, AW_R40_DEV_PIT, AW_R40_DEV_UART0, + AW_R40_DEV_UART1, + AW_R40_DEV_UART2, + AW_R40_DEV_UART3, + AW_R40_DEV_UART4, + AW_R40_DEV_UART5, + AW_R40_DEV_UART6, + AW_R40_DEV_UART7, AW_R40_DEV_GIC_DIST, AW_R40_DEV_GIC_CPU, AW_R40_DEV_GIC_HYP, @@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(AwR40State, AW_R40) * which are currently emulated by the R40 SoC code. */ #define AW_R40_NUM_MMCS 4 +#define AW_R40_NUM_UARTS 8 struct AwR40State { /*< private >*/ diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/allwinner-r40.c +++ b/hw/arm/allwinner-r40.c @@ -XXX,XX +XXX,XX @@ const hwaddr allwinner_r40_memmap[] = { [AW_R40_DEV_CCU] = 0x01c20000, [AW_R40_DEV_PIT] = 0x01c20c00, [AW_R40_DEV_UART0] = 0x01c28000, + [AW_R40_DEV_UART1] = 0x01c28400, + [AW_R40_DEV_UART2] = 0x01c28800, + [AW_R40_DEV_UART3] = 0x01c28c00, + [AW_R40_DEV_UART4] = 0x01c29000, + [AW_R40_DEV_UART5] = 0x01c29400, + [AW_R40_DEV_UART6] = 0x01c29800, + [AW_R40_DEV_UART7] = 0x01c29c00, [AW_R40_DEV_GIC_DIST] = 0x01c81000, [AW_R40_DEV_GIC_CPU] = 0x01c82000, [AW_R40_DEV_GIC_HYP] = 0x01c84000, @@ -XXX,XX +XXX,XX @@ enum { /* Shared Processor Interrupts */ enum { AW_R40_GIC_SPI_UART0 = 1, + AW_R40_GIC_SPI_UART1 = 2, + AW_R40_GIC_SPI_UART2 = 3, + AW_R40_GIC_SPI_UART3 = 4, + AW_R40_GIC_SPI_UART4 = 17, + AW_R40_GIC_SPI_UART5 = 18, + AW_R40_GIC_SPI_UART6 = 19, + AW_R40_GIC_SPI_UART7 = 20, AW_R40_GIC_SPI_TIMER0 = 22, AW_R40_GIC_SPI_TIMER1 = 23, AW_R40_GIC_SPI_MMC0 = 32, @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp) } /* UART0. For future clocktree API: All UARTS are connected to APB2_CLK. */ - serial_mm_init(get_system_memory(), s->memmap[AW_R40_DEV_UART0], 2, - qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_UART0), - 115200, serial_hd(0), DEVICE_NATIVE_ENDIAN); + for (int i = 0; i < AW_R40_NUM_UARTS; i++) { + static const int uart_irqs[AW_R40_NUM_UARTS] = { + AW_R40_GIC_SPI_UART0, + AW_R40_GIC_SPI_UART1, + AW_R40_GIC_SPI_UART2, + AW_R40_GIC_SPI_UART3, + AW_R40_GIC_SPI_UART4, + AW_R40_GIC_SPI_UART5, + AW_R40_GIC_SPI_UART6, + AW_R40_GIC_SPI_UART7, + }; + const hwaddr addr = s->memmap[AW_R40_DEV_UART0 + i]; + + serial_mm_init(get_system_memory(), addr, 2, + qdev_get_gpio_in(DEVICE(&s->gic), uart_irqs[i]), + 115200, serial_hd(i), DEVICE_NATIVE_ENDIAN); + } /* Unimplemented devices */ for (i = 0; i < ARRAY_SIZE(r40_unimplemented); i++) { -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> TWI(i2c) is designed to be used as an interface between CPU host and the serial 2-Wire bus. It can support all standard 2-Wire transfer, can be operated in standard mode(100kbit/s) or fast-mode, supporting data rate up to 400kbit/s. Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/allwinner-r40.h | 3 +++ hw/arm/allwinner-r40.c | 11 ++++++++++- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/include/hw/arm/allwinner-r40.h b/include/hw/arm/allwinner-r40.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/allwinner-r40.h +++ b/include/hw/arm/allwinner-r40.h @@ -XXX,XX +XXX,XX @@ #include "hw/intc/arm_gic.h" #include "hw/sd/allwinner-sdhost.h" #include "hw/misc/allwinner-r40-ccu.h" +#include "hw/i2c/allwinner-i2c.h" #include "target/arm/cpu.h" #include "sysemu/block-backend.h" @@ -XXX,XX +XXX,XX @@ enum { AW_R40_DEV_UART5, AW_R40_DEV_UART6, AW_R40_DEV_UART7, + AW_R40_DEV_TWI0, AW_R40_DEV_GIC_DIST, AW_R40_DEV_GIC_CPU, AW_R40_DEV_GIC_HYP, @@ -XXX,XX +XXX,XX @@ struct AwR40State { AwA10PITState timer; AwSdHostState mmc[AW_R40_NUM_MMCS]; AwR40ClockCtlState ccu; + AWI2CState i2c0; GICState gic; MemoryRegion sram_a1; MemoryRegion sram_a2; diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/allwinner-r40.c +++ b/hw/arm/allwinner-r40.c @@ -XXX,XX +XXX,XX @@ const hwaddr allwinner_r40_memmap[] = { [AW_R40_DEV_UART5] = 0x01c29400, [AW_R40_DEV_UART6] = 0x01c29800, [AW_R40_DEV_UART7] = 0x01c29c00, + [AW_R40_DEV_TWI0] = 0x01c2ac00, [AW_R40_DEV_GIC_DIST] = 0x01c81000, [AW_R40_DEV_GIC_CPU] = 0x01c82000, [AW_R40_DEV_GIC_HYP] = 0x01c84000, @@ -XXX,XX +XXX,XX @@ static struct AwR40Unimplemented r40_unimplemented[] = { { "uart7", 0x01c29c00, 1 * KiB }, { "ps20", 0x01c2a000, 1 * KiB }, { "ps21", 0x01c2a400, 1 * KiB }, - { "twi0", 0x01c2ac00, 1 * KiB }, { "twi1", 0x01c2b000, 1 * KiB }, { "twi2", 0x01c2b400, 1 * KiB }, { "twi3", 0x01c2b800, 1 * KiB }, @@ -XXX,XX +XXX,XX @@ enum { AW_R40_GIC_SPI_UART1 = 2, AW_R40_GIC_SPI_UART2 = 3, AW_R40_GIC_SPI_UART3 = 4, + AW_R40_GIC_SPI_TWI0 = 7, AW_R40_GIC_SPI_UART4 = 17, AW_R40_GIC_SPI_UART5 = 18, AW_R40_GIC_SPI_UART6 = 19, @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_init(Object *obj) object_initialize_child(obj, mmc_names[i], &s->mmc[i], TYPE_AW_SDHOST_SUN5I); } + + object_initialize_child(obj, "twi0", &s->i2c0, TYPE_AW_I2C_SUN6I); } static void allwinner_r40_realize(DeviceState *dev, Error **errp) @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp) 115200, serial_hd(i), DEVICE_NATIVE_ENDIAN); } + /* I2C */ + sysbus_realize(SYS_BUS_DEVICE(&s->i2c0), &error_fatal); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->i2c0), 0, s->memmap[AW_R40_DEV_TWI0]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->i2c0), 0, + qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_TWI0)); + /* Unimplemented devices */ for (i = 0; i < ARRAY_SIZE(r40_unimplemented); i++) { create_unimplemented_device(r40_unimplemented[i].device_name, -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> This patch adds minimal support for AXP-221 PMU and connect it to bananapi M2U board. Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/arm/bananapi_m2u.c | 6 + hw/misc/axp209.c | 238 ----------------------------------- hw/misc/axp2xx.c | 283 ++++++++++++++++++++++++++++++++++++++++++ hw/arm/Kconfig | 3 +- hw/misc/Kconfig | 2 +- hw/misc/meson.build | 2 +- hw/misc/trace-events | 8 +- 7 files changed, 297 insertions(+), 245 deletions(-) delete mode 100644 hw/misc/axp209.c create mode 100644 hw/misc/axp2xx.c diff --git a/hw/arm/bananapi_m2u.c b/hw/arm/bananapi_m2u.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/bananapi_m2u.c +++ b/hw/arm/bananapi_m2u.c @@ -XXX,XX +XXX,XX @@ #include "qapi/error.h" #include "qemu/error-report.h" #include "hw/boards.h" +#include "hw/i2c/i2c.h" #include "hw/qdev-properties.h" #include "hw/arm/allwinner-r40.h" @@ -XXX,XX +XXX,XX @@ static void bpim2u_init(MachineState *machine) { bool bootroom_loaded = false; AwR40State *r40; + I2CBus *i2c; /* BIOS is not supported by this board */ if (machine->firmware) { @@ -XXX,XX +XXX,XX @@ static void bpim2u_init(MachineState *machine) } } + /* Connect AXP221 */ + i2c = I2C_BUS(qdev_get_child_bus(DEVICE(&r40->i2c0), "i2c")); + i2c_slave_create_simple(i2c, "axp221_pmu", 0x34); + /* SDRAM */ memory_region_add_subregion(get_system_memory(), r40->memmap[AW_R40_DEV_SDRAM], machine->ram); diff --git a/hw/misc/axp209.c b/hw/misc/axp209.c deleted file mode 100644 index XXXXXXX..XXXXXXX --- a/hw/misc/axp209.c +++ /dev/null @@ -XXX,XX +XXX,XX @@ -/* - * AXP-209 PMU Emulation - * - * Copyright (C) 2022 Strahinja Jankovic <strahinja.p.jankovic@gmail.com> - * - * Permission is hereby granted, free of charge, to any person obtaining a - * copy of this software and associated documentation files (the "Software"), - * to deal in the Software without restriction, including without limitation - * the rights to use, copy, modify, merge, publish, distribute, sublicense, - * and/or sell copies of the Software, and to permit persons to whom the - * Software is furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE - * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER - * DEALINGS IN THE SOFTWARE. - * - * SPDX-License-Identifier: MIT - */ - -#include "qemu/osdep.h" -#include "qemu/log.h" -#include "trace.h" -#include "hw/i2c/i2c.h" -#include "migration/vmstate.h" - -#define TYPE_AXP209_PMU "axp209_pmu" - -#define AXP209(obj) \ - OBJECT_CHECK(AXP209I2CState, (obj), TYPE_AXP209_PMU) - -/* registers */ -enum { - REG_POWER_STATUS = 0x0u, - REG_OPERATING_MODE, - REG_OTG_VBUS_STATUS, - REG_CHIP_VERSION, - REG_DATA_CACHE_0, - REG_DATA_CACHE_1, - REG_DATA_CACHE_2, - REG_DATA_CACHE_3, - REG_DATA_CACHE_4, - REG_DATA_CACHE_5, - REG_DATA_CACHE_6, - REG_DATA_CACHE_7, - REG_DATA_CACHE_8, - REG_DATA_CACHE_9, - REG_DATA_CACHE_A, - REG_DATA_CACHE_B, - REG_POWER_OUTPUT_CTRL = 0x12u, - REG_DC_DC2_OUT_V_CTRL = 0x23u, - REG_DC_DC2_DVS_CTRL = 0x25u, - REG_DC_DC3_OUT_V_CTRL = 0x27u, - REG_LDO2_4_OUT_V_CTRL, - REG_LDO3_OUT_V_CTRL, - REG_VBUS_CH_MGMT = 0x30u, - REG_SHUTDOWN_V_CTRL, - REG_SHUTDOWN_CTRL, - REG_CHARGE_CTRL_1, - REG_CHARGE_CTRL_2, - REG_SPARE_CHARGE_CTRL, - REG_PEK_KEY_CTRL, - REG_DC_DC_FREQ_SET, - REG_CHR_TEMP_TH_SET, - REG_CHR_HIGH_TEMP_TH_CTRL, - REG_IPSOUT_WARN_L1, - REG_IPSOUT_WARN_L2, - REG_DISCHR_TEMP_TH_SET, - REG_DISCHR_HIGH_TEMP_TH_CTRL, - REG_IRQ_BANK_1_CTRL = 0x40u, - REG_IRQ_BANK_2_CTRL, - REG_IRQ_BANK_3_CTRL, - REG_IRQ_BANK_4_CTRL, - REG_IRQ_BANK_5_CTRL, - REG_IRQ_BANK_1_STAT = 0x48u, - REG_IRQ_BANK_2_STAT, - REG_IRQ_BANK_3_STAT, - REG_IRQ_BANK_4_STAT, - REG_IRQ_BANK_5_STAT, - REG_ADC_ACIN_V_H = 0x56u, - REG_ADC_ACIN_V_L, - REG_ADC_ACIN_CURR_H, - REG_ADC_ACIN_CURR_L, - REG_ADC_VBUS_V_H, - REG_ADC_VBUS_V_L, - REG_ADC_VBUS_CURR_H, - REG_ADC_VBUS_CURR_L, - REG_ADC_INT_TEMP_H, - REG_ADC_INT_TEMP_L, - REG_ADC_TEMP_SENS_V_H = 0x62u, - REG_ADC_TEMP_SENS_V_L, - REG_ADC_BAT_V_H = 0x78u, - REG_ADC_BAT_V_L, - REG_ADC_BAT_DISCHR_CURR_H, - REG_ADC_BAT_DISCHR_CURR_L, - REG_ADC_BAT_CHR_CURR_H, - REG_ADC_BAT_CHR_CURR_L, - REG_ADC_IPSOUT_V_H, - REG_ADC_IPSOUT_V_L, - REG_DC_DC_MOD_SEL = 0x80u, - REG_ADC_EN_1, - REG_ADC_EN_2, - REG_ADC_SR_CTRL, - REG_ADC_IN_RANGE, - REG_GPIO1_ADC_IRQ_RISING_TH, - REG_GPIO1_ADC_IRQ_FALLING_TH, - REG_TIMER_CTRL = 0x8au, - REG_VBUS_CTRL_MON_SRP, - REG_OVER_TEMP_SHUTDOWN = 0x8fu, - REG_GPIO0_FEAT_SET, - REG_GPIO_OUT_HIGH_SET, - REG_GPIO1_FEAT_SET, - REG_GPIO2_FEAT_SET, - REG_GPIO_SIG_STATE_SET_MON, - REG_GPIO3_SET, - REG_COULOMB_CNTR_CTRL = 0xb8u, - REG_POWER_MEAS_RES, - NR_REGS -}; - -#define AXP209_CHIP_VERSION_ID (0x01) -#define AXP209_DC_DC2_OUT_V_CTRL_RESET (0x16) -#define AXP209_IRQ_BANK_1_CTRL_RESET (0xd8) - -/* A simple I2C slave which returns values of ID or CNT register. */ -typedef struct AXP209I2CState { - /*< private >*/ - I2CSlave i2c; - /*< public >*/ - uint8_t regs[NR_REGS]; /* peripheral registers */ - uint8_t ptr; /* current register index */ - uint8_t count; /* counter used for tx/rx */ -} AXP209I2CState; - -/* Reset all counters and load ID register */ -static void axp209_reset_enter(Object *obj, ResetType type) -{ - AXP209I2CState *s = AXP209(obj); - - memset(s->regs, 0, NR_REGS); - s->ptr = 0; - s->count = 0; - s->regs[REG_CHIP_VERSION] = AXP209_CHIP_VERSION_ID; - s->regs[REG_DC_DC2_OUT_V_CTRL] = AXP209_DC_DC2_OUT_V_CTRL_RESET; - s->regs[REG_IRQ_BANK_1_CTRL] = AXP209_IRQ_BANK_1_CTRL_RESET; -} - -/* Handle events from master. */ -static int axp209_event(I2CSlave *i2c, enum i2c_event event) -{ - AXP209I2CState *s = AXP209(i2c); - - s->count = 0; - - return 0; -} - -/* Called when master requests read */ -static uint8_t axp209_rx(I2CSlave *i2c) -{ - AXP209I2CState *s = AXP209(i2c); - uint8_t ret = 0xff; - - if (s->ptr < NR_REGS) { - ret = s->regs[s->ptr++]; - } - - trace_axp209_rx(s->ptr - 1, ret); - - return ret; -} - -/* - * Called when master sends write. - * Update ptr with byte 0, then perform write with second byte. - */ -static int axp209_tx(I2CSlave *i2c, uint8_t data) -{ - AXP209I2CState *s = AXP209(i2c); - - if (s->count == 0) { - /* Store register address */ - s->ptr = data; - s->count++; - trace_axp209_select(data); - } else { - trace_axp209_tx(s->ptr, data); - if (s->ptr == REG_DC_DC2_OUT_V_CTRL) { - s->regs[s->ptr++] = data; - } - } - - return 0; -} - -static const VMStateDescription vmstate_axp209 = { - .name = TYPE_AXP209_PMU, - .version_id = 1, - .fields = (VMStateField[]) { - VMSTATE_UINT8_ARRAY(regs, AXP209I2CState, NR_REGS), - VMSTATE_UINT8(count, AXP209I2CState), - VMSTATE_UINT8(ptr, AXP209I2CState), - VMSTATE_END_OF_LIST() - } -}; - -static void axp209_class_init(ObjectClass *oc, void *data) -{ - DeviceClass *dc = DEVICE_CLASS(oc); - I2CSlaveClass *isc = I2C_SLAVE_CLASS(oc); - ResettableClass *rc = RESETTABLE_CLASS(oc); - - rc->phases.enter = axp209_reset_enter; - dc->vmsd = &vmstate_axp209; - isc->event = axp209_event; - isc->recv = axp209_rx; - isc->send = axp209_tx; -} - -static const TypeInfo axp209_info = { - .name = TYPE_AXP209_PMU, - .parent = TYPE_I2C_SLAVE, - .instance_size = sizeof(AXP209I2CState), - .class_init = axp209_class_init -}; - -static void axp209_register_devices(void) -{ - type_register_static(&axp209_info); -} - -type_init(axp209_register_devices); diff --git a/hw/misc/axp2xx.c b/hw/misc/axp2xx.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/misc/axp2xx.c @@ -XXX,XX +XXX,XX @@ +/* + * AXP-2XX PMU Emulation, supported lists: + * AXP209 + * AXP221 + * + * Copyright (C) 2022 Strahinja Jankovic <strahinja.p.jankovic@gmail.com> + * Copyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * SPDX-License-Identifier: MIT + */ + +#include "qemu/osdep.h" +#include "qemu/log.h" +#include "qom/object.h" +#include "trace.h" +#include "hw/i2c/i2c.h" +#include "migration/vmstate.h" + +#define TYPE_AXP2XX "axp2xx_pmu" +#define TYPE_AXP209_PMU "axp209_pmu" +#define TYPE_AXP221_PMU "axp221_pmu" + +OBJECT_DECLARE_TYPE(AXP2xxI2CState, AXP2xxClass, AXP2XX) + +#define NR_REGS (0xff) + +/* A simple I2C slave which returns values of ID or CNT register. */ +typedef struct AXP2xxI2CState { + /*< private >*/ + I2CSlave i2c; + /*< public >*/ + uint8_t regs[NR_REGS]; /* peripheral registers */ + uint8_t ptr; /* current register index */ + uint8_t count; /* counter used for tx/rx */ +} AXP2xxI2CState; + +typedef struct AXP2xxClass { + /*< private >*/ + I2CSlaveClass parent_class; + /*< public >*/ + void (*reset_enter)(AXP2xxI2CState *s, ResetType type); +} AXP2xxClass; + +#define AXP209_CHIP_VERSION_ID (0x01) +#define AXP209_DC_DC2_OUT_V_CTRL_RESET (0x16) + +/* Reset all counters and load ID register */ +static void axp209_reset_enter(AXP2xxI2CState *s, ResetType type) +{ + memset(s->regs, 0, NR_REGS); + s->ptr = 0; + s->count = 0; + + s->regs[0x03] = AXP209_CHIP_VERSION_ID; + s->regs[0x23] = AXP209_DC_DC2_OUT_V_CTRL_RESET; + + s->regs[0x30] = 0x60; + s->regs[0x32] = 0x46; + s->regs[0x34] = 0x41; + s->regs[0x35] = 0x22; + s->regs[0x36] = 0x5d; + s->regs[0x37] = 0x08; + s->regs[0x38] = 0xa5; + s->regs[0x39] = 0x1f; + s->regs[0x3a] = 0x68; + s->regs[0x3b] = 0x5f; + s->regs[0x3c] = 0xfc; + s->regs[0x3d] = 0x16; + s->regs[0x40] = 0xd8; + s->regs[0x42] = 0xff; + s->regs[0x43] = 0x3b; + s->regs[0x80] = 0xe0; + s->regs[0x82] = 0x83; + s->regs[0x83] = 0x80; + s->regs[0x84] = 0x32; + s->regs[0x86] = 0xff; + s->regs[0x90] = 0x07; + s->regs[0x91] = 0xa0; + s->regs[0x92] = 0x07; + s->regs[0x93] = 0x07; +} + +#define AXP221_PWR_STATUS_ACIN_PRESENT BIT(7) +#define AXP221_PWR_STATUS_ACIN_AVAIL BIT(6) +#define AXP221_PWR_STATUS_VBUS_PRESENT BIT(5) +#define AXP221_PWR_STATUS_VBUS_USED BIT(4) +#define AXP221_PWR_STATUS_BAT_CHARGING BIT(2) +#define AXP221_PWR_STATUS_ACIN_VBUS_POWERED BIT(1) + +/* Reset all counters and load ID register */ +static void axp221_reset_enter(AXP2xxI2CState *s, ResetType type) +{ + memset(s->regs, 0, NR_REGS); + s->ptr = 0; + s->count = 0; + + /* input power status register */ + s->regs[0x00] = AXP221_PWR_STATUS_ACIN_PRESENT + | AXP221_PWR_STATUS_ACIN_AVAIL + | AXP221_PWR_STATUS_ACIN_VBUS_POWERED; + + s->regs[0x01] = 0x00; /* no battery is connected */ + + /* + * CHIPID register, no documented on datasheet, but it is checked in + * u-boot spl. I had read it from AXP221s and got 0x06 value. + * So leave 06h here. + */ + s->regs[0x03] = 0x06; + + s->regs[0x10] = 0xbf; + s->regs[0x13] = 0x01; + s->regs[0x30] = 0x60; + s->regs[0x31] = 0x03; + s->regs[0x32] = 0x43; + s->regs[0x33] = 0xc6; + s->regs[0x34] = 0x45; + s->regs[0x35] = 0x0e; + s->regs[0x36] = 0x5d; + s->regs[0x37] = 0x08; + s->regs[0x38] = 0xa5; + s->regs[0x39] = 0x1f; + s->regs[0x3c] = 0xfc; + s->regs[0x3d] = 0x16; + s->regs[0x80] = 0x80; + s->regs[0x82] = 0xe0; + s->regs[0x84] = 0x32; + s->regs[0x8f] = 0x01; + + s->regs[0x90] = 0x07; + s->regs[0x91] = 0x1f; + s->regs[0x92] = 0x07; + s->regs[0x93] = 0x1f; + + s->regs[0x40] = 0xd8; + s->regs[0x41] = 0xff; + s->regs[0x42] = 0x03; + s->regs[0x43] = 0x03; + + s->regs[0xb8] = 0xc0; + s->regs[0xb9] = 0x64; + s->regs[0xe6] = 0xa0; +} + +static void axp2xx_reset_enter(Object *obj, ResetType type) +{ + AXP2xxI2CState *s = AXP2XX(obj); + AXP2xxClass *sc = AXP2XX_GET_CLASS(s); + + sc->reset_enter(s, type); +} + +/* Handle events from master. */ +static int axp2xx_event(I2CSlave *i2c, enum i2c_event event) +{ + AXP2xxI2CState *s = AXP2XX(i2c); + + s->count = 0; + + return 0; +} + +/* Called when master requests read */ +static uint8_t axp2xx_rx(I2CSlave *i2c) +{ + AXP2xxI2CState *s = AXP2XX(i2c); + uint8_t ret = 0xff; + + if (s->ptr < NR_REGS) { + ret = s->regs[s->ptr++]; + } + + trace_axp2xx_rx(s->ptr - 1, ret); + + return ret; +} + +/* + * Called when master sends write. + * Update ptr with byte 0, then perform write with second byte. + */ +static int axp2xx_tx(I2CSlave *i2c, uint8_t data) +{ + AXP2xxI2CState *s = AXP2XX(i2c); + + if (s->count == 0) { + /* Store register address */ + s->ptr = data; + s->count++; + trace_axp2xx_select(data); + } else { + trace_axp2xx_tx(s->ptr, data); + s->regs[s->ptr++] = data; + } + + return 0; +} + +static const VMStateDescription vmstate_axp2xx = { + .name = TYPE_AXP2XX, + .version_id = 1, + .fields = (VMStateField[]) { + VMSTATE_UINT8_ARRAY(regs, AXP2xxI2CState, NR_REGS), + VMSTATE_UINT8(ptr, AXP2xxI2CState), + VMSTATE_UINT8(count, AXP2xxI2CState), + VMSTATE_END_OF_LIST() + } +}; + +static void axp2xx_class_init(ObjectClass *oc, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(oc); + I2CSlaveClass *isc = I2C_SLAVE_CLASS(oc); + ResettableClass *rc = RESETTABLE_CLASS(oc); + + rc->phases.enter = axp2xx_reset_enter; + dc->vmsd = &vmstate_axp2xx; + isc->event = axp2xx_event; + isc->recv = axp2xx_rx; + isc->send = axp2xx_tx; +} + +static const TypeInfo axp2xx_info = { + .name = TYPE_AXP2XX, + .parent = TYPE_I2C_SLAVE, + .instance_size = sizeof(AXP2xxI2CState), + .class_size = sizeof(AXP2xxClass), + .class_init = axp2xx_class_init, + .abstract = true, +}; + +static void axp209_class_init(ObjectClass *oc, void *data) +{ + AXP2xxClass *sc = AXP2XX_CLASS(oc); + + sc->reset_enter = axp209_reset_enter; +} + +static const TypeInfo axp209_info = { + .name = TYPE_AXP209_PMU, + .parent = TYPE_AXP2XX, + .class_init = axp209_class_init +}; + +static void axp221_class_init(ObjectClass *oc, void *data) +{ + AXP2xxClass *sc = AXP2XX_CLASS(oc); + + sc->reset_enter = axp221_reset_enter; +} + +static const TypeInfo axp221_info = { + .name = TYPE_AXP221_PMU, + .parent = TYPE_AXP2XX, + .class_init = axp221_class_init, +}; + +static void axp2xx_register_devices(void) +{ + type_register_static(&axp2xx_info); + type_register_static(&axp209_info); + type_register_static(&axp221_info); +} + +type_init(axp2xx_register_devices); diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -XXX,XX +XXX,XX @@ config ALLWINNER_A10 select ALLWINNER_WDT select ALLWINNER_EMAC select ALLWINNER_I2C - select AXP209_PMU + select AXP2XX_PMU select SERIAL select UNIMP @@ -XXX,XX +XXX,XX @@ config ALLWINNER_R40 bool default y if TCG && ARM select ALLWINNER_A10_PIT + select AXP2XX_PMU select SERIAL select ARM_TIMER select ARM_GIC diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/Kconfig +++ b/hw/misc/Kconfig @@ -XXX,XX +XXX,XX @@ config ALLWINNER_A10_CCM config ALLWINNER_A10_DRAMC bool -config AXP209_PMU +config AXP2XX_PMU bool depends on I2C diff --git a/hw/misc/meson.build b/hw/misc/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-h3-dramc.c softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-h3-sysctrl.c')) softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-sid.c')) softmmu_ss.add(when: 'CONFIG_ALLWINNER_R40', if_true: files('allwinner-r40-ccu.c')) -softmmu_ss.add(when: 'CONFIG_AXP209_PMU', if_true: files('axp209.c')) +softmmu_ss.add(when: 'CONFIG_AXP2XX_PMU', if_true: files('axp2xx.c')) softmmu_ss.add(when: 'CONFIG_REALVIEW', if_true: files('arm_sysctl.c')) softmmu_ss.add(when: 'CONFIG_NSERIES', if_true: files('cbus.c')) softmmu_ss.add(when: 'CONFIG_ECCMEMCTL', if_true: files('eccmemctl.c')) diff --git a/hw/misc/trace-events b/hw/misc/trace-events index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/trace-events +++ b/hw/misc/trace-events @@ -XXX,XX +XXX,XX @@ allwinner_sid_write(uint64_t offset, uint64_t data, unsigned size) "offset 0x%" avr_power_read(uint8_t value) "power_reduc read value:%u" avr_power_write(uint8_t value) "power_reduc write value:%u" -# axp209.c -axp209_rx(uint8_t reg, uint8_t data) "Read reg 0x%" PRIx8 " : 0x%" PRIx8 -axp209_select(uint8_t reg) "Accessing reg 0x%" PRIx8 -axp209_tx(uint8_t reg, uint8_t data) "Write reg 0x%" PRIx8 " : 0x%" PRIx8 +# axp2xx +axp2xx_rx(uint8_t reg, uint8_t data) "Read reg 0x%" PRIx8 " : 0x%" PRIx8 +axp2xx_select(uint8_t reg) "Accessing reg 0x%" PRIx8 +axp2xx_tx(uint8_t reg, uint8_t data) "Write reg 0x%" PRIx8 " : 0x%" PRIx8 # eccmemctl.c ecc_mem_writel_mer(uint32_t val) "Write memory enable 0x%08x" -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> Types of memory that the SDRAM controller supports are DDR2/DDR3 and capacities of up to 2GiB. This commit adds emulation support of the Allwinner R40 SDRAM controller. This driver only support 256M, 512M and 1024M memory now. Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/allwinner-r40.h | 13 +- include/hw/misc/allwinner-r40-dramc.h | 108 ++++++ hw/arm/allwinner-r40.c | 21 +- hw/arm/bananapi_m2u.c | 7 + hw/misc/allwinner-r40-dramc.c | 513 ++++++++++++++++++++++++++ hw/misc/meson.build | 1 + hw/misc/trace-events | 14 + 7 files changed, 674 insertions(+), 3 deletions(-) create mode 100644 include/hw/misc/allwinner-r40-dramc.h create mode 100644 hw/misc/allwinner-r40-dramc.c diff --git a/include/hw/arm/allwinner-r40.h b/include/hw/arm/allwinner-r40.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/allwinner-r40.h +++ b/include/hw/arm/allwinner-r40.h @@ -XXX,XX +XXX,XX @@ #include "hw/intc/arm_gic.h" #include "hw/sd/allwinner-sdhost.h" #include "hw/misc/allwinner-r40-ccu.h" +#include "hw/misc/allwinner-r40-dramc.h" #include "hw/i2c/allwinner-i2c.h" #include "target/arm/cpu.h" #include "sysemu/block-backend.h" @@ -XXX,XX +XXX,XX @@ enum { AW_R40_DEV_GIC_CPU, AW_R40_DEV_GIC_HYP, AW_R40_DEV_GIC_VCPU, - AW_R40_DEV_SDRAM + AW_R40_DEV_SDRAM, + AW_R40_DEV_DRAMCOM, + AW_R40_DEV_DRAMCTL, + AW_R40_DEV_DRAMPHY, }; #define AW_R40_NUM_CPUS (4) @@ -XXX,XX +XXX,XX @@ struct AwR40State { DeviceState parent_obj; /*< public >*/ + /** Physical base address for start of RAM */ + hwaddr ram_addr; + + /** Total RAM size in megabytes */ + uint32_t ram_size; + ARMCPU cpus[AW_R40_NUM_CPUS]; const hwaddr *memmap; AwA10PITState timer; AwSdHostState mmc[AW_R40_NUM_MMCS]; AwR40ClockCtlState ccu; + AwR40DramCtlState dramc; AWI2CState i2c0; GICState gic; MemoryRegion sram_a1; diff --git a/include/hw/misc/allwinner-r40-dramc.h b/include/hw/misc/allwinner-r40-dramc.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/hw/misc/allwinner-r40-dramc.h @@ -XXX,XX +XXX,XX @@ +/* + * Allwinner R40 SDRAM Controller emulation + * + * Copyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef HW_MISC_ALLWINNER_R40_DRAMC_H +#define HW_MISC_ALLWINNER_R40_DRAMC_H + +#include "qom/object.h" +#include "hw/sysbus.h" +#include "exec/hwaddr.h" + +/** + * Constants + * @{ + */ + +/** Highest register address used by DRAMCOM module */ +#define AW_R40_DRAMCOM_REGS_MAXADDR (0x804) + +/** Total number of known DRAMCOM registers */ +#define AW_R40_DRAMCOM_REGS_NUM (AW_R40_DRAMCOM_REGS_MAXADDR / \ + sizeof(uint32_t)) + +/** Highest register address used by DRAMCTL module */ +#define AW_R40_DRAMCTL_REGS_MAXADDR (0x88c) + +/** Total number of known DRAMCTL registers */ +#define AW_R40_DRAMCTL_REGS_NUM (AW_R40_DRAMCTL_REGS_MAXADDR / \ + sizeof(uint32_t)) + +/** Highest register address used by DRAMPHY module */ +#define AW_R40_DRAMPHY_REGS_MAXADDR (0x4) + +/** Total number of known DRAMPHY registers */ +#define AW_R40_DRAMPHY_REGS_NUM (AW_R40_DRAMPHY_REGS_MAXADDR / \ + sizeof(uint32_t)) + +/** @} */ + +/** + * Object model + * @{ + */ + +#define TYPE_AW_R40_DRAMC "allwinner-r40-dramc" +OBJECT_DECLARE_SIMPLE_TYPE(AwR40DramCtlState, AW_R40_DRAMC) + +/** @} */ + +/** + * Allwinner R40 SDRAM Controller object instance state. + */ +struct AwR40DramCtlState { + /*< private >*/ + SysBusDevice parent_obj; + /*< public >*/ + + /** Physical base address for start of RAM */ + hwaddr ram_addr; + + /** Total RAM size in megabytes */ + uint32_t ram_size; + + uint8_t set_row_bits; + uint8_t set_bank_bits; + uint8_t set_col_bits; + + /** + * @name Memory Regions + * @{ + */ + MemoryRegion dramcom_iomem; /**< DRAMCOM module I/O registers */ + MemoryRegion dramctl_iomem; /**< DRAMCTL module I/O registers */ + MemoryRegion dramphy_iomem; /**< DRAMPHY module I/O registers */ + MemoryRegion dram_high; /**< The high 1G dram for dualrank detect */ + MemoryRegion detect_cells; /**< DRAM memory cells for auto detect */ + + /** @} */ + + /** + * @name Hardware Registers + * @{ + */ + + uint32_t dramcom[AW_R40_DRAMCOM_REGS_NUM]; /**< DRAMCOM registers */ + uint32_t dramctl[AW_R40_DRAMCTL_REGS_NUM]; /**< DRAMCTL registers */ + uint32_t dramphy[AW_R40_DRAMPHY_REGS_NUM] ;/**< DRAMPHY registers */ + + /** @} */ + +}; + +#endif /* HW_MISC_ALLWINNER_R40_DRAMC_H */ diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/allwinner-r40.c +++ b/hw/arm/allwinner-r40.c @@ -XXX,XX +XXX,XX @@ #include "hw/loader.h" #include "sysemu/sysemu.h" #include "hw/arm/allwinner-r40.h" +#include "hw/misc/allwinner-r40-dramc.h" /* Memory map */ const hwaddr allwinner_r40_memmap[] = { @@ -XXX,XX +XXX,XX @@ const hwaddr allwinner_r40_memmap[] = { [AW_R40_DEV_UART6] = 0x01c29800, [AW_R40_DEV_UART7] = 0x01c29c00, [AW_R40_DEV_TWI0] = 0x01c2ac00, + [AW_R40_DEV_DRAMCOM] = 0x01c62000, + [AW_R40_DEV_DRAMCTL] = 0x01c63000, + [AW_R40_DEV_DRAMPHY] = 0x01c65000, [AW_R40_DEV_GIC_DIST] = 0x01c81000, [AW_R40_DEV_GIC_CPU] = 0x01c82000, [AW_R40_DEV_GIC_HYP] = 0x01c84000, @@ -XXX,XX +XXX,XX @@ static struct AwR40Unimplemented r40_unimplemented[] = { { "gpu", 0x01c40000, 64 * KiB }, { "gmac", 0x01c50000, 64 * KiB }, { "hstmr", 0x01c60000, 4 * KiB }, - { "dram-com", 0x01c62000, 4 * KiB }, - { "dram-ctl", 0x01c63000, 4 * KiB }, { "tcon-top", 0x01c70000, 4 * KiB }, { "lcd0", 0x01c71000, 4 * KiB }, { "lcd1", 0x01c72000, 4 * KiB }, @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_init(Object *obj) } object_initialize_child(obj, "twi0", &s->i2c0, TYPE_AW_I2C_SUN6I); + + object_initialize_child(obj, "dramc", &s->dramc, TYPE_AW_R40_DRAMC); + object_property_add_alias(obj, "ram-addr", OBJECT(&s->dramc), + "ram-addr"); + object_property_add_alias(obj, "ram-size", OBJECT(&s->dramc), + "ram-size"); } static void allwinner_r40_realize(DeviceState *dev, Error **errp) @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp) sysbus_connect_irq(SYS_BUS_DEVICE(&s->i2c0), 0, qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_TWI0)); + /* DRAMC */ + sysbus_realize(SYS_BUS_DEVICE(&s->dramc), &error_fatal); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->dramc), 0, + s->memmap[AW_R40_DEV_DRAMCOM]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->dramc), 1, + s->memmap[AW_R40_DEV_DRAMCTL]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->dramc), 2, + s->memmap[AW_R40_DEV_DRAMPHY]); + /* Unimplemented devices */ for (i = 0; i < ARRAY_SIZE(r40_unimplemented); i++) { create_unimplemented_device(r40_unimplemented[i].device_name, diff --git a/hw/arm/bananapi_m2u.c b/hw/arm/bananapi_m2u.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/bananapi_m2u.c +++ b/hw/arm/bananapi_m2u.c @@ -XXX,XX +XXX,XX @@ static void bpim2u_init(MachineState *machine) object_property_set_int(OBJECT(r40), "clk1-freq", 24 * 1000 * 1000, &error_abort); + /* DRAMC */ + r40->ram_size = machine->ram_size / MiB; + object_property_set_uint(OBJECT(r40), "ram-addr", + r40->memmap[AW_R40_DEV_SDRAM], &error_abort); + object_property_set_int(OBJECT(r40), "ram-size", + r40->ram_size, &error_abort); + /* Mark R40 object realized */ qdev_realize(DEVICE(r40), NULL, &error_abort); diff --git a/hw/misc/allwinner-r40-dramc.c b/hw/misc/allwinner-r40-dramc.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/misc/allwinner-r40-dramc.c @@ -XXX,XX +XXX,XX @@ +/* + * Allwinner R40 SDRAM Controller emulation + * + * CCopyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include "qemu/osdep.h" +#include "qemu/units.h" +#include "qemu/error-report.h" +#include "hw/sysbus.h" +#include "migration/vmstate.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "exec/address-spaces.h" +#include "hw/qdev-properties.h" +#include "qapi/error.h" +#include "qemu/bitops.h" +#include "hw/misc/allwinner-r40-dramc.h" +#include "trace.h" + +#define REG_INDEX(offset) (offset / sizeof(uint32_t)) + +/* DRAMCOM register offsets */ +enum { + REG_DRAMCOM_CR = 0x0000, /* Control Register */ +}; + +/* DRAMCOMM register flags */ +enum { + REG_DRAMCOM_CR_DUAL_RANK = (1 << 0), +}; + +/* DRAMCTL register offsets */ +enum { + REG_DRAMCTL_PIR = 0x0000, /* PHY Initialization Register */ + REG_DRAMCTL_PGSR = 0x0010, /* PHY General Status Register */ + REG_DRAMCTL_STATR = 0x0018, /* Status Register */ + REG_DRAMCTL_PGCR = 0x0100, /* PHY general configuration registers */ +}; + +/* DRAMCTL register flags */ +enum { + REG_DRAMCTL_PGSR_INITDONE = (1 << 0), + REG_DRAMCTL_PGSR_READ_TIMEOUT = (1 << 13), + REG_DRAMCTL_PGCR_ENABLE_READ_TIMEOUT = (1 << 25), +}; + +enum { + REG_DRAMCTL_STATR_ACTIVE = (1 << 0), +}; + +#define DRAM_MAX_ROW_BITS 16 +#define DRAM_MAX_COL_BITS 13 /* 8192 */ +#define DRAM_MAX_BANK 3 + +static uint64_t dram_autodetect_cells[DRAM_MAX_ROW_BITS] + [DRAM_MAX_BANK] + [DRAM_MAX_COL_BITS]; +struct VirtualDDRChip { + uint32_t ram_size; + uint8_t bank_bits; + uint8_t row_bits; + uint8_t col_bits; +}; + +/* + * Only power of 2 RAM sizes from 256MiB up to 2048MiB are supported, + * 2GiB memory is not supported due to dual rank feature. + */ +static const struct VirtualDDRChip dummy_ddr_chips[] = { + { + .ram_size = 256, + .bank_bits = 3, + .row_bits = 12, + .col_bits = 13, + }, { + .ram_size = 512, + .bank_bits = 3, + .row_bits = 13, + .col_bits = 13, + }, { + .ram_size = 1024, + .bank_bits = 3, + .row_bits = 14, + .col_bits = 13, + }, { + 0 + } +}; + +static const struct VirtualDDRChip *get_match_ddr(uint32_t ram_size) +{ + const struct VirtualDDRChip *ddr; + + for (ddr = &dummy_ddr_chips[0]; ddr->ram_size; ddr++) { + if (ddr->ram_size == ram_size) { + return ddr; + } + } + + return NULL; +} + +static uint64_t *address_to_autodetect_cells(AwR40DramCtlState *s, + const struct VirtualDDRChip *ddr, + uint32_t offset) +{ + int row_index = 0, bank_index = 0, col_index = 0; + uint32_t row_addr, bank_addr, col_addr; + + row_addr = extract32(offset, s->set_col_bits + s->set_bank_bits, + s->set_row_bits); + bank_addr = extract32(offset, s->set_col_bits, s->set_bank_bits); + col_addr = extract32(offset, 0, s->set_col_bits); + + for (int i = 0; i < ddr->row_bits; i++) { + if (row_addr & BIT(i)) { + row_index = i; + } + } + + for (int i = 0; i < ddr->bank_bits; i++) { + if (bank_addr & BIT(i)) { + bank_index = i; + } + } + + for (int i = 0; i < ddr->col_bits; i++) { + if (col_addr & BIT(i)) { + col_index = i; + } + } + + trace_allwinner_r40_dramc_offset_to_cell(offset, row_index, bank_index, + col_index); + return &dram_autodetect_cells[row_index][bank_index][col_index]; +} + +static void allwinner_r40_dramc_map_rows(AwR40DramCtlState *s, uint8_t row_bits, + uint8_t bank_bits, uint8_t col_bits) +{ + const struct VirtualDDRChip *ddr = get_match_ddr(s->ram_size); + bool enable_detect_cells; + + trace_allwinner_r40_dramc_map_rows(row_bits, bank_bits, col_bits); + + if (!ddr) { + return; + } + + s->set_row_bits = row_bits; + s->set_bank_bits = bank_bits; + s->set_col_bits = col_bits; + + enable_detect_cells = ddr->bank_bits != bank_bits + || ddr->row_bits != row_bits + || ddr->col_bits != col_bits; + + if (enable_detect_cells) { + trace_allwinner_r40_dramc_detect_cells_enable(); + } else { + trace_allwinner_r40_dramc_detect_cells_disable(); + } + + memory_region_set_enabled(&s->detect_cells, enable_detect_cells); +} + +static uint64_t allwinner_r40_dramcom_read(void *opaque, hwaddr offset, + unsigned size) +{ + const AwR40DramCtlState *s = AW_R40_DRAMC(opaque); + const uint32_t idx = REG_INDEX(offset); + + if (idx >= AW_R40_DRAMCOM_REGS_NUM) { + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + return 0; + } + + trace_allwinner_r40_dramcom_read(offset, s->dramcom[idx], size); + return s->dramcom[idx]; +} + +static void allwinner_r40_dramcom_write(void *opaque, hwaddr offset, + uint64_t val, unsigned size) +{ + AwR40DramCtlState *s = AW_R40_DRAMC(opaque); + const uint32_t idx = REG_INDEX(offset); + + trace_allwinner_r40_dramcom_write(offset, val, size); + + if (idx >= AW_R40_DRAMCOM_REGS_NUM) { + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + return; + } + + switch (offset) { + case REG_DRAMCOM_CR: /* Control Register */ + if (!(val & REG_DRAMCOM_CR_DUAL_RANK)) { + allwinner_r40_dramc_map_rows(s, ((val >> 4) & 0xf) + 1, + ((val >> 2) & 0x1) + 2, + (((val >> 8) & 0xf) + 3)); + } + break; + }; + + s->dramcom[idx] = (uint32_t) val; +} + +static uint64_t allwinner_r40_dramctl_read(void *opaque, hwaddr offset, + unsigned size) +{ + const AwR40DramCtlState *s = AW_R40_DRAMC(opaque); + const uint32_t idx = REG_INDEX(offset); + + if (idx >= AW_R40_DRAMCTL_REGS_NUM) { + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + return 0; + } + + trace_allwinner_r40_dramctl_read(offset, s->dramctl[idx], size); + return s->dramctl[idx]; +} + +static void allwinner_r40_dramctl_write(void *opaque, hwaddr offset, + uint64_t val, unsigned size) +{ + AwR40DramCtlState *s = AW_R40_DRAMC(opaque); + const uint32_t idx = REG_INDEX(offset); + + trace_allwinner_r40_dramctl_write(offset, val, size); + + if (idx >= AW_R40_DRAMCTL_REGS_NUM) { + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + return; + } + + switch (offset) { + case REG_DRAMCTL_PIR: /* PHY Initialization Register */ + s->dramctl[REG_INDEX(REG_DRAMCTL_PGSR)] |= REG_DRAMCTL_PGSR_INITDONE; + s->dramctl[REG_INDEX(REG_DRAMCTL_STATR)] |= REG_DRAMCTL_STATR_ACTIVE; + break; + } + + s->dramctl[idx] = (uint32_t) val; +} + +static uint64_t allwinner_r40_dramphy_read(void *opaque, hwaddr offset, + unsigned size) +{ + const AwR40DramCtlState *s = AW_R40_DRAMC(opaque); + const uint32_t idx = REG_INDEX(offset); + + if (idx >= AW_R40_DRAMPHY_REGS_NUM) { + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + return 0; + } + + trace_allwinner_r40_dramphy_read(offset, s->dramphy[idx], size); + return s->dramphy[idx]; +} + +static void allwinner_r40_dramphy_write(void *opaque, hwaddr offset, + uint64_t val, unsigned size) +{ + AwR40DramCtlState *s = AW_R40_DRAMC(opaque); + const uint32_t idx = REG_INDEX(offset); + + trace_allwinner_r40_dramphy_write(offset, val, size); + + if (idx >= AW_R40_DRAMPHY_REGS_NUM) { + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + return; + } + + s->dramphy[idx] = (uint32_t) val; +} + +static const MemoryRegionOps allwinner_r40_dramcom_ops = { + .read = allwinner_r40_dramcom_read, + .write = allwinner_r40_dramcom_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid = { + .min_access_size = 4, + .max_access_size = 4, + }, + .impl.min_access_size = 4, +}; + +static const MemoryRegionOps allwinner_r40_dramctl_ops = { + .read = allwinner_r40_dramctl_read, + .write = allwinner_r40_dramctl_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid = { + .min_access_size = 4, + .max_access_size = 4, + }, + .impl.min_access_size = 4, +}; + +static const MemoryRegionOps allwinner_r40_dramphy_ops = { + .read = allwinner_r40_dramphy_read, + .write = allwinner_r40_dramphy_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid = { + .min_access_size = 4, + .max_access_size = 4, + }, + .impl.min_access_size = 4, +}; + +static uint64_t allwinner_r40_detect_read(void *opaque, hwaddr offset, + unsigned size) +{ + AwR40DramCtlState *s = AW_R40_DRAMC(opaque); + const struct VirtualDDRChip *ddr = get_match_ddr(s->ram_size); + uint64_t data = 0; + + if (ddr) { + data = *address_to_autodetect_cells(s, ddr, (uint32_t)offset); + } + + trace_allwinner_r40_dramc_detect_cell_read(offset, data); + return data; +} + +static void allwinner_r40_detect_write(void *opaque, hwaddr offset, + uint64_t data, unsigned size) +{ + AwR40DramCtlState *s = AW_R40_DRAMC(opaque); + const struct VirtualDDRChip *ddr = get_match_ddr(s->ram_size); + + if (ddr) { + uint64_t *cell = address_to_autodetect_cells(s, ddr, (uint32_t)offset); + trace_allwinner_r40_dramc_detect_cell_write(offset, data); + *cell = data; + } +} + +static const MemoryRegionOps allwinner_r40_detect_ops = { + .read = allwinner_r40_detect_read, + .write = allwinner_r40_detect_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid = { + .min_access_size = 4, + .max_access_size = 4, + }, + .impl.min_access_size = 4, +}; + +/* + * mctl_r40_detect_rank_count in u-boot will write the high 1G of DDR + * to detect wether the board support dual_rank or not. Create a virtual memory + * if the board's ram_size less or equal than 1G, and set read time out flag of + * REG_DRAMCTL_PGSR when the user touch this high dram. + */ +static uint64_t allwinner_r40_dualrank_detect_read(void *opaque, hwaddr offset, + unsigned size) +{ + AwR40DramCtlState *s = AW_R40_DRAMC(opaque); + uint32_t reg; + + reg = s->dramctl[REG_INDEX(REG_DRAMCTL_PGCR)]; + if (reg & REG_DRAMCTL_PGCR_ENABLE_READ_TIMEOUT) { /* Enable read time out */ + /* + * this driver only support one rank, mark READ_TIMEOUT when try + * read the second rank. + */ + s->dramctl[REG_INDEX(REG_DRAMCTL_PGSR)] + |= REG_DRAMCTL_PGSR_READ_TIMEOUT; + } + + return 0; +} + +static const MemoryRegionOps allwinner_r40_dualrank_detect_ops = { + .read = allwinner_r40_dualrank_detect_read, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid = { + .min_access_size = 4, + .max_access_size = 4, + }, + .impl.min_access_size = 4, +}; + +static void allwinner_r40_dramc_reset(DeviceState *dev) +{ + AwR40DramCtlState *s = AW_R40_DRAMC(dev); + + /* Set default values for registers */ + memset(&s->dramcom, 0, sizeof(s->dramcom)); + memset(&s->dramctl, 0, sizeof(s->dramctl)); + memset(&s->dramphy, 0, sizeof(s->dramphy)); +} + +static void allwinner_r40_dramc_realize(DeviceState *dev, Error **errp) +{ + AwR40DramCtlState *s = AW_R40_DRAMC(dev); + + if (!get_match_ddr(s->ram_size)) { + error_report("%s: ram-size %u MiB is not supported", + __func__, s->ram_size); + exit(1); + } + + /* detect_cells */ + sysbus_mmio_map_overlap(SYS_BUS_DEVICE(s), 3, s->ram_addr, 10); + memory_region_set_enabled(&s->detect_cells, false); + + /* + * We only support DRAM size up to 1G now, so prepare a high memory page + * after 1G for dualrank detect. index = 4 + */ + memory_region_init_io(&s->dram_high, OBJECT(s), + &allwinner_r40_dualrank_detect_ops, s, + "DRAMHIGH", KiB); + sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->dram_high); + sysbus_mmio_map(SYS_BUS_DEVICE(s), 4, s->ram_addr + GiB); +} + +static void allwinner_r40_dramc_init(Object *obj) +{ + SysBusDevice *sbd = SYS_BUS_DEVICE(obj); + AwR40DramCtlState *s = AW_R40_DRAMC(obj); + + /* DRAMCOM registers, index 0 */ + memory_region_init_io(&s->dramcom_iomem, OBJECT(s), + &allwinner_r40_dramcom_ops, s, + "DRAMCOM", 4 * KiB); + sysbus_init_mmio(sbd, &s->dramcom_iomem); + + /* DRAMCTL registers, index 1 */ + memory_region_init_io(&s->dramctl_iomem, OBJECT(s), + &allwinner_r40_dramctl_ops, s, + "DRAMCTL", 4 * KiB); + sysbus_init_mmio(sbd, &s->dramctl_iomem); + + /* DRAMPHY registers. index 2 */ + memory_region_init_io(&s->dramphy_iomem, OBJECT(s), + &allwinner_r40_dramphy_ops, s, + "DRAMPHY", 4 * KiB); + sysbus_init_mmio(sbd, &s->dramphy_iomem); + + /* R40 support max 2G memory but we only support up to 1G now. index 3 */ + memory_region_init_io(&s->detect_cells, OBJECT(s), + &allwinner_r40_detect_ops, s, + "DRAMCELLS", 1 * GiB); + sysbus_init_mmio(sbd, &s->detect_cells); +} + +static Property allwinner_r40_dramc_properties[] = { + DEFINE_PROP_UINT64("ram-addr", AwR40DramCtlState, ram_addr, 0x0), + DEFINE_PROP_UINT32("ram-size", AwR40DramCtlState, ram_size, 256), /* MiB */ + DEFINE_PROP_END_OF_LIST() +}; + +static const VMStateDescription allwinner_r40_dramc_vmstate = { + .name = "allwinner-r40-dramc", + .version_id = 1, + .minimum_version_id = 1, + .fields = (VMStateField[]) { + VMSTATE_UINT32_ARRAY(dramcom, AwR40DramCtlState, + AW_R40_DRAMCOM_REGS_NUM), + VMSTATE_UINT32_ARRAY(dramctl, AwR40DramCtlState, + AW_R40_DRAMCTL_REGS_NUM), + VMSTATE_UINT32_ARRAY(dramphy, AwR40DramCtlState, + AW_R40_DRAMPHY_REGS_NUM), + VMSTATE_END_OF_LIST() + } +}; + +static void allwinner_r40_dramc_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + dc->reset = allwinner_r40_dramc_reset; + dc->vmsd = &allwinner_r40_dramc_vmstate; + dc->realize = allwinner_r40_dramc_realize; + device_class_set_props(dc, allwinner_r40_dramc_properties); +} + +static const TypeInfo allwinner_r40_dramc_info = { + .name = TYPE_AW_R40_DRAMC, + .parent = TYPE_SYS_BUS_DEVICE, + .instance_init = allwinner_r40_dramc_init, + .instance_size = sizeof(AwR40DramCtlState), + .class_init = allwinner_r40_dramc_class_init, +}; + +static void allwinner_r40_dramc_register(void) +{ + type_register_static(&allwinner_r40_dramc_info); +} + +type_init(allwinner_r40_dramc_register) diff --git a/hw/misc/meson.build b/hw/misc/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-h3-dramc.c softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-h3-sysctrl.c')) softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-sid.c')) softmmu_ss.add(when: 'CONFIG_ALLWINNER_R40', if_true: files('allwinner-r40-ccu.c')) +softmmu_ss.add(when: 'CONFIG_ALLWINNER_R40', if_true: files('allwinner-r40-dramc.c')) softmmu_ss.add(when: 'CONFIG_AXP2XX_PMU', if_true: files('axp2xx.c')) softmmu_ss.add(when: 'CONFIG_REALVIEW', if_true: files('arm_sysctl.c')) softmmu_ss.add(when: 'CONFIG_NSERIES', if_true: files('cbus.c')) diff --git a/hw/misc/trace-events b/hw/misc/trace-events index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/trace-events +++ b/hw/misc/trace-events @@ -XXX,XX +XXX,XX @@ allwinner_h3_dramctl_write(uint64_t offset, uint64_t data, unsigned size) "Write allwinner_h3_dramphy_read(uint64_t offset, uint64_t data, unsigned size) "Read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 allwinner_h3_dramphy_write(uint64_t offset, uint64_t data, unsigned size) "write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 +# allwinner-r40-dramc.c +allwinner_r40_dramc_detect_cells_disable(void) "Disable detect cells" +allwinner_r40_dramc_detect_cells_enable(void) "Enable detect cells" +allwinner_r40_dramc_map_rows(uint8_t row_bits, uint8_t bank_bits, uint8_t col_bits) "DRAM layout: row_bits %d, bank_bits %d, col_bits %d" +allwinner_r40_dramc_offset_to_cell(uint64_t offset, int row, int bank, int col) "offset 0x%" PRIx64 " row %d bank %d col %d" +allwinner_r40_dramc_detect_cell_write(uint64_t offset, uint64_t data) "offset 0x%" PRIx64 " data 0x%" PRIx64 "" +allwinner_r40_dramc_detect_cell_read(uint64_t offset, uint64_t data) "offset 0x%" PRIx64 " data 0x%" PRIx64 "" +allwinner_r40_dramcom_read(uint64_t offset, uint64_t data, unsigned size) "Read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 +allwinner_r40_dramcom_write(uint64_t offset, uint64_t data, unsigned size) "Write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 +allwinner_r40_dramctl_read(uint64_t offset, uint64_t data, unsigned size) "Read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 +allwinner_r40_dramctl_write(uint64_t offset, uint64_t data, unsigned size) "Write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 +allwinner_r40_dramphy_read(uint64_t offset, uint64_t data, unsigned size) "Read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 +allwinner_r40_dramphy_write(uint64_t offset, uint64_t data, unsigned size) "write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 + # allwinner-sid.c allwinner_sid_read(uint64_t offset, uint64_t data, unsigned size) "offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 allwinner_sid_write(uint64_t offset, uint64_t data, unsigned size) "offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> A64's sd register was similar to H3, and it introduced a new register named SAMP_DL_REG location at 0x144. The dma descriptor buffer size of mmc2 is only 8K and the other mmc controllers has 64K. Also fix allwinner-r40's mmc controller type. Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/sd/allwinner-sdhost.h | 9 ++++ hw/arm/allwinner-r40.c | 2 +- hw/sd/allwinner-sdhost.c | 72 ++++++++++++++++++++++++++++++-- 3 files changed, 79 insertions(+), 4 deletions(-) diff --git a/include/hw/sd/allwinner-sdhost.h b/include/hw/sd/allwinner-sdhost.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/sd/allwinner-sdhost.h +++ b/include/hw/sd/allwinner-sdhost.h @@ -XXX,XX +XXX,XX @@ /** Allwinner sun5i family and newer (A13, H2+, H3, etc) */ #define TYPE_AW_SDHOST_SUN5I TYPE_AW_SDHOST "-sun5i" +/** Allwinner sun50i-a64 */ +#define TYPE_AW_SDHOST_SUN50I_A64 TYPE_AW_SDHOST "-sun50i-a64" + +/** Allwinner sun50i-a64 emmc */ +#define TYPE_AW_SDHOST_SUN50I_A64_EMMC TYPE_AW_SDHOST "-sun50i-a64-emmc" + /** @} */ /** @@ -XXX,XX +XXX,XX @@ struct AwSdHostState { uint32_t startbit_detect; /**< eMMC DDR Start Bit Detection Control */ uint32_t response_crc; /**< Response CRC */ uint32_t data_crc[8]; /**< Data CRC */ + uint32_t sample_delay; /**< Sample delay control */ uint32_t status_crc; /**< Status CRC */ /** @} */ @@ -XXX,XX +XXX,XX @@ struct AwSdHostClass { size_t max_desc_size; bool is_sun4i; + /** does the IP block support autocalibration? */ + bool can_calibrate; }; #endif /* HW_SD_ALLWINNER_SDHOST_H */ diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/allwinner-r40.c +++ b/hw/arm/allwinner-r40.c @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_init(Object *obj) for (int i = 0; i < AW_R40_NUM_MMCS; i++) { object_initialize_child(obj, mmc_names[i], &s->mmc[i], - TYPE_AW_SDHOST_SUN5I); + TYPE_AW_SDHOST_SUN50I_A64); } object_initialize_child(obj, "twi0", &s->i2c0, TYPE_AW_I2C_SUN6I); diff --git a/hw/sd/allwinner-sdhost.c b/hw/sd/allwinner-sdhost.c index XXXXXXX..XXXXXXX 100644 --- a/hw/sd/allwinner-sdhost.c +++ b/hw/sd/allwinner-sdhost.c @@ -XXX,XX +XXX,XX @@ enum { REG_SD_DATA1_CRC = 0x12C, /* CRC Data 1 from card/eMMC */ REG_SD_DATA0_CRC = 0x130, /* CRC Data 0 from card/eMMC */ REG_SD_CRC_STA = 0x134, /* CRC status from card/eMMC during write */ + REG_SD_SAMP_DL = 0x144, /* Sample Delay Control (sun50i-a64) */ REG_SD_FIFO = 0x200, /* Read/Write FIFO */ }; @@ -XXX,XX +XXX,XX @@ enum { REG_SD_RES_CRC_RST = 0x0, REG_SD_DATA_CRC_RST = 0x0, REG_SD_CRC_STA_RST = 0x0, + REG_SD_SAMPLE_DL_RST = 0x00002000, REG_SD_FIFO_RST = 0x0, }; @@ -XXX,XX +XXX,XX @@ static uint64_t allwinner_sdhost_read(void *opaque, hwaddr offset, { AwSdHostState *s = AW_SDHOST(opaque); AwSdHostClass *sc = AW_SDHOST_GET_CLASS(s); + bool out_of_bounds = false; uint32_t res = 0; switch (offset) { @@ -XXX,XX +XXX,XX @@ static uint64_t allwinner_sdhost_read(void *opaque, hwaddr offset, case REG_SD_FIFO: /* Read/Write FIFO */ res = allwinner_sdhost_fifo_read(s); break; + case REG_SD_SAMP_DL: /* Sample Delay */ + if (sc->can_calibrate) { + res = s->sample_delay; + } else { + out_of_bounds = true; + } + break; default: - qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset %" - HWADDR_PRIx"\n", __func__, offset); + out_of_bounds = true; res = 0; break; } + if (out_of_bounds) { + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset %" + HWADDR_PRIx"\n", __func__, offset); + } + trace_allwinner_sdhost_read(offset, res, size); return res; } @@ -XXX,XX +XXX,XX @@ static void allwinner_sdhost_write(void *opaque, hwaddr offset, { AwSdHostState *s = AW_SDHOST(opaque); AwSdHostClass *sc = AW_SDHOST_GET_CLASS(s); + bool out_of_bounds = false; trace_allwinner_sdhost_write(offset, value, size); @@ -XXX,XX +XXX,XX @@ static void allwinner_sdhost_write(void *opaque, hwaddr offset, case REG_SD_DATA0_CRC: /* CRC Data 0 from card/eMMC */ case REG_SD_CRC_STA: /* CRC status from card/eMMC in write operation */ break; + case REG_SD_SAMP_DL: /* Sample delay control */ + if (sc->can_calibrate) { + s->sample_delay = value; + } else { + out_of_bounds = true; + } + break; default: + out_of_bounds = true; + break; + } + + if (out_of_bounds) { qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset %" HWADDR_PRIx"\n", __func__, offset); - break; } } @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_allwinner_sdhost = { VMSTATE_UINT32(response_crc, AwSdHostState), VMSTATE_UINT32_ARRAY(data_crc, AwSdHostState, 8), VMSTATE_UINT32(status_crc, AwSdHostState), + VMSTATE_UINT32(sample_delay, AwSdHostState), VMSTATE_END_OF_LIST() } }; @@ -XXX,XX +XXX,XX @@ static void allwinner_sdhost_realize(DeviceState *dev, Error **errp) static void allwinner_sdhost_reset(DeviceState *dev) { AwSdHostState *s = AW_SDHOST(dev); + AwSdHostClass *sc = AW_SDHOST_GET_CLASS(s); s->global_ctl = REG_SD_GCTL_RST; s->clock_ctl = REG_SD_CKCR_RST; @@ -XXX,XX +XXX,XX @@ static void allwinner_sdhost_reset(DeviceState *dev) } s->status_crc = REG_SD_CRC_STA_RST; + + if (sc->can_calibrate) { + s->sample_delay = REG_SD_SAMPLE_DL_RST; + } } static void allwinner_sdhost_bus_class_init(ObjectClass *klass, void *data) @@ -XXX,XX +XXX,XX @@ static void allwinner_sdhost_sun4i_class_init(ObjectClass *klass, void *data) AwSdHostClass *sc = AW_SDHOST_CLASS(klass); sc->max_desc_size = 8 * KiB; sc->is_sun4i = true; + sc->can_calibrate = false; } static void allwinner_sdhost_sun5i_class_init(ObjectClass *klass, void *data) @@ -XXX,XX +XXX,XX @@ static void allwinner_sdhost_sun5i_class_init(ObjectClass *klass, void *data) AwSdHostClass *sc = AW_SDHOST_CLASS(klass); sc->max_desc_size = 64 * KiB; sc->is_sun4i = false; + sc->can_calibrate = false; +} + +static void allwinner_sdhost_sun50i_a64_class_init(ObjectClass *klass, + void *data) +{ + AwSdHostClass *sc = AW_SDHOST_CLASS(klass); + sc->max_desc_size = 64 * KiB; + sc->is_sun4i = false; + sc->can_calibrate = true; +} + +static void allwinner_sdhost_sun50i_a64_emmc_class_init(ObjectClass *klass, + void *data) +{ + AwSdHostClass *sc = AW_SDHOST_CLASS(klass); + sc->max_desc_size = 8 * KiB; + sc->is_sun4i = false; + sc->can_calibrate = true; } static const TypeInfo allwinner_sdhost_info = { @@ -XXX,XX +XXX,XX @@ static const TypeInfo allwinner_sdhost_sun5i_info = { .class_init = allwinner_sdhost_sun5i_class_init, }; +static const TypeInfo allwinner_sdhost_sun50i_a64_info = { + .name = TYPE_AW_SDHOST_SUN50I_A64, + .parent = TYPE_AW_SDHOST, + .class_init = allwinner_sdhost_sun50i_a64_class_init, +}; + +static const TypeInfo allwinner_sdhost_sun50i_a64_emmc_info = { + .name = TYPE_AW_SDHOST_SUN50I_A64_EMMC, + .parent = TYPE_AW_SDHOST, + .class_init = allwinner_sdhost_sun50i_a64_emmc_class_init, +}; + static const TypeInfo allwinner_sdhost_bus_info = { .name = TYPE_AW_SDHOST_BUS, .parent = TYPE_SD_BUS, @@ -XXX,XX +XXX,XX @@ static void allwinner_sdhost_register_types(void) type_register_static(&allwinner_sdhost_info); type_register_static(&allwinner_sdhost_sun4i_info); type_register_static(&allwinner_sdhost_sun5i_info); + type_register_static(&allwinner_sdhost_sun50i_a64_info); + type_register_static(&allwinner_sdhost_sun50i_a64_emmc_info); type_register_static(&allwinner_sdhost_bus_info); } -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> R40 has two ethernet controllers named as emac and gmac. The emac is compatibled with A10, and the GMAC is compatibled with H3. Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/allwinner-r40.h | 6 ++++ hw/arm/allwinner-r40.c | 50 ++++++++++++++++++++++++++++++++-- hw/arm/bananapi_m2u.c | 3 ++ 3 files changed, 57 insertions(+), 2 deletions(-) diff --git a/include/hw/arm/allwinner-r40.h b/include/hw/arm/allwinner-r40.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/allwinner-r40.h +++ b/include/hw/arm/allwinner-r40.h @@ -XXX,XX +XXX,XX @@ #include "hw/misc/allwinner-r40-ccu.h" #include "hw/misc/allwinner-r40-dramc.h" #include "hw/i2c/allwinner-i2c.h" +#include "hw/net/allwinner_emac.h" +#include "hw/net/allwinner-sun8i-emac.h" #include "target/arm/cpu.h" #include "sysemu/block-backend.h" @@ -XXX,XX +XXX,XX @@ enum { AW_R40_DEV_SRAM_A2, AW_R40_DEV_SRAM_A3, AW_R40_DEV_SRAM_A4, + AW_R40_DEV_EMAC, AW_R40_DEV_MMC0, AW_R40_DEV_MMC1, AW_R40_DEV_MMC2, @@ -XXX,XX +XXX,XX @@ enum { AW_R40_DEV_UART6, AW_R40_DEV_UART7, AW_R40_DEV_TWI0, + AW_R40_DEV_GMAC, AW_R40_DEV_GIC_DIST, AW_R40_DEV_GIC_CPU, AW_R40_DEV_GIC_HYP, @@ -XXX,XX +XXX,XX @@ struct AwR40State { AwR40ClockCtlState ccu; AwR40DramCtlState dramc; AWI2CState i2c0; + AwEmacState emac; + AwSun8iEmacState gmac; GICState gic; MemoryRegion sram_a1; MemoryRegion sram_a2; diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/allwinner-r40.c +++ b/hw/arm/allwinner-r40.c @@ -XXX,XX +XXX,XX @@ const hwaddr allwinner_r40_memmap[] = { [AW_R40_DEV_SRAM_A2] = 0x00004000, [AW_R40_DEV_SRAM_A3] = 0x00008000, [AW_R40_DEV_SRAM_A4] = 0x0000b400, + [AW_R40_DEV_EMAC] = 0x01c0b000, [AW_R40_DEV_MMC0] = 0x01c0f000, [AW_R40_DEV_MMC1] = 0x01c10000, [AW_R40_DEV_MMC2] = 0x01c11000, @@ -XXX,XX +XXX,XX @@ const hwaddr allwinner_r40_memmap[] = { [AW_R40_DEV_UART6] = 0x01c29800, [AW_R40_DEV_UART7] = 0x01c29c00, [AW_R40_DEV_TWI0] = 0x01c2ac00, + [AW_R40_DEV_GMAC] = 0x01c50000, [AW_R40_DEV_DRAMCOM] = 0x01c62000, [AW_R40_DEV_DRAMCTL] = 0x01c63000, [AW_R40_DEV_DRAMPHY] = 0x01c65000, @@ -XXX,XX +XXX,XX @@ static struct AwR40Unimplemented r40_unimplemented[] = { { "spi1", 0x01c06000, 4 * KiB }, { "cs0", 0x01c09000, 4 * KiB }, { "keymem", 0x01c0a000, 4 * KiB }, - { "emac", 0x01c0b000, 4 * KiB }, { "usb0-otg", 0x01c13000, 4 * KiB }, { "usb0-host", 0x01c14000, 4 * KiB }, { "crypto", 0x01c15000, 4 * KiB }, @@ -XXX,XX +XXX,XX @@ static struct AwR40Unimplemented r40_unimplemented[] = { { "tvd2", 0x01c33000, 4 * KiB }, { "tvd3", 0x01c34000, 4 * KiB }, { "gpu", 0x01c40000, 64 * KiB }, - { "gmac", 0x01c50000, 64 * KiB }, { "hstmr", 0x01c60000, 4 * KiB }, { "tcon-top", 0x01c70000, 4 * KiB }, { "lcd0", 0x01c71000, 4 * KiB }, @@ -XXX,XX +XXX,XX @@ enum { AW_R40_GIC_SPI_MMC1 = 33, AW_R40_GIC_SPI_MMC2 = 34, AW_R40_GIC_SPI_MMC3 = 35, + AW_R40_GIC_SPI_EMAC = 55, + AW_R40_GIC_SPI_GMAC = 85, }; /* Allwinner R40 general constants */ @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_init(Object *obj) object_initialize_child(obj, "twi0", &s->i2c0, TYPE_AW_I2C_SUN6I); + object_initialize_child(obj, "emac", &s->emac, TYPE_AW_EMAC); + object_initialize_child(obj, "gmac", &s->gmac, TYPE_AW_SUN8I_EMAC); + object_property_add_alias(obj, "gmac-phy-addr", + OBJECT(&s->gmac), "phy-addr"); + object_initialize_child(obj, "dramc", &s->dramc, TYPE_AW_R40_DRAMC); object_property_add_alias(obj, "ram-addr", OBJECT(&s->dramc), "ram-addr"); @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_init(Object *obj) static void allwinner_r40_realize(DeviceState *dev, Error **errp) { + const char *r40_nic_models[] = { "gmac", "emac", NULL }; AwR40State *s = AW_R40(dev); unsigned i; @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp) sysbus_mmio_map(SYS_BUS_DEVICE(&s->dramc), 2, s->memmap[AW_R40_DEV_DRAMPHY]); + /* nic support gmac and emac */ + for (int i = 0; i < ARRAY_SIZE(r40_nic_models) - 1; i++) { + NICInfo *nic = &nd_table[i]; + + if (!nic->used) { + continue; + } + if (qemu_show_nic_models(nic->model, r40_nic_models)) { + exit(0); + } + + switch (qemu_find_nic_model(nic, r40_nic_models, r40_nic_models[0])) { + case 0: /* gmac */ + qdev_set_nic_properties(DEVICE(&s->gmac), nic); + break; + case 1: /* emac */ + qdev_set_nic_properties(DEVICE(&s->emac), nic); + break; + default: + exit(1); + break; + } + } + + /* GMAC */ + object_property_set_link(OBJECT(&s->gmac), "dma-memory", + OBJECT(get_system_memory()), &error_fatal); + sysbus_realize(SYS_BUS_DEVICE(&s->gmac), &error_fatal); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->gmac), 0, s->memmap[AW_R40_DEV_GMAC]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->gmac), 0, + qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_GMAC)); + + /* EMAC */ + sysbus_realize(SYS_BUS_DEVICE(&s->emac), &error_fatal); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->emac), 0, s->memmap[AW_R40_DEV_EMAC]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->emac), 0, + qdev_get_gpio_in(DEVICE(&s->gic), AW_R40_GIC_SPI_EMAC)); + /* Unimplemented devices */ for (i = 0; i < ARRAY_SIZE(r40_unimplemented); i++) { create_unimplemented_device(r40_unimplemented[i].device_name, diff --git a/hw/arm/bananapi_m2u.c b/hw/arm/bananapi_m2u.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/bananapi_m2u.c +++ b/hw/arm/bananapi_m2u.c @@ -XXX,XX +XXX,XX @@ static void bpim2u_init(MachineState *machine) object_property_set_int(OBJECT(r40), "ram-size", r40->ram_size, &error_abort); + /* GMAC PHY */ + object_property_set_uint(OBJECT(r40), "gmac-phy-addr", 1, &error_abort); + /* Mark R40 object realized */ qdev_realize(DEVICE(r40), NULL, &error_abort); -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> Only a few important registers are added, especially the SRAM_VER register. Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/allwinner-r40.h | 3 + include/hw/misc/allwinner-sramc.h | 69 +++++++++++ hw/arm/allwinner-r40.c | 7 +- hw/misc/allwinner-sramc.c | 184 ++++++++++++++++++++++++++++++ hw/arm/Kconfig | 1 + hw/misc/Kconfig | 3 + hw/misc/meson.build | 1 + hw/misc/trace-events | 4 + 8 files changed, 271 insertions(+), 1 deletion(-) create mode 100644 include/hw/misc/allwinner-sramc.h create mode 100644 hw/misc/allwinner-sramc.c diff --git a/include/hw/arm/allwinner-r40.h b/include/hw/arm/allwinner-r40.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/allwinner-r40.h +++ b/include/hw/arm/allwinner-r40.h @@ -XXX,XX +XXX,XX @@ #include "hw/sd/allwinner-sdhost.h" #include "hw/misc/allwinner-r40-ccu.h" #include "hw/misc/allwinner-r40-dramc.h" +#include "hw/misc/allwinner-sramc.h" #include "hw/i2c/allwinner-i2c.h" #include "hw/net/allwinner_emac.h" #include "hw/net/allwinner-sun8i-emac.h" @@ -XXX,XX +XXX,XX @@ enum { AW_R40_DEV_SRAM_A2, AW_R40_DEV_SRAM_A3, AW_R40_DEV_SRAM_A4, + AW_R40_DEV_SRAMC, AW_R40_DEV_EMAC, AW_R40_DEV_MMC0, AW_R40_DEV_MMC1, @@ -XXX,XX +XXX,XX @@ struct AwR40State { ARMCPU cpus[AW_R40_NUM_CPUS]; const hwaddr *memmap; + AwSRAMCState sramc; AwA10PITState timer; AwSdHostState mmc[AW_R40_NUM_MMCS]; AwR40ClockCtlState ccu; diff --git a/include/hw/misc/allwinner-sramc.h b/include/hw/misc/allwinner-sramc.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/hw/misc/allwinner-sramc.h @@ -XXX,XX +XXX,XX @@ +/* + * Allwinner SRAM controller emulation + * + * Copyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef HW_MISC_ALLWINNER_SRAMC_H +#define HW_MISC_ALLWINNER_SRAMC_H + +#include "qom/object.h" +#include "hw/sysbus.h" +#include "qemu/uuid.h" + +/** + * Object model + * @{ + */ +#define TYPE_AW_SRAMC "allwinner-sramc" +#define TYPE_AW_SRAMC_SUN8I_R40 TYPE_AW_SRAMC "-sun8i-r40" +OBJECT_DECLARE_TYPE(AwSRAMCState, AwSRAMCClass, AW_SRAMC) + +/** @} */ + +/** + * Allwinner SRAMC object instance state + */ +struct AwSRAMCState { + /*< private >*/ + SysBusDevice parent_obj; + /*< public >*/ + + /** Maps I/O registers in physical memory */ + MemoryRegion iomem; + + /* registers */ + uint32_t sram_ctl1; + uint32_t sram_ver; + uint32_t sram_soft_entry_reg0; +}; + +/** + * Allwinner SRAM Controller class-level struct. + * + * This struct is filled by each sunxi device specific code + * such that the generic code can use this struct to support + * all devices. + */ +struct AwSRAMCClass { + /*< private >*/ + SysBusDeviceClass parent_class; + /*< public >*/ + + uint32_t sram_version_code; +}; + +#endif /* HW_MISC_ALLWINNER_SRAMC_H */ diff --git a/hw/arm/allwinner-r40.c b/hw/arm/allwinner-r40.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/allwinner-r40.c +++ b/hw/arm/allwinner-r40.c @@ -XXX,XX +XXX,XX @@ const hwaddr allwinner_r40_memmap[] = { [AW_R40_DEV_SRAM_A2] = 0x00004000, [AW_R40_DEV_SRAM_A3] = 0x00008000, [AW_R40_DEV_SRAM_A4] = 0x0000b400, + [AW_R40_DEV_SRAMC] = 0x01c00000, [AW_R40_DEV_EMAC] = 0x01c0b000, [AW_R40_DEV_MMC0] = 0x01c0f000, [AW_R40_DEV_MMC1] = 0x01c10000, @@ -XXX,XX +XXX,XX @@ struct AwR40Unimplemented { static struct AwR40Unimplemented r40_unimplemented[] = { { "d-engine", 0x01000000, 4 * MiB }, { "d-inter", 0x01400000, 128 * KiB }, - { "sram-c", 0x01c00000, 4 * KiB }, { "dma", 0x01c02000, 4 * KiB }, { "nfdc", 0x01c03000, 4 * KiB }, { "ts", 0x01c04000, 4 * KiB }, @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_init(Object *obj) "ram-addr"); object_property_add_alias(obj, "ram-size", OBJECT(&s->dramc), "ram-size"); + + object_initialize_child(obj, "sramc", &s->sramc, TYPE_AW_SRAMC_SUN8I_R40); } static void allwinner_r40_realize(DeviceState *dev, Error **errp) @@ -XXX,XX +XXX,XX @@ static void allwinner_r40_realize(DeviceState *dev, Error **errp) AW_R40_GIC_SPI_TIMER1)); /* SRAM */ + sysbus_realize(SYS_BUS_DEVICE(&s->sramc), &error_fatal); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->sramc), 0, s->memmap[AW_R40_DEV_SRAMC]); + memory_region_init_ram(&s->sram_a1, OBJECT(dev), "sram A1", 16 * KiB, &error_abort); memory_region_init_ram(&s->sram_a2, OBJECT(dev), "sram A2", diff --git a/hw/misc/allwinner-sramc.c b/hw/misc/allwinner-sramc.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/misc/allwinner-sramc.c @@ -XXX,XX +XXX,XX @@ +/* + * Allwinner R40 SRAM controller emulation + * + * Copyright (C) 2023 qianfan Zhao <qianfanguijin@163.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include "qemu/osdep.h" +#include "qemu/units.h" +#include "hw/sysbus.h" +#include "migration/vmstate.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "qapi/error.h" +#include "hw/qdev-properties.h" +#include "hw/qdev-properties-system.h" +#include "hw/misc/allwinner-sramc.h" +#include "trace.h" + +/* + * register offsets + * https://linux-sunxi.org/SRAM_Controller_Register_Guide + */ +enum { + REG_SRAM_CTL1_CFG = 0x04, /* SRAM Control register 1 */ + REG_SRAM_VER = 0x24, /* SRAM Version register */ + REG_SRAM_R40_SOFT_ENTRY_REG0 = 0xbc, +}; + +/* REG_SRAMC_VERSION bit defines */ +#define SRAM_VER_READ_ENABLE (1 << 15) +#define SRAM_VER_VERSION_SHIFT 16 +#define SRAM_VERSION_SUN8I_R40 0x1701 + +static uint64_t allwinner_sramc_read(void *opaque, hwaddr offset, + unsigned size) +{ + AwSRAMCState *s = AW_SRAMC(opaque); + AwSRAMCClass *sc = AW_SRAMC_GET_CLASS(s); + uint64_t val = 0; + + switch (offset) { + case REG_SRAM_CTL1_CFG: + val = s->sram_ctl1; + break; + case REG_SRAM_VER: + /* bit15: lock bit, set this bit before reading this register */ + if (s->sram_ver & SRAM_VER_READ_ENABLE) { + val = SRAM_VER_READ_ENABLE | + (sc->sram_version_code << SRAM_VER_VERSION_SHIFT); + } + break; + case REG_SRAM_R40_SOFT_ENTRY_REG0: + val = s->sram_soft_entry_reg0; + break; + default: + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + return 0; + } + + trace_allwinner_sramc_read(offset, val); + + return val; +} + +static void allwinner_sramc_write(void *opaque, hwaddr offset, + uint64_t val, unsigned size) +{ + AwSRAMCState *s = AW_SRAMC(opaque); + + trace_allwinner_sramc_write(offset, val); + + switch (offset) { + case REG_SRAM_CTL1_CFG: + s->sram_ctl1 = val; + break; + case REG_SRAM_VER: + /* Only the READ_ENABLE bit is writeable */ + s->sram_ver = val & SRAM_VER_READ_ENABLE; + break; + case REG_SRAM_R40_SOFT_ENTRY_REG0: + s->sram_soft_entry_reg0 = val; + break; + default: + qemu_log_mask(LOG_GUEST_ERROR, "%s: out-of-bounds offset 0x%04x\n", + __func__, (uint32_t)offset); + break; + } +} + +static const MemoryRegionOps allwinner_sramc_ops = { + .read = allwinner_sramc_read, + .write = allwinner_sramc_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid = { + .min_access_size = 4, + .max_access_size = 4, + }, + .impl.min_access_size = 4, +}; + +static const VMStateDescription allwinner_sramc_vmstate = { + .name = "allwinner-sramc", + .version_id = 1, + .minimum_version_id = 1, + .fields = (VMStateField[]) { + VMSTATE_UINT32(sram_ver, AwSRAMCState), + VMSTATE_UINT32(sram_soft_entry_reg0, AwSRAMCState), + VMSTATE_END_OF_LIST() + } +}; + +static void allwinner_sramc_reset(DeviceState *dev) +{ + AwSRAMCState *s = AW_SRAMC(dev); + AwSRAMCClass *sc = AW_SRAMC_GET_CLASS(s); + + switch (sc->sram_version_code) { + case SRAM_VERSION_SUN8I_R40: + s->sram_ctl1 = 0x1300; + break; + } +} + +static void allwinner_sramc_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + dc->reset = allwinner_sramc_reset; + dc->vmsd = &allwinner_sramc_vmstate; +} + +static void allwinner_sramc_init(Object *obj) +{ + SysBusDevice *sbd = SYS_BUS_DEVICE(obj); + AwSRAMCState *s = AW_SRAMC(obj); + + /* Memory mapping */ + memory_region_init_io(&s->iomem, OBJECT(s), &allwinner_sramc_ops, s, + TYPE_AW_SRAMC, 1 * KiB); + sysbus_init_mmio(sbd, &s->iomem); +} + +static const TypeInfo allwinner_sramc_info = { + .name = TYPE_AW_SRAMC, + .parent = TYPE_SYS_BUS_DEVICE, + .instance_init = allwinner_sramc_init, + .instance_size = sizeof(AwSRAMCState), + .class_init = allwinner_sramc_class_init, +}; + +static void allwinner_r40_sramc_class_init(ObjectClass *klass, void *data) +{ + AwSRAMCClass *sc = AW_SRAMC_CLASS(klass); + + sc->sram_version_code = SRAM_VERSION_SUN8I_R40; +} + +static const TypeInfo allwinner_r40_sramc_info = { + .name = TYPE_AW_SRAMC_SUN8I_R40, + .parent = TYPE_AW_SRAMC, + .class_init = allwinner_r40_sramc_class_init, +}; + +static void allwinner_sramc_register(void) +{ + type_register_static(&allwinner_sramc_info); + type_register_static(&allwinner_r40_sramc_info); +} + +type_init(allwinner_sramc_register) diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -XXX,XX +XXX,XX @@ config ALLWINNER_H3 config ALLWINNER_R40 bool default y if TCG && ARM + select ALLWINNER_SRAMC select ALLWINNER_A10_PIT select AXP2XX_PMU select SERIAL diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/Kconfig +++ b/hw/misc/Kconfig @@ -XXX,XX +XXX,XX @@ config VIRT_CTRL config LASI bool +config ALLWINNER_SRAMC + bool + config ALLWINNER_A10_CCM bool diff --git a/hw/misc/meson.build b/hw/misc/meson.build index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -XXX,XX +XXX,XX @@ subdir('macio') softmmu_ss.add(when: 'CONFIG_IVSHMEM_DEVICE', if_true: files('ivshmem.c')) +softmmu_ss.add(when: 'CONFIG_ALLWINNER_SRAMC', if_true: files('allwinner-sramc.c')) softmmu_ss.add(when: 'CONFIG_ALLWINNER_A10_CCM', if_true: files('allwinner-a10-ccm.c')) softmmu_ss.add(when: 'CONFIG_ALLWINNER_A10_DRAMC', if_true: files('allwinner-a10-dramc.c')) softmmu_ss.add(when: 'CONFIG_ALLWINNER_H3', if_true: files('allwinner-h3-ccu.c')) diff --git a/hw/misc/trace-events b/hw/misc/trace-events index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/trace-events +++ b/hw/misc/trace-events @@ -XXX,XX +XXX,XX @@ allwinner_r40_dramphy_write(uint64_t offset, uint64_t data, unsigned size) "writ allwinner_sid_read(uint64_t offset, uint64_t data, unsigned size) "offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 allwinner_sid_write(uint64_t offset, uint64_t data, unsigned size) "offset 0x%" PRIx64 " data 0x%" PRIx64 " size %" PRIu32 +# allwinner-sramc.c +allwinner_sramc_read(uint64_t offset, uint64_t data) "offset 0x%" PRIx64 " data 0x%" PRIx64 +allwinner_sramc_write(uint64_t offset, uint64_t data) "offset 0x%" PRIx64 " data 0x%" PRIx64 + # avr_power.c avr_power_read(uint8_t value) "power_reduc read value:%u" avr_power_write(uint8_t value) "power_reduc write value:%u" -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> Add test case for booting from initrd and sd card. Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com> Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- tests/avocado/boot_linux_console.py | 176 ++++++++++++++++++++++++++++ 1 file changed, 176 insertions(+) diff --git a/tests/avocado/boot_linux_console.py b/tests/avocado/boot_linux_console.py index XXXXXXX..XXXXXXX 100644 --- a/tests/avocado/boot_linux_console.py +++ b/tests/avocado/boot_linux_console.py @@ -XXX,XX +XXX,XX @@ def test_arm_quanta_gsj_initrd(self): self.wait_for_console_pattern( 'Give root password for system maintenance') + def test_arm_bpim2u(self): + """ + :avocado: tags=arch:arm + :avocado: tags=machine:bpim2u + :avocado: tags=accel:tcg + """ + deb_url = ('https://apt.armbian.com/pool/main/l/linux-5.10.16-sunxi/' + 'linux-image-current-sunxi_21.02.2_armhf.deb') + deb_hash = '9fa84beda245cabf0b4fa84cf6eaa7738ead1da0' + deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash) + kernel_path = self.extract_from_deb(deb_path, + '/boot/vmlinuz-5.10.16-sunxi') + dtb_path = ('/usr/lib/linux-image-current-sunxi/' + 'sun8i-r40-bananapi-m2-ultra.dtb') + dtb_path = self.extract_from_deb(deb_path, dtb_path) + + self.vm.set_console() + kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE + + 'console=ttyS0,115200n8 ' + 'earlycon=uart,mmio32,0x1c28000') + self.vm.add_args('-kernel', kernel_path, + '-dtb', dtb_path, + '-append', kernel_command_line) + self.vm.launch() + console_pattern = 'Kernel command line: %s' % kernel_command_line + self.wait_for_console_pattern(console_pattern) + + def test_arm_bpim2u_initrd(self): + """ + :avocado: tags=arch:arm + :avocado: tags=accel:tcg + :avocado: tags=machine:bpim2u + """ + deb_url = ('https://apt.armbian.com/pool/main/l/linux-5.10.16-sunxi/' + 'linux-image-current-sunxi_21.02.2_armhf.deb') + deb_hash = '9fa84beda245cabf0b4fa84cf6eaa7738ead1da0' + deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash) + kernel_path = self.extract_from_deb(deb_path, + '/boot/vmlinuz-5.10.16-sunxi') + dtb_path = ('/usr/lib/linux-image-current-sunxi/' + 'sun8i-r40-bananapi-m2-ultra.dtb') + dtb_path = self.extract_from_deb(deb_path, dtb_path) + initrd_url = ('https://github.com/groeck/linux-build-test/raw/' + '2eb0a73b5d5a28df3170c546ddaaa9757e1e0848/rootfs/' + 'arm/rootfs-armv7a.cpio.gz') + initrd_hash = '604b2e45cdf35045846b8bbfbf2129b1891bdc9c' + initrd_path_gz = self.fetch_asset(initrd_url, asset_hash=initrd_hash) + initrd_path = os.path.join(self.workdir, 'rootfs.cpio') + archive.gzip_uncompress(initrd_path_gz, initrd_path) + + self.vm.set_console() + kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE + + 'console=ttyS0,115200 ' + 'panic=-1 noreboot') + self.vm.add_args('-kernel', kernel_path, + '-dtb', dtb_path, + '-initrd', initrd_path, + '-append', kernel_command_line, + '-no-reboot') + self.vm.launch() + self.wait_for_console_pattern('Boot successful.') + + exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo', + 'Allwinner sun8i Family') + exec_command_and_wait_for_pattern(self, 'cat /proc/iomem', + 'system-control@1c00000') + exec_command_and_wait_for_pattern(self, 'reboot', + 'reboot: Restarting system') + # Wait for VM to shut down gracefully + self.vm.wait() + + def test_arm_bpim2u_gmac(self): + """ + :avocado: tags=arch:arm + :avocado: tags=accel:tcg + :avocado: tags=machine:bpim2u + :avocado: tags=device:sd + """ + self.require_netdev('user') + + deb_url = ('https://apt.armbian.com/pool/main/l/linux-5.10.16-sunxi/' + 'linux-image-current-sunxi_21.02.2_armhf.deb') + deb_hash = '9fa84beda245cabf0b4fa84cf6eaa7738ead1da0' + deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash) + kernel_path = self.extract_from_deb(deb_path, + '/boot/vmlinuz-5.10.16-sunxi') + dtb_path = ('/usr/lib/linux-image-current-sunxi/' + 'sun8i-r40-bananapi-m2-ultra.dtb') + dtb_path = self.extract_from_deb(deb_path, dtb_path) + rootfs_url = ('http://storage.kernelci.org/images/rootfs/buildroot/' + 'buildroot-baseline/20221116.0/armel/rootfs.ext2.xz') + rootfs_hash = 'fae32f337c7b87547b10f42599acf109da8b6d9a' + rootfs_path_xz = self.fetch_asset(rootfs_url, asset_hash=rootfs_hash) + rootfs_path = os.path.join(self.workdir, 'rootfs.cpio') + archive.lzma_uncompress(rootfs_path_xz, rootfs_path) + image_pow2ceil_expand(rootfs_path) + + self.vm.set_console() + kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE + + 'console=ttyS0,115200 ' + 'root=/dev/mmcblk0 rootwait rw ' + 'panic=-1 noreboot') + self.vm.add_args('-kernel', kernel_path, + '-dtb', dtb_path, + '-drive', 'file=' + rootfs_path + ',if=sd,format=raw', + '-net', 'nic,model=gmac,netdev=host_gmac', + '-netdev', 'user,id=host_gmac', + '-append', kernel_command_line, + '-no-reboot') + self.vm.launch() + shell_ready = "/bin/sh: can't access tty; job control turned off" + self.wait_for_console_pattern(shell_ready) + + exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo', + 'Allwinner sun8i Family') + exec_command_and_wait_for_pattern(self, 'cat /proc/partitions', + 'mmcblk0') + exec_command_and_wait_for_pattern(self, 'ifconfig eth0 up', + 'eth0: Link is Up') + exec_command_and_wait_for_pattern(self, 'udhcpc eth0', + 'udhcpc: lease of 10.0.2.15 obtained') + exec_command_and_wait_for_pattern(self, 'ping -c 3 10.0.2.2', + '3 packets transmitted, 3 packets received, 0% packet loss') + exec_command_and_wait_for_pattern(self, 'reboot', + 'reboot: Restarting system') + # Wait for VM to shut down gracefully + self.vm.wait() + + @skipUnless(os.getenv('AVOCADO_ALLOW_LARGE_STORAGE'), 'storage limited') + def test_arm_bpim2u_openwrt_22_03_3(self): + """ + :avocado: tags=arch:arm + :avocado: tags=machine:bpim2u + :avocado: tags=device:sd + """ + + # This test download a 8.9 MiB compressed image and expand it + # to 127 MiB. + image_url = ('https://downloads.openwrt.org/releases/22.03.3/targets/' + 'sunxi/cortexa7/openwrt-22.03.3-sunxi-cortexa7-' + 'sinovoip_bananapi-m2-ultra-ext4-sdcard.img.gz') + image_hash = ('5b41b4e11423e562c6011640f9a7cd3b' + 'dd0a3d42b83430f7caa70a432e6cd82c') + image_path_gz = self.fetch_asset(image_url, asset_hash=image_hash, + algorithm='sha256') + image_path = archive.extract(image_path_gz, self.workdir) + image_pow2ceil_expand(image_path) + + self.vm.set_console() + self.vm.add_args('-drive', 'file=' + image_path + ',if=sd,format=raw', + '-nic', 'user', + '-no-reboot') + self.vm.launch() + + kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE + + 'usbcore.nousb ' + 'noreboot') + + self.wait_for_console_pattern('U-Boot SPL') + + interrupt_interactive_console_until_pattern( + self, 'Hit any key to stop autoboot:', '=>') + exec_command_and_wait_for_pattern(self, "setenv extraargs '" + + kernel_command_line + "'", '=>') + exec_command_and_wait_for_pattern(self, 'boot', 'Starting kernel ...'); + + self.wait_for_console_pattern( + 'Please press Enter to activate this console.') + + exec_command_and_wait_for_pattern(self, ' ', 'root@') + + exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo', + 'Allwinner sun8i Family') + exec_command_and_wait_for_pattern(self, 'cat /proc/iomem', + 'system-control@1c00000') + def test_arm_orangepi(self): """ :avocado: tags=arch:arm -- 2.34.1
From: qianfan Zhao <qianfanguijin@163.com> Add documents for Banana Pi M2U Signed-off-by: qianfan Zhao <qianfanguijin@163.com> Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com> [PMM: Minor format fixes to correct sphinx errors] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/bananapi_m2u.rst | 139 +++++++++++++++++++++++++++++++ docs/system/target-arm.rst | 1 + 2 files changed, 140 insertions(+) create mode 100644 docs/system/arm/bananapi_m2u.rst diff --git a/docs/system/arm/bananapi_m2u.rst b/docs/system/arm/bananapi_m2u.rst new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/docs/system/arm/bananapi_m2u.rst @@ -XXX,XX +XXX,XX @@ +Banana Pi BPI-M2U (``bpim2u``) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Banana Pi BPI-M2 Ultra is a quad-core mini single board computer built with +Allwinner A40i/R40/V40 SoC. It features 2GB of RAM and 8GB eMMC. It also +has onboard WiFi and BT. On the ports side, the BPI-M2 Ultra has 2 USB A +2.0 ports, 1 USB OTG port, 1 HDMI port, 1 audio jack, a DC power port, +and last but not least, a SATA port. + +Supported devices +""""""""""""""""" + +The Banana Pi M2U machine supports the following devices: + + * SMP (Quad Core Cortex-A7) + * Generic Interrupt Controller configuration + * SRAM mappings + * SDRAM controller + * Timer device (re-used from Allwinner A10) + * UART + * SD/MMC storage controller + * EMAC ethernet + * GMAC ethernet + * Clock Control Unit + * TWI (I2C) + +Limitations +""""""""""" + +Currently, Banana Pi M2U does *not* support the following features: + +- Graphical output via HDMI, GPU and/or the Display Engine +- Audio output +- Hardware Watchdog +- Real Time Clock +- USB 2.0 interfaces + +Also see the 'unimplemented' array in the Allwinner R40 SoC module +for a complete list of unimplemented I/O devices: ``./hw/arm/allwinner-r40.c`` + +Boot options +"""""""""""" + +The Banana Pi M2U machine can start using the standard -kernel functionality +for loading a Linux kernel or ELF executable. Additionally, the Banana Pi M2U +machine can also emulate the BootROM which is present on an actual Allwinner R40 +based SoC, which loads the bootloader from a SD card, specified via the -sd +argument to qemu-system-arm. + +Running mainline Linux +"""""""""""""""""""""" + +To build a Linux mainline kernel that can be booted by the Banana Pi M2U machine, +simply configure the kernel using the sunxi_defconfig configuration: + +.. code-block:: bash + + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make mrproper + $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- make sunxi_defconfig + +To boot the newly build linux kernel in QEMU with the Banana Pi M2U machine, use: + +.. code-block:: bash + + $ qemu-system-arm -M bpim2u -nographic \ + -kernel /path/to/linux/arch/arm/boot/zImage \ + -append 'console=ttyS0,115200' \ + -dtb /path/to/linux/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dtb + +Banana Pi M2U images +"""""""""""""""""""" + +Note that the mainline kernel does not have a root filesystem. You can choose +to build you own image with buildroot using the bananapi_m2_ultra_defconfig. +Also see https://buildroot.org for more information. + +Another possibility is to run an OpenWrt image for Banana Pi M2U which +can be downloaded from: + + https://downloads.openwrt.org/releases/22.03.3/targets/sunxi/cortexa7/ + +When using an image as an SD card, it must be resized to a power of two. This can be +done with the ``qemu-img`` command. It is recommended to only increase the image size +instead of shrinking it to a power of two, to avoid loss of data. For example, +to prepare a downloaded Armbian image, first extract it and then increase +its size to one gigabyte as follows: + +.. code-block:: bash + + $ qemu-img resize \ + openwrt-22.03.3-sunxi-cortexa7-sinovoip_bananapi-m2-ultra-ext4-sdcard.img \ + 1G + +Instead of providing a custom Linux kernel via the -kernel command you may also +choose to let the Banana Pi M2U machine load the bootloader from SD card, just like +a real board would do using the BootROM. Simply pass the selected image via the -sd +argument and remove the -kernel, -append, -dbt and -initrd arguments: + +.. code-block:: bash + + $ qemu-system-arm -M bpim2u -nic user -nographic \ + -sd openwrt-22.03.3-sunxi-cortexa7-sinovoip_bananapi-m2-ultra-ext4-sdcard.img + +Running U-Boot +"""""""""""""" + +U-Boot mainline can be build and configured using the Bananapi_M2_Ultra_defconfig +using similar commands as describe above for Linux. Note that it is recommended +for development/testing to select the following configuration setting in U-Boot: + + Device Tree Control > Provider for DTB for DT Control > Embedded DTB + +The BootROM of allwinner R40 loading u-boot from the 8KiB offset of sdcard. +Let's create an bootable disk image: + +.. code-block:: bash + + $ dd if=/dev/zero of=sd.img bs=32M count=1 + $ dd if=u-boot-sunxi-with-spl.bin of=sd.img bs=1k seek=8 conv=notrunc + +And then boot it. + +.. code-block:: bash + + $ qemu-system-arm -M bpim2u -nographic -sd sd.img + +Banana Pi M2U integration tests +""""""""""""""""""""""""""""""" + +The Banana Pi M2U machine has several integration tests included. +To run the whole set of tests, build QEMU from source and simply +provide the following command: + +.. code-block:: bash + + $ cd qemu-build-dir + $ AVOCADO_ALLOW_LARGE_STORAGE=yes tests/venv/bin/avocado \ + --verbose --show=app,console run -t machine:bpim2u \ + ../tests/avocado/boot_linux_console.py diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/target-arm.rst +++ b/docs/system/target-arm.rst @@ -XXX,XX +XXX,XX @@ undocumented; you can get a complete list by running arm/versatile arm/vexpress arm/aspeed + arm/bananapi_m2u.rst arm/sabrelite arm/digic arm/cubieboard -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Document the meaning of exclusive_high in a big-endian context, and why we can't change it now. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint64_t zcr_el[4]; /* ZCR_EL[1-3] */ uint64_t smcr_el[4]; /* SMCR_EL[1-3] */ } vfp; + uint64_t exclusive_addr; uint64_t exclusive_val; + /* + * Contains the 'val' for the second 64-bit register of LDXP, which comes + * from the higher address, not the high part of a complete 128-bit value. + * In some ways it might be more convenient to record the exclusive value + * as the low and high halves of a 128 bit data value, but the current + * semantics of these fields are baked into the migration format. + */ uint64_t exclusive_high; /* iwMMXt coprocessor state. */ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_st(const ARMISARegisters *id) return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, ST) != 0; } +static inline bool isar_feature_aa64_lse2(const ARMISARegisters *id) +{ + return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, AT) != 0; +} + static inline bool isar_feature_aa64_fwb(const ARMISARegisters *id) { return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, FWB) != 0; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Let finalize_memop_atom be the new basic function, with finalize_memop and finalize_memop_pair testing FEAT_LSE2 to apply the appropriate atomicity. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 39 +++++++++++++++++++++++++++++----- target/arm/tcg/translate-a64.c | 2 ++ target/arm/tcg/translate.c | 1 + 3 files changed, 37 insertions(+), 5 deletions(-) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { uint64_t features; /* CPU features bits */ bool aarch64; bool thumb; + bool lse2; /* Because unallocated encodings generate different exception syndrome * information from traps due to FP being disabled, we can't do a single * "is fp access disabled" check at a high level in the decode tree. @@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour) } /** - * finalize_memop: + * finalize_memop_atom: * @s: DisasContext * @opc: size+sign+align of the memory operation + * @atom: atomicity of the memory operation * - * Build the complete MemOp for a memory operation, including alignment - * and endianness. + * Build the complete MemOp for a memory operation, including alignment, + * endianness, and atomicity. * * If (op & MO_AMASK) then the operation already contains the required * alignment, e.g. for AccType_ATOMIC. Otherwise, this an optionally @@ -XXX,XX +XXX,XX @@ static inline TCGv_ptr fpstatus_ptr(ARMFPStatusFlavour flavour) * and this is applied here. Note that there is no way to indicate that * no alignment should ever be enforced; this must be handled manually. */ -static inline MemOp finalize_memop(DisasContext *s, MemOp opc) +static inline MemOp finalize_memop_atom(DisasContext *s, MemOp opc, MemOp atom) { if (s->align_mem && !(opc & MO_AMASK)) { opc |= MO_ALIGN; } - return opc | s->be_data; + return opc | atom | s->be_data; +} + +/** + * finalize_memop: + * @s: DisasContext + * @opc: size+sign+align of the memory operation + * + * Like finalize_memop_atom, but with default atomicity. + */ +static inline MemOp finalize_memop(DisasContext *s, MemOp opc) +{ + MemOp atom = s->lse2 ? MO_ATOM_WITHIN16 : MO_ATOM_IFALIGN; + return finalize_memop_atom(s, opc, atom); +} + +/** + * finalize_memop_pair: + * @s: DisasContext + * @opc: size+sign+align of the memory operation + * + * Like finalize_memop_atom, but with atomicity for a pair. + * C.f. Pseudocode for Mem[], operand ispair. + */ +static inline MemOp finalize_memop_pair(DisasContext *s, MemOp opc) +{ + MemOp atom = s->lse2 ? MO_ATOM_WITHIN16_PAIR : MO_ATOM_IFALIGN_PAIR; + return finalize_memop_atom(s, opc, atom); } /** diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, tcg_debug_assert(dc->tbid & 1); #endif + dc->lse2 = dc_isar_feature(aa64_lse2, dc); + /* Single step state. The code-generation logic here is: * SS_ACTIVE == 0: * generate code with no special handling for single-stepping (except diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.c +++ b/target/arm/tcg/translate.c @@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) dc->sme_trap_nonstreaming = EX_TBFLAG_A32(tb_flags, SME_TRAP_NONSTREAMING); } + dc->lse2 = false; /* applies only to aarch64 */ dc->cp_regs = cpu->cp_regs; dc->features = env->features; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> While we don't require 16-byte atomicity here, using a single larger load simplifies the code, and makes it a closer match to STXP. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, TCGv_i64 addr, int size, bool is_pair) { int idx = get_mem_index(s); - MemOp memop = s->be_data; + MemOp memop; g_assert(size <= 3); if (is_pair) { g_assert(size >= 2); if (size == 2) { /* The pair must be single-copy atomic for the doubleword. */ - memop |= MO_64 | MO_ALIGN; + memop = finalize_memop(s, MO_64 | MO_ALIGN); tcg_gen_qemu_ld_i64(cpu_exclusive_val, addr, idx, memop); if (s->be_data == MO_LE) { tcg_gen_extract_i64(cpu_reg(s, rt), cpu_exclusive_val, 0, 32); @@ -XXX,XX +XXX,XX @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, tcg_gen_extract_i64(cpu_reg(s, rt2), cpu_exclusive_val, 0, 32); } } else { - /* The pair must be single-copy atomic for *each* doubleword, not - the entire quadword, however it must be quadword aligned. */ - memop |= MO_64; - tcg_gen_qemu_ld_i64(cpu_exclusive_val, addr, idx, - memop | MO_ALIGN_16); + /* + * The pair must be single-copy atomic for *each* doubleword, not + * the entire quadword, however it must be quadword aligned. + * Expose the complete load to tcg, for ease of tlb lookup, + * but indicate that only 8-byte atomicity is required. + */ + TCGv_i128 t16 = tcg_temp_new_i128(); - TCGv_i64 addr2 = tcg_temp_new_i64(); - tcg_gen_addi_i64(addr2, addr, 8); - tcg_gen_qemu_ld_i64(cpu_exclusive_high, addr2, idx, memop); + memop = finalize_memop_atom(s, MO_128 | MO_ALIGN_16, + MO_ATOM_IFALIGN_PAIR); + tcg_gen_qemu_ld_i128(t16, addr, idx, memop); + if (s->be_data == MO_LE) { + tcg_gen_extr_i128_i64(cpu_exclusive_val, + cpu_exclusive_high, t16); + } else { + tcg_gen_extr_i128_i64(cpu_exclusive_high, + cpu_exclusive_val, t16); + } tcg_gen_mov_i64(cpu_reg(s, rt), cpu_exclusive_val); tcg_gen_mov_i64(cpu_reg(s, rt2), cpu_exclusive_high); } } else { - memop |= size | MO_ALIGN; + memop = finalize_memop(s, size | MO_ALIGN); tcg_gen_qemu_ld_i64(cpu_exclusive_val, addr, idx, memop); tcg_gen_mov_i64(cpu_reg(s, rt), cpu_exclusive_val); } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> While we don't require 16-byte atomicity here, using a single larger operation simplifies the code. Introduce finalize_memop_asimd for this. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-6-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 24 +++++++++++++++++++++++ target/arm/tcg/translate-a64.c | 35 +++++++++++----------------------- 2 files changed, 35 insertions(+), 24 deletions(-) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop_pair(DisasContext *s, MemOp opc) return finalize_memop_atom(s, opc, atom); } +/** + * finalize_memop_asimd: + * @s: DisasContext + * @opc: size+sign+align of the memory operation + * + * Like finalize_memop_atom, but with atomicity of AccessType_ASIMD. + */ +static inline MemOp finalize_memop_asimd(DisasContext *s, MemOp opc) +{ + /* + * In the pseudocode for Mem[], with AccessType_ASIMD, size == 16, + * if IsAligned(8), the first case provides separate atomicity for + * the pair of 64-bit accesses. If !IsAligned(8), the middle cases + * do not apply, and we're left with the final case of no atomicity. + * Thus MO_ATOM_IFALIGN_PAIR. + * + * For other sizes, normal LSE2 rules apply. + */ + if ((opc & MO_SIZE) == MO_128) { + return finalize_memop_atom(s, opc, MO_ATOM_IFALIGN_PAIR); + } + return finalize_memop(s, opc); +} + /** * asimd_imm_const: Expand an encoded SIMD constant value * diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void do_fp_st(DisasContext *s, int srcidx, TCGv_i64 tcg_addr, int size) { /* This writes the bottom N bits of a 128 bit wide vector to memory */ TCGv_i64 tmplo = tcg_temp_new_i64(); - MemOp mop; + MemOp mop = finalize_memop_asimd(s, size); tcg_gen_ld_i64(tmplo, cpu_env, fp_reg_offset(s, srcidx, MO_64)); - if (size < 4) { - mop = finalize_memop(s, size); + if (size < MO_128) { tcg_gen_qemu_st_i64(tmplo, tcg_addr, get_mem_index(s), mop); } else { - bool be = s->be_data == MO_BE; - TCGv_i64 tcg_hiaddr = tcg_temp_new_i64(); TCGv_i64 tmphi = tcg_temp_new_i64(); + TCGv_i128 t16 = tcg_temp_new_i128(); tcg_gen_ld_i64(tmphi, cpu_env, fp_reg_hi_offset(s, srcidx)); + tcg_gen_concat_i64_i128(t16, tmplo, tmphi); - mop = s->be_data | MO_UQ; - tcg_gen_qemu_st_i64(be ? tmphi : tmplo, tcg_addr, get_mem_index(s), - mop | (s->align_mem ? MO_ALIGN_16 : 0)); - tcg_gen_addi_i64(tcg_hiaddr, tcg_addr, 8); - tcg_gen_qemu_st_i64(be ? tmplo : tmphi, tcg_hiaddr, - get_mem_index(s), mop); + tcg_gen_qemu_st_i128(t16, tcg_addr, get_mem_index(s), mop); } } @@ -XXX,XX +XXX,XX @@ static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, int size) /* This always zero-extends and writes to a full 128 bit wide vector */ TCGv_i64 tmplo = tcg_temp_new_i64(); TCGv_i64 tmphi = NULL; - MemOp mop; + MemOp mop = finalize_memop_asimd(s, size); - if (size < 4) { - mop = finalize_memop(s, size); + if (size < MO_128) { tcg_gen_qemu_ld_i64(tmplo, tcg_addr, get_mem_index(s), mop); } else { - bool be = s->be_data == MO_BE; - TCGv_i64 tcg_hiaddr; + TCGv_i128 t16 = tcg_temp_new_i128(); + + tcg_gen_qemu_ld_i128(t16, tcg_addr, get_mem_index(s), mop); tmphi = tcg_temp_new_i64(); - tcg_hiaddr = tcg_temp_new_i64(); - - mop = s->be_data | MO_UQ; - tcg_gen_qemu_ld_i64(be ? tmphi : tmplo, tcg_addr, get_mem_index(s), - mop | (s->align_mem ? MO_ALIGN_16 : 0)); - tcg_gen_addi_i64(tcg_hiaddr, tcg_addr, 8); - tcg_gen_qemu_ld_i64(be ? tmplo : tmphi, tcg_hiaddr, - get_mem_index(s), mop); + tcg_gen_extr_i128_i64(tmplo, tmphi, t16); } tcg_gen_st_i64(tmplo, cpu_env, fp_reg_offset(s, destidx, MO_64)); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This fixes a bug in that these two insns should have been using atomic 16-byte stores, since MTE is ARMv8.5 and LSE2 is mandatory from ARMv8.4. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void disas_ldst_tag(DisasContext *s, uint32_t insn) if (is_zero) { TCGv_i64 clean_addr = clean_data_tbi(s, addr); - TCGv_i64 tcg_zero = tcg_constant_i64(0); + TCGv_i64 zero64 = tcg_constant_i64(0); + TCGv_i128 zero128 = tcg_temp_new_i128(); int mem_index = get_mem_index(s); - int i, n = (1 + is_pair) << LOG2_TAG_GRANULE; + MemOp mop = finalize_memop(s, MO_128 | MO_ALIGN); - tcg_gen_qemu_st_i64(tcg_zero, clean_addr, mem_index, - MO_UQ | MO_ALIGN_16); - for (i = 8; i < n; i += 8) { - tcg_gen_addi_i64(clean_addr, clean_addr, 8); - tcg_gen_qemu_st_i64(tcg_zero, clean_addr, mem_index, MO_UQ); + tcg_gen_concat_i64_i128(zero128, zero64, zero64); + + /* This is 1 or 2 atomic 16-byte operations. */ + tcg_gen_qemu_st_i128(zero128, clean_addr, mem_index, mop); + if (is_pair) { + tcg_gen_addi_i64(clean_addr, clean_addr, 16); + tcg_gen_qemu_st_i128(zero128, clean_addr, mem_index, mop); } } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Round len_align to 16 instead of 8, handling an odd 8-byte as part of the tail. Use MO_ATOM_NONE to indicate that all of these memory ops have only byte atomicity. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-8-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 95 +++++++++++++++++++++++++--------- 1 file changed, 70 insertions(+), 25 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz, void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, int len, int rn, int imm) { - int len_align = QEMU_ALIGN_DOWN(len, 8); - int len_remain = len % 8; - int nparts = len / 8 + ctpop8(len_remain); + int len_align = QEMU_ALIGN_DOWN(len, 16); + int len_remain = len % 16; + int nparts = len / 16 + ctpop8(len_remain); int midx = get_mem_index(s); TCGv_i64 dirty_addr, clean_addr, t0, t1; + TCGv_i128 t16; dirty_addr = tcg_temp_new_i64(); tcg_gen_addi_i64(dirty_addr, cpu_reg_sp(s, rn), imm); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, int i; t0 = tcg_temp_new_i64(); - for (i = 0; i < len_align; i += 8) { - tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ); + t1 = tcg_temp_new_i64(); + t16 = tcg_temp_new_i128(); + + for (i = 0; i < len_align; i += 16) { + tcg_gen_qemu_ld_i128(t16, clean_addr, midx, + MO_LE | MO_128 | MO_ATOM_NONE); + tcg_gen_extr_i128_i64(t0, t1, t16); tcg_gen_st_i64(t0, base, vofs + i); - tcg_gen_addi_i64(clean_addr, clean_addr, 8); + tcg_gen_st_i64(t1, base, vofs + i + 8); + tcg_gen_addi_i64(clean_addr, clean_addr, 16); } } else { TCGLabel *loop = gen_new_label(); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_movi_ptr(i, 0); gen_set_label(loop); - t0 = tcg_temp_new_i64(); - tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ); - tcg_gen_addi_i64(clean_addr, clean_addr, 8); + t16 = tcg_temp_new_i128(); + tcg_gen_qemu_ld_i128(t16, clean_addr, midx, + MO_LE | MO_128 | MO_ATOM_NONE); + tcg_gen_addi_i64(clean_addr, clean_addr, 16); tp = tcg_temp_new_ptr(); tcg_gen_add_ptr(tp, base, i); - tcg_gen_addi_ptr(i, i, 8); + tcg_gen_addi_ptr(i, i, 16); + + t0 = tcg_temp_new_i64(); + t1 = tcg_temp_new_i64(); + tcg_gen_extr_i128_i64(t0, t1, t16); + tcg_gen_st_i64(t0, tp, vofs); + tcg_gen_st_i64(t1, tp, vofs + 8); tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop); } @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, * Predicate register loads can be any multiple of 2. * Note that we still store the entire 64-bit unit into cpu_env. */ + if (len_remain >= 8) { + t0 = tcg_temp_new_i64(); + tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_st_i64(t0, base, vofs + len_align); + len_remain -= 8; + len_align += 8; + if (len_remain) { + tcg_gen_addi_i64(clean_addr, clean_addr, 8); + } + } if (len_remain) { t0 = tcg_temp_new_i64(); switch (len_remain) { @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, case 4: case 8: tcg_gen_qemu_ld_i64(t0, clean_addr, midx, - MO_LE | ctz32(len_remain)); + MO_LE | ctz32(len_remain) | MO_ATOM_NONE); break; case 6: t1 = tcg_temp_new_i64(); - tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUL); + tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE); tcg_gen_addi_i64(clean_addr, clean_addr, 4); - tcg_gen_qemu_ld_i64(t1, clean_addr, midx, MO_LEUW); + tcg_gen_qemu_ld_i64(t1, clean_addr, midx, MO_LEUW | MO_ATOM_NONE); tcg_gen_deposit_i64(t0, t0, t1, 32, 32); break; @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, int len, int rn, int imm) { - int len_align = QEMU_ALIGN_DOWN(len, 8); - int len_remain = len % 8; - int nparts = len / 8 + ctpop8(len_remain); + int len_align = QEMU_ALIGN_DOWN(len, 16); + int len_remain = len % 16; + int nparts = len / 16 + ctpop8(len_remain); int midx = get_mem_index(s); - TCGv_i64 dirty_addr, clean_addr, t0; + TCGv_i64 dirty_addr, clean_addr, t0, t1; + TCGv_i128 t16; dirty_addr = tcg_temp_new_i64(); tcg_gen_addi_i64(dirty_addr, cpu_reg_sp(s, rn), imm); @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, int i; t0 = tcg_temp_new_i64(); + t1 = tcg_temp_new_i64(); + t16 = tcg_temp_new_i128(); for (i = 0; i < len_align; i += 8) { tcg_gen_ld_i64(t0, base, vofs + i); - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ); - tcg_gen_addi_i64(clean_addr, clean_addr, 8); + tcg_gen_ld_i64(t1, base, vofs + i + 8); + tcg_gen_concat_i64_i128(t16, t0, t1); + tcg_gen_qemu_st_i128(t16, clean_addr, midx, + MO_LE | MO_128 | MO_ATOM_NONE); + tcg_gen_addi_i64(clean_addr, clean_addr, 16); } } else { TCGLabel *loop = gen_new_label(); @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, gen_set_label(loop); t0 = tcg_temp_new_i64(); + t1 = tcg_temp_new_i64(); tp = tcg_temp_new_ptr(); tcg_gen_add_ptr(tp, base, i); tcg_gen_ld_i64(t0, tp, vofs); - tcg_gen_addi_ptr(i, i, 8); + tcg_gen_ld_i64(t1, tp, vofs + 8); + tcg_gen_addi_ptr(i, i, 16); - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ); - tcg_gen_addi_i64(clean_addr, clean_addr, 8); + t16 = tcg_temp_new_i128(); + tcg_gen_concat_i64_i128(t16, t0, t1); + + tcg_gen_qemu_st_i128(t16, clean_addr, midx, MO_LEUQ); + tcg_gen_addi_i64(clean_addr, clean_addr, 16); tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop); } /* Predicate register stores can be any multiple of 2. */ + if (len_remain >= 8) { + t0 = tcg_temp_new_i64(); + tcg_gen_st_i64(t0, base, vofs + len_align); + tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE); + len_remain -= 8; + len_align += 8; + if (len_remain) { + tcg_gen_addi_i64(clean_addr, clean_addr, 8); + } + } if (len_remain) { t0 = tcg_temp_new_i64(); tcg_gen_ld_i64(t0, base, vofs + len_align); @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, case 4: case 8: tcg_gen_qemu_st_i64(t0, clean_addr, midx, - MO_LE | ctz32(len_remain)); + MO_LE | ctz32(len_remain) | MO_ATOM_NONE); break; case 6: - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUL); + tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE); tcg_gen_addi_i64(clean_addr, clean_addr, 4); tcg_gen_shri_i64(t0, t0, 32); - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUW); + tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUW | MO_ATOM_NONE); break; default: -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> No need to duplicate this check across multiple call sites. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-9-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 44 ++++++++++++++++------------------ 1 file changed, 21 insertions(+), 23 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void disas_b_exc_sys(DisasContext *s, uint32_t insn) * races in multi-threaded linux-user and when MTTCG softmmu is * enabled. */ -static void gen_load_exclusive(DisasContext *s, int rt, int rt2, - TCGv_i64 addr, int size, bool is_pair) +static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn, + int size, bool is_pair) { int idx = get_mem_index(s); MemOp memop; + TCGv_i64 dirty_addr, clean_addr; + + s->is_ldex = true; + dirty_addr = cpu_reg_sp(s, rn); + clean_addr = gen_mte_check1(s, dirty_addr, false, rn != 31, size); g_assert(size <= 3); if (is_pair) { @@ -XXX,XX +XXX,XX @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, if (size == 2) { /* The pair must be single-copy atomic for the doubleword. */ memop = finalize_memop(s, MO_64 | MO_ALIGN); - tcg_gen_qemu_ld_i64(cpu_exclusive_val, addr, idx, memop); + tcg_gen_qemu_ld_i64(cpu_exclusive_val, clean_addr, idx, memop); if (s->be_data == MO_LE) { tcg_gen_extract_i64(cpu_reg(s, rt), cpu_exclusive_val, 0, 32); tcg_gen_extract_i64(cpu_reg(s, rt2), cpu_exclusive_val, 32, 32); @@ -XXX,XX +XXX,XX @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, memop = finalize_memop_atom(s, MO_128 | MO_ALIGN_16, MO_ATOM_IFALIGN_PAIR); - tcg_gen_qemu_ld_i128(t16, addr, idx, memop); + tcg_gen_qemu_ld_i128(t16, clean_addr, idx, memop); if (s->be_data == MO_LE) { tcg_gen_extr_i128_i64(cpu_exclusive_val, @@ -XXX,XX +XXX,XX @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, } } else { memop = finalize_memop(s, size | MO_ALIGN); - tcg_gen_qemu_ld_i64(cpu_exclusive_val, addr, idx, memop); + tcg_gen_qemu_ld_i64(cpu_exclusive_val, clean_addr, idx, memop); tcg_gen_mov_i64(cpu_reg(s, rt), cpu_exclusive_val); } - tcg_gen_mov_i64(cpu_exclusive_addr, addr); + tcg_gen_mov_i64(cpu_exclusive_addr, clean_addr); } static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2, - TCGv_i64 addr, int size, int is_pair) + int rn, int size, int is_pair) { /* if (env->exclusive_addr == addr && env->exclusive_val == [addr] * && (!is_pair || env->exclusive_high == [addr + datasize])) { @@ -XXX,XX +XXX,XX @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2, */ TCGLabel *fail_label = gen_new_label(); TCGLabel *done_label = gen_new_label(); - TCGv_i64 tmp; + TCGv_i64 tmp, dirty_addr, clean_addr; - tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label); + dirty_addr = cpu_reg_sp(s, rn); + clean_addr = gen_mte_check1(s, dirty_addr, true, rn != 31, size); + + tcg_gen_brcond_i64(TCG_COND_NE, clean_addr, cpu_exclusive_addr, fail_label); tmp = tcg_temp_new_i64(); if (is_pair) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) if (is_lasr) { tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); } - clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), - true, rn != 31, size); - gen_store_exclusive(s, rs, rt, rt2, clean_addr, size, false); + gen_store_exclusive(s, rs, rt, rt2, rn, size, false); return; case 0x4: /* LDXR */ @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) if (rn == 31) { gen_check_sp_alignment(s); } - clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), - false, rn != 31, size); - s->is_ldex = true; - gen_load_exclusive(s, rt, rt2, clean_addr, size, false); + gen_load_exclusive(s, rt, rt2, rn, size, false); if (is_lasr) { tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) if (is_lasr) { tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); } - clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), - true, rn != 31, size); - gen_store_exclusive(s, rs, rt, rt2, clean_addr, size, true); + gen_store_exclusive(s, rs, rt, rt2, rn, size, true); return; } if (rt2 == 31 @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) if (rn == 31) { gen_check_sp_alignment(s); } - clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), - false, rn != 31, size); - s->is_ldex = true; - gen_load_exclusive(s, rt, rt2, clean_addr, size, true); + gen_load_exclusive(s, rt, rt2, rn, size, true); if (is_lasr) { tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> This is required for LSE2, where the pair must be treated atomically if it does not cross a 16-byte boundary. But it simplifies the code to do this always. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-10-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 70 ++++++++++++++++++++++++++-------- 1 file changed, 55 insertions(+), 15 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn) } else { TCGv_i64 tcg_rt = cpu_reg(s, rt); TCGv_i64 tcg_rt2 = cpu_reg(s, rt2); + MemOp mop = size + 1; + + /* + * With LSE2, non-sign-extending pairs are treated atomically if + * aligned, and if unaligned one of the pair will be completely + * within a 16-byte block and that element will be atomic. + * Otherwise each element is separately atomic. + * In all cases, issue one operation with the correct atomicity. + * + * This treats sign-extending loads like zero-extending loads, + * since that reuses the most code below. + */ + if (s->align_mem) { + mop |= (size == 2 ? MO_ALIGN_4 : MO_ALIGN_8); + } + mop = finalize_memop_pair(s, mop); if (is_load) { - TCGv_i64 tmp = tcg_temp_new_i64(); + if (size == 2) { + int o2 = s->be_data == MO_LE ? 32 : 0; + int o1 = o2 ^ 32; - /* Do not modify tcg_rt before recognizing any exception - * from the second load. - */ - do_gpr_ld(s, tmp, clean_addr, size + is_signed * MO_SIGN, - false, false, 0, false, false); - tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size); - do_gpr_ld(s, tcg_rt2, clean_addr, size + is_signed * MO_SIGN, - false, false, 0, false, false); + tcg_gen_qemu_ld_i64(tcg_rt, clean_addr, get_mem_index(s), mop); + if (is_signed) { + tcg_gen_sextract_i64(tcg_rt2, tcg_rt, o2, 32); + tcg_gen_sextract_i64(tcg_rt, tcg_rt, o1, 32); + } else { + tcg_gen_extract_i64(tcg_rt2, tcg_rt, o2, 32); + tcg_gen_extract_i64(tcg_rt, tcg_rt, o1, 32); + } + } else { + TCGv_i128 tmp = tcg_temp_new_i128(); - tcg_gen_mov_i64(tcg_rt, tmp); + tcg_gen_qemu_ld_i128(tmp, clean_addr, get_mem_index(s), mop); + if (s->be_data == MO_LE) { + tcg_gen_extr_i128_i64(tcg_rt, tcg_rt2, tmp); + } else { + tcg_gen_extr_i128_i64(tcg_rt2, tcg_rt, tmp); + } + } } else { - do_gpr_st(s, tcg_rt, clean_addr, size, - false, 0, false, false); - tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size); - do_gpr_st(s, tcg_rt2, clean_addr, size, - false, 0, false, false); + if (size == 2) { + TCGv_i64 tmp = tcg_temp_new_i64(); + + if (s->be_data == MO_LE) { + tcg_gen_concat32_i64(tmp, tcg_rt, tcg_rt2); + } else { + tcg_gen_concat32_i64(tmp, tcg_rt2, tcg_rt); + } + tcg_gen_qemu_st_i64(tmp, clean_addr, get_mem_index(s), mop); + } else { + TCGv_i128 tmp = tcg_temp_new_i128(); + + if (s->be_data == MO_LE) { + tcg_gen_concat_i64_i128(tmp, tcg_rt, tcg_rt2); + } else { + tcg_gen_concat_i64_i128(tmp, tcg_rt2, tcg_rt); + } + tcg_gen_qemu_st_i128(tmp, clean_addr, get_mem_index(s), mop); + } } } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> We are going to need the complete memop beforehand, so let's not compute it twice. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-11-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 61 +++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 26 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void do_gpr_st_memidx(DisasContext *s, TCGv_i64 source, unsigned int iss_srt, bool iss_sf, bool iss_ar) { - memop = finalize_memop(s, memop); tcg_gen_qemu_st_i64(source, tcg_addr, memidx, memop); if (iss_valid) { @@ -XXX,XX +XXX,XX @@ static void do_gpr_ld_memidx(DisasContext *s, TCGv_i64 dest, TCGv_i64 tcg_addr, bool iss_valid, unsigned int iss_srt, bool iss_sf, bool iss_ar) { - memop = finalize_memop(s, memop); tcg_gen_qemu_ld_i64(dest, tcg_addr, memidx, memop); if (extend && (memop & MO_SIGN)) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) int o2_L_o1_o0 = extract32(insn, 21, 3) * 2 | is_lasr; int size = extract32(insn, 30, 2); TCGv_i64 clean_addr; + MemOp memop; switch (o2_L_o1_o0) { case 0x0: /* STXR */ @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) gen_check_sp_alignment(s); } tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); + /* TODO: ARMv8.4-LSE SCTLR.nAA */ + memop = finalize_memop(s, size | MO_ALIGN); clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, size); - /* TODO: ARMv8.4-LSE SCTLR.nAA */ - do_gpr_st(s, cpu_reg(s, rt), clean_addr, size | MO_ALIGN, true, rt, + do_gpr_st(s, cpu_reg(s, rt), clean_addr, memop, true, rt, disas_ldst_compute_iss_sf(size, false, 0), is_lasr); return; @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) if (rn == 31) { gen_check_sp_alignment(s); } + /* TODO: ARMv8.4-LSE SCTLR.nAA */ + memop = finalize_memop(s, size | MO_ALIGN); clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), false, rn != 31, size); - /* TODO: ARMv8.4-LSE SCTLR.nAA */ - do_gpr_ld(s, cpu_reg(s, rt), clean_addr, size | MO_ALIGN, false, true, + do_gpr_ld(s, cpu_reg(s, rt), clean_addr, memop, false, true, rt, disas_ldst_compute_iss_sf(size, false, 0), is_lasr); tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); return; @@ -XXX,XX +XXX,XX @@ static void disas_ld_lit(DisasContext *s, uint32_t insn) } else { /* Only unsigned 32bit loads target 32bit registers. */ bool iss_sf = opc != 0; + MemOp memop = finalize_memop(s, size + is_signed * MO_SIGN); - do_gpr_ld(s, tcg_rt, clean_addr, size + is_signed * MO_SIGN, - false, true, rt, iss_sf, false); + do_gpr_ld(s, tcg_rt, clean_addr, memop, false, true, rt, iss_sf, false); } } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn, bool post_index; bool writeback; int memidx; - + MemOp memop; TCGv_i64 clean_addr, dirty_addr; if (is_vector) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn, return; } is_store = (opc == 0); - is_signed = extract32(opc, 1, 1); + is_signed = !is_store && extract32(opc, 1, 1); is_extended = (size < 3) && extract32(opc, 0, 1); } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn, } memidx = is_unpriv ? get_a64_user_mem_index(s) : get_mem_index(s); + memop = finalize_memop(s, size + is_signed * MO_SIGN); + clean_addr = gen_mte_check1_mmuidx(s, dirty_addr, is_store, writeback || rn != 31, size, is_unpriv, memidx); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn, bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc); if (is_store) { - do_gpr_st_memidx(s, tcg_rt, clean_addr, size, memidx, + do_gpr_st_memidx(s, tcg_rt, clean_addr, memop, memidx, iss_valid, rt, iss_sf, false); } else { - do_gpr_ld_memidx(s, tcg_rt, clean_addr, size + is_signed * MO_SIGN, + do_gpr_ld_memidx(s, tcg_rt, clean_addr, memop, is_extended, memidx, iss_valid, rt, iss_sf, false); } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn, bool is_signed = false; bool is_store = false; bool is_extended = false; - TCGv_i64 tcg_rm, clean_addr, dirty_addr; + MemOp memop; if (extract32(opt, 1, 1) == 0) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn, return; } is_store = (opc == 0); - is_signed = extract32(opc, 1, 1); + is_signed = !is_store && extract32(opc, 1, 1); is_extended = (size < 3) && extract32(opc, 0, 1); } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn, ext_and_shift_reg(tcg_rm, tcg_rm, opt, shift ? size : 0); tcg_gen_add_i64(dirty_addr, dirty_addr, tcg_rm); + + memop = finalize_memop(s, size + is_signed * MO_SIGN); clean_addr = gen_mte_check1(s, dirty_addr, is_store, true, size); if (is_vector) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn, } else { TCGv_i64 tcg_rt = cpu_reg(s, rt); bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc); + if (is_store) { - do_gpr_st(s, tcg_rt, clean_addr, size, + do_gpr_st(s, tcg_rt, clean_addr, memop, true, rt, iss_sf, false); } else { - do_gpr_ld(s, tcg_rt, clean_addr, size + is_signed * MO_SIGN, + do_gpr_ld(s, tcg_rt, clean_addr, memop, is_extended, true, rt, iss_sf, false); } } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn, int rn = extract32(insn, 5, 5); unsigned int imm12 = extract32(insn, 10, 12); unsigned int offset; - TCGv_i64 clean_addr, dirty_addr; - bool is_store; bool is_signed = false; bool is_extended = false; + MemOp memop; if (is_vector) { size |= (opc & 2) << 1; @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn, return; } is_store = (opc == 0); - is_signed = extract32(opc, 1, 1); + is_signed = !is_store && extract32(opc, 1, 1); is_extended = (size < 3) && extract32(opc, 0, 1); } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn, dirty_addr = read_cpu_reg_sp(s, rn, 1); offset = imm12 << size; tcg_gen_addi_i64(dirty_addr, dirty_addr, offset); + + memop = finalize_memop(s, size + is_signed * MO_SIGN); clean_addr = gen_mte_check1(s, dirty_addr, is_store, rn != 31, size); if (is_vector) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn, TCGv_i64 tcg_rt = cpu_reg(s, rt); bool iss_sf = disas_ldst_compute_iss_sf(size, is_signed, opc); if (is_store) { - do_gpr_st(s, tcg_rt, clean_addr, size, - true, rt, iss_sf, false); + do_gpr_st(s, tcg_rt, clean_addr, memop, true, rt, iss_sf, false); } else { - do_gpr_ld(s, tcg_rt, clean_addr, size + is_signed * MO_SIGN, + do_gpr_ld(s, tcg_rt, clean_addr, memop, is_extended, true, rt, iss_sf, false); } } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, bool a = extract32(insn, 23, 1); TCGv_i64 tcg_rs, tcg_rt, clean_addr; AtomicThreeOpFn *fn = NULL; - MemOp mop = s->be_data | size | MO_ALIGN; + MemOp mop = finalize_memop(s, size | MO_ALIGN); if (is_vector || !dc_isar_feature(aa64_atomics, s)) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, * full load-acquire (we only need "load-acquire processor consistent"), * but we choose to implement them as full LDAQ. */ - do_gpr_ld(s, cpu_reg(s, rt), clean_addr, size, false, + do_gpr_ld(s, cpu_reg(s, rt), clean_addr, mop, false, true, rt, disas_ldst_compute_iss_sf(size, false, 0), true); tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); return; @@ -XXX,XX +XXX,XX @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn, bool use_key_a = !extract32(insn, 23, 1); int offset; TCGv_i64 clean_addr, dirty_addr, tcg_rt; + MemOp memop; if (size != 3 || is_vector || !dc_isar_feature(aa64_pauth, s)) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn, offset = sextract32(offset << size, 0, 10 + size); tcg_gen_addi_i64(dirty_addr, dirty_addr, offset); + memop = finalize_memop(s, size); + /* Note that "clean" and "dirty" here refer to TBI not PAC. */ clean_addr = gen_mte_check1(s, dirty_addr, false, is_wback || rn != 31, size); tcg_rt = cpu_reg(s, rt); - do_gpr_ld(s, tcg_rt, clean_addr, size, + do_gpr_ld(s, tcg_rt, clean_addr, memop, /* extend */ false, /* iss_valid */ !is_wback, /* iss_srt */ rt, /* iss_sf */ true, /* iss_ar */ false); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_ldapr_stlr(DisasContext *s, uint32_t insn) } /* TODO: ARMv8.4-LSE SCTLR.nAA */ - mop = size | MO_ALIGN; + mop = finalize_memop(s, size | MO_ALIGN); switch (opc) { case 0: /* STLURB */ -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> We are going to need the complete memop beforehand, so let's not compute it twice. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-12-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 43 ++++++++++++++++++---------------- 1 file changed, 23 insertions(+), 20 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void do_gpr_ld(DisasContext *s, TCGv_i64 dest, TCGv_i64 tcg_addr, /* * Store from FP register to memory */ -static void do_fp_st(DisasContext *s, int srcidx, TCGv_i64 tcg_addr, int size) +static void do_fp_st(DisasContext *s, int srcidx, TCGv_i64 tcg_addr, MemOp mop) { /* This writes the bottom N bits of a 128 bit wide vector to memory */ TCGv_i64 tmplo = tcg_temp_new_i64(); - MemOp mop = finalize_memop_asimd(s, size); tcg_gen_ld_i64(tmplo, cpu_env, fp_reg_offset(s, srcidx, MO_64)); - if (size < MO_128) { + if ((mop & MO_SIZE) < MO_128) { tcg_gen_qemu_st_i64(tmplo, tcg_addr, get_mem_index(s), mop); } else { TCGv_i64 tmphi = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ static void do_fp_st(DisasContext *s, int srcidx, TCGv_i64 tcg_addr, int size) /* * Load from memory to FP register */ -static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, int size) +static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, MemOp mop) { /* This always zero-extends and writes to a full 128 bit wide vector */ TCGv_i64 tmplo = tcg_temp_new_i64(); TCGv_i64 tmphi = NULL; - MemOp mop = finalize_memop_asimd(s, size); - if (size < MO_128) { + if ((mop & MO_SIZE) < MO_128) { tcg_gen_qemu_ld_i64(tmplo, tcg_addr, get_mem_index(s), mop); } else { TCGv_i128 t16 = tcg_temp_new_i128(); @@ -XXX,XX +XXX,XX @@ static void disas_ld_lit(DisasContext *s, uint32_t insn) bool is_signed = false; int size = 2; TCGv_i64 tcg_rt, clean_addr; + MemOp memop; if (is_vector) { if (opc == 3) { @@ -XXX,XX +XXX,XX @@ static void disas_ld_lit(DisasContext *s, uint32_t insn) if (!fp_access_check(s)) { return; } + memop = finalize_memop_asimd(s, size); } else { if (opc == 3) { /* PRFM (literal) : prefetch */ @@ -XXX,XX +XXX,XX @@ static void disas_ld_lit(DisasContext *s, uint32_t insn) } size = 2 + extract32(opc, 0, 1); is_signed = extract32(opc, 1, 1); + memop = finalize_memop(s, size + is_signed * MO_SIGN); } tcg_rt = cpu_reg(s, rt); clean_addr = tcg_temp_new_i64(); gen_pc_plus_diff(s, clean_addr, imm); + if (is_vector) { - do_fp_ld(s, rt, clean_addr, size); + do_fp_ld(s, rt, clean_addr, memop); } else { /* Only unsigned 32bit loads target 32bit registers. */ bool iss_sf = opc != 0; - MemOp memop = finalize_memop(s, size + is_signed * MO_SIGN); - do_gpr_ld(s, tcg_rt, clean_addr, memop, false, true, rt, iss_sf, false); } } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn) (wback || rn != 31) && !set_tag, 2 << size); if (is_vector) { + MemOp mop = finalize_memop_asimd(s, size); + if (is_load) { - do_fp_ld(s, rt, clean_addr, size); + do_fp_ld(s, rt, clean_addr, mop); } else { - do_fp_st(s, rt, clean_addr, size); + do_fp_st(s, rt, clean_addr, mop); } tcg_gen_addi_i64(clean_addr, clean_addr, 1 << size); if (is_load) { - do_fp_ld(s, rt2, clean_addr, size); + do_fp_ld(s, rt2, clean_addr, mop); } else { - do_fp_st(s, rt2, clean_addr, size); + do_fp_st(s, rt2, clean_addr, mop); } } else { TCGv_i64 tcg_rt = cpu_reg(s, rt); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn, if (!fp_access_check(s)) { return; } + memop = finalize_memop_asimd(s, size); } else { if (size == 3 && opc == 2) { /* PRFM - prefetch */ @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn, is_store = (opc == 0); is_signed = !is_store && extract32(opc, 1, 1); is_extended = (size < 3) && extract32(opc, 0, 1); + memop = finalize_memop(s, size + is_signed * MO_SIGN); } switch (idx) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn, } memidx = is_unpriv ? get_a64_user_mem_index(s) : get_mem_index(s); - memop = finalize_memop(s, size + is_signed * MO_SIGN); clean_addr = gen_mte_check1_mmuidx(s, dirty_addr, is_store, writeback || rn != 31, @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn, if (is_vector) { if (is_store) { - do_fp_st(s, rt, clean_addr, size); + do_fp_st(s, rt, clean_addr, memop); } else { - do_fp_ld(s, rt, clean_addr, size); + do_fp_ld(s, rt, clean_addr, memop); } } else { TCGv_i64 tcg_rt = cpu_reg(s, rt); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn, if (is_vector) { if (is_store) { - do_fp_st(s, rt, clean_addr, size); + do_fp_st(s, rt, clean_addr, memop); } else { - do_fp_ld(s, rt, clean_addr, size); + do_fp_ld(s, rt, clean_addr, memop); } } else { TCGv_i64 tcg_rt = cpu_reg(s, rt); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn, if (is_vector) { if (is_store) { - do_fp_st(s, rt, clean_addr, size); + do_fp_st(s, rt, clean_addr, memop); } else { - do_fp_ld(s, rt, clean_addr, size); + do_fp_ld(s, rt, clean_addr, memop); } } else { TCGv_i64 tcg_rt = cpu_reg(s, rt); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Pass the completed memop to gen_mte_check1_mmuidx. For the moment, do nothing more than extract the size. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-13-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.h | 2 +- target/arm/tcg/translate-a64.c | 82 ++++++++++++++++++---------------- target/arm/tcg/translate-sve.c | 7 +-- 3 files changed, 49 insertions(+), 42 deletions(-) diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.h +++ b/target/arm/tcg/translate-a64.h @@ -XXX,XX +XXX,XX @@ static inline bool sme_smza_enabled_check(DisasContext *s) TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr); TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write, - bool tag_checked, int log2_size); + bool tag_checked, MemOp memop); TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write, bool tag_checked, int size); diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void gen_probe_access(DisasContext *s, TCGv_i64 ptr, */ static TCGv_i64 gen_mte_check1_mmuidx(DisasContext *s, TCGv_i64 addr, bool is_write, bool tag_checked, - int log2_size, bool is_unpriv, + MemOp memop, bool is_unpriv, int core_idx) { if (tag_checked && s->mte_active[is_unpriv]) { @@ -XXX,XX +XXX,XX @@ static TCGv_i64 gen_mte_check1_mmuidx(DisasContext *s, TCGv_i64 addr, desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid); desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma); desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write); - desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << log2_size) - 1); + desc = FIELD_DP32(desc, MTEDESC, SIZEM1, memop_size(memop) - 1); ret = tcg_temp_new_i64(); gen_helper_mte_check(ret, cpu_env, tcg_constant_i32(desc), addr); @@ -XXX,XX +XXX,XX @@ static TCGv_i64 gen_mte_check1_mmuidx(DisasContext *s, TCGv_i64 addr, } TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write, - bool tag_checked, int log2_size) + bool tag_checked, MemOp memop) { - return gen_mte_check1_mmuidx(s, addr, is_write, tag_checked, log2_size, + return gen_mte_check1_mmuidx(s, addr, is_write, tag_checked, memop, false, get_mem_index(s)); } @@ -XXX,XX +XXX,XX @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn, int size, bool is_pair) { int idx = get_mem_index(s); - MemOp memop; TCGv_i64 dirty_addr, clean_addr; + MemOp memop; + + /* + * For pairs: + * if size == 2, the operation is single-copy atomic for the doubleword. + * if size == 3, the operation is single-copy atomic for *each* doubleword, + * not the entire quadword, however it must be quadword aligned. + */ + memop = size + is_pair; + if (memop == MO_128) { + memop = finalize_memop_atom(s, MO_128 | MO_ALIGN, + MO_ATOM_IFALIGN_PAIR); + } else { + memop = finalize_memop(s, memop | MO_ALIGN); + } s->is_ldex = true; dirty_addr = cpu_reg_sp(s, rn); - clean_addr = gen_mte_check1(s, dirty_addr, false, rn != 31, size); + clean_addr = gen_mte_check1(s, dirty_addr, false, rn != 31, memop); g_assert(size <= 3); if (is_pair) { g_assert(size >= 2); if (size == 2) { - /* The pair must be single-copy atomic for the doubleword. */ - memop = finalize_memop(s, MO_64 | MO_ALIGN); tcg_gen_qemu_ld_i64(cpu_exclusive_val, clean_addr, idx, memop); if (s->be_data == MO_LE) { tcg_gen_extract_i64(cpu_reg(s, rt), cpu_exclusive_val, 0, 32); @@ -XXX,XX +XXX,XX @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn, tcg_gen_extract_i64(cpu_reg(s, rt2), cpu_exclusive_val, 0, 32); } } else { - /* - * The pair must be single-copy atomic for *each* doubleword, not - * the entire quadword, however it must be quadword aligned. - * Expose the complete load to tcg, for ease of tlb lookup, - * but indicate that only 8-byte atomicity is required. - */ TCGv_i128 t16 = tcg_temp_new_i128(); - memop = finalize_memop_atom(s, MO_128 | MO_ALIGN_16, - MO_ATOM_IFALIGN_PAIR); tcg_gen_qemu_ld_i128(t16, clean_addr, idx, memop); if (s->be_data == MO_LE) { @@ -XXX,XX +XXX,XX @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn, tcg_gen_mov_i64(cpu_reg(s, rt2), cpu_exclusive_high); } } else { - memop = finalize_memop(s, size | MO_ALIGN); tcg_gen_qemu_ld_i64(cpu_exclusive_val, clean_addr, idx, memop); tcg_gen_mov_i64(cpu_reg(s, rt), cpu_exclusive_val); } @@ -XXX,XX +XXX,XX @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2, TCGLabel *fail_label = gen_new_label(); TCGLabel *done_label = gen_new_label(); TCGv_i64 tmp, dirty_addr, clean_addr; + MemOp memop; + + memop = (size + is_pair) | MO_ALIGN; + memop = finalize_memop(s, memop); dirty_addr = cpu_reg_sp(s, rn); - clean_addr = gen_mte_check1(s, dirty_addr, true, rn != 31, size); + clean_addr = gen_mte_check1(s, dirty_addr, true, rn != 31, memop); tcg_gen_brcond_i64(TCG_COND_NE, clean_addr, cpu_exclusive_addr, fail_label); @@ -XXX,XX +XXX,XX @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2, } tcg_gen_atomic_cmpxchg_i64(tmp, cpu_exclusive_addr, cpu_exclusive_val, tmp, - get_mem_index(s), - MO_64 | MO_ALIGN | s->be_data); + get_mem_index(s), memop); tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, cpu_exclusive_val); } else { TCGv_i128 t16 = tcg_temp_new_i128(); @@ -XXX,XX +XXX,XX @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2, } tcg_gen_atomic_cmpxchg_i128(t16, cpu_exclusive_addr, c16, t16, - get_mem_index(s), - MO_128 | MO_ALIGN | s->be_data); + get_mem_index(s), memop); a = tcg_temp_new_i64(); b = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2, } } else { tcg_gen_atomic_cmpxchg_i64(tmp, cpu_exclusive_addr, cpu_exclusive_val, - cpu_reg(s, rt), get_mem_index(s), - size | MO_ALIGN | s->be_data); + cpu_reg(s, rt), get_mem_index(s), memop); tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, cpu_exclusive_val); } tcg_gen_mov_i64(cpu_reg(s, rd), tmp); @@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap(DisasContext *s, int rs, int rt, TCGv_i64 tcg_rt = cpu_reg(s, rt); int memidx = get_mem_index(s); TCGv_i64 clean_addr; + MemOp memop; if (rn == 31) { gen_check_sp_alignment(s); } - clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, size); - tcg_gen_atomic_cmpxchg_i64(tcg_rs, clean_addr, tcg_rs, tcg_rt, memidx, - size | MO_ALIGN | s->be_data); + memop = finalize_memop(s, size | MO_ALIGN); + clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop); + tcg_gen_atomic_cmpxchg_i64(tcg_rs, clean_addr, tcg_rs, tcg_rt, + memidx, memop); } static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt, @@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt, TCGv_i64 t2 = cpu_reg(s, rt + 1); TCGv_i64 clean_addr; int memidx = get_mem_index(s); + MemOp memop; if (rn == 31) { gen_check_sp_alignment(s); } /* This is a single atomic access, despite the "pair". */ - clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, size + 1); + memop = finalize_memop(s, (size + 1) | MO_ALIGN); + clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop); if (size == 2) { TCGv_i64 cmp = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt, tcg_gen_concat32_i64(cmp, s2, s1); } - tcg_gen_atomic_cmpxchg_i64(cmp, clean_addr, cmp, val, memidx, - MO_64 | MO_ALIGN | s->be_data); + tcg_gen_atomic_cmpxchg_i64(cmp, clean_addr, cmp, val, memidx, memop); if (s->be_data == MO_LE) { tcg_gen_extr32_i64(s1, s2, cmp); @@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt, tcg_gen_concat_i64_i128(cmp, s2, s1); } - tcg_gen_atomic_cmpxchg_i128(cmp, clean_addr, cmp, val, memidx, - MO_128 | MO_ALIGN | s->be_data); + tcg_gen_atomic_cmpxchg_i128(cmp, clean_addr, cmp, val, memidx, memop); if (s->be_data == MO_LE) { tcg_gen_extr_i128_i64(s1, s2, cmp); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) /* TODO: ARMv8.4-LSE SCTLR.nAA */ memop = finalize_memop(s, size | MO_ALIGN); clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), - true, rn != 31, size); + true, rn != 31, memop); do_gpr_st(s, cpu_reg(s, rt), clean_addr, memop, true, rt, disas_ldst_compute_iss_sf(size, false, 0), is_lasr); return; @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) /* TODO: ARMv8.4-LSE SCTLR.nAA */ memop = finalize_memop(s, size | MO_ALIGN); clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), - false, rn != 31, size); + false, rn != 31, memop); do_gpr_ld(s, cpu_reg(s, rt), clean_addr, memop, false, true, rt, disas_ldst_compute_iss_sf(size, false, 0), is_lasr); tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn, tcg_gen_add_i64(dirty_addr, dirty_addr, tcg_rm); memop = finalize_memop(s, size + is_signed * MO_SIGN); - clean_addr = gen_mte_check1(s, dirty_addr, is_store, true, size); + clean_addr = gen_mte_check1(s, dirty_addr, is_store, true, memop); if (is_vector) { if (is_store) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn, tcg_gen_addi_i64(dirty_addr, dirty_addr, offset); memop = finalize_memop(s, size + is_signed * MO_SIGN); - clean_addr = gen_mte_check1(s, dirty_addr, is_store, rn != 31, size); + clean_addr = gen_mte_check1(s, dirty_addr, is_store, rn != 31, memop); if (is_vector) { if (is_store) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, if (rn == 31) { gen_check_sp_alignment(s); } - clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), false, rn != 31, size); + clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), false, rn != 31, mop); if (o3_opc == 014) { /* @@ -XXX,XX +XXX,XX @@ static void disas_ldst_pac(DisasContext *s, uint32_t insn, /* Note that "clean" and "dirty" here refer to TBI not PAC. */ clean_addr = gen_mte_check1(s, dirty_addr, false, - is_wback || rn != 31, size); + is_wback || rn != 31, memop); tcg_rt = cpu_reg(s, rt); do_gpr_ld(s, tcg_rt, clean_addr, memop, diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a) unsigned msz = dtype_msz(a->dtype); TCGLabel *over; TCGv_i64 temp, clean_addr; + MemOp memop; if (!dc_isar_feature(aa64_sve, s)) { return false; @@ -XXX,XX +XXX,XX @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a) /* Load the data. */ temp = tcg_temp_new_i64(); tcg_gen_addi_i64(temp, cpu_reg_sp(s, a->rn), a->imm << msz); - clean_addr = gen_mte_check1(s, temp, false, true, msz); - tcg_gen_qemu_ld_i64(temp, clean_addr, get_mem_index(s), - finalize_memop(s, dtype_mop[a->dtype])); + memop = finalize_memop(s, dtype_mop[a->dtype]); + clean_addr = gen_mte_check1(s, temp, false, true, memop); + tcg_gen_qemu_ld_i64(temp, clean_addr, get_mem_index(s), memop); /* Broadcast to *all* elements. */ tcg_gen_gvec_dup_i64(esz, vec_full_reg_offset(s, a->rd), -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Pass the individual memop to gen_mte_checkN. For the moment, do nothing with it. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-14-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.h | 2 +- target/arm/tcg/translate-a64.c | 31 +++++++++++++++++++------------ target/arm/tcg/translate-sve.c | 4 ++-- 3 files changed, 22 insertions(+), 15 deletions(-) diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.h +++ b/target/arm/tcg/translate-a64.h @@ -XXX,XX +XXX,XX @@ TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr); TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write, bool tag_checked, MemOp memop); TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write, - bool tag_checked, int size); + bool tag_checked, int total_size, MemOp memop); /* We should have at some point before trying to access an FP register * done the necessary access check, so assert that diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write, * For MTE, check multiple logical sequential accesses. */ TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write, - bool tag_checked, int size) + bool tag_checked, int total_size, MemOp single_mop) { if (tag_checked && s->mte_active[0]) { TCGv_i64 ret; @@ -XXX,XX +XXX,XX @@ TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write, desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid); desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma); desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write); - desc = FIELD_DP32(desc, MTEDESC, SIZEM1, size - 1); + desc = FIELD_DP32(desc, MTEDESC, SIZEM1, total_size - 1); ret = tcg_temp_new_i64(); gen_helper_mte_check(ret, cpu_env, tcg_constant_i32(desc), addr); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn) bool is_vector = extract32(insn, 26, 1); bool is_load = extract32(insn, 22, 1); int opc = extract32(insn, 30, 2); - bool is_signed = false; bool postindex = false; bool wback = false; bool set_tag = false; - TCGv_i64 clean_addr, dirty_addr; - + MemOp mop; int size; if (opc == 3) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn) } } + if (is_vector) { + mop = finalize_memop_asimd(s, size); + } else { + mop = finalize_memop(s, size); + } clean_addr = gen_mte_checkN(s, dirty_addr, !is_load, - (wback || rn != 31) && !set_tag, 2 << size); + (wback || rn != 31) && !set_tag, + 2 << size, mop); if (is_vector) { - MemOp mop = finalize_memop_asimd(s, size); - + /* LSE2 does not merge FP pairs; leave these as separate operations. */ if (is_load) { do_fp_ld(s, rt, clean_addr, mop); } else { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn) } else { TCGv_i64 tcg_rt = cpu_reg(s, rt); TCGv_i64 tcg_rt2 = cpu_reg(s, rt2); - MemOp mop = size + 1; /* + * We built mop above for the single logical access -- rebuild it + * now for the paired operation. + * * With LSE2, non-sign-extending pairs are treated atomically if * aligned, and if unaligned one of the pair will be completely * within a 16-byte block and that element will be atomic. @@ -XXX,XX +XXX,XX @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn) * This treats sign-extending loads like zero-extending loads, * since that reuses the most code below. */ + mop = size + 1; if (s->align_mem) { mop |= (size == 2 ? MO_ALIGN_4 : MO_ALIGN_8); } @@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn) * promote consecutive little-endian elements below. */ clean_addr = gen_mte_checkN(s, tcg_rn, is_store, is_postidx || rn != 31, - total); + total, finalize_memop(s, size)); /* * Consecutive little-endian elements from a single register @@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn) total = selem << scale; tcg_rn = cpu_reg_sp(s, rn); - clean_addr = gen_mte_checkN(s, tcg_rn, !is_load, is_postidx || rn != 31, - total); mop = finalize_memop(s, scale); + clean_addr = gen_mte_checkN(s, tcg_rn, !is_load, is_postidx || rn != 31, + total, mop); + tcg_ebytes = tcg_constant_i64(1 << scale); for (xs = 0; xs < selem; xs++) { if (replicate) { diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, dirty_addr = tcg_temp_new_i64(); tcg_gen_addi_i64(dirty_addr, cpu_reg_sp(s, rn), imm); - clean_addr = gen_mte_checkN(s, dirty_addr, false, rn != 31, len); + clean_addr = gen_mte_checkN(s, dirty_addr, false, rn != 31, len, MO_8); /* * Note that unpredicated load/store of vector/predicate registers @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, dirty_addr = tcg_temp_new_i64(); tcg_gen_addi_i64(dirty_addr, cpu_reg_sp(s, rn), imm); - clean_addr = gen_mte_checkN(s, dirty_addr, false, rn != 31, len); + clean_addr = gen_mte_checkN(s, dirty_addr, false, rn != 31, len, MO_8); /* Note that unpredicated load/store of vector/predicate registers * are defined as a stream of bytes, which equates to little-endian -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Fixes a bug in that with SCTLR.A set, we should raise any alignment fault before raising any MTE check fault. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-15-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/internals.h | 3 ++- target/arm/tcg/mte_helper.c | 18 ++++++++++++++++++ target/arm/tcg/translate-a64.c | 2 ++ 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/target/arm/internals.h b/target/arm/internals.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/internals.h +++ b/target/arm/internals.h @@ -XXX,XX +XXX,XX @@ FIELD(MTEDESC, MIDX, 0, 4) FIELD(MTEDESC, TBI, 4, 2) FIELD(MTEDESC, TCMA, 6, 2) FIELD(MTEDESC, WRITE, 8, 1) -FIELD(MTEDESC, SIZEM1, 9, SIMD_DATA_BITS - 9) /* size - 1 */ +FIELD(MTEDESC, ALIGN, 9, 3) +FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12) /* size - 1 */ bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr); uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra); diff --git a/target/arm/tcg/mte_helper.c b/target/arm/tcg/mte_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mte_helper.c +++ b/target/arm/tcg/mte_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra) uint64_t HELPER(mte_check)(CPUARMState *env, uint32_t desc, uint64_t ptr) { + /* + * R_XCHFJ: Alignment check not caused by memory type is priority 1, + * higher than any translation fault. When MTE is disabled, tcg + * performs the alignment check during the code generated for the + * memory access. With MTE enabled, we must check this here before + * raising any translation fault in allocation_tag_mem. + */ + unsigned align = FIELD_EX32(desc, MTEDESC, ALIGN); + if (unlikely(align)) { + align = (1u << align) - 1; + if (unlikely(ptr & align)) { + int idx = FIELD_EX32(desc, MTEDESC, MIDX); + bool w = FIELD_EX32(desc, MTEDESC, WRITE); + MMUAccessType type = w ? MMU_DATA_STORE : MMU_DATA_LOAD; + arm_cpu_do_unaligned_access(env_cpu(env), ptr, type, idx, GETPC()); + } + } + return mte_check(env, desc, ptr, GETPC()); } diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static TCGv_i64 gen_mte_check1_mmuidx(DisasContext *s, TCGv_i64 addr, desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid); desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma); desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write); + desc = FIELD_DP32(desc, MTEDESC, ALIGN, get_alignment_bits(memop)); desc = FIELD_DP32(desc, MTEDESC, SIZEM1, memop_size(memop) - 1); ret = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write, desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid); desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma); desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write); + desc = FIELD_DP32(desc, MTEDESC, ALIGN, get_alignment_bits(single_mop)); desc = FIELD_DP32(desc, MTEDESC, SIZEM1, total_size - 1); ret = tcg_temp_new_i64(); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-16-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 3 ++- target/arm/tcg/translate.h | 2 ++ target/arm/tcg/hflags.c | 6 ++++++ target/arm/tcg/translate-a64.c | 1 + 4 files changed, 11 insertions(+), 1 deletion(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu); #define SCTLR_D (1U << 5) /* up to v5; RAO in v6 */ #define SCTLR_CP15BEN (1U << 5) /* v7 onward */ #define SCTLR_L (1U << 6) /* up to v5; RAO in v6 and v7; RAZ in v8 */ -#define SCTLR_nAA (1U << 6) /* when v8.4-LSE is implemented */ +#define SCTLR_nAA (1U << 6) /* when FEAT_LSE2 is implemented */ #define SCTLR_B (1U << 7) /* up to v6; RAZ in v7 */ #define SCTLR_ITD (1U << 7) /* v8 onward */ #define SCTLR_S (1U << 8) /* up to v6; RAZ in v7 */ @@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, SVL, 24, 4) /* Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not. */ FIELD(TBFLAG_A64, SME_TRAP_NONSTREAMING, 28, 1) FIELD(TBFLAG_A64, FGT_ERET, 29, 1) +FIELD(TBFLAG_A64, NAA, 30, 1) /* * Helpers for using the above. diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { bool fgt_eret; /* True if fine-grained trap on SVC is enabled */ bool fgt_svc; + /* True if FEAT_LSE2 SCTLR_ELx.nAA is set */ + bool naa; /* * >= 0, a copy of PSTATE.BTYPE, which will be 0 without v8.5-BTI. * < 0, set by the current instruction. diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/hflags.c +++ b/target/arm/tcg/hflags.c @@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, } } + if (cpu_isar_feature(aa64_lse2, env_archcpu(env))) { + if (sctlr & SCTLR_nAA) { + DP_TBFLAG_A64(flags, NAA, 1); + } + } + /* Compute the condition for using AccType_UNPRIV for LDTR et al. */ if (!(env->pstate & PSTATE_UAO)) { switch (mmu_idx) { diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, dc->pstate_sm = EX_TBFLAG_A64(tb_flags, PSTATE_SM); dc->pstate_za = EX_TBFLAG_A64(tb_flags, PSTATE_ZA); dc->sme_trap_nonstreaming = EX_TBFLAG_A64(tb_flags, SME_TRAP_NONSTREAMING); + dc->naa = EX_TBFLAG_A64(tb_flags, NAA); dc->vec_len = 0; dc->vec_stride = 0; dc->cp_regs = arm_cpu->cp_regs; -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> FEAT_LSE2 only requires that atomic operations not cross a 16-byte boundary. Ordered operations may be completely unaligned if SCTLR.nAA is set. Because this alignment check is so special, do it by hand. Make sure not to keep TCG temps live across the branch. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-17-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-a64.h | 3 + target/arm/tcg/helper-a64.c | 7 ++ target/arm/tcg/translate-a64.c | 120 ++++++++++++++++++++++++++------- 3 files changed, 104 insertions(+), 26 deletions(-) diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.h +++ b/target/arm/tcg/helper-a64.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_2(st2g_stub, TCG_CALL_NO_WG, void, env, i64) DEF_HELPER_FLAGS_2(ldgm, TCG_CALL_NO_WG, i64, env, i64) DEF_HELPER_FLAGS_3(stgm, TCG_CALL_NO_WG, void, env, i64, i64) DEF_HELPER_FLAGS_3(stzgm_tags, TCG_CALL_NO_WG, void, env, i64, i64) + +DEF_HELPER_FLAGS_4(unaligned_access, TCG_CALL_NO_WG, + noreturn, env, i64, i32, i32) diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -XXX,XX +XXX,XX @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in) memset(mem, 0, blocklen); } + +void HELPER(unaligned_access)(CPUARMState *env, uint64_t addr, + uint32_t access_type, uint32_t mmu_idx) +{ + arm_cpu_do_unaligned_access(env_cpu(env), addr, access_type, + mmu_idx, GETPC()); +} diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ TCGv_i64 gen_mte_checkN(DisasContext *s, TCGv_i64 addr, bool is_write, return clean_data_tbi(s, addr); } +/* + * Generate the special alignment check that applies to AccType_ATOMIC + * and AccType_ORDERED insns under FEAT_LSE2: the access need not be + * naturally aligned, but it must not cross a 16-byte boundary. + * See AArch64.CheckAlignment(). + */ +static void check_lse2_align(DisasContext *s, int rn, int imm, + bool is_write, MemOp mop) +{ + TCGv_i32 tmp; + TCGv_i64 addr; + TCGLabel *over_label; + MMUAccessType type; + int mmu_idx; + + tmp = tcg_temp_new_i32(); + tcg_gen_extrl_i64_i32(tmp, cpu_reg_sp(s, rn)); + tcg_gen_addi_i32(tmp, tmp, imm & 15); + tcg_gen_andi_i32(tmp, tmp, 15); + tcg_gen_addi_i32(tmp, tmp, memop_size(mop)); + + over_label = gen_new_label(); + tcg_gen_brcondi_i32(TCG_COND_LEU, tmp, 16, over_label); + + addr = tcg_temp_new_i64(); + tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm); + + type = is_write ? MMU_DATA_STORE : MMU_DATA_LOAD, + mmu_idx = get_mem_index(s); + gen_helper_unaligned_access(cpu_env, addr, tcg_constant_i32(type), + tcg_constant_i32(mmu_idx)); + + gen_set_label(over_label); + +} + +/* Handle the alignment check for AccType_ATOMIC instructions. */ +static MemOp check_atomic_align(DisasContext *s, int rn, MemOp mop) +{ + MemOp size = mop & MO_SIZE; + + if (size == MO_8) { + return mop; + } + + /* + * If size == MO_128, this is a LDXP, and the operation is single-copy + * atomic for each doubleword, not the entire quadword; it still must + * be quadword aligned. + */ + if (size == MO_128) { + return finalize_memop_atom(s, MO_128 | MO_ALIGN, + MO_ATOM_IFALIGN_PAIR); + } + if (dc_isar_feature(aa64_lse2, s)) { + check_lse2_align(s, rn, 0, true, mop); + } else { + mop |= MO_ALIGN; + } + return finalize_memop(s, mop); +} + +/* Handle the alignment check for AccType_ORDERED instructions. */ +static MemOp check_ordered_align(DisasContext *s, int rn, int imm, + bool is_write, MemOp mop) +{ + MemOp size = mop & MO_SIZE; + + if (size == MO_8) { + return mop; + } + if (size == MO_128) { + return finalize_memop_atom(s, MO_128 | MO_ALIGN, + MO_ATOM_IFALIGN_PAIR); + } + if (!dc_isar_feature(aa64_lse2, s)) { + mop |= MO_ALIGN; + } else if (!s->naa) { + check_lse2_align(s, rn, imm, is_write, mop); + } + return finalize_memop(s, mop); +} + typedef struct DisasCompare64 { TCGCond cond; TCGv_i64 value; @@ -XXX,XX +XXX,XX @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2, int rn, { int idx = get_mem_index(s); TCGv_i64 dirty_addr, clean_addr; - MemOp memop; - - /* - * For pairs: - * if size == 2, the operation is single-copy atomic for the doubleword. - * if size == 3, the operation is single-copy atomic for *each* doubleword, - * not the entire quadword, however it must be quadword aligned. - */ - memop = size + is_pair; - if (memop == MO_128) { - memop = finalize_memop_atom(s, MO_128 | MO_ALIGN, - MO_ATOM_IFALIGN_PAIR); - } else { - memop = finalize_memop(s, memop | MO_ALIGN); - } + MemOp memop = check_atomic_align(s, rn, size + is_pair); s->is_ldex = true; dirty_addr = cpu_reg_sp(s, rn); @@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap(DisasContext *s, int rs, int rt, if (rn == 31) { gen_check_sp_alignment(s); } - memop = finalize_memop(s, size | MO_ALIGN); + memop = check_atomic_align(s, rn, size); clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop); tcg_gen_atomic_cmpxchg_i64(tcg_rs, clean_addr, tcg_rs, tcg_rt, memidx, memop); @@ -XXX,XX +XXX,XX @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt, } /* This is a single atomic access, despite the "pair". */ - memop = finalize_memop(s, (size + 1) | MO_ALIGN); + memop = check_atomic_align(s, rn, size + 1); clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop); if (size == 2) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) gen_check_sp_alignment(s); } tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL); - /* TODO: ARMv8.4-LSE SCTLR.nAA */ - memop = finalize_memop(s, size | MO_ALIGN); + memop = check_ordered_align(s, rn, 0, true, size); clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop); do_gpr_st(s, cpu_reg(s, rt), clean_addr, memop, true, rt, @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) if (rn == 31) { gen_check_sp_alignment(s); } - /* TODO: ARMv8.4-LSE SCTLR.nAA */ - memop = finalize_memop(s, size | MO_ALIGN); + memop = check_ordered_align(s, rn, 0, false, size); clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), false, rn != 31, memop); do_gpr_ld(s, cpu_reg(s, rt), clean_addr, memop, false, true, @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, bool a = extract32(insn, 23, 1); TCGv_i64 tcg_rs, tcg_rt, clean_addr; AtomicThreeOpFn *fn = NULL; - MemOp mop = finalize_memop(s, size | MO_ALIGN); + MemOp mop = size; if (is_vector || !dc_isar_feature(aa64_atomics, s)) { unallocated_encoding(s); @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, if (rn == 31) { gen_check_sp_alignment(s); } + + mop = check_atomic_align(s, rn, mop); clean_addr = gen_mte_check1(s, cpu_reg_sp(s, rn), false, rn != 31, mop); if (o3_opc == 014) { @@ -XXX,XX +XXX,XX @@ static void disas_ldst_ldapr_stlr(DisasContext *s, uint32_t insn) bool is_store = false; bool extend = false; bool iss_sf; - MemOp mop; + MemOp mop = size; if (!dc_isar_feature(aa64_rcpc_8_4, s)) { unallocated_encoding(s); return; } - /* TODO: ARMv8.4-LSE SCTLR.nAA */ - mop = finalize_memop(s, size | MO_ALIGN); - switch (opc) { case 0: /* STLURB */ is_store = true; @@ -XXX,XX +XXX,XX @@ static void disas_ldst_ldapr_stlr(DisasContext *s, uint32_t insn) gen_check_sp_alignment(s); } + mop = check_ordered_align(s, rn, offset, is_store, mop); + dirty_addr = read_cpu_reg_sp(s, rn, 1); tcg_gen_addi_i64(dirty_addr, dirty_addr, offset); clean_addr = clean_data_tbi(s, dirty_addr); -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Push the mte check behind the exclusive_addr check. Document the several ways that we are still out of spec with this implementation. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-18-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 42 +++++++++++++++++++++++++++++----- 1 file changed, 36 insertions(+), 6 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2, */ TCGLabel *fail_label = gen_new_label(); TCGLabel *done_label = gen_new_label(); - TCGv_i64 tmp, dirty_addr, clean_addr; + TCGv_i64 tmp, clean_addr; MemOp memop; - memop = (size + is_pair) | MO_ALIGN; - memop = finalize_memop(s, memop); - - dirty_addr = cpu_reg_sp(s, rn); - clean_addr = gen_mte_check1(s, dirty_addr, true, rn != 31, memop); + /* + * FIXME: We are out of spec here. We have recorded only the address + * from load_exclusive, not the entire range, and we assume that the + * size of the access on both sides match. The architecture allows the + * store to be smaller than the load, so long as the stored bytes are + * within the range recorded by the load. + */ + /* See AArch64.ExclusiveMonitorsPass() and AArch64.IsExclusiveVA(). */ + clean_addr = clean_data_tbi(s, cpu_reg_sp(s, rn)); tcg_gen_brcond_i64(TCG_COND_NE, clean_addr, cpu_exclusive_addr, fail_label); + /* + * The write, and any associated faults, only happen if the virtual + * and physical addresses pass the exclusive monitor check. These + * faults are exceedingly unlikely, because normally the guest uses + * the exact same address register for the load_exclusive, and we + * would have recognized these faults there. + * + * It is possible to trigger an alignment fault pre-LSE2, e.g. with an + * unaligned 4-byte write within the range of an aligned 8-byte load. + * With LSE2, the store would need to cross a 16-byte boundary when the + * load did not, which would mean the store is outside the range + * recorded for the monitor, which would have failed a corrected monitor + * check above. For now, we assume no size change and retain the + * MO_ALIGN to let tcg know what we checked in the load_exclusive. + * + * It is possible to trigger an MTE fault, by performing the load with + * a virtual address with a valid tag and performing the store with the + * same virtual address and a different invalid tag. + */ + memop = size + is_pair; + if (memop == MO_128 || !dc_isar_feature(aa64_lse2, s)) { + memop |= MO_ALIGN; + } + memop = finalize_memop(s, memop); + gen_mte_check1(s, cpu_reg_sp(s, rn), true, rn != 31, memop); + tmp = tcg_temp_new_i64(); if (is_pair) { if (size == 2) { -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> We have many other instances of stg in the testsuite; change these to provide an instance of stz2g. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-19-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- tests/tcg/aarch64/mte-7.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tests/tcg/aarch64/mte-7.c b/tests/tcg/aarch64/mte-7.c index XXXXXXX..XXXXXXX 100644 --- a/tests/tcg/aarch64/mte-7.c +++ b/tests/tcg/aarch64/mte-7.c @@ -XXX,XX +XXX,XX @@ int main(int ac, char **av) p = (void *)((unsigned long)p | (1ul << 56)); /* Store tag in sequential granules. */ - asm("stg %0, [%0]" : : "r"(p + 0x0ff0)); - asm("stg %0, [%0]" : : "r"(p + 0x1000)); + asm("stz2g %0, [%0]" : : "r"(p + 0x0ff0)); /* * Perform an unaligned store with tag 1 crossing the pages. -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> With -cpu max and FEAT_LSE2, the __aarch64__ section will only raise an alignment exception when the load crosses a 16-byte boundary. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-20-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- tests/tcg/multiarch/sigbus.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/tests/tcg/multiarch/sigbus.c b/tests/tcg/multiarch/sigbus.c index XXXXXXX..XXXXXXX 100644 --- a/tests/tcg/multiarch/sigbus.c +++ b/tests/tcg/multiarch/sigbus.c @@ -XXX,XX +XXX,XX @@ #include <endian.h> -unsigned long long x = 0x8877665544332211ull; -void * volatile p = (void *)&x + 1; +char x[32] __attribute__((aligned(16))) = { + 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, + 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10, + 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, + 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, +}; +void * volatile p = (void *)&x + 15; void sigbus(int sig, siginfo_t *info, void *uc) { @@ -XXX,XX +XXX,XX @@ int main() * We might as well validate the unaligned load worked. */ if (BYTE_ORDER == LITTLE_ENDIAN) { - assert(tmp == 0x55443322); + assert(tmp == 0x13121110); } else { - assert(tmp == 0x77665544); + assert(tmp == 0x10111213); } return EXIT_SUCCESS; } -- 2.34.1
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20230530191438.411344-21-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/emulation.rst | 1 + target/arm/tcg/cpu64.c | 1 + 2 files changed, 2 insertions(+) diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/arm/emulation.rst +++ b/docs/system/arm/emulation.rst @@ -XXX,XX +XXX,XX @@ the following architecture extensions: - FEAT_LRCPC (Load-acquire RCpc instructions) - FEAT_LRCPC2 (Load-acquire RCpc instructions v2) - FEAT_LSE (Large System Extensions) +- FEAT_LSE2 (Large System Extensions v2) - FEAT_LVA (Large Virtual Address space) - FEAT_MTE (Memory Tagging Extension) - FEAT_MTE2 (Memory Tagging Extension) diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/cpu64.c +++ b/target/arm/tcg/cpu64.c @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) t = FIELD_DP64(t, ID_AA64MMFR2, IESB, 1); /* FEAT_IESB */ t = FIELD_DP64(t, ID_AA64MMFR2, VARANGE, 1); /* FEAT_LVA */ t = FIELD_DP64(t, ID_AA64MMFR2, ST, 1); /* FEAT_TTST */ + t = FIELD_DP64(t, ID_AA64MMFR2, AT, 1); /* FEAT_LSE2 */ t = FIELD_DP64(t, ID_AA64MMFR2, IDS, 1); /* FEAT_IDST */ t = FIELD_DP64(t, ID_AA64MMFR2, FWB, 1); /* FEAT_S2FWB */ t = FIELD_DP64(t, ID_AA64MMFR2, TTL, 1); /* FEAT_TTL */ -- 2.34.1
From: Zhuojia Shen <chaosdefinition@hotmail.com> DC CVAP and DC CVADP instructions can be executed in EL0 on Linux, either directly when SCTLR_EL1.UCI == 1 or emulated by the kernel (see user_cache_maint_handler() in arch/arm64/kernel/traps.c). This patch enables execution of the two instructions in user mode emulation. Signed-off-by: Zhuojia Shen <chaosdefinition@hotmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/helper.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo rndr_reginfo[] = { .access = PL0_R, .readfn = rndr_readfn }, }; -#ifndef CONFIG_USER_ONLY static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *opaque, uint64_t value) { @@ -XXX,XX +XXX,XX @@ static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *opaque, /* This won't be crossing page boundaries */ haddr = probe_read(env, vaddr, dline_size, mem_idx, GETPC()); if (haddr) { +#ifndef CONFIG_USER_ONLY ram_addr_t offset; MemoryRegion *mr; @@ -XXX,XX +XXX,XX @@ static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *opaque, if (mr) { memory_region_writeback(mr, offset, dline_size); } +#endif /*CONFIG_USER_ONLY*/ } } @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dcpodp_reg[] = { .fgt = FGT_DCCVADP, .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn }, }; -#endif /*CONFIG_USER_ONLY*/ static CPAccessResult access_aa64_tid5(CPUARMState *env, const ARMCPRegInfo *ri, bool isread) @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) if (cpu_isar_feature(aa64_tlbios, cpu)) { define_arm_cp_regs(cpu, tlbios_reginfo); } -#ifndef CONFIG_USER_ONLY /* Data Cache clean instructions up to PoP */ if (cpu_isar_feature(aa64_dcpop, cpu)) { define_one_arm_cp_reg(cpu, dcpop_reg); @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) define_one_arm_cp_reg(cpu, dcpodp_reg); } } -#endif /*CONFIG_USER_ONLY*/ /* * If full MTE is enabled, add all of the system registers. -- 2.34.1
From: Zhuojia Shen <chaosdefinition@hotmail.com> Test execution of DC CVAP and DC CVADP instructions under user mode emulation. Signed-off-by: Zhuojia Shen <chaosdefinition@hotmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- tests/tcg/aarch64/dcpodp.c | 63 +++++++++++++++++++++++++++++++ tests/tcg/aarch64/dcpop.c | 63 +++++++++++++++++++++++++++++++ tests/tcg/aarch64/Makefile.target | 11 ++++++ 3 files changed, 137 insertions(+) create mode 100644 tests/tcg/aarch64/dcpodp.c create mode 100644 tests/tcg/aarch64/dcpop.c diff --git a/tests/tcg/aarch64/dcpodp.c b/tests/tcg/aarch64/dcpodp.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/tcg/aarch64/dcpodp.c @@ -XXX,XX +XXX,XX @@ +/* + * Test execution of DC CVADP instruction. + * + * Copyright (c) 2023 Zhuojia Shen <chaosdefinition@hotmail.com> + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include <asm/hwcap.h> +#include <sys/auxv.h> + +#include <signal.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> + +#ifndef HWCAP2_DCPODP +#define HWCAP2_DCPODP (1 << 0) +#endif + +bool should_fail = false; + +static void signal_handler(int sig, siginfo_t *si, void *data) +{ + ucontext_t *uc = (ucontext_t *)data; + + if (should_fail) { + uc->uc_mcontext.pc += 4; + } else { + exit(EXIT_FAILURE); + } +} + +static int do_dc_cvadp(void) +{ + struct sigaction sa = { + .sa_flags = SA_SIGINFO, + .sa_sigaction = signal_handler, + }; + + sigemptyset(&sa.sa_mask); + if (sigaction(SIGSEGV, &sa, NULL) < 0) { + perror("sigaction"); + return EXIT_FAILURE; + } + + asm volatile("dc cvadp, %0\n\t" :: "r"(&sa)); + + should_fail = true; + asm volatile("dc cvadp, %0\n\t" :: "r"(NULL)); + should_fail = false; + + return EXIT_SUCCESS; +} + +int main(void) +{ + if (getauxval(AT_HWCAP2) & HWCAP2_DCPODP) { + return do_dc_cvadp(); + } else { + printf("SKIP: no HWCAP2_DCPODP on this system\n"); + return EXIT_SUCCESS; + } +} diff --git a/tests/tcg/aarch64/dcpop.c b/tests/tcg/aarch64/dcpop.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/tcg/aarch64/dcpop.c @@ -XXX,XX +XXX,XX @@ +/* + * Test execution of DC CVAP instruction. + * + * Copyright (c) 2023 Zhuojia Shen <chaosdefinition@hotmail.com> + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include <asm/hwcap.h> +#include <sys/auxv.h> + +#include <signal.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> + +#ifndef HWCAP_DCPOP +#define HWCAP_DCPOP (1 << 16) +#endif + +bool should_fail = false; + +static void signal_handler(int sig, siginfo_t *si, void *data) +{ + ucontext_t *uc = (ucontext_t *)data; + + if (should_fail) { + uc->uc_mcontext.pc += 4; + } else { + exit(EXIT_FAILURE); + } +} + +static int do_dc_cvap(void) +{ + struct sigaction sa = { + .sa_flags = SA_SIGINFO, + .sa_sigaction = signal_handler, + }; + + sigemptyset(&sa.sa_mask); + if (sigaction(SIGSEGV, &sa, NULL) < 0) { + perror("sigaction"); + return EXIT_FAILURE; + } + + asm volatile("dc cvap, %0\n\t" :: "r"(&sa)); + + should_fail = true; + asm volatile("dc cvap, %0\n\t" :: "r"(NULL)); + should_fail = false; + + return EXIT_SUCCESS; +} + +int main(void) +{ + if (getauxval(AT_HWCAP) & HWCAP_DCPOP) { + return do_dc_cvap(); + } else { + printf("SKIP: no HWCAP_DCPOP on this system\n"); + return EXIT_SUCCESS; + } +} diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target index XXXXXXX..XXXXXXX 100644 --- a/tests/tcg/aarch64/Makefile.target +++ b/tests/tcg/aarch64/Makefile.target @@ -XXX,XX +XXX,XX @@ config-cc.mak: Makefile $(quiet-@)( \ $(call cc-option,-march=armv8.1-a+sve, CROSS_CC_HAS_SVE); \ $(call cc-option,-march=armv8.1-a+sve2, CROSS_CC_HAS_SVE2); \ + $(call cc-option,-march=armv8.2-a, CROSS_CC_HAS_ARMV8_2); \ $(call cc-option,-march=armv8.3-a, CROSS_CC_HAS_ARMV8_3); \ + $(call cc-option,-march=armv8.5-a, CROSS_CC_HAS_ARMV8_5); \ $(call cc-option,-mbranch-protection=standard, CROSS_CC_HAS_ARMV8_BTI); \ $(call cc-option,-march=armv8.5-a+memtag, CROSS_CC_HAS_ARMV8_MTE); \ $(call cc-option,-march=armv9-a+sme, CROSS_CC_HAS_ARMV9_SME)) 3> config-cc.mak -include config-cc.mak +ifneq ($(CROSS_CC_HAS_ARMV8_2),) +AARCH64_TESTS += dcpop +dcpop: CFLAGS += -march=armv8.2-a +endif +ifneq ($(CROSS_CC_HAS_ARMV8_5),) +AARCH64_TESTS += dcpodp +dcpodp: CFLAGS += -march=armv8.5-a +endif + # Pauth Tests ifneq ($(CROSS_CC_HAS_ARMV8_3),) AARCH64_TESTS += pauth-1 pauth-2 pauth-4 pauth-5 -- 2.34.1
From: Zhuojia Shen <chaosdefinition@hotmail.com> Accessing EL0-accessible Debug Communication Channel (DCC) registers in user mode emulation is currently enabled. However, it does not match Linux behavior as Linux sets MDSCR_EL1.TDCC on startup to disable EL0 access to DCC (see __cpu_setup() in arch/arm64/mm/proc.S). This patch fixes access_tdcc() to check MDSCR_EL1.TDCC for EL0 and sets MDSCR_EL1.TDCC for user mode emulation to match Linux. Signed-off-by: Zhuojia Shen <chaosdefinition@hotmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: DS7PR12MB630905198DD8E69F6817544CAC4EA@DS7PR12MB6309.namprd12.prod.outlook.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.c | 2 ++ target/arm/debug_helper.c | 5 +++++ 2 files changed, 7 insertions(+) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj) * This is not yet exposed from the Linux kernel in any way. */ env->cp15.sctlr_el[1] |= SCTLR_TSCXT; + /* Disable access to Debug Communication Channel (DCC). */ + env->cp15.mdscr_el1 |= 1 << 12; #else /* Reset into the highest available EL */ if (arm_feature(env, ARM_FEATURE_EL3)) { diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/debug_helper.c +++ b/target/arm/debug_helper.c @@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tda(CPUARMState *env, const ARMCPRegInfo *ri, * is implemented then these are controlled by MDCR_EL2.TDCC for * EL2 and MDCR_EL3.TDCC for EL3. They are also controlled by * the general debug access trap bits MDCR_EL2.TDA and MDCR_EL3.TDA. + * For EL0, they are also controlled by MDSCR_EL1.TDCC. */ static CPAccessResult access_tdcc(CPUARMState *env, const ARMCPRegInfo *ri, bool isread) { int el = arm_current_el(env); uint64_t mdcr_el2 = arm_mdcr_el2_eff(env); + bool mdscr_el1_tdcc = extract32(env->cp15.mdscr_el1, 12, 1); bool mdcr_el2_tda = (mdcr_el2 & MDCR_TDA) || (mdcr_el2 & MDCR_TDE) || (arm_hcr_el2_eff(env) & HCR_TGE); bool mdcr_el2_tdcc = cpu_isar_feature(aa64_fgt, env_archcpu(env)) && @@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tdcc(CPUARMState *env, const ARMCPRegInfo *ri, bool mdcr_el3_tdcc = cpu_isar_feature(aa64_fgt, env_archcpu(env)) && (env->cp15.mdcr_el3 & MDCR_TDCC); + if (el < 1 && mdscr_el1_tdcc) { + return CP_ACCESS_TRAP; + } if (el < 2 && (mdcr_el2_tda || mdcr_el2_tdcc)) { return CP_ACCESS_TRAP_EL2; } -- 2.34.1
Hi; here's a target-arm pullreq. Apologies for the size: this is because it has all RTH's work to enable emulation of SME2p1 and SVE2p1. thanks -- PMM The following changes since commit c77283dd5d79149f4e7e9edd00f65416c648ee59: Merge tag 'pull-request-2025-07-02' of https://gitlab.com/thuth/qemu into staging (2025-07-03 06:01:41 -0400) are available in the Git repository at: https://gitlab.com/pm215/qemu.git tags/pull-target-arm-20250704 for you to fetch changes up to 083fef73585dfa03f72055ace6de8dec4912d0b0: linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1 (2025-07-04 15:53:23 +0100) ---------------------------------------------------------------- target-arm queue: * Implement emulation of SME2p1 and SVE2p1 * Correctly enforce alignment checks for v8M loads and stores done via helper functions * Mark the "highbank" and the "midway" machine as deprecated ---------------------------------------------------------------- Peter Maydell (4): target/arm: Rename FMOPA_h to FMOPA_w_h target/arm: Rename BFMOPA to BFMOPA_w target/arm: Implement FMOPA (non-widening) for fp16 target/arm: Implement SME2 BFMOPA (non-widening) Richard Henderson (103): target/arm: Fix SME vs AdvSIMD exception priority target/arm: Fix sve_access_check for SME target/arm: Fix 128-bit element ZIP, UZP, TRN target/arm: Replace @rda_rn_rm_e0 in sve.decode target/arm: Fix FMMLA (64-bit element) for 128-bit VL target/arm: Disable FEAT_F64MM if maximum SVE vector size too small target/arm: Fix PSEL size operands to tcg_gen_gvec_ands target/arm: Fix f16_dotadd vs nan selection target/arm: Fix bfdotadd_ebf vs nan selection target/arm: Remove CPUARMState.vfp.scratch target/arm: Introduce FPST_ZA, FPST_ZA_F16 target/arm: Use FPST_ZA for sme_fmopa_[hsd] target/arm: Rename zarray to za_state.za target/arm: Add isar feature tests for SME2p1, SVE2p1 target/arm: Add ZT0 target/arm: Add zt0_excp_el to DisasContext target/arm: Implement SME2 ZERO ZT0 target/arm: Add alignment argument to gen_sve_{ldr, str} target/arm: Implement SME2 LDR/STR ZT0 target/arm: Implement SME2 MOVT target/arm: Split get_tile_rowcol argument tile_index target/arm: Rename MOVA for translate target/arm: Split out get_zarray target/arm: Introduce ARMCPU.sme_max_vq target/arm: Implement SME2 MOVA to/from tile, multiple registers target/arm: Implement SME2 MOVA to/from array, multiple registers target/arm: Implement SME2 BMOPA target/arm: Implement SME2 SMOPS, UMOPS (2-way) target/arm: Introduce gen_gvec_sve2_sqdmulh target/arm: Implement SME2 Multiple and Single SVE Destructive target/arm: Implement SME2 Multiple Vectors SVE Destructive target/arm: Implement SME2 ADD/SUB (array results, multiple and single vector) target/arm: Implement SME2 ADD/SUB (array results, multiple vectors) target/arm: Pass ZA to helper_sve2_fmlal_zz[zx]w_s target/arm: Add helper_gvec{_ah}_bfmlsl{_nx} target/arm: Implement SME2 FMLAL, BFMLAL target/arm: Implement SME2 FDOT target/arm: Implement SME2 BFDOT target/arm: Implement SME2 FVDOT, BFVDOT target/arm: Rename helper_gvec_*dot_[bh] to *_4[bh] target/arm: Implemement SME2 SDOT, UDOT, USDOT, SUDOT target/arm: Rename SVE SDOT and UDOT patterns target/arm: Tighten USDOT (vectors) decode target/arm: Implement SDOT, UDOT (2-way) for SME2/SVE2p1 target/arm: Implement SME2 SVDOT, UVDOT, SUVDOT, USVDOT target/arm: Implement SME2 SMLAL, SMLSL, UMLAL, UMLSL target/arm: Rename gvec_fml[as]_[hs] with _nf_ infix target/arm: Implement SME2 FMLA, FMLS target/arm: Implement SME2 BFMLA, BFMLS target/arm: Implement SME2 FADD, FSUB, BFADD, BFSUB target/arm: Implement SME2 ADD/SUB (array accumulator) target/arm: Implement SME2 BFCVT, BFCVTN, FCVT, FCVTN target/arm: Implement SME2 FCVT (widening), FCVTL target/arm: Implement SME2 FCVTZS, FCVTZU target/arm: Implement SME2 SCVTF, UCVTF target/arm: Implement SME2 FRINTN, FRINTP, FRINTM, FRINTA target/arm: Introduce do_[us]sat_[bhs] macros target/arm: Use do_[us]sat_[bhs] in sve_helper.c target/arm: Implement SME2 SQCVT, UQCVT, SQCVTU target/arm: Implement SQCVTN, UQCVTN, SQCVTUN for SME2/SVE2p1 target/arm: Implement SME2 SUNPK, UUNPK target/arm: Implement SME2 ZIP, UZP (four registers) target/arm: Move do_urshr, do_srshr to vec_internal.h target/arm: Implement SME2 SQRSHR, UQRSHR, SQRSHRN target/arm: Implement SME2 ZIP, UZP (two registers) target/arm: Implement SME2 FCLAMP, SCLAMP, UCLAMP target/arm: Enable SCLAMP, UCLAMP for SVE2p1 target/arm: Implement FCLAMP for SME2, SVE2p1 target/arm: Implement SME2p1 Multiple Zero target/arm: Introduce pred_count_test target/arm: Fold predtest_ones into helper_sve_brkns target/arm: Expand do_zero inline target/arm: Split out do_whilel from helper_sve_whilel target/arm: Split out do_whileg from helper_sve_whileg target/arm: Move scale by esz into helper_sve_while* target/arm: Split trans_WHILE to lt and gt target/arm: Enable PSEL for SVE2p1 target/arm: Implement SVE2p1 WHILE (predicate pair) target/arm: Implement SVE2p1 WHILE (predicate as counter) target/arm: Implement SVE2p1 PTRUE (predicate as counter) target/arm: Implement {ADD, SMIN, SMAX, UMIN, UMAX}QV for SVE2p1 target/arm: Implement SVE2p1 PEXT target/arm: Implement SME2 SEL target/arm: Implement ANDQV, ORQV, EORQV for SVE2p1 target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1 target/arm: Implement BFMLSLB{L, T} for SME2/SVE2p1 target/arm: Implement CNTP (predicate as counter) for SME2/SVE2p1 target/arm: Implement DUPQ for SME2p1/SVE2p1 target/arm: Implement EXTQ for SME2p1/SVE2p1 target/arm: Implement PMOV for SME2p1/SVE2p1 target/arm: Implement ZIPQ, UZPQ for SME2p1/SVE2p1 target/arm: Implement TBLQ, TBXQ for SME2p1/SVE2p1 target/arm: Implement SME2 counted predicate register load/store target/arm: Split the ST_zpri and ST_zprr patterns target/arm: Implement {LD1, ST1}{W, D} (128-bit element) for SVE2p1 target/arm: Move ld1qq and st1qq primitives to sve_ldst_internal.h target/arm: Implement {LD, ST}[234]Q for SME2p1/SVE2p1 target/arm: Implement LD1Q, ST1Q for SVE2p1 target/arm: Implement MOVAZ for SME2p1 target/arm: Implement LUTI2, LUTI4 for SME2/SME2p1 target/arm: Support FPCR.AH in SME FMOPS, BFMOPS target/arm: Enable FEAT_SME2p1 on -cpu max linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1 Thomas Huth (1): hw/arm/highbank: Mark the "highbank" and the "midway" machine as deprecated William Kosasih (11): target/arm: Bring VLSTM/VLLDM helper store/load closer to the ARM pseudocode target/arm: Fix BLXNS helper store alignment checks target/arm: Fix function_return helper load alignment checks target/arm: Fix VLDR helper load alignment checks target/arm: Fix VSTR helper store alignment checks target/arm: Fix VLDR_SG helper load alignment checks target/arm: Fix VSTR_SG helper store alignment checks target/arm: Fix VLD4 helper load alignment checks target/arm: Fix VLD2 helper load alignment checks target/arm: Fix VST4 helper store alignment checks target/arm: Fix VST2 helper store alignment checks docs/about/deprecated.rst | 7 + docs/system/arm/emulation.rst | 6 + target/arm/cpu-features.h | 63 ++ target/arm/cpu.h | 70 +- target/arm/syndrome.h | 1 + target/arm/tcg/helper-sme.h | 215 ++++- target/arm/tcg/helper-sve.h | 212 +++++ target/arm/tcg/helper.h | 91 +- target/arm/tcg/sve_ldst_internal.h | 89 ++ target/arm/tcg/translate-a64.h | 10 +- target/arm/tcg/translate.h | 9 + target/arm/tcg/vec_internal.h | 148 ++++ target/arm/tcg/sme.decode | 937 +++++++++++++++++++- target/arm/tcg/sve.decode | 327 +++++-- hw/arm/highbank.c | 2 + linux-user/aarch64/signal.c | 4 +- linux-user/elfload.c | 8 + target/arm/cpu.c | 11 +- target/arm/cpu64.c | 8 + target/arm/helper.c | 8 +- target/arm/machine.c | 22 +- target/arm/tcg/cpu64.c | 10 +- target/arm/tcg/gengvec64.c | 11 + target/arm/tcg/helper-a64.c | 2 + target/arm/tcg/hflags.c | 34 +- target/arm/tcg/m_helper.c | 33 +- target/arm/tcg/mve_helper.c | 183 ++-- target/arm/tcg/neon_helper.c | 30 + target/arm/tcg/sme_helper.c | 1674 +++++++++++++++++++++++++++++++++--- target/arm/tcg/sve_helper.c | 1201 +++++++++++++++++++++----- target/arm/tcg/translate-a64.c | 45 +- target/arm/tcg/translate-neon.c | 18 +- target/arm/tcg/translate-sme.c | 1480 ++++++++++++++++++++++++++++++- target/arm/tcg/translate-sve.c | 1019 +++++++++++++++++++--- target/arm/tcg/vec_helper.c | 384 +++++++-- target/arm/tcg/vfp_helper.c | 12 +- 36 files changed, 7605 insertions(+), 779 deletions(-)
From: Thomas Huth <thuth@redhat.com> We don't have any automatic regression tests for these machines and when asking the usual suspects on the mailing list we came to the conclusion that nobody tests these machines manually, too, so it seems like this is currently just completely unused code. Mark them as depre- cated to see whether anybody still speaks up during the deprecation period, otherwise we can likely remove these two machines in a couple of releases. Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Message-id: 20250702113051.46483-1-thuth@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: tweaked deprecation.rst text] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/about/deprecated.rst | 7 +++++++ hw/arm/highbank.c | 2 ++ 2 files changed, 9 insertions(+) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -XXX,XX +XXX,XX @@ they want to use and avoids confusion. Existing users of the ``spike`` machine must ensure that they're setting the ``spike`` machine in the command line (``-M spike``). +Arm ``highbank`` and ``midway`` machines (since 10.1) +''''''''''''''''''''''''''''''''''''''''''''''''''''' + +There are no known users left for these machines (if you still use it, +please write a mail to the qemu-devel mailing list). If you just want to +boot a Cortex-A15 or Cortex-A9 Linux, use the ``virt`` machine instead. + System emulator binaries ------------------------ diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/highbank.c +++ b/hw/arm/highbank.c @@ -XXX,XX +XXX,XX @@ static void highbank_class_init(ObjectClass *oc, const void *data) mc->max_cpus = 4; mc->ignore_memory_transaction_failures = true; mc->default_ram_id = "highbank.dram"; + mc->deprecation_reason = "no known users left for this machine"; } static const TypeInfo highbank_type = { @@ -XXX,XX +XXX,XX @@ static void midway_class_init(ObjectClass *oc, const void *data) mc->max_cpus = 4; mc->ignore_memory_transaction_failures = true; mc->default_ram_id = "highbank.dram"; + mc->deprecation_reason = "no known users left for this machine"; } static const TypeInfo midway_type = { -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch brings the VLSTM and VLLDM helper functions closer to the ARM pseudocode by adding MO_ALIGN to the MemOpIdx of the associated store (`cpu_stl_mmu`) operations and load (`cpu_ldl_mmu`) operations. Note that this is not a bug fix: an 8-byte alignment check already exists and remains in place, enforcing stricter alignment than the 4 bytes requirement in the individual loads and stores. This change merely makes the helper implementations closer to the ARM pseudocode. That said, as a side effect, the MMU index is now resolved once instead of on every `cpu_*_data_ra` call, reducing redundant lookups Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-2-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/m_helper.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/m_helper.c +++ b/target/arm/tcg/m_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr) bool s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK; bool lspact = env->v7m.fpccr[s] & R_V7M_FPCCR_LSPACT_MASK; uintptr_t ra = GETPC(); + ARMMMUIdx mmu_idx = arm_mmu_idx(env); + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, + arm_to_core_mmu_idx(mmu_idx)); assert(env->v7m.secure); @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr) * Note that we do not use v7m_stack_write() here, because the * accesses should not set the FSR bits for stacking errors if they * fail. (In pseudocode terms, they are AccType_NORMAL, not AccType_STACK - * or AccType_LAZYFP). Faults in cpu_stl_data_ra() will throw exceptions + * or AccType_LAZYFP). Faults in cpu_stl_mmu() will throw exceptions * and longjmp out. */ if (!(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPEN_MASK)) { @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr) if (i >= 16) { faddr += 8; /* skip the slot for the FPSCR */ } - cpu_stl_data_ra(env, faddr, slo, ra); - cpu_stl_data_ra(env, faddr + 4, shi, ra); + cpu_stl_mmu(env, faddr, slo, oi, ra); + cpu_stl_mmu(env, faddr + 4, shi, oi, ra); } - cpu_stl_data_ra(env, fptr + 0x40, vfp_get_fpscr(env), ra); + cpu_stl_mmu(env, fptr + 0x40, vfp_get_fpscr(env), oi, ra); if (cpu_isar_feature(aa32_mve, cpu)) { - cpu_stl_data_ra(env, fptr + 0x44, env->v7m.vpr, ra); + cpu_stl_mmu(env, fptr + 0x44, env->v7m.vpr, oi, ra); } /* @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr) { ARMCPU *cpu = env_archcpu(env); uintptr_t ra = GETPC(); + ARMMMUIdx mmu_idx = arm_mmu_idx(env); + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, + arm_to_core_mmu_idx(mmu_idx)); /* fptr is the value of Rn, the frame pointer we load the FP regs from */ assert(env->v7m.secure); @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr) faddr += 8; /* skip the slot for the FPSCR and VPR */ } - slo = cpu_ldl_data_ra(env, faddr, ra); - shi = cpu_ldl_data_ra(env, faddr + 4, ra); + slo = cpu_ldl_mmu(env, faddr, oi, ra); + shi = cpu_ldl_mmu(env, faddr + 4, oi, ra); dn = (uint64_t) shi << 32 | slo; *aa32_vfp_dreg(env, i / 2) = dn; } - fpscr = cpu_ldl_data_ra(env, fptr + 0x40, ra); + fpscr = cpu_ldl_mmu(env, fptr + 0x40, oi, ra); vfp_set_fpscr(env, fpscr); if (cpu_isar_feature(aa32_mve, cpu)) { - env->v7m.vpr = cpu_ldl_data_ra(env, fptr + 0x44, ra); + env->v7m.vpr = cpu_ldl_mmu(env, fptr + 0x44, oi, ra); } } -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations (when stacking the return pc and psr) in the BLXNS instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-3-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/m_helper.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/m_helper.c +++ b/target/arm/tcg/m_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest) } /* Note that these stores can throw exceptions on MPU faults */ - cpu_stl_data_ra(env, sp, nextinst, GETPC()); - cpu_stl_data_ra(env, sp + 4, saved_psr, GETPC()); + ARMMMUIdx mmu_idx = arm_mmu_idx(env); + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, + arm_to_core_mmu_idx(mmu_idx)); + cpu_stl_mmu(env, sp, nextinst, oi, GETPC()); + cpu_stl_mmu(env, sp + 4, saved_psr, oi, GETPC()); env->regs[13] = sp; env->regs[14] = 0xfeffffff; -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations (when unstacking the return pc and psr) in the FunctionReturn pseudocode. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-4-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/m_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/m_helper.c +++ b/target/arm/tcg/m_helper.c @@ -XXX,XX +XXX,XX @@ static bool do_v7m_function_return(ARMCPU *cpu) * do them as secure, so work out what MMU index that is. */ mmu_idx = arm_v7m_mmu_idx_for_secstate(env, true); - oi = make_memop_idx(MO_LEUL, arm_to_core_mmu_idx(mmu_idx)); + oi = make_memop_idx(MO_LEUL | MO_ALIGN, arm_to_core_mmu_idx(mmu_idx)); newpc = cpu_ldl_mmu(env, frameptr, oi, 0); newpsr = cpu_ldl_mmu(env, frameptr + 4, oi, 0); -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLDR instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-5-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) } /* For loads, predicated lanes are zeroed instead of keeping their old values */ -#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE) \ +#define DO_VLDR(OP, MFLAG, MSIZE, MTYPE, LDTYPE, ESIZE, TYPE) \ void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \ { \ TYPE *d = vd; \ uint16_t mask = mve_element_mask(env); \ uint16_t eci_mask = mve_eci_mask(env); \ unsigned b, e; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ /* \ * R_SXTM allows the dest reg to become UNKNOWN for abandoned \ * beats so we don't care if we update part of the dest and \ @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \ if (eci_mask & (1 << b)) { \ d[H##ESIZE(e)] = (mask & (1 << b)) ? \ - cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \ + (MTYPE)cpu_##LDTYPE##_mmu(env, addr, oi, GETPC()) : 0;\ } \ addr += MSIZE; \ } \ @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) mve_advance_vpt(env); \ } -DO_VLDR(vldrb, 1, ldub, 1, uint8_t) -DO_VLDR(vldrh, 2, lduw, 2, uint16_t) -DO_VLDR(vldrw, 4, ldl, 4, uint32_t) +DO_VLDR(vldrb, MO_UB, 1, uint8_t, ldb, 1, uint8_t) +DO_VLDR(vldrh, MO_TEUW, 2, uint16_t, ldw, 2, uint16_t) +DO_VLDR(vldrw, MO_TEUL, 4, uint32_t, ldl, 4, uint32_t) DO_VSTR(vstrb, 1, stb, 1, uint8_t) DO_VSTR(vstrh, 2, stw, 2, uint16_t) DO_VSTR(vstrw, 4, stl, 4, uint32_t) -DO_VLDR(vldrb_sh, 1, ldsb, 2, int16_t) -DO_VLDR(vldrb_sw, 1, ldsb, 4, int32_t) -DO_VLDR(vldrb_uh, 1, ldub, 2, uint16_t) -DO_VLDR(vldrb_uw, 1, ldub, 4, uint32_t) -DO_VLDR(vldrh_sw, 2, ldsw, 4, int32_t) -DO_VLDR(vldrh_uw, 2, lduw, 4, uint32_t) +DO_VLDR(vldrb_sh, MO_SB, 1, int8_t, ldb, 2, int16_t) +DO_VLDR(vldrb_sw, MO_SB, 1, int8_t, ldb, 4, int32_t) +DO_VLDR(vldrb_uh, MO_UB, 1, uint8_t, ldb, 2, uint16_t) +DO_VLDR(vldrb_uw, MO_UB, 1, uint8_t, ldb, 4, uint32_t) +DO_VLDR(vldrh_sw, MO_TESW, 2, int16_t, ldw, 4, int32_t) +DO_VLDR(vldrh_uw, MO_TEUW, 2, uint16_t, ldw, 4, uint32_t) DO_VSTR(vstrb_h, 1, stb, 2, int16_t) DO_VSTR(vstrb_w, 1, stb, 4, int32_t) -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VSTR instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-6-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) mve_advance_vpt(env); \ } -#define DO_VSTR(OP, MSIZE, STTYPE, ESIZE, TYPE) \ +#define DO_VSTR(OP, MFLAG, MSIZE, STTYPE, ESIZE, TYPE) \ void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \ { \ TYPE *d = vd; \ uint16_t mask = mve_element_mask(env); \ unsigned b, e; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \ if (mask & (1 << b)) { \ - cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \ + cpu_##STTYPE##_mmu(env, addr, d[H##ESIZE(e)], oi, GETPC()); \ } \ addr += MSIZE; \ } \ @@ -XXX,XX +XXX,XX @@ DO_VLDR(vldrb, MO_UB, 1, uint8_t, ldb, 1, uint8_t) DO_VLDR(vldrh, MO_TEUW, 2, uint16_t, ldw, 2, uint16_t) DO_VLDR(vldrw, MO_TEUL, 4, uint32_t, ldl, 4, uint32_t) -DO_VSTR(vstrb, 1, stb, 1, uint8_t) -DO_VSTR(vstrh, 2, stw, 2, uint16_t) -DO_VSTR(vstrw, 4, stl, 4, uint32_t) +DO_VSTR(vstrb, MO_UB, 1, stb, 1, uint8_t) +DO_VSTR(vstrh, MO_TEUW, 2, stw, 2, uint16_t) +DO_VSTR(vstrw, MO_TEUL, 4, stl, 4, uint32_t) DO_VLDR(vldrb_sh, MO_SB, 1, int8_t, ldb, 2, int16_t) DO_VLDR(vldrb_sw, MO_SB, 1, int8_t, ldb, 4, int32_t) @@ -XXX,XX +XXX,XX @@ DO_VLDR(vldrb_uw, MO_UB, 1, uint8_t, ldb, 4, uint32_t) DO_VLDR(vldrh_sw, MO_TESW, 2, int16_t, ldw, 4, int32_t) DO_VLDR(vldrh_uw, MO_TEUW, 2, uint16_t, ldw, 4, uint32_t) -DO_VSTR(vstrb_h, 1, stb, 2, int16_t) -DO_VSTR(vstrb_w, 1, stb, 4, int32_t) -DO_VSTR(vstrh_w, 2, stw, 4, int32_t) +DO_VSTR(vstrb_h, MO_UB, 1, stb, 2, int16_t) +DO_VSTR(vstrb_w, MO_UB, 1, stb, 4, int32_t) +DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) #undef DO_VLDR #undef DO_VSTR -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLDR_SG instructions. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-7-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 42 ++++++++++++++++++++++--------------- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) * For loads, predicated lanes are zeroed instead of retaining * their previous values. */ -#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB) \ +#define DO_VLDR_SG(OP, MFLAG, MTYPE, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB)\ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \ uint32_t base) \ { \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \ if (!(eci_mask & 1)) { \ continue; \ } \ addr = ADDRFN(base, m[H##ESIZE(e)]); \ d[H##ESIZE(e)] = (mask & 1) ? \ - cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \ + (MTYPE)cpu_##LDTYPE##_mmu(env, addr, oi, GETPC()) : 0; \ if (WB) { \ m[H##ESIZE(e)] = addr; \ } \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \ if (!(eci_mask & 1)) { \ continue; \ } \ addr = ADDRFN(base, m[H4(e & ~1)]); \ addr += 4 * (e & 1); \ - d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \ + d[H4(e)] = (mask & 1) ? cpu_ldl_mmu(env, addr, oi, GETPC()) : 0; \ if (WB && (e & 1)) { \ m[H4(e & ~1)] = addr - 4; \ } \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) #define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2)) #define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3)) -DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_sh, MO_SB, int8_t, ldb, 2, int16_t, uint16_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_sw, MO_SB, int8_t, ldb, 4, int32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrh_sg_sw, MO_TESW, int16_t, ldw, 4, int32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_ub, MO_UB, uint8_t, ldb, 1, uint8_t, uint8_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_uh, MO_UB, uint8_t, ldb, 2, uint16_t, uint16_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_uw, MO_UB, uint8_t, ldb, 4, uint32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrh_sg_uh, MO_TEUW, uint16_t, ldw, 2, uint16_t, uint16_t, ADDR_ADD, false) +DO_VLDR_SG(vldrh_sg_uw, MO_TEUW, uint16_t, ldw, 4, uint32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrw_sg_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false) DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH, false) -DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH, false) -DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH, false) -DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false) +DO_VLDR_SG(vldrh_sg_os_sw, MO_TESW, int16_t, ldw, 4, + int32_t, uint32_t, ADDR_ADD_OSH, false) +DO_VLDR_SG(vldrh_sg_os_uh, MO_TEUW, uint16_t, ldw, 2, + uint16_t, uint16_t, ADDR_ADD_OSH, false) +DO_VLDR_SG(vldrh_sg_os_uw, MO_TEUW, uint16_t, ldw, 4, + uint32_t, uint32_t, ADDR_ADD_OSH, false) +DO_VLDR_SG(vldrw_sg_os_uw, MO_TEUL, uint32_t, ldl, 4, + uint32_t, uint32_t, ADDR_ADD_OSW, false) DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false) DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false) @@ -XXX,XX +XXX,XX @@ DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false) DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false) DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false) -DO_VLDR_SG(vldrw_sg_wb_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true) +DO_VLDR_SG(vldrw_sg_wb_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true) DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true) DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true) DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VSTR_SG instructions. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-8-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) } /* We know here TYPE is unsigned so always the same as the offset type */ -#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN, WB) \ +#define DO_VSTR_SG(OP, MFLAG, STTYPE, ESIZE, TYPE, ADDRFN, WB) \ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \ uint32_t base) \ { \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \ if (!(eci_mask & 1)) { \ continue; \ } \ addr = ADDRFN(base, m[H##ESIZE(e)]); \ if (mask & 1) { \ - cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \ + cpu_##STTYPE##_mmu(env, addr, d[H##ESIZE(e)], oi, GETPC()); \ } \ if (WB) { \ m[H##ESIZE(e)] = addr; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \ if (!(eci_mask & 1)) { \ continue; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) addr = ADDRFN(base, m[H4(e & ~1)]); \ addr += 4 * (e & 1); \ if (mask & 1) { \ - cpu_stl_data_ra(env, addr, d[H4(e)], GETPC()); \ + cpu_stl_mmu(env, addr, d[H4(e)], oi, GETPC()); \ } \ if (WB && (e & 1)) { \ m[H4(e & ~1)] = addr - 4; \ @@ -XXX,XX +XXX,XX @@ DO_VLDR_SG(vldrw_sg_os_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false) DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false) -DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false) -DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD, false) -DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD, false) -DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD, false) -DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD, false) -DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD, false) +DO_VSTR_SG(vstrb_sg_ub, MO_UB, stb, 1, uint8_t, ADDR_ADD, false) +DO_VSTR_SG(vstrb_sg_uh, MO_UB, stb, 2, uint16_t, ADDR_ADD, false) +DO_VSTR_SG(vstrb_sg_uw, MO_UB, stb, 4, uint32_t, ADDR_ADD, false) +DO_VSTR_SG(vstrh_sg_uh, MO_TEUW, stw, 2, uint16_t, ADDR_ADD, false) +DO_VSTR_SG(vstrh_sg_uw, MO_TEUW, stw, 4, uint32_t, ADDR_ADD, false) +DO_VSTR_SG(vstrw_sg_uw, MO_TEUL, stl, 4, uint32_t, ADDR_ADD, false) DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD, false) -DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH, false) -DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false) -DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false) +DO_VSTR_SG(vstrh_sg_os_uh, MO_TEUW, stw, 2, uint16_t, ADDR_ADD_OSH, false) +DO_VSTR_SG(vstrh_sg_os_uw, MO_TEUW, stw, 4, uint32_t, ADDR_ADD_OSH, false) +DO_VSTR_SG(vstrw_sg_os_uw, MO_TEUL, stl, 4, uint32_t, ADDR_ADD_OSW, false) DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false) DO_VLDR_SG(vldrw_sg_wb_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true) DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true) -DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true) +DO_VSTR_SG(vstrw_sg_wb_uw, MO_TEUL, stl, 4, uint32_t, ADDR_ADD, true) DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) /* -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLD4 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-9-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) uint16_t mask = mve_eci_mask(env); \ static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ for (e = 0; e < 4; e++, data >>= 8) { \ uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \ qd[H1(off[beat])] = data; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) uint32_t addr, data; \ int y; /* y counts 0 2 0 2 */ \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 8 + (beat & 1) * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y); \ qd[H2(off[beat])] = data; \ data >>= 16; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) uint32_t addr, data; \ uint32_t *qd; \ int y; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ y = (beat + (O1 & 2)) & 3; \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \ qd[H4(off[beat] >> 2)] = data; \ -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLD2 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-10-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VLD4W(vld43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint8_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 2; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ for (e = 0; e < 4; e++, data >>= 8) { \ qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \ qd[H1(off[beat] + (e >> 1))] = data; \ @@ -XXX,XX +XXX,XX @@ DO_VLD4W(vld43w, 6, 7, 8, 9) uint32_t addr, data; \ int e; \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ for (e = 0; e < 2; e++, data >>= 16) { \ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \ qd[H2(off[beat])] = data; \ @@ -XXX,XX +XXX,XX @@ DO_VLD4W(vld43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint32_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat]; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \ qd[H4(off[beat] >> 3)] = data; \ } \ -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VST4 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-11-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint16_t mask = mve_eci_mask(env); \ static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \ data = (data << 8) | qd[H1(off[beat])]; \ } \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint32_t addr, data; \ int y; /* y counts 0 2 0 2 */ \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) data = qd[H2(off[beat])]; \ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1); \ data |= qd[H2(off[beat])] << 16; \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint32_t addr, data; \ uint32_t *qd; \ int y; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) y = (beat + (O1 & 2)) & 3; \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \ data = qd[H4(off[beat] >> 2)]; \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VST2 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-12-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint8_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \ data = (data << 8) | qd[H1(off[beat] + (e >> 1))]; \ } \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) uint32_t addr, data; \ int e; \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \ data = (data << 16) | qd[H2(off[beat])]; \ } \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint32_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) addr = base + off[beat]; \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \ data = qd[H4(off[beat] >> 3)]; \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> We failed to raise an exception when sme_excp_el == 0 and fp_excp_el == 1. Cc: qemu-stable@nongnu.org Fixes: 3d74825f4d6 ("target/arm: Add SME enablement checks") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ bool sme_enabled_check(DisasContext *s) * to be zero when fp_excp_el has priority. This is because we need * sme_excp_el by itself for cpregs access checks. */ - if (!s->fp_excp_el || s->sme_excp_el < s->fp_excp_el) { + if (s->sme_excp_el + && (!s->fp_excp_el || s->sme_excp_el <= s->fp_excp_el)) { bool ret = sme_access_check(s); s->fp_access_checked = (ret ? 1 : -1); return ret; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Do not assume SME implies SVE. Ensure that the non-streaming check is present along the SME path, since it is not implied by sme_*_enabled_check. Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool fp_access_check_only(DisasContext *s) return true; } -static bool fp_access_check(DisasContext *s) +static bool nonstreaming_check(DisasContext *s) { - if (!fp_access_check_only(s)) { - return false; - } if (s->sme_trap_nonstreaming && s->is_nonstreaming) { gen_exception_insn(s, 0, EXCP_UDEF, syn_smetrap(SME_ET_Streaming, false)); @@ -XXX,XX +XXX,XX @@ static bool fp_access_check(DisasContext *s) return true; } +static bool fp_access_check(DisasContext *s) +{ + return fp_access_check_only(s) && nonstreaming_check(s); +} + /* * Return <0 for non-supported element sizes, with MO_16 controlled by * FEAT_FP16; return 0 for fp disabled; otherwise return >0 for success. @@ -XXX,XX +XXX,XX @@ static int fp_access_check_vector_hsd(DisasContext *s, bool is_q, MemOp esz) */ bool sve_access_check(DisasContext *s) { - if (s->pstate_sm || !dc_isar_feature(aa64_sve, s)) { + if (dc_isar_feature(aa64_sme, s)) { bool ret; - assert(dc_isar_feature(aa64_sme, s)); - ret = sme_sm_enabled_check(s); + if (s->pstate_sm) { + ret = sme_enabled_check(s); + } else if (dc_isar_feature(aa64_sve, s)) { + goto continue_sve; + } else { + ret = sme_sm_enabled_check(s); + } + if (ret) { + ret = nonstreaming_check(s); + } s->sve_access_checked = (ret ? 1 : -1); return ret; } + + continue_sve: if (s->sve_excp_el) { /* Assert that we only raise one exception per instruction. */ assert(!s->sve_access_checked); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> We missed the instructions UDEF when the vector size is too small. We missed marking the instructions non-streaming with SME. Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 43 ++++++++++++++++++++++++---------- 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(PUNPKHI, aa64_sve, do_perm_pred2, a, 1, gen_helper_sve_punpk_p) *** SVE Permute - Interleaving Group */ +static bool do_interleave_q(DisasContext *s, gen_helper_gvec_3 *fn, + arg_rrr_esz *a, int data) +{ + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + if (vsz < 32) { + unallocated_encoding(s); + } else { + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + vsz, vsz, data, fn); + } + } + return true; +} + static gen_helper_gvec_3 * const zip_fns[4] = { gen_helper_sve_zip_b, gen_helper_sve_zip_h, gen_helper_sve_zip_s, gen_helper_sve_zip_d, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ZIP1_z, aa64_sve, gen_gvec_ool_arg_zzz, TRANS_FEAT(ZIP2_z, aa64_sve, gen_gvec_ool_arg_zzz, zip_fns[a->esz], a, vec_full_reg_size(s) / 2) -TRANS_FEAT(ZIP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_zip_q, a, 0) -TRANS_FEAT(ZIP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_zip_q, a, - QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2) +TRANS_FEAT_NONSTREAMING(ZIP1_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_zip_q, a, 0) +TRANS_FEAT_NONSTREAMING(ZIP2_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_zip_q, a, + QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2) static gen_helper_gvec_3 * const uzp_fns[4] = { gen_helper_sve_uzp_b, gen_helper_sve_uzp_h, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UZP1_z, aa64_sve, gen_gvec_ool_arg_zzz, TRANS_FEAT(UZP2_z, aa64_sve, gen_gvec_ool_arg_zzz, uzp_fns[a->esz], a, 1 << a->esz) -TRANS_FEAT(UZP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_uzp_q, a, 0) -TRANS_FEAT(UZP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_uzp_q, a, 16) +TRANS_FEAT_NONSTREAMING(UZP1_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_uzp_q, a, 0) +TRANS_FEAT_NONSTREAMING(UZP2_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_uzp_q, a, 16) static gen_helper_gvec_3 * const trn_fns[4] = { gen_helper_sve_trn_b, gen_helper_sve_trn_h, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(TRN1_z, aa64_sve, gen_gvec_ool_arg_zzz, TRANS_FEAT(TRN2_z, aa64_sve, gen_gvec_ool_arg_zzz, trn_fns[a->esz], a, 1 << a->esz) -TRANS_FEAT(TRN1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_trn_q, a, 0) -TRANS_FEAT(TRN2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_trn_q, a, 16) +TRANS_FEAT_NONSTREAMING(TRN1_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_trn_q, a, 0) +TRANS_FEAT_NONSTREAMING(TRN2_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_trn_q, a, 16) /* *** SVE Permute Vector - Predicated Group -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Replace @rda_rn_rm_e0 with @rda_rn_rm_ex, and require users to supply an explicit esz. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 48 +++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ @rda_rn_rm ........ esz:2 . rm:5 ... ... rn:5 rd:5 \ &rrrr_esz ra=%reg_movprfx -# Four operand with unused vector element size -@rda_rn_rm_e0 ........ ... rm:5 ... ... rn:5 rd:5 \ - &rrrr_esz esz=0 ra=%reg_movprfx -@rdn_ra_rm_e0 ........ ... rm:5 ... ... ra:5 rd:5 \ - &rrrr_esz esz=0 rn=%reg_movprfx +# Four operand with explicit vector element size +@rda_rn_rm_ex ........ ... rm:5 ... ... rn:5 rd:5 \ + &rrrr_esz ra=%reg_movprfx +@rdn_ra_rm_ex ........ ... rm:5 ... ... ra:5 rd:5 \ + &rrrr_esz rn=%reg_movprfx # Three operand with "memory" size, aka immediate left shift @rd_rn_msz_rm ........ ... rm:5 .... imm:2 rn:5 rd:5 &rrri @@ -XXX,XX +XXX,XX @@ XAR 00000100 .. 1 ..... 001 101 rm:5 rd:5 &rrri_esz \ rn=%reg_movprfx esz=%tszimm16_esz imm=%tszimm16_shr # SVE2 bitwise ternary operations -EOR3 00000100 00 1 ..... 001 110 ..... ..... @rdn_ra_rm_e0 -BSL 00000100 00 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 -BCAX 00000100 01 1 ..... 001 110 ..... ..... @rdn_ra_rm_e0 -BSL1N 00000100 01 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 -BSL2N 00000100 10 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 -NBSL 00000100 11 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 +EOR3 00000100 00 1 ..... 001 110 ..... ..... @rdn_ra_rm_ex esz=0 +BSL 00000100 00 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 +BCAX 00000100 01 1 ..... 001 110 ..... ..... @rdn_ra_rm_ex esz=0 +BSL1N 00000100 01 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 +BSL2N 00000100 10 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 +NBSL 00000100 11 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 ### SVE Index Generation Group @@ -XXX,XX +XXX,XX @@ EORTB 01000101 .. 0 ..... 10010 1 ..... ..... @rd_rn_rm ## SVE integer matrix multiply accumulate -SMMLA 01000101 00 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0 -USMMLA 01000101 10 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0 -UMMLA 01000101 11 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0 +SMMLA 01000101 00 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2 +USMMLA 01000101 10 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2 +UMMLA 01000101 11 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2 ## SVE2 bitwise permute @@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx USDOT_zzzz 01000100 .. 0 ..... 011 110 ..... ..... @rda_rn_rm ### SVE2 floating point matrix multiply accumulate -BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 -FMMLA_s 01100100 10 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 -FMMLA_d 01100100 11 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 +BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=1 +FMMLA_s 01100100 10 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=2 +FMMLA_d 01100100 11 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=3 ### SVE2 Memory Gather Load Group @@ -XXX,XX +XXX,XX @@ FCVTLT_sd 01100100 11 0010 11 101 ... ..... ..... @rd_pg_rn_e0 FLOGB 01100101 00 011 esz:2 0101 pg:3 rn:5 rd:5 &rpr_esz ### SVE2 floating-point multiply-add long (vectors) -FMLALB_zzzw 01100100 10 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 -FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0 -FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_e0 -FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_e0 +FMLALB_zzzw 01100100 10 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 +FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 +FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_ex esz=2 +FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 -BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 -BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0 +BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 +BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point bfloat16 dot-product -BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 +BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point multiply-add long (indexed) FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-6-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_ZPZZ_FP(FMINNMP, aa64_sve2, sve2_fminnmp_zpzz) DO_ZPZZ_FP(FMAXP, aa64_sve2, sve2_fmaxp_zpzz) DO_ZPZZ_FP(FMINP, aa64_sve2, sve2_fminp_zpzz) +static bool do_fmmla(DisasContext *s, arg_rrrr_esz *a, + gen_helper_gvec_4_ptr *fn) +{ + if (sve_access_check(s)) { + if (vec_full_reg_size(s) < 4 * memop_size(a->esz)) { + unallocated_encoding(s); + } else { + gen_gvec_fpst_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, 0, FPST_A64); + } + } + return true; +} + +TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, do_fmmla, a, gen_helper_fmmla_s) +TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, do_fmmla, a, gen_helper_fmmla_d) + /* * SVE Integer Multiply-Add (unpredicated) */ -TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz, - gen_helper_fmmla_s, a->rd, a->rn, a->rm, a->ra, - 0, FPST_A64) -TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz, - gen_helper_fmmla_d, a->rd, a->rn, a->rm, a->ra, - 0, FPST_A64) - static gen_helper_gvec_4 * const sqdmlal_zzzw_fns[] = { NULL, gen_helper_sve2_sqdmlal_zzzw_h, gen_helper_sve2_sqdmlal_zzzw_s, gen_helper_sve2_sqdmlal_zzzw_d, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> All F64MM instructions operate on a 256-bit vector. If only 128-bit vectors is supported by the cpu, then the cpu cannot enable F64MM. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu64.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) /* From now on sve_max_vq is the actual maximum supported length. */ cpu->sve_max_vq = max_vq; cpu->sve_vq.map = vq_map; + + /* FEAT_F64MM requires the existence of a 256-bit vector size. */ + if (max_vq < 2) { + uint64_t t = GET_IDREG(&cpu->isar, ID_AA64ZFR0); + t = FIELD_DP64(t, ID_AA64ZFR0, F64MM, 0); + SET_IDREG(&cpu->isar, ID_AA64ZFR0, t); + } } /* -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Gvec only operates on size 8 and multiples of 16. Predicates may be any multiple of 2. Round up the size using the appropriate function. Cc: qemu-stable@nongnu.org Fixes: 598ab0b24c0 ("target/arm: Implement PSEL") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-8-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 1 + 1 file changed, 1 insertion(+) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_PSEL(DisasContext *s, arg_psel *a) tcg_gen_neg_i64(tmp, tmp); /* Apply to either copy the source, or write zeros. */ + pl = size_for_gvec(pl); tcg_gen_gvec_ands(MO_64, pred_full_reg_offset(s, a->pd), pred_full_reg_offset(s, a->pn), tmp, pl, pl); return true; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Implement FPProcessNaNs4 within f16_dotadd, rather than simply letting NaNs propagate through the function. Cc: qemu-stable@nongnu.org Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-9-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme_helper.c | 62 +++++++++++++++++++++++++++---------- 1 file changed, 46 insertions(+), 16 deletions(-) diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, * - we have pre-set-up copy of s_std which is set to round-to-odd, * for the multiply (see below) */ - float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16); - float64 e1c = float16_to_float64(e1 >> 16, true, s_f16); - float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16); - float64 e2c = float16_to_float64(e2 >> 16, true, s_f16); - float64 t64; + float16 h1r = e1 & 0xffff; + float16 h1c = e1 >> 16; + float16 h2r = e2 & 0xffff; + float16 h2c = e2 >> 16; float32 t32; - /* - * The ARM pseudocode function FPDot performs both multiplies - * and the add with a single rounding operation. Emulate this - * by performing the first multiply in round-to-odd, then doing - * the second multiply as fused multiply-add, and rounding to - * float32 all in one step. - */ - t64 = float64_mul(e1r, e2r, s_odd); - t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std); + /* C.f. FPProcessNaNs4 */ + if (float16_is_any_nan(h1r) || float16_is_any_nan(h1c) || + float16_is_any_nan(h2r) || float16_is_any_nan(h2c)) { + float16 t16; - /* This conversion is exact, because we've already rounded. */ - t32 = float64_to_float32(t64, s_std); + if (float16_is_signaling_nan(h1r, s_f16)) { + t16 = h1r; + } else if (float16_is_signaling_nan(h1c, s_f16)) { + t16 = h1c; + } else if (float16_is_signaling_nan(h2r, s_f16)) { + t16 = h2r; + } else if (float16_is_signaling_nan(h2c, s_f16)) { + t16 = h2c; + } else if (float16_is_any_nan(h1r)) { + t16 = h1r; + } else if (float16_is_any_nan(h1c)) { + t16 = h1c; + } else if (float16_is_any_nan(h2r)) { + t16 = h2r; + } else { + t16 = h2c; + } + t32 = float16_to_float32(t16, true, s_f16); + } else { + float64 e1r = float16_to_float64(h1r, true, s_f16); + float64 e1c = float16_to_float64(h1c, true, s_f16); + float64 e2r = float16_to_float64(h2r, true, s_f16); + float64 e2c = float16_to_float64(h2c, true, s_f16); + float64 t64; + + /* + * The ARM pseudocode function FPDot performs both multiplies + * and the add with a single rounding operation. Emulate this + * by performing the first multiply in round-to-odd, then doing + * the second multiply as fused multiply-add, and rounding to + * float32 all in one step. + */ + t64 = float64_mul(e1r, e2r, s_odd); + t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std); + + /* This conversion is exact, because we've already rounded. */ + t32 = float64_to_float32(t64, s_std); + } /* The final accumulation step is not fused. */ return float32_add(sum, t32, s_std); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Implement FPProcessNaNs4 within bfdotadd_ebf, rather than simply letting NaNs propagate through the function. Cc: qemu-stable@nongnu.org Fixes: 0e1850182a1 ("target/arm: Implement FPCR.EBF=1 semantics for bfdotadd()") Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-10-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/vec_helper.c | 75 ++++++++++++++++++++++++++----------- 1 file changed, 53 insertions(+), 22 deletions(-) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst) float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst, float_status *fpst_odd) { - /* - * Compare f16_dotadd() in sme_helper.c, but here we have - * bfloat16 inputs. In particular that means that we do not - * want the FPCR.FZ16 flush semantics, so we use the normal - * float_status for the input handling here. - */ - float64 e1r = float32_to_float64(e1 << 16, fpst); - float64 e1c = float32_to_float64(e1 & 0xffff0000u, fpst); - float64 e2r = float32_to_float64(e2 << 16, fpst); - float64 e2c = float32_to_float64(e2 & 0xffff0000u, fpst); - float64 t64; + float32 s1r = e1 << 16; + float32 s1c = e1 & 0xffff0000u; + float32 s2r = e2 << 16; + float32 s2c = e2 & 0xffff0000u; float32 t32; - /* - * The ARM pseudocode function FPDot performs both multiplies - * and the add with a single rounding operation. Emulate this - * by performing the first multiply in round-to-odd, then doing - * the second multiply as fused multiply-add, and rounding to - * float32 all in one step. - */ - t64 = float64_mul(e1r, e2r, fpst_odd); - t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst); + /* C.f. FPProcessNaNs4 */ + if (float32_is_any_nan(s1r) || float32_is_any_nan(s1c) || + float32_is_any_nan(s2r) || float32_is_any_nan(s2c)) { + if (float32_is_signaling_nan(s1r, fpst)) { + t32 = s1r; + } else if (float32_is_signaling_nan(s1c, fpst)) { + t32 = s1c; + } else if (float32_is_signaling_nan(s2r, fpst)) { + t32 = s2r; + } else if (float32_is_signaling_nan(s2c, fpst)) { + t32 = s2c; + } else if (float32_is_any_nan(s1r)) { + t32 = s1r; + } else if (float32_is_any_nan(s1c)) { + t32 = s1c; + } else if (float32_is_any_nan(s2r)) { + t32 = s2r; + } else { + t32 = s2c; + } + /* + * FPConvertNaN(FPProcessNaN(t32)) will be done as part + * of the final addition below. + */ + } else { + /* + * Compare f16_dotadd() in sme_helper.c, but here we have + * bfloat16 inputs. In particular that means that we do not + * want the FPCR.FZ16 flush semantics, so we use the normal + * float_status for the input handling here. + */ + float64 e1r = float32_to_float64(s1r, fpst); + float64 e1c = float32_to_float64(s1c, fpst); + float64 e2r = float32_to_float64(s2r, fpst); + float64 e2c = float32_to_float64(s2c, fpst); + float64 t64; - /* This conversion is exact, because we've already rounded. */ - t32 = float64_to_float32(t64, fpst); + /* + * The ARM pseudocode function FPDot performs both multiplies + * and the add with a single rounding operation. Emulate this + * by performing the first multiply in round-to-odd, then doing + * the second multiply as fused multiply-add, and rounding to + * float32 all in one step. + */ + t64 = float64_mul(e1r, e2r, fpst_odd); + t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst); + + /* This conversion is exact, because we've already rounded. */ + t32 = float64_to_float32(t64, fpst); + } /* The final accumulation step is not fused. */ return float32_add(sum, t32, fpst); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> The last use of this field was removed in b2fc7be972b9. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-11-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint32_t xregs[16]; - /* Scratch space for aa32 neon expansion. */ - uint32_t scratch[8]; - /* There are a number of distinct float control structures. */ float_status fp_status[FPST_COUNT]; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Rather than repeatedly copying FPST_FPCR to locals and setting default nan mode, create dedicated float_status. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-12-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 12 +++++++++++- target/arm/cpu.c | 4 ++++ target/arm/tcg/vfp_helper.c | 12 +++++++++++- 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState; * when FPCR.AH == 1 (bfloat16 conversions and multiplies, * and the reciprocal and square root estimate/step insns); * for half-precision + * ZA: the "streaming sve" fp status. + * ZA_F16: likewise for half-precision. * * Half-precision operations are governed by a separate * flush-to-zero control bit in FPSCR:FZ16. We pass a separate @@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState; * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16, * which means we need an FPST_AH_F16 as well. * + * The "ZA" float_status are for Streaming SVE operations which use + * default-NaN and do not generate fp exceptions, which means that they + * do not accumulate exception bits back into FPCR. + * See e.g. FPAdd vs FPAdd_ZA pseudocode functions, and the setting + * of fpcr.DN and fpexec parameters. + * * To avoid having to transfer exception bits around, we simply * say that the FPSCR cumulative exception flags are the logical * OR of the flags in the four fp statuses. This relies on the @@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour { FPST_A64_F16, FPST_AH, FPST_AH_F16, + FPST_ZA, + FPST_ZA_F16, FPST_STD, FPST_STD_F16, } ARMFPStatusFlavour; -#define FPST_COUNT 8 +#define FPST_COUNT 10 typedef struct CPUArchState { /* Regs for current mode. */ diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type) set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]); set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]); set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]); + set_default_nan_mode(1, &env->vfp.fp_status[FPST_ZA]); + set_default_nan_mode(1, &env->vfp.fp_status[FPST_ZA_F16]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA_F16]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]); arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]); set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]); diff --git a/target/arm/tcg/vfp_helper.c b/target/arm/tcg/vfp_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vfp_helper.c +++ b/target/arm/tcg/vfp_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpsr_from_host(CPUARMState *env) a64_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A64_F16]) & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used)); /* - * We do not merge in flags from FPST_AH or FPST_AH_F16, because + * We do not merge in flags from FPST_{AH,ZA} or FPST_{AH,ZA}_F16, because * they are used for insns that must not set the cumulative exception bits. */ @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]); set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]); set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]); + set_float_rounding_mode(i, &env->vfp.fp_status[FPST_ZA]); + set_float_rounding_mode(i, &env->vfp.fp_status[FPST_ZA_F16]); } if (changed & FPCR_FZ16) { bool ftz_enabled = val & FPCR_FZ16; @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]); set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]); set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]); + set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_ZA_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]); + set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_ZA_F16]); } if (changed & FPCR_FZ) { bool ftz_enabled = val & FPCR_FZ; set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]); set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]); + set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_ZA]); /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */ set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]); } @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) bool fitz_enabled = (val & FPCR_FIZ) || (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ; set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status[FPST_A64]); + set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status[FPST_ZA]); } if (changed & FPCR_DN) { bool dnan_enabled = val & FPCR_DN; @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) /* Change behaviours for A64 FP operations */ arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64]); arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]); + arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_ZA]); + arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_ZA_F16]); } else { arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA_F16]); } } /* -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-13-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme_helper.c | 37 ++++++++-------------------------- target/arm/tcg/translate-sme.c | 4 ++-- 2 files changed, 10 insertions(+), 31 deletions(-) diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn, } void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, float_status *fpst_in, uint32_t desc) + void *vpm, float_status *fpst, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) << 31; uint16_t *pn = vpn, *pm = vpm; - float_status fpst; - - /* - * Make a copy of float_status because this operation does not - * update the cumulative fp exception status. It also produces - * default nans. - */ - fpst = *fpst_in; - set_default_nan_mode(true, &fpst); for (row = 0; row < oprsz; ) { uint16_t pa = pn[H2(row >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, if (pb & 1) { uint32_t *a = vza_row + H1_4(col); uint32_t *m = vzm + H1_4(col); - *a = float32_muladd(n, *m, *a, 0, &fpst); + *a = float32_muladd(n, *m, *a, 0, fpst); } col += 4; pb >>= 4; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, } void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, float_status *fpst_in, uint32_t desc) + void *vpm, float_status *fpst, uint32_t desc) { intptr_t row, col, oprsz = simd_oprsz(desc) / 8; uint64_t neg = (uint64_t)simd_data(desc) << 63; uint64_t *za = vza, *zn = vzn, *zm = vzm; uint8_t *pn = vpn, *pm = vpm; - float_status fpst = *fpst_in; - - set_default_nan_mode(true, &fpst); for (row = 0; row < oprsz; ++row) { if (pn[H1(row)] & 1) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, for (col = 0; col < oprsz; ++col) { if (pm[H1(col)] & 1) { uint64_t *a = &za_row[col]; - *a = float64_muladd(n, zm[col], *a, 0, &fpst); + *a = float64_muladd(n, zm[col], *a, 0, fpst); } } } @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; uint16_t *pn = vpn, *pm = vpm; - float_status fpst_odd, fpst_std, fpst_f16; + float_status fpst_odd = env->vfp.fp_status[FPST_ZA]; - /* - * Make copies of the fp status fields we use, because this operation - * does not update the cumulative fp exception status. It also - * produces default NaNs. We also need a second copy of fp_status with - * round-to-odd -- see above. - */ - fpst_f16 = env->vfp.fp_status[FPST_A64_F16]; - fpst_std = env->vfp.fp_status[FPST_A64]; - set_default_nan_mode(true, &fpst_std); - set_default_nan_mode(true, &fpst_f16); - fpst_odd = fpst_std; set_float_rounding_mode(float_round_to_odd, &fpst_odd); for (row = 0; row < oprsz; ) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, m = f16mop_adj_pair(m, pcol, 0); *a = f16_dotadd(*a, n, m, - &fpst_f16, &fpst_std, &fpst_odd); + &env->vfp.fp_status[FPST_ZA_F16], + &env->vfp.fp_status[FPST_ZA], + &fpst_odd); } col += 4; pcol >>= 4; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_fmopa_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, - MO_32, FPST_A64, gen_helper_sme_fmopa_s) + MO_32, FPST_ZA, gen_helper_sme_fmopa_s) TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, - MO_64, FPST_A64, gen_helper_sme_fmopa_d) + MO_64, FPST_ZA, gen_helper_sme_fmopa_d) TRANS_FEAT(BFMOPA, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> The whole ZA state will also contain ZT0. Make things easier in aarch64_set_svcr to zero both by wrapping them in a common structure. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-14-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 48 +++++++++++++++++++--------------- linux-user/aarch64/signal.c | 4 +-- target/arm/cpu.c | 4 +-- target/arm/helper.c | 2 +- target/arm/machine.c | 2 +- target/arm/tcg/sme_helper.c | 6 ++--- target/arm/tcg/translate-sme.c | 4 +-- 7 files changed, 38 insertions(+), 32 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint64_t scxtnum_el[4]; - /* - * SME ZA storage -- 256 x 256 byte array, with bytes in host word order, - * as we do with vfp.zregs[]. This corresponds to the architectural ZA - * array, where ZA[N] is in the least-significant bytes of env->zarray[N]. - * When SVL is less than the architectural maximum, the accessible - * storage is restricted, such that if the SVL is X bytes the guest can - * see only the bottom X elements of zarray[], and only the least - * significant X bytes of each element of the array. (In other words, - * the observable part is always square.) - * - * The ZA storage can also be considered as a set of square tiles of - * elements of different sizes. The mapping from tiles to the ZA array - * is architecturally defined, such that for tiles of elements of esz - * bytes, the Nth row (or "horizontal slice") of tile T is in - * ZA[T + N * esz]. Note that this means that each tile is not contiguous - * in the ZA storage, because its rows are striped through the ZA array. - * - * Because this is so large, keep this toward the end of the reset area, - * to keep the offsets into the rest of the structure smaller. - */ - ARMVectorReg zarray[ARM_MAX_VQ * 16]; + struct { + /* + * SME ZA storage -- 256 x 256 byte array, with bytes in host + * word order, as we do with vfp.zregs[]. This corresponds to + * the architectural ZA array, where ZA[N] is in the least + * significant bytes of env->za_state.za[N]. + * + * When SVL is less than the architectural maximum, the accessible + * storage is restricted, such that if the SVL is X bytes the guest + * can see only the bottom X elements of zarray[], and only the least + * significant X bytes of each element of the array. (In other words, + * the observable part is always square.) + * + * The ZA storage can also be considered as a set of square tiles of + * elements of different sizes. The mapping from tiles to the ZA array + * is architecturally defined, such that for tiles of elements of esz + * bytes, the Nth row (or "horizontal slice") of tile T is in + * ZA[T + N * esz]. Note that this means that each tile is not + * contiguous in the ZA storage, because its rows are striped through + * the ZA array. + * + * Because this is so large, keep this toward the end of the + * reset area, to keep the offsets into the rest of the structure + * smaller. + */ + ARMVectorReg za[ARM_MAX_VQ * 16]; + } za_state; struct CPUBreakpoint *cpu_breakpoint[16]; struct CPUWatchpoint *cpu_watchpoint[16]; diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c index XXXXXXX..XXXXXXX 100644 --- a/linux-user/aarch64/signal.c +++ b/linux-user/aarch64/signal.c @@ -XXX,XX +XXX,XX @@ static void target_setup_za_record(struct target_za_context *za, for (i = 0; i < vl; ++i) { uint64_t *z = (void *)za + TARGET_ZA_SIG_ZAV_OFFSET(vq, i); for (j = 0; j < vq * 2; ++j) { - __put_user_e(env->zarray[i].d[j], z + j, le); + __put_user_e(env->za_state.za[i].d[j], z + j, le); } } } @@ -XXX,XX +XXX,XX @@ static bool target_restore_za_record(CPUARMState *env, for (i = 0; i < vl; ++i) { uint64_t *z = (void *)za + TARGET_ZA_SIG_ZAV_OFFSET(vq, i); for (j = 0; j < vq * 2; ++j) { - __get_user_e(env->zarray[i].d[j], z + j, le); + __get_user_e(env->za_state.za[i].d[j], z + j, le); } } return true; diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags) qemu_fprintf(f, "ZA[%0*d]=", svl_lg10, i); for (j = zcr_len; j >= 0; --j) { qemu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%c", - env->zarray[i].d[2 * j + 1], - env->zarray[i].d[2 * j], + env->za_state.za[i].d[2 * j + 1], + env->za_state.za[i].d[2 * j], j ? ':' : '\n'); } } diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask) * when disabled either. */ if (change & new & R_SVCR_ZA_MASK) { - memset(env->zarray, 0, sizeof(env->zarray)); + memset(&env->za_state, 0, sizeof(env->za_state)); } if (tcg_enabled()) { diff --git a/target/arm/machine.c b/target/arm/machine.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/machine.c +++ b/target/arm/machine.c @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_za = { .minimum_version_id = 1, .needed = za_needed, .fields = (const VMStateField[]) { - VMSTATE_STRUCT_ARRAY(env.zarray, ARMCPU, ARM_MAX_VQ * 16, 0, + VMSTATE_STRUCT_ARRAY(env.za_state.za, ARMCPU, ARM_MAX_VQ * 16, 0, vmstate_vreg, ARMVectorReg), VMSTATE_END_OF_LIST() } diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl) uint32_t i; /* - * Special case clearing the entire ZA space. + * Special case clearing the entire ZArray. * This falls into the CONSTRAINED UNPREDICTABLE zeroing of any * parts of the ZA storage outside of SVL. */ if (imm == 0xff) { - memset(env->zarray, 0, sizeof(env->zarray)); + memset(env->za_state.za, 0, sizeof(env->za_state.za)); return; } @@ -XXX,XX +XXX,XX @@ void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl) */ for (i = 0; i < svl; i++) { if (imm & (1 << (i % 8))) { - memset(&env->zarray[i], 0, svl); + memset(&env->za_state.za[i], 0, svl); } } } diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, offset = tile * sizeof(ARMVectorReg); /* Include the byte offset of zarray to make this relative to env. */ - offset += offsetof(CPUARMState, zarray); + offset += offsetof(CPUARMState, za_state.za); tcg_gen_addi_i32(tmp, tmp, offset); /* Add the byte offset to env to produce the final pointer. */ @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile(DisasContext *s, int esz, int tile) TCGv_ptr addr = tcg_temp_new_ptr(); int offset; - offset = tile * sizeof(ARMVectorReg) + offsetof(CPUARMState, zarray); + offset = tile * sizeof(ARMVectorReg) + offsetof(CPUARMState, za_state.za); tcg_gen_addi_ptr(addr, tcg_env, offset); return addr; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-15-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu-features.h | 63 +++++++++++++++++++++++++++++++++++++++ target/arm/cpu.h | 1 + 2 files changed, 64 insertions(+) diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu-features.h +++ b/target/arm/cpu-features.h @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64ISAR2, RPRES); } +static inline bool isar_feature_aa64_lut(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64ISAR2, LUT); +} + static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id) { /* We always set the AdvSIMD and FP fields identically. */ @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve2(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64ZFR0, SVEVER) != 0; } +static inline bool isar_feature_aa64_sve2p1(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64ZFR0, SVEVER) >=2; +} + static inline bool isar_feature_aa64_sve2_aes(const ARMISARegisters *id) { return FIELD_EX64_IDREG(id, ID_AA64ZFR0, AES) != 0; @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve_f64mm(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64ZFR0, F64MM) != 0; } +static inline bool isar_feature_aa64_sve_b16b16(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64ZFR0, B16B16); +} + +static inline bool isar_feature_aa64_sme_b16b16(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, B16B16); +} + +static inline bool isar_feature_aa64_sme_f16f16(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, F16F16); +} + static inline bool isar_feature_aa64_sme_f64f64(const ARMISARegisters *id) { return FIELD_EX64_IDREG(id, ID_AA64SMFR0, F64F64); @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sme_fa64(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64SMFR0, FA64); } +static inline bool isar_feature_aa64_sme2(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, SMEVER) != 0; +} + +static inline bool isar_feature_aa64_sme2p1(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, SMEVER) >= 2; +} + +/* + * Combinations of feature tests, for ease of use with TRANS_FEAT. + */ +static inline bool isar_feature_aa64_sme_or_sve2p1(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme(id) || isar_feature_aa64_sve2p1(id); +} + +static inline bool isar_feature_aa64_sme2_or_sve2p1(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2(id) || isar_feature_aa64_sve2p1(id); +} + +static inline bool isar_feature_aa64_sme2p1_or_sve2p1(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2p1(id) || isar_feature_aa64_sve2p1(id); +} + +static inline bool isar_feature_aa64_sme2_i16i64(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2(id) && isar_feature_aa64_sme_i16i64(id); +} + +static inline bool isar_feature_aa64_sme2_f64f64(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2(id) && isar_feature_aa64_sme_f64f64(id); +} + /* * Feature tests for "does this exist in either 32-bit or 64-bit?" */ diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64ISAR2, SYSINSTR_128, 36, 4) FIELD(ID_AA64ISAR2, PRFMSLC, 40, 4) FIELD(ID_AA64ISAR2, RPRFM, 48, 4) FIELD(ID_AA64ISAR2, CSSC, 52, 4) +FIELD(ID_AA64ISAR2, LUT, 56, 4) FIELD(ID_AA64ISAR2, ATS1A, 60, 4) FIELD(ID_AA64PFR0, EL0, 0, 4) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> This is a 512-bit array introduced with SME2. Save it only when ZA is in use. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-16-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 3 +++ target/arm/machine.c | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint64_t scxtnum_el[4]; struct { + /* SME2 ZT0 -- 512 bit array, with data ordered like ARMVectorReg. */ + uint64_t zt0[512 / 64] QEMU_ALIGNED(16); + /* * SME ZA storage -- 256 x 256 byte array, with bytes in host * word order, as we do with vfp.zregs[]. This corresponds to diff --git a/target/arm/machine.c b/target/arm/machine.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/machine.c +++ b/target/arm/machine.c @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_za = { } }; +static bool zt0_needed(void *opaque) +{ + ARMCPU *cpu = opaque; + + return za_needed(cpu) && cpu_isar_feature(aa64_sme2, cpu); +} + +static const VMStateDescription vmstate_zt0 = { + .name = "cpu/zt0", + .version_id = 1, + .minimum_version_id = 1, + .needed = zt0_needed, + .fields = (VMStateField[]) { + VMSTATE_UINT64_ARRAY(env.za_state.zt0, ARMCPU, + ARRAY_SIZE(((CPUARMState *)0)->za_state.zt0)), + VMSTATE_END_OF_LIST() + } +}; + static bool serror_needed(void *opaque) { ARMCPU *cpu = opaque; @@ -XXX,XX +XXX,XX @@ const VMStateDescription vmstate_arm_cpu = { &vmstate_m_security, &vmstate_sve, &vmstate_za, + &vmstate_zt0, &vmstate_serror, &vmstate_irq_line_state, &vmstate_wfxt_timer, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Pipe the value through from SMCR_ELx through hflags and into the disassembly context. Enable EZT0 in smcr_write. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-17-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 2 ++ target/arm/tcg/translate.h | 1 + target/arm/cpu.c | 3 +++ target/arm/helper.c | 6 +++++- target/arm/tcg/hflags.c | 34 +++++++++++++++++++++++++++++++++- target/arm/tcg/translate-a64.c | 1 + 6 files changed, 45 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ FIELD(SVCR, ZA, 1, 1) /* Fields for SMCR_ELx. */ FIELD(SMCR, LEN, 0, 4) +FIELD(SMCR, EZT0, 30, 1) FIELD(SMCR, FA64, 31, 1) /* Write a new value to v7m.exception, thus transitioning into or out @@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1) FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1) FIELD(TBFLAG_A64, AH, 37, 1) /* FPCR.AH */ FIELD(TBFLAG_A64, NEP, 38, 1) /* FPCR.NEP */ +FIELD(TBFLAG_A64, ZT0EXC_EL, 39, 2) /* * Helpers for using the above. Note that only the A64 accessors use diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { int fp_excp_el; /* FP exception EL or 0 if enabled */ int sve_excp_el; /* SVE exception EL or 0 if enabled */ int sme_excp_el; /* SME exception EL or 0 if enabled */ + int zt0_excp_el; /* ZT0 exception EL or 0 if enabled */ int vl; /* current vector length in bytes */ int svl; /* current streaming vector length in bytes */ bool vfp_enabled; /* FP enabled via FPSCR.EN */ diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ void arm_emulate_firmware_reset(CPUState *cpustate, int target_el) env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK; env->cp15.scr_el3 |= SCR_ENTP2; env->vfp.smcr_el[3] = 0xf; + if (cpu_isar_feature(aa64_sme2, cpu)) { + env->vfp.smcr_el[3] |= R_SMCR_EZT0_MASK; + } } if (cpu_isar_feature(aa64_hcx, cpu)) { env->cp15.scr_el3 |= SCR_HXEN; diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static void smcr_write(CPUARMState *env, const ARMCPRegInfo *ri, { int cur_el = arm_current_el(env); int old_len = sve_vqm1_for_el(env, cur_el); + uint64_t valid_mask = R_SMCR_LEN_MASK | R_SMCR_FA64_MASK; int new_len; QEMU_BUILD_BUG_ON(ARM_MAX_VQ > R_SMCR_LEN_MASK + 1); - value &= R_SMCR_LEN_MASK | R_SMCR_FA64_MASK; + if (cpu_isar_feature(aa64_sme2, env_archcpu(env))) { + valid_mask |= R_SMCR_EZT0_MASK; + } + value &= valid_mask; raw_write(env, ri, value); /* diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/hflags.c +++ b/target/arm/tcg/hflags.c @@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el, return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags); } +/* + * Return the exception level to which exceptions should be taken for ZT0. + * C.f. the ARM pseudocode function CheckSMEZT0Enabled, after the ZA check. + */ +static int zt0_exception_el(CPUARMState *env, int el) +{ +#ifndef CONFIG_USER_ONLY + if (el <= 1 + && !el_is_in_host(env, el) + && !FIELD_EX64(env->vfp.smcr_el[1], SMCR, EZT0)) { + return 1; + } + if (el <= 2 + && arm_is_el2_enabled(env) + && !FIELD_EX64(env->vfp.smcr_el[2], SMCR, EZT0)) { + return 2; + } + if (arm_feature(env, ARM_FEATURE_EL3) + && !FIELD_EX64(env->vfp.smcr_el[3], SMCR, EZT0)) { + return 3; + } +#endif + return 0; +} + static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, ARMMMUIdx mmu_idx) { @@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, DP_TBFLAG_A64(flags, PSTATE_SM, 1); DP_TBFLAG_A64(flags, SME_TRAP_NONSTREAMING, !sme_fa64(env, el)); } - DP_TBFLAG_A64(flags, PSTATE_ZA, FIELD_EX64(env->svcr, SVCR, ZA)); + + if (FIELD_EX64(env->svcr, SVCR, ZA)) { + DP_TBFLAG_A64(flags, PSTATE_ZA, 1); + if (cpu_isar_feature(aa64_sme2, env_archcpu(env))) { + int zt0_el = zt0_exception_el(env, el); + DP_TBFLAG_A64(flags, ZT0EXC_EL, zt0_el); + } + } } sctlr = regime_sctlr(env, stage1); diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, dc->trap_eret = EX_TBFLAG_A64(tb_flags, TRAP_ERET); dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL); dc->sme_excp_el = EX_TBFLAG_A64(tb_flags, SMEEXC_EL); + dc->zt0_excp_el = EX_TBFLAG_A64(tb_flags, ZT0EXC_EL); dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16; dc->svl = (EX_TBFLAG_A64(tb_flags, SVL) + 1) * 16; dc->pauth_active = EX_TBFLAG_A64(tb_flags, PAUTH_ACTIVE); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-18-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/syndrome.h | 1 + target/arm/tcg/sme.decode | 1 + target/arm/tcg/translate-sme.c | 26 ++++++++++++++++++++++++++ 3 files changed, 28 insertions(+) diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/syndrome.h +++ b/target/arm/syndrome.h @@ -XXX,XX +XXX,XX @@ typedef enum { SME_ET_Streaming, SME_ET_NotStreaming, SME_ET_InactiveZA, + SME_ET_InaccessibleZT0, } SMEExceptionType; #define ARM_EL_EC_LENGTH 6 diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ### SME Misc ZERO 11000000 00 001 00000000000 imm:8 +ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ #include "decode-sme.c.inc" +static bool sme2_zt0_enabled_check(DisasContext *s) +{ + if (!sme_za_enabled_check(s)) { + return false; + } + if (s->zt0_excp_el) { + gen_exception_insn_el(s, 0, EXCP_UDEF, + syn_smetrap(SME_ET_InaccessibleZT0, false), + s->zt0_excp_el); + return false; + } + return true; +} /* * Resolve tile.size[index] to a host pointer, where tile and index @@ -XXX,XX +XXX,XX @@ static bool trans_ZERO(DisasContext *s, arg_ZERO *a) return true; } +static bool trans_ZERO_zt0(DisasContext *s, arg_ZERO_zt0 *a) +{ + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + if (sme_enabled_check(s) && sme2_zt0_enabled_check(s)) { + tcg_gen_gvec_dup_imm(MO_64, offsetof(CPUARMState, za_state.zt0), + sizeof_field(CPUARMState, za_state.zt0), + sizeof_field(CPUARMState, za_state.zt0), 0); + } + return true; +} + static bool trans_MOVA(DisasContext *s, arg_MOVA *a) { static gen_helper_gvec_4 * const h_fns[5] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Honor AlignmentEnforced() for LDR/STR (vector), (predicate), and (array vector). Within the expansion functions, clear @align when we're done emitting loads at the largest size. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-19-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.h | 6 ++-- target/arm/tcg/translate-sme.c | 5 ++-- target/arm/tcg/translate-sve.c | 50 ++++++++++++++++++++++++---------- 3 files changed, 42 insertions(+), 19 deletions(-) diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.h +++ b/target/arm/tcg/translate-a64.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz); -void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm); -void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm); +void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, + int len, int rn, int imm, MemOp align); +void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, + int len, int rn, int imm, MemOp align); #endif /* TARGET_ARM_TRANSLATE_A64_H */ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) return true; } -typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int); +typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int, MemOp); static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) { @@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) /* ZA[n] equates to ZA0H.B[n]. */ base = get_tile_rowcol(s, MO_8, a->rv, imm, false); - fn(s, base, 0, svl, a->rn, imm * svl); + fn(s, base, 0, svl, a->rn, imm * svl, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); return true; } diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz, */ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, - int len, int rn, int imm) + int len, int rn, int imm, MemOp align) { int len_align = QEMU_ALIGN_DOWN(len, 16); int len_remain = len % 16; @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, for (i = 0; i < len_align; i += 16) { tcg_gen_qemu_ld_i128(t16, clean_addr, midx, - MO_LE | MO_128 | MO_ATOM_NONE); + MO_LE | MO_128 | MO_ATOM_NONE | align); tcg_gen_extr_i128_i64(t0, t1, t16); tcg_gen_st_i64(t0, base, vofs + i); tcg_gen_st_i64(t1, base, vofs + i + 8); tcg_gen_addi_i64(clean_addr, clean_addr, 16); } + if (len_align) { + align = MO_UNALN; + } } else { TCGLabel *loop = gen_new_label(); TCGv_ptr tp, i = tcg_temp_new_ptr(); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, t16 = tcg_temp_new_i128(); tcg_gen_qemu_ld_i128(t16, clean_addr, midx, - MO_LE | MO_128 | MO_ATOM_NONE); + MO_LE | MO_128 | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 16); tp = tcg_temp_new_ptr(); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_st_i64(t1, tp, vofs + 8); tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop); + align = MO_UNALN; } /* @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, */ if (len_remain >= 8) { t0 = tcg_temp_new_i64(); - tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_qemu_ld_i64(t0, clean_addr, midx, + MO_LEUQ | MO_ATOM_NONE | align); + align = MO_UNALN; tcg_gen_st_i64(t0, base, vofs + len_align); len_remain -= 8; len_align += 8; @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, case 4: case 8: tcg_gen_qemu_ld_i64(t0, clean_addr, midx, - MO_LE | ctz32(len_remain) | MO_ATOM_NONE); + MO_LE | ctz32(len_remain) + | MO_ATOM_NONE | align); break; case 6: t1 = tcg_temp_new_i64(); - tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_qemu_ld_i64(t0, clean_addr, midx, + MO_LEUL | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 4); tcg_gen_qemu_ld_i64(t1, clean_addr, midx, MO_LEUW | MO_ATOM_NONE); tcg_gen_deposit_i64(t0, t0, t1, 32, 32); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, /* Similarly for stores. */ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, - int len, int rn, int imm) + int len, int rn, int imm, MemOp align) { int len_align = QEMU_ALIGN_DOWN(len, 16); int len_remain = len % 16; @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_ld_i64(t1, base, vofs + i + 8); tcg_gen_concat_i64_i128(t16, t0, t1); tcg_gen_qemu_st_i128(t16, clean_addr, midx, - MO_LE | MO_128 | MO_ATOM_NONE); + MO_LE | MO_128 | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 16); } + if (len_align) { + align = MO_UNALN; + } } else { TCGLabel *loop = gen_new_label(); TCGv_ptr tp, i = tcg_temp_new_ptr(); @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_addi_i64(clean_addr, clean_addr, 16); tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop); + align = MO_UNALN; } /* Predicate register stores can be any multiple of 2. */ if (len_remain >= 8) { t0 = tcg_temp_new_i64(); tcg_gen_ld_i64(t0, base, vofs + len_align); - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_qemu_st_i64(t0, clean_addr, midx, + MO_LEUQ | MO_ATOM_NONE | align); + align = MO_UNALN; len_remain -= 8; len_align += 8; if (len_remain) { @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, case 4: case 8: tcg_gen_qemu_st_i64(t0, clean_addr, midx, - MO_LE | ctz32(len_remain) | MO_ATOM_NONE); + MO_LE | ctz32(len_remain) + | MO_ATOM_NONE | align); break; case 6: - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_qemu_st_i64(t0, clean_addr, midx, + MO_LEUL | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 4); tcg_gen_shri_i64(t0, t0, 32); tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUW | MO_ATOM_NONE); @@ -XXX,XX +XXX,XX @@ static bool trans_LDR_zri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = vec_full_reg_size(s); int off = vec_full_reg_offset(s, a->rd); - gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); } return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_LDR_pri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = pred_full_reg_size(s); int off = pred_full_reg_offset(s, a->rd); - gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_2 : MO_UNALN); } return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_STR_zri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = vec_full_reg_size(s); int off = vec_full_reg_offset(s, a->rd); - gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); } return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_STR_pri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = pred_full_reg_size(s); int off = pred_full_reg_offset(s, a->rd); - gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_2 : MO_UNALN); } return true; } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-20-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 6 ++++++ target/arm/tcg/translate-sme.c | 13 +++++++++++++ 2 files changed, 19 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ LDST1 1110000 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \ LDR 1110000 100 0 000000 .. 000 ..... 0 .... @ldstr STR 1110000 100 1 000000 .. 000 ..... 0 .... @ldstr +&ldstzt0 rn +@ldstzt0 ....... ... . ...... .. ... rn:5 ..... &ldstzt0 + +LDR_zt0 1110000 100 0 111111 00 000 ..... 00000 @ldstzt0 +STR_zt0 1110000 100 1 111111 00 000 ..... 00000 @ldstzt0 + ### SME Add Vector to Array &adda zad zn pm pn diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) TRANS_FEAT(LDR, aa64_sme, do_ldst_r, a, gen_sve_ldr) TRANS_FEAT(STR, aa64_sme, do_ldst_r, a, gen_sve_str) +static bool do_ldst_zt0(DisasContext *s, arg_ldstzt0 *a, GenLdStR *fn) +{ + if (sme2_zt0_enabled_check(s)) { + fn(s, tcg_env, offsetof(CPUARMState, za_state.zt0), + sizeof_field(CPUARMState, za_state.zt0), a->rn, 0, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); + } + return true; +} + +TRANS_FEAT(LDR_zt0, aa64_sme2, do_ldst_zt0, a, gen_sve_ldr) +TRANS_FEAT(STR_zt0, aa64_sme2, do_ldst_zt0, a, gen_sve_str) + static bool do_adda(DisasContext *s, arg_adda *a, MemOp esz, gen_helper_gvec_4 *fn) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-21-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 5 +++++ target/arm/tcg/translate-sme.c | 13 +++++++++++++ 2 files changed, 18 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ MOVA 11000000 esz:2 00001 0 v:1 .. pg:3 0 za_imm:4 zr:5 \ MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za_imm:4 zr:5 \ &mova to_vec=1 rs=%mova_rs esz=4 +### SME Move into/from ZT0 + +MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 +MOVT_ztr 1100 0000 0100 1110 0 off:3 00 11111 rt:5 + ### SME Memory &ldst esz rs pg rn rm za_imm v:bool st:bool diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) return true; } +static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, + void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) +{ + if (sme2_zt0_enabled_check(s)) { + func(cpu_reg(s, a->rt), tcg_env, + offsetof(CPUARMState, za_state.zt0) + a->off * 8); + } + return true; +} + +TRANS_FEAT(MOVT_rzt, aa64_sme2, do_movt, a, tcg_gen_ld_i64) +TRANS_FEAT(MOVT_ztr, aa64_sme2, do_movt, a, tcg_gen_st_i64) + static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) { typedef void GenLdSt1(TCGv_env, TCGv_ptr, TCGv_ptr, TCGv, TCGv_i32); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Decode tile number and index offset beforehand and separately. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-22-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 46 +++++++++++++++++++++++----------- target/arm/tcg/translate-sme.c | 17 +++++-------- 2 files changed, 38 insertions(+), 25 deletions(-) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array %mova_rs 13:2 !function=plus_12 -&mova esz rs pg zr za_imm v:bool to_vec:bool +&mova esz rs pg zr za off v:bool to_vec:bool -MOVA 11000000 esz:2 00000 0 v:1 .. pg:3 zr:5 0 za_imm:4 \ - &mova to_vec=0 rs=%mova_rs -MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za_imm:4 \ - &mova to_vec=0 rs=%mova_rs esz=4 +MOVA 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ + &mova to_vec=0 rs=%mova_rs esz=0 za=0 +MOVA 11000000 01 00000 0 v:1 .. pg:3 zr:5 0 za:1 off:3 \ + &mova to_vec=0 rs=%mova_rs esz=1 +MOVA 11000000 10 00000 0 v:1 .. pg:3 zr:5 0 za:2 off:2 \ + &mova to_vec=0 rs=%mova_rs esz=2 +MOVA 11000000 11 00000 0 v:1 .. pg:3 zr:5 0 za:3 off:1 \ + &mova to_vec=0 rs=%mova_rs esz=3 +MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za:4 \ + &mova to_vec=0 rs=%mova_rs esz=4 off=0 -MOVA 11000000 esz:2 00001 0 v:1 .. pg:3 0 za_imm:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs -MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za_imm:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=4 +MOVA 11000000 00 00001 0 v:1 .. pg:3 0 off:4 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=0 za=0 +MOVA 11000000 01 00001 0 v:1 .. pg:3 0 za:1 off:3 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=1 +MOVA 11000000 10 00001 0 v:1 .. pg:3 0 za:2 off:2 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=2 +MOVA 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=3 +MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=4 off=0 ### SME Move into/from ZT0 @@ -XXX,XX +XXX,XX @@ MOVT_ztr 1100 0000 0100 1110 0 off:3 00 11111 rt:5 ### SME Memory -&ldst esz rs pg rn rm za_imm v:bool st:bool +&ldst esz rs pg rn rm za off v:bool st:bool -LDST1 1110000 0 esz:2 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \ - &ldst rs=%mova_rs -LDST1 1110000 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \ - &ldst esz=4 rs=%mova_rs +LDST1 1110000 0 00 st:1 rm:5 v:1 .. pg:3 rn:5 0 off:4 \ + &ldst rs=%mova_rs esz=0 za=0 +LDST1 1110000 0 01 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:1 off:3 \ + &ldst rs=%mova_rs esz=1 +LDST1 1110000 0 10 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:2 off:2 \ + &ldst rs=%mova_rs esz=2 +LDST1 1110000 0 11 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:3 off:1 \ + &ldst rs=%mova_rs esz=3 +LDST1 1110000 1 11 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:4 \ + &ldst rs=%mova_rs esz=4 off=0 &ldstr rv rn imm @ldstr ....... ... . ...... .. ... rn:5 . imm:4 \ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool sme2_zt0_enabled_check(DisasContext *s) return true; } -/* - * Resolve tile.size[index] to a host pointer, where tile and index - * are always decoded together, dependent on the element size. - */ +/* Resolve tile.size[rs+imm] to a host pointer. */ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, - int tile_index, bool vertical) + int tile, int imm, bool vertical) { - int tile = tile_index >> (4 - esz); - int index = esz == MO_128 ? 0 : extract32(tile_index, 0, 4 - esz); int pos, len, offset; TCGv_i32 tmp; TCGv_ptr addr; @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, /* Compute the final index, which is Rs+imm. */ tmp = tcg_temp_new_i32(); tcg_gen_trunc_tl_i32(tmp, cpu_reg(s, rs)); - tcg_gen_addi_i32(tmp, tmp, index); + tcg_gen_addi_i32(tmp, tmp, imm); /* Prepare a power-of-two modulo via extraction of @len bits. */ len = ctz32(streaming_vec_reg_size(s)) - esz; @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za_imm, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); t_zr = vec_full_reg_ptr(s, a->zr); t_pg = pred_full_reg_ptr(s, a->pg); @@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za_imm, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); t_pg = pred_full_reg_ptr(s, a->pg); addr = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) } /* ZA[n] equates to ZA0H.B[n]. */ - base = get_tile_rowcol(s, MO_8, a->rv, imm, false); + base = get_tile_rowcol(s, MO_8, a->rv, 0, imm, false); fn(s, base, 0, svl, a->rn, imm * svl, s->align_mem ? MO_ALIGN_16 : MO_UNALN); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Prepare for more kinds of MOVA from SME2 by renaming the existing SME1 MOVA to indicate tile to/from vector. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-23-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 42 +++++++++++++++++----------------- target/arm/tcg/translate-sme.c | 12 +++++----- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array %mova_rs 13:2 !function=plus_12 -&mova esz rs pg zr za off v:bool to_vec:bool +&mova_p esz rs pg zr za off v:bool -MOVA 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ - &mova to_vec=0 rs=%mova_rs esz=0 za=0 -MOVA 11000000 01 00000 0 v:1 .. pg:3 zr:5 0 za:1 off:3 \ - &mova to_vec=0 rs=%mova_rs esz=1 -MOVA 11000000 10 00000 0 v:1 .. pg:3 zr:5 0 za:2 off:2 \ - &mova to_vec=0 rs=%mova_rs esz=2 -MOVA 11000000 11 00000 0 v:1 .. pg:3 zr:5 0 za:3 off:1 \ - &mova to_vec=0 rs=%mova_rs esz=3 -MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za:4 \ - &mova to_vec=0 rs=%mova_rs esz=4 off=0 +MOVA_tz 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ + &mova_p rs=%mova_rs esz=0 za=0 +MOVA_tz 11000000 01 00000 0 v:1 .. pg:3 zr:5 0 za:1 off:3 \ + &mova_p rs=%mova_rs esz=1 +MOVA_tz 11000000 10 00000 0 v:1 .. pg:3 zr:5 0 za:2 off:2 \ + &mova_p rs=%mova_rs esz=2 +MOVA_tz 11000000 11 00000 0 v:1 .. pg:3 zr:5 0 za:3 off:1 \ + &mova_p rs=%mova_rs esz=3 +MOVA_tz 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za:4 \ + &mova_p rs=%mova_rs esz=4 off=0 -MOVA 11000000 00 00001 0 v:1 .. pg:3 0 off:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=0 za=0 -MOVA 11000000 01 00001 0 v:1 .. pg:3 0 za:1 off:3 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=1 -MOVA 11000000 10 00001 0 v:1 .. pg:3 0 za:2 off:2 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=2 -MOVA 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=3 -MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=4 off=0 +MOVA_zt 11000000 00 00001 0 v:1 .. pg:3 0 off:4 zr:5 \ + &mova_p rs=%mova_rs esz=0 za=0 +MOVA_zt 11000000 01 00001 0 v:1 .. pg:3 0 za:1 off:3 zr:5 \ + &mova_p rs=%mova_rs esz=1 +MOVA_zt 11000000 10 00001 0 v:1 .. pg:3 0 za:2 off:2 zr:5 \ + &mova_p rs=%mova_rs esz=2 +MOVA_zt 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ + &mova_p rs=%mova_rs esz=3 +MOVA_zt 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ + &mova_p rs=%mova_rs esz=4 off=0 ### SME Move into/from ZT0 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_ZERO_zt0(DisasContext *s, arg_ZERO_zt0 *a) return true; } -static bool trans_MOVA(DisasContext *s, arg_MOVA *a) +static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) { static gen_helper_gvec_4 * const h_fns[5] = { gen_helper_sve_sel_zpzz_b, gen_helper_sve_sel_zpzz_h, @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) TCGv_i32 t_desc; int svl; - if (!dc_isar_feature(aa64_sme, s)) { - return false; - } if (!sme_smza_enabled_check(s)) { return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) if (a->v) { /* Vertical slice -- use sme mova helpers. */ - if (a->to_vec) { + if (to_vec) { zc_fns[a->esz](t_zr, t_za, t_pg, t_desc); } else { cz_fns[a->esz](t_za, t_zr, t_pg, t_desc); } } else { /* Horizontal slice -- reuse sve sel helpers. */ - if (a->to_vec) { + if (to_vec) { h_fns[a->esz](t_zr, t_za, t_zr, t_pg, t_desc); } else { h_fns[a->esz](t_za, t_zr, t_za, t_pg, t_desc); @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) return true; } +TRANS_FEAT(MOVA_tz, aa64_sme, do_mova_tile, a, false) +TRANS_FEAT(MOVA_zt, aa64_sme, do_mova_tile, a, true) + static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Prepare for MOVA array to/from vector with multiple registers by adding a div_len parameter, herein always 1, and a vec_mod parameter, herein always 0. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-24-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sme.c | 47 +++++++++++++++++++++++----------- 1 file changed, 32 insertions(+), 15 deletions(-) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool sme2_zt0_enabled_check(DisasContext *s) /* Resolve tile.size[rs+imm] to a host pointer. */ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, - int tile, int imm, bool vertical) + int tile, int imm, int div_len, + int vec_mod, bool vertical) { int pos, len, offset; TCGv_i32 tmp; @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, /* Compute the final index, which is Rs+imm. */ tmp = tcg_temp_new_i32(); tcg_gen_trunc_tl_i32(tmp, cpu_reg(s, rs)); + /* + * Round the vector index down to a multiple of vec_mod if necessary. + * We do this before adding the offset, to handle cases like + * MOVA (tile to vector, 2 registers) where we want to call this + * several times in a loop with an increasing offset. We rely on + * the instruction encodings always forcing the initial offset in + * [rs + offset] to be a multiple of vec_mod. The pseudocode usually + * does the round-down after adding the offset rather than before, + * but MOVA is an exception. + */ + if (vec_mod > 1) { + tcg_gen_andc_i32(tmp, tmp, tcg_constant_i32(vec_mod - 1)); + } tcg_gen_addi_i32(tmp, tmp, imm); /* Prepare a power-of-two modulo via extraction of @len bits. */ - len = ctz32(streaming_vec_reg_size(s)) - esz; + len = ctz32(streaming_vec_reg_size(s) / div_len) - esz; if (!len) { /* @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, return addr; } +/* Resolve ZArray[rs+imm] to a host pointer. */ +static TCGv_ptr get_zarray(DisasContext *s, int rs, int imm, + int div_len, int vec_mod) +{ + /* ZA[n] equates to ZA0H.B[n]. */ + return get_tile_rowcol(s, MO_8, rs, 0, imm, div_len, vec_mod, false); +} + /* * Resolve tile.size[0] to a host pointer. * Used by e.g. outer product insns where we require the entire tile. @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, 1, 0, a->v); t_zr = vec_full_reg_ptr(s, a->zr); t_pg = pred_full_reg_ptr(s, a->pg); @@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, 1, 0, a->v); t_pg = pred_full_reg_ptr(s, a->pg); addr = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int, MemOp); static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) { - int svl = streaming_vec_reg_size(s); - int imm = a->imm; - TCGv_ptr base; + if (sme_za_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int imm = a->imm; + TCGv_ptr base = get_zarray(s, a->rv, imm, 1, 0); - if (!sme_za_enabled_check(s)) { - return true; + fn(s, base, 0, svl, a->rn, imm * svl, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); } - - /* ZA[n] equates to ZA0H.B[n]. */ - base = get_tile_rowcol(s, MO_8, a->rv, 0, imm, false); - - fn(s, base, 0, svl, a->rn, imm * svl, - s->align_mem ? MO_ALIGN_16 : MO_UNALN); return true; } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-25-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 1 + target/arm/cpu64.c | 1 + 2 files changed, 2 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ struct ArchCPU { /* Used to set the maximum vector length the cpu will support. */ uint32_t sve_max_vq; + uint32_t sme_max_vq; #ifdef CONFIG_USER_ONLY /* Used to set the default vector length at process start. */ diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -XXX,XX +XXX,XX @@ void arm_cpu_sme_finalize(ARMCPU *cpu, Error **errp) } cpu->sme_vq.map = vq_map; + cpu->sme_max_vq = 32 - clz32(vq_map); } static bool cpu_arm_get_sme(Object *obj, Error **errp) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-26-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 9 +++++ target/arm/tcg/translate.h | 1 + target/arm/tcg/sme.decode | 37 ++++++++++++++++++ target/arm/tcg/sme_helper.c | 64 ++++++++++++++++++++++++++++++ target/arm/tcg/translate-a64.c | 1 + target/arm/tcg/translate-sme.c | 71 ++++++++++++++++++++++++++++++++++ 6 files changed, 183 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme_mova_cz_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme_mova_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sme_ld1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { int zt0_excp_el; /* ZT0 exception EL or 0 if enabled */ int vl; /* current vector length in bytes */ int svl; /* current streaming vector length in bytes */ + int max_svl; /* maximum implemented streaming vector length */ bool vfp_enabled; /* FP enabled via FPSCR.EN */ int vec_len; int vec_stride; diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 %mova_rs 13:2 !function=plus_12 &mova_p esz rs pg zr za off v:bool +&mova_t esz rs zr za off v:bool MOVA_tz 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ &mova_p rs=%mova_rs esz=0 za=0 @@ -XXX,XX +XXX,XX @@ MOVA_zt 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ MOVA_zt 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ &mova_p rs=%mova_rs esz=4 off=0 +MOVA_tz2 11000000 00 00010 0 v:1 .. 000 zr:4 0 00 off:3 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_tz2 11000000 01 00010 0 v:1 .. 000 zr:4 0 00 za:1 off:2 \ + &mova_t rs=%mova_rs esz=1 +MOVA_tz2 11000000 10 00010 0 v:1 .. 000 zr:4 0 00 za:2 off:1 \ + &mova_t rs=%mova_rs esz=2 +MOVA_tz2 11000000 11 00010 0 v:1 .. 000 zr:4 0 00 za:3 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVA_zt2 11000000 00 00011 0 v:1 .. 000 00 off:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_zt2 11000000 01 00011 0 v:1 .. 000 00 za:1 off:2 zr:4 0 \ + &mova_t rs=%mova_rs esz=1 +MOVA_zt2 11000000 10 00011 0 v:1 .. 000 00 za:2 off:1 zr:4 0 \ + &mova_t rs=%mova_rs esz=2 +MOVA_zt2 11000000 11 00011 0 v:1 .. 000 00 za:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVA_tz4 11000000 00 00010 0 v:1 .. 001 zr:3 00 000 off:2 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_tz4 11000000 01 00010 0 v:1 .. 001 zr:3 00 000 za:1 off:1 \ + &mova_t rs=%mova_rs esz=1 +MOVA_tz4 11000000 10 00010 0 v:1 .. 001 zr:3 00 000 za:2 \ + &mova_t rs=%mova_rs esz=2 off=0 +MOVA_tz4 11000000 11 00010 0 v:1 .. 001 zr:3 00 00 za:3 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVA_zt4 11000000 00 00011 0 v:1 .. 001 000 off:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_zt4 11000000 01 00011 0 v:1 .. 001 000 za:1 off:1 zr:3 00 \ + &mova_t rs=%mova_rs esz=1 +MOVA_zt4 11000000 10 00011 0 v:1 .. 001 000 za:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=2 off=0 +MOVA_zt4 11000000 11 00011 0 v:1 .. 001 00 za:3 zr:3 00 \ + &mova_t rs=%mova_rs esz=3 off=0 + ### SME Move into/from ZT0 MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_mova_zc_q)(void *vd, void *za, void *vg, uint32_t desc) #undef DO_MOVA_Z +void HELPER(sme2_mova_zc_b)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint8_t *src = vsrc; + uint8_t *dst = vdst; + size_t i, n = simd_oprsz(desc); + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + +void HELPER(sme2_mova_zc_h)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint16_t *src = vsrc; + uint16_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 2; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + +void HELPER(sme2_mova_zc_s)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint32_t *src = vsrc; + uint32_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 4; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + +void HELPER(sme2_mova_zc_d)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint64_t *src = vsrc; + uint64_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 8; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + /* * Clear elements in a tile slice comprising len bytes. */ @@ -XXX,XX +XXX,XX @@ static void copy_vertical_q(void *vdst, const void *vsrc, size_t len) } } +void HELPER(sme2_mova_cz_b)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_b(vdst, vsrc, simd_oprsz(desc)); +} + +void HELPER(sme2_mova_cz_h)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_h(vdst, vsrc, simd_oprsz(desc)); +} + +void HELPER(sme2_mova_cz_s)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_s(vdst, vsrc, simd_oprsz(desc)); +} + +void HELPER(sme2_mova_cz_d)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_d(vdst, vsrc, simd_oprsz(desc)); +} + /* * Host and TLB primitives for vertical tile slice addressing. */ diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, dc->zt0_excp_el = EX_TBFLAG_A64(tb_flags, ZT0EXC_EL); dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16; dc->svl = (EX_TBFLAG_A64(tb_flags, SVL) + 1) * 16; + dc->max_svl = arm_cpu->sme_max_vq * 16; dc->pauth_active = EX_TBFLAG_A64(tb_flags, PAUTH_ACTIVE); dc->bt = EX_TBFLAG_A64(tb_flags, BT); dc->btype = EX_TBFLAG_A64(tb_flags, BTYPE); diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) TRANS_FEAT(MOVA_tz, aa64_sme, do_mova_tile, a, false) TRANS_FEAT(MOVA_zt, aa64_sme, do_mova_tile, a, true) +static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) +{ + static gen_helper_gvec_2 * const cz_fns[] = { + gen_helper_sme2_mova_cz_b, gen_helper_sme2_mova_cz_h, + gen_helper_sme2_mova_cz_s, gen_helper_sme2_mova_cz_d, + }; + static gen_helper_gvec_2 * const zc_fns[] = { + gen_helper_sme2_mova_zc_b, gen_helper_sme2_mova_zc_h, + gen_helper_sme2_mova_zc_s, gen_helper_sme2_mova_zc_d, + }; + TCGv_ptr t_za; + int svl, bytes_per_op = n << a->esz; + + /* + * The MaxImplementedSVL check happens in the decode pseudocode, + * before the SM+ZA enabled check in the operation pseudocode. + * This will (currently) only fail for NREG=4, ESZ=MO_64. + */ + if (s->max_svl < bytes_per_op) { + unallocated_encoding(s); + return true; + } + + if (!sme_smza_enabled_check(s)) { + return true; + } + + svl = streaming_vec_reg_size(s); + + /* + * The CurrentVL check happens in the operation pseudocode, + * after the SM+ZA enabled check. + */ + if (svl < bytes_per_op) { + unallocated_encoding(s); + return true; + } + + if (a->v) { + TCGv_i32 t_desc = tcg_constant_i32(simd_desc(svl, svl, 0)); + + for (int i = 0; i < n; ++i) { + TCGv_ptr t_zr = vec_full_reg_ptr(s, a->zr * n + i); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, + a->off * n + i, 1, n, a->v); + if (to_vec) { + zc_fns[a->esz](t_zr, t_za, t_desc); + } else { + cz_fns[a->esz](t_za, t_zr, t_desc); + } + } + } else { + for (int i = 0; i < n; ++i) { + int o_zr = vec_full_reg_offset(s, a->zr * n + i); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, + a->off * n + i, 1, n, a->v); + if (to_vec) { + tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, 0, svl, svl); + } else { + tcg_gen_gvec_mov_var(MO_8, t_za, 0, tcg_env, o_zr, svl, svl); + } + } + } + return true; +} + +TRANS_FEAT(MOVA_tz2, aa64_sme2, do_mova_tile_n, a, 2, false) +TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false) +TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true) +TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true) + static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-27-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 5 +++++ target/arm/tcg/sme.decode | 12 ++++++++++++ target/arm/tcg/translate-sme.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ static inline int plus_2(DisasContext *s, int x) return x + 2; } +static inline int plus_8(DisasContext *s, int x) +{ + return x + 8; +} + static inline int plus_12(DisasContext *s, int x) { return x + 12; diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array %mova_rs 13:2 !function=plus_12 +%mova_rv 13:2 !function=plus_8 +&mova_a rv zr off &mova_p esz rs pg zr za off v:bool &mova_t esz rs zr za off v:bool @@ -XXX,XX +XXX,XX @@ MOVA_zt4 11000000 10 00011 0 v:1 .. 001 000 za:2 zr:3 00 \ MOVA_zt4 11000000 11 00011 0 v:1 .. 001 00 za:3 zr:3 00 \ &mova_t rs=%mova_rs esz=3 off=0 +MOVA_az2 11000000 00 00010 00 .. 010 zr:4 000 off:3 \ + &mova_a rv=%mova_rv +MOVA_az4 11000000 00 00010 00 .. 011 zr:3 0000 off:3 \ + &mova_a rv=%mova_rv + +MOVA_za2 11000000 00 00011 00 .. 010 00 off:3 zr:4 0 \ + &mova_a rv=%mova_rv +MOVA_za4 11000000 00 00011 00 .. 011 00 off:3 zr:3 00 \ + &mova_a rv=%mova_rv + ### SME Move into/from ZT0 MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false) TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true) TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true) +static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) +{ + TCGv_ptr t_za; + int svl; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + svl = streaming_vec_reg_size(s); + t_za = get_zarray(s, a->rv, a->off, n, 0); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zr = vec_full_reg_offset(s, a->zr * n + i); + + if (to_vec) { + tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, o_za, svl, svl); + } else { + tcg_gen_gvec_mov_var(MO_8, t_za, o_za, tcg_env, o_zr, svl, svl); + } + } + return true; +} + +TRANS_FEAT(MOVA_az2, aa64_sme2, do_mova_array_n, a, 2, false) +TRANS_FEAT(MOVA_az4, aa64_sme2, do_mova_array_n, a, 4, false) +TRANS_FEAT(MOVA_za2, aa64_sme2, do_mova_array_n, a, 2, true) +TRANS_FEAT(MOVA_za4, aa64_sme2, do_mova_array_n, a, 4, true) + static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-28-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 3 +++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 34 ++++++++++++++++++++++++---------- target/arm/tcg/translate-sme.c | 2 ++ 4 files changed, 31 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme_sumopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme_usmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sme2_bmopa_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SMOPA_d 1010000 0 11 0 ..... ... ... ..... . 0 ... @op_64 SUMOPA_d 1010000 0 11 1 ..... ... ... ..... . 0 ... @op_64 USMOPA_d 1010000 1 11 0 ..... ... ... ..... . 0 ... @op_64 UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64 + +BMOPA 1000000 0 10 0 ..... ... ... ..... . 10 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DEF_IMOP_64(umopa_d, uint16_t, uint16_t) DEF_IMOP_64(sumopa_d, int16_t, uint16_t) DEF_IMOP_64(usmopa_d, uint16_t, int16_t) -#define DEF_IMOPH(NAME, S) \ - void HELPER(sme_##NAME##_##S)(void *vza, void *vzn, void *vzm, \ +#define DEF_IMOPH(P, NAME, S) \ + void HELPER(P##_##NAME##_##S)(void *vza, void *vzn, void *vzm, \ void *vpn, void *vpm, uint32_t desc) \ { do_imopa_##S(vza, vzn, vzm, vpn, vpm, desc, NAME##_##S); } -DEF_IMOPH(smopa, s) -DEF_IMOPH(umopa, s) -DEF_IMOPH(sumopa, s) -DEF_IMOPH(usmopa, s) +DEF_IMOPH(sme, smopa, s) +DEF_IMOPH(sme, umopa, s) +DEF_IMOPH(sme, sumopa, s) +DEF_IMOPH(sme, usmopa, s) -DEF_IMOPH(smopa, d) -DEF_IMOPH(umopa, d) -DEF_IMOPH(sumopa, d) -DEF_IMOPH(usmopa, d) +DEF_IMOPH(sme, smopa, d) +DEF_IMOPH(sme, umopa, d) +DEF_IMOPH(sme, sumopa, d) +DEF_IMOPH(sme, usmopa, d) + +static uint32_t bmopa_s(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) +{ + uint32_t sum = ctpop32(~(n ^ m)); + if (neg) { + sum = -sum; + } + if (!(p & 1)) { + sum = 0; + } + return a + sum; +} + +DEF_IMOPH(sme2, bmopa, s) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_smopa_ TRANS_FEAT(UMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_umopa_d) TRANS_FEAT(SUMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_sumopa_d) TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmopa_d) + +TRANS_FEAT(BMOPA, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_bmopa_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-29-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 4 ++++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 37 +++++++++++++++++++++++++--------- target/arm/tcg/translate-sme.c | 2 ++ 4 files changed, 35 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme_usmopa_d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(sme2_bmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sme2_smopa2_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sme2_umopa2_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USMOPA_d 1010000 1 11 0 ..... ... ... ..... . 0 ... @op_64 UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64 BMOPA 1000000 0 10 0 ..... ... ... ..... . 10 .. @op_32 +SMOPA2_s 1010000 0 10 0 ..... ... ... ..... . 10 .. @op_32 +UMOPA2_s 1010000 1 10 0 ..... ... ... ..... . 10 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static inline void do_imopa_d(uint64_t *za, uint64_t *zn, uint64_t *zm, } } -#define DEF_IMOP_32(NAME, NTYPE, MTYPE) \ +#define DEF_IMOP_8x4_32(NAME, NTYPE, MTYPE) \ static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \ { \ uint32_t sum = 0; \ @@ -XXX,XX +XXX,XX @@ static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \ return neg ? a - sum : a + sum; \ } -#define DEF_IMOP_64(NAME, NTYPE, MTYPE) \ +#define DEF_IMOP_16x4_64(NAME, NTYPE, MTYPE) \ static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \ { \ uint64_t sum = 0; \ @@ -XXX,XX +XXX,XX @@ static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \ return neg ? a - sum : a + sum; \ } -DEF_IMOP_32(smopa_s, int8_t, int8_t) -DEF_IMOP_32(umopa_s, uint8_t, uint8_t) -DEF_IMOP_32(sumopa_s, int8_t, uint8_t) -DEF_IMOP_32(usmopa_s, uint8_t, int8_t) +DEF_IMOP_8x4_32(smopa_s, int8_t, int8_t) +DEF_IMOP_8x4_32(umopa_s, uint8_t, uint8_t) +DEF_IMOP_8x4_32(sumopa_s, int8_t, uint8_t) +DEF_IMOP_8x4_32(usmopa_s, uint8_t, int8_t) -DEF_IMOP_64(smopa_d, int16_t, int16_t) -DEF_IMOP_64(umopa_d, uint16_t, uint16_t) -DEF_IMOP_64(sumopa_d, int16_t, uint16_t) -DEF_IMOP_64(usmopa_d, uint16_t, int16_t) +DEF_IMOP_16x4_64(smopa_d, int16_t, int16_t) +DEF_IMOP_16x4_64(umopa_d, uint16_t, uint16_t) +DEF_IMOP_16x4_64(sumopa_d, int16_t, uint16_t) +DEF_IMOP_16x4_64(usmopa_d, uint16_t, int16_t) #define DEF_IMOPH(P, NAME, S) \ void HELPER(P##_##NAME##_##S)(void *vza, void *vzn, void *vzm, \ @@ -XXX,XX +XXX,XX @@ static uint32_t bmopa_s(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) } DEF_IMOPH(sme2, bmopa, s) + +#define DEF_IMOP_16x2_32(NAME, NTYPE, MTYPE) \ +static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \ +{ \ + uint32_t sum = 0; \ + /* Apply P to N as a mask, making the inactive elements 0. */ \ + n &= expand_pred_h(p); \ + sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); \ + sum += (NTYPE)(n >> 16) * (MTYPE)(m >> 16); \ + return neg ? a - sum : a + sum; \ +} + +DEF_IMOP_16x2_32(smopa2_s, int16_t, int16_t) +DEF_IMOP_16x2_32(umopa2_s, uint16_t, uint16_t) + +DEF_IMOPH(sme2, smopa2, s) +DEF_IMOPH(sme2, umopa2, s) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_sumop TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmopa_d) TRANS_FEAT(BMOPA, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_bmopa_s) +TRANS_FEAT(SMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_smopa2_s) +TRANS_FEAT(UMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_umopa2_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> To be used by both SVE2 and SME2. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-30-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.h | 4 ++++ target/arm/tcg/gengvec64.c | 11 +++++++++++ target/arm/tcg/translate-sve.c | 8 +------- 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.h +++ b/target/arm/tcg/translate-a64.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_sve2_sqdmulh(unsigned vece, uint32_t rd_ofs, + uint32_t rn_ofs, uint32_t rm_ofs, + uint32_t opr_sz, uint32_t max_sz); + void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm, MemOp align); void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, diff --git a/target/arm/tcg/gengvec64.c b/target/arm/tcg/gengvec64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/gengvec64.c +++ b/target/arm/tcg/gengvec64.c @@ -XXX,XX +XXX,XX @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs, tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc), rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]); } + +void gen_gvec_sve2_sqdmulh(unsigned vece, uint32_t rd_ofs, + uint32_t rn_ofs, uint32_t rm_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static gen_helper_gvec_3 * const fns[4] = { + gen_helper_sve2_sqdmulh_b, gen_helper_sve2_sqdmulh_h, + gen_helper_sve2_sqdmulh_s, gen_helper_sve2_sqdmulh_d, + }; + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]); +} diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(MOVPRFX_z, aa64_sve, do_movz_zpz, a->rd, a->rn, a->pg, a->esz, false) */ TRANS_FEAT(MUL_zzz, aa64_sve2, gen_gvec_fn_arg_zzz, tcg_gen_gvec_mul, a) +TRANS_FEAT(SQDMULH_zzz, aa64_sve2, gen_gvec_fn_arg_zzz, gen_gvec_sve2_sqdmulh, a) static gen_helper_gvec_3 * const smulh_zzz_fns[4] = { gen_helper_gvec_smulh_b, gen_helper_gvec_smulh_h, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UMULH_zzz, aa64_sve2, gen_gvec_ool_arg_zzz, TRANS_FEAT(PMUL_zzz, aa64_sve2, gen_gvec_ool_arg_zzz, gen_helper_gvec_pmul_b, a, 0) -static gen_helper_gvec_3 * const sqdmulh_zzz_fns[4] = { - gen_helper_sve2_sqdmulh_b, gen_helper_sve2_sqdmulh_h, - gen_helper_sve2_sqdmulh_s, gen_helper_sve2_sqdmulh_d, -}; -TRANS_FEAT(SQDMULH_zzz, aa64_sve2, gen_gvec_ool_arg_zzz, - sqdmulh_zzz_fns[a->esz], a, 0) - static gen_helper_gvec_3 * const sqrdmulh_zzz_fns[4] = { gen_helper_sve2_sqrdmulh_b, gen_helper_sve2_sqrdmulh_h, gen_helper_sve2_sqrdmulh_s, gen_helper_sve2_sqrdmulh_d, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-31-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 13 ++++ target/arm/tcg/helper.h | 8 ++ target/arm/tcg/vec_internal.h | 4 + target/arm/tcg/sme.decode | 40 ++++++++++ target/arm/tcg/helper-a64.c | 2 + target/arm/tcg/neon_helper.c | 30 ++++++++ target/arm/tcg/translate-sme.c | 137 +++++++++++++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 7 ++ 8 files changed, 241 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme2_smopa2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme2_umopa2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(gvec_fmax_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmin_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_ah_fmax_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_ah_fmin_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmaxnum_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fminnum_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_urshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_urshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_urshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_srshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_srshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_srshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_urshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_urshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_urshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_2(neon_add_u8, i32, i32, i32) DEF_HELPER_2(neon_add_u16, i32, i32, i32) DEF_HELPER_2(neon_sub_u8, i32, i32, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah) return fpcr_ah && float64_is_any_nan(a) ? a : float64_chs(a); } +/* Not actually called directly as a helper, but uses similar machinery. */ +bfloat16 helper_sme2_ah_fmax_b16(bfloat16 a, bfloat16 b, float_status *fpst); +bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64 BMOPA 1000000 0 10 0 ..... ... ... ..... . 10 .. @op_32 SMOPA2_s 1010000 0 10 0 ..... ... ... ..... . 10 .. @op_32 UMOPA2_s 1010000 1 10 0 ..... ... ... ..... . 10 .. @op_32 + +### SME2 Multi-vector Multiple and Single SVE Destructive + +%zd_ax2 1:4 !function=times_2 +%zd_ax4 2:3 !function=times_4 + +&z2z_en zdn zm esz n +@z2z_2x1 ....... . esz:2 .. zm:4 ....0. ..... .... . \ + &z2z_en n=2 zdn=%zd_ax2 +@z2z_4x1 ....... . esz:2 .. zm:4 ....1. ..... ...0 . \ + &z2z_en n=4 zdn=%zd_ax4 + +SMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 0 @z2z_2x1 +SMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 0 @z2z_4x1 +UMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 1 @z2z_2x1 +UMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 1 @z2z_4x1 +SMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 0 @z2z_2x1 +SMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 0 @z2z_4x1 +UMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 1 @z2z_2x1 +UMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 1 @z2z_4x1 + +FMAX_n1 1100000 1 .. 10 .... 1010.0 01000 .... 0 @z2z_2x1 +FMAX_n1 1100000 1 .. 10 .... 1010.0 01000 .... 0 @z2z_4x1 +FMIN_n1 1100000 1 .. 10 .... 1010.0 01000 .... 1 @z2z_2x1 +FMIN_n1 1100000 1 .. 10 .... 1010.0 01000 .... 1 @z2z_4x1 +FMAXNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 0 @z2z_2x1 +FMAXNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 0 @z2z_4x1 +FMINNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 1 @z2z_2x1 +FMINNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 1 @z2z_4x1 + +SRSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 0 @z2z_2x1 +SRSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 0 @z2z_4x1 +URSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 1 @z2z_2x1 +URSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 1 @z2z_4x1 + +ADD_n1 1100000 1 .. 10 .... 1010.0 11000 .... 0 @z2z_2x1 +ADD_n1 1100000 1 .. 10 .... 1010.0 11000 .... 0 @z2z_4x1 + +SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_2x1 +SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_4x1 diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -XXX,XX +XXX,XX @@ AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min) AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max) AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max) AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max) +AH_MINMAX_HELPER(sme2_ah_fmax_b16, bfloat16, bfloat16, max) +AH_MINMAX_HELPER(sme2_ah_fmin_b16, bfloat16, bfloat16, min) /* 64-bit versions of the CRC helpers. Note that although the operation * (and the prototypes of crc32c() and crc32() mean that only the bottom diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/neon_helper.c +++ b/target/arm/tcg/neon_helper.c @@ -XXX,XX +XXX,XX @@ NEON_VOP(rshl_s16, neon_s16, 2) NEON_GVEC_VOP2(gvec_srshl_h, int16_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_sqrshl_bhs(src1, src2, 16, true, NULL)) +NEON_GVEC_VOP2(sme2_srshl_h, int16_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_sqrshl_bhs(src1, (int8_t)src2, 32, true, NULL)) NEON_GVEC_VOP2(gvec_srshl_s, int32_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_sqrshl_bhs(src1, src2, 32, true, NULL)) +NEON_GVEC_VOP2(sme2_srshl_s, int32_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_sqrshl_d(src1, (int8_t)src2, true, NULL)) NEON_GVEC_VOP2(gvec_srshl_d, int64_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_sqrshl_d(src1, src2, true, NULL)) +NEON_GVEC_VOP2(sme2_srshl_d, int64_t) +#undef NEON_FN + uint32_t HELPER(neon_rshl_s32)(uint32_t val, uint32_t shift) { return do_sqrshl_bhs(val, (int8_t)shift, 32, true, NULL); @@ -XXX,XX +XXX,XX @@ NEON_VOP(rshl_u16, neon_u16, 2) NEON_GVEC_VOP2(gvec_urshl_h, uint16_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_uqrshl_bhs(src1, (int16_t)src2, 16, true, NULL)) +NEON_GVEC_VOP2(sme2_urshl_h, uint16_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_uqrshl_bhs(src1, (int8_t)src2, 32, true, NULL)) NEON_GVEC_VOP2(gvec_urshl_s, int32_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_uqrshl_bhs(src1, src2, 32, true, NULL)) +NEON_GVEC_VOP2(sme2_urshl_s, int32_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_uqrshl_d(src1, (int8_t)src2, true, NULL)) NEON_GVEC_VOP2(gvec_urshl_d, int64_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_uqrshl_d(src1, src2, true, NULL)) +NEON_GVEC_VOP2(sme2_urshl_d, int64_t) +#undef NEON_FN + uint32_t HELPER(neon_rshl_u32)(uint32_t val, uint32_t shift) { return do_uqrshl_bhs(val, (int8_t)shift, 32, true, NULL); diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmop TRANS_FEAT(BMOPA, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_bmopa_s) TRANS_FEAT(SMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_smopa2_s) TRANS_FEAT(UMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_umopa2_s) + +static bool do_z2z_n1(DisasContext *s, arg_z2z_en *a, GVecGen3Fn *fn) +{ + int esz, dn, vsz, mofs, n; + bool overlap = false; + + if (!sme_sm_enabled_check(s)) { + return true; + } + + esz = a->esz; + n = a->n; + dn = a->zdn; + mofs = vec_full_reg_offset(s, a->zm); + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + if (dofs == mofs) { + overlap = true; + } else { + fn(esz, dofs, dofs, mofs, vsz, vsz); + } + } + if (overlap) { + fn(esz, mofs, mofs, mofs, vsz, vsz); + } + return true; +} + +static void gen_sme2_srshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz) +{ + static gen_helper_gvec_3 * const fns[] = { + gen_helper_gvec_srshl_b, gen_helper_sme2_srshl_h, + gen_helper_sme2_srshl_s, gen_helper_sme2_srshl_d, + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]); +} + +static void gen_sme2_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz) +{ + static gen_helper_gvec_3 * const fns[] = { + gen_helper_gvec_urshl_b, gen_helper_sme2_urshl_h, + gen_helper_sme2_urshl_s, gen_helper_sme2_urshl_d, + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]); +} + +TRANS_FEAT(ADD_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_add) +TRANS_FEAT(SMAX_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_smax) +TRANS_FEAT(SMIN_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_smin) +TRANS_FEAT(UMAX_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_umax) +TRANS_FEAT(UMIN_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_umin) +TRANS_FEAT(SRSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_srshl) +TRANS_FEAT(URSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_urshl) +TRANS_FEAT(SQDMULH_n1, aa64_sme2, do_z2z_n1, a, gen_gvec_sve2_sqdmulh) + +static bool do_z2z_n1_fpst(DisasContext *s, arg_z2z_en *a, + gen_helper_gvec_3_ptr * const fns[4]) +{ + int esz = a->esz, n, dn, vsz, mofs; + bool overlap = false; + gen_helper_gvec_3_ptr *fn; + TCGv_ptr fpst; + + /* These insns use MO_8 to encode BFloat16. */ + if (esz == MO_8 && !dc_isar_feature(aa64_sme_b16b16, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64); + fn = fns[esz]; + n = a->n; + dn = a->zdn; + mofs = vec_full_reg_offset(s, a->zm); + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + if (dofs == mofs) { + overlap = true; + } else { + tcg_gen_gvec_3_ptr(dofs, dofs, mofs, fpst, vsz, vsz, 0, fn); + } + } + if (overlap) { + tcg_gen_gvec_3_ptr(mofs, mofs, mofs, fpst, vsz, vsz, 0, fn); + } + return true; +} + +static gen_helper_gvec_3_ptr * const f_vector_fmax[2][4] = { + { gen_helper_gvec_fmax_b16, + gen_helper_gvec_fmax_h, + gen_helper_gvec_fmax_s, + gen_helper_gvec_fmax_d }, + { gen_helper_gvec_ah_fmax_b16, + gen_helper_gvec_ah_fmax_h, + gen_helper_gvec_ah_fmax_s, + gen_helper_gvec_ah_fmax_d }, +}; +TRANS_FEAT(FMAX_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmax[s->fpcr_ah]) + +static gen_helper_gvec_3_ptr * const f_vector_fmin[2][4] = { + { gen_helper_gvec_fmin_b16, + gen_helper_gvec_fmin_h, + gen_helper_gvec_fmin_s, + gen_helper_gvec_fmin_d }, + { gen_helper_gvec_ah_fmin_b16, + gen_helper_gvec_ah_fmin_h, + gen_helper_gvec_ah_fmin_s, + gen_helper_gvec_ah_fmin_d }, +}; +TRANS_FEAT(FMIN_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmin[s->fpcr_ah]) + +static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[4] = { + gen_helper_gvec_fmaxnum_b16, + gen_helper_gvec_fmaxnum_h, + gen_helper_gvec_fmaxnum_s, + gen_helper_gvec_fmaxnum_d, +}; +TRANS_FEAT(FMAXNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmaxnm) + +static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { + gen_helper_gvec_fminnum_b16, + gen_helper_gvec_fminnum_h, + gen_helper_gvec_fminnum_s, + gen_helper_gvec_fminnum_d, +}; +TRANS_FEAT(FMINNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fminnm) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_ah_fmin_h, helper_vfp_ah_minh, float16) DO_3OP(gvec_ah_fmin_s, helper_vfp_ah_mins, float32) DO_3OP(gvec_ah_fmin_d, helper_vfp_ah_mind, float64) +DO_3OP(gvec_fmax_b16, bfloat16_max, bfloat16) +DO_3OP(gvec_fmin_b16, bfloat16_min, bfloat16) +DO_3OP(gvec_fmaxnum_b16, bfloat16_maxnum, bfloat16) +DO_3OP(gvec_fminnum_b16, bfloat16_minnum, bfloat16) +DO_3OP(gvec_ah_fmax_b16, helper_sme2_ah_fmax_b16, bfloat16) +DO_3OP(gvec_ah_fmin_b16, helper_sme2_ah_fmin_b16, bfloat16) + #endif #undef DO_3OP -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-32-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 36 +++++++++++++++++++ target/arm/tcg/translate-sme.c | 65 ++++++++++++++++++++++++++++++++++ 2 files changed, 101 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ADD_n1 1100000 1 .. 10 .... 1010.0 11000 .... 0 @z2z_4x1 SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_2x1 SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_4x1 + +### SME2 Multi-vector Multiple Vectors SVE Destructive + +%zm_ax2 17:4 !function=times_2 +%zm_ax4 18:3 !function=times_4 + +@z2z_2x2 ....... . esz:2 . ....0 ....0. ..... .... . \ + &z2z_en n=2 zdn=%zd_ax2 zm=%zm_ax2 +@z2z_4x4 ....... . esz:2 . ...00 ....1. ..... ...0 . \ + &z2z_en n=4 zdn=%zd_ax4 zm=%zm_ax4 + +SMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 0 @z2z_2x2 +SMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 0 @z2z_4x4 +UMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 1 @z2z_2x2 +UMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 1 @z2z_4x4 +SMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 0 @z2z_2x2 +SMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 0 @z2z_4x4 +UMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 1 @z2z_2x2 +UMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 1 @z2z_4x4 + +FMAX_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 0 @z2z_2x2 +FMAX_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 0 @z2z_4x4 +FMIN_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 1 @z2z_2x2 +FMIN_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 1 @z2z_4x4 +FMAXNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 0 @z2z_2x2 +FMAXNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 0 @z2z_4x4 +FMINNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 1 @z2z_2x2 +FMINNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 1 @z2z_4x4 + +SRSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 0 @z2z_2x2 +SRSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 0 @z2z_4x4 +URSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 1 @z2z_2x2 +URSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 1 @z2z_4x4 + +SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_2x2 +SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_4x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SRSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_srshl) TRANS_FEAT(URSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_urshl) TRANS_FEAT(SQDMULH_n1, aa64_sme2, do_z2z_n1, a, gen_gvec_sve2_sqdmulh) +static bool do_z2z_nn(DisasContext *s, arg_z2z_en *a, GVecGen3Fn *fn) +{ + int esz, dn, dm, vsz, n; + + if (!sme_sm_enabled_check(s)) { + return true; + } + + esz = a->esz; + n = a->n; + dn = a->zdn; + dm = a->zm; + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + int mofs = vec_full_reg_offset(s, dm + i); + + fn(esz, dofs, dofs, mofs, vsz, vsz); + } + return true; +} + +TRANS_FEAT(SMAX_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_smax) +TRANS_FEAT(SMIN_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_smin) +TRANS_FEAT(UMAX_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_umax) +TRANS_FEAT(UMIN_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_umin) +TRANS_FEAT(SRSHL_nn, aa64_sme2, do_z2z_nn, a, gen_sme2_srshl) +TRANS_FEAT(URSHL_nn, aa64_sme2, do_z2z_nn, a, gen_sme2_urshl) +TRANS_FEAT(SQDMULH_nn, aa64_sme2, do_z2z_nn, a, gen_gvec_sve2_sqdmulh) + static bool do_z2z_n1_fpst(DisasContext *s, arg_z2z_en *a, gen_helper_gvec_3_ptr * const fns[4]) { @@ -XXX,XX +XXX,XX @@ static bool do_z2z_n1_fpst(DisasContext *s, arg_z2z_en *a, return true; } +static bool do_z2z_nn_fpst(DisasContext *s, arg_z2z_en *a, + gen_helper_gvec_3_ptr * const fns[4]) +{ + int esz = a->esz, n, dn, dm, vsz; + gen_helper_gvec_3_ptr *fn; + TCGv_ptr fpst; + + if (esz == MO_8 && !dc_isar_feature(aa64_sme_b16b16, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64); + fn = fns[esz]; + n = a->n; + dn = a->zdn; + dm = a->zm; + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + int mofs = vec_full_reg_offset(s, dm + i); + + tcg_gen_gvec_3_ptr(dofs, dofs, mofs, fpst, vsz, vsz, 0, fn); + } + return true; +} + static gen_helper_gvec_3_ptr * const f_vector_fmax[2][4] = { { gen_helper_gvec_fmax_b16, gen_helper_gvec_fmax_h, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmax[2][4] = { gen_helper_gvec_ah_fmax_d }, }; TRANS_FEAT(FMAX_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmax[s->fpcr_ah]) +TRANS_FEAT(FMAX_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fmax[s->fpcr_ah]) static gen_helper_gvec_3_ptr * const f_vector_fmin[2][4] = { { gen_helper_gvec_fmin_b16, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmin[2][4] = { gen_helper_gvec_ah_fmin_d }, }; TRANS_FEAT(FMIN_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmin[s->fpcr_ah]) +TRANS_FEAT(FMIN_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fmin[s->fpcr_ah]) static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[4] = { gen_helper_gvec_fmaxnum_b16, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[4] = { gen_helper_gvec_fmaxnum_d, }; TRANS_FEAT(FMAXNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmaxnm) +TRANS_FEAT(FMAXNM_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fmaxnm) static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { gen_helper_gvec_fminnum_b16, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { gen_helper_gvec_fminnum_d, }; TRANS_FEAT(FMINNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fminnm) +TRANS_FEAT(FMINNM_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fminnm) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-33-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 2 ++ target/arm/tcg/sme.decode | 15 +++++++++++++++ target/arm/tcg/translate-sme.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t); typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t); +typedef void GVecGen3FnVar(unsigned, TCGv_ptr, uint32_t, TCGv_ptr, uint32_t, + TCGv_ptr, uint32_t, uint32_t, uint32_t); /* Function prototype for gen_ functions for calling Neon helpers */ typedef void NeonGenOneOpFn(TCGv_i32, TCGv_i32); diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ URSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 1 @z2z_4x4 SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_2x2 SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_4x4 + +### SME2 Multi-vector Multiple and Single Array Vectors + +&azz_n n off rv zn zm +@azz_nx1_o3 ........ .... zm:4 ...... zn:5 .. off:3 &azz_n rv=%mova_rv + +ADD_azz_n1_s 11000001 0010 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=2 +ADD_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=4 +ADD_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=2 +ADD_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=4 + +SUB_azz_n1_s 11000001 0010 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 +SUB_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 +SUB_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 +SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { }; TRANS_FEAT(FMINNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fminnm) TRANS_FEAT(FMINNM_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fminnm) + +/* Add/Sub vector Z[m] to each Z[n*N] with result in ZA[d*N]. */ +static bool do_azz_n1(DisasContext *s, arg_azz_n *a, int esz, + GVecGen3FnVar *fn) +{ + TCGv_ptr t_za; + int svl, n, o_zm; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + n = a->n; + t_za = get_zarray(s, a->rv, a->off, n, 0); + o_zm = vec_full_reg_offset(s, a->zm); + svl = streaming_vec_reg_size(s); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zn = vec_full_reg_offset(s, (a->zn + i) % 32); + + fn(esz, t_za, o_za, tcg_env, o_zn, tcg_env, o_zm, svl, svl); + } + return true; +} + +TRANS_FEAT(ADD_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_sub_var) +TRANS_FEAT(ADD_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_sub_var) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-34-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 20 ++++++++++++++++++++ target/arm/tcg/translate-sme.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_azz_n1_s 11000001 0010 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 SUB_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 SUB_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 + +### SME2 Multi-vector Multiple Array Vectors + +%zn_ax2 6:4 !function=times_2 +%zn_ax4 7:3 !function=times_4 + +@azz_2x2_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &azz_n n=2 rv=%mova_rv zn=%zn_ax2 zm=%zm_ax2 +@azz_4x4_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 + +ADD_azz_nn_s 11000001 101 ....0 0 .. 110 ....0 10 ... @azz_2x2_o3 +ADD_azz_nn_s 11000001 101 ...01 0 .. 110 ...00 10 ... @azz_4x4_o3 +ADD_azz_nn_d 11000001 111 ....0 0 .. 110 ....0 10 ... @azz_2x2_o3 +ADD_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 10 ... @azz_4x4_o3 + +SUB_azz_nn_s 11000001 101 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 +SUB_azz_nn_s 11000001 101 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 +SUB_azz_nn_d 11000001 111 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 +SUB_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ADD_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_sub_var) TRANS_FEAT(ADD_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_sub_var) + +/* Add/Sub each vector Z[m*N] to each Z[n*N] with result in ZA[d*N]. */ +static bool do_azz_nn(DisasContext *s, arg_azz_n *a, int esz, + GVecGen3FnVar *fn) +{ + TCGv_ptr t_za; + int svl, n; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + n = a->n; + t_za = get_zarray(s, a->rv, a->off, n, 1); + svl = streaming_vec_reg_size(s); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zn = vec_full_reg_offset(s, a->zn + i); + int o_zm = vec_full_reg_offset(s, a->zm + i); + + fn(esz, t_za, o_za, tcg_env, o_zn, tcg_env, o_zm, svl, svl); + } + return true; +} + +TRANS_FEAT(ADD_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_sub_var) +TRANS_FEAT(ADD_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub_var) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Indicate whether to use FPST_FPCR or FPST_ZA via bit 2 of simd_data(desc). For SVE, this bit remains zero. For do_FMLAL_zzzw, this requires no change. For do_FMLAL_zzxw, move the index up one bit. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-35-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 2 +- target/arm/tcg/vec_helper.c | 8 +++++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_FMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sub, bool sel) { return gen_gvec_ptr_zzzz(s, gen_helper_sve2_fmlal_zzxw_s, a->rd, a->rn, a->rm, a->ra, - (a->index << 2) | (sel << 1) | sub, tcg_env); + (a->index << 3) | (sel << 1) | sub, tcg_env); } TRANS_FEAT(FMLALB_zzxw, aa64_sve2, do_FMLAL_zzxw, a, false, false) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va, intptr_t i, oprsz = simd_oprsz(desc); bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1); intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16); - float_status *status = &env->vfp.fp_status[FPST_A64]; + bool za = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + float_status *status = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; bool fz16 = env->vfp.fpcr & FPCR_FZ16; int negx = 0, negf = 0; @@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va, intptr_t i, j, oprsz = simd_oprsz(desc); bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1); intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16); - intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16); - float_status *status = &env->vfp.fp_status[FPST_A64]; + bool za = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 3, 3) * sizeof(float16); + float_status *status = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; bool fz16 = env->vfp.fpcr & FPCR_FZ16; int negx = 0, negf = 0; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-36-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 8 +++++ target/arm/tcg/vec_helper.c | 58 ++++++++++++++++++++++++++++--------- 2 files changed, 53 insertions(+), 13 deletions(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfmmla, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmlsl, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_ah_bfmlsl, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmlsl_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_ah_bfmlsl_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_sclamp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, clear_tail(d, opr_sz, simd_maxsz(desc)); } -void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va, - float_status *stat, uint32_t desc) +static void do_bfmlal(float32 *d, bfloat16 *n, bfloat16 *m, float32 *a, + float_status *stat, uint32_t desc, int negx, int negf) { intptr_t i, opr_sz = simd_oprsz(desc); - intptr_t sel = simd_data(desc); - float32 *d = vd, *a = va; - bfloat16 *n = vn, *m = vm; + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1); for (i = 0; i < opr_sz / 4; ++i) { - float32 nn = n[H2(i * 2 + sel)] << 16; + float32 nn = (negx ^ n[H2(i * 2 + sel)]) << 16; float32 mm = m[H2(i * 2 + sel)] << 16; - d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat); + d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], negf, stat); } clear_tail(d, opr_sz, simd_maxsz(desc)); } -void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm, - void *va, float_status *stat, uint32_t desc) +void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal(vd, vn, vm, va, stat, desc, 0, 0); +} + +void HELPER(gvec_bfmlsl)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal(vd, vn, vm, va, stat, desc, 0x8000, 0); +} + +void HELPER(gvec_ah_bfmlsl)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal(vd, vn, vm, va, stat, desc, 0, float_muladd_negate_product); +} + +static void do_bfmlal_idx(float32 *d, bfloat16 *n, bfloat16 *m, float32 *a, + float_status *stat, uint32_t desc, int negx, int negf) { intptr_t i, j, opr_sz = simd_oprsz(desc); intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1); intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3); intptr_t elements = opr_sz / 4; intptr_t eltspersegment = MIN(16 / 4, elements); - float32 *d = vd, *a = va; - bfloat16 *n = vn, *m = vm; for (i = 0; i < elements; i += eltspersegment) { float32 m_idx = m[H2(2 * i + index)] << 16; for (j = i; j < i + eltspersegment; j++) { - float32 n_j = n[H2(2 * j + sel)] << 16; - d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat); + float32 n_j = (negx ^ n[H2(2 * j + sel)]) << 16; + d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], negf, stat); } } clear_tail(d, opr_sz, simd_maxsz(desc)); } +void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal_idx(vd, vn, vm, va, stat, desc, 0, 0); +} + +void HELPER(gvec_bfmlsl_idx)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal_idx(vd, vn, vm, va, stat, desc, 0x8000, 0); +} + +void HELPER(gvec_ah_bfmlsl_idx)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal_idx(vd, vn, vm, va, stat, desc, 0, float_muladd_negate_product); +} + #define DO_CLAMP(NAME, TYPE) \ void HELPER(NAME)(void *d, void *n, void *m, void *a, uint32_t desc) \ { \ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-37-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 71 ++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 98 ++++++++++++++++++++++++++++++++++ 2 files changed, 169 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 SUB_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 +%off3_x2 0:3 !function=times_2 +%off2_x2 0:2 !function=times_2 + +@azz_nx1_o3x2 ........ ... . zm:4 . .. ... zn:5 .. ... \ + &azz_n off=%off3_x2 rv=%mova_rv +@azz_nx1_o2x2 ........ ... . zm:4 . .. ... zn:5 ... .. \ + &azz_n off=%off2_x2 rv=%mova_rv + +FMLAL_n1 11000001 001 0 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1 +FMLAL_n1 11000001 001 0 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=2 +FMLAL_n1 11000001 001 1 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=4 + +FMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 01 ... @azz_nx1_o3x2 n=1 +FMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=2 +FMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=4 + +BFMLAL_n1 11000001 001 0 .... 0 .. 011 ..... 10 ... @azz_nx1_o3x2 n=1 +BFMLAL_n1 11000001 001 0 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=2 +BFMLAL_n1 11000001 001 1 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=4 + +BFMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 11 ... @azz_nx1_o3x2 n=1 +BFMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=2 +BFMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ SUB_azz_nn_s 11000001 101 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 SUB_azz_nn_s 11000001 101 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 SUB_azz_nn_d 11000001 111 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 SUB_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 + +@azz_2x2_o2x2 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=2 rv=%mova_rv zn=%zn_ax2 zm=%zm_ax2 off=%off2_x2 +@azz_4x4_o2x2 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 off=%off2_x2 + +FMLAL_nn 11000001 101 ....0 0 .. 010 ....0 000 .. @azz_2x2_o2x2 +FMLAL_nn 11000001 101 ...01 0 .. 010 ...00 000 .. @azz_4x4_o2x2 + +FMLSL_nn 11000001 101 ....0 0 .. 010 ....0 010 .. @azz_2x2_o2x2 +FMLSL_nn 11000001 101 ...01 0 .. 010 ...00 010 .. @azz_4x4_o2x2 + +BFMLAL_nn 11000001 101 ....0 0 .. 010 ....0 100 .. @azz_2x2_o2x2 +BFMLAL_nn 11000001 101 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2 + +BFMLSL_nn 11000001 101 ....0 0 .. 010 ....0 110 .. @azz_2x2_o2x2 +BFMLSL_nn 11000001 101 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 + +### SME2 Multi-vector Indexed + +&azx_n n off rv zn zm idx + +%idx3_15_10 15:1 10:2 +%idx2_10_2 10:2 2:1 + +@azx_1x1_o3x2 ........ .... zm:4 . .. . .. zn:5 .. ... \ + &azx_n n=1 rv=%mova_rv off=%off3_x2 idx=%idx3_15_10 +@azx_2x1_o2x2 ........ .... zm:4 . .. . .. ..... .. ... \ + &azx_n n=2 rv=%mova_rv off=%off2_x2 zn=%zn_ax2 idx=%idx2_10_2 +@azx_4x1_o2x2 ........ .... zm:4 . .. . .. ..... .. ... \ + &azx_n n=4 rv=%mova_rv off=%off2_x2 zn=%zn_ax4 idx=%idx2_10_2 + +FMLAL_nx 11000001 1000 .... . .. 1 .. ..... 00 ... @azx_1x1_o3x2 +FMLAL_nx 11000001 1001 .... 0 .. 1 .. ....0 00 ... @azx_2x1_o2x2 +FMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 00 ... @azx_4x1_o2x2 + +FMLSL_nx 11000001 1000 .... . .. 1 .. ..... 01 ... @azx_1x1_o3x2 +FMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 01 ... @azx_2x1_o2x2 +FMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 01 ... @azx_4x1_o2x2 + +BFMLAL_nx 11000001 1000 .... . .. 1 .. ..... 10 ... @azx_1x1_o3x2 +BFMLAL_nx 11000001 1001 .... 0 .. 1 .. ....0 10 ... @azx_2x1_o2x2 +BFMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 10 ... @azx_4x1_o2x2 + +BFMLSL_nx 11000001 1000 .... . .. 1 .. ..... 11 ... @azx_1x1_o3x2 +BFMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 11 ... @azx_2x1_o2x2 +BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ADD_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_sub_var) TRANS_FEAT(ADD_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub_var) + +/* + * Expand array multi-vector single (n1), array multi-vector (nn), + * and array multi-vector indexed (nx), for floating-point accumulate. + * multi: true for nn, false for n1. + * fpst: >= 0 to set ptr argument for FPST_*, < 0 for ENV. + * data: stuff for simd_data, including any index. + */ +#define FPST_ENV -1 + +static bool do_azz_acc_fp(DisasContext *s, int nreg, int nsel, + int rv, int off, int zn, int zm, + int data, int shsel, bool multi, int fpst, + gen_helper_gvec_4_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / nreg; + TCGv_ptr t_za = get_zarray(s, rv, off, nreg, nsel); + TCGv_ptr t, ptr; + + if (fpst >= 0) { + ptr = fpstatus_ptr(fpst); + } else { + ptr = tcg_env; + } + t = tcg_temp_new_ptr(); + + for (int r = 0; r < nreg; ++r) { + TCGv_ptr t_zn = vec_full_reg_ptr(s, zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm); + + for (int i = 0; i < nsel; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, data | (i << shsel)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, t, ptr, tcg_constant_i32(desc)); + } + + /* + * For multiple-and-single vectors, Zn may wrap. + * For multiple vectors, both Zn and Zm are aligned. + */ + zn = (zn + 1) % 32; + zm += multi; + } + } + return true; +} + +static bool do_fmlal(DisasContext *s, arg_azz_n *a, bool sub, bool multi) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + (1 << 2) | sub, 1, + multi, FPST_ENV, gen_helper_sve2_fmlal_zzzw_s); +} + +TRANS_FEAT(FMLAL_n1, aa64_sme2, do_fmlal, a, false, false) +TRANS_FEAT(FMLSL_n1, aa64_sme2, do_fmlal, a, true, false) +TRANS_FEAT(FMLAL_nn, aa64_sme2, do_fmlal, a, false, true) +TRANS_FEAT(FMLSL_nn, aa64_sme2, do_fmlal, a, true, true) + +static bool do_fmlal_nx(DisasContext *s, arg_azx_n *a, bool sub) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + (a->idx << 3) | (1 << 2) | sub, 1, + false, FPST_ENV, gen_helper_sve2_fmlal_zzxw_s); +} + +TRANS_FEAT(FMLAL_nx, aa64_sme2, do_fmlal_nx, a, false) +TRANS_FEAT(FMLSL_nx, aa64_sme2, do_fmlal_nx, a, true) + +static bool do_bfmlal(DisasContext *s, arg_azz_n *a, bool sub, bool multi) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, FPST_ZA, + (!sub ? gen_helper_gvec_bfmlal + : s->fpcr_ah ? gen_helper_gvec_ah_bfmlsl + : gen_helper_gvec_bfmlsl)); +} + +TRANS_FEAT(BFMLAL_n1, aa64_sme2, do_bfmlal, a, false, false) +TRANS_FEAT(BFMLSL_n1, aa64_sme2, do_bfmlal, a, true, false) +TRANS_FEAT(BFMLAL_nn, aa64_sme2, do_bfmlal, a, false, true) +TRANS_FEAT(BFMLSL_nn, aa64_sme2, do_bfmlal, a, true, true) + +static bool do_bfmlal_nx(DisasContext *s, arg_azx_n *a, bool sub) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + a->idx << 1, 0, false, FPST_ZA, + !sub ? gen_helper_gvec_bfmlal_idx + : s->fpcr_ah ? gen_helper_gvec_ah_bfmlsl_idx + : gen_helper_gvec_bfmlsl_idx); +} + +TRANS_FEAT(BFMLAL_nx, aa64_sme2, do_bfmlal_nx, a, false) +TRANS_FEAT(BFMLSL_nx, aa64_sme2, do_bfmlal_nx, a, true) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-38-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 5 ++++ target/arm/tcg/sme.decode | 14 +++++++++++ target/arm/tcg/sve.decode | 7 ++++-- target/arm/tcg/sme_helper.c | 44 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 18 ++++++++++++++ target/arm/tcg/translate-sve.c | 5 ++++ 6 files changed, 91 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmaxnum_b16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fminnum_b16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_6(sme2_fdot_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_6(sme2_fdot_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ BFMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 11 ... @azz_nx1_o3x2 n=1 BFMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=2 BFMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 +FDOT_n1 11000001 001 0 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=2 +FDOT_n1 11000001 001 1 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ BFMLAL_nn 11000001 101 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2 BFMLSL_nn 11000001 101 ....0 0 .. 010 ....0 110 .. @azz_2x2_o2x2 BFMLSL_nn 11000001 101 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 +FDOT_nn 11000001 101 ....0 0 .. 100 ....0 00 ... @azz_2x2_o3 +FDOT_nn 11000001 101 ...01 0 .. 100 ...00 00 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ BFMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 10 ... @azx_4x1_o2x2 BFMLSL_nx 11000001 1000 .... . .. 1 .. ..... 11 ... @azx_1x1_o3x2 BFMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 11 ... @azx_2x1_o2x2 BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 + +@azx_2x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax2 +@azx_4x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax4 + +FDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_i2_o3 +FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 -### SVE2 floating-point bfloat16 dot-product +### SVE2 floating-point dot-product +FDOT_zzzz 01100100 00 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point multiply-add long (indexed) @@ -XXX,XX +XXX,XX @@ FMLSLT_zzxw 01100100 10 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 BFMLALB_zzxw 01100100 11 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 BFMLALT_zzxw 01100100 11 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 -### SVE2 floating-point bfloat16 dot-product (indexed) +### SVE2 floating-point dot-product (indexed) + +FDOT_zzxz 01100100 00 1 ..... 010000 ..... ..... @rrxr_2 esz=2 BFDOT_zzxz 01100100 01 1 ..... 010000 ..... ..... @rrxr_2 esz=2 ### SVE broadcast predicate element diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, } } +void HELPER(sme2_fdot_h)(void *vd, void *vn, void *vm, void *va, + CPUARMState *env, uint32_t desc) +{ + intptr_t i, oprsz = simd_maxsz(desc); + bool za = extract32(desc, SIMD_DATA_SHIFT, 1); + float_status *fpst_std = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; + float_status *fpst_f16 = &env->vfp.fp_status[za ? FPST_ZA_F16 : FPST_A64_F16]; + float_status fpst_odd = *fpst_std; + float32 *d = vd, *a = va; + uint32_t *n = vn, *m = vm; + + set_float_rounding_mode(float_round_to_odd, &fpst_odd); + + for (i = 0; i < oprsz / sizeof(float32); ++i) { + d[H4(i)] = f16_dotadd(a[H4(i)], n[H4(i)], m[H4(i)], + fpst_f16, fpst_std, &fpst_odd); + } +} + +void HELPER(sme2_fdot_idx_h)(void *vd, void *vn, void *vm, void *va, + CPUARMState *env, uint32_t desc) +{ + intptr_t i, j, oprsz = simd_maxsz(desc); + intptr_t elements = oprsz / sizeof(float32); + intptr_t eltspersegment = MIN(4, elements); + int idx = extract32(desc, SIMD_DATA_SHIFT, 2); + bool za = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + float_status *fpst_std = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; + float_status *fpst_f16 = &env->vfp.fp_status[za ? FPST_ZA_F16 : FPST_A64_F16]; + float_status fpst_odd = *fpst_std; + float32 *d = vd, *a = va; + uint32_t *n = vn, *m = (uint32_t *)vm + H4(idx); + + set_float_rounding_mode(float_round_to_odd, &fpst_odd); + + for (i = 0; i < elements; i += eltspersegment) { + uint32_t mm = m[i]; + for (j = 0; j < eltspersegment; ++j) { + d[H4(i + j)] = f16_dotadd(a[H4(i + j)], n[H4(i + j)], mm, + fpst_f16, fpst_std, &fpst_odd); + } + } +} + void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, void *vpm, CPUARMState *env, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_bfmlal_nx(DisasContext *s, arg_azx_n *a, bool sub) TRANS_FEAT(BFMLAL_nx, aa64_sme2, do_bfmlal_nx, a, false) TRANS_FEAT(BFMLSL_nx, aa64_sme2, do_bfmlal_nx, a, true) + +static bool do_fdot(DisasContext *s, arg_azz_n *a, bool multi) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, 1, 0, + multi, FPST_ENV, gen_helper_sme2_fdot_h); +} + +TRANS_FEAT(FDOT_n1, aa64_sme2, do_fdot, a, false) +TRANS_FEAT(FDOT_nn, aa64_sme2, do_fdot, a, true) + +static bool do_fdot_nx(DisasContext *s, arg_azx_n *a) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + a->idx | (1 << 2), 0, false, FPST_ENV, + gen_helper_sme2_fdot_idx_h); +} + +TRANS_FEAT(FDOT_nx, aa64_sme2, do_fdot_nx, a) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, TRANS_FEAT_NONSTREAMING(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, gen_helper_gvec_ummla_b, a, 0) +TRANS_FEAT(FDOT_zzzz, aa64_sme2_or_sve2p1, gen_gvec_env_arg_zzzz, + gen_helper_sme2_fdot_h, a, 0) +TRANS_FEAT(FDOT_zzxz, aa64_sme2_or_sve2p1, gen_gvec_env_arg_zzxz, + gen_helper_sme2_fdot_idx_h, a) + TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_env_arg_zzzz, gen_helper_gvec_bfdot, a, 0) TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_env_arg_zzxz, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-39-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 9 +++++++++ target/arm/tcg/translate-sme.c | 17 +++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ BFMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 FDOT_n1 11000001 001 0 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=2 FDOT_n1 11000001 001 1 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=4 +BFDOT_n1 11000001 001 0 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=2 +BFDOT_n1 11000001 001 1 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ BFMLSL_nn 11000001 101 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 FDOT_nn 11000001 101 ....0 0 .. 100 ....0 00 ... @azz_2x2_o3 FDOT_nn 11000001 101 ...01 0 .. 100 ...00 00 ... @azz_4x4_o3 +BFDOT_nn 11000001 101 ....0 0 .. 100 ....0 10 ... @azz_2x2_o3 +BFDOT_nn 11000001 101 ...01 0 .. 100 ...00 10 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 FDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_i2_o3 FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 + +BFDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_i2_o3 +BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_fdot_nx(DisasContext *s, arg_azx_n *a) } TRANS_FEAT(FDOT_nx, aa64_sme2, do_fdot_nx, a) + +static bool do_bfdot(DisasContext *s, arg_azz_n *a, bool multi) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, 0, 0, + multi, FPST_ENV, gen_helper_gvec_bfdot); +} + +TRANS_FEAT(BFDOT_n1, aa64_sme2, do_bfdot, a, false) +TRANS_FEAT(BFDOT_nn, aa64_sme2, do_bfdot, a, true) + +static bool do_bfdot_nx(DisasContext *s, arg_azx_n *a) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, a->idx, 0, + false, FPST_ENV, gen_helper_gvec_bfdot_idx); +} + +TRANS_FEAT(BFDOT_nx, aa64_sme2, do_bfdot_nx, a) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-40-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 ++ target/arm/tcg/helper.h | 2 ++ target/arm/tcg/sme.decode | 3 +++ target/arm/tcg/sme_helper.c | 30 ++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 24 +++++++++++++++++++++ target/arm/tcg/vec_helper.c | 39 ++++++++++++++++++++++++++++++++++ 6 files changed, 100 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme2_fdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(sme2_fdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_6(sme2_fvdot_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfdot, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(gvec_bfdot_idx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_6(sme2_bfvdot_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(gvec_bfmmla, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 BFDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_i2_o3 BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3 + +FVDOT 11000001 0101 .... 0 .. 0 .. ....0 01 ... @azx_2x1_i2_o3 +BFVDOT 11000001 0101 .... 0 .. 0 .. ....0 11 ... @azx_2x1_i2_o3 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fdot_idx_h)(void *vd, void *vn, void *vm, void *va, } } +void HELPER(sme2_fvdot_idx_h)(void *vd, void *vn, void *vm, void *va, + CPUARMState *env, uint32_t desc) +{ + intptr_t i, j, oprsz = simd_maxsz(desc); + intptr_t elements = oprsz / sizeof(float32); + intptr_t eltspersegment = MIN(4, elements); + int idx = extract32(desc, SIMD_DATA_SHIFT, 2); + int sel = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + float_status fpst_odd, *fpst_std, *fpst_f16; + float32 *d = vd, *a = va; + uint16_t *n0 = vn; + uint16_t *n1 = vn + sizeof(ARMVectorReg); + uint32_t *m = (uint32_t *)vm + H4(idx); + + fpst_std = &env->vfp.fp_status[FPST_ZA]; + fpst_f16 = &env->vfp.fp_status[FPST_ZA_F16]; + fpst_odd = *fpst_std; + set_float_rounding_mode(float_round_to_odd, &fpst_odd); + + for (i = 0; i < elements; i += eltspersegment) { + uint32_t mm = m[i]; + for (j = 0; j < eltspersegment; ++j) { + uint32_t nn = (n0[H2(2 * (i + j) + sel)]) + | (n1[H2(2 * (i + j) + sel)] << 16); + d[i + H4(j)] = f16_dotadd(a[i + H4(j)], nn, mm, + fpst_f16, fpst_std, &fpst_odd); + } + } +} + void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, void *vpm, CPUARMState *env, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_bfdot_nx(DisasContext *s, arg_azx_n *a) } TRANS_FEAT(BFDOT_nx, aa64_sme2, do_bfdot_nx, a) + +static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / 2; + TCGv_ptr t_za = get_zarray(s, a->rv, a->off, 2, 1); + TCGv_ptr t_zn = vec_full_reg_ptr(s, a->zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, a->zm); + TCGv_ptr t = tcg_temp_new_ptr(); + + for (int i = 0; i < 2; ++i) { + int o_za = i * vstride * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, a->idx | (i << 2)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, t, tcg_env, tcg_constant_i32(desc)); + } + } + return true; +} + +TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h) +TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm, clear_tail(d, opr_sz, simd_maxsz(desc)); } +void HELPER(sme2_bfvdot_idx)(void *vd, void *vn, void *vm, + void *va, CPUARMState *env, uint32_t desc) +{ + intptr_t i, j, opr_sz = simd_oprsz(desc); + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT, 2); + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + intptr_t elements = opr_sz / 4; + intptr_t eltspersegment = MIN(16 / 4, elements); + float32 *d = vd, *a = va; + uint16_t *n0 = vn; + uint16_t *n1 = vn + sizeof(ARMVectorReg); + uint32_t *m = vm; + float_status fpst, fpst_odd; + + if (is_ebf(env, &fpst, &fpst_odd)) { + for (i = 0; i < elements; i += eltspersegment) { + uint32_t m_idx = m[i + H4(idx)]; + + for (j = 0; j < eltspersegment; j++) { + uint32_t nn = (n0[H2(2 * (i + j) + sel)]) + | (n1[H2(2 * (i + j) + sel)] << 16); + d[i + H4(j)] = bfdotadd_ebf(a[i + H4(j)], nn, m_idx, + &fpst, &fpst_odd); + } + } + } else { + for (i = 0; i < elements; i += eltspersegment) { + uint32_t m_idx = m[i + H4(idx)]; + + for (j = 0; j < eltspersegment; j++) { + uint32_t nn = (n0[H2(2 * (i + j) + sel)]) + | (n1[H2(2 * (i + j) + sel)] << 16); + d[i + H4(j)] = bfdotadd(a[i + H4(j)], nn, m_idx, &fpst); + } + } + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} + void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, CPUARMState *env, uint32_t desc) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Emphasize that these are 4-way dot products. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-41-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 22 +++++++++++----------- target/arm/tcg/translate-a64.c | 14 +++++++------- target/arm/tcg/translate-neon.c | 14 +++++++------- target/arm/tcg/translate-sve.c | 18 +++++++++--------- target/arm/tcg/vec_helper.c | 22 +++++++++++----------- 5 files changed, 45 insertions(+), 45 deletions(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_sqrdmlah_d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_usdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_usdot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_sdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_udot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_idx_h, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_sdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_idx_h, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_udot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sudot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_sudot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_usdot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_usdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_dot_vector_env(DisasContext *s, arg_qrrr_e *a, return true; } -TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b) -TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b) -TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b) +TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_4b) +TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_4b) +TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_4b) TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector_env, a, gen_helper_gvec_bfdot) TRANS_FEAT(BFMMLA, aa64_bf16, do_dot_vector_env, a, gen_helper_gvec_bfmmla) TRANS_FEAT(SMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_smmla_b) @@ -XXX,XX +XXX,XX @@ static bool do_dot_vector_idx_env(DisasContext *s, arg_qrrx_e *a, return true; } -TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_b) -TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_b) +TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_4b) +TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_4b) TRANS_FEAT(SUDOT_vi, aa64_i8mm, do_dot_vector_idx, a, - gen_helper_gvec_sudot_idx_b) + gen_helper_gvec_sudot_idx_4b) TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a, - gen_helper_gvec_usdot_idx_b) + gen_helper_gvec_usdot_idx_4b) TRANS_FEAT(BFDOT_vi, aa64_bf16, do_dot_vector_idx_env, a, gen_helper_gvec_bfdot_idx) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool trans_VSDOT(DisasContext *s, arg_VSDOT *a) return false; } return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_sdot_b); + gen_helper_gvec_sdot_4b); } static bool trans_VUDOT(DisasContext *s, arg_VUDOT *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUDOT(DisasContext *s, arg_VUDOT *a) return false; } return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_udot_b); + gen_helper_gvec_udot_4b); } static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a) return false; } return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_usdot_b); + gen_helper_gvec_usdot_4b); } static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VSDOT_scalar(DisasContext *s, arg_VSDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_sdot_idx_b); + gen_helper_gvec_sdot_idx_4b); } static bool trans_VUDOT_scalar(DisasContext *s, arg_VUDOT_scalar *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUDOT_scalar(DisasContext *s, arg_VUDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_udot_idx_b); + gen_helper_gvec_udot_idx_4b); } static bool trans_VUSDOT_scalar(DisasContext *s, arg_VUSDOT_scalar *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT_scalar(DisasContext *s, arg_VUSDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_usdot_idx_b); + gen_helper_gvec_usdot_idx_4b); } static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_sudot_idx_b); + gen_helper_gvec_sudot_idx_4b); } static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_ZZI(UMIN, umin) #undef DO_ZZI static gen_helper_gvec_4 * const dot_fns[2][2] = { - { gen_helper_gvec_sdot_b, gen_helper_gvec_sdot_h }, - { gen_helper_gvec_udot_b, gen_helper_gvec_udot_h } + { gen_helper_gvec_sdot_4b, gen_helper_gvec_sdot_4h }, + { gen_helper_gvec_udot_4b, gen_helper_gvec_udot_4h } }; TRANS_FEAT(DOT_zzzz, aa64_sve, gen_gvec_ool_zzzz, dot_fns[a->u][a->sz], a->rd, a->rn, a->rm, a->ra, 0) @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(DOT_zzzz, aa64_sve, gen_gvec_ool_zzzz, */ TRANS_FEAT(SDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_sdot_idx_b, a) + gen_helper_gvec_sdot_idx_4b, a) TRANS_FEAT(SDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_sdot_idx_h, a) + gen_helper_gvec_sdot_idx_4h, a) TRANS_FEAT(UDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_udot_idx_b, a) + gen_helper_gvec_udot_idx_4b, a) TRANS_FEAT(UDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_udot_idx_h, a) + gen_helper_gvec_udot_idx_4h, a) TRANS_FEAT(SUDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_sudot_idx_b, a) + gen_helper_gvec_sudot_idx_4b, a) TRANS_FEAT(USDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_usdot_idx_b, a) + gen_helper_gvec_usdot_idx_4b, a) #define DO_SVE2_RRX(NAME, FUNC) \ TRANS_FEAT(NAME, aa64_sve, gen_gvec_ool_zzz, FUNC, \ @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz, sqrdcmlah_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->rot) TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, - a->esz == 2 ? gen_helper_gvec_usdot_b : NULL, a, 0) + a->esz == 2 ? gen_helper_gvec_usdot_4b : NULL, a, 0) TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz, gen_helper_crypto_aesmc, a->rd, a->rd, 0) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ clear_tail(d, opr_sz, simd_maxsz(desc)); \ } -DO_DOT(gvec_sdot_b, int32_t, int8_t, int8_t) -DO_DOT(gvec_udot_b, uint32_t, uint8_t, uint8_t) -DO_DOT(gvec_usdot_b, uint32_t, uint8_t, int8_t) -DO_DOT(gvec_sdot_h, int64_t, int16_t, int16_t) -DO_DOT(gvec_udot_h, uint64_t, uint16_t, uint16_t) +DO_DOT(gvec_sdot_4b, int32_t, int8_t, int8_t) +DO_DOT(gvec_udot_4b, uint32_t, uint8_t, uint8_t) +DO_DOT(gvec_usdot_4b, uint32_t, uint8_t, int8_t) +DO_DOT(gvec_sdot_4h, int64_t, int16_t, int16_t) +DO_DOT(gvec_udot_4h, uint64_t, uint16_t, uint16_t) #define DO_DOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD) \ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ clear_tail(d, opr_sz, simd_maxsz(desc)); \ } -DO_DOT_IDX(gvec_sdot_idx_b, int32_t, int8_t, int8_t, H4) -DO_DOT_IDX(gvec_udot_idx_b, uint32_t, uint8_t, uint8_t, H4) -DO_DOT_IDX(gvec_sudot_idx_b, int32_t, int8_t, uint8_t, H4) -DO_DOT_IDX(gvec_usdot_idx_b, int32_t, uint8_t, int8_t, H4) -DO_DOT_IDX(gvec_sdot_idx_h, int64_t, int16_t, int16_t, H8) -DO_DOT_IDX(gvec_udot_idx_h, uint64_t, uint16_t, uint16_t, H8) +DO_DOT_IDX(gvec_sdot_idx_4b, int32_t, int8_t, int8_t, H4) +DO_DOT_IDX(gvec_udot_idx_4b, uint32_t, uint8_t, uint8_t, H4) +DO_DOT_IDX(gvec_sudot_idx_4b, int32_t, int8_t, uint8_t, H4) +DO_DOT_IDX(gvec_usdot_idx_4b, int32_t, uint8_t, int8_t, H4) +DO_DOT_IDX(gvec_sdot_idx_4h, int64_t, int16_t, int16_t, H8) +DO_DOT_IDX(gvec_udot_idx_4h, uint64_t, uint16_t, uint16_t, H8) void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm, float_status *fpst, uint32_t desc) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-42-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 8 ++++ target/arm/tcg/sme.decode | 63 ++++++++++++++++++++++++- target/arm/tcg/translate-sme.c | 85 ++++++++++++++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 51 ++++++++++++++++++++ 4 files changed, 206 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_sdot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_udot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_usdot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(gvec_sdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_udot_idx_4b, TCG_CALL_NO_RWG, @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_sudot_idx_4b, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_usdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_idx_2h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_idx_2h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FDOT_n1 11000001 001 1 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=4 BFDOT_n1 11000001 001 0 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=2 BFDOT_n1 11000001 001 1 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=4 +USDOT_n1 11000001 001 0 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=2 +USDOT_n1 11000001 001 1 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=4 + +SUDOT_n1 11000001 001 0 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=2 +SUDOT_n1 11000001 001 1 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=4 + +SDOT_n1_4b 11000001 001 0 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=2 +SDOT_n1_4b 11000001 001 1 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=4 +SDOT_n1_4h 11000001 011 0 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=2 +SDOT_n1_4h 11000001 011 1 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=4 +SDOT_n1_2h 11000001 011 0 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=2 +SDOT_n1_2h 11000001 011 1 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=4 + +UDOT_n1_4b 11000001 001 0 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=2 +UDOT_n1_4b 11000001 001 1 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=4 +UDOT_n1_4h 11000001 011 0 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=2 +UDOT_n1_4h 11000001 011 1 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=4 +UDOT_n1_2h 11000001 011 0 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=2 +UDOT_n1_2h 11000001 011 1 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ FDOT_nn 11000001 101 ...01 0 .. 100 ...00 00 ... @azz_4x4_o3 BFDOT_nn 11000001 101 ....0 0 .. 100 ....0 10 ... @azz_2x2_o3 BFDOT_nn 11000001 101 ...01 0 .. 100 ...00 10 ... @azz_4x4_o3 +USDOT_nn 11000001 101 ....0 0 .. 101 ....0 01 ... @azz_2x2_o3 +USDOT_nn 11000001 101 ...01 0 .. 101 ...00 01 ... @azz_4x4_o3 + +SDOT_nn_4b 11000001 101 ....0 0 .. 101 ....0 00 ... @azz_2x2_o3 +SDOT_nn_4b 11000001 101 ...01 0 .. 101 ...00 00 ... @azz_4x4_o3 +SDOT_nn_4h 11000001 111 ....0 0 .. 101 ....0 00 ... @azz_2x2_o3 +SDOT_nn_4h 11000001 111 ...01 0 .. 101 ...00 00 ... @azz_4x4_o3 +SDOT_nn_2h 11000001 111 ....0 0 .. 101 ....0 01 ... @azz_2x2_o3 +SDOT_nn_2h 11000001 111 ...01 0 .. 101 ...00 01 ... @azz_4x4_o3 + +UDOT_nn_4b 11000001 101 ....0 0 .. 101 ....0 10 ... @azz_2x2_o3 +UDOT_nn_4b 11000001 101 ...01 0 .. 101 ...00 10 ... @azz_4x4_o3 +UDOT_nn_4h 11000001 111 ....0 0 .. 101 ....0 10 ... @azz_2x2_o3 +UDOT_nn_4h 11000001 111 ...01 0 .. 101 ...00 10 ... @azz_4x4_o3 +UDOT_nn_2h 11000001 111 ....0 0 .. 101 ....0 11 ... @azz_2x2_o3 +UDOT_nn_2h 11000001 111 ...01 0 .. 101 ...00 11 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 @azx_2x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ &azx_n n=2 rv=%mova_rv zn=%zn_ax2 @azx_4x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ - &azx_n n=2 rv=%mova_rv zn=%zn_ax4 + &azx_n n=4 rv=%mova_rv zn=%zn_ax4 +@azx_2x1_i1_o3 ........ .... zm:4 . .. .. idx:1 .... ... off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax2 +@azx_4x1_i1_o3 ........ .... zm:4 . .. .. idx:1 .... ... off:3 \ + &azx_n n=4 rv=%mova_rv zn=%zn_ax4 FDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_i2_o3 FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 @@ -XXX,XX +XXX,XX @@ BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3 FVDOT 11000001 0101 .... 0 .. 0 .. ....0 01 ... @azx_2x1_i2_o3 BFVDOT 11000001 0101 .... 0 .. 0 .. ....0 11 ... @azx_2x1_i2_o3 + +SDOT_nx_2h 11000001 0101 .... 0 .. 1 .. ....0 00 ... @azx_2x1_i2_o3 +SDOT_nx_2h 11000001 0101 .... 1 .. 1 .. ...00 00 ... @azx_4x1_i2_o3 +SDOT_nx_4b 11000001 0101 .... 0 .. 1 .. ....1 00 ... @azx_2x1_i2_o3 +SDOT_nx_4b 11000001 0101 .... 1 .. 1 .. ...01 00 ... @azx_4x1_i2_o3 +SDOT_nx_4h 11000001 1101 .... 0 .. 00 . ....0 01 ... @azx_2x1_i1_o3 +SDOT_nx_4h 11000001 1101 .... 1 .. 00 . ...00 01 ... @azx_4x1_i1_o3 + +UDOT_nx_2h 11000001 0101 .... 0 .. 1 .. ....0 10 ... @azx_2x1_i2_o3 +UDOT_nx_2h 11000001 0101 .... 1 .. 1 .. ...00 10 ... @azx_4x1_i2_o3 +UDOT_nx_4b 11000001 0101 .... 0 .. 1 .. ....1 10 ... @azx_2x1_i2_o3 +UDOT_nx_4b 11000001 0101 .... 1 .. 1 .. ...01 10 ... @azx_4x1_i2_o3 +UDOT_nx_4h 11000001 1101 .... 0 .. 00 . ....0 11 ... @azx_2x1_i1_o3 +UDOT_nx_4h 11000001 1101 .... 1 .. 00 . ...00 11 ... @azx_4x1_i1_o3 + +USDOT_nx 11000001 0101 .... 0 .. 1 .. ....1 01 ... @azx_2x1_i2_o3 +USDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 01 ... @azx_4x1_i2_o3 + +SUDOT_nx 11000001 0101 .... 0 .. 1 .. ....1 11 ... @azx_2x1_i2_o3 +SUDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 11 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn) TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h) TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx) + +/* + * Expand array multi-vector single (n1), array multi-vector (nn), + * and array multi-vector indexed (nx), for integer accumulate. + * multi: true for nn, false for n1. + * data: stuff for simd_data, including any index. + */ +static bool do_azz_acc(DisasContext *s, int nreg, int nsel, + int rv, int off, int zn, int zm, + int data, int shsel, bool multi, + gen_helper_gvec_4 *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / nreg; + TCGv_ptr t_za = get_zarray(s, rv, off, nreg, nsel); + TCGv_ptr t = tcg_temp_new_ptr(); + + for (int r = 0; r < nreg; ++r) { + TCGv_ptr t_zn = vec_full_reg_ptr(s, zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm); + + for (int i = 0; i < nsel; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, data | (i << shsel)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, t, tcg_constant_i32(desc)); + } + + /* + * For multiple-and-single vectors, Zn may wrap. + * For multiple vectors, both Zn and Zm are aligned. + */ + zn = (zn + 1) % 32; + zm += multi; + } + } + return true; +} + +static bool do_dot(DisasContext *s, arg_azz_n *a, bool multi, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fn); +} + +static void gen_helper_gvec_sudot_4b(TCGv_ptr d, TCGv_ptr n, TCGv_ptr m, + TCGv_ptr a, TCGv_i32 desc) +{ + gen_helper_gvec_usdot_4b(d, m, n, a, desc); +} + +TRANS_FEAT(USDOT_n1, aa64_sme2, do_dot, a, false, gen_helper_gvec_usdot_4b) +TRANS_FEAT(SUDOT_n1, aa64_sme2, do_dot, a, false, gen_helper_gvec_sudot_4b) +TRANS_FEAT(SDOT_n1_2h, aa64_sme2, do_dot, a, false, gen_helper_gvec_sdot_2h) +TRANS_FEAT(UDOT_n1_2h, aa64_sme2, do_dot, a, false, gen_helper_gvec_udot_2h) +TRANS_FEAT(SDOT_n1_4b, aa64_sme2, do_dot, a, false, gen_helper_gvec_sdot_4b) +TRANS_FEAT(UDOT_n1_4b, aa64_sme2, do_dot, a, false, gen_helper_gvec_udot_4b) +TRANS_FEAT(SDOT_n1_4h, aa64_sme2_i16i64, do_dot, a, false, gen_helper_gvec_sdot_4h) +TRANS_FEAT(UDOT_n1_4h, aa64_sme2_i16i64, do_dot, a, false, gen_helper_gvec_udot_4h) + +TRANS_FEAT(USDOT_nn, aa64_sme2, do_dot, a, true, gen_helper_gvec_usdot_4b) +TRANS_FEAT(SDOT_nn_2h, aa64_sme2, do_dot, a, true, gen_helper_gvec_sdot_2h) +TRANS_FEAT(UDOT_nn_2h, aa64_sme2, do_dot, a, true, gen_helper_gvec_udot_2h) +TRANS_FEAT(SDOT_nn_4b, aa64_sme2, do_dot, a, true, gen_helper_gvec_sdot_4b) +TRANS_FEAT(UDOT_nn_4b, aa64_sme2, do_dot, a, true, gen_helper_gvec_udot_4b) +TRANS_FEAT(SDOT_nn_4h, aa64_sme2_i16i64, do_dot, a, true, gen_helper_gvec_sdot_4h) +TRANS_FEAT(UDOT_nn_4h, aa64_sme2_i16i64, do_dot, a, true, gen_helper_gvec_udot_4h) + +static bool do_dot_nx(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + a->idx, 0, false, fn); +} + +TRANS_FEAT(USDOT_nx, aa64_sme2, do_dot_nx, a, gen_helper_gvec_usdot_idx_4b) +TRANS_FEAT(SUDOT_nx, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sudot_idx_4b) +TRANS_FEAT(SDOT_nx_2h, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sdot_idx_2h) +TRANS_FEAT(UDOT_nx_2h, aa64_sme2, do_dot_nx, a, gen_helper_gvec_udot_idx_2h) +TRANS_FEAT(SDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sdot_idx_4b) +TRANS_FEAT(UDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_udot_idx_4b) +TRANS_FEAT(SDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_sdot_idx_4h) +TRANS_FEAT(UDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_udot_idx_4h) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_DOT_IDX(gvec_usdot_idx_4b, int32_t, uint8_t, int8_t, H4) DO_DOT_IDX(gvec_sdot_idx_4h, int64_t, int16_t, int16_t, H8) DO_DOT_IDX(gvec_udot_idx_4h, uint64_t, uint16_t, uint16_t, H8) +#undef DO_DOT +#undef DO_DOT_IDX + +/* Similar for 2-way dot product */ +#define DO_DOT(NAME, TYPED, TYPEN, TYPEM) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t i, opr_sz = simd_oprsz(desc); \ + TYPED *d = vd, *a = va; \ + TYPEN *n = vn; \ + TYPEM *m = vm; \ + for (i = 0; i < opr_sz / sizeof(TYPED); ++i) { \ + d[i] = (a[i] + \ + (TYPED)n[i * 2 + 0] * m[i * 2 + 0] + \ + (TYPED)n[i * 2 + 1] * m[i * 2 + 1]); \ + } \ + clear_tail(d, opr_sz, simd_maxsz(desc)); \ +} + +#define DO_DOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t i = 0, opr_sz = simd_oprsz(desc); \ + intptr_t opr_sz_n = opr_sz / sizeof(TYPED); \ + intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n); \ + intptr_t index = simd_data(desc); \ + TYPED *d = vd, *a = va; \ + TYPEN *n = vn; \ + TYPEM *m_indexed = (TYPEM *)vm + HD(index) * 2; \ + do { \ + TYPED m0 = m_indexed[i * 2 + 0]; \ + TYPED m1 = m_indexed[i * 2 + 1]; \ + do { \ + d[i] = (a[i] + \ + n[i * 2 + 0] * m0 + \ + n[i * 2 + 1] * m1); \ + } while (++i < segend); \ + segend = i + (16 / sizeof(TYPED)); \ + } while (i < opr_sz_n); \ + clear_tail(d, opr_sz, simd_maxsz(desc)); \ +} + +DO_DOT(gvec_sdot_2h, int32_t, int16_t, int16_t) +DO_DOT(gvec_udot_2h, uint32_t, uint16_t, uint16_t) + +DO_DOT_IDX(gvec_sdot_idx_2h, int32_t, int16_t, int16_t, H4) +DO_DOT_IDX(gvec_udot_idx_2h, uint32_t, uint16_t, uint16_t, H4) + +#undef DO_DOT +#undef DO_DOT_IDX + void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm, float_status *fpst, uint32_t desc) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Emphasize the 4-way nature of these dot products. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-43-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 12 ++++++------ target/arm/tcg/translate-sve.c | 12 ++++++------ 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ CDOT_zzzz 01000100 esz:2 0 rm:5 0001 rot:2 rn:5 rd:5 ra=%reg_movprfx #### SVE Multiply - Indexed # SVE integer dot product (indexed) -SDOT_zzxw_s 01000100 10 1 ..... 000000 ..... ..... @rrxr_2 esz=2 -SDOT_zzxw_d 01000100 11 1 ..... 000000 ..... ..... @rrxr_1 esz=3 -UDOT_zzxw_s 01000100 10 1 ..... 000001 ..... ..... @rrxr_2 esz=2 -UDOT_zzxw_d 01000100 11 1 ..... 000001 ..... ..... @rrxr_1 esz=3 +SDOT_zzxw_4s 01000100 10 1 ..... 000000 ..... ..... @rrxr_2 esz=2 +SDOT_zzxw_4d 01000100 11 1 ..... 000000 ..... ..... @rrxr_1 esz=3 +UDOT_zzxw_4s 01000100 10 1 ..... 000001 ..... ..... @rrxr_2 esz=2 +UDOT_zzxw_4d 01000100 11 1 ..... 000001 ..... ..... @rrxr_1 esz=3 # SVE2 integer multiply-add (indexed) MLA_zzxz_h 01000100 0. 1 ..... 000010 ..... ..... @rrxr_3 esz=1 @@ -XXX,XX +XXX,XX @@ SQRDMLSH_zzxz_s 01000100 10 1 ..... 000101 ..... ..... @rrxr_2 esz=2 SQRDMLSH_zzxz_d 01000100 11 1 ..... 000101 ..... ..... @rrxr_1 esz=3 # SVE mixed sign dot product (indexed) -USDOT_zzxw_s 01000100 10 1 ..... 000110 ..... ..... @rrxr_2 esz=2 -SUDOT_zzxw_s 01000100 10 1 ..... 000111 ..... ..... @rrxr_2 esz=2 +USDOT_zzxw_4s 01000100 10 1 ..... 000110 ..... ..... @rrxr_2 esz=2 +SUDOT_zzxw_4s 01000100 10 1 ..... 000111 ..... ..... @rrxr_2 esz=2 # SVE2 saturating multiply-add (indexed) SQDMLALB_zzxw_s 01000100 10 1 ..... 0010.0 ..... ..... @rrxr_3a esz=2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(DOT_zzzz, aa64_sve, gen_gvec_ool_zzzz, * SVE Multiply - Indexed */ -TRANS_FEAT(SDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(SDOT_zzxw_4s, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_sdot_idx_4b, a) -TRANS_FEAT(SDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(SDOT_zzxw_4d, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_sdot_idx_4h, a) -TRANS_FEAT(UDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(UDOT_zzxw_4s, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_udot_idx_4b, a) -TRANS_FEAT(UDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(UDOT_zzxw_4d, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_udot_idx_4h, a) -TRANS_FEAT(SUDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(SUDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, gen_helper_gvec_sudot_idx_4b, a) -TRANS_FEAT(USDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(USDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, gen_helper_gvec_usdot_idx_4b, a) #define DO_SVE2_RRX(NAME, FUNC) \ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Rename to USDOT_zzzz_4s and force size=2 during decode. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-44-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 2 +- target/arm/tcg/translate-sve.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx ## SVE mixed sign dot product -USDOT_zzzz 01000100 .. 0 ..... 011 110 ..... ..... @rda_rn_rm +USDOT_zzzz_4s 01000100 10 0 ..... 011 110 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating point matrix multiply accumulate BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=1 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4 * const sqrdcmlah_fns[] = { TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz, sqrdcmlah_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->rot) -TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, - a->esz == 2 ? gen_helper_gvec_usdot_4b : NULL, a, 0) +TRANS_FEAT(USDOT_zzzz_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, + gen_helper_gvec_usdot_4b, a, 0) TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz, gen_helper_crypto_aesmc, a->rd, a->rd, 0) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-45-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 8 +++++++- target/arm/tcg/translate-sve.c | 10 ++++++++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SDOT_zzxw_4d 01000100 11 1 ..... 000000 ..... ..... @rrxr_1 esz=3 UDOT_zzxw_4s 01000100 10 1 ..... 000001 ..... ..... @rrxr_2 esz=2 UDOT_zzxw_4d 01000100 11 1 ..... 000001 ..... ..... @rrxr_1 esz=3 +SDOT_zzxw_2s 01000100 10 0 ..... 110010 ..... ..... @rrxr_2 esz=2 +UDOT_zzxw_2s 01000100 10 0 ..... 110011 ..... ..... @rrxr_2 esz=2 + # SVE2 integer multiply-add (indexed) MLA_zzxz_h 01000100 0. 1 ..... 000010 ..... ..... @rrxr_3 esz=1 MLA_zzxz_s 01000100 10 1 ..... 000010 ..... ..... @rrxr_2 esz=2 @@ -XXX,XX +XXX,XX @@ UMLSLT_zzzw 01000100 .. 0 ..... 010 111 ..... ..... @rda_rn_rm CMLA_zzzz 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx -## SVE mixed sign dot product +## SVE dot product + +SDOT_zzzz_2s 01000100 00 0 ..... 110 010 ..... ..... @rda_rn_rm_ex esz=2 +UDOT_zzzz_2s 01000100 00 0 ..... 110 011 ..... ..... @rda_rn_rm_ex esz=2 USDOT_zzzz_4s 01000100 10 0 ..... 011 110 ..... ..... @rda_rn_rm_ex esz=2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, TRANS_FEAT(USDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, gen_helper_gvec_usdot_idx_4b, a) +TRANS_FEAT(SDOT_zzxw_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzxz, + gen_helper_gvec_sdot_idx_2h, a) +TRANS_FEAT(UDOT_zzxw_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzxz, + gen_helper_gvec_udot_idx_2h, a) + #define DO_SVE2_RRX(NAME, FUNC) \ TRANS_FEAT(NAME, aa64_sve, gen_gvec_ool_zzz, FUNC, \ a->rd, a->rn, a->rm, a->index) @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz, TRANS_FEAT(USDOT_zzzz_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, gen_helper_gvec_usdot_4b, a, 0) +TRANS_FEAT(SDOT_zzzz_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzzz, + gen_helper_gvec_sdot_2h, a, 0) +TRANS_FEAT(UDOT_zzzz_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzzz, + gen_helper_gvec_udot_2h, a, 0) + TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz, gen_helper_crypto_aesmc, a->rd, a->rd, 0) TRANS_FEAT_NONSTREAMING(AESIMC, aa64_sve2_aes, gen_gvec_ool_zz, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-46-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 11 +++++++++ target/arm/tcg/sme.decode | 11 +++++++++ target/arm/tcg/sme_helper.c | 42 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 23 +++++++++++++++++++ 4 files changed, 87 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme2_fdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(sme2_fvdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_svdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uvdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_suvdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_usvdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_svdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uvdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_svdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uvdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 01 ... @azx_4x1_i2_o3 SUDOT_nx 11000001 0101 .... 0 .. 1 .. ....1 11 ... @azx_2x1_i2_o3 SUDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 11 ... @azx_4x1_i2_o3 + +SVDOT_nx_2h 11000001 0101 .... 0 .. 0 .. ....1 00 ... @azx_2x1_i2_o3 +SVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 00 ... @azx_4x1_i2_o3 +SVDOT_nx_4h 11000001 1101 .... 1 .. 01 . ...00 01 ... @azx_4x1_i1_o3 + +UVDOT_nx_2h 11000001 0101 .... 0 .. 0 .. ....1 10 ... @azx_2x1_i2_o3 +UVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 10 ... @azx_4x1_i2_o3 +UVDOT_nx_4h 11000001 1101 .... 1 .. 01 . ...00 11 ... @azx_4x1_i1_o3 + +SUVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 11 ... @azx_4x1_i2_o3 +USVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 01 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DEF_IMOP_16x2_32(umopa2_s, uint16_t, uint16_t) DEF_IMOPH(sme2, smopa2, s) DEF_IMOPH(sme2, umopa2, s) + +#define DO_VDOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD, HN) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + intptr_t svl = simd_oprsz(desc); \ + intptr_t elements = svl / sizeof(TYPED); \ + intptr_t eltperseg = 16 / sizeof(TYPED); \ + intptr_t nreg = sizeof(TYPED) / sizeof(TYPEN); \ + intptr_t vstride = (svl / nreg) * sizeof(ARMVectorReg); \ + intptr_t zstride = sizeof(ARMVectorReg) / sizeof(TYPEN); \ + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT, 2); \ + TYPEN *n = vn; \ + TYPEM *m = vm; \ + for (intptr_t r = 0; r < nreg; r++) { \ + TYPED *d = vd + r * vstride; \ + for (intptr_t seg = 0; seg < elements; seg += eltperseg) { \ + intptr_t s = seg + idx; \ + for (intptr_t e = seg; e < seg + eltperseg; e++) { \ + TYPED sum = d[HD(e)]; \ + for (intptr_t i = 0; i < nreg; i++) { \ + TYPED nn = n[i * zstride + HN(nreg * e + r)]; \ + TYPED mm = m[HN(nreg * s + i)]; \ + sum += nn * mm; \ + } \ + d[HD(e)] = sum; \ + } \ + } \ + } \ +} + +DO_VDOT_IDX(sme2_svdot_idx_4b, int32_t, int8_t, int8_t, H4, H1) +DO_VDOT_IDX(sme2_uvdot_idx_4b, uint32_t, uint8_t, uint8_t, H4, H1) +DO_VDOT_IDX(sme2_suvdot_idx_4b, int32_t, int8_t, uint8_t, H4, H1) +DO_VDOT_IDX(sme2_usvdot_idx_4b, int32_t, uint8_t, int8_t, H4, H1) + +DO_VDOT_IDX(sme2_svdot_idx_4h, int64_t, int16_t, int16_t, H8, H2) +DO_VDOT_IDX(sme2_uvdot_idx_4h, uint64_t, uint16_t, uint16_t, H8, H2) + +DO_VDOT_IDX(sme2_svdot_idx_2h, int32_t, int16_t, int16_t, H4, H2) +DO_VDOT_IDX(sme2_uvdot_idx_2h, uint32_t, uint16_t, uint16_t, H4, H2) + +#undef DO_VDOT_IDX diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sdot_idx_4b) TRANS_FEAT(UDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_udot_idx_4b) TRANS_FEAT(SDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_sdot_idx_4h) TRANS_FEAT(UDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_udot_idx_4h) + +static bool do_vdot_nx(DisasContext *s, arg_azx_n *a, gen_helper_gvec_3 *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + fn(get_zarray(s, a->rv, a->off, a->n, 0), + vec_full_reg_ptr(s, a->zn), + vec_full_reg_ptr(s, a->zm), + tcg_constant_i32(simd_desc(svl, svl, a->idx))); + } + return true; +} + +TRANS_FEAT(SVDOT_nx_2h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_svdot_idx_2h) +TRANS_FEAT(SVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_svdot_idx_4b) +TRANS_FEAT(SVDOT_nx_4h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_svdot_idx_4h) + +TRANS_FEAT(UVDOT_nx_2h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_2h) +TRANS_FEAT(UVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_4b) +TRANS_FEAT(UVDOT_nx_4h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_4h) + +TRANS_FEAT(SUVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_suvdot_idx_4b) +TRANS_FEAT(USVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_usvdot_idx_4b) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-47-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 21 +++++ target/arm/tcg/sme.decode | 168 +++++++++++++++++++++++++++++++++ target/arm/tcg/sme_helper.c | 59 ++++++++++++ target/arm/tcg/translate-sme.c | 84 +++++++++++++++++ 4 files changed, 332 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_uvdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme2_svdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme2_uvdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sme2_smlall_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlall_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_usmlall_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sme2_smlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlall_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_usmlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_sumlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UDOT_n1_4h 11000001 011 1 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=4 UDOT_n1_2h 11000001 011 0 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=2 UDOT_n1_2h 11000001 011 1 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=4 +SMLAL_n1 11000001 011 0 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1 +SMLAL_n1 11000001 011 0 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=2 +SMLAL_n1 11000001 011 1 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=4 + +SMLSL_n1 11000001 011 0 .... 0 .. 011 ..... 01 ... @azz_nx1_o3x2 n=1 +SMLSL_n1 11000001 011 0 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=2 +SMLSL_n1 11000001 011 1 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=4 + +UMLAL_n1 11000001 011 0 .... 0 .. 011 ..... 10 ... @azz_nx1_o3x2 n=1 +UMLAL_n1 11000001 011 0 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=2 +UMLAL_n1 11000001 011 1 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=4 + +UMLSL_n1 11000001 011 0 .... 0 .. 011 ..... 11 ... @azz_nx1_o3x2 n=1 +UMLSL_n1 11000001 011 0 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=2 +UMLSL_n1 11000001 011 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 + +%off2_x4 0:2 !function=times_4 +%off1_x4 0:1 !function=times_4 + +@azz_nx1_o2x4 ........ ... . zm:4 . .. ... zn:5 ... .. \ + &azz_n off=%off2_x4 rv=%mova_rv +@azz_nx1_o1x4 ........ ... . zm:4 . .. ... zn:5 .... . \ + &azz_n off=%off1_x4 rv=%mova_rv + +SMLALL_n1_s 11000001 001 0 .... 0 .. 001 ..... 000 .. @azz_nx1_o2x4 n=1 +SMLALL_n1_d 11000001 011 0 .... 0 .. 001 ..... 000 .. @azz_nx1_o2x4 n=1 +SMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=2 +SMLALL_n1_d 11000001 011 0 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=2 +SMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=4 +SMLALL_n1_d 11000001 011 1 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=4 + +SMLSLL_n1_s 11000001 001 0 .... 0 .. 001 ..... 010 .. @azz_nx1_o2x4 n=1 +SMLSLL_n1_d 11000001 011 0 .... 0 .. 001 ..... 010 .. @azz_nx1_o2x4 n=1 +SMLSLL_n1_s 11000001 001 0 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=2 +SMLSLL_n1_d 11000001 011 0 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=2 +SMLSLL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=4 +SMLSLL_n1_d 11000001 011 1 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=4 + +UMLALL_n1_s 11000001 001 0 .... 0 .. 001 ..... 100 .. @azz_nx1_o2x4 n=1 +UMLALL_n1_d 11000001 011 0 .... 0 .. 001 ..... 100 .. @azz_nx1_o2x4 n=1 +UMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=2 +UMLALL_n1_d 11000001 011 0 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=2 +UMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=4 +UMLALL_n1_d 11000001 011 1 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=4 + +UMLSLL_n1_s 11000001 001 0 .... 0 .. 001 ..... 110 .. @azz_nx1_o2x4 n=1 +UMLSLL_n1_d 11000001 011 0 .... 0 .. 001 ..... 110 .. @azz_nx1_o2x4 n=1 +UMLSLL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=2 +UMLSLL_n1_d 11000001 011 0 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=2 +UMLSLL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=4 +UMLSLL_n1_d 11000001 011 1 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=4 + +USMLALL_n1_s 11000001 001 0 .... 0 .. 001 ..... 001 .. @azz_nx1_o2x4 n=1 +USMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=2 +USMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=4 + +SUMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=2 +SUMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ UDOT_nn_4h 11000001 111 ...01 0 .. 101 ...00 10 ... @azz_4x4_o3 UDOT_nn_2h 11000001 111 ....0 0 .. 101 ....0 11 ... @azz_2x2_o3 UDOT_nn_2h 11000001 111 ...01 0 .. 101 ...00 11 ... @azz_4x4_o3 +SMLAL_nn 11000001 111 ....0 0 .. 010 ....0 000 .. @azz_2x2_o2x2 +SMLAL_nn 11000001 111 ...01 0 .. 010 ...00 000 .. @azz_4x4_o2x2 + +SMLSL_nn 11000001 111 ....0 0 .. 010 ....0 010 .. @azz_2x2_o2x2 +SMLSL_nn 11000001 111 ...01 0 .. 010 ...00 010 .. @azz_4x4_o2x2 + +UMLAL_nn 11000001 111 ....0 0 .. 010 ....0 100 .. @azz_2x2_o2x2 +UMLAL_nn 11000001 111 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2 + +UMLSL_nn 11000001 111 ....0 0 .. 010 ....0 110 .. @azz_2x2_o2x2 +UMLSL_nn 11000001 111 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 + +@azz_2x2_o1x4 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=2 rv=%mova_rv zn=%zn_ax2 zm=%zm_ax2 off=%off1_x4 +@azz_4x4_o1x4 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 off=%off1_x4 + +SMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0000 . @azz_2x2_o1x4 +SMLALL_nn_d 11000001 111 ....0 0 .. 000 ....0 0000 . @azz_2x2_o1x4 +SMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0000 . @azz_4x4_o1x4 +SMLALL_nn_d 11000001 111 ...01 0 .. 000 ...00 0000 . @azz_4x4_o1x4 + +SMLSLL_nn_s 11000001 101 ....0 0 .. 000 ....0 0100 . @azz_2x2_o1x4 +SMLSLL_nn_d 11000001 111 ....0 0 .. 000 ....0 0100 . @azz_2x2_o1x4 +SMLSLL_nn_s 11000001 101 ...01 0 .. 000 ...00 0100 . @azz_4x4_o1x4 +SMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 0100 . @azz_4x4_o1x4 + +UMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 1000 . @azz_2x2_o1x4 +UMLALL_nn_d 11000001 111 ....0 0 .. 000 ....0 1000 . @azz_2x2_o1x4 +UMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 1000 . @azz_4x4_o1x4 +UMLALL_nn_d 11000001 111 ...01 0 .. 000 ...00 1000 . @azz_4x4_o1x4 + +UMLSLL_nn_s 11000001 101 ....0 0 .. 000 ....0 1100 . @azz_2x2_o1x4 +UMLSLL_nn_d 11000001 111 ....0 0 .. 000 ....0 1100 . @azz_2x2_o1x4 +UMLSLL_nn_s 11000001 101 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 +UMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 + +USMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0010 . @azz_2x2_o1x4 +USMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0010 . @azz_4x4_o1x4 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ UVDOT_nx_4h 11000001 1101 .... 1 .. 01 . ...00 11 ... @azx_4x1_i1_o3 SUVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 11 ... @azx_4x1_i2_o3 USVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 01 ... @azx_4x1_i2_o3 + +SMLAL_nx 11000001 1100 .... . .. 1 .. ..... 00 ... @azx_1x1_o3x2 +SMLAL_nx 11000001 1101 .... 0 .. 1 .. ....0 00 ... @azx_2x1_o2x2 +SMLAL_nx 11000001 1101 .... 1 .. 1 .. ...00 00 ... @azx_4x1_o2x2 + +SMLSL_nx 11000001 1100 .... . .. 1 .. ..... 01 ... @azx_1x1_o3x2 +SMLSL_nx 11000001 1101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_o2x2 +SMLSL_nx 11000001 1101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_o2x2 + +UMLAL_nx 11000001 1100 .... . .. 1 .. ..... 10 ... @azx_1x1_o3x2 +UMLAL_nx 11000001 1101 .... 0 .. 1 .. ....0 10 ... @azx_2x1_o2x2 +UMLAL_nx 11000001 1101 .... 1 .. 1 .. ...00 10 ... @azx_4x1_o2x2 + +UMLSL_nx 11000001 1100 .... . .. 1 .. ..... 11 ... @azx_1x1_o3x2 +UMLSL_nx 11000001 1101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_o2x2 +UMLSL_nx 11000001 1101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 + +%idx4_15_10 15:1 10:3 +%idx4_10_1 10:2 1:2 +%idx3_10_1 10:1 1:2 + +@azx_1x1_i4_o2 ........ .... zm:4 . .. ... zn:5 ... .. \ + &azx_n n=1 rv=%mova_rv off=%off2_x4 idx=%idx4_15_10 +@azx_1x1_i3_o2 ........ .... zm:4 . .. ... zn:5 ... .. \ + &azx_n n=1 rv=%mova_rv off=%off2_x4 idx=%idx3_15_10 +@azx_2x1_i4_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=2 rv=%mova_rv off=%off1_x4 zn=%zn_ax2 idx=%idx4_10_1 +@azx_2x1_i3_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=2 rv=%mova_rv off=%off1_x4 zn=%zn_ax2 idx=%idx3_10_1 +@azx_4x1_i4_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=4 rv=%mova_rv off=%off1_x4 zn=%zn_ax4 idx=%idx4_10_1 +@azx_4x1_i3_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=4 rv=%mova_rv off=%off1_x4 zn=%zn_ax4 idx=%idx3_10_1 + +SMLALL_nx_s 11000001 0000 .... . .. ... ..... 000 .. @azx_1x1_i4_o2 +SMLALL_nx_d 11000001 1000 .... . .. 0.. ..... 000 .. @azx_1x1_i3_o2 +SMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....0 00 ... @azx_2x1_i4_o1 +SMLALL_nx_d 11000001 1001 .... 0 .. 00. ....0 00 ... @azx_2x1_i3_o1 +SMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...00 00 ... @azx_4x1_i4_o1 +SMLALL_nx_d 11000001 1001 .... 1 .. 00. ...00 00 ... @azx_4x1_i3_o1 + +SMLSLL_nx_s 11000001 0000 .... . .. ... ..... 010 .. @azx_1x1_i4_o2 +SMLSLL_nx_d 11000001 1000 .... . .. 0.. ..... 010 .. @azx_1x1_i3_o2 +SMLSLL_nx_s 11000001 0001 .... 0 .. 0.. ....0 01 ... @azx_2x1_i4_o1 +SMLSLL_nx_d 11000001 1001 .... 0 .. 00. ....0 01 ... @azx_2x1_i3_o1 +SMLSLL_nx_s 11000001 0001 .... 1 .. 0.. ...00 01 ... @azx_4x1_i4_o1 +SMLSLL_nx_d 11000001 1001 .... 1 .. 00. ...00 01 ... @azx_4x1_i3_o1 + +UMLALL_nx_s 11000001 0000 .... . .. ... ..... 100 .. @azx_1x1_i4_o2 +UMLALL_nx_d 11000001 1000 .... . .. 0.. ..... 100 .. @azx_1x1_i3_o2 +UMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....0 10 ... @azx_2x1_i4_o1 +UMLALL_nx_d 11000001 1001 .... 0 .. 00. ....0 10 ... @azx_2x1_i3_o1 +UMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...00 10 ... @azx_4x1_i4_o1 +UMLALL_nx_d 11000001 1001 .... 1 .. 00. ...00 10 ... @azx_4x1_i3_o1 + +UMLSLL_nx_s 11000001 0000 .... . .. ... ..... 110 .. @azx_1x1_i4_o2 +UMLSLL_nx_d 11000001 1000 .... . .. 0.. ..... 110 .. @azx_1x1_i3_o2 +UMLSLL_nx_s 11000001 0001 .... 0 .. 0.. ....0 11 ... @azx_2x1_i4_o1 +UMLSLL_nx_d 11000001 1001 .... 0 .. 00. ....0 11 ... @azx_2x1_i3_o1 +UMLSLL_nx_s 11000001 0001 .... 1 .. 0.. ...00 11 ... @azx_4x1_i4_o1 +UMLSLL_nx_d 11000001 1001 .... 1 .. 00. ...00 11 ... @azx_4x1_i3_o1 + +USMLALL_nx_s 11000001 0000 .... . .. ... ..... 001 .. @azx_1x1_i4_o2 +USMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....1 00 ... @azx_2x1_i4_o1 +USMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 00 ... @azx_4x1_i4_o1 + +SUMLALL_nx_s 11000001 0000 .... . .. ... ..... 101 .. @azx_1x1_i4_o2 +SUMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....1 10 ... @azx_2x1_i4_o1 +SUMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 10 ... @azx_4x1_i4_o1 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DO_VDOT_IDX(sme2_svdot_idx_2h, int32_t, int16_t, int16_t, H4, H2) DO_VDOT_IDX(sme2_uvdot_idx_2h, uint32_t, uint16_t, uint16_t, H4, H2) #undef DO_VDOT_IDX + +#define DO_MLALL(NAME, TYPEW, TYPEN, TYPEM, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t elements = simd_oprsz(desc) / sizeof(TYPEW); \ + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 2); \ + TYPEW *d = vd, *a = va; TYPEN *n = vn; TYPEM *m = vm; \ + for (intptr_t i = 0; i < elements; ++i) { \ + TYPEW nn = n[HN(i * 4 + sel)]; \ + TYPEM mm = m[HN(i * 4 + sel)]; \ + d[HW(i)] = a[HW(i)] OP (nn * mm); \ + } \ +} + +DO_MLALL(sme2_smlall_s, int32_t, int8_t, int8_t, H4, H1, +) +DO_MLALL(sme2_smlall_d, int64_t, int16_t, int16_t, H8, H2, +) +DO_MLALL(sme2_smlsll_s, int32_t, int8_t, int8_t, H4, H1, -) +DO_MLALL(sme2_smlsll_d, int64_t, int16_t, int16_t, H8, H2, -) + +DO_MLALL(sme2_umlall_s, uint32_t, uint8_t, uint8_t, H4, H1, +) +DO_MLALL(sme2_umlall_d, uint64_t, uint16_t, uint16_t, H8, H2, +) +DO_MLALL(sme2_umlsll_s, uint32_t, uint8_t, uint8_t, H4, H1, -) +DO_MLALL(sme2_umlsll_d, uint64_t, uint16_t, uint16_t, H8, H2, -) + +DO_MLALL(sme2_usmlall_s, uint32_t, uint8_t, int8_t, H4, H1, +) + +#undef DO_MLALL + +#define DO_MLALL_IDX(NAME, TYPEW, TYPEN, TYPEM, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t elements = simd_oprsz(desc) / sizeof(TYPEW); \ + intptr_t eltspersegment = 16 / sizeof(TYPEW); \ + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 2); \ + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 4); \ + TYPEW *d = vd, *a = va; TYPEN *n = vn; TYPEM *m = vm; \ + for (intptr_t i = 0; i < elements; i += eltspersegment) { \ + TYPEW mm = m[HN(i * 4 + idx)]; \ + for (intptr_t j = 0; j < eltspersegment; ++j) { \ + TYPEN nn = n[HN((i + j) * 4 + sel)]; \ + d[HW(i + j)] = a[HW(i + j)] OP (nn * mm); \ + } \ + } \ +} + +DO_MLALL_IDX(sme2_smlall_idx_s, int32_t, int8_t, int8_t, H4, H1, +) +DO_MLALL_IDX(sme2_smlall_idx_d, int64_t, int16_t, int16_t, H8, H2, +) +DO_MLALL_IDX(sme2_smlsll_idx_s, int32_t, int8_t, int8_t, H4, H1, -) +DO_MLALL_IDX(sme2_smlsll_idx_d, int64_t, int16_t, int16_t, H8, H2, -) + +DO_MLALL_IDX(sme2_umlall_idx_s, uint32_t, uint8_t, uint8_t, H4, H1, +) +DO_MLALL_IDX(sme2_umlall_idx_d, uint64_t, uint16_t, uint16_t, H8, H2, +) +DO_MLALL_IDX(sme2_umlsll_idx_s, uint32_t, uint8_t, uint8_t, H4, H1, -) +DO_MLALL_IDX(sme2_umlsll_idx_d, uint64_t, uint16_t, uint16_t, H8, H2, -) + +DO_MLALL_IDX(sme2_usmlall_idx_s, uint32_t, uint8_t, int8_t, H4, H1, +) +DO_MLALL_IDX(sme2_sumlall_idx_s, uint32_t, int8_t, uint8_t, H4, H1, +) + +#undef DO_MLALL_IDX diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UVDOT_nx_4h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_4h) TRANS_FEAT(SUVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_suvdot_idx_4b) TRANS_FEAT(USVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_usvdot_idx_4b) + +static bool do_smlal(DisasContext *s, arg_azz_n *a, bool multi, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fn); +} + +TRANS_FEAT(SMLAL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_smlal_zzzw_s) +TRANS_FEAT(SMLSL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_smlsl_zzzw_s) +TRANS_FEAT(UMLAL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_umlal_zzzw_s) +TRANS_FEAT(UMLSL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_umlsl_zzzw_s) + +TRANS_FEAT(SMLAL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_smlal_zzzw_s) +TRANS_FEAT(SMLSL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_smlsl_zzzw_s) +TRANS_FEAT(UMLAL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_umlal_zzzw_s) +TRANS_FEAT(UMLSL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_umlsl_zzzw_s) + +static bool do_smlal_nx(DisasContext *s, arg_azx_n *a, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + a->idx << 1, 0, false, fn); +} + +TRANS_FEAT(SMLAL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_smlal_idx_s) +TRANS_FEAT(SMLSL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_smlsl_idx_s) +TRANS_FEAT(UMLAL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_umlal_idx_s) +TRANS_FEAT(UMLSL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_umlsl_idx_s) + +static bool do_smlall(DisasContext *s, arg_azz_n *a, bool multi, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 4, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fn); +} + +static void gen_helper_sme2_sumlall_s(TCGv_ptr d, TCGv_ptr n, TCGv_ptr m, + TCGv_ptr a, TCGv_i32 desc) +{ + gen_helper_sme2_usmlall_s(d, m, n, a, desc); +} + +TRANS_FEAT(SMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_smlall_s) +TRANS_FEAT(SMLSLL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_smlsll_s) +TRANS_FEAT(UMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_umlall_s) +TRANS_FEAT(UMLSLL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_umlsll_s) +TRANS_FEAT(USMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_usmlall_s) +TRANS_FEAT(SUMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_sumlall_s) + +TRANS_FEAT(SMLALL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_smlall_d) +TRANS_FEAT(SMLSLL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_smlsll_d) +TRANS_FEAT(UMLALL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_umlall_d) +TRANS_FEAT(UMLSLL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_umlsll_d) + +TRANS_FEAT(SMLALL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_smlall_s) +TRANS_FEAT(SMLSLL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_smlsll_s) +TRANS_FEAT(UMLALL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_umlall_s) +TRANS_FEAT(UMLSLL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_umlsll_s) +TRANS_FEAT(USMLALL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_usmlall_s) + +TRANS_FEAT(SMLALL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_smlall_d) +TRANS_FEAT(SMLSLL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_smlsll_d) +TRANS_FEAT(UMLALL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_umlall_d) +TRANS_FEAT(UMLSLL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_umlsll_d) + +static bool do_smlall_nx(DisasContext *s, arg_azx_n *a, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 4, a->rv, a->off, a->zn, a->zm, + a->idx << 2, 0, false, fn); +} + +TRANS_FEAT(SMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_smlall_idx_s) +TRANS_FEAT(SMLSLL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_smlsll_idx_s) +TRANS_FEAT(UMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_umlall_idx_s) +TRANS_FEAT(UMLSLL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_umlsll_idx_s) +TRANS_FEAT(USMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_usmlall_idx_s) +TRANS_FEAT(SUMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_sumlall_idx_s) + +TRANS_FEAT(SMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlall_idx_d) +TRANS_FEAT(SMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlsll_idx_d) +TRANS_FEAT(UMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlall_idx_d) +TRANS_FEAT(UMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlsll_idx_d) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Emphasize the non-fused nature of these multiply-add. Matches other helpers such as gvec_rsqrts_nf_[hs]. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-48-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 8 ++++---- target/arm/tcg/translate-neon.c | 4 ++-- target/arm/tcg/vec_helper.c | 8 ++++---- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmla_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmla_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmls_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmls_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VACGE, gen_helper_gvec_facge_s, gen_helper_gvec_facge_h) DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h) DO_3S_FP_GVEC(VMAX, gen_helper_gvec_fmax_s, gen_helper_gvec_fmax_h) DO_3S_FP_GVEC(VMIN, gen_helper_gvec_fmin_s, gen_helper_gvec_fmin_h) -DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h) -DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h) +DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_nf_s, gen_helper_gvec_fmla_nf_h) +DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_nf_s, gen_helper_gvec_fmls_nf_h) DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h) DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h) DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, \ clear_tail(d, oprsz, simd_maxsz(desc)); \ } -DO_MULADD(gvec_fmla_h, float16_muladd_nf, float16) -DO_MULADD(gvec_fmla_s, float32_muladd_nf, float32) +DO_MULADD(gvec_fmla_nf_h, float16_muladd_nf, float16) +DO_MULADD(gvec_fmla_nf_s, float32_muladd_nf, float32) -DO_MULADD(gvec_fmls_h, float16_mulsub_nf, float16) -DO_MULADD(gvec_fmls_s, float32_mulsub_nf, float32) +DO_MULADD(gvec_fmls_nf_h, float16_mulsub_nf, float16) +DO_MULADD(gvec_fmls_nf_s, float32_mulsub_nf, float32) DO_MULADD(gvec_vfma_h, float16_muladd_f, float16) DO_MULADD(gvec_vfma_s, float32_muladd_f, float32) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-49-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 48 +++++++++++++++++ target/arm/tcg/translate-sme.c | 95 ++++++++++++++++++++++++++++++++++ 2 files changed, 143 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=4 SUMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=2 SUMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=4 +FMLA_n1_h 11000001 001 0 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=2 +FMLA_n1_s 11000001 001 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 +FMLA_n1_d 11000001 011 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 +FMLA_n1_h 11000001 001 1 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=4 +FMLA_n1_s 11000001 001 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 +FMLA_n1_d 11000001 011 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 + +FMLS_n1_h 11000001 001 0 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=2 +FMLS_n1_s 11000001 001 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 +FMLS_n1_d 11000001 011 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 +FMLS_n1_h 11000001 001 1 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=4 +FMLS_n1_s 11000001 001 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 +FMLS_n1_d 11000001 011 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ UMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 USMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0010 . @azz_2x2_o1x4 USMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0010 . @azz_4x4_o1x4 +FMLA_nn_h 11000001 101 ....0 0 .. 100 ....0 01 ... @azz_2x2_o3 +FMLA_nn_s 11000001 101 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 +FMLA_nn_d 11000001 111 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 +FMLA_nn_h 11000001 101 ...01 0 .. 100 ...00 01 ... @azz_4x4_o3 +FMLA_nn_s 11000001 101 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 +FMLA_nn_d 11000001 111 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 + +FMLS_nn_h 11000001 101 ....0 0 .. 100 ....0 11 ... @azz_2x2_o3 +FMLS_nn_s 11000001 101 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 +FMLS_nn_d 11000001 111 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 +FMLS_nn_h 11000001 101 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 +FMLS_nn_s 11000001 101 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 +FMLS_nn_d 11000001 111 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ USMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 00 ... @azx_4x1_i4_o1 SUMLALL_nx_s 11000001 0000 .... . .. ... ..... 101 .. @azx_1x1_i4_o2 SUMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....1 10 ... @azx_2x1_i4_o1 SUMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 10 ... @azx_4x1_i4_o1 + +%idx3_10_3 10:2 3:1 +@azx_2x1_i3_o3 ........ .... zm:4 . .. ... ..... .. off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax2 idx=%idx3_10_3 +@azx_4x1_i3_o3 ........ .... zm:4 . .. ... ..... .. off:3 \ + &azx_n n=4 rv=%mova_rv zn=%zn_ax4 idx=%idx3_10_3 + +FMLA_nx_h 11000001 0001 .... 0 .. 1.. ....0 0 .... @azx_2x1_i3_o3 +FMLA_nx_s 11000001 0101 .... 0 .. 0.. ....0 00 ... @azx_2x1_i2_o3 +FMLA_nx_d 11000001 1101 .... 0 .. 00. ....0 00 ... @azx_2x1_i1_o3 +FMLA_nx_h 11000001 0001 .... 1 .. 1.. ...00 0 .... @azx_4x1_i3_o3 +FMLA_nx_s 11000001 0101 .... 1 .. 0.. ...00 00 ... @azx_4x1_i2_o3 +FMLA_nx_d 11000001 1101 .... 1 .. 00. ...00 00 ... @azx_4x1_i1_o3 + +FMLS_nx_h 11000001 0001 .... 0 .. 1.. ....0 1 .... @azx_2x1_i3_o3 +FMLS_nx_s 11000001 0101 .... 0 .. 0.. ....0 10 ... @azx_2x1_i2_o3 +FMLS_nx_d 11000001 1101 .... 0 .. 00. ....0 10 ... @azx_2x1_i1_o3 +FMLS_nx_h 11000001 0001 .... 1 .. 1.. ...00 1 .... @azx_4x1_i3_o3 +FMLS_nx_s 11000001 0101 .... 1 .. 0.. ...00 10 ... @azx_4x1_i2_o3 +FMLS_nx_d 11000001 1101 .... 1 .. 00. ...00 10 ... @azx_4x1_i1_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub */ #define FPST_ENV -1 +static bool do_azz_fp(DisasContext *s, int nreg, int nsel, + int rv, int off, int zn, int zm, + int data, int shsel, bool multi, int fpst, + gen_helper_gvec_3_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / nreg; + TCGv_ptr t_za = get_zarray(s, rv, off, nreg, nsel); + TCGv_ptr t, ptr; + + if (fpst >= 0) { + ptr = fpstatus_ptr(fpst); + } else { + ptr = tcg_env; + } + t = tcg_temp_new_ptr(); + + for (int r = 0; r < nreg; ++r) { + TCGv_ptr t_zn = vec_full_reg_ptr(s, zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm); + + for (int i = 0; i < nsel; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, data | (i << shsel)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, ptr, tcg_constant_i32(desc)); + } + + /* + * For multiple-and-single vectors, Zn may wrap. + * For multiple vectors, both Zn and Zm are aligned. + */ + zn = (zn + 1) % 32; + zm += multi; + } + } + return true; +} + static bool do_azz_acc_fp(DisasContext *s, int nreg, int nsel, int rv, int off, int zn, int zm, int data, int shsel, bool multi, int fpst, @@ -XXX,XX +XXX,XX @@ static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn) TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h) TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx) +static bool do_fmla(DisasContext *s, arg_azz_n *a, bool multi, + ARMFPStatusFlavour fpst, gen_helper_gvec_3_ptr *fn) +{ + return do_azz_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fpst, fn); +} + +TRANS_FEAT(FMLA_n1_h, aa64_sme_f16f16, do_fmla, a, false, FPST_ZA_F16, + gen_helper_gvec_vfma_h) +TRANS_FEAT(FMLS_n1_h, aa64_sme_f16f16, do_fmla, a, false, FPST_ZA_F16, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_h : gen_helper_gvec_vfms_h) +TRANS_FEAT(FMLA_nn_h, aa64_sme_f16f16, do_fmla, a, true, FPST_ZA_F16, + gen_helper_gvec_vfma_h) +TRANS_FEAT(FMLS_nn_h, aa64_sme_f16f16, do_fmla, a, true, FPST_ZA_F16, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_h : gen_helper_gvec_vfms_h) + +TRANS_FEAT(FMLA_n1_s, aa64_sme2, do_fmla, a, false, FPST_ZA, + gen_helper_gvec_vfma_s) +TRANS_FEAT(FMLS_n1_s, aa64_sme2, do_fmla, a, false, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_s : gen_helper_gvec_vfms_s) +TRANS_FEAT(FMLA_nn_s, aa64_sme2, do_fmla, a, true, FPST_ZA, + gen_helper_gvec_vfma_s) +TRANS_FEAT(FMLS_nn_s, aa64_sme2, do_fmla, a, true, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_s : gen_helper_gvec_vfms_s) + +TRANS_FEAT(FMLA_n1_d, aa64_sme2_f64f64, do_fmla, a, false, FPST_ZA, + gen_helper_gvec_vfma_d) +TRANS_FEAT(FMLS_n1_d, aa64_sme2_f64f64, do_fmla, a, false, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_d : gen_helper_gvec_vfms_d) +TRANS_FEAT(FMLA_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, + gen_helper_gvec_vfma_d) +TRANS_FEAT(FMLS_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_d : gen_helper_gvec_vfms_d) + +static bool do_fmla_nx(DisasContext *s, arg_azx_n *a, + ARMFPStatusFlavour fpst, gen_helper_gvec_4_ptr *fn) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + a->idx, 0, false, fpst, fn); +} + +TRANS_FEAT(FMLA_nx_h, aa64_sme_f16f16, do_fmla_nx, a, FPST_ZA_F16, + gen_helper_gvec_fmla_idx_h) +TRANS_FEAT(FMLS_nx_h, aa64_sme_f16f16, do_fmla_nx, a, FPST_ZA_F16, + s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_h : gen_helper_gvec_fmls_idx_h) +TRANS_FEAT(FMLA_nx_s, aa64_sme2, do_fmla_nx, a, FPST_ZA, + gen_helper_gvec_fmla_idx_s) +TRANS_FEAT(FMLS_nx_s, aa64_sme2, do_fmla_nx, a, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_s : gen_helper_gvec_fmls_idx_s) +TRANS_FEAT(FMLA_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, + gen_helper_gvec_fmla_idx_d) +TRANS_FEAT(FMLS_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_d : gen_helper_gvec_fmls_idx_d) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for integer accumulate. -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-50-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 9 +++++++++ target/arm/tcg/sme.decode | 18 ++++++++++++++++++ target/arm/tcg/translate-sme.c | 14 ++++++++++++++ target/arm/tcg/vec_helper.c | 26 ++++++++++++++++++++++++++ 4 files changed, 67 insertions(+) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmls_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i DEF_HELPER_FLAGS_5(gvec_vfma_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfmla, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfmls, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ah_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ah_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ah_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_ah_bfmls, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmla_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_fmls_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmls_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_fmls_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmls_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_ah_bfmls_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=4 SUMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=2 SUMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=4 +BFMLA_n1 11000001 011 0 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=2 FMLA_n1_h 11000001 001 0 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=2 FMLA_n1_s 11000001 001 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 FMLA_n1_d 11000001 011 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 + +BFMLA_n1 11000001 011 1 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=4 FMLA_n1_h 11000001 001 1 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=4 FMLA_n1_s 11000001 001 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 FMLA_n1_d 11000001 011 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 +BFMLS_n1 11000001 011 0 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=2 FMLS_n1_h 11000001 001 0 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=2 FMLS_n1_s 11000001 001 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 FMLS_n1_d 11000001 011 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 + +BFMLS_n1 11000001 011 1 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=4 FMLS_n1_h 11000001 001 1 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=4 FMLS_n1_s 11000001 001 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 FMLS_n1_d 11000001 011 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 @@ -XXX,XX +XXX,XX @@ UMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 USMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0010 . @azz_2x2_o1x4 USMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0010 . @azz_4x4_o1x4 +BFMLA_nn 11000001 111 ....0 0 .. 100 ....0 01 ... @azz_2x2_o3 FMLA_nn_h 11000001 101 ....0 0 .. 100 ....0 01 ... @azz_2x2_o3 FMLA_nn_s 11000001 101 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 FMLA_nn_d 11000001 111 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 + +BFMLA_nn 11000001 111 ...01 0 .. 100 ...00 01 ... @azz_4x4_o3 FMLA_nn_h 11000001 101 ...01 0 .. 100 ...00 01 ... @azz_4x4_o3 FMLA_nn_s 11000001 101 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 FMLA_nn_d 11000001 111 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 +BFMLS_nn 11000001 111 ....0 0 .. 100 ....0 11 ... @azz_2x2_o3 FMLS_nn_h 11000001 101 ....0 0 .. 100 ....0 11 ... @azz_2x2_o3 FMLS_nn_s 11000001 101 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 FMLS_nn_d 11000001 111 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 + +BFMLS_nn 11000001 111 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 FMLS_nn_h 11000001 101 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 FMLS_nn_s 11000001 101 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 FMLS_nn_d 11000001 111 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 @@ -XXX,XX +XXX,XX @@ SUMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 10 ... @azx_4x1_i4_o1 @azx_4x1_i3_o3 ........ .... zm:4 . .. ... ..... .. off:3 \ &azx_n n=4 rv=%mova_rv zn=%zn_ax4 idx=%idx3_10_3 +BFMLA_nx 11000001 0001 .... 0 .. 1.. ....1 0 .... @azx_2x1_i3_o3 FMLA_nx_h 11000001 0001 .... 0 .. 1.. ....0 0 .... @azx_2x1_i3_o3 FMLA_nx_s 11000001 0101 .... 0 .. 0.. ....0 00 ... @azx_2x1_i2_o3 FMLA_nx_d 11000001 1101 .... 0 .. 00. ....0 00 ... @azx_2x1_i1_o3 + +BFMLA_nx 11000001 0001 .... 1 .. 1.. ...01 0 .... @azx_4x1_i3_o3 FMLA_nx_h 11000001 0001 .... 1 .. 1.. ...00 0 .... @azx_4x1_i3_o3 FMLA_nx_s 11000001 0101 .... 1 .. 0.. ...00 00 ... @azx_4x1_i2_o3 FMLA_nx_d 11000001 1101 .... 1 .. 00. ...00 00 ... @azx_4x1_i1_o3 +BFMLS_nx 11000001 0001 .... 0 .. 1.. ....1 1 .... @azx_2x1_i3_o3 FMLS_nx_h 11000001 0001 .... 0 .. 1.. ....0 1 .... @azx_2x1_i3_o3 FMLS_nx_s 11000001 0101 .... 0 .. 0.. ....0 10 ... @azx_2x1_i2_o3 FMLS_nx_d 11000001 1101 .... 0 .. 00. ....0 10 ... @azx_2x1_i1_o3 + +BFMLS_nx 11000001 0001 .... 1 .. 1.. ...01 1 .... @azx_4x1_i3_o3 FMLS_nx_h 11000001 0001 .... 1 .. 1.. ...00 1 .... @azx_4x1_i3_o3 FMLS_nx_s 11000001 0101 .... 1 .. 0.. ...00 10 ... @azx_4x1_i2_o3 FMLS_nx_d 11000001 1101 .... 1 .. 00. ...00 10 ... @azx_4x1_i1_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMLA_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, TRANS_FEAT(FMLS_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, s->fpcr_ah ? gen_helper_gvec_ah_vfms_d : gen_helper_gvec_vfms_d) +TRANS_FEAT(BFMLA_n1, aa64_sme_b16b16, do_fmla, a, false, FPST_ZA, + gen_helper_gvec_bfmla) +TRANS_FEAT(BFMLS_n1, aa64_sme_b16b16, do_fmla, a, false, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_bfmls : gen_helper_gvec_bfmls) +TRANS_FEAT(BFMLA_nn, aa64_sme_b16b16, do_fmla, a, true, FPST_ZA, + gen_helper_gvec_bfmla) +TRANS_FEAT(BFMLS_nn, aa64_sme_b16b16, do_fmla, a, true, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_bfmls : gen_helper_gvec_bfmls) + static bool do_fmla_nx(DisasContext *s, arg_azx_n *a, ARMFPStatusFlavour fpst, gen_helper_gvec_4_ptr *fn) { @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMLA_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, TRANS_FEAT(FMLS_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_d : gen_helper_gvec_fmls_idx_d) +TRANS_FEAT(BFMLA_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, + gen_helper_gvec_bfmla_idx) +TRANS_FEAT(BFMLS_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_bfmls_idx : gen_helper_gvec_bfmls_idx) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for integer accumulate. diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ static float16 float16_muladd_f(float16 dest, float16 op1, float16 op2, return float16_muladd(op1, op2, dest, 0, stat); } +static bfloat16 bfloat16_muladd_f(bfloat16 dest, bfloat16 op1, bfloat16 op2, + float_status *stat) +{ + return bfloat16_muladd(op1, op2, dest, 0, stat); +} + static float32 float32_muladd_f(float32 dest, float32 op1, float32 op2, float_status *stat) { @@ -XXX,XX +XXX,XX @@ static float16 float16_mulsub_f(float16 dest, float16 op1, float16 op2, return float16_muladd(float16_chs(op1), op2, dest, 0, stat); } +static bfloat16 bfloat16_mulsub_f(bfloat16 dest, bfloat16 op1, bfloat16 op2, + float_status *stat) +{ + return bfloat16_muladd(bfloat16_chs(op1), op2, dest, 0, stat); +} + static float32 float32_mulsub_f(float32 dest, float32 op1, float32 op2, float_status *stat) { @@ -XXX,XX +XXX,XX @@ static float16 float16_ah_mulsub_f(float16 dest, float16 op1, float16 op2, return float16_muladd(op1, op2, dest, float_muladd_negate_product, stat); } +static bfloat16 bfloat16_ah_mulsub_f(bfloat16 dest, bfloat16 op1, bfloat16 op2, + float_status *stat) +{ + return bfloat16_muladd(op1, op2, dest, float_muladd_negate_product, stat); +} + static float32 float32_ah_mulsub_f(float32 dest, float32 op1, float32 op2, float_status *stat) { @@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_fmls_nf_s, float32_mulsub_nf, float32) DO_MULADD(gvec_vfma_h, float16_muladd_f, float16) DO_MULADD(gvec_vfma_s, float32_muladd_f, float32) DO_MULADD(gvec_vfma_d, float64_muladd_f, float64) +DO_MULADD(gvec_bfmla, bfloat16_muladd_f, bfloat16) DO_MULADD(gvec_vfms_h, float16_mulsub_f, float16) DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32) DO_MULADD(gvec_vfms_d, float64_mulsub_f, float64) +DO_MULADD(gvec_bfmls, bfloat16_mulsub_f, bfloat16) DO_MULADD(gvec_ah_vfms_h, float16_ah_mulsub_f, float16) DO_MULADD(gvec_ah_vfms_s, float32_ah_mulsub_f, float32) DO_MULADD(gvec_ah_vfms_d, float64_ah_mulsub_f, float64) +DO_MULADD(gvec_ah_bfmls, bfloat16_ah_mulsub_f, bfloat16) + +#undef DO_MULADD /* For the indexed ops, SVE applies the index per 128-bit vector segment. * For AdvSIMD, there is of course only one such vector segment. @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, \ DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2, 0, 0) DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4, 0, 0) DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8, 0, 0) +DO_FMLA_IDX(gvec_bfmla_idx, bfloat16, H2, 0, 0) DO_FMLA_IDX(gvec_fmls_idx_h, float16, H2, INT16_MIN, 0) DO_FMLA_IDX(gvec_fmls_idx_s, float32, H4, INT32_MIN, 0) DO_FMLA_IDX(gvec_fmls_idx_d, float64, H8, INT64_MIN, 0) +DO_FMLA_IDX(gvec_bfmls_idx, bfloat16, H2, INT16_MIN, 0) DO_FMLA_IDX(gvec_ah_fmls_idx_h, float16, H2, 0, float_muladd_negate_product) DO_FMLA_IDX(gvec_ah_fmls_idx_s, float32, H4, 0, float_muladd_negate_product) DO_FMLA_IDX(gvec_ah_fmls_idx_d, float64, H8, 0, float_muladd_negate_product) +DO_FMLA_IDX(gvec_ah_bfmls_idx, bfloat16, H2, 0, float_muladd_negate_product) #undef DO_FMLA_IDX -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-51-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 2 ++ target/arm/tcg/sme.decode | 25 +++++++++++++++++++ target/arm/tcg/translate-sme.c | 44 ++++++++++++++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 2 ++ 4 files changed, 73 insertions(+) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_fclt0_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfadd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fsub_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fsub_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfsub, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fmul_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FMLS_nn_h 11000001 101 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 FMLS_nn_s 11000001 101 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 FMLS_nn_d 11000001 111 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 +&az_n n off rv zm +@az_2x2_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &az_n n=2 rv=%mova_rv zm=%zn_ax2 +@az_4x4_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &az_n n=4 rv=%mova_rv zm=%zn_ax4 + +FADD_nn_h 11000001 101 00100 0 .. 111 ....0 00 ... @az_2x2_o3 +FADD_nn_s 11000001 101 00000 0 .. 111 ....0 00 ... @az_2x2_o3 +FADD_nn_d 11000001 111 00000 0 .. 111 ....0 00 ... @az_2x2_o3 +FADD_nn_h 11000001 101 00101 0 .. 111 ...00 00 ... @az_4x4_o3 +FADD_nn_s 11000001 101 00001 0 .. 111 ...00 00 ... @az_4x4_o3 +FADD_nn_d 11000001 111 00001 0 .. 111 ...00 00 ... @az_4x4_o3 + +FSUB_nn_h 11000001 101 00100 0 .. 111 ....0 01 ... @az_2x2_o3 +FSUB_nn_s 11000001 101 00000 0 .. 111 ....0 01 ... @az_2x2_o3 +FSUB_nn_d 11000001 111 00000 0 .. 111 ....0 01 ... @az_2x2_o3 +FSUB_nn_h 11000001 101 00101 0 .. 111 ...00 01 ... @az_4x4_o3 +FSUB_nn_s 11000001 101 00001 0 .. 111 ...00 01 ... @az_4x4_o3 +FSUB_nn_d 11000001 111 00001 0 .. 111 ...00 01 ... @az_4x4_o3 + +BFADD_nn 11000001 111 00100 0 .. 111 ....0 00 ... @az_2x2_o3 +BFADD_nn 11000001 111 00101 0 .. 111 ...00 00 ... @az_4x4_o3 +BFSUB_nn 11000001 111 00100 0 .. 111 ....0 01 ... @az_2x2_o3 +BFSUB_nn 11000001 111 00101 0 .. 111 ...00 01 ... @az_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(BFMLA_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, TRANS_FEAT(BFMLS_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, s->fpcr_ah ? gen_helper_gvec_ah_bfmls_idx : gen_helper_gvec_bfmls_idx) +static bool do_faddsub(DisasContext *s, arg_az_n *a, ARMFPStatusFlavour fpst, + gen_helper_gvec_3_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int n = a->n; + int zm = a->zm; + int vstride = svl / n; + TCGv_ptr t_za = get_zarray(s, a->rv, a->off, n, 0); + TCGv_ptr ptr = fpstatus_ptr(fpst); + TCGv_ptr t = tcg_temp_new_ptr(); + + for (int r = 0; r < n; ++r) { + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm + r); + int o_za = r * vstride * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, 0); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t, t_zm, ptr, tcg_constant_i32(desc)); + } + } + return true; +} + +TRANS_FEAT(FADD_nn_h, aa64_sme_f16f16, do_faddsub, a, + FPST_ZA_F16, gen_helper_gvec_fadd_h) +TRANS_FEAT(FSUB_nn_h, aa64_sme_f16f16, do_faddsub, a, + FPST_ZA_F16, gen_helper_gvec_fsub_h) + +TRANS_FEAT(FADD_nn_s, aa64_sme2, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fadd_s) +TRANS_FEAT(FSUB_nn_s, aa64_sme2, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fsub_s) + +TRANS_FEAT(FADD_nn_d, aa64_sme2_f64f64, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fadd_d) +TRANS_FEAT(FSUB_nn_d, aa64_sme2_f64f64, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fsub_d) + +TRANS_FEAT(BFADD_nn, aa64_sme_b16b16, do_faddsub, a, + FPST_ZA, gen_helper_gvec_bfadd) +TRANS_FEAT(BFSUB_nn, aa64_sme_b16b16, do_faddsub, a, + FPST_ZA, gen_helper_gvec_bfsub) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for integer accumulate. diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, \ DO_3OP(gvec_fadd_h, float16_add, float16) DO_3OP(gvec_fadd_s, float32_add, float32) DO_3OP(gvec_fadd_d, float64_add, float64) +DO_3OP(gvec_bfadd, bfloat16_add, bfloat16) DO_3OP(gvec_fsub_h, float16_sub, float16) DO_3OP(gvec_fsub_s, float32_sub, float32) DO_3OP(gvec_fsub_d, float64_sub, float64) +DO_3OP(gvec_bfsub, bfloat16_sub, bfloat16) DO_3OP(gvec_fmul_h, float16_mul, float16) DO_3OP(gvec_fmul_s, float32_mul, float32) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-52-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 12 ++++++++++++ target/arm/tcg/translate-sme.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ BFMLS_nx 11000001 0001 .... 1 .. 1.. ...01 1 .... @azx_4x1_i3_o3 FMLS_nx_h 11000001 0001 .... 1 .. 1.. ...00 1 .... @azx_4x1_i3_o3 FMLS_nx_s 11000001 0101 .... 1 .. 0.. ...00 10 ... @azx_4x1_i2_o3 FMLS_nx_d 11000001 1101 .... 1 .. 00. ...00 10 ... @azx_4x1_i1_o3 + +### SME2 Add / Sub array accumulators + +ADD_aaz_s 11000001 101 000000 .. 111 ....0 10 ... @az_2x2_o3 +ADD_aaz_s 11000001 101 000010 .. 111 ...00 10 ... @az_4x4_o3 +ADD_aaz_d 11000001 111 000000 .. 111 ....0 10 ... @az_2x2_o3 +ADD_aaz_d 11000001 111 000010 .. 111 ...00 10 ... @az_4x4_o3 + +SUB_aaz_s 11000001 101 000000 .. 111 ....0 11 ... @az_2x2_o3 +SUB_aaz_s 11000001 101 000010 .. 111 ...00 11 ... @az_4x4_o3 +SUB_aaz_d 11000001 111 000000 .. 111 ....0 11 ... @az_2x2_o3 +SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUB_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_sub_var) TRANS_FEAT(ADD_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub_var) +/* Add/Sub each ZA[d*N] += Z[m*N] */ +static bool do_aaz(DisasContext *s, arg_az_n *a, int esz, GVecGen3FnVar *fn) +{ + TCGv_ptr t_za; + int svl, n; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + n = a->n; + t_za = get_zarray(s, a->rv, a->off, n, 0); + svl = streaming_vec_reg_size(s); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zm = vec_full_reg_offset(s, a->zm + i); + + fn(esz, t_za, o_za, t_za, o_za, tcg_env, o_zm, svl, svl); + } + return true; +} + +TRANS_FEAT(ADD_aaz_s, aa64_sme2, do_aaz, a, MO_32, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_aaz_s, aa64_sme2, do_aaz, a, MO_32, tcg_gen_gvec_sub_var) +TRANS_FEAT(ADD_aaz_d, aa64_sme2_i16i64, do_aaz, a, MO_64, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_aaz_d, aa64_sme2_i16i64, do_aaz, a, MO_64, tcg_gen_gvec_sub_var) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for floating-point accumulate. -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-53-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 5 +++ target/arm/tcg/vec_internal.h | 2 + target/arm/tcg/sme.decode | 12 ++++++ target/arm/tcg/sme_helper.c | 74 ++++++++++++++++++++++++++++++++++ target/arm/tcg/sve_helper.c | 2 +- target/arm/tcg/translate-sme.c | 25 ++++++++++++ 6 files changed, 119 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme2_umlsll_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, DEF_HELPER_FLAGS_5(sme2_umlsll_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme2_usmlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme2_sumlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_bfcvt, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_bfcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvt_n, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah) bfloat16 helper_sme2_ah_fmax_b16(bfloat16 a, bfloat16 b, float_status *fpst); bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); +float16 sve_f32_to_f16(float32 f, float_status *fpst); + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_s 11000001 101 000000 .. 111 ....0 11 ... @az_2x2_o3 SUB_aaz_s 11000001 101 000010 .. 111 ...00 11 ... @az_4x4_o3 SUB_aaz_d 11000001 111 000000 .. 111 ....0 11 ... @az_2x2_o3 SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 + +### SME2 Multi-vector SVE Constructive Unary + +&zz_n zd zn n +@zz_1x2 ........ ... ..... ...... ..... zd:5 \ + &zz_n n=1 zn=%zn_ax2 + +BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 +BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 + +FCVT_n 11000001 001 00000 111000 ....0 ..... @zz_1x2 +FCVTN 11000001 001 00000 111000 ....1 ..... @zz_1x2 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DO_MLALL_IDX(sme2_usmlall_idx_s, uint32_t, uint8_t, int8_t, H4, H1, +) DO_MLALL_IDX(sme2_sumlall_idx_s, uint32_t, int8_t, uint8_t, H4, H1, +) #undef DO_MLALL_IDX + +/* Convert and compress */ +void HELPER(sme2_bfcvt)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + ARMVectorReg scratch; + size_t oprsz = simd_oprsz(desc); + size_t i, n = oprsz / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + bfloat16 *d = vd; + + if (vd == s1) { + s1 = memcpy(&scratch, s1, oprsz); + } + + for (i = 0; i < n; ++i) { + d[H2(i)] = float32_to_bfloat16(s0[H4(i)], fpst); + } + for (i = 0; i < n; ++i) { + d[H2(i) + n] = float32_to_bfloat16(s1[H4(i)], fpst); + } +} + +void HELPER(sme2_fcvt_n)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + ARMVectorReg scratch; + size_t oprsz = simd_oprsz(desc); + size_t i, n = oprsz / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + float16 *d = vd; + + if (vd == s1) { + s1 = memcpy(&scratch, s1, oprsz); + } + + for (i = 0; i < n; ++i) { + d[H2(i)] = sve_f32_to_f16(s0[H4(i)], fpst); + } + for (i = 0; i < n; ++i) { + d[H2(i) + n] = sve_f32_to_f16(s1[H4(i)], fpst); + } +} + +/* Convert and interleave */ +void HELPER(sme2_bfcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + bfloat16 *d = vd; + + for (i = 0; i < n; ++i) { + bfloat16 d0 = float32_to_bfloat16(s0[H4(i)], fpst); + bfloat16 d1 = float32_to_bfloat16(s1[H4(i)], fpst); + d[H2(i * 2 + 0)] = d0; + d[H2(i * 2 + 1)] = d1; + } +} + +void HELPER(sme2_fcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + bfloat16 *d = vd; + + for (i = 0; i < n; ++i) { + bfloat16 d0 = sve_f32_to_f16(s0[H4(i)], fpst); + bfloat16 d1 = sve_f32_to_f16(s1[H4(i)], fpst); + d[H2(i * 2 + 0)] = d0; + d[H2(i * 2 + 1)] = d1; + } +} diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static inline float64 sve_f16_to_f64(float16 f, float_status *fpst) return ret; } -static inline float16 sve_f32_to_f16(float32 f, float_status *fpst) +float16 sve_f32_to_f16(float32 f, float_status *fpst) { bool save = get_flush_to_zero(fpst); float16 ret; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlal TRANS_FEAT(SMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlsll_idx_d) TRANS_FEAT(UMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlall_idx_d) TRANS_FEAT(UMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlsll_idx_d) + +static bool do_zz_fpst(DisasContext *s, arg_zz_n *a, int data, + ARMFPStatusFlavour type, gen_helper_gvec_2_ptr *fn) +{ + if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + TCGv_ptr fpst = fpstatus_ptr(type); + + for (int i = 0, n = a->n; i < n; ++i) { + tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->zd + i), + vec_full_reg_offset(s, a->zn + i), + fpst, svl, svl, data, fn); + } + } + return true; +} + +TRANS_FEAT(BFCVT, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_bfcvt) +TRANS_FEAT(BFCVTN, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_bfcvtn) +TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_fcvt_n) +TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_fcvtn) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-54-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 ++ target/arm/tcg/vec_internal.h | 1 + target/arm/tcg/sme.decode | 5 ++++ target/arm/tcg/sme_helper.c | 45 ++++++++++++++++++++++++++++++++++ target/arm/tcg/sve_helper.c | 2 +- target/arm/tcg/translate-sme.c | 5 ++++ 6 files changed, 59 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_bfcvt, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_bfcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvt_n, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvt_w, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvtl, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah) bfloat16 helper_sme2_ah_fmax_b16(bfloat16 a, bfloat16 b, float_status *fpst); bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); +float32 sve_f16_to_f32(float16 f, float_status *fpst); float16 sve_f32_to_f16(float32 f, float_status *fpst); #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n zd zn n @zz_1x2 ........ ... ..... ...... ..... zd:5 \ &zz_n n=1 zn=%zn_ax2 +@zz_2x1 ........ ... ..... ...... zn:5 ..... \ + &zz_n n=1 zd=%zd_ax2 BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 FCVT_n 11000001 001 00000 111000 ....0 ..... @zz_1x2 FCVTN 11000001 001 00000 111000 ....1 ..... @zz_1x2 + +FCVT_w 11000001 101 00000 111000 ..... ....0 @zz_2x1 +FCVTL 11000001 101 00000 111000 ..... ....1 @zz_2x1 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ #include "vec_internal.h" #include "sve_ldst_internal.h" + +static bool vectors_overlap(ARMVectorReg *x, unsigned nx, + ARMVectorReg *y, unsigned ny) +{ + return !(x + nx <= y || y + ny <= x); +} + void helper_set_svcr(CPUARMState *env, uint32_t val, uint32_t mask) { aarch64_set_svcr(env, val, mask); @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) d[H2(i * 2 + 1)] = d1; } } + +/* Expand and convert */ +void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + ARMVectorReg scratch; + size_t oprsz = simd_oprsz(desc); + size_t i, n = oprsz / 4; + float16 *s = vs; + float32 *d0 = vd; + float32 *d1 = vd + sizeof(ARMVectorReg); + + if (vectors_overlap(vd, 1, vs, 2)) { + s = memcpy(&scratch, s, oprsz); + } + + for (i = 0; i < n; ++i) { + d0[H4(i)] = sve_f16_to_f32(s[H2(i)], fpst); + } + for (i = 0; i < n; ++i) { + d1[H4(i)] = sve_f16_to_f32(s[H2(n + i)], fpst); + } +} + +/* Deinterleave and convert. */ +void HELPER(sme2_fcvtl)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + float16 *s = vs; + float32 *d0 = vd; + float32 *d1 = vd + sizeof(ARMVectorReg); + + for (i = 0; i < n; ++i) { + float32 v0 = sve_f16_to_f32(s[H2(i * 2 + 0)], fpst); + float32 v1 = sve_f16_to_f32(s[H2(i * 2 + 1)], fpst); + d0[H4(i)] = v0; + d1[H4(i)] = v1; + } +} diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, \ * FZ16. When converting from fp16, this affects flushing input denormals; * when converting to fp16, this affects flushing output denormals. */ -static inline float32 sve_f16_to_f32(float16 f, float_status *fpst) +float32 sve_f16_to_f32(float16 f, float_status *fpst) { bool save = get_flush_inputs_to_zero(fpst); float32 ret; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_fcvt_n) TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_fcvtn) + +TRANS_FEAT(FCVT_w, aa64_sme_f16f16, do_zz_fpst, a, 0, + FPST_A64_F16, gen_helper_sme2_fcvt_w) +TRANS_FEAT(FCVTL, aa64_sme_f16f16, do_zz_fpst, a, 0, + FPST_A64_F16, gen_helper_sme2_fcvtl) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-55-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 9 +++++++++ target/arm/tcg/translate-sme.c | 5 +++++ 2 files changed, 14 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n n=1 zn=%zn_ax2 @zz_2x1 ........ ... ..... ...... zn:5 ..... \ &zz_n n=1 zd=%zd_ax2 +@zz_2x2 ........ ... ..... ...... .... . ..... \ + &zz_n n=2 zd=%zd_ax2 zn=%zn_ax2 +@zz_4x4 ........ ... ..... ...... .... . ..... \ + &zz_n n=4 zd=%zd_ax4 zn=%zn_ax4 BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 @@ -XXX,XX +XXX,XX @@ FCVTN 11000001 001 00000 111000 ....1 ..... @zz_1x2 FCVT_w 11000001 101 00000 111000 ..... ....0 @zz_2x1 FCVTL 11000001 101 00000 111000 ..... ....1 @zz_2x1 + +FCVTZS 11000001 001 00001 111000 ....0 ....0 @zz_2x2 +FCVTZS 11000001 001 10001 111000 ...00 ...00 @zz_4x4 +FCVTZU 11000001 001 00001 111000 ....1 ....0 @zz_2x2 +FCVTZU 11000001 001 10001 111000 ...01 ...00 @zz_4x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_w, aa64_sme_f16f16, do_zz_fpst, a, 0, FPST_A64_F16, gen_helper_sme2_fcvt_w) TRANS_FEAT(FCVTL, aa64_sme_f16f16, do_zz_fpst, a, 0, FPST_A64_F16, gen_helper_sme2_fcvtl) + +TRANS_FEAT(FCVTZS, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_gvec_vcvt_rz_fs) +TRANS_FEAT(FCVTZU, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_gvec_vcvt_rz_fu) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-56-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 ++ target/arm/tcg/sme.decode | 5 +++++ target/arm/tcg/sme_helper.c | 22 ++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 5 +++++ 4 files changed, 34 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_fcvt_n, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvt_w, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtl, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_scvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_ucvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FCVTZS 11000001 001 00001 111000 ....0 ....0 @zz_2x2 FCVTZS 11000001 001 10001 111000 ...00 ...00 @zz_4x4 FCVTZU 11000001 001 00001 111000 ....1 ....0 @zz_2x2 FCVTZU 11000001 001 10001 111000 ...01 ...00 @zz_4x4 + +SCVTF 11000001 001 00010 111000 ....0 ....0 @zz_2x2 +SCVTF 11000001 001 10010 111000 ...00 ...00 @zz_4x4 +UCVTF 11000001 001 00010 111000 ....1 ....0 @zz_2x2 +UCVTF 11000001 001 10010 111000 ...01 ...00 @zz_4x4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvtl)(void *vd, void *vs, float_status *fpst, uint32_t desc) d1[H4(i)] = v1; } } + +void HELPER(sme2_scvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + int32_t *d = vd; + float32 *s = vs; + + for (i = 0; i < n; ++i) { + d[i] = int32_to_float32(s[i], fpst); + } +} + +void HELPER(sme2_ucvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + uint32_t *d = vd; + float32 *s = vs; + + for (i = 0; i < n; ++i) { + d[i] = uint32_to_float32(s[i], fpst); + } +} diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVTZS, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_gvec_vcvt_rz_fs) TRANS_FEAT(FCVTZU, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_gvec_vcvt_rz_fu) + +TRANS_FEAT(SCVTF, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_scvtf) +TRANS_FEAT(UCVTF, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_ucvtf) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-57-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 9 +++++++++ target/arm/tcg/translate-sme.c | 9 +++++++++ 2 files changed, 18 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SCVTF 11000001 001 00010 111000 ....0 ....0 @zz_2x2 SCVTF 11000001 001 10010 111000 ...00 ...00 @zz_4x4 UCVTF 11000001 001 00010 111000 ....1 ....0 @zz_2x2 UCVTF 11000001 001 10010 111000 ...01 ...00 @zz_4x4 + +FRINTN 11000001 101 01000 111000 ....0 ....0 @zz_2x2 +FRINTN 11000001 101 11000 111000 ...00 ...00 @zz_4x4 +FRINTP 11000001 101 01001 111000 ....0 ....0 @zz_2x2 +FRINTP 11000001 101 11001 111000 ...00 ...00 @zz_4x4 +FRINTM 11000001 101 01010 111000 ....0 ....0 @zz_2x2 +FRINTM 11000001 101 11010 111000 ...00 ...00 @zz_4x4 +FRINTA 11000001 101 01100 111000 ....0 ....0 @zz_2x2 +FRINTA 11000001 101 11100 111000 ...00 ...00 @zz_4x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SCVTF, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_scvtf) TRANS_FEAT(UCVTF, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_ucvtf) + +TRANS_FEAT(FRINTN, aa64_sme2, do_zz_fpst, a, float_round_nearest_even, + FPST_A64, gen_helper_gvec_vrint_rm_s) +TRANS_FEAT(FRINTP, aa64_sme2, do_zz_fpst, a, float_round_up, + FPST_A64, gen_helper_gvec_vrint_rm_s) +TRANS_FEAT(FRINTM, aa64_sme2, do_zz_fpst, a, float_round_down, + FPST_A64, gen_helper_gvec_vrint_rm_s) +TRANS_FEAT(FRINTA, aa64_sme2, do_zz_fpst, a, float_round_ties_away, + FPST_A64, gen_helper_gvec_vrint_rm_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Inputs are a wider type of indeterminate sign. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-58-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/vec_internal.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *); int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *); int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool); +#define do_ssat_b(val) MIN(MAX(val, INT8_MIN), INT8_MAX) +#define do_ssat_h(val) MIN(MAX(val, INT16_MIN), INT16_MAX) +#define do_ssat_s(val) MIN(MAX(val, INT32_MIN), INT32_MAX) +#define do_usat_b(val) MIN(MAX(val, 0), UINT8_MAX) +#define do_usat_h(val) MIN(MAX(val, 0), UINT16_MAX) +#define do_usat_s(val) MIN(MAX(val, 0), UINT32_MAX) + /** * bfdotadd: * @sum: addend -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Replace and remove do_sat_bhs. This avoids multiple repetitions of INT*_MIN/MAX. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-59-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 116 +++++++++++++++--------------------- 1 file changed, 48 insertions(+), 68 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_uhsub_zpzz_h, uint16_t, H1_2, DO_HSUB_BHS) DO_ZPZZ(sve2_uhsub_zpzz_s, uint32_t, H1_4, DO_HSUB_BHS) DO_ZPZZ_D(sve2_uhsub_zpzz_d, uint64_t, DO_HSUB_D) -static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max) -{ - return val >= max ? max : val <= min ? min : val; -} - -#define DO_SQADD_B(n, m) do_sat_bhs((int64_t)n + m, INT8_MIN, INT8_MAX) -#define DO_SQADD_H(n, m) do_sat_bhs((int64_t)n + m, INT16_MIN, INT16_MAX) -#define DO_SQADD_S(n, m) do_sat_bhs((int64_t)n + m, INT32_MIN, INT32_MAX) +#define DO_SQADD_B(n, m) do_ssat_b((int64_t)n + m) +#define DO_SQADD_H(n, m) do_ssat_h((int64_t)n + m) +#define DO_SQADD_S(n, m) do_ssat_s((int64_t)n + m) static inline int64_t do_sqadd_d(int64_t n, int64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_sqadd_zpzz_h, int16_t, H1_2, DO_SQADD_H) DO_ZPZZ(sve2_sqadd_zpzz_s, int32_t, H1_4, DO_SQADD_S) DO_ZPZZ_D(sve2_sqadd_zpzz_d, int64_t, do_sqadd_d) -#define DO_UQADD_B(n, m) do_sat_bhs((int64_t)n + m, 0, UINT8_MAX) -#define DO_UQADD_H(n, m) do_sat_bhs((int64_t)n + m, 0, UINT16_MAX) -#define DO_UQADD_S(n, m) do_sat_bhs((int64_t)n + m, 0, UINT32_MAX) +#define DO_UQADD_B(n, m) do_usat_b((int64_t)n + m) +#define DO_UQADD_H(n, m) do_usat_h((int64_t)n + m) +#define DO_UQADD_S(n, m) do_usat_s((int64_t)n + m) static inline uint64_t do_uqadd_d(uint64_t n, uint64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_uqadd_zpzz_h, uint16_t, H1_2, DO_UQADD_H) DO_ZPZZ(sve2_uqadd_zpzz_s, uint32_t, H1_4, DO_UQADD_S) DO_ZPZZ_D(sve2_uqadd_zpzz_d, uint64_t, do_uqadd_d) -#define DO_SQSUB_B(n, m) do_sat_bhs((int64_t)n - m, INT8_MIN, INT8_MAX) -#define DO_SQSUB_H(n, m) do_sat_bhs((int64_t)n - m, INT16_MIN, INT16_MAX) -#define DO_SQSUB_S(n, m) do_sat_bhs((int64_t)n - m, INT32_MIN, INT32_MAX) +#define DO_SQSUB_B(n, m) do_ssat_b((int64_t)n - m) +#define DO_SQSUB_H(n, m) do_ssat_h((int64_t)n - m) +#define DO_SQSUB_S(n, m) do_ssat_s((int64_t)n - m) static inline int64_t do_sqsub_d(int64_t n, int64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_sqsub_zpzz_h, int16_t, H1_2, DO_SQSUB_H) DO_ZPZZ(sve2_sqsub_zpzz_s, int32_t, H1_4, DO_SQSUB_S) DO_ZPZZ_D(sve2_sqsub_zpzz_d, int64_t, do_sqsub_d) -#define DO_UQSUB_B(n, m) do_sat_bhs((int64_t)n - m, 0, UINT8_MAX) -#define DO_UQSUB_H(n, m) do_sat_bhs((int64_t)n - m, 0, UINT16_MAX) -#define DO_UQSUB_S(n, m) do_sat_bhs((int64_t)n - m, 0, UINT32_MAX) +#define DO_UQSUB_B(n, m) do_usat_b((int64_t)n - m) +#define DO_UQSUB_H(n, m) do_usat_h((int64_t)n - m) +#define DO_UQSUB_S(n, m) do_usat_s((int64_t)n - m) static inline uint64_t do_uqsub_d(uint64_t n, uint64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_uqsub_zpzz_h, uint16_t, H1_2, DO_UQSUB_H) DO_ZPZZ(sve2_uqsub_zpzz_s, uint32_t, H1_4, DO_UQSUB_S) DO_ZPZZ_D(sve2_uqsub_zpzz_d, uint64_t, do_uqsub_d) -#define DO_SUQADD_B(n, m) \ - do_sat_bhs((int64_t)(int8_t)n + m, INT8_MIN, INT8_MAX) -#define DO_SUQADD_H(n, m) \ - do_sat_bhs((int64_t)(int16_t)n + m, INT16_MIN, INT16_MAX) -#define DO_SUQADD_S(n, m) \ - do_sat_bhs((int64_t)(int32_t)n + m, INT32_MIN, INT32_MAX) +#define DO_SUQADD_B(n, m) do_ssat_b((int64_t)(int8_t)n + m) +#define DO_SUQADD_H(n, m) do_ssat_h((int64_t)(int16_t)n + m) +#define DO_SUQADD_S(n, m) do_ssat_s((int64_t)(int32_t)n + m) static inline int64_t do_suqadd_d(int64_t n, uint64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_suqadd_zpzz_h, uint16_t, H1_2, DO_SUQADD_H) DO_ZPZZ(sve2_suqadd_zpzz_s, uint32_t, H1_4, DO_SUQADD_S) DO_ZPZZ_D(sve2_suqadd_zpzz_d, uint64_t, do_suqadd_d) -#define DO_USQADD_B(n, m) \ - do_sat_bhs((int64_t)n + (int8_t)m, 0, UINT8_MAX) -#define DO_USQADD_H(n, m) \ - do_sat_bhs((int64_t)n + (int16_t)m, 0, UINT16_MAX) -#define DO_USQADD_S(n, m) \ - do_sat_bhs((int64_t)n + (int32_t)m, 0, UINT32_MAX) +#define DO_USQADD_B(n, m) do_usat_b((int64_t)n + (int8_t)m) +#define DO_USQADD_H(n, m) do_usat_h((int64_t)n + (int16_t)m) +#define DO_USQADD_S(n, m) do_usat_s((int64_t)n + (int32_t)m) static inline uint64_t do_usqadd_d(uint64_t n, int64_t m) { @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ } \ } -#define DO_SQXTN_H(n) do_sat_bhs(n, INT8_MIN, INT8_MAX) -#define DO_SQXTN_S(n) do_sat_bhs(n, INT16_MIN, INT16_MAX) -#define DO_SQXTN_D(n) do_sat_bhs(n, INT32_MIN, INT32_MAX) +DO_XTNB(sve2_sqxtnb_h, int16_t, do_ssat_b) +DO_XTNB(sve2_sqxtnb_s, int32_t, do_ssat_h) +DO_XTNB(sve2_sqxtnb_d, int64_t, do_ssat_s) -DO_XTNB(sve2_sqxtnb_h, int16_t, DO_SQXTN_H) -DO_XTNB(sve2_sqxtnb_s, int32_t, DO_SQXTN_S) -DO_XTNB(sve2_sqxtnb_d, int64_t, DO_SQXTN_D) +DO_XTNT(sve2_sqxtnt_h, int16_t, int8_t, H1, do_ssat_b) +DO_XTNT(sve2_sqxtnt_s, int32_t, int16_t, H1_2, do_ssat_h) +DO_XTNT(sve2_sqxtnt_d, int64_t, int32_t, H1_4, do_ssat_s) -DO_XTNT(sve2_sqxtnt_h, int16_t, int8_t, H1, DO_SQXTN_H) -DO_XTNT(sve2_sqxtnt_s, int32_t, int16_t, H1_2, DO_SQXTN_S) -DO_XTNT(sve2_sqxtnt_d, int64_t, int32_t, H1_4, DO_SQXTN_D) +DO_XTNB(sve2_uqxtnb_h, uint16_t, do_usat_b) +DO_XTNB(sve2_uqxtnb_s, uint32_t, do_usat_h) +DO_XTNB(sve2_uqxtnb_d, uint64_t, do_usat_s) -#define DO_UQXTN_H(n) do_sat_bhs(n, 0, UINT8_MAX) -#define DO_UQXTN_S(n) do_sat_bhs(n, 0, UINT16_MAX) -#define DO_UQXTN_D(n) do_sat_bhs(n, 0, UINT32_MAX) +DO_XTNT(sve2_uqxtnt_h, uint16_t, uint8_t, H1, do_usat_b) +DO_XTNT(sve2_uqxtnt_s, uint32_t, uint16_t, H1_2, do_usat_h) +DO_XTNT(sve2_uqxtnt_d, uint64_t, uint32_t, H1_4, do_usat_s) -DO_XTNB(sve2_uqxtnb_h, uint16_t, DO_UQXTN_H) -DO_XTNB(sve2_uqxtnb_s, uint32_t, DO_UQXTN_S) -DO_XTNB(sve2_uqxtnb_d, uint64_t, DO_UQXTN_D) +DO_XTNB(sve2_sqxtunb_h, int16_t, do_usat_b) +DO_XTNB(sve2_sqxtunb_s, int32_t, do_usat_h) +DO_XTNB(sve2_sqxtunb_d, int64_t, do_usat_s) -DO_XTNT(sve2_uqxtnt_h, uint16_t, uint8_t, H1, DO_UQXTN_H) -DO_XTNT(sve2_uqxtnt_s, uint32_t, uint16_t, H1_2, DO_UQXTN_S) -DO_XTNT(sve2_uqxtnt_d, uint64_t, uint32_t, H1_4, DO_UQXTN_D) - -DO_XTNB(sve2_sqxtunb_h, int16_t, DO_UQXTN_H) -DO_XTNB(sve2_sqxtunb_s, int32_t, DO_UQXTN_S) -DO_XTNB(sve2_sqxtunb_d, int64_t, DO_UQXTN_D) - -DO_XTNT(sve2_sqxtunt_h, int16_t, int8_t, H1, DO_UQXTN_H) -DO_XTNT(sve2_sqxtunt_s, int32_t, int16_t, H1_2, DO_UQXTN_S) -DO_XTNT(sve2_sqxtunt_d, int64_t, int32_t, H1_4, DO_UQXTN_D) +DO_XTNT(sve2_sqxtunt_h, int16_t, int8_t, H1, do_usat_b) +DO_XTNT(sve2_sqxtunt_s, int32_t, int16_t, H1_2, do_usat_h) +DO_XTNT(sve2_sqxtunt_d, int64_t, int32_t, H1_4, do_usat_s) #undef DO_XTNB #undef DO_XTNT @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_rshrnt_h, uint16_t, uint8_t, H1_2, H1, do_urshr) DO_SHRNT(sve2_rshrnt_s, uint32_t, uint16_t, H1_4, H1_2, do_urshr) DO_SHRNT(sve2_rshrnt_d, uint64_t, uint32_t, H1_8, H1_4, do_urshr) -#define DO_SQSHRUN_H(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT8_MAX) -#define DO_SQSHRUN_S(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT16_MAX) -#define DO_SQSHRUN_D(x, sh) \ - do_sat_bhs((int64_t)(x) >> (sh < 64 ? sh : 63), 0, UINT32_MAX) +#define DO_SQSHRUN_H(x, sh) do_usat_b((int64_t)(x) >> sh) +#define DO_SQSHRUN_S(x, sh) do_usat_h((int64_t)(x) >> sh) +#define DO_SQSHRUN_D(x, sh) do_usat_s((int64_t)(x) >> (sh < 64 ? sh : 63)) DO_SHRNB(sve2_sqshrunb_h, int16_t, uint8_t, DO_SQSHRUN_H) DO_SHRNB(sve2_sqshrunb_s, int32_t, uint16_t, DO_SQSHRUN_S) @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_sqshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRUN_H) DO_SHRNT(sve2_sqshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRUN_S) DO_SHRNT(sve2_sqshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRUN_D) -#define DO_SQRSHRUN_H(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT8_MAX) -#define DO_SQRSHRUN_S(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT16_MAX) -#define DO_SQRSHRUN_D(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT32_MAX) +#define DO_SQRSHRUN_H(x, sh) do_usat_b(do_srshr(x, sh)) +#define DO_SQRSHRUN_S(x, sh) do_usat_h(do_srshr(x, sh)) +#define DO_SQRSHRUN_D(x, sh) do_usat_s(do_srshr(x, sh)) DO_SHRNB(sve2_sqrshrunb_h, int16_t, uint8_t, DO_SQRSHRUN_H) DO_SHRNB(sve2_sqrshrunb_s, int32_t, uint16_t, DO_SQRSHRUN_S) @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_sqrshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRUN_H) DO_SHRNT(sve2_sqrshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRUN_S) DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQRSHRUN_D) -#define DO_SQSHRN_H(x, sh) do_sat_bhs(x >> sh, INT8_MIN, INT8_MAX) -#define DO_SQSHRN_S(x, sh) do_sat_bhs(x >> sh, INT16_MIN, INT16_MAX) -#define DO_SQSHRN_D(x, sh) do_sat_bhs(x >> sh, INT32_MIN, INT32_MAX) +#define DO_SQSHRN_H(x, sh) do_ssat_b(x >> sh) +#define DO_SQSHRN_S(x, sh) do_ssat_h(x >> sh) +#define DO_SQSHRN_D(x, sh) do_ssat_s(x >> sh) DO_SHRNB(sve2_sqshrnb_h, int16_t, uint8_t, DO_SQSHRN_H) DO_SHRNB(sve2_sqshrnb_s, int32_t, uint16_t, DO_SQSHRN_S) @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_sqshrnt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRN_H) DO_SHRNT(sve2_sqshrnt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRN_S) DO_SHRNT(sve2_sqshrnt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRN_D) -#define DO_SQRSHRN_H(x, sh) do_sat_bhs(do_srshr(x, sh), INT8_MIN, INT8_MAX) -#define DO_SQRSHRN_S(x, sh) do_sat_bhs(do_srshr(x, sh), INT16_MIN, INT16_MAX) -#define DO_SQRSHRN_D(x, sh) do_sat_bhs(do_srshr(x, sh), INT32_MIN, INT32_MAX) +#define DO_SQRSHRN_H(x, sh) do_ssat_b(do_srshr(x, sh)) +#define DO_SQRSHRN_S(x, sh) do_ssat_h(do_srshr(x, sh)) +#define DO_SQRSHRN_D(x, sh) do_ssat_s(do_srshr(x, sh)) DO_SHRNB(sve2_sqrshrnb_h, int16_t, uint8_t, DO_SQRSHRN_H) DO_SHRNB(sve2_sqrshrnb_s, int32_t, uint16_t, DO_SQRSHRN_S) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-60-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 20 ++++++ target/arm/tcg/sme.decode | 22 +++++++ target/arm/tcg/sme_helper.c | 116 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 35 ++++++++++ 4 files changed, 193 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_fcvt_w, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtl, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_scvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_ucvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_3(sme2_sqcvt_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvt_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtu_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtu_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvt_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvt_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtu_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sqcvtn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvtn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtun_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvtn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtun_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n zd zn n @zz_1x2 ........ ... ..... ...... ..... zd:5 \ &zz_n n=1 zn=%zn_ax2 +@zz_1x4 ........ ... ..... ...... ..... zd:5 \ + &zz_n n=1 zn=%zn_ax4 @zz_2x1 ........ ... ..... ...... zn:5 ..... \ &zz_n n=1 zd=%zd_ax2 @zz_2x2 ........ ... ..... ...... .... . ..... \ @@ -XXX,XX +XXX,XX @@ FRINTM 11000001 101 01010 111000 ....0 ....0 @zz_2x2 FRINTM 11000001 101 11010 111000 ...00 ...00 @zz_4x4 FRINTA 11000001 101 01100 111000 ....0 ....0 @zz_2x2 FRINTA 11000001 101 11100 111000 ...00 ...00 @zz_4x4 + +SQCVT_sh 11000001 001 00011 111000 ....0 ..... @zz_1x2 +UQCVT_sh 11000001 001 00011 111000 ....1 ..... @zz_1x2 +SQCVTU_sh 11000001 011 00011 111000 ....0 ..... @zz_1x2 + +SQCVT_sb 11000001 001 10011 111000 ...00 ..... @zz_1x4 +UQCVT_sb 11000001 001 10011 111000 ...01 ..... @zz_1x4 +SQCVTU_sb 11000001 011 10011 111000 ...00 ..... @zz_1x4 + +SQCVT_dh 11000001 101 10011 111000 ...00 ..... @zz_1x4 +UQCVT_dh 11000001 101 10011 111000 ...01 ..... @zz_1x4 +SQCVTU_dh 11000001 111 10011 111000 ...00 ..... @zz_1x4 + +SQCVTN_sb 11000001 001 10011 111000 ...10 ..... @zz_1x4 +UQCVTN_sb 11000001 001 10011 111000 ...11 ..... @zz_1x4 +SQCVTUN_sb 11000001 011 10011 111000 ...10 ..... @zz_1x4 + +SQCVTN_dh 11000001 101 10011 111000 ...10 ..... @zz_1x4 +UQCVTN_dh 11000001 101 10011 111000 ...11 ..... @zz_1x4 +SQCVTUN_dh 11000001 111 10011 111000 ...10 ..... @zz_1x4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvt_n)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define SQCVT2(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(s0[HW(i)]); \ + d[HN(i + n)] = SAT(s1[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVT2(sme2_sqcvt_sh, int32_t, int16_t, H4, H2, do_ssat_h) +SQCVT2(sme2_uqcvt_sh, uint32_t, uint16_t, H4, H2, do_usat_h) +SQCVT2(sme2_sqcvtu_sh, int32_t, uint16_t, H4, H2, do_usat_h) + +#undef SQCVT2 + +#define SQCVT4(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(s0[HW(i)]); \ + d[HN(i + n)] = SAT(s1[HW(i)]); \ + d[HN(i + 2 * n)] = SAT(s2[HW(i)]); \ + d[HN(i + 3 * n)] = SAT(s3[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVT4(sme2_sqcvt_sb, int32_t, int8_t, H4, H2, do_ssat_b) +SQCVT4(sme2_uqcvt_sb, uint32_t, uint8_t, H4, H2, do_usat_b) +SQCVT4(sme2_sqcvtu_sb, int32_t, uint8_t, H4, H2, do_usat_b) + +SQCVT4(sme2_sqcvt_dh, int64_t, int16_t, H8, H2, do_ssat_h) +SQCVT4(sme2_uqcvt_dh, uint64_t, uint16_t, H8, H2, do_usat_h) +SQCVT4(sme2_sqcvtu_dh, int64_t, uint16_t, H8, H2, do_usat_h) + +#undef SQCVT4 + /* Convert and interleave */ void HELPER(sme2_bfcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define SQCVTN2(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(2 * i + 0)] = SAT(s0[HW(i)]); \ + d[HN(2 * i + 1)] = SAT(s1[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVTN2(sme2_sqcvtn_sh, int32_t, int16_t, H4, H2, do_ssat_h) +SQCVTN2(sme2_uqcvtn_sh, uint32_t, uint16_t, H4, H2, do_usat_h) +SQCVTN2(sme2_sqcvtun_sh, int32_t, uint16_t, H4, H2, do_usat_h) + +#undef SQCVTN2 + +#define SQCVTN4(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(4 * i + 0)] = SAT(s0[HW(i)]); \ + d[HN(4 * i + 1)] = SAT(s1[HW(i)]); \ + d[HN(4 * i + 2)] = SAT(s2[HW(i)]); \ + d[HN(4 * i + 3)] = SAT(s3[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVTN4(sme2_sqcvtn_sb, int32_t, int8_t, H4, H1, do_ssat_b) +SQCVTN4(sme2_uqcvtn_sb, uint32_t, uint8_t, H4, H1, do_usat_b) +SQCVTN4(sme2_sqcvtun_sb, int32_t, uint8_t, H4, H1, do_usat_b) + +SQCVTN4(sme2_sqcvtn_dh, int64_t, int16_t, H8, H2, do_ssat_h) +SQCVTN4(sme2_uqcvtn_dh, uint64_t, uint16_t, H8, H2, do_usat_h) +SQCVTN4(sme2_sqcvtun_dh, int64_t, uint16_t, H8, H2, do_usat_h) + +#undef SQCVTN4 + /* Expand and convert */ void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINTM, aa64_sme2, do_zz_fpst, a, float_round_down, FPST_A64, gen_helper_gvec_vrint_rm_s) TRANS_FEAT(FRINTA, aa64_sme2, do_zz_fpst, a, float_round_ties_away, FPST_A64, gen_helper_gvec_vrint_rm_s) + +static bool do_zz(DisasContext *s, arg_zz_n *a, int data, + gen_helper_gvec_2 *fn) +{ + if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + + for (int i = 0, n = a->n; i < n; ++i) { + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->zd + i), + vec_full_reg_offset(s, a->zn + i), + svl, svl, data, fn); + } + } + return true; +} + +TRANS_FEAT(SQCVT_sh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvt_sh) +TRANS_FEAT(UQCVT_sh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvt_sh) +TRANS_FEAT(SQCVTU_sh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtu_sh) + +TRANS_FEAT(SQCVT_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvt_sb) +TRANS_FEAT(UQCVT_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvt_sb) +TRANS_FEAT(SQCVTU_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtu_sb) + +TRANS_FEAT(SQCVT_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvt_dh) +TRANS_FEAT(UQCVT_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvt_dh) +TRANS_FEAT(SQCVTU_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtu_dh) + +TRANS_FEAT(SQCVTN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtn_sb) +TRANS_FEAT(UQCVTN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvtn_sb) +TRANS_FEAT(SQCVTUN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_sb) + +TRANS_FEAT(SQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtn_dh) +TRANS_FEAT(UQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvtn_dh) +TRANS_FEAT(SQCVTUN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_dh) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-61-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 20 ++++++++++++++++---- target/arm/tcg/translate-sve.c | 7 +++++++ 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ # as propagated via the MOVPRFX instruction. %reg_movprfx 0:5 +%rn_ax2 6:4 !function=times_2 + ########################################################################### # Named attribute sets. These are used to make nice(er) names # when creating helpers common to those for the individual @@ -XXX,XX +XXX,XX @@ # Two operand @pd_pn ........ esz:2 .. .... ....... rn:4 . rd:4 &rr_esz @rd_rn ........ esz:2 ...... ...... rn:5 rd:5 &rr_esz +@rd_rnx2 ........ ... ..... ...... ..... rd:5 &rr_esz rn=%rn_ax2 # Two operand with governing predicate, flags setting @pd_pg_pn_s ........ . s:1 ...... .. pg:4 . rn:4 . rd:4 &rpr_s @@ -XXX,XX +XXX,XX @@ UABA 01000101 .. 0 ..... 11111 1 ..... ..... @rd_rn_rm #### SVE2 Narrowing ## SVE2 saturating extract narrow - # Bits 23, 18-16 are zero, limited in the translator via esz < 3 & imm == 0. -SQXTNB 01000101 .. 1 ..... 010 000 ..... ..... @rd_rn_tszimm_shl + +{ + SQCVTN_sh 01000101 00 1 10001 010 000 ....0 ..... @rd_rnx2 esz=1 + SQXTNB 01000101 .. 1 ..... 010 000 ..... ..... @rd_rn_tszimm_shl +} SQXTNT 01000101 .. 1 ..... 010 001 ..... ..... @rd_rn_tszimm_shl -UQXTNB 01000101 .. 1 ..... 010 010 ..... ..... @rd_rn_tszimm_shl +{ + UQCVTN_sh 01000101 00 1 10001 010 010 ....0 ..... @rd_rnx2 esz=1 + UQXTNB 01000101 .. 1 ..... 010 010 ..... ..... @rd_rn_tszimm_shl +} UQXTNT 01000101 .. 1 ..... 010 011 ..... ..... @rd_rn_tszimm_shl -SQXTUNB 01000101 .. 1 ..... 010 100 ..... ..... @rd_rn_tszimm_shl +{ + SQCVTUN_sh 01000101 00 1 10001 010 100 ....0 ..... @rd_rnx2 esz=1 + SQXTUNB 01000101 .. 1 ..... 010 100 ..... ..... @rd_rn_tszimm_shl +} SQXTUNT 01000101 .. 1 ..... 010 101 ..... ..... @rd_rn_tszimm_shl ## SVE2 bitwise shift right narrow diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, } TRANS_FEAT(UCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_uclamp, a) + +TRANS_FEAT(SQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, + gen_helper_sme2_sqcvtn_sh, a->rd, a->rn, 0) +TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, + gen_helper_sme2_uqcvtn_sh, a->rd, a->rn, 0) +TRANS_FEAT(SQCVTUN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, + gen_helper_sme2_sqcvtun_sh, a->rd, a->rn, 0) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-62-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 13 ++++++++++++ target/arm/tcg/sme.decode | 18 ++++++++++++++++ target/arm/tcg/sme_helper.c | 38 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 16 ++++++++++++++ 4 files changed, 85 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_sqcvtun_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqcvtun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sunpk2_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk2_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk2_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk2_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk2_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk2_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n n=2 zd=%zd_ax2 zn=%zn_ax2 @zz_4x4 ........ ... ..... ...... .... . ..... \ &zz_n n=4 zd=%zd_ax4 zn=%zn_ax4 +@zz_4x2_n1 ........ ... ..... ...... .... . ..... \ + &zz_n n=1 zd=%zd_ax4 zn=%zn_ax2 BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 @@ -XXX,XX +XXX,XX @@ SQCVTUN_sb 11000001 011 10011 111000 ...10 ..... @zz_1x4 SQCVTN_dh 11000001 101 10011 111000 ...10 ..... @zz_1x4 UQCVTN_dh 11000001 101 10011 111000 ...11 ..... @zz_1x4 SQCVTUN_dh 11000001 111 10011 111000 ...10 ..... @zz_1x4 + +SUNPK_2bh 11000001 011 00101 111000 ..... ....0 @zz_2x1 +SUNPK_2hs 11000001 101 00101 111000 ..... ....0 @zz_2x1 +SUNPK_2sd 11000001 111 00101 111000 ..... ....0 @zz_2x1 + +UUNPK_2bh 11000001 011 00101 111000 ..... ....1 @zz_2x1 +UUNPK_2hs 11000001 101 00101 111000 ..... ....1 @zz_2x1 +UUNPK_2sd 11000001 111 00101 111000 ..... ....1 @zz_2x1 + +SUNPK_4bh 11000001 011 10101 111000 ....0 ...00 @zz_4x2_n1 +SUNPK_4hs 11000001 101 10101 111000 ....0 ...00 @zz_4x2_n1 +SUNPK_4sd 11000001 111 10101 111000 ....0 ...00 @zz_4x2_n1 + +UUNPK_4bh 11000001 011 10101 111000 ....0 ...01 @zz_4x2_n1 +UUNPK_4hs 11000001 101 10101 111000 ....0 ...01 @zz_4x2_n1 +UUNPK_4sd 11000001 111 10101 111000 ....0 ...01 @zz_4x2_n1 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define UNPK(NAME, SREG, TW, TN, HW, HN) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch[SREG]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t n = oprsz / sizeof(TW); \ + if (vectors_overlap(vd, 2 * SREG, vs, SREG)) { \ + vs = memcpy(scratch, vs, sizeof(scratch)); \ + } \ + for (size_t r = 0; r < SREG; ++r) { \ + TN *s = vs + r * sizeof(ARMVectorReg); \ + for (size_t i = 0; i < 2; ++i) { \ + TW *d = vd + (2 * r + i) * sizeof(ARMVectorReg); \ + for (size_t e = 0; e < n; ++e) { \ + d[HW(e)] = s[HN(i * n + e)]; \ + } \ + } \ + } \ +} + +UNPK(sme2_sunpk2_bh, 1, int16_t, int8_t, H2, H1) +UNPK(sme2_sunpk2_hs, 1, int32_t, int16_t, H4, H2) +UNPK(sme2_sunpk2_sd, 1, int64_t, int32_t, H8, H4) + +UNPK(sme2_sunpk4_bh, 2, int16_t, int8_t, H2, H1) +UNPK(sme2_sunpk4_hs, 2, int32_t, int16_t, H4, H2) +UNPK(sme2_sunpk4_sd, 2, int64_t, int32_t, H8, H4) + +UNPK(sme2_uunpk2_bh, 1, uint16_t, uint8_t, H2, H1) +UNPK(sme2_uunpk2_hs, 1, uint32_t, uint16_t, H4, H2) +UNPK(sme2_uunpk2_sd, 1, uint64_t, uint32_t, H8, H4) + +UNPK(sme2_uunpk4_bh, 2, uint16_t, uint8_t, H2, H1) +UNPK(sme2_uunpk4_hs, 2, uint32_t, uint16_t, H4, H2) +UNPK(sme2_uunpk4_sd, 2, uint64_t, uint32_t, H8, H4) + +#undef UNPK + /* Deinterleave and convert. */ void HELPER(sme2_fcvtl)(void *vd, void *vs, float_status *fpst, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQCVTUN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_sb) TRANS_FEAT(SQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtn_dh) TRANS_FEAT(UQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvtn_dh) TRANS_FEAT(SQCVTUN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_dh) + +TRANS_FEAT(SUNPK_2bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk2_bh) +TRANS_FEAT(SUNPK_2hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk2_hs) +TRANS_FEAT(SUNPK_2sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk2_sd) + +TRANS_FEAT(SUNPK_4bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk4_bh) +TRANS_FEAT(SUNPK_4hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk4_hs) +TRANS_FEAT(SUNPK_4sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk4_sd) + +TRANS_FEAT(UUNPK_2bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_bh) +TRANS_FEAT(UUNPK_2hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_hs) +TRANS_FEAT(UUNPK_2sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_sd) + +TRANS_FEAT(UUNPK_4bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_bh) +TRANS_FEAT(UUNPK_4hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_hs) +TRANS_FEAT(UUNPK_4sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_sd) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-63-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 12 ++++++ target/arm/tcg/sme.decode | 11 ++++++ target/arm/tcg/sme_helper.c | 68 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 39 +++++++++++++++++++ 4 files changed, 130 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_uunpk2_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_zip4_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_uzp4_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 ### SME2 Multi-vector SVE Constructive Unary +&zz_e zd zn esz &zz_n zd zn n @zz_1x2 ........ ... ..... ...... ..... zd:5 \ &zz_n n=1 zn=%zn_ax2 @@ -XXX,XX +XXX,XX @@ SUNPK_4sd 11000001 111 10101 111000 ....0 ...00 @zz_4x2_n1 UUNPK_4bh 11000001 011 10101 111000 ....0 ...01 @zz_4x2_n1 UUNPK_4hs 11000001 101 10101 111000 ....0 ...01 @zz_4x2_n1 UUNPK_4sd 11000001 111 10101 111000 ....0 ...01 @zz_4x2_n1 + +ZIP_4 11000001 esz:2 1 10110 111000 ...00 ... 00 \ + &zz_e zd=%zd_ax4 zn=%zn_ax4 +ZIP_4 11000001 001 10111 111000 ...00 ... 00 \ + &zz_e esz=4 zd=%zd_ax4 zn=%zn_ax4 + +UZP_4 11000001 esz:2 1 10110 111000 ...00 ... 10 \ + &zz_e zd=%zd_ax4 zn=%zn_ax4 +UZP_4 11000001 001 10111 111000 ...00 ... 10 \ + &zz_e esz=4 zd=%zd_ax4 zn=%zn_ax4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_ucvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) d[i] = uint32_to_float32(s[i], fpst); } } + +#define ZIP4(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch[4]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t quads = oprsz / (sizeof(TYPE) * 4); \ + TYPE *s0, *s1, *s2, *s3; \ + if (vs == vd) { \ + vs = memcpy(scratch, vs, sizeof(scratch)); \ + } \ + s0 = vs; \ + s1 = vs + sizeof(ARMVectorReg); \ + s2 = vs + 2 * sizeof(ARMVectorReg); \ + s3 = vs + 3 * sizeof(ARMVectorReg); \ + for (size_t r = 0; r < 4; ++r) { \ + TYPE *d = vd + r * sizeof(ARMVectorReg); \ + size_t base = r * quads; \ + for (size_t q = 0; q < quads; ++q) { \ + d[H(4 * q + 0)] = s0[base + H(q)]; \ + d[H(4 * q + 1)] = s1[base + H(q)]; \ + d[H(4 * q + 2)] = s2[base + H(q)]; \ + d[H(4 * q + 3)] = s3[base + H(q)]; \ + } \ + } \ +} + +ZIP4(sme2_zip4_b, uint8_t, H1) +ZIP4(sme2_zip4_h, uint16_t, H2) +ZIP4(sme2_zip4_s, uint32_t, H4) +ZIP4(sme2_zip4_d, uint64_t, ) +ZIP4(sme2_zip4_q, Int128, ) + +#undef ZIP4 + +#define UZP4(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch[4]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t quads = oprsz / (sizeof(TYPE) * 4); \ + TYPE *d0, *d1, *d2, *d3; \ + if (vs == vd) { \ + vs = memcpy(scratch, vs, sizeof(scratch)); \ + } \ + d0 = vd; \ + d1 = vd + sizeof(ARMVectorReg); \ + d2 = vd + 2 * sizeof(ARMVectorReg); \ + d3 = vd + 3 * sizeof(ARMVectorReg); \ + for (size_t r = 0; r < 4; ++r) { \ + TYPE *s = vs + r * sizeof(ARMVectorReg); \ + size_t base = r * quads; \ + for (size_t q = 0; q < quads; ++q) { \ + d0[base + H(q)] = s[H(4 * q + 0)]; \ + d1[base + H(q)] = s[H(4 * q + 1)]; \ + d2[base + H(q)] = s[H(4 * q + 2)]; \ + d3[base + H(q)] = s[H(4 * q + 3)]; \ + } \ + } \ +} + +UZP4(sme2_uzp4_b, uint8_t, H1) +UZP4(sme2_uzp4_h, uint16_t, H2) +UZP4(sme2_uzp4_s, uint32_t, H4) +UZP4(sme2_uzp4_d, uint64_t, ) +UZP4(sme2_uzp4_q, Int128, ) + +#undef UZP4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UUNPK_2sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_sd) TRANS_FEAT(UUNPK_4bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_bh) TRANS_FEAT(UUNPK_4hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_hs) TRANS_FEAT(UUNPK_4sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_sd) + +static bool do_zipuzp_4(DisasContext *s, arg_zz_e *a, + gen_helper_gvec_2 * const fn[5]) +{ + int bytes_per_op = 4 << a->esz; + + /* Both MO_64 and MO_128 can fail the size test. */ + if (s->max_svl < bytes_per_op) { + unallocated_encoding(s); + } else if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + if (svl < bytes_per_op) { + unallocated_encoding(s); + } else { + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + svl, svl, 0, fn[a->esz]); + } + } + return true; +} + +static gen_helper_gvec_2 * const zip4_fns[] = { + gen_helper_sme2_zip4_b, + gen_helper_sme2_zip4_h, + gen_helper_sme2_zip4_s, + gen_helper_sme2_zip4_d, + gen_helper_sme2_zip4_q, +}; +TRANS_FEAT(ZIP_4, aa64_sme2, do_zipuzp_4, a, zip4_fns) + +static gen_helper_gvec_2 * const uzp4_fns[] = { + gen_helper_sme2_uzp4_b, + gen_helper_sme2_uzp4_h, + gen_helper_sme2_uzp4_s, + gen_helper_sme2_uzp4_d, + gen_helper_sme2_uzp4_q, +}; +TRANS_FEAT(UZP_4, aa64_sme2, do_zipuzp_4, a, uzp4_fns) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Unify two copies of these inline functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-64-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/vec_internal.h | 21 +++++++++++++++++++++ target/arm/tcg/mve_helper.c | 21 --------------------- target/arm/tcg/sve_helper.c | 21 --------------------- 3 files changed, 21 insertions(+), 42 deletions(-) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool); #define do_usat_h(val) MIN(MAX(val, 0), UINT16_MAX) #define do_usat_s(val) MIN(MAX(val, 0), UINT32_MAX) +static inline uint64_t do_urshr(uint64_t x, unsigned sh) +{ + if (likely(sh < 64)) { + return (x >> sh) + ((x >> (sh - 1)) & 1); + } else if (sh == 64) { + return x >> 63; + } else { + return 0; + } +} + +static inline int64_t do_srshr(int64_t x, unsigned sh) +{ + if (likely(sh < 64)) { + return (x >> sh) + ((x >> (sh - 1)) & 1); + } else { + /* Rounding the sign bit always produces 0. */ + return 0; + } +} + /** * bfdotadd: * @sum: addend diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSHLL_ALL(vshllt, true) DO_VSHRN(OP##tb, true, 1, uint8_t, 2, uint16_t, FN) \ DO_VSHRN(OP##th, true, 2, uint16_t, 4, uint32_t, FN) -static inline uint64_t do_urshr(uint64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else if (sh == 64) { - return x >> 63; - } else { - return 0; - } -} - -static inline int64_t do_srshr(int64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else { - /* Rounding the sign bit always produces 0. */ - return 0; - } -} - DO_VSHRN_ALL(vshrn, DO_SHR) DO_VSHRN_ALL(vrshrn, do_urshr) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc) \ when N is negative, add 2**M-1. */ #define DO_ASRD(N, M) ((N + (N < 0 ? ((__typeof(N))1 << M) - 1 : 0)) >> M) -static inline uint64_t do_urshr(uint64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else if (sh == 64) { - return x >> 63; - } else { - return 0; - } -} - -static inline int64_t do_srshr(int64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else { - /* Rounding the sign bit always produces 0. */ - return 0; - } -} - DO_ZPZI(sve_asr_zpzi_b, int8_t, H1, DO_SHR) DO_ZPZI(sve_asr_zpzi_h, int16_t, H1_2, DO_SHR) DO_ZPZI(sve_asr_zpzi_s, int32_t, H1_4, DO_SHR) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-65-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 20 ++++++ target/arm/tcg/sme.decode | 37 ++++++++++ target/arm/tcg/sme_helper.c | 120 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 33 +++++++++ 4 files changed, 210 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_uzp4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uzp4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uzp4_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uzp4_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sqrshr_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshr_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshru_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshr_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshr_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshru_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshr_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshr_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshru_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sqrshrn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshrn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrun_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshrn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrun_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UZP_4 11000001 esz:2 1 10110 111000 ...00 ... 10 \ &zz_e zd=%zd_ax4 zn=%zn_ax4 UZP_4 11000001 001 10111 111000 ...00 ... 10 \ &zz_e esz=4 zd=%zd_ax4 zn=%zn_ax4 + +### SME2 Multi-vector SVE Constructive Binary + +&rshr zd zn shift + +%rshr_sh_shift 16:4 !function=rsub_16 +%rshr_sb_shift 16:5 !function=rsub_32 +%rshr_dh_shift 22:1 16:5 !function=rsub_64 + +@rshr_sh ........ .... .... ...... ..... zd:5 \ + &rshr zn=%zn_ax2 shift=%rshr_sh_shift +@rshr_sb ........ ... ..... ...... ..... zd:5 \ + &rshr zn=%zn_ax4 shift=%rshr_sb_shift +@rshr_dh ........ ... ..... ...... ..... zd:5 \ + &rshr zn=%zn_ax4 shift=%rshr_dh_shift + +SQRSHR_sh 11000001 1110 .... 110101 ....0 ..... @rshr_sh +UQRSHR_sh 11000001 1110 .... 110101 ....1 ..... @rshr_sh +SQRSHRU_sh 11000001 1111 .... 110101 ....0 ..... @rshr_sh + +SQRSHR_sb 11000001 011 ..... 110110 ...00 ..... @rshr_sb +SQRSHR_dh 11000001 1.1 ..... 110110 ...00 ..... @rshr_dh +UQRSHR_sb 11000001 011 ..... 110110 ...01 ..... @rshr_sb +UQRSHR_dh 11000001 1.1 ..... 110110 ...01 ..... @rshr_dh +SQRSHRU_sb 11000001 011 ..... 110110 ...10 ..... @rshr_sb +SQRSHRU_dh 11000001 1.1 ..... 110110 ...10 ..... @rshr_dh + +SQRSHRN_sh 01000101 1011 .... 001010 ....0 ..... @rshr_sh +UQRSHRN_sh 01000101 1011 .... 001110 ....0 ..... @rshr_sh +SQRSHRUN_sh 01000101 1011 .... 000010 ....0 ..... @rshr_sh + +SQRSHRN_sb 11000001 011 ..... 110111 ...00 ..... @rshr_sb +SQRSHRN_dh 11000001 1.1 ..... 110111 ...00 ..... @rshr_dh +UQRSHRN_sb 11000001 011 ..... 110111 ...01 ..... @rshr_sb +UQRSHRN_dh 11000001 1.1 ..... 110111 ...01 ..... @rshr_dh +SQRSHRUN_sb 11000001 011 ..... 110111 ...10 ..... @rshr_sb +SQRSHRUN_dh 11000001 1.1 ..... 110111 ...10 ..... @rshr_dh diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ SQCVT4(sme2_sqcvtu_dh, int64_t, uint16_t, H8, H2, do_usat_h) #undef SQCVT4 +#define SQRSHR2(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(i + n)] = SAT(RSHR(s1[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHR2(sme2_sqrshr_sh, int32_t, int16_t, H4, H2, do_srshr, do_ssat_h) +SQRSHR2(sme2_uqrshr_sh, uint32_t, uint16_t, H4, H2, do_urshr, do_usat_h) +SQRSHR2(sme2_sqrshru_sh, int32_t, uint16_t, H4, H2, do_srshr, do_usat_h) + +#undef SQRSHR2 + +#define SQRSHR4(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(i + n)] = SAT(RSHR(s1[HW(i)], shift)); \ + d[HN(i + 2 * n)] = SAT(RSHR(s2[HW(i)], shift)); \ + d[HN(i + 3 * n)] = SAT(RSHR(s3[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHR4(sme2_sqrshr_sb, int32_t, int8_t, H4, H2, do_srshr, do_ssat_b) +SQRSHR4(sme2_uqrshr_sb, uint32_t, uint8_t, H4, H2, do_urshr, do_usat_b) +SQRSHR4(sme2_sqrshru_sb, int32_t, uint8_t, H4, H2, do_srshr, do_usat_b) + +SQRSHR4(sme2_sqrshr_dh, int64_t, int16_t, H8, H2, do_srshr, do_ssat_h) +SQRSHR4(sme2_uqrshr_dh, uint64_t, uint16_t, H8, H2, do_urshr, do_usat_h) +SQRSHR4(sme2_sqrshru_dh, int64_t, uint16_t, H8, H2, do_srshr, do_usat_h) + +#undef SQRSHR4 + /* Convert and interleave */ void HELPER(sme2_bfcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) { @@ -XXX,XX +XXX,XX @@ SQCVTN4(sme2_sqcvtun_dh, int64_t, uint16_t, H8, H2, do_usat_h) #undef SQCVTN4 +#define SQRSHRN2(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(2 * i + 0)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(2 * i + 1)] = SAT(RSHR(s1[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHRN2(sme2_sqrshrn_sh, int32_t, int16_t, H4, H2, do_srshr, do_ssat_h) +SQRSHRN2(sme2_uqrshrn_sh, uint32_t, uint16_t, H4, H2, do_urshr, do_usat_h) +SQRSHRN2(sme2_sqrshrun_sh, int32_t, uint16_t, H4, H2, do_srshr, do_usat_h) + +#undef SQRSHRN2 + +#define SQRSHRN4(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(4 * i + 0)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(4 * i + 1)] = SAT(RSHR(s1[HW(i)], shift)); \ + d[HN(4 * i + 2)] = SAT(RSHR(s2[HW(i)], shift)); \ + d[HN(4 * i + 3)] = SAT(RSHR(s3[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHRN4(sme2_sqrshrn_sb, int32_t, int8_t, H4, H1, do_srshr, do_ssat_b) +SQRSHRN4(sme2_uqrshrn_sb, uint32_t, uint8_t, H4, H1, do_urshr, do_usat_b) +SQRSHRN4(sme2_sqrshrun_sb, int32_t, uint8_t, H4, H1, do_srshr, do_usat_b) + +SQRSHRN4(sme2_sqrshrn_dh, int64_t, int16_t, H8, H2, do_srshr, do_ssat_h) +SQRSHRN4(sme2_uqrshrn_dh, uint64_t, uint16_t, H8, H2, do_urshr, do_usat_h) +SQRSHRN4(sme2_sqrshrun_dh, int64_t, uint16_t, H8, H2, do_srshr, do_usat_h) + +#undef SQRSHRN4 + /* Expand and convert */ void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2 * const uzp4_fns[] = { gen_helper_sme2_uzp4_q, }; TRANS_FEAT(UZP_4, aa64_sme2, do_zipuzp_4, a, uzp4_fns) + +static bool do_zz_rshr(DisasContext *s, arg_rshr *a, gen_helper_gvec_2 *fn) +{ + if (sve_access_check(s)) { + int vl = vec_full_reg_size(s); + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vl, vl, a->shift, fn); + } + return true; +} + +TRANS_FEAT(SQRSHR_sh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshr_sh) +TRANS_FEAT(UQRSHR_sh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshr_sh) +TRANS_FEAT(SQRSHRU_sh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshru_sh) + +TRANS_FEAT(SQRSHR_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshr_sb) +TRANS_FEAT(SQRSHR_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshr_dh) +TRANS_FEAT(UQRSHR_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshr_sb) +TRANS_FEAT(UQRSHR_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshr_dh) +TRANS_FEAT(SQRSHRU_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshru_sb) +TRANS_FEAT(SQRSHRU_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshru_dh) + +TRANS_FEAT(SQRSHRN_sh, aa64_sme2_or_sve2p1, do_zz_rshr, a, gen_helper_sme2_sqrshrn_sh) +TRANS_FEAT(UQRSHRN_sh, aa64_sme2_or_sve2p1, do_zz_rshr, a, gen_helper_sme2_uqrshrn_sh) +TRANS_FEAT(SQRSHRUN_sh, aa64_sme2_or_sve2p1, do_zz_rshr, a, gen_helper_sme2_sqrshrun_sh) + +TRANS_FEAT(SQRSHRN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrn_sb) +TRANS_FEAT(SQRSHRN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrn_dh) +TRANS_FEAT(UQRSHRN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_sb) +TRANS_FEAT(UQRSHRN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_dh) +TRANS_FEAT(SQRSHRUN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_sb) +TRANS_FEAT(SQRSHRUN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_dh) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-66-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 12 +++++++ target/arm/tcg/sme.decode | 12 +++++++ target/arm/tcg/sme_helper.c | 62 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 40 ++++++++++++++++++++++ 4 files changed, 126 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_uunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_uzp2_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(sme2_zip4_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_zip4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_zip4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UQRSHRN_sb 11000001 011 ..... 110111 ...01 ..... @rshr_sb UQRSHRN_dh 11000001 1.1 ..... 110111 ...01 ..... @rshr_dh SQRSHRUN_sb 11000001 011 ..... 110111 ...10 ..... @rshr_sb SQRSHRUN_dh 11000001 1.1 ..... 110111 ...10 ..... @rshr_dh + +&zzz_e zd zn zm esz + +ZIP_2 11000001 esz:2 1 zm:5 110100 zn:5 .... 0 \ + &zzz_e zd=%zd_ax2 +ZIP_2 11000001 00 1 zm:5 110101 zn:5 .... 0 \ + &zzz_e zd=%zd_ax2 esz=4 + +UZP_2 11000001 esz:2 1 zm:5 110100 zn:5 .... 1 \ + &zzz_e zd=%zd_ax2 +UZP_2 11000001 00 1 zm:5 110101 zn:5 .... 1 \ + &zzz_e zd=%zd_ax2 esz=4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_ucvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define ZIP2(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + ARMVectorReg scratch[2]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t pairs = oprsz / (sizeof(TYPE) * 2); \ + TYPE *n = vn, *m = vm; \ + if (vectors_overlap(vd, 2, vn, 1)) { \ + n = memcpy(&scratch[0], vn, oprsz); \ + } \ + if (vectors_overlap(vd, 2, vm, 1)) { \ + m = memcpy(&scratch[1], vm, oprsz); \ + } \ + for (size_t r = 0; r < 2; ++r) { \ + TYPE *d = vd + r * sizeof(ARMVectorReg); \ + size_t base = r * pairs; \ + for (size_t p = 0; p < pairs; ++p) { \ + d[H(2 * p + 0)] = n[base + H(p)]; \ + d[H(2 * p + 1)] = m[base + H(p)]; \ + } \ + } \ +} + +ZIP2(sme2_zip2_b, uint8_t, H1) +ZIP2(sme2_zip2_h, uint16_t, H2) +ZIP2(sme2_zip2_s, uint32_t, H4) +ZIP2(sme2_zip2_d, uint64_t, ) +ZIP2(sme2_zip2_q, Int128, ) + +#undef ZIP2 + #define ZIP4(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ { \ @@ -XXX,XX +XXX,XX @@ ZIP4(sme2_zip4_q, Int128, ) #undef ZIP4 +#define UZP2(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + ARMVectorReg scratch[2]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t pairs = oprsz / (sizeof(TYPE) * 2); \ + TYPE *d0 = vd, *d1 = vd + sizeof(ARMVectorReg); \ + if (vectors_overlap(vd, 2, vn, 1)) { \ + vn = memcpy(&scratch[0], vn, oprsz); \ + } \ + if (vectors_overlap(vd, 2, vm, 1)) { \ + vm = memcpy(&scratch[1], vm, oprsz); \ + } \ + for (size_t r = 0; r < 2; ++r) { \ + TYPE *s = r ? vm : vn; \ + size_t base = r * pairs; \ + for (size_t p = 0; p < pairs; ++p) { \ + d0[base + H(p)] = s[H(2 * p + 0)]; \ + d1[base + H(p)] = s[H(2 * p + 1)]; \ + } \ + } \ +} + +UZP2(sme2_uzp2_b, uint8_t, H1) +UZP2(sme2_uzp2_h, uint16_t, H2) +UZP2(sme2_uzp2_s, uint32_t, H4) +UZP2(sme2_uzp2_d, uint64_t, ) +UZP2(sme2_uzp2_q, Int128, ) + +#undef UZP2 + #define UZP4(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ { \ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UQRSHRN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_sb) TRANS_FEAT(UQRSHRN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_dh) TRANS_FEAT(SQRSHRUN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_sb) TRANS_FEAT(SQRSHRUN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_dh) + +static bool do_zipuzp_2(DisasContext *s, arg_zzz_e *a, + gen_helper_gvec_3 * const fn[5]) +{ + int bytes_per_op = 2 << a->esz; + + /* MO_128 can fail the size test. */ + if (s->max_svl < bytes_per_op) { + unallocated_encoding(s); + } else if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + if (svl < bytes_per_op) { + unallocated_encoding(s); + } else { + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vec_full_reg_offset(s, a->zm), + svl, svl, 0, fn[a->esz]); + } + } + return true; +} + +static gen_helper_gvec_3 * const zip2_fns[] = { + gen_helper_sme2_zip2_b, + gen_helper_sme2_zip2_h, + gen_helper_sme2_zip2_s, + gen_helper_sme2_zip2_d, + gen_helper_sme2_zip2_q, +}; +TRANS_FEAT(ZIP_2, aa64_sme2, do_zipuzp_2, a, zip2_fns) + +static gen_helper_gvec_3 * const uzp2_fns[] = { + gen_helper_sme2_uzp2_b, + gen_helper_sme2_uzp2_h, + gen_helper_sme2_uzp2_s, + gen_helper_sme2_uzp2_d, + gen_helper_sme2_uzp2_q, +}; +TRANS_FEAT(UZP_2, aa64_sme2, do_zipuzp_2, a, uzp2_fns) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-67-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 15 +++++++ target/arm/tcg/sme.decode | 17 ++++++++ target/arm/tcg/sme_helper.c | 56 +++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 75 ++++++++++++++++++++++++++++++++++ 4 files changed, 163 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_sqrshrun_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqrshrun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_sclamp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_sclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_sclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_sclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_uclamp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sme2_fclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sme2_fclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sme2_fclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sme2_bfclamp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UZP_2 11000001 esz:2 1 zm:5 110100 zn:5 .... 1 \ &zzz_e zd=%zd_ax2 UZP_2 11000001 00 1 zm:5 110101 zn:5 .... 1 \ &zzz_e zd=%zd_ax2 esz=4 + +&zzz_en zd zn zm esz n + +FCLAMP 11000001 esz:2 1 zm:5 110000 zn:5 .... 0 \ + &zzz_en zd=%zd_ax2 n=2 +FCLAMP 11000001 esz:2 1 zm:5 110010 zn:5 ...0 0 \ + &zzz_en zd=%zd_ax4 n=4 + +SCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 0 \ + &zzz_en zd=%zd_ax2 n=2 +SCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 0 \ + &zzz_en zd=%zd_ax4 n=4 + +UCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 1 \ + &zzz_en zd=%zd_ax2 n=2 +UCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 1 \ + &zzz_en zd=%zd_ax4 n=4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ UZP4(sme2_uzp4_d, uint64_t, ) UZP4(sme2_uzp4_q, Int128, ) #undef UZP4 + +#define ICLAMP(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + size_t stride = sizeof(ARMVectorReg) / sizeof(TYPE); \ + size_t elements = simd_oprsz(desc) / sizeof(TYPE); \ + size_t nreg = simd_data(desc); \ + TYPE *d = vd, *n = vn, *m = vm; \ + for (size_t e = 0; e < elements; e++) { \ + TYPE nn = n[H(e)], mm = m[H(e)]; \ + for (size_t r = 0; r < nreg; r++) { \ + TYPE *dd = &d[r * stride + H(e)]; \ + *dd = MIN(MAX(*dd, nn), mm); \ + } \ + } \ +} + +ICLAMP(sme2_sclamp_b, int8_t, H1) +ICLAMP(sme2_sclamp_h, int16_t, H2) +ICLAMP(sme2_sclamp_s, int32_t, H4) +ICLAMP(sme2_sclamp_d, int64_t, H8) + +ICLAMP(sme2_uclamp_b, uint8_t, H1) +ICLAMP(sme2_uclamp_h, uint16_t, H2) +ICLAMP(sme2_uclamp_s, uint32_t, H4) +ICLAMP(sme2_uclamp_d, uint64_t, H8) + +#undef ICLAMP + +/* + * Note the argument ordering to minnum and maxnum must match + * the ARM pseudocode so that NaNs are propagated properly. + */ +#define FCLAMP(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, \ + float_status *fpst, uint32_t desc) \ +{ \ + size_t stride = sizeof(ARMVectorReg) / sizeof(TYPE); \ + size_t elements = simd_oprsz(desc) / sizeof(TYPE); \ + size_t nreg = simd_data(desc); \ + TYPE *d = vd, *n = vn, *m = vm; \ + for (size_t e = 0; e < elements; e++) { \ + TYPE nn = n[H(e)], mm = m[H(e)]; \ + for (size_t r = 0; r < nreg; r++) { \ + TYPE *dd = &d[r * stride + H(e)]; \ + *dd = TYPE##_minnum(TYPE##_maxnum(nn, *dd, fpst), mm, fpst); \ + } \ + } \ +} + +FCLAMP(sme2_fclamp_h, float16, H2) +FCLAMP(sme2_fclamp_s, float32, H4) +FCLAMP(sme2_fclamp_d, float64, H8) +FCLAMP(sme2_bfclamp, bfloat16, H2) + +#undef FCLAMP diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const uzp2_fns[] = { gen_helper_sme2_uzp2_q, }; TRANS_FEAT(UZP_2, aa64_sme2, do_zipuzp_2, a, uzp2_fns) + +static bool trans_FCLAMP(DisasContext *s, arg_zzz_en *a) +{ + static gen_helper_gvec_3_ptr * const fn[] = { + gen_helper_sme2_bfclamp, + gen_helper_sme2_fclamp_h, + gen_helper_sme2_fclamp_s, + gen_helper_sme2_fclamp_d, + }; + TCGv_ptr fpst; + int vl; + + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + /* This insn uses MO_8 to encode BFloat16. */ + if (a->esz == MO_8 && !dc_isar_feature(aa64_sme_b16b16, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_A64_F16 : FPST_A64); + vl = vec_full_reg_size(s); + + tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vec_full_reg_offset(s, a->zm), + fpst, vl, vl, a->n, fn[a->esz]); + return true; +} + +static bool do_clamp(DisasContext *s, arg_zzz_en *a, + gen_helper_gvec_3 * const fn[4]) +{ + int vl; + + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + /* + * Clamp is just a min+max, easily supported by most host + * vector operations -- we already have such an expansion in + * translate-sve.c for a single output. + * TODO: Add support in gvec for multiple simultaneous output, + * and/or copy to temporary upon overlap. + */ + vl = vec_full_reg_size(s); + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vec_full_reg_offset(s, a->zm), + vl, vl, a->n, fn[a->esz]); + return true; +} + +static gen_helper_gvec_3 * const sclamp_fns[] = { + gen_helper_sme2_sclamp_b, + gen_helper_sme2_sclamp_h, + gen_helper_sme2_sclamp_s, + gen_helper_sme2_sclamp_d, +}; +TRANS(SCLAMP, do_clamp, a, sclamp_fns) + +static gen_helper_gvec_3 * const uclamp_fns[] = { + gen_helper_sme2_uclamp_b, + gen_helper_sme2_uclamp_h, + gen_helper_sme2_uclamp_s, + gen_helper_sme2_uclamp_d, +}; +TRANS(UCLAMP, do_clamp, a, uclamp_fns) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> These instructions are present in both SME(1) and SVE2.1 extensions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-68-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static void gen_sclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &ops[vece]); } -TRANS_FEAT(SCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_sclamp, a) +TRANS_FEAT(SCLAMP, aa64_sme_or_sve2p1, gen_gvec_fn_arg_zzzz, gen_sclamp, a) static void gen_uclamp_i32(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_i32 a) { @@ -XXX,XX +XXX,XX @@ static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &ops[vece]); } -TRANS_FEAT(UCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_uclamp, a) +TRANS_FEAT(UCLAMP, aa64_sme_or_sve2p1, gen_gvec_fn_arg_zzzz, gen_uclamp, a) TRANS_FEAT(SQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_sqcvtn_sh, a->rd, a->rn, 0) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> This is the single vector version within SVE decode space. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-69-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 2 ++ target/arm/tcg/translate-sve.c | 22 ++++++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ PSEL 00100101 .1 1 000 .. 01 .... 0 .... 0 .... \ SCLAMP 01000100 .. 0 ..... 110000 ..... ..... @rda_rn_rm UCLAMP 01000100 .. 0 ..... 110001 ..... ..... @rda_rn_rm + +FCLAMP 01100100 .. 1 ..... 001001 ..... ..... @rda_rn_rm diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, TRANS_FEAT(UCLAMP, aa64_sme_or_sve2p1, gen_gvec_fn_arg_zzzz, gen_uclamp, a) +static bool trans_FCLAMP(DisasContext *s, arg_FCLAMP *a) +{ + static gen_helper_gvec_3_ptr * const fn[] = { + gen_helper_sme2_bfclamp, + gen_helper_sme2_fclamp_h, + gen_helper_sme2_fclamp_s, + gen_helper_sme2_fclamp_d, + }; + + /* This insn uses MO_8 to encode BFloat16. */ + if (a->esz == MO_8 + ? !dc_isar_feature(aa64_sve_b16b16, s) + : !dc_isar_feature(aa64_sme2_or_sve2p1, s)) { + return false; + } + + /* So far we never optimize rda with MOVPRFX */ + assert(a->rd == a->ra); + return gen_gvec_fpst_zzz(s, fn[a->esz], a->rd, a->rn, a->rm, 1, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64); +} + TRANS_FEAT(SQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_sqcvtn_sh, a->rd, a->rn, 0) TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-70-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 23 +++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 20 ++++++++++++++++++++ 2 files changed, 43 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 1 \ &zzz_en zd=%zd_ax2 n=2 UCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 1 \ &zzz_en zd=%zd_ax4 n=4 + +### SME Multiple Zero + +&zero_za rv off ngrp nvec + +ZERO_za 11000000 000011 000 .. 0000000000 off:3 \ + &zero_za ngrp=2 nvec=1 rv=%mova_rv +ZERO_za 11000000 000011 100 .. 0000000000 off:3 \ + &zero_za ngrp=4 nvec=1 rv=%mova_rv + +ZERO_za 11000000 000011 001 .. 0000000000 ... \ + &zero_za ngrp=1 nvec=2 rv=%mova_rv off=%off3_x2 +ZERO_za 11000000 000011 010 .. 0000000000 0.. \ + &zero_za ngrp=2 nvec=2 rv=%mova_rv off=%off2_x2 +ZERO_za 11000000 000011 011 .. 0000000000 0.. \ + &zero_za ngrp=4 nvec=2 rv=%mova_rv off=%off2_x2 + +ZERO_za 11000000 000011 101 .. 0000000000 0.. \ + &zero_za ngrp=1 nvec=4 rv=%mova_rv off=%off2_x4 +ZERO_za 11000000 000011 110 .. 0000000000 00. \ + &zero_za ngrp=2 nvec=4 rv=%mova_rv off=%off1_x4 +ZERO_za 11000000 000011 111 .. 0000000000 00. \ + &zero_za ngrp=4 nvec=4 rv=%mova_rv off=%off1_x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_ZERO_zt0(DisasContext *s, arg_ZERO_zt0 *a) return true; } +static bool trans_ZERO_za(DisasContext *s, arg_ZERO_za *a) +{ + if (!dc_isar_feature(aa64_sme2p1, s)) { + return false; + } + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / a->ngrp; + TCGv_ptr t_za = get_zarray(s, a->rv, a->off, a->ngrp, a->nvec); + + for (int r = 0; r < a->ngrp; ++r) { + for (int i = 0; i < a->nvec; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + tcg_gen_gvec_dup_imm_var(MO_64, t_za, o_za, svl, svl, 0); + } + } + } + return true; +} + static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) { static gen_helper_gvec_4 * const h_fns[5] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> For WHILE, we have the count of enabled predicates, so we don't need to search to compute the PredTest result. Reuse the logic that will shortly be required for counted predicates. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-71-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 81 +++++++++++++++++++++---------------- 1 file changed, 46 insertions(+), 35 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } +/* C.f. Arm pseudocode PredCountTest */ +static uint32_t pred_count_test(uint32_t elements, uint32_t count, bool invert) +{ + uint32_t flags; + + if (count == 0) { + flags = 1; /* !N, Z, C */ + } else if (!invert) { + flags = (1u << 31) | 2; /* N, !Z */ + flags |= count != elements; /* C */ + } else { + flags = 2; /* !Z, !C */ + flags |= (count == elements) << 31; /* N */ + } + return flags; +} + uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) { intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; - uint32_t flags; - intptr_t i; + intptr_t i, oprbits = oprsz * 8; + + tcg_debug_assert(count <= oprbits); /* Begin with a zero predicate register. */ - flags = do_zero(d, oprsz); - if (count == 0) { - return flags; + do_zero(d, oprsz); + if (count) { + /* Set all of the requested bits. */ + for (i = 0; i < count / 64; ++i) { + d->p[i] = esz_mask; + } + if (count & 63) { + d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask; + } } - /* Set all of the requested bits. */ - for (i = 0; i < count / 64; ++i) { - d->p[i] = esz_mask; - } - if (count & 63) { - d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask; - } - - return predtest_ones(d, oprsz, esz_mask); + return pred_count_test(oprbits, count, false); } uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; - intptr_t i, invcount, oprbits; + intptr_t i, invcount, oprbits = oprsz * 8; uint64_t bits; - if (count == 0) { - return do_zero(d, oprsz); - } - - oprbits = oprsz * 8; tcg_debug_assert(count <= oprbits); - bits = esz_mask; - if (oprbits & 63) { - bits &= MAKE_64BIT_MASK(0, oprbits & 63); - } - - invcount = oprbits - count; - for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { - d->p[i] = bits; + /* Begin with a zero predicate register. */ + do_zero(d, oprsz); + if (count) { + /* Set all of the requested bits. */ bits = esz_mask; + if (oprbits & 63) { + bits &= MAKE_64BIT_MASK(0, oprbits & 63); + } + + invcount = oprbits - count; + for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { + d->p[i] = bits; + bits = esz_mask; + } + d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); } - d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); - - while (--i >= 0) { - d->p[i] = 0; - } - - return predtest_ones(d, oprsz, esz_mask); + return pred_count_test(oprbits, count, true); } /* Recursive reduction on a function; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Merge predtest_ones into its only caller. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-72-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 34 ++++++++++++++-------------------- 1 file changed, 14 insertions(+), 20 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkn)(void *vd, void *vn, void *vg, uint32_t pred_desc) } } -/* As if PredTest(Ones(PL), D, esz). */ -static uint32_t predtest_ones(ARMPredicateReg *d, intptr_t oprsz, - uint64_t esz_mask) -{ - uint32_t flags = PREDTEST_INIT; - intptr_t i; - - for (i = 0; i < oprsz / 8; i++) { - flags = iter_predtest_fwd(d->p[i], esz_mask, flags); - } - if (oprsz & 7) { - uint64_t mask = ~(-1ULL << (8 * (oprsz & 7))); - flags = iter_predtest_fwd(d->p[i], esz_mask & mask, flags); - } - return flags; -} - uint32_t HELPER(sve_brkns)(void *vd, void *vn, void *vg, uint32_t pred_desc) { intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); if (last_active_pred(vn, vg, oprsz)) { - return predtest_ones(vd, oprsz, -1); - } else { - return do_zero(vd, oprsz); + ARMPredicateReg *d = vd; + uint32_t flags = PREDTEST_INIT; + intptr_t i; + + /* As if PredTest(Ones(PL), D, MO_8). */ + for (i = 0; i < oprsz / 8; i++) { + flags = iter_predtest_fwd(d->p[i], -1, flags); + } + if (oprsz & 7) { + uint64_t mask = ~(-1ULL << (8 * (oprsz & 7))); + flags = iter_predtest_fwd(d->p[i], mask, flags); + } + return flags; } + return do_zero(vd, oprsz); } uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Expand to memset plus the return value, when used. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-73-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static uint32_t compute_brks_m(uint64_t *d, uint64_t *n, uint64_t *g, return flags; } -static uint32_t do_zero(ARMPredicateReg *d, intptr_t oprsz) -{ - /* It is quicker to zero the whole predicate than loop on OPRSZ. - * The compiler should turn this into 4 64-bit integer stores. - */ - memset(d, 0, sizeof(ARMPredicateReg)); - return PREDTEST_INIT; -} - void HELPER(sve_brkpa)(void *vd, void *vn, void *vm, void *vg, uint32_t pred_desc) { @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkpa)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { compute_brk_z(vd, vm, vg, oprsz, true); } else { - do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); } } @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_brkpas)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { return compute_brks_z(vd, vm, vg, oprsz, true); } else { - return do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); + return PREDTEST_INIT; } } @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkpb)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { compute_brk_z(vd, vm, vg, oprsz, false); } else { - do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); } } @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_brkpbs)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { return compute_brks_z(vd, vm, vg, oprsz, false); } else { - return do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); + return PREDTEST_INIT; } } @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkn)(void *vd, void *vn, void *vg, uint32_t pred_desc) { intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); if (!last_active_pred(vn, vg, oprsz)) { - do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); } } @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_brkns)(void *vd, void *vn, void *vg, uint32_t pred_desc) } return flags; } - return do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); + return PREDTEST_INIT; } uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) tcg_debug_assert(count <= oprbits); /* Begin with a zero predicate register. */ - do_zero(d, oprsz); + memset(d, 0, sizeof(*d)); if (count) { /* Set all of the requested bits. */ for (i = 0; i < count / 64; ++i) { @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) tcg_debug_assert(count <= oprbits); /* Begin with a zero predicate register. */ - do_zero(d, oprsz); + memset(d, 0, sizeof(*d)); if (count) { /* Set all of the requested bits. */ bits = esz_mask; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-74-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static uint32_t pred_count_test(uint32_t elements, uint32_t count, bool invert) return flags; } -uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) +/* D must be cleared on entry. */ +static void do_whilel(ARMPredicateReg *d, uint64_t esz_mask, + uint32_t count, uint32_t oprbits) { - intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); - intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); - uint64_t esz_mask = pred_esz_masks[esz]; - ARMPredicateReg *d = vd; - intptr_t i, oprbits = oprsz * 8; - tcg_debug_assert(count <= oprbits); - - /* Begin with a zero predicate register. */ - memset(d, 0, sizeof(*d)); if (count) { + uint32_t i; + /* Set all of the requested bits. */ for (i = 0; i < count / 64; ++i) { d->p[i] = esz_mask; @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask; } } +} +uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + memset(d, 0, sizeof(*d)); + do_whilel(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, false); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-75-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 39 +++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(oprbits, count, false); } -uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) +/* D must be cleared on entry. */ +static void do_whileg(ARMPredicateReg *d, uint64_t esz_mask, + uint32_t count, uint32_t oprbits) { - intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); - intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); - uint64_t esz_mask = pred_esz_masks[esz]; - ARMPredicateReg *d = vd; - intptr_t i, invcount, oprbits = oprsz * 8; - uint64_t bits; - tcg_debug_assert(count <= oprbits); - - /* Begin with a zero predicate register. */ - memset(d, 0, sizeof(*d)); if (count) { - /* Set all of the requested bits. */ - bits = esz_mask; - if (oprbits & 63) { - bits &= MAKE_64BIT_MASK(0, oprbits & 63); - } + uint32_t i, invcount = oprbits - count; + uint64_t bits = esz_mask & MAKE_64BIT_MASK(invcount & 63, 64); - invcount = oprbits - count; - for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { + for (i = invcount / 64; i < oprbits / 64; ++i) { d->p[i] = bits; bits = esz_mask; } - d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); + if (oprbits & 63) { + d->p[i] = bits & MAKE_64BIT_MASK(0, oprbits & 63); + } } +} +uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + memset(d, 0, sizeof(*d)); + do_whileg(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, true); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Change the API to pass element count rather than bit count. This will be helpful later for predicate as counter. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-76-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 2 ++ target/arm/tcg/translate-sve.c | 13 +++++-------- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; + count <<= esz; memset(d, 0, sizeof(*d)); do_whilel(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, false); @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; + count <<= esz; memset(d, 0, sizeof(*d)); do_whileg(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, true); diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) t2 = tcg_temp_new_i32(); tcg_gen_extrl_i64_i32(t2, t0); - /* Scale elements to bits. */ - tcg_gen_shli_i32(t2, t2, a->esz); - desc = FIELD_DP32(desc, PREDDESC, OPRSZ, vsz / 8); desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) op0 = read_cpu_reg(s, a->rn, 1); op1 = read_cpu_reg(s, a->rm, 1); - tmax = tcg_constant_i64(vsz); + tmax = tcg_constant_i64(vsz >> a->esz); diff = tcg_temp_new_i64(); if (a->rw) { @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) tcg_gen_sub_i64(diff, op0, op1); tcg_gen_sub_i64(t1, op1, op0); tcg_gen_movcond_i64(TCG_COND_GEU, diff, op0, op1, diff, t1); - /* Round down to a multiple of ESIZE. */ - tcg_gen_andi_i64(diff, diff, -1 << a->esz); + /* Divide, rounding down, by ESIZE. */ + tcg_gen_shri_i64(diff, diff, a->esz); /* If op1 == op0, diff == 0, and the condition is always true. */ tcg_gen_movcond_i64(TCG_COND_EQ, diff, op0, op1, tmax, diff); } else { /* WHILEWR */ tcg_gen_sub_i64(diff, op1, op0); - /* Round down to a multiple of ESIZE. */ - tcg_gen_andi_i64(diff, diff, -1 << a->esz); + /* Divide, rounding down, by ESIZE. */ + tcg_gen_shri_i64(diff, diff, a->esz); /* If op0 >= op1, diff <= 0, the condition is always true. */ tcg_gen_movcond_i64(TCG_COND_GEU, diff, op0, op1, tmax, diff); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Use TRANS_FEAT to select the correct predicate. Pass the helper and a boolean to do_WHILE. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-77-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 4 +++- target/arm/tcg/translate-sve.c | 23 +++++++++-------------- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SINCDECP_z 00100101 .. 1010 d:1 u:1 10000 00 .... ..... @incdec2_pred CTERM 00100101 1 sf:1 1 rm:5 001000 rn:5 ne:1 0000 # SVE integer compare scalar count and limit -WHILE 00100101 esz:2 1 rm:5 000 sf:1 u:1 lt:1 rn:5 eq:1 rd:4 +&while esz rd rn rm sf u eq +WHILE_lt 00100101 esz:2 1 rm:5 000 sf:1 u:1 1 rn:5 eq:1 rd:4 &while +WHILE_gt 00100101 esz:2 1 rm:5 000 sf:1 u:1 0 rn:5 eq:1 rd:4 &while # SVE2 pointer conflict compare WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a) return true; } -static bool trans_WHILE(DisasContext *s, arg_WHILE *a) +typedef void gen_while_fn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32); +static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) { TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2; @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) TCGCond cond; uint64_t maxval; /* Note that GE/HS has a->eq == 0 and GT/HI has a->eq == 1. */ - bool eq = a->eq == a->lt; + bool eq = a->eq == lt; - /* The greater-than conditions are all SVE2. */ - if (a->lt - ? !dc_isar_feature(aa64_sve, s) - : !dc_isar_feature(aa64_sve2, s)) { - return false; - } if (!sve_access_check(s)) { return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) t0 = tcg_temp_new_i64(); t1 = tcg_temp_new_i64(); - if (a->lt) { + if (lt) { tcg_gen_sub_i64(t0, op1, op0); if (a->u) { maxval = a->sf ? UINT64_MAX : UINT32_MAX; @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) ptr = tcg_temp_new_ptr(); tcg_gen_addi_ptr(ptr, tcg_env, pred_full_reg_offset(s, a->rd)); - if (a->lt) { - gen_helper_sve_whilel(t2, ptr, t2, tcg_constant_i32(desc)); - } else { - gen_helper_sve_whileg(t2, ptr, t2, tcg_constant_i32(desc)); - } + fn(t2, ptr, t2, tcg_constant_i32(desc)); + do_pred_flags(t2); return true; } +TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, gen_helper_sve_whilel) +TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, gen_helper_sve_whileg) + static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) { TCGv_i64 op0, op1, diff, t1, tmax; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-78-richard.henderson@linaro.org This instruction is present in both SME(1) and SVE2.1 extensions. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_PSEL(DisasContext *s, arg_psel *a) TCGv_i64 tmp, didx, dbit; TCGv_ptr ptr; - if (!dc_isar_feature(aa64_sme, s)) { + if (!dc_isar_feature(aa64_sme_or_sve2p1, s)) { return false; } if (!sve_access_check(s)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-79-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 3 +++ target/arm/tcg/sve.decode | 8 +++++++ target/arm/tcg/sve_helper.c | 40 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 14 ++++++++---- 4 files changed, 61 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_whilel, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_while2l, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_while2g, TCG_CALL_NO_RWG, i32, ptr, i32, i32) + DEF_HELPER_FLAGS_4(sve_subri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ WHILE_gt 00100101 esz:2 1 rm:5 000 sf:1 u:1 0 rn:5 eq:1 rd:4 &while # SVE2 pointer conflict compare WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 +# SVE2.1 predicate pair +%pd_pair 1:3 !function=times_2 +@while_pair ........ esz:2 . rm:5 .... u:1 . rn:5 . ... eq:1 \ + &while rd=%pd_pair sf=1 + +WHILE_lt_pair 00100101 .. 1 ..... 0101 . 1 ..... 1 ... . @while_pair +WHILE_gt_pair 00100101 .. 1 ..... 0101 . 0 ..... 1 ... . @while_pair + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(oprbits, count, false); } +uint32_t HELPER(sve_while2l)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + count <<= esz; + memset(d, 0, 2 * sizeof(*d)); + if (count <= oprbits) { + do_whilel(&d[0], esz_mask, count, oprbits); + } else { + do_whilel(&d[0], esz_mask, oprbits, oprbits); + do_whilel(&d[1], esz_mask, count - oprbits, oprbits); + } + + return pred_count_test(2 * oprbits, count, false); +} + /* D must be cleared on entry. */ static void do_whileg(ARMPredicateReg *d, uint64_t esz_mask, uint32_t count, uint32_t oprbits) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(oprbits, count, true); } +uint32_t HELPER(sve_while2g)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + count <<= esz; + memset(d, 0, 2 * sizeof(*d)); + if (count <= oprbits) { + do_whileg(&d[1], esz_mask, count, oprbits); + } else { + do_whilel(&d[1], esz_mask, oprbits, oprbits); + do_whileg(&d[0], esz_mask, count - oprbits, oprbits); + } + + return pred_count_test(2 * oprbits, count, true); +} + /* Recursive reduction on a function; * C.f. the ARM ARM function ReducePredicated. * diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a) } typedef void gen_while_fn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32); -static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) +static bool do_WHILE(DisasContext *s, arg_while *a, + bool lt, int scale, gen_while_fn *fn) { TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2; @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) } } - tmax = tcg_constant_i64(vsz >> a->esz); + tmax = tcg_constant_i64((vsz << scale) >> a->esz); if (eq) { /* Equality means one more iteration. */ tcg_gen_addi_i64(t0, t0, 1); @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) return true; } -TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, gen_helper_sve_whilel) -TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, gen_helper_sve_whileg) +TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, 0, gen_helper_sve_whilel) +TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, 0, gen_helper_sve_whileg) + +TRANS_FEAT(WHILE_lt_pair, aa64_sme2_or_sve2p1, do_WHILE, + a, true, 1, gen_helper_sve_while2l) +TRANS_FEAT(WHILE_gt_pair, aa64_sme2_or_sve2p1, do_WHILE, + a, false, 1, gen_helper_sve_while2g) static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-80-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 3 ++ target/arm/tcg/sve.decode | 11 +++++++ target/arm/tcg/sve_helper.c | 53 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 22 ++++++++++---- 4 files changed, 84 insertions(+), 5 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_while2l, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_while2g, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_whilecl, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_whilecg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) + DEF_HELPER_FLAGS_4(sve_subri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ %rn_ax2 6:4 !function=times_2 +%pnd 0:3 !function=plus_8 + ########################################################################### # Named attribute sets. These are used to make nice(er) names # when creating helpers common to those for the individual @@ -XXX,XX +XXX,XX @@ WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 WHILE_lt_pair 00100101 .. 1 ..... 0101 . 1 ..... 1 ... . @while_pair WHILE_gt_pair 00100101 .. 1 ..... 0101 . 0 ..... 1 ... . @while_pair +# SVE2.1 predicate as count +@while_cnt ........ esz:2 . rm:5 .... u:1 . rn:5 . eq:1 ... \ + &while rd=%pnd sf=1 + +WHILE_lt_cnt2 00100101 .. 1 ..... 0100 . 1 ..... 1 . ... @while_cnt +WHILE_lt_cnt4 00100101 .. 1 ..... 0110 . 1 ..... 1 . ... @while_cnt +WHILE_gt_cnt2 00100101 .. 1 ..... 0100 . 0 ..... 1 . ... @while_cnt +WHILE_gt_cnt4 00100101 .. 1 ..... 0110 . 0 ..... 1 . ... @while_cnt + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } +/* C.f. Arm pseudocode EncodePredCount */ +static uint64_t encode_pred_count(uint32_t elements, uint32_t count, + uint32_t esz, bool invert) +{ + uint32_t pred; + + if (count == 0) { + return 0; + } + if (invert) { + count = elements - count; + } else if (count == elements) { + count = 0; + invert = true; + } + + pred = (count << 1) | 1; + pred <<= esz; + pred |= invert << 15; + + return pred; +} + /* C.f. Arm pseudocode PredCountTest */ static uint32_t pred_count_test(uint32_t elements, uint32_t count, bool invert) { @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_while2l)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(2 * oprbits, count, false); } +uint32_t HELPER(sve_whilecl)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t pl = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t scale = FIELD_EX32(pred_desc, PREDDESC, DATA); + uint32_t vl = pl * 8; + uint32_t elements = (vl >> esz) << scale; + ARMPredicateReg *d = vd; + + *d = (ARMPredicateReg) { + .p[0] = encode_pred_count(elements, count, esz, false) + }; + return pred_count_test(elements, count, false); +} + /* D must be cleared on entry. */ static void do_whileg(ARMPredicateReg *d, uint64_t esz_mask, uint32_t count, uint32_t oprbits) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_while2g)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(2 * oprbits, count, true); } +uint32_t HELPER(sve_whilecg)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t pl = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t scale = FIELD_EX32(pred_desc, PREDDESC, DATA); + uint32_t vl = pl * 8; + uint32_t elements = (vl >> esz) << scale; + ARMPredicateReg *d = vd; + + *d = (ARMPredicateReg) { + .p[0] = encode_pred_count(elements, count, esz, true) + }; + return pred_count_test(elements, count, true); +} + /* Recursive reduction on a function; * C.f. the ARM ARM function ReducePredicated. * diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a) typedef void gen_while_fn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32); static bool do_WHILE(DisasContext *s, arg_while *a, - bool lt, int scale, gen_while_fn *fn) + bool lt, int scale, int data, gen_while_fn *fn) { TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2; @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, desc = FIELD_DP32(desc, PREDDESC, OPRSZ, vsz / 8); desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); + desc = FIELD_DP32(desc, PREDDESC, DATA, data); ptr = tcg_temp_new_ptr(); tcg_gen_addi_ptr(ptr, tcg_env, pred_full_reg_offset(s, a->rd)); @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, return true; } -TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, 0, gen_helper_sve_whilel) -TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, 0, gen_helper_sve_whileg) +TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, + a, true, 0, 0, gen_helper_sve_whilel) +TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, + a, false, 0, 0, gen_helper_sve_whileg) TRANS_FEAT(WHILE_lt_pair, aa64_sme2_or_sve2p1, do_WHILE, - a, true, 1, gen_helper_sve_while2l) + a, true, 1, 0, gen_helper_sve_while2l) TRANS_FEAT(WHILE_gt_pair, aa64_sme2_or_sve2p1, do_WHILE, - a, false, 1, gen_helper_sve_while2g) + a, false, 1, 0, gen_helper_sve_while2g) + +TRANS_FEAT(WHILE_lt_cnt2, aa64_sme2_or_sve2p1, do_WHILE, + a, true, 1, 1, gen_helper_sve_whilecl) +TRANS_FEAT(WHILE_lt_cnt4, aa64_sme2_or_sve2p1, do_WHILE, + a, true, 2, 2, gen_helper_sve_whilecl) +TRANS_FEAT(WHILE_gt_cnt2, aa64_sme2_or_sve2p1, do_WHILE, + a, false, 1, 1, gen_helper_sve_whilecg) +TRANS_FEAT(WHILE_gt_cnt4, aa64_sme2_or_sve2p1, do_WHILE, + a, false, 2, 2, gen_helper_sve_whilecg) static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-81-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 1 + target/arm/tcg/translate-sve.c | 16 ++++++++++++++++ 2 files changed, 17 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ PTEST 00100101 01 010000 11 pg:4 0 rn:4 0 0000 # SVE predicate initialize PTRUE 00100101 esz:2 01100 s:1 111000 pat:5 0 rd:4 +PTRUE_cnt 00100101 esz:2 1000000111100000010 ... rd=%pnd # SVE initialize FFR SETFFR 00100101 0010 1100 1001 0000 0000 0000 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag) TRANS_FEAT(PTRUE, aa64_sve, do_predset, a->esz, a->rd, a->pat, a->s) +static bool trans_PTRUE_cnt(DisasContext *s, arg_PTRUE_cnt *a) +{ + if (!dc_isar_feature(aa64_sme2_or_sve2p1, s)) { + return false; + } + if (sve_access_check(s)) { + /* Canonical TRUE is 0 count, invert bit, plus element size. */ + int val = (1 << 15) | (1 << a->esz); + + /* Write val to the first uint64_t; clear all of the rest. */ + tcg_gen_gvec_dup_imm(MO_64, pred_full_reg_offset(s, a->rd), + 8, size_for_gvec(pred_full_reg_size(s)), val); + } + return true; +} + /* Note pat == 31 is #all, to set all elements. */ TRANS_FEAT_NONSTREAMING(SETFFR, aa64_sve, do_predset, 0, FFR_PRED_NUM, 31, false) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-82-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 25 ++++++++++++++++++ target/arm/tcg/sve.decode | 7 ++++++ target/arm/tcg/sve_helper.c | 46 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 35 ++++++++++++++++++++++++++ 4 files changed, 113 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2_sqshlu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqshlu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqshlu_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqshlu_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_addqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_addqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_addqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_addqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_sminqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_sminqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_sminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_sminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_uminqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uminqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ UMAXV 00000100 .. 001 001 001 ... ..... ..... @rd_pg_rn SMINV 00000100 .. 001 010 001 ... ..... ..... @rd_pg_rn UMINV 00000100 .. 001 011 001 ... ..... ..... @rd_pg_rn +# SVE2.1 segment reduction +ADDQV 00000100 .. 000 101 001 ... ..... ..... @rd_pg_rn +SMAXQV 00000100 .. 001 100 001 ... ..... ..... @rd_pg_rn +SMINQV 00000100 .. 001 110 001 ... ..... ..... @rd_pg_rn +UMAXQV 00000100 .. 001 101 001 ... ..... ..... @rd_pg_rn +UMINQV 00000100 .. 001 111 001 ... ..... ..... @rd_pg_rn + ### SVE Shift by Immediate - Predicated Group # SVE bitwise shift by immediate (predicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN) #undef DO_VPZ #undef DO_VPZ_D +#define DO_VPQ(NAME, TYPE, H, INIT, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc) \ +{ \ + TYPE tmp[16 / sizeof(TYPE)] = { [0 ... 16 / sizeof(TYPE) - 1] = INIT }; \ + TYPE *n = vn; uint16_t *g = vg; \ + uintptr_t oprsz = simd_oprsz(desc); \ + uintptr_t nseg = oprsz / 16, nsegelt = 16 / sizeof(TYPE); \ + for (uintptr_t s = 0; s < nseg; s++) { \ + uint16_t pg = g[H2(s)]; \ + for (uintptr_t e = 0; e < nsegelt; e++, pg >>= sizeof(TYPE)) { \ + if (pg & 1) { \ + tmp[e] = OP(tmp[H(e)], n[s * nsegelt + H(e)]); \ + } \ + } \ + } \ + memcpy(vd, tmp, 16); \ + clear_tail(vd, 16, simd_maxsz(desc)); \ +} + +DO_VPQ(sve2p1_addqv_b, uint8_t, H1, 0, DO_ADD) +DO_VPQ(sve2p1_addqv_h, uint16_t, H2, 0, DO_ADD) +DO_VPQ(sve2p1_addqv_s, uint32_t, H4, 0, DO_ADD) +DO_VPQ(sve2p1_addqv_d, uint64_t, H8, 0, DO_ADD) + +DO_VPQ(sve2p1_smaxqv_b, int8_t, H1, INT8_MIN, DO_MAX) +DO_VPQ(sve2p1_smaxqv_h, int16_t, H2, INT16_MIN, DO_MAX) +DO_VPQ(sve2p1_smaxqv_s, int32_t, H4, INT32_MIN, DO_MAX) +DO_VPQ(sve2p1_smaxqv_d, int64_t, H8, INT64_MIN, DO_MAX) + +DO_VPQ(sve2p1_sminqv_b, int8_t, H1, INT8_MAX, DO_MIN) +DO_VPQ(sve2p1_sminqv_h, int16_t, H2, INT16_MAX, DO_MIN) +DO_VPQ(sve2p1_sminqv_s, int32_t, H4, INT32_MAX, DO_MIN) +DO_VPQ(sve2p1_sminqv_d, int64_t, H8, INT64_MAX, DO_MIN) + +DO_VPQ(sve2p1_umaxqv_b, uint8_t, H1, 0, DO_MAX) +DO_VPQ(sve2p1_umaxqv_h, uint16_t, H2, 0, DO_MAX) +DO_VPQ(sve2p1_umaxqv_s, uint32_t, H4, 0, DO_MAX) +DO_VPQ(sve2p1_umaxqv_d, uint64_t, H8, 0, DO_MAX) + +DO_VPQ(sve2p1_uminqv_b, uint8_t, H1, -1, DO_MIN) +DO_VPQ(sve2p1_uminqv_h, uint16_t, H2, -1, DO_MIN) +DO_VPQ(sve2p1_uminqv_s, uint32_t, H4, -1, DO_MIN) +DO_VPQ(sve2p1_uminqv_d, uint64_t, H8, -1, DO_MIN) + +#undef DO_VPQ + /* Two vector operand, one scalar operand, unpredicated. */ #define DO_ZZI(NAME, TYPE, OP) \ void HELPER(NAME)(void *vd, void *vn, uint64_t s64, uint32_t desc) \ diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SXTW, aa64_sve, gen_gvec_ool_arg_zpz, TRANS_FEAT(UXTW, aa64_sve, gen_gvec_ool_arg_zpz, a->esz == 3 ? gen_helper_sve_uxtw_d : NULL, a, 0) +static gen_helper_gvec_3 * const addqv_fns[4] = { + gen_helper_sve2p1_addqv_b, gen_helper_sve2p1_addqv_h, + gen_helper_sve2p1_addqv_s, gen_helper_sve2p1_addqv_d, +}; +TRANS_FEAT(ADDQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, addqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const smaxqv_fns[4] = { + gen_helper_sve2p1_smaxqv_b, gen_helper_sve2p1_smaxqv_h, + gen_helper_sve2p1_smaxqv_s, gen_helper_sve2p1_smaxqv_d, +}; +TRANS_FEAT(SMAXQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, smaxqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const sminqv_fns[4] = { + gen_helper_sve2p1_sminqv_b, gen_helper_sve2p1_sminqv_h, + gen_helper_sve2p1_sminqv_s, gen_helper_sve2p1_sminqv_d, +}; +TRANS_FEAT(SMINQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, sminqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const umaxqv_fns[4] = { + gen_helper_sve2p1_umaxqv_b, gen_helper_sve2p1_umaxqv_h, + gen_helper_sve2p1_umaxqv_s, gen_helper_sve2p1_umaxqv_d, +}; +TRANS_FEAT(UMAXQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, umaxqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const uminqv_fns[4] = { + gen_helper_sve2p1_uminqv_b, gen_helper_sve2p1_uminqv_h, + gen_helper_sve2p1_uminqv_s, gen_helper_sve2p1_uminqv_d, +}; +TRANS_FEAT(UMINQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, uminqv_fns[a->esz], a, 0) + /* *** SVE Integer Reduction Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-83-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 2 + target/arm/tcg/vec_internal.h | 74 ++++++++++++++++++++++++++++++++++ target/arm/tcg/sve.decode | 6 +++ target/arm/tcg/sve_helper.c | 28 +++++++++++++ target/arm/tcg/translate-sve.c | 36 +++++++++++++++++ 5 files changed, 146 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2p1_uminqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(pext, TCG_CALL_NO_RWG, void, ptr, i32, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); float32 sve_f16_to_f32(float16 f, float_status *fpst); float16 sve_f32_to_f16(float32 f, float_status *fpst); +/* + * Decode helper functions for predicate as counter. + */ + +typedef struct { + unsigned count; + unsigned lg2_stride; + bool invert; +} DecodeCounter; + +static inline DecodeCounter +decode_counter(unsigned png, unsigned vl, unsigned v_esz) +{ + DecodeCounter ret = { }; + + /* C.f. Arm pseudocode CounterToPredicate. */ + if (likely(png & 0xf)) { + unsigned p_esz = ctz32(png); + + /* + * maxbit = log2(pl(bits) * 4) + * = log2(vl(bytes) * 4) + * = log2(vl) + 2 + * maxbit_mask = ones<maxbit:0> + * = (1 << (maxbit + 1)) - 1 + * = (1 << (log2(vl) + 2 + 1)) - 1 + * = (1 << (log2(vl) + 3)) - 1 + * = (pow2ceil(vl) << 3) - 1 + */ + ret.count = png & (((unsigned)pow2ceil(vl) << 3) - 1); + ret.count >>= p_esz + 1; + + ret.invert = (png >> 15) & 1; + + /* + * The Arm pseudocode for CounterToPredicate expands the count to + * a set of bits, and then the operation proceeds as for the original + * interpretation of predicates as a set of bits. + * + * We can avoid the expansion by adjusting the count and supplying + * an element stride. + */ + if (unlikely(p_esz != v_esz)) { + if (p_esz < v_esz) { + /* + * For predicate esz < vector esz, the expanded predicate + * will have more bits set than will be consumed. + * Adjust the count down, rounding up. + * Consider p_esz = MO_8, v_esz = MO_64, count 14: + * The expanded predicate would be + * 0011 1111 1111 1111 + * The significant bits are + * ...1 ...1 ...1 ...1 + */ + unsigned shift = v_esz - p_esz; + unsigned trunc = ret.count >> shift; + ret.count = trunc + (ret.count != (trunc << shift)); + } else { + /* + * For predicate esz > vector esz, the expanded predicate + * will have bits set only at power-of-two multiples of + * the vector esz. Bits at other multiples will all be + * false. Adjust the count up, and supply the caller + * with a stride of elements to skip. + */ + unsigned shift = p_esz - v_esz; + ret.count <<= shift; + ret.lg2_stride = shift; + } + } + } + return ret; +} + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ %rn_ax2 6:4 !function=times_2 %pnd 0:3 !function=plus_8 +%pnn 5:3 !function=plus_8 ########################################################################### # Named attribute sets. These are used to make nice(er) names @@ -XXX,XX +XXX,XX @@ WHILE_lt_cnt4 00100101 .. 1 ..... 0110 . 1 ..... 1 . ... @while_cnt WHILE_gt_cnt2 00100101 .. 1 ..... 0100 . 0 ..... 1 . ... @while_cnt WHILE_gt_cnt4 00100101 .. 1 ..... 0110 . 0 ..... 1 . ... @while_cnt +# SVE2.1 extract mask predicate from predicate-as-counter +&pext rd rn esz imm +PEXT_1 00100101 esz:2 1 00000 0111 00 imm:2 ... 1 rd:4 &pext rn=%pnn +PEXT_2 00100101 esz:2 1 00000 0111 010 imm:1 ... 1 rd:4 &pext rn=%pnn + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_FCVTLT(sve2_fcvtlt_sd, uint64_t, uint32_t, H1_8, H1_4, float32_to_float64) #undef DO_FCVTLT #undef DO_FCVTNT + +void HELPER(pext)(void *vd, uint32_t png, uint32_t desc) +{ + int pl = FIELD_EX32(desc, PREDDESC, OPRSZ); + int vl = pl * 8; + unsigned v_esz = FIELD_EX32(desc, PREDDESC, ESZ); + int part = FIELD_EX32(desc, PREDDESC, DATA); + DecodeCounter p = decode_counter(png, vl, v_esz); + uint64_t mask = pred_esz_masks[v_esz + p.lg2_stride]; + ARMPredicateReg *d = vd; + + /* + * Convert from element count to byte count and adjust + * for the portion of the 4*VL counter to be extracted. + */ + int b_count = (p.count << v_esz) - vl * part; + + memset(d, 0, sizeof(*d)); + if (p.invert) { + if (b_count <= 0) { + do_whilel(vd, mask, vl, vl); + } else if (b_count < vl) { + do_whileg(vd, mask, vl - b_count, vl); + } + } else if (b_count > 0) { + do_whilel(vd, mask, MIN(b_count, vl), vl); + } +} diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) return true; } +static bool do_pext(DisasContext *s, arg_pext *a, int n) +{ + TCGv_i32 t_png; + TCGv_ptr t_pd; + int pl; + + if (!sve_access_check(s)) { + return true; + } + + t_png = tcg_temp_new_i32(); + tcg_gen_ld16u_i32(t_png, tcg_env, + pred_full_reg_offset(s, a->rn) ^ + (HOST_BIG_ENDIAN ? 6 : 0)); + + t_pd = tcg_temp_new_ptr(); + pl = pred_full_reg_size(s); + + for (int i = 0; i < n; ++i) { + int rd = (a->rd + i) % 16; + int part = a->imm * n + i; + unsigned desc = 0; + + desc = FIELD_DP32(desc, PREDDESC, OPRSZ, pl); + desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); + desc = FIELD_DP32(desc, PREDDESC, DATA, part); + + tcg_gen_addi_ptr(t_pd, tcg_env, pred_full_reg_offset(s, rd)); + gen_helper_pext(t_pd, t_png, tcg_constant_i32(desc)); + } + return true; +} + +TRANS_FEAT(PEXT_1, aa64_sme2_or_sve2p1, do_pext, a, 1) +TRANS_FEAT(PEXT_2, aa64_sme2_or_sve2p1, do_pext, a, 2) + /* *** SVE Integer Wide Immediate - Unpredicated Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-84-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 5 + target/arm/tcg/sme.decode | 9 + target/arm/tcg/sme_helper.c | 317 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 31 ++++ 4 files changed, 362 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme2_fclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i3 DEF_HELPER_FLAGS_5(sme2_fclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(sme2_fclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(sme2_bfclamp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sme2_sel_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) +DEF_HELPER_FLAGS_5(sme2_sel_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) +DEF_HELPER_FLAGS_5(sme2_sel_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) +DEF_HELPER_FLAGS_5(sme2_sel_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 1 \ UCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 1 \ &zzz_en zd=%zd_ax4 n=4 +### SME2 Multi-vector SVE Select + +%sel_pg 10:3 !function=plus_8 + +SEL 11000001 esz:2 1 ....0 100 ... ....0 ....0 \ + n=2 zd=%zd_ax2 zn=%zn_ax2 zm=%zm_ax2 pg=%sel_pg +SEL 11000001 esz:2 1 ...01 100 ... ...00 ...00 \ + n=4 zd=%zd_ax4 zn=%zn_ax4 zm=%zm_ax4 pg=%sel_pg + ### SME Multiple Zero &zero_za rv off ngrp nvec diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ FCLAMP(sme2_fclamp_d, float64, H8) FCLAMP(sme2_bfclamp, bfloat16, H2) #undef FCLAMP + +void HELPER(sme2_sel_b)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint8_t); + DecodeCounter p = decode_counter(png, vl, MO_8); + + if (p.lg2_stride == 0) { + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + for (int e = 0; e < split; e++) { + d[H1(e)] = m[H1(e)]; + } + for (int e = split; e < elements; e++) { + d[H1(e)] = n[H1(e)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + for (int e = 0; e < split; e++) { + d[H1(e)] = n[H1(e)]; + } + for (int e = split; e < elements; e++) { + d[H1(e)] = m[H1(e)]; + } + } + } + } + } else { + int estride = 1 << p.lg2_stride; + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e++) { + d[H1(e)] = m[H1(e)]; + } + for (; e < elements; e += estride) { + d[H1(e)] = n[H1(e)]; + for (int i = 1; i < estride; i++) { + d[H1(e + i)] = m[H1(e + i)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e += estride) { + d[H1(e)] = n[H1(e)]; + for (int i = 1; i < estride; i++) { + d[H1(e + i)] = m[H1(e + i)]; + } + } + for (; e < elements; e++) { + d[H1(e)] = m[H1(e)]; + } + } + } + } +} + +void HELPER(sme2_sel_h)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint16_t); + DecodeCounter p = decode_counter(png, vl, MO_16); + + if (p.lg2_stride == 0) { + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + for (int e = 0; e < split; e++) { + d[H2(e)] = m[H2(e)]; + } + for (int e = split; e < elements; e++) { + d[H2(e)] = n[H2(e)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + for (int e = 0; e < split; e++) { + d[H2(e)] = n[H2(e)]; + } + for (int e = split; e < elements; e++) { + d[H2(e)] = m[H2(e)]; + } + } + } + } + } else { + int estride = 1 << p.lg2_stride; + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e++) { + d[H2(e)] = m[H2(e)]; + } + for (; e < elements; e += estride) { + d[H2(e)] = n[H2(e)]; + for (int i = 1; i < estride; i++) { + d[H2(e + i)] = m[H2(e + i)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e += estride) { + d[H2(e)] = n[H2(e)]; + for (int i = 1; i < estride; i++) { + d[H2(e + i)] = m[H2(e + i)]; + } + } + for (; e < elements; e++) { + d[H2(e)] = m[H2(e)]; + } + } + } + } +} + +void HELPER(sme2_sel_s)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint32_t); + DecodeCounter p = decode_counter(png, vl, MO_32); + + if (p.lg2_stride == 0) { + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + for (int e = 0; e < split; e++) { + d[H4(e)] = m[H4(e)]; + } + for (int e = split; e < elements; e++) { + d[H4(e)] = n[H4(e)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + for (int e = 0; e < split; e++) { + d[H4(e)] = n[H4(e)]; + } + for (int e = split; e < elements; e++) { + d[H4(e)] = m[H4(e)]; + } + } + } + } + } else { + /* p.esz must be MO_64, so stride must be 2. */ + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e++) { + d[H4(e)] = m[H4(e)]; + } + for (; e < elements; e += 2) { + d[H4(e)] = n[H4(e)]; + d[H4(e + 1)] = m[H4(e + 1)]; + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e += 2) { + d[H4(e)] = n[H4(e)]; + d[H4(e + 1)] = m[H4(e + 1)]; + } + for (; e < elements; e++) { + d[H4(e)] = m[H4(e)]; + } + } + } + } +} + +void HELPER(sme2_sel_d)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint64_t); + DecodeCounter p = decode_counter(png, vl, MO_64); + + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint64_t *d = vd + r * sizeof(ARMVectorReg); + uint64_t *n = vn + r * sizeof(ARMVectorReg); + uint64_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + memcpy(d, m, split * sizeof(uint64_t)); + memcpy(d + split, n + split, + (elements - split) * sizeof(uint64_t)); + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint64_t *d = vd + r * sizeof(ARMVectorReg); + uint64_t *n = vn + r * sizeof(ARMVectorReg); + uint64_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + memcpy(d, n, split * sizeof(uint64_t)); + memcpy(d + split, m + split, + (elements - split) * sizeof(uint64_t)); + } + } + } +} diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const uclamp_fns[] = { gen_helper_sme2_uclamp_d, }; TRANS(UCLAMP, do_clamp, a, uclamp_fns) + +static bool trans_SEL(DisasContext *s, arg_SEL *a) +{ + typedef void sme_sel_fn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32); + static sme_sel_fn * const fns[4] = { + gen_helper_sme2_sel_b, gen_helper_sme2_sel_h, + gen_helper_sme2_sel_s, gen_helper_sme2_sel_d + }; + + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + uint32_t desc = simd_desc(svl, svl, a->n); + TCGv_ptr t_d = tcg_temp_new_ptr(); + TCGv_ptr t_n = tcg_temp_new_ptr(); + TCGv_ptr t_m = tcg_temp_new_ptr(); + TCGv_i32 png = tcg_temp_new_i32(); + + tcg_gen_addi_ptr(t_d, tcg_env, vec_full_reg_offset(s, a->zd)); + tcg_gen_addi_ptr(t_n, tcg_env, vec_full_reg_offset(s, a->zn)); + tcg_gen_addi_ptr(t_m, tcg_env, vec_full_reg_offset(s, a->zm)); + + tcg_gen_ld16u_i32(png, tcg_env, pred_full_reg_offset(s, a->pg) + ^ (HOST_BIG_ENDIAN ? 6 : 0)); + + fns[a->esz](t_d, t_n, t_m, png, tcg_constant_i32(desc)); + } + return true; +} -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-85-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 15 ++++++++++++ target/arm/tcg/sve.decode | 5 ++++ target/arm/tcg/sve_helper.c | 42 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 3 +++ 4 files changed, 65 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2p1_uminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pext, TCG_CALL_NO_RWG, void, ptr, i32, i32) + +DEF_HELPER_FLAGS_4(sve2p1_orqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_orqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_orqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_orqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_eorqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_eorqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_eorqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_eorqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_andqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_andqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_andqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_andqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ ORV 00000100 .. 011 000 001 ... ..... ..... @rd_pg_rn EORV 00000100 .. 011 001 001 ... ..... ..... @rd_pg_rn ANDV 00000100 .. 011 010 001 ... ..... ..... @rd_pg_rn +# SVE2.1 bitwise logical reduction (quadwords) +ORQV 00000100 .. 011 100 001 ... ..... ..... @rd_pg_rn +EORQV 00000100 .. 011 101 001 ... ..... ..... @rd_pg_rn +ANDQV 00000100 .. 011 110 001 ... ..... ..... @rd_pg_rn + # SVE constructive prefix (predicated) MOVPRFX_z 00000100 .. 010 000 001 ... ..... ..... @rd_pg_rn MOVPRFX_m 00000100 .. 010 001 001 ... ..... ..... @rd_pg_rn diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static inline uint64_t expand_pred_s(uint8_t byte) return word[byte & 0x11]; } +static inline uint64_t expand_pred_d(uint8_t byte) +{ + return -(uint64_t)(byte & 1); +} + #define LOGICAL_PPPP(NAME, FUNC) \ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \ { \ @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \ #define DO_EOR(N, M) (N ^ M) #define DO_ORR(N, M) (N | M) #define DO_BIC(N, M) (N & ~M) +#define DO_ORC(N, M) (N | ~M) #define DO_ADD(N, M) (N + M) #define DO_SUB(N, M) (N - M) #define DO_MAX(N, M) ((N) >= (M) ? (N) : (M)) @@ -XXX,XX +XXX,XX @@ DO_ZZI(sve_umini_d, uint64_t, DO_MIN) #undef DO_ZZI +#define DO_LOGIC_QV(NAME, SUFF, INIT, VOP, POP) \ +void HELPER(NAME ## _ ## SUFF)(void *vd, void *vn, void *vg, uint32_t desc) \ +{ \ + unsigned seg = simd_oprsz(desc) / 16; \ + uint64_t r0 = INIT, r1 = INIT; \ + for (unsigned s = 0; s < seg; s++) { \ + uint64_t p0 = expand_pred_##SUFF(*(uint8_t *)(vg + H1(s * 2))); \ + uint64_t p1 = expand_pred_##SUFF(*(uint8_t *)(vg + H1(s * 2 + 1))); \ + uint64_t v0 = *(uint64_t *)(vn + s * 16); \ + uint64_t v1 = *(uint64_t *)(vn + s * 16 + 8); \ + v0 = POP(v0, p0), v1 = POP(v1, p1); \ + r0 = VOP(r0, v0), r1 = VOP(r1, v1); \ + } \ + *(uint64_t *)(vd + 0) = r0; \ + *(uint64_t *)(vd + 8) = r1; \ + clear_tail(vd, 16, simd_maxsz(desc)); \ +} + +DO_LOGIC_QV(sve2p1_orqv, b, 0, DO_ORR, DO_AND) +DO_LOGIC_QV(sve2p1_orqv, h, 0, DO_ORR, DO_AND) +DO_LOGIC_QV(sve2p1_orqv, s, 0, DO_ORR, DO_AND) +DO_LOGIC_QV(sve2p1_orqv, d, 0, DO_ORR, DO_AND) + +DO_LOGIC_QV(sve2p1_eorqv, b, 0, DO_EOR, DO_AND) +DO_LOGIC_QV(sve2p1_eorqv, h, 0, DO_EOR, DO_AND) +DO_LOGIC_QV(sve2p1_eorqv, s, 0, DO_EOR, DO_AND) +DO_LOGIC_QV(sve2p1_eorqv, d, 0, DO_EOR, DO_AND) + +DO_LOGIC_QV(sve2p1_andqv, b, -1, DO_AND, DO_ORC) +DO_LOGIC_QV(sve2p1_andqv, h, -1, DO_AND, DO_ORC) +DO_LOGIC_QV(sve2p1_andqv, s, -1, DO_AND, DO_ORC) +DO_LOGIC_QV(sve2p1_andqv, d, -1, DO_AND, DO_ORC) + +#undef DO_LOGIC_QV + #undef DO_AND #undef DO_ORR #undef DO_EOR #undef DO_BIC +#undef DO_ORC #undef DO_ADD #undef DO_SUB #undef DO_MAX diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_ZPZ(NOT_zpz, aa64_sve, sve_not_zpz) DO_ZPZ(ABS, aa64_sve, sve_abs) DO_ZPZ(NEG, aa64_sve, sve_neg) DO_ZPZ(RBIT, aa64_sve, sve_rbit) +DO_ZPZ(ORQV, aa64_sme2p1_or_sve2p1, sve2p1_orqv) +DO_ZPZ(EORQV, aa64_sme2p1_or_sve2p1, sve2p1_eorqv) +DO_ZPZ(ANDQV, aa64_sme2p1_or_sve2p1, sve2p1_andqv) static gen_helper_gvec_3 * const fabs_fns[4] = { NULL, gen_helper_sve_fabs_h, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-86-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 49 ++++++++++++++++++++++++ target/arm/tcg/sve.decode | 8 ++++ target/arm/tcg/sve_helper.c | 70 +++++++++++++++++++++------------- target/arm/tcg/translate-sve.c | 48 +++++++++++++++++++++++ 4 files changed, 148 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ah_fminv_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_4(sve_ah_fminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_faddqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_faddqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_faddqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fmaxnmqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxnmqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxnmqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fminnmqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminnmqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminnmqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fmaxqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fminqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_ah_fmaxqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fmaxqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fmaxqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_ah_fminqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fminqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fminqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG, i64, i64, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ FMINNMV 01100101 .. 000 101 001 ... ..... ..... @rd_pg_rn FMAXV 01100101 .. 000 110 001 ... ..... ..... @rd_pg_rn FMINV 01100101 .. 000 111 001 ... ..... ..... @rd_pg_rn +### SVE FP recursive reduction (quadwords) + +FADDQV 01100100 .. 010 000 101 ... ..... ..... @rd_pg_rn +FMAXNMQV 01100100 .. 010 100 101 ... ..... ..... @rd_pg_rn +FMINNMQV 01100100 .. 010 101 101 ... ..... ..... @rd_pg_rn +FMAXQV 01100100 .. 010 110 101 ... ..... ..... @rd_pg_rn +FMINQV 01100100 .. 010 111 101 ... ..... ..... @rd_pg_rn + ## SVE Floating Point Unary Operations - Unpredicated Group FRECPE 01100101 .. 001 110 001100 ..... ..... @rd_rn diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilecg)(void *vd, uint32_t count, uint32_t pred_desc) * The recursion is bounded to depth 7 (128 fp16 elements), so there's * little to gain with a more complex non-recursive form. */ -#define DO_REDUCE(NAME, TYPE, H, FUNC, IDENT) \ -static TYPE NAME##_reduce(TYPE *data, float_status *status, uintptr_t n) \ +#define DO_REDUCE(NAME, SUF, TYPE, H, FUNC, IDENT) \ +static TYPE FUNC##_reduce(TYPE *data, float_status *status, uintptr_t n) \ { \ if (n == 1) { \ return *data; \ } else { \ uintptr_t half = n / 2; \ - TYPE lo = NAME##_reduce(data, status, half); \ - TYPE hi = NAME##_reduce(data + half, status, half); \ + TYPE lo = FUNC##_reduce(data, status, half); \ + TYPE hi = FUNC##_reduce(data + half, status, half); \ return FUNC(lo, hi, status); \ } \ } \ -uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \ +uint64_t helper_sve_##NAME##v_##SUF(void *vn, void *vg, \ + float_status *s, uint32_t desc) \ { \ uintptr_t i, oprsz = simd_oprsz(desc), maxsz = simd_data(desc); \ TYPE data[sizeof(ARMVectorReg) / sizeof(TYPE)]; \ @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \ for (; i < maxsz; i += sizeof(TYPE)) { \ *(TYPE *)((void *)data + i) = IDENT; \ } \ - return NAME##_reduce(data, s, maxsz / sizeof(TYPE)); \ + return FUNC##_reduce(data, s, maxsz / sizeof(TYPE)); \ +} \ +void helper_sve2p1_##NAME##qv_##SUF(void *vd, void *vn, void *vg, \ + float_status *status, uint32_t desc) \ +{ \ + unsigned oprsz = simd_oprsz(desc), segments = oprsz / 16; \ + for (unsigned e = 0; e < 16; e += sizeof(TYPE)) { \ + TYPE data[ARM_MAX_VQ]; \ + for (unsigned s = 0; s < segments; s++) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(s * 2)); \ + TYPE nn = *(TYPE *)(vn + H(s * 16 + H(e))); \ + data[s] = (pg >> e) & 1 ? nn : IDENT; \ + } \ + *(TYPE *)(vd + H(e)) = FUNC##_reduce(data, status, segments); \ + } \ + clear_tail(vd, 16, simd_maxsz(desc)); \ } -DO_REDUCE(sve_faddv_h, float16, H1_2, float16_add, float16_zero) -DO_REDUCE(sve_faddv_s, float32, H1_4, float32_add, float32_zero) -DO_REDUCE(sve_faddv_d, float64, H1_8, float64_add, float64_zero) +DO_REDUCE(fadd,h, float16, H1_2, float16_add, float16_zero) +DO_REDUCE(fadd,s, float32, H1_4, float32_add, float32_zero) +DO_REDUCE(fadd,d, float64, H1_8, float64_add, float64_zero) /* Identity is floatN_default_nan, without the function call. */ -DO_REDUCE(sve_fminnmv_h, float16, H1_2, float16_minnum, 0x7E00) -DO_REDUCE(sve_fminnmv_s, float32, H1_4, float32_minnum, 0x7FC00000) -DO_REDUCE(sve_fminnmv_d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL) +DO_REDUCE(fminnm,h, float16, H1_2, float16_minnum, 0x7E00) +DO_REDUCE(fminnm,s, float32, H1_4, float32_minnum, 0x7FC00000) +DO_REDUCE(fminnm,d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL) -DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, float16_maxnum, 0x7E00) -DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, float32_maxnum, 0x7FC00000) -DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL) +DO_REDUCE(fmaxnm,h, float16, H1_2, float16_maxnum, 0x7E00) +DO_REDUCE(fmaxnm,s, float32, H1_4, float32_maxnum, 0x7FC00000) +DO_REDUCE(fmaxnm,d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL) -DO_REDUCE(sve_fminv_h, float16, H1_2, float16_min, float16_infinity) -DO_REDUCE(sve_fminv_s, float32, H1_4, float32_min, float32_infinity) -DO_REDUCE(sve_fminv_d, float64, H1_8, float64_min, float64_infinity) +DO_REDUCE(fmin,h, float16, H1_2, float16_min, float16_infinity) +DO_REDUCE(fmin,s, float32, H1_4, float32_min, float32_infinity) +DO_REDUCE(fmin,d, float64, H1_8, float64_min, float64_infinity) -DO_REDUCE(sve_fmaxv_h, float16, H1_2, float16_max, float16_chs(float16_infinity)) -DO_REDUCE(sve_fmaxv_s, float32, H1_4, float32_max, float32_chs(float32_infinity)) -DO_REDUCE(sve_fmaxv_d, float64, H1_8, float64_max, float64_chs(float64_infinity)) +DO_REDUCE(fmax,h, float16, H1_2, float16_max, float16_chs(float16_infinity)) +DO_REDUCE(fmax,s, float32, H1_4, float32_max, float32_chs(float32_infinity)) +DO_REDUCE(fmax,d, float64, H1_8, float64_max, float64_chs(float64_infinity)) -DO_REDUCE(sve_ah_fminv_h, float16, H1_2, helper_vfp_ah_minh, float16_infinity) -DO_REDUCE(sve_ah_fminv_s, float32, H1_4, helper_vfp_ah_mins, float32_infinity) -DO_REDUCE(sve_ah_fminv_d, float64, H1_8, helper_vfp_ah_mind, float64_infinity) +DO_REDUCE(ah_fmin,h, float16, H1_2, helper_vfp_ah_minh, float16_infinity) +DO_REDUCE(ah_fmin,s, float32, H1_4, helper_vfp_ah_mins, float32_infinity) +DO_REDUCE(ah_fmin,d, float64, H1_8, helper_vfp_ah_mind, float64_infinity) -DO_REDUCE(sve_ah_fmaxv_h, float16, H1_2, helper_vfp_ah_maxh, +DO_REDUCE(ah_fmax,h, float16, H1_2, helper_vfp_ah_maxh, float16_chs(float16_infinity)) -DO_REDUCE(sve_ah_fmaxv_s, float32, H1_4, helper_vfp_ah_maxs, +DO_REDUCE(ah_fmax,s, float32, H1_4, helper_vfp_ah_maxs, float32_chs(float32_infinity)) -DO_REDUCE(sve_ah_fmaxv_d, float64, H1_8, helper_vfp_ah_maxd, +DO_REDUCE(ah_fmax,d, float64, H1_8, helper_vfp_ah_maxd, float64_chs(float64_infinity)) #undef DO_REDUCE diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_VPZ_AH(FMAXV, fmaxv) #undef DO_VPZ +static gen_helper_gvec_3_ptr * const faddqv_fns[4] = { + NULL, gen_helper_sve2p1_faddqv_h, + gen_helper_sve2p1_faddqv_s, gen_helper_sve2p1_faddqv_d, +}; +TRANS_FEAT(FADDQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + faddqv_fns[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fmaxnmqv_fns[4] = { + NULL, gen_helper_sve2p1_fmaxnmqv_h, + gen_helper_sve2p1_fmaxnmqv_s, gen_helper_sve2p1_fmaxnmqv_d, +}; +TRANS_FEAT(FMAXNMQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + fmaxnmqv_fns[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fminnmqv_fns[4] = { + NULL, gen_helper_sve2p1_fminnmqv_h, + gen_helper_sve2p1_fminnmqv_s, gen_helper_sve2p1_fminnmqv_d, +}; +TRANS_FEAT(FMINNMQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + fminnmqv_fns[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fmaxqv_fns[4] = { + NULL, gen_helper_sve2p1_fmaxqv_h, + gen_helper_sve2p1_fmaxqv_s, gen_helper_sve2p1_fmaxqv_d, +}; +static gen_helper_gvec_3_ptr * const fmaxqv_ah_fns[4] = { + NULL, gen_helper_sve2p1_ah_fmaxqv_h, + gen_helper_sve2p1_ah_fmaxqv_s, gen_helper_sve2p1_ah_fmaxqv_d, +}; +TRANS_FEAT(FMAXQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + (s->fpcr_ah ? fmaxqv_fns : fmaxqv_ah_fns)[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fminqv_fns[4] = { + NULL, gen_helper_sve2p1_fminqv_h, + gen_helper_sve2p1_fminqv_s, gen_helper_sve2p1_fminqv_d, +}; +static gen_helper_gvec_3_ptr * const fminqv_ah_fns[4] = { + NULL, gen_helper_sve2p1_ah_fminqv_h, + gen_helper_sve2p1_ah_fminqv_s, gen_helper_sve2p1_ah_fminqv_d, +}; +TRANS_FEAT(FMINQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + (s->fpcr_ah ? fminqv_fns : fminqv_ah_fns)[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + /* *** SVE Floating Point Unary Operations - Unpredicated Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-87-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 6 ++++++ target/arm/tcg/translate-sve.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 +BFMLSLB_zzzw 01100100 11 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_ex esz=2 +BFMLSLT_zzzw 01100100 11 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point dot-product FDOT_zzzz 01100100 00 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point multiply-add long (indexed) + FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 FMLALT_zzxw 01100100 10 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 FMLSLB_zzxw 01100100 10 1 ..... 0110.0 ..... ..... @rrxr_3a esz=2 FMLSLT_zzxw 01100100 10 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 + BFMLALB_zzxw 01100100 11 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 BFMLALT_zzxw 01100100 11 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 +BFMLSLB_zzxw 01100100 11 1 ..... 0110.0 ..... ..... @rrxr_3a esz=2 +BFMLSLT_zzxw 01100100 11 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 ### SVE2 floating-point dot-product (indexed) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel) TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false) TRANS_FEAT(BFMLALT_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, true) +static bool do_BFMLSL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel) +{ + if (s->fpcr_ah) { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_ah_bfmlsl, + a->rd, a->rn, a->rm, a->ra, sel, FPST_AH); + } else { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlsl, + a->rd, a->rn, a->rm, a->ra, sel, FPST_A64); + } +} + +TRANS_FEAT(BFMLSLB_zzzw, aa64_sme2_or_sve2p1, do_BFMLSL_zzzw, a, false) +TRANS_FEAT(BFMLSLT_zzzw, aa64_sme2_or_sve2p1, do_BFMLSL_zzzw, a, true) + +static bool do_BFMLSL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel) +{ + if (s->fpcr_ah) { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_ah_bfmlsl_idx, + a->rd, a->rn, a->rm, a->ra, + (a->index << 1) | sel, FPST_AH); + } else { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlsl_idx, + a->rd, a->rn, a->rm, a->ra, + (a->index << 1) | sel, FPST_A64); + } +} + +TRANS_FEAT(BFMLSLB_zzxw, aa64_sme2_or_sve2p1, do_BFMLSL_zzxw, a, false) +TRANS_FEAT(BFMLSLT_zzxw, aa64_sme2_or_sve2p1, do_BFMLSL_zzxw, a, true) + static bool trans_PSEL(DisasContext *s, arg_psel *a) { int vl = vec_full_reg_size(s); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-88-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 1 + target/arm/tcg/sve.decode | 3 ++- target/arm/tcg/sve_helper.c | 21 +++++++++++++++++++++ target/arm/tcg/translate-sve.c | 30 ++++++++++++++++++++++++++++++ 4 files changed, 54 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_brkn, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_brkns, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32) +DEF_HELPER_FLAGS_2(sve2p1_cntp_c, TCG_CALL_NO_RWG_SE, i64, i32, i32) DEF_HELPER_FLAGS_3(sve_whilel, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ BRKN 00100101 0. 01100001 .... 0 .... 0 .... @pd_pg_pn_s ### SVE Predicate Count Group # SVE predicate count -CNTP 00100101 .. 100 000 10 .... 0 .... ..... @rd_pg4_pn +CNTP 00100101 .. 100 000 10 .... 0 .... ..... @rd_pg4_pn +CNTP_c 00100101 esz:2 100 000 10 000 vl:1 1 rn:4 rd:5 # SVE inc/dec register by predicate count INCDECP_r 00100101 .. 10110 d:1 10001 00 .... ..... @incdec_pred u=1 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } +uint64_t HELPER(sve2p1_cntp_c)(uint32_t png, uint32_t desc) +{ + int pl = FIELD_EX32(desc, PREDDESC, OPRSZ); + int vl = pl * 8; + unsigned v_esz = FIELD_EX32(desc, PREDDESC, ESZ); + int lg2_width = FIELD_EX32(desc, PREDDESC, DATA) + 1; + DecodeCounter p = decode_counter(png, vl, v_esz); + unsigned maxelem = (vl << lg2_width) >> v_esz; + unsigned count = p.count; + + if (p.invert) { + if (count >= maxelem) { + return 0; + } + count = maxelem - count; + } else { + count = MIN(count, maxelem); + } + return count >> p.lg2_stride; +} + /* C.f. Arm pseudocode EncodePredCount */ static uint64_t encode_pred_count(uint32_t elements, uint32_t count, uint32_t esz, bool invert) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CNTP(DisasContext *s, arg_CNTP *a) return true; } +static bool trans_CNTP_c(DisasContext *s, arg_CNTP_c *a) +{ + TCGv_i32 t_png; + uint32_t desc = 0; + + if (dc_isar_feature(aa64_sve2p1, s)) { + if (!sve_access_check(s)) { + return true; + } + } else if (dc_isar_feature(aa64_sme2, s)) { + if (!sme_sm_enabled_check(s)) { + return true; + } + } else { + return false; + } + + t_png = tcg_temp_new_i32(); + tcg_gen_ld16u_i32(t_png, tcg_env, + pred_full_reg_offset(s, a->rn) ^ + (HOST_BIG_ENDIAN ? 6 : 0)); + + desc = FIELD_DP32(desc, PREDDESC, OPRSZ, pred_full_reg_size(s)); + desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); + desc = FIELD_DP32(desc, PREDDESC, DATA, a->vl); + + gen_helper_sve2p1_cntp_c(cpu_reg(s, a->rd), t_png, tcg_constant_i32(desc)); + return true; +} + static bool trans_INCDECP_r(DisasContext *s, arg_incdec_pred *a) { if (!dc_isar_feature(aa64_sve, s)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-89-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 6 ++++++ target/arm/tcg/translate-sve.c | 21 +++++++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ DUP_s 00000101 .. 1 00000 001110 ..... ..... @rd_rn DUP_x 00000101 .. 1 ..... 001000 rn:5 rd:5 \ &rri imm=%imm7_22_16 +# SVE Permute Vector - one source quadwords +DUPQ 00000101 001 imm:4 1 001001 rn:5 rd:5 &rri_esz esz=0 +DUPQ 00000101 001 imm:3 10 001001 rn:5 rd:5 &rri_esz esz=1 +DUPQ 00000101 001 imm:2 100 001001 rn:5 rd:5 &rri_esz esz=2 +DUPQ 00000101 001 imm:1 1000 001001 rn:5 rd:5 &rri_esz esz=3 + # SVE insert SIMD&FP scalar register INSR_f 00000101 .. 1 10100 001110 ..... ..... @rdn_rm diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_DUP_x(DisasContext *s, arg_DUP_x *a) return true; } +static bool trans_DUPQ(DisasContext *s, arg_DUPQ *a) +{ + unsigned vl, dofs, nofs; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + vl = vec_full_reg_size(s); + dofs = vec_full_reg_offset(s, a->rd); + nofs = vec_reg_offset(s, a->rn, a->imm, a->esz); + + for (unsigned i = 0; i < vl; i += 16) { + tcg_gen_gvec_dup_mem(a->esz, dofs + i, nofs + i, 16, 16); + } + return true; +} + static void do_insr_i64(DisasContext *s, arg_rrr_esz *a, TCGv_i64 val) { typedef void gen_insr(TCGv_ptr, TCGv_ptr, TCGv_i64, TCGv_i32); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-90-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 2 ++ target/arm/tcg/translate-sve.c | 49 ++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ DUPQ 00000101 001 imm:3 10 001001 rn:5 rd:5 &rri_esz esz=1 DUPQ 00000101 001 imm:2 100 001001 rn:5 rd:5 &rri_esz esz=2 DUPQ 00000101 001 imm:1 1000 001001 rn:5 rd:5 &rri_esz esz=3 +EXTQ 00000101 0110 imm:4 001001 rn:5 rd:5 &rri + # SVE insert SIMD&FP scalar register INSR_f 00000101 .. 1 10100 001110 ..... ..... @rdn_rm diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_EXT(DisasContext *s, int rd, int rn, int rm, int imm) TRANS_FEAT(EXT, aa64_sve, do_EXT, a->rd, a->rn, a->rm, a->imm) TRANS_FEAT(EXT_sve2, aa64_sve2, do_EXT, a->rd, a->rn, (a->rn + 1) % 32, a->imm) +static bool trans_EXTQ(DisasContext *s, arg_EXTQ *a) +{ + unsigned vl, dofs, sofs0, sofs1, sofs2, imm; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + imm = a->imm; + if (imm == 0) { + /* So far we never optimize Zdn with MOVPRFX, so zd = zn is a nop. */ + return true; + } + + vl = vec_full_reg_size(s); + dofs = vec_full_reg_offset(s, a->rd); + sofs2 = vec_full_reg_offset(s, a->rn); + + if (imm & 8) { + sofs0 = dofs + 8; + sofs1 = sofs2; + sofs2 += 8; + } else { + sofs0 = dofs; + sofs1 = dofs + 8; + } + imm = (imm & 7) << 3; + + for (unsigned i = 0; i < vl; i += 16) { + TCGv_i64 s0 = tcg_temp_new_i64(); + TCGv_i64 s1 = tcg_temp_new_i64(); + TCGv_i64 s2 = tcg_temp_new_i64(); + + tcg_gen_ld_i64(s0, tcg_env, sofs0 + i); + tcg_gen_ld_i64(s1, tcg_env, sofs1 + i); + tcg_gen_ld_i64(s2, tcg_env, sofs2 + i); + + tcg_gen_extract2_i64(s0, s0, s1, imm); + tcg_gen_extract2_i64(s1, s1, s2, imm); + + tcg_gen_st_i64(s0, tcg_env, dofs + i); + tcg_gen_st_i64(s1, tcg_env, dofs + i + 8); + } + return true; +} + /* *** SVE Permute - Unpredicated Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-91-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 8 +++ target/arm/tcg/vec_internal.h | 34 ++++++++++++ target/arm/tcg/sve.decode | 17 ++++++ target/arm/tcg/sve_helper.c | 50 +++++++++++++++++ target/arm/tcg/translate-sve.c | 98 ++++++++++++++++++++++++++++++++++ 5 files changed, 207 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2p1_andqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_andqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_andqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_andqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(pmov_pv_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_pv_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_pv_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(pmov_vp_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_vp_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_vp_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ decode_counter(unsigned png, unsigned vl, unsigned v_esz) return ret; } +/* Extract @len bits from an array of uint64_t at offset @pos bits. */ +static inline uint64_t extractn(uint64_t *p, unsigned pos, unsigned len) +{ + uint64_t x; + + p += pos / 64; + pos = pos % 64; + + x = p[0]; + if (pos + len > 64) { + x = (x >> pos) | (p[1] << (-pos & 63)); + pos = 0; + } + return extract64(x, pos, len); +} + +/* Deposit @len bits into an array of uint64_t at offset @pos bits. */ +static inline void depositn(uint64_t *p, unsigned pos, + unsigned len, uint64_t val) +{ + p += pos / 64; + pos = pos % 64; + + if (pos + len <= 64) { + p[0] = deposit64(p[0], pos, len, val); + } else { + unsigned len0 = 64 - pos; + unsigned len1 = len - len0; + + p[0] = deposit64(p[0], pos, len0, val); + p[1] = deposit64(p[1], 0, len1, val >> len0); + } +} + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ %size_23 23:2 %dtype_23_13 23:2 13:2 %index3_22_19 22:1 19:2 +%index3_22_17 22:1 17:2 %index3_19_11 19:2 11:1 %index2_20_11 20:1 11:1 @@ -XXX,XX +XXX,XX @@ INSR_r 00000101 .. 1 00100 001110 ..... ..... @rdn_rm # SVE reverse vector elements REV_v 00000101 .. 1 11000 001110 ..... ..... @rd_rn +# SVE move predicate to/from vector + +PMOV_pv 00000101 00 101 01 0001110 rn:5 0 rd:4 \ + &rri_esz esz=0 imm=0 +PMOV_pv 00000101 00 101 1 imm:1 0001110 rn:5 0 rd:4 &rri_esz esz=1 +PMOV_pv 00000101 01 101 imm:2 0001110 rn:5 0 rd:4 &rri_esz esz=2 +PMOV_pv 00000101 1. 101 .. 0001110 rn:5 0 rd:4 \ + &rri_esz esz=3 imm=%index3_22_17 + +PMOV_vp 00000101 00 101 01 1001110 0 rn:4 rd:5 \ + &rri_esz esz=0 imm=0 +PMOV_vp 00000101 00 101 1 imm:1 1001110 0 rn:4 rd:5 &rri_esz esz=1 +PMOV_vp 00000101 01 101 imm:2 1001110 0 rn:4 rd:5 &rri_esz esz=2 +PMOV_vp 00000101 1. 101 .. 1001110 0 rn:4 rd:5 \ + &rri_esz esz=3 imm=%index3_22_17 + # SVE vector table lookup TBL 00000101 .. 1 ..... 001100 ..... ..... @rd_rn_rm diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sve_rev_d)(void *vd, void *vn, uint32_t desc) } } +/* + * TODO: This could use half_shuffle64 and similar bit tricks to + * expand blocks of bits at once. + */ +#define DO_PMOV_PV(NAME, ESIZE) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + unsigned vl = simd_oprsz(desc); \ + unsigned idx = simd_data(desc); \ + unsigned elements = vl / ESIZE; \ + ARMPredicateReg *d = vd; \ + ARMVectorReg *s = vs; \ + memset(d, 0, sizeof(*d)); \ + for (unsigned e = 0; e < elements; ++e) { \ + depositn(d->p, e * ESIZE, 1, extractn(s->d, elements * idx + e, 1)); \ + } \ +} + +DO_PMOV_PV(pmov_pv_h, 2) +DO_PMOV_PV(pmov_pv_s, 4) +DO_PMOV_PV(pmov_pv_d, 8) + +#undef DO_PMOV_PV + +/* + * TODO: This could use half_unshuffle64 and similar bit tricks to + * compress blocks of bits at once. + */ +#define DO_PMOV_VP(NAME, ESIZE) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + unsigned vl = simd_oprsz(desc); \ + unsigned idx = simd_data(desc); \ + unsigned elements = vl / ESIZE; \ + ARMVectorReg *d = vd; \ + ARMPredicateReg *s = vs; \ + if (idx == 0) { \ + memset(d, 0, vl); \ + } \ + for (unsigned e = 0; e < elements; ++e) { \ + depositn(d->d, elements * idx + e, 1, extractn(s->p, e * ESIZE, 1)); \ + } \ +} + +DO_PMOV_VP(pmov_vp_h, 2) +DO_PMOV_VP(pmov_vp_s, 4) +DO_PMOV_VP(pmov_vp_d, 8) + +#undef DO_PMOV_VP + typedef void tb_impl_fn(void *, void *, void *, void *, uintptr_t, bool); static inline void do_tbl1(void *vd, void *vn, void *vm, uint32_t desc, diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const tbx_fns[4] = { }; TRANS_FEAT(TBX, aa64_sve2, gen_gvec_ool_arg_zzz, tbx_fns[a->esz], a, 0) +static bool trans_PMOV_pv(DisasContext *s, arg_PMOV_pv *a) +{ + static gen_helper_gvec_2 * const fns[4] = { + NULL, gen_helper_pmov_pv_h, + gen_helper_pmov_pv_s, gen_helper_pmov_pv_d + }; + unsigned vl, pl, vofs, pofs; + TCGv_i64 tmp; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + vl = vec_full_reg_size(s); + if (a->esz != MO_8) { + tcg_gen_gvec_2_ool(pred_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vl, vl, a->imm, fns[a->esz]); + return true; + } + + /* + * Copy the low PL bytes from vector Zn, zero-extending to a + * multiple of 8 bytes, so that Pd is properly cleared. + */ + + pl = vl / 8; + pofs = pred_full_reg_offset(s, a->rd); + vofs = vec_full_reg_offset(s, a->rn); + + QEMU_BUILD_BUG_ON(sizeof(ARMPredicateReg) != 32); + for (unsigned i = 32; i >= 8; i >>= 1) { + if (pl & i) { + tcg_gen_gvec_mov(MO_64, pofs, vofs, i, i); + pofs += i; + vofs += i; + } + } + switch (pl & 7) { + case 0: + return true; + case 2: + tmp = tcg_temp_new_i64(); + tcg_gen_ld16u_i64(tmp, tcg_env, vofs + (HOST_BIG_ENDIAN ? 6 : 0)); + break; + case 4: + tmp = tcg_temp_new_i64(); + tcg_gen_ld32u_i64(tmp, tcg_env, vofs + (HOST_BIG_ENDIAN ? 4 : 0)); + break; + case 6: + tmp = tcg_temp_new_i64(); + tcg_gen_ld_i64(tmp, tcg_env, vofs); + tcg_gen_extract_i64(tmp, tmp, 0, 48); + break; + default: + g_assert_not_reached(); + } + tcg_gen_st_i64(tmp, tcg_env, pofs); + return true; +} + +static bool trans_PMOV_vp(DisasContext *s, arg_PMOV_pv *a) +{ + static gen_helper_gvec_2 * const fns[4] = { + NULL, gen_helper_pmov_vp_h, + gen_helper_pmov_vp_s, gen_helper_pmov_vp_d + }; + unsigned vl; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + vl = vec_full_reg_size(s); + + if (a->esz == MO_8) { + /* + * The low PL bytes are copied from Pn to Zd unchanged. + * We know that the unused portion of Pn is zero, and + * that imm == 0, so the balance of Zd must be zeroed. + */ + tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, a->rd), + pred_full_reg_offset(s, a->rn), + size_for_gvec(vl / 8), vl); + } else { + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd), + pred_full_reg_offset(s, a->rn), + vl, vl, a->imm, fns[a->esz]); + } + return true; +} + static bool trans_UNPK(DisasContext *s, arg_UNPK *a) { static gen_helper_gvec_2 * const fns[4][2] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-92-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 10 ++++++++++ target/arm/tcg/sve.decode | 6 ++++++ target/arm/tcg/sve_helper.c | 29 +++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 19 ++++++++++++++++++- 4 files changed, 63 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_zip_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_zip_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_zip_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve_uzp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_uzp_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve_trn_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ UZP2_q 00000101 10 1 ..... 000 011 ..... ..... @rd_rn_rm_e0 TRN1_q 00000101 10 1 ..... 000 110 ..... ..... @rd_rn_rm_e0 TRN2_q 00000101 10 1 ..... 000 111 ..... ..... @rd_rn_rm_e0 +# SVE2.1 permute vector elements (quadwords) +ZIPQ1 01000100 .. 0 ..... 111 000 ..... ..... @rd_rn_rm +ZIPQ2 01000100 .. 0 ..... 111 001 ..... ..... @rd_rn_rm +UZPQ1 01000100 .. 0 ..... 111 010 ..... ..... @rd_rn_rm +UZPQ2 01000100 .. 0 ..... 111 011 ..... ..... @rd_rn_rm + ### SVE Permute - Predicated Group # SVE compress active elements diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_UZP(sve_uzp_s, uint32_t, H1_4) DO_UZP(sve_uzp_d, uint64_t, H1_8) DO_UZP(sve2_uzp_q, Int128, ) +typedef void perseg_zzz_fn(void *vd, void *vn, void *vm, uint32_t desc); + +static void do_perseg_zzz(void *vd, void *vn, void *vm, + uint32_t desc, perseg_zzz_fn *fn) +{ + intptr_t oprsz = simd_oprsz(desc); + + desc = simd_desc(16, 16, simd_data(desc)); + for (intptr_t i = 0; i < oprsz; i += 16) { + fn(vd + i, vn + i, vm + i, desc); + } +} + +#define DO_PERSEG_ZZZ(NAME, FUNC) \ + void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ + { do_perseg_zzz(vd, vn, vm, desc, FUNC); } + +DO_PERSEG_ZZZ(sve2p1_uzpq_b, helper_sve_uzp_b) +DO_PERSEG_ZZZ(sve2p1_uzpq_h, helper_sve_uzp_h) +DO_PERSEG_ZZZ(sve2p1_uzpq_s, helper_sve_uzp_s) +DO_PERSEG_ZZZ(sve2p1_uzpq_d, helper_sve_uzp_d) + +DO_PERSEG_ZZZ(sve2p1_zipq_b, helper_sve_zip_b) +DO_PERSEG_ZZZ(sve2p1_zipq_h, helper_sve_zip_h) +DO_PERSEG_ZZZ(sve2p1_zipq_s, helper_sve_zip_s) +DO_PERSEG_ZZZ(sve2p1_zipq_d, helper_sve_zip_d) + +#undef DO_PERSEG_ZZZ + #define DO_TRN(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ { \ diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(ZIP2_q, aa64_sve_f64mm, do_interleave_q, gen_helper_sve2_zip_q, a, QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2) +static gen_helper_gvec_3 * const zipq_fns[4] = { + gen_helper_sve2p1_zipq_b, gen_helper_sve2p1_zipq_h, + gen_helper_sve2p1_zipq_s, gen_helper_sve2p1_zipq_d, +}; +TRANS_FEAT(ZIPQ1, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + zipq_fns[a->esz], a, 0) +TRANS_FEAT(ZIPQ2, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + zipq_fns[a->esz], a, 16 / 2) + static gen_helper_gvec_3 * const uzp_fns[4] = { gen_helper_sve_uzp_b, gen_helper_sve_uzp_h, gen_helper_sve_uzp_s, gen_helper_sve_uzp_d, }; - TRANS_FEAT(UZP1_z, aa64_sve, gen_gvec_ool_arg_zzz, uzp_fns[a->esz], a, 0) TRANS_FEAT(UZP2_z, aa64_sve, gen_gvec_ool_arg_zzz, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(UZP1_q, aa64_sve_f64mm, do_interleave_q, TRANS_FEAT_NONSTREAMING(UZP2_q, aa64_sve_f64mm, do_interleave_q, gen_helper_sve2_uzp_q, a, 16) +static gen_helper_gvec_3 * const uzpq_fns[4] = { + gen_helper_sve2p1_uzpq_b, gen_helper_sve2p1_uzpq_h, + gen_helper_sve2p1_uzpq_s, gen_helper_sve2p1_uzpq_d, +}; +TRANS_FEAT(UZPQ1, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + uzpq_fns[a->esz], a, 0) +TRANS_FEAT(UZPQ2, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + uzpq_fns[a->esz], a, 1 << a->esz) + static gen_helper_gvec_3 * const trn_fns[4] = { gen_helper_sve_trn_b, gen_helper_sve_trn_h, gen_helper_sve_trn_s, gen_helper_sve_trn_d, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-93-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 10 ++++++++++ target/arm/tcg/sve.decode | 3 +++ target/arm/tcg/sve_helper.c | 10 ++++++++++ target/arm/tcg/translate-sve.c | 14 ++++++++++++++ 4 files changed, 37 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(sve_sunpk_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_sunpk_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_sunpk_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ ZIPQ2 01000100 .. 0 ..... 111 001 ..... ..... @rd_rn_rm UZPQ1 01000100 .. 0 ..... 111 010 ..... ..... @rd_rn_rm UZPQ2 01000100 .. 0 ..... 111 011 ..... ..... @rd_rn_rm +TBLQ 01000100 .. 0 ..... 111 110 ..... ..... @rd_rn_rm +TBXQ 00000101 .. 1 ..... 001 101 ..... ..... @rd_rn_rm + ### SVE Permute - Predicated Group # SVE compress active elements diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_PERSEG_ZZZ(sve2p1_zipq_h, helper_sve_zip_h) DO_PERSEG_ZZZ(sve2p1_zipq_s, helper_sve_zip_s) DO_PERSEG_ZZZ(sve2p1_zipq_d, helper_sve_zip_d) +DO_PERSEG_ZZZ(sve2p1_tblq_b, helper_sve_tbl_b) +DO_PERSEG_ZZZ(sve2p1_tblq_h, helper_sve_tbl_h) +DO_PERSEG_ZZZ(sve2p1_tblq_s, helper_sve_tbl_s) +DO_PERSEG_ZZZ(sve2p1_tblq_d, helper_sve_tbl_d) + +DO_PERSEG_ZZZ(sve2p1_tbxq_b, helper_sve2_tbx_b) +DO_PERSEG_ZZZ(sve2p1_tbxq_h, helper_sve2_tbx_h) +DO_PERSEG_ZZZ(sve2p1_tbxq_s, helper_sve2_tbx_s) +DO_PERSEG_ZZZ(sve2p1_tbxq_d, helper_sve2_tbx_d) + #undef DO_PERSEG_ZZZ #define DO_TRN(NAME, TYPE, H) \ diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4 * const sve2_tbl_fns[4] = { TRANS_FEAT(TBL_sve2, aa64_sve2, gen_gvec_ool_zzzz, sve2_tbl_fns[a->esz], a->rd, a->rn, (a->rn + 1) % 32, a->rm, 0) +static gen_helper_gvec_3 * const tblq_fns[4] = { + gen_helper_sve2p1_tblq_b, gen_helper_sve2p1_tblq_h, + gen_helper_sve2p1_tblq_s, gen_helper_sve2p1_tblq_d +}; +TRANS_FEAT(TBLQ, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + tblq_fns[a->esz], a, 0) + static gen_helper_gvec_3 * const tbx_fns[4] = { gen_helper_sve2_tbx_b, gen_helper_sve2_tbx_h, gen_helper_sve2_tbx_s, gen_helper_sve2_tbx_d }; TRANS_FEAT(TBX, aa64_sve2, gen_gvec_ool_arg_zzz, tbx_fns[a->esz], a, 0) +static gen_helper_gvec_3 * const tbxq_fns[4] = { + gen_helper_sve2p1_tbxq_b, gen_helper_sve2p1_tbxq_h, + gen_helper_sve2p1_tbxq_s, gen_helper_sve2p1_tbxq_d +}; +TRANS_FEAT(TBXQ, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + tbxq_fns[a->esz], a, 0) + static bool trans_PMOV_pv(DisasContext *s, arg_PMOV_pv *a) { static gen_helper_gvec_2 * const fns[4] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Implement the SVE2p1 consecutive register LD1/ST1, and the SME2 strided register LD1/ST1. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-94-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 16 ++ target/arm/tcg/sve.decode | 50 ++++ target/arm/tcg/sve_helper.c | 493 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 103 +++++++ 4 files changed, 662 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(pmov_pv_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pmov_vp_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pmov_vp_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pmov_vp_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2p1_ld1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) + +DEF_HELPER_FLAGS_5(sve2p1_st1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SCLAMP 01000100 .. 0 ..... 110000 ..... ..... @rda_rn_rm UCLAMP 01000100 .. 0 ..... 110001 ..... ..... @rda_rn_rm FCLAMP 01100100 .. 1 ..... 001001 ..... ..... @rda_rn_rm + +### SVE2p1 multi-vec contiguous load + +&zcrr_ldst rd png rn rm esz nreg +&zcri_ldst rd png rn imm esz nreg +%png 10:3 !function=plus_8 +%zd_ax2 1:4 !function=times_2 +%zd_ax4 2:3 !function=times_4 + +LD1_zcrr 10100000000 rm:5 0 esz:2 ... rn:5 .... - \ + &zcrr_ldst %png rd=%zd_ax2 nreg=2 +LD1_zcrr 10100000000 rm:5 1 esz:2 ... rn:5 ... 0- \ + &zcrr_ldst %png rd=%zd_ax4 nreg=4 + +ST1_zcrr 10100000001 rm:5 0 esz:2 ... rn:5 .... - \ + &zcrr_ldst %png rd=%zd_ax2 nreg=2 +ST1_zcrr 10100000001 rm:5 1 esz:2 ... rn:5 ... 0- \ + &zcrr_ldst %png rd=%zd_ax4 nreg=4 + +LD1_zcri 101000000100 imm:s4 0 esz:2 ... rn:5 .... - \ + &zcri_ldst %png rd=%zd_ax2 nreg=2 +LD1_zcri 101000000100 imm:s4 1 esz:2 ... rn:5 ... 0- \ + &zcri_ldst %png rd=%zd_ax4 nreg=4 + +ST1_zcri 101000000110 imm:s4 0 esz:2 ... rn:5 .... - \ + &zcri_ldst %png rd=%zd_ax2 nreg=2 +ST1_zcri 101000000110 imm:s4 1 esz:2 ... rn:5 ... 0- \ + &zcri_ldst %png rd=%zd_ax4 nreg=4 + +# Note: N bit and 0 bit (for nreg4) still mashed in rd. +# This is handled within gen_ldst_c(). +LD1_zcrr_stride 10100001000 rm:5 0 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=2 +LD1_zcrr_stride 10100001000 rm:5 1 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=4 + +ST1_zcrr_stride 10100001001 rm:5 0 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=2 +ST1_zcrr_stride 10100001001 rm:5 1 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=4 + +LD1_zcri_stride 101000010100 imm:s4 0 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=2 +LD1_zcri_stride 101000010100 imm:s4 1 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=4 + +ST1_zcri_stride 101000010110 imm:s4 0 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=2 +ST1_zcri_stride 101000010110 imm:s4 1 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=4 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_ST1_ZPZ_D(dd_be, zd, MO_64) #undef DO_ST1_ZPZ_S #undef DO_ST1_ZPZ_D +/* + * SVE2.1 consecutive register load/store + */ + +static unsigned sve2p1_cont_ldst_elements(SVEContLdSt *info, vaddr addr, + uint32_t png, intptr_t reg_max, + int N, int v_esz) +{ + const int esize = 1 << v_esz; + intptr_t reg_off_first = -1, reg_off_last = -1, reg_off_split; + DecodeCounter p = decode_counter(png, reg_max, v_esz); + unsigned b_count = p.count << v_esz; + unsigned b_stride = 1 << (v_esz + p.lg2_stride); + intptr_t page_split; + + /* Set all of the element indices to -1, and the TLB data to 0. */ + memset(info, -1, offsetof(SVEContLdSt, page)); + memset(info->page, 0, sizeof(info->page)); + + if (p.invert) { + if (b_count >= reg_max * N) { + return 0; + } + reg_off_first = b_count; + reg_off_last = reg_max * N - b_stride; + } else { + if (b_count == 0) { + return 0; + } + reg_off_first = 0; + reg_off_last = MIN(b_count - esize, reg_max * N - b_stride); + } + + info->reg_off_first[0] = reg_off_first; + info->mem_off_first[0] = reg_off_first; + + page_split = -(addr | TARGET_PAGE_MASK); + if (reg_off_last + esize <= page_split || reg_off_first >= page_split) { + /* The entire operation fits within a single page. */ + info->reg_off_last[0] = reg_off_last; + return b_stride; + } + + info->page_split = page_split; + reg_off_split = ROUND_DOWN(page_split, esize); + + /* + * This is the last full element on the first page, but it is not + * necessarily active. If there is no full element, i.e. the first + * active element is the one that's split, this value remains -1. + * It is useful as iteration bounds. + */ + if (reg_off_split != 0) { + info->reg_off_last[0] = ROUND_DOWN(reg_off_split - esize, b_stride); + } + + /* Determine if an unaligned element spans the pages. */ + if (page_split & (esize - 1)) { + /* It is helpful to know if the split element is active. */ + if ((reg_off_split & (b_stride - 1)) == 0) { + info->reg_off_split = reg_off_split; + info->mem_off_split = reg_off_split; + } + reg_off_split += esize; + } + + /* + * We do want the first active element on the second page, because + * this may affect the address reported in an exception. + */ + reg_off_split = ROUND_UP(reg_off_split, b_stride); + if (reg_off_split <= reg_off_last) { + info->reg_off_first[1] = reg_off_split; + info->mem_off_first[1] = reg_off_split; + info->reg_off_last[1] = reg_off_last; + } + return b_stride; +} + +static void sve2p1_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env, + target_ulong addr, unsigned estride, + int esize, int wp_access, uintptr_t ra) +{ +#ifndef CONFIG_USER_ONLY + intptr_t count_off, count_last; + int flags0 = info->page[0].flags; + int flags1 = info->page[1].flags; + + if (likely(!((flags0 | flags1) & TLB_WATCHPOINT))) { + return; + } + + /* Indicate that watchpoints are handled. */ + info->page[0].flags = flags0 & ~TLB_WATCHPOINT; + info->page[1].flags = flags1 & ~TLB_WATCHPOINT; + + if (flags0 & TLB_WATCHPOINT) { + count_off = info->reg_off_first[0]; + count_last = info->reg_off_split; + if (count_last < 0) { + count_last = info->reg_off_last[0]; + } + do { + cpu_check_watchpoint(env_cpu(env), addr + count_off, + esize, info->page[0].attrs, wp_access, ra); + count_off += estride; + } while (count_off <= count_last); + } + + count_off = info->reg_off_first[1]; + if ((flags1 & TLB_WATCHPOINT) && count_off >= 0) { + count_last = info->reg_off_last[1]; + do { + cpu_check_watchpoint(env_cpu(env), addr + count_off, + esize, info->page[1].attrs, + wp_access, ra); + count_off += estride; + } while (count_off <= count_last); + } +#endif +} + +static void sve2p1_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env, + target_ulong addr, unsigned estride, + int esize, uint32_t mtedesc, + uintptr_t ra) +{ + intptr_t count_off, count_last; + + /* + * TODO: estride is always a small power of two, <= 8. + * Manipulate the stride within the loops such that + * - first iteration hits addr + off, as required, + * - second iteration hits ALIGN_UP(addr, 16), + * - other iterations advance addr by 16. + * This will minimize the probing to once per MTE granule. + */ + + /* Process the page only if MemAttr == Tagged. */ + if (info->page[0].tagged) { + count_off = info->reg_off_first[0]; + count_last = info->reg_off_split; + if (count_last < 0) { + count_last = info->reg_off_last[0]; + } + + do { + mte_check(env, mtedesc, addr + count_off, ra); + count_off += estride; + } while (count_off <= count_last); + } + + count_off = info->reg_off_first[1]; + if (count_off >= 0 && info->page[1].tagged) { + count_last = info->reg_off_last[1]; + do { + mte_check(env, mtedesc, addr + count_off, ra); + count_off += estride; + } while (count_off <= count_last); + } +} + +static inline QEMU_ALWAYS_INLINE +void sve2p1_ld1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr, + uint32_t png, uint32_t desc, + const uintptr_t ra, const MemOp esz, + sve_ldst1_host_fn *host_fn, + sve_ldst1_tlb_fn *tlb_fn) +{ + const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2; + const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4); + uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT); + const intptr_t reg_max = simd_oprsz(desc); + const unsigned esize = 1 << esz; + intptr_t count_off, count_last; + intptr_t reg_off, reg_last, reg_n; + SVEContLdSt info; + unsigned estride, flags; + void *host; + + estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz); + if (estride == 0) { + /* The entire predicate was false; no load occurs. */ + for (unsigned n = 0; n < N; n++) { + memset(zd + n * rstride, 0, reg_max); + } + return; + } + + /* Probe the page(s). Exit with exception for any invalid page. */ + sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_LOAD, ra); + + /* Handle watchpoints for all active elements. */ + sve2p1_cont_ldst_watchpoints(&info, env, addr, estride, + esize, BP_MEM_READ, ra); + + /* + * Handle mte checks for all active elements. + * Since TBI must be set for MTE, !mtedesc => !mte_active. + */ + if (mtedesc) { + sve2p1_cont_ldst_mte_check(&info, env, estride, addr, + esize, mtedesc, ra); + } + + flags = info.page[0].flags | info.page[1].flags; + if (unlikely(flags != 0)) { + /* + * At least one page includes MMIO. + * Any bus operation can fail with cpu_transaction_failed, + * which for ARM will raise SyncExternal. Perform the load + * into scratch memory to preserve register state until the end. + */ + ARMVectorReg scratch[4] = { }; + + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[1]; + if (count_last < 0) { + count_last = info.reg_off_split; + if (count_last < 0) { + count_last = info.reg_off_last[0]; + } + } + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + + do { + reg_last = MIN(count_last - count_off, reg_max - esize); + do { + tlb_fn(env, &scratch[reg_n], reg_off, addr + count_off, ra); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + for (unsigned n = 0; n < N; ++n) { + memcpy(&zd[n * rstride], &scratch[n], reg_max); + } + return; + } + + /* The entire operation is in RAM, on valid pages. */ + + for (unsigned n = 0; n < N; ++n) { + memset(&zd[n * rstride], 0, reg_max); + } + + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[0]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[0].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + + /* + * Use the slow path to manage the cross-page misalignment. + * But we know this is RAM and cannot trap. + */ + count_off = info.reg_off_split; + if (unlikely(count_off >= 0)) { + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra); + } + + count_off = info.reg_off_first[1]; + if (unlikely(count_off >= 0)) { + count_last = info.reg_off_last[1]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[1].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + } +} + +void HELPER(sve2p1_ld1bb_c)(CPUARMState *env, void *vd, target_ulong addr, + uint32_t png, uint32_t desc) +{ + sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), MO_8, + sve_ld1bb_host, sve_ld1bb_tlb); +} + +#define DO_LD1_2(NAME, ESZ) \ +void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_le_host, sve_##NAME##_le_tlb); \ +} \ +void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_be_host, sve_##NAME##_be_tlb); \ +} + +DO_LD1_2(ld1hh, MO_16) +DO_LD1_2(ld1ss, MO_32) +DO_LD1_2(ld1dd, MO_64) + +#undef DO_LD1_2 + +static inline QEMU_ALWAYS_INLINE +void sve2p1_st1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr, + uint32_t png, uint32_t desc, + const uintptr_t ra, const int esz, + sve_ldst1_host_fn *host_fn, + sve_ldst1_tlb_fn *tlb_fn) +{ + const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2; + const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4); + uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT); + const intptr_t reg_max = simd_oprsz(desc); + const unsigned esize = 1 << esz; + intptr_t count_off, count_last; + intptr_t reg_off, reg_last, reg_n; + SVEContLdSt info; + unsigned estride, flags; + void *host; + + estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz); + if (estride == 0) { + /* The entire predicate was false; no store occurs. */ + return; + } + + /* Probe the page(s). Exit with exception for any invalid page. */ + sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_STORE, ra); + + /* Handle watchpoints for all active elements. */ + sve2p1_cont_ldst_watchpoints(&info, env, addr, estride, + esize, BP_MEM_WRITE, ra); + + /* + * Handle mte checks for all active elements. + * Since TBI must be set for MTE, !mtedesc => !mte_active. + */ + if (mtedesc) { + sve2p1_cont_ldst_mte_check(&info, env, estride, addr, + esize, mtedesc, ra); + } + + flags = info.page[0].flags | info.page[1].flags; + if (unlikely(flags != 0)) { + /* + * At least one page includes MMIO. + * Any bus operation can fail with cpu_transaction_failed, + * which for ARM will raise SyncExternal. Perform the load + * into scratch memory to preserve register state until the end. + */ + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[1]; + if (count_last < 0) { + count_last = info.reg_off_split; + if (count_last < 0) { + count_last = info.reg_off_last[0]; + } + } + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + + do { + reg_last = MIN(count_last - count_off, reg_max - esize); + do { + tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + return; + } + + /* The entire operation is in RAM, on valid pages. */ + + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[0]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[0].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + + /* + * Use the slow path to manage the cross-page misalignment. + * But we know this is RAM and cannot trap. + */ + count_off = info.reg_off_split; + if (unlikely(count_off >= 0)) { + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra); + } + + count_off = info.reg_off_first[1]; + if (unlikely(count_off >= 0)) { + count_last = info.reg_off_last[1]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[1].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + } +} + +void HELPER(sve2p1_st1bb_c)(CPUARMState *env, void *vd, target_ulong addr, + uint32_t png, uint32_t desc) +{ + sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), MO_8, + sve_st1bb_host, sve_st1bb_tlb); +} + +#define DO_ST1_2(NAME, ESZ) \ +void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_le_host, sve_##NAME##_le_tlb); \ +} \ +void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_be_host, sve_##NAME##_be_tlb); \ +} + +DO_ST1_2(st1hh, MO_16) +DO_ST1_2(st1ss, MO_32) +DO_ST1_2(st1dd, MO_64) + +#undef DO_ST1_2 + void HELPER(sve2_eor3)(void *vd, void *vn, void *vm, void *vk, uint32_t desc) { intptr_t i, opr_sz = simd_oprsz(desc) / 8; diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_uqcvtn_sh, a->rd, a->rn, 0) TRANS_FEAT(SQCVTUN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_sqcvtun_sh, a->rd, a->rn, 0) + +static bool gen_ldst_c(DisasContext *s, TCGv_i64 addr, int zd, int png, + MemOp esz, bool is_write, int n, bool strided) +{ + typedef void ldst_c_fn(TCGv_env, TCGv_ptr, TCGv_i64, + TCGv_i32, TCGv_i32); + static ldst_c_fn * const f_ldst[2][2][4] = { + { { gen_helper_sve2p1_ld1bb_c, + gen_helper_sve2p1_ld1hh_le_c, + gen_helper_sve2p1_ld1ss_le_c, + gen_helper_sve2p1_ld1dd_le_c, }, + { gen_helper_sve2p1_ld1bb_c, + gen_helper_sve2p1_ld1hh_be_c, + gen_helper_sve2p1_ld1ss_be_c, + gen_helper_sve2p1_ld1dd_be_c, } }, + + { { gen_helper_sve2p1_st1bb_c, + gen_helper_sve2p1_st1hh_le_c, + gen_helper_sve2p1_st1ss_le_c, + gen_helper_sve2p1_st1dd_le_c, }, + { gen_helper_sve2p1_st1bb_c, + gen_helper_sve2p1_st1hh_be_c, + gen_helper_sve2p1_st1ss_be_c, + gen_helper_sve2p1_st1dd_be_c, } } + }; + + TCGv_i32 t_png, t_desc; + TCGv_ptr t_zd; + uint32_t desc, lg2_rstride = 0; + bool be = s->be_data == MO_BE; + + assert(n == 2 || n == 4); + if (strided) { + lg2_rstride = 3; + if (n == 4) { + /* Validate ZD alignment. */ + if (zd & 4) { + return false; + } + lg2_rstride = 2; + } + /* Ignore non-temporal bit */ + zd &= ~8; + } + + if (strided || !dc_isar_feature(aa64_sve2p1, s) + ? !sme_sm_enabled_check(s) + : !sve_access_check(s)) { + return true; + } + + if (!s->mte_active[0]) { + addr = clean_data_tbi(s, addr); + } + + desc = n == 2 ? 0 : 1; + desc = desc | (lg2_rstride << 1); + desc = make_svemte_desc(s, vec_full_reg_size(s), 1, esz, is_write, desc); + t_desc = tcg_constant_i32(desc); + + t_png = tcg_temp_new_i32(); + tcg_gen_ld16u_i32(t_png, tcg_env, + pred_full_reg_offset(s, png) ^ + (HOST_BIG_ENDIAN ? 6 : 0)); + + t_zd = tcg_temp_new_ptr(); + tcg_gen_addi_ptr(t_zd, tcg_env, vec_full_reg_offset(s, zd)); + + f_ldst[is_write][be][esz](tcg_env, t_zd, addr, t_png, t_desc); + return true; +} + +static bool gen_ldst_zcrr_c(DisasContext *s, arg_zcrr_ldst *a, + bool is_write, bool strided) +{ + TCGv_i64 addr = tcg_temp_new_i64(); + + tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz); + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn)); + return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write, + a->nreg, strided); +} + +static bool gen_ldst_zcri_c(DisasContext *s, arg_zcri_ldst *a, + bool is_write, bool strided) +{ + TCGv_i64 addr = tcg_temp_new_i64(); + + tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), + a->imm * a->nreg * vec_full_reg_size(s)); + return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write, + a->nreg, strided); +} + +TRANS_FEAT(LD1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, false, false) +TRANS_FEAT(LD1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, false, false) +TRANS_FEAT(ST1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, true, false) +TRANS_FEAT(ST1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, true, false) + +TRANS_FEAT(LD1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, false, true) +TRANS_FEAT(LD1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, false, true) +TRANS_FEAT(ST1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, true, true) +TRANS_FEAT(ST1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, true, true) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> The msz > esz encodings are reserved, and some of them are about to be reused. Split these patterns so that the new insns do not overlap. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-95-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ # Stores; user must fill in ESZ, MSZ, NREG as needed. @rprr_store ....... .. .. rm:5 ... pg:3 rn:5 rd:5 &rprr_store -@rpri_store_msz ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5 &rpri_store +@rpri_store ....... .. .. . imm:s4 ... pg:3 rn:5 rd:5 &rpri_store @rprr_store_esz_n0 ....... .. esz:2 rm:5 ... pg:3 rn:5 rd:5 \ &rprr_store nreg=0 @rprr_scatter_store ....... msz:2 .. rm:5 ... pg:3 rn:5 rd:5 \ @@ -XXX,XX +XXX,XX @@ STR_zri 1110010 11 0. ..... 010 ... ..... ..... @rd_rn_i9 # SVE contiguous store (scalar plus immediate) # ST1B, ST1H, ST1W, ST1D; require msz <= esz -ST_zpri 1110010 .. esz:2 0.... 111 ... ..... ..... \ - @rpri_store_msz nreg=0 +ST_zpri 1110010 00 esz:2 0.... 111 ... ..... ..... \ + @rpri_store msz=0 nreg=0 +ST_zpri 1110010 01 esz:2 0.... 111 ... ..... ..... \ + @rpri_store msz=1 nreg=0 +ST_zpri 1110010 10 10 0.... 111 ... ..... ..... \ + @rpri_store msz=2 esz=2 nreg=0 +ST_zpri 1110010 10 11 0.... 111 ... ..... ..... \ + @rpri_store msz=2 esz=3 nreg=0 +ST_zpri 1110010 11 11 0.... 111 ... ..... ..... \ + @rpri_store msz=3 esz=3 nreg=0 # SVE contiguous store (scalar plus scalar) # ST1B, ST1H, ST1W, ST1D; require msz <= esz @@ -XXX,XX +XXX,XX @@ ST_zprr 1110010 00 .. ..... 010 ... ..... ..... \ @rprr_store_esz_n0 msz=0 ST_zprr 1110010 01 .. ..... 010 ... ..... ..... \ @rprr_store_esz_n0 msz=1 -ST_zprr 1110010 10 .. ..... 010 ... ..... ..... \ - @rprr_store_esz_n0 msz=2 +ST_zprr 1110010 10 10 ..... 010 ... ..... ..... \ + @rprr_store msz=2 esz=2 nreg=0 +ST_zprr 1110010 10 11 ..... 010 ... ..... ..... \ + @rprr_store msz=2 esz=3 nreg=0 ST_zprr 1110010 11 11 ..... 010 ... ..... ..... \ @rprr_store msz=3 esz=3 nreg=0 # SVE contiguous non-temporal store (scalar plus immediate) (nreg == 0) # SVE store multiple structures (scalar plus immediate) (nreg != 0) ST_zpri 1110010 .. nreg:2 1.... 111 ... ..... ..... \ - @rpri_store_msz esz=%size_23 + @rpri_store msz=%size_23 esz=%size_23 # SVE contiguous non-temporal store (scalar plus scalar) (nreg == 0) # SVE store multiple structures (scalar plus scalar) (nreg != 0) -ST_zprr 1110010 msz:2 nreg:2 ..... 011 ... ..... ..... \ - @rprr_store esz=%size_23 +ST_zprr 1110010 .. nreg:2 ..... 011 ... ..... ..... \ + @rprr_store msz=%size_23 esz=%size_23 # SVE 32-bit scatter store (scalar plus 32-bit scaled offsets) # Require msz > 0 && msz <= esz. -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-96-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 22 +++++ target/arm/tcg/sve_ldst_internal.h | 26 ++++++ target/arm/tcg/sve.decode | 20 +++++ target/arm/tcg/sve_helper.c | 6 ++ target/arm/tcg/translate-sve.c | 136 +++++++++++++++++++++++------ 5 files changed, 183 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld2bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld1hds_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ldff1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ldff1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ldff1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st1hd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_st1bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st2bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st3bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st1hd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldhsu_le_zsu, TCG_CALL_NO_WG, diff --git a/target/arm/tcg/sve_ldst_internal.h b/target/arm/tcg/sve_ldst_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_ldst_internal.h +++ b/target/arm/tcg/sve_ldst_internal.h @@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl) DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq) DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq) +#define DO_LD_PRIM_3(NAME, FUNC) \ + static inline void sve_##NAME##_host(void *vd, \ + intptr_t reg_off, void *host) \ + { sve_##FUNC##_host(vd, reg_off, host); \ + *(uint64_t *)(vd + reg_off + 8) = 0; } \ + static inline void sve_##NAME##_tlb(CPUARMState *env, void *vd, \ + intptr_t reg_off, target_ulong addr, uintptr_t ra) \ + { sve_##FUNC##_tlb(env, vd, reg_off, addr, ra); \ + *(uint64_t *)(vd + reg_off + 8) = 0; } + +DO_LD_PRIM_3(ld1squ_be, ld1sdu_be) +DO_LD_PRIM_3(ld1squ_le, ld1sdu_le) +DO_LD_PRIM_3(ld1dqu_be, ld1dd_be) +DO_LD_PRIM_3(ld1dqu_le, ld1dd_le) + +#define sve_st1sq_be_host sve_st1sd_be_host +#define sve_st1sq_le_host sve_st1sd_le_host +#define sve_st1sq_be_tlb sve_st1sd_be_tlb +#define sve_st1sq_le_tlb sve_st1sd_le_tlb + +#define sve_st1dq_be_host sve_st1dd_be_host +#define sve_st1dq_le_host sve_st1dd_le_host +#define sve_st1dq_be_tlb sve_st1dd_be_tlb +#define sve_st1dq_le_tlb sve_st1dd_le_tlb + #undef DO_LD_TLB #undef DO_ST_TLB #undef DO_LD_HOST @@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq) #undef DO_ST_PRIM_1 #undef DO_LD_PRIM_2 #undef DO_ST_PRIM_2 +#undef DO_LD_PRIM_3 /* * Resolve the guest virtual address to info->host and info->flags. diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ LD1_zpiz 1000010 .. 01 ..... 1.. ... ..... ..... \ # SVE contiguous load (scalar plus scalar) LD_zprr 1010010 .... ..... 010 ... ..... ..... @rprr_load_dt nreg=0 +# LD1W (128-bit element) +LD_zprr 1010010 1000 rm:5 100 pg:3 rn:5 rd:5 \ + &rprr_load dtype=16 nreg=0 +# LD1D (128-bit element) +LD_zprr 1010010 1100 rm:5 100 pg:3 rn:5 rd:5 \ + &rprr_load dtype=17 nreg=0 # SVE contiguous first-fault load (scalar plus scalar) LDFF1_zprr 1010010 .... ..... 011 ... ..... ..... @rprr_load_dt nreg=0 # SVE contiguous load (scalar plus immediate) LD_zpri 1010010 .... 0.... 101 ... ..... ..... @rpri_load_dt nreg=0 +# LD1W (128-bit element) +LD_zpri 1010010 1000 1 imm:s4 001 pg:3 rn:5 rd:5 \ + &rpri_load dtype=16 nreg=0 +# LD1D (128-bit element) +LD_zpri 1010010 1100 1 imm:s4 001 pg:3 rn:5 rd:5 \ + &rpri_load dtype=17 nreg=0 # SVE contiguous non-fault load (scalar plus immediate) LDNF1_zpri 1010010 .... 1.... 101 ... ..... ..... @rpri_load_dt nreg=0 @@ -XXX,XX +XXX,XX @@ ST_zpri 1110010 10 11 0.... 111 ... ..... ..... \ @rpri_store msz=2 esz=3 nreg=0 ST_zpri 1110010 11 11 0.... 111 ... ..... ..... \ @rpri_store msz=3 esz=3 nreg=0 +ST_zpri 1110010 10 00 0.... 111 ... ..... ..... \ + @rpri_store msz=2 esz=4 nreg=0 +ST_zpri 1110010 11 10 0.... 111 ... ..... ..... \ + @rpri_store msz=3 esz=4 nreg=0 # SVE contiguous store (scalar plus scalar) # ST1B, ST1H, ST1W, ST1D; require msz <= esz @@ -XXX,XX +XXX,XX @@ ST_zprr 1110010 10 11 ..... 010 ... ..... ..... \ @rprr_store msz=2 esz=3 nreg=0 ST_zprr 1110010 11 11 ..... 010 ... ..... ..... \ @rprr_store msz=3 esz=3 nreg=0 +ST_zprr 1110010 10 00 ..... 010 ... ..... ..... \ + @rprr_store msz=2 esz=4 nreg=0 +ST_zprr 1110010 11 10 ..... 010 ... ..... ..... \ + @rprr_store msz=3 esz=4 nreg=0 # SVE contiguous non-temporal store (scalar plus immediate) (nreg == 0) # SVE store multiple structures (scalar plus immediate) (nreg != 0) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_LD1_2(ld1sds, MO_64, MO_32) DO_LD1_2(ld1dd, MO_64, MO_64) +DO_LD1_2(ld1squ, MO_32, MO_128) +DO_LD1_2(ld1dqu, MO_64, MO_128) + #undef DO_LD1_1 #undef DO_LD1_2 @@ -XXX,XX +XXX,XX @@ DO_STN_2(2, dd, MO_64, MO_64) DO_STN_2(3, dd, MO_64, MO_64) DO_STN_2(4, dd, MO_64, MO_64) +DO_STN_2(1, sq, MO_128, MO_32) +DO_STN_2(1, dq, MO_128, MO_64) + #undef DO_STN_1 #undef DO_STN_2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_STR_pri(DisasContext *s, arg_rri *a) */ /* The memory mode of the dtype. */ -static const MemOp dtype_mop[16] = { +static const MemOp dtype_mop[19] = { MO_UB, MO_UB, MO_UB, MO_UB, MO_SL, MO_UW, MO_UW, MO_UW, MO_SW, MO_SW, MO_UL, MO_UL, - MO_SB, MO_SB, MO_SB, MO_UQ + MO_SB, MO_SB, MO_SB, MO_UQ, + /* Artificial values used by decode */ + MO_UL, MO_UQ, MO_128, }; #define dtype_msz(x) (dtype_mop[x] & MO_SIZE) /* The vector element size of dtype. */ -static const uint8_t dtype_esz[16] = { +static const uint8_t dtype_esz[19] = { 0, 1, 2, 3, 3, 1, 2, 3, 3, 2, 2, 3, - 3, 2, 1, 3 + 3, 2, 1, 3, + /* Artificial values used by decode */ + 4, 4, 4, }; uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs, @@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, } /* Indexed by [mte][be][dtype][nreg] */ -static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { +static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { /* mte inactive, little-endian */ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r }, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1bss_r, NULL, NULL, NULL }, { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r, - gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } }, + gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r }, + + { gen_helper_sve_ld1squ_le_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_le_r, NULL, NULL, NULL }, + }, /* mte inactive, big-endian */ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1bss_r, NULL, NULL, NULL }, { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r, - gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } } }, + gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r }, + + { gen_helper_sve_ld1squ_be_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_be_r, NULL, NULL, NULL }, + }, + }, { /* mte active, little-endian */ { { gen_helper_sve_ld1bb_r_mte, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1dd_le_r_mte, gen_helper_sve_ld2dd_le_r_mte, gen_helper_sve_ld3dd_le_r_mte, - gen_helper_sve_ld4dd_le_r_mte } }, + gen_helper_sve_ld4dd_le_r_mte }, + + { gen_helper_sve_ld1squ_le_r_mte, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_le_r_mte, NULL, NULL, NULL }, + }, /* mte active, big-endian */ { { gen_helper_sve_ld1bb_r_mte, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1dd_be_r_mte, gen_helper_sve_ld2dd_be_r_mte, gen_helper_sve_ld3dd_be_r_mte, - gen_helper_sve_ld4dd_be_r_mte } } }, + gen_helper_sve_ld4dd_be_r_mte }, + + { gen_helper_sve_ld1squ_be_r_mte, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_be_r_mte, NULL, NULL, NULL }, + }, + }, }; static void do_ld_zpa(DisasContext *s, int zt, int pg, @@ -XXX,XX +XXX,XX @@ static void do_ld_zpa(DisasContext *s, int zt, int pg, static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) { - if (a->rm == 31 || !dc_isar_feature(aa64_sve, s)) { + if (a->rm == 31) { return false; } + + /* dtypes 16 and 17 are artificial, representing 128-bit element */ + if (a->dtype < 16) { + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + } else { + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + } + if (sve_access_check(s)) { TCGv_i64 addr = tcg_temp_new_i64(); tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype)); @@ -XXX,XX +XXX,XX @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a) { - if (!dc_isar_feature(aa64_sve, s)) { - return false; + /* dtypes 16 and 17 are artificial, representing 128-bit element */ + if (a->dtype < 16) { + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + } else { + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; } + if (sve_access_check(s)) { int vsz = vec_full_reg_size(s); int elements = vsz >> dtype_esz[a->dtype]; @@ -XXX,XX +XXX,XX @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a) static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz, int esz, int nreg) { - static gen_helper_gvec_mem * const fn_single[2][2][4][4] = { + static gen_helper_gvec_mem * const fn_single[2][2][4][5] = { { { { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r, gen_helper_sve_st1bs_r, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_le_r }, { NULL, NULL, gen_helper_sve_st1ss_le_r, - gen_helper_sve_st1sd_le_r }, + gen_helper_sve_st1sd_le_r, + gen_helper_sve_st1sq_le_r, }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_le_r } }, + gen_helper_sve_st1dd_le_r, + gen_helper_sve_st1dq_le_r, } }, { { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r, gen_helper_sve_st1bs_r, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_be_r }, { NULL, NULL, gen_helper_sve_st1ss_be_r, - gen_helper_sve_st1sd_be_r }, + gen_helper_sve_st1sd_be_r, + gen_helper_sve_st1sq_be_r }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_be_r } } }, + gen_helper_sve_st1dd_be_r, + gen_helper_sve_st1dq_be_r } } }, { { { gen_helper_sve_st1bb_r_mte, gen_helper_sve_st1bh_r_mte, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_le_r_mte }, { NULL, NULL, gen_helper_sve_st1ss_le_r_mte, - gen_helper_sve_st1sd_le_r_mte }, + gen_helper_sve_st1sd_le_r_mte, + gen_helper_sve_st1sq_le_r_mte }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_le_r_mte } }, + gen_helper_sve_st1dd_le_r_mte, + gen_helper_sve_st1dq_le_r_mte } }, { { gen_helper_sve_st1bb_r_mte, gen_helper_sve_st1bh_r_mte, gen_helper_sve_st1bs_r_mte, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_be_r_mte }, { NULL, NULL, gen_helper_sve_st1ss_be_r_mte, - gen_helper_sve_st1sd_be_r_mte }, + gen_helper_sve_st1sd_be_r_mte, + gen_helper_sve_st1sq_be_r_mte }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_be_r_mte } } }, + gen_helper_sve_st1dd_be_r_mte, + gen_helper_sve_st1dq_be_r_mte } } }, }; static gen_helper_gvec_mem * const fn_multiple[2][2][3][4] = { { { { gen_helper_sve_st2bb_r, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a) { - if (!dc_isar_feature(aa64_sve, s)) { - return false; - } if (a->rm == 31 || a->msz > a->esz) { return false; } + switch (a->esz) { + case MO_8 ... MO_64: + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + break; + case MO_128: + assert(a->msz < a->esz); + assert(a->nreg == 0); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + break; + default: + g_assert_not_reached(); + } + if (sve_access_check(s)) { TCGv_i64 addr = tcg_temp_new_i64(); tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->msz); @@ -XXX,XX +XXX,XX @@ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a) static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a) { - if (!dc_isar_feature(aa64_sve, s)) { - return false; - } if (a->msz > a->esz) { return false; } + switch (a->esz) { + case MO_8 ... MO_64: + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + break; + case MO_128: + assert(a->msz < a->esz); + assert(a->nreg == 0); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + break; + default: + g_assert_not_reached(); + } + if (sve_access_check(s)) { int vsz = vec_full_reg_size(s); int elements = vsz >> a->esz; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Move from sme_helper.c to the shared header. Add a comment noting the lack of atomicity. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-97-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_ldst_internal.h | 63 ++++++++++++++++++++++++++++++ target/arm/tcg/sme_helper.c | 44 +++------------------ 2 files changed, 69 insertions(+), 38 deletions(-) diff --git a/target/arm/tcg/sve_ldst_internal.h b/target/arm/tcg/sve_ldst_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_ldst_internal.h +++ b/target/arm/tcg/sve_ldst_internal.h @@ -XXX,XX +XXX,XX @@ DO_LD_PRIM_3(ld1dqu_le, ld1dd_le) #define sve_st1dq_be_tlb sve_st1dd_be_tlb #define sve_st1dq_le_tlb sve_st1dd_le_tlb +/* + * The ARMVectorReg elements are stored in host-endian 64-bit units. + * For 128-bit quantities, the sequence defined by the Elem[] pseudocode + * corresponds to storing the two 64-bit pieces in little-endian order. + */ +/* FIXME: Nothing in this file makes any effort at atomicity. */ + +static inline void sve_ld1qq_be_host(void *vd, intptr_t reg_off, void *host) +{ + sve_ld1dd_be_host(vd, reg_off + 8, host); + sve_ld1dd_be_host(vd, reg_off, host + 8); +} + +static inline void sve_ld1qq_le_host(void *vd, intptr_t reg_off, void *host) +{ + sve_ld1dd_le_host(vd, reg_off, host); + sve_ld1dd_le_host(vd, reg_off + 8, host + 8); +} + +static inline void +sve_ld1qq_be_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_ld1dd_be_tlb(env, vd, reg_off + 8, addr, ra); + sve_ld1dd_be_tlb(env, vd, reg_off, addr + 8, ra); +} + +static inline void +sve_ld1qq_le_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_ld1dd_le_tlb(env, vd, reg_off, addr, ra); + sve_ld1dd_le_tlb(env, vd, reg_off + 8, addr + 8, ra); +} + +static inline void sve_st1qq_be_host(void *vd, intptr_t reg_off, void *host) +{ + sve_st1dd_be_host(vd, reg_off + 8, host); + sve_st1dd_be_host(vd, reg_off, host + 8); +} + +static inline void sve_st1qq_le_host(void *vd, intptr_t reg_off, void *host) +{ + sve_st1dd_le_host(vd, reg_off, host); + sve_st1dd_le_host(vd, reg_off + 8, host + 8); +} + +static inline void +sve_st1qq_be_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_st1dd_be_tlb(env, vd, reg_off + 8, addr, ra); + sve_st1dd_be_tlb(env, vd, reg_off, addr + 8, ra); +} + +static inline void +sve_st1qq_le_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_st1dd_le_tlb(env, vd, reg_off, addr, ra); + sve_st1dd_le_tlb(env, vd, reg_off + 8, addr + 8, ra); +} + #undef DO_LD_TLB #undef DO_ST_TLB #undef DO_LD_HOST diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static inline void sme_##NAME##_v_tlb(CPUARMState *env, void *za, \ TLB(env, useronly_clean_ptr(addr), val, ra); \ } -/* - * The ARMVectorReg elements are stored in host-endian 64-bit units. - * For 128-bit quantities, the sequence defined by the Elem[] pseudocode - * corresponds to storing the two 64-bit pieces in little-endian order. - */ -#define DO_LDQ(HNAME, VNAME, BE, HOST, TLB) \ -static inline void HNAME##_host(void *za, intptr_t off, void *host) \ -{ \ - uint64_t val0 = HOST(host), val1 = HOST(host + 8); \ - uint64_t *ptr = za + off; \ - ptr[0] = BE ? val1 : val0, ptr[1] = BE ? val0 : val1; \ -} \ +#define DO_LDQ(HNAME, VNAME) \ static inline void VNAME##_v_host(void *za, intptr_t off, void *host) \ { \ HNAME##_host(za, tile_vslice_offset(off), host); \ } \ -static inline void HNAME##_tlb(CPUARMState *env, void *za, intptr_t off, \ - target_ulong addr, uintptr_t ra) \ -{ \ - uint64_t val0 = TLB(env, useronly_clean_ptr(addr), ra); \ - uint64_t val1 = TLB(env, useronly_clean_ptr(addr + 8), ra); \ - uint64_t *ptr = za + off; \ - ptr[0] = BE ? val1 : val0, ptr[1] = BE ? val0 : val1; \ -} \ static inline void VNAME##_v_tlb(CPUARMState *env, void *za, intptr_t off, \ target_ulong addr, uintptr_t ra) \ { \ HNAME##_tlb(env, za, tile_vslice_offset(off), addr, ra); \ } -#define DO_STQ(HNAME, VNAME, BE, HOST, TLB) \ -static inline void HNAME##_host(void *za, intptr_t off, void *host) \ -{ \ - uint64_t *ptr = za + off; \ - HOST(host, ptr[BE]); \ - HOST(host + 8, ptr[!BE]); \ -} \ +#define DO_STQ(HNAME, VNAME) \ static inline void VNAME##_v_host(void *za, intptr_t off, void *host) \ { \ HNAME##_host(za, tile_vslice_offset(off), host); \ } \ -static inline void HNAME##_tlb(CPUARMState *env, void *za, intptr_t off, \ - target_ulong addr, uintptr_t ra) \ -{ \ - uint64_t *ptr = za + off; \ - TLB(env, useronly_clean_ptr(addr), ptr[BE], ra); \ - TLB(env, useronly_clean_ptr(addr + 8), ptr[!BE], ra); \ -} \ static inline void VNAME##_v_tlb(CPUARMState *env, void *za, intptr_t off, \ target_ulong addr, uintptr_t ra) \ { \ @@ -XXX,XX +XXX,XX @@ DO_LD(ld1s_le, uint32_t, ldl_le_p, cpu_ldl_le_data_ra) DO_LD(ld1d_be, uint64_t, ldq_be_p, cpu_ldq_be_data_ra) DO_LD(ld1d_le, uint64_t, ldq_le_p, cpu_ldq_le_data_ra) -DO_LDQ(sve_ld1qq_be, sme_ld1q_be, 1, ldq_be_p, cpu_ldq_be_data_ra) -DO_LDQ(sve_ld1qq_le, sme_ld1q_le, 0, ldq_le_p, cpu_ldq_le_data_ra) +DO_LDQ(sve_ld1qq_be, sme_ld1q_be) +DO_LDQ(sve_ld1qq_le, sme_ld1q_le) DO_ST(st1b, uint8_t, stb_p, cpu_stb_data_ra) DO_ST(st1h_be, uint16_t, stw_be_p, cpu_stw_be_data_ra) @@ -XXX,XX +XXX,XX @@ DO_ST(st1s_le, uint32_t, stl_le_p, cpu_stl_le_data_ra) DO_ST(st1d_be, uint64_t, stq_be_p, cpu_stq_be_data_ra) DO_ST(st1d_le, uint64_t, stq_le_p, cpu_stq_le_data_ra) -DO_STQ(sve_st1qq_be, sme_st1q_be, 1, stq_be_p, cpu_stq_be_data_ra) -DO_STQ(sve_st1qq_le, sme_st1q_le, 0, stq_le_p, cpu_stq_le_data_ra) +DO_STQ(sve_st1qq_be, sme_st1q_be) +DO_STQ(sve_st1qq_le, sme_st1q_le) #undef DO_LD #undef DO_ST -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-98-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 32 +++++++++ target/arm/tcg/sve.decode | 31 +++++++++ target/arm/tcg/sve_helper.c | 8 +++ target/arm/tcg/translate-sve.c | 116 ++++++++++++++++++++++++--------- 4 files changed, 156 insertions(+), 31 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld2qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld2dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld4dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld2qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1bhu_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bsu_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bdu_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st2qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_st1bh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st2dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st3dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st4dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st2qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_st1bh_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bs_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bd_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ @rprr_load_dt ....... dtype:4 rm:5 ... pg:3 rn:5 rd:5 &rprr_load @rpri_load_dt ....... dtype:4 . imm:s4 ... pg:3 rn:5 rd:5 &rpri_load +@rprr_load ....... .... rm:5 ... pg:3 rn:5 rd:5 &rprr_load +@rpri_load ....... .... . imm:s4 ... pg:3 rn:5 rd:5 &rpri_load + @rprr_load_msz ....... .... rm:5 ... pg:3 rn:5 rd:5 \ &rprr_load dtype=%msz_dtype @rpri_load_msz ....... .... . imm:s4 ... pg:3 rn:5 rd:5 \ @@ -XXX,XX +XXX,XX @@ LDNF1_zpri 1010010 .... 1.... 101 ... ..... ..... @rpri_load_dt nreg=0 # SVE load multiple structures (scalar plus scalar) # LD2B, LD2H, LD2W, LD2D; etc. LD_zprr 1010010 .. nreg:2 ..... 110 ... ..... ..... @rprr_load_msz +# LD[234]Q +LD_zprr 1010010 01 01 ..... 100 ... ..... ..... \ + @rprr_load dtype=18 nreg=1 +LD_zprr 1010010 10 01 ..... 100 ... ..... ..... \ + @rprr_load dtype=18 nreg=2 +LD_zprr 1010010 11 01 ..... 100 ... ..... ..... \ + @rprr_load dtype=18 nreg=3 # SVE contiguous non-temporal load (scalar plus immediate) # LDNT1B, LDNT1H, LDNT1W, LDNT1D # SVE load multiple structures (scalar plus immediate) # LD2B, LD2H, LD2W, LD2D; etc. LD_zpri 1010010 .. nreg:2 0.... 111 ... ..... ..... @rpri_load_msz +# LD[234]Q +LD_zpri 1010010 01 001 .... 111 ... ..... ..... \ + @rpri_load dtype=18 nreg=1 +LD_zpri 1010010 10 001 .... 111 ... ..... ..... \ + @rpri_load dtype=18 nreg=2 +LD_zpri 1010010 11 001 .... 111 ... ..... ..... \ + @rpri_load dtype=18 nreg=3 # SVE load and broadcast quadword (scalar plus scalar) LD1RQ_zprr 1010010 .. 00 ..... 000 ... ..... ..... \ @@ -XXX,XX +XXX,XX @@ ST_zprr 1110010 11 10 ..... 010 ... ..... ..... \ # SVE store multiple structures (scalar plus immediate) (nreg != 0) ST_zpri 1110010 .. nreg:2 1.... 111 ... ..... ..... \ @rpri_store msz=%size_23 esz=%size_23 +# ST[234]Q +ST_zpri 11100100 01 00 .... 000 ... ..... ..... \ + @rpri_store msz=4 esz=4 nreg=1 +ST_zpri 11100100 10 00 .... 000 ... ..... ..... \ + @rpri_store msz=4 esz=4 nreg=2 +ST_zpri 11100100 11 00 .... 000 ... ..... ..... \ + @rpri_store msz=4 esz=4 nreg=3 # SVE contiguous non-temporal store (scalar plus scalar) (nreg == 0) # SVE store multiple structures (scalar plus scalar) (nreg != 0) ST_zprr 1110010 .. nreg:2 ..... 011 ... ..... ..... \ @rprr_store msz=%size_23 esz=%size_23 +# ST[234]Q +ST_zprr 11100100 01 1 ..... 000 ... ..... ..... \ + @rprr_store msz=4 esz=4 nreg=1 +ST_zprr 11100100 10 1 ..... 000 ... ..... ..... \ + @rprr_store msz=4 esz=4 nreg=2 +ST_zprr 11100100 11 1 ..... 000 ... ..... ..... \ + @rprr_store msz=4 esz=4 nreg=3 # SVE 32-bit scatter store (scalar plus 32-bit scaled offsets) # Require msz > 0 && msz <= esz. diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_LDN_2(2, dd, MO_64) DO_LDN_2(3, dd, MO_64) DO_LDN_2(4, dd, MO_64) +DO_LDN_2(2, qq, MO_128) +DO_LDN_2(3, qq, MO_128) +DO_LDN_2(4, qq, MO_128) + #undef DO_LDN_1 #undef DO_LDN_2 @@ -XXX,XX +XXX,XX @@ DO_STN_2(4, dd, MO_64, MO_64) DO_STN_2(1, sq, MO_128, MO_32) DO_STN_2(1, dq, MO_128, MO_64) +DO_STN_2(2, qq, MO_128, MO_128) +DO_STN_2(3, qq, MO_128, MO_128) +DO_STN_2(4, qq, MO_128, MO_128) + #undef DO_STN_1 #undef DO_STN_2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static inline int expand_imm_sh8u(DisasContext *s, int x) */ static inline int msz_dtype(DisasContext *s, int msz) { - static const uint8_t dtype[4] = { 0, 5, 10, 15 }; + static const uint8_t dtype[5] = { 0, 5, 10, 15, 18 }; return dtype[msz]; } @@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, } /* Indexed by [mte][be][dtype][nreg] */ -static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { +static gen_helper_gvec_mem * const ldr_fns[2][2][19][4] = { { /* mte inactive, little-endian */ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r }, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_le_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_le_r, NULL, NULL, NULL }, + { NULL, gen_helper_sve_ld2qq_le_r, + gen_helper_sve_ld3qq_le_r, gen_helper_sve_ld4qq_le_r }, }, /* mte inactive, big-endian */ @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_be_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_be_r, NULL, NULL, NULL }, + { NULL, gen_helper_sve_ld2qq_be_r, + gen_helper_sve_ld3qq_be_r, gen_helper_sve_ld4qq_be_r }, }, }, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_le_r_mte, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_le_r_mte, NULL, NULL, NULL }, + { NULL, + gen_helper_sve_ld2qq_le_r_mte, + gen_helper_sve_ld3qq_le_r_mte, + gen_helper_sve_ld4qq_le_r_mte }, }, /* mte active, big-endian */ @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_be_r_mte, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_be_r_mte, NULL, NULL, NULL }, + { NULL, + gen_helper_sve_ld2qq_be_r_mte, + gen_helper_sve_ld3qq_be_r_mte, + gen_helper_sve_ld4qq_be_r_mte }, }, }, }; @@ -XXX,XX +XXX,XX @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) return false; } - /* dtypes 16 and 17 are artificial, representing 128-bit element */ - if (a->dtype < 16) { + /* dtypes 16-18 are artificial, representing 128-bit element */ + switch (a->dtype) { + case 0 ... 15: if (!dc_isar_feature(aa64_sve, s)) { return false; } - } else { + break; + case 16: case 17: if (!dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; + break; + case 18: + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + break; + default: + g_assert_not_reached(); } if (sve_access_check(s)) { @@ -XXX,XX +XXX,XX @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a) { - /* dtypes 16 and 17 are artificial, representing 128-bit element */ - if (a->dtype < 16) { + /* dtypes 16-18 are artificial, representing 128-bit element */ + switch (a->dtype) { + case 0 ... 15: if (!dc_isar_feature(aa64_sve, s)) { return false; } - } else { + break; + case 16: case 17: if (!dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; + break; + case 18: + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + break; + default: + g_assert_not_reached(); } if (sve_access_check(s)) { @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1dd_be_r_mte, gen_helper_sve_st1dq_be_r_mte } } }, }; - static gen_helper_gvec_mem * const fn_multiple[2][2][3][4] = { + static gen_helper_gvec_mem * const fn_multiple[2][2][3][5] = { { { { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_le_r, gen_helper_sve_st2ss_le_r, - gen_helper_sve_st2dd_le_r }, + gen_helper_sve_st2dd_le_r, + gen_helper_sve_st2qq_le_r }, { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_le_r, gen_helper_sve_st3ss_le_r, - gen_helper_sve_st3dd_le_r }, + gen_helper_sve_st3dd_le_r, + gen_helper_sve_st3qq_le_r }, { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_le_r, gen_helper_sve_st4ss_le_r, - gen_helper_sve_st4dd_le_r } }, + gen_helper_sve_st4dd_le_r, + gen_helper_sve_st4qq_le_r } }, { { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_be_r, gen_helper_sve_st2ss_be_r, - gen_helper_sve_st2dd_be_r }, + gen_helper_sve_st2dd_be_r, + gen_helper_sve_st2qq_be_r }, { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_be_r, gen_helper_sve_st3ss_be_r, - gen_helper_sve_st3dd_be_r }, + gen_helper_sve_st3dd_be_r, + gen_helper_sve_st3qq_be_r }, { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_be_r, gen_helper_sve_st4ss_be_r, - gen_helper_sve_st4dd_be_r } } }, + gen_helper_sve_st4dd_be_r, + gen_helper_sve_st4qq_be_r } } }, { { { gen_helper_sve_st2bb_r_mte, gen_helper_sve_st2hh_le_r_mte, gen_helper_sve_st2ss_le_r_mte, - gen_helper_sve_st2dd_le_r_mte }, + gen_helper_sve_st2dd_le_r_mte, + gen_helper_sve_st2qq_le_r_mte }, { gen_helper_sve_st3bb_r_mte, gen_helper_sve_st3hh_le_r_mte, gen_helper_sve_st3ss_le_r_mte, - gen_helper_sve_st3dd_le_r_mte }, + gen_helper_sve_st3dd_le_r_mte, + gen_helper_sve_st3qq_le_r_mte }, { gen_helper_sve_st4bb_r_mte, gen_helper_sve_st4hh_le_r_mte, gen_helper_sve_st4ss_le_r_mte, - gen_helper_sve_st4dd_le_r_mte } }, + gen_helper_sve_st4dd_le_r_mte, + gen_helper_sve_st4qq_le_r_mte } }, { { gen_helper_sve_st2bb_r_mte, gen_helper_sve_st2hh_be_r_mte, gen_helper_sve_st2ss_be_r_mte, - gen_helper_sve_st2dd_be_r_mte }, + gen_helper_sve_st2dd_be_r_mte, + gen_helper_sve_st2qq_be_r_mte }, { gen_helper_sve_st3bb_r_mte, gen_helper_sve_st3hh_be_r_mte, gen_helper_sve_st3ss_be_r_mte, - gen_helper_sve_st3dd_be_r_mte }, + gen_helper_sve_st3dd_be_r_mte, + gen_helper_sve_st3qq_be_r_mte }, { gen_helper_sve_st4bb_r_mte, gen_helper_sve_st4hh_be_r_mte, gen_helper_sve_st4ss_be_r_mte, - gen_helper_sve_st4dd_be_r_mte } } }, + gen_helper_sve_st4dd_be_r_mte, + gen_helper_sve_st4qq_be_r_mte } } }, }; gen_helper_gvec_mem *fn; int be = s->be_data == MO_BE; @@ -XXX,XX +XXX,XX @@ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a) } break; case MO_128: - assert(a->msz < a->esz); - assert(a->nreg == 0); - if (!dc_isar_feature(aa64_sve2p1, s)) { - return false; + if (a->nreg == 0) { + assert(a->msz < a->esz); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + } else { + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } } - s->is_nonstreaming = true; break; default: g_assert_not_reached(); @@ -XXX,XX +XXX,XX @@ static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a) } break; case MO_128: - assert(a->msz < a->esz); - assert(a->nreg == 0); - if (!dc_isar_feature(aa64_sve2p1, s)) { - return false; + if (a->nreg == 0) { + assert(a->msz < a->esz); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + } else { + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } } - s->is_nonstreaming = true; break; default: g_assert_not_reached(); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-99-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 16 ++++++++++++++++ target/arm/tcg/sve.decode | 8 ++++++++ target/arm/tcg/sve_helper.c | 6 ++++++ target/arm/tcg/translate-sve.c | 34 ++++++++++++++++++++++++++++++++-- 4 files changed, 62 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_ldsds_le_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldsds_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_le_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_be_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldbsu_zsu_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_ldsds_le_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldsds_be_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_le_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_be_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldffbsu_zsu, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_le_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_be_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stbs_zsu_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stdd_be_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_le_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_be_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve2_sqdmull_zzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ LD1_zprz 1100010 10 1. ..... 1.. ... ..... ..... \ LD1_zprz 1100010 11 1. ..... 11. ... ..... ..... \ @rprr_g_load_sc esz=3 msz=3 u=1 +# LD1Q +LD1_zprz 1100 0100 000 rm:5 101 pg:3 rn:5 rd:5 \ + &rprr_gather_load u=0 ff=0 xs=2 esz=4 msz=4 scale=0 + # SVE 64-bit gather load (vector plus immediate) LD1_zpiz 1100010 .. 01 ..... 1.. ... ..... ..... \ @rpri_g_load esz=3 @@ -XXX,XX +XXX,XX @@ ST1_zprz 1110010 .. 01 ..... 101 ... ..... ..... \ ST1_zprz 1110010 .. 00 ..... 101 ... ..... ..... \ @rprr_scatter_store xs=2 esz=3 scale=0 +# ST1Q +ST1_zprz 1110 0100 001 rm:5 001 pg:3 rn:5 rd:5 \ + &rprr_scatter_store xs=2 msz=4 esz=4 scale=0 + # SVE 64-bit scatter store (vector plus immediate) ST1_zpiz 1110010 .. 10 ..... 101 ... ..... ..... \ @rpri_scatter_store esz=3 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_LD1_ZPZ_D(dd_be, zsu, MO_64) DO_LD1_ZPZ_D(dd_be, zss, MO_64) DO_LD1_ZPZ_D(dd_be, zd, MO_64) +DO_LD1_ZPZ_D(qq_le, zd, MO_128) +DO_LD1_ZPZ_D(qq_be, zd, MO_128) + #undef DO_LD1_ZPZ_S #undef DO_LD1_ZPZ_D @@ -XXX,XX +XXX,XX @@ DO_ST1_ZPZ_D(sd_be, zd, MO_32) DO_ST1_ZPZ_D(dd_le, zd, MO_64) DO_ST1_ZPZ_D(dd_be, zd, MO_64) +DO_ST1_ZPZ_D(qq_le, zd, MO_128) +DO_ST1_ZPZ_D(qq_be, zd, MO_128) + #undef DO_ST1_ZPZ_S #undef DO_ST1_ZPZ_D diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ gather_load_fn64[2][2][2][3][2][4] = { gen_helper_sve_ldffdd_be_zd_mte, } } } } }, }; +static gen_helper_gvec_mem_scatter * const +gather_load_fn128[2][2] = { + { gen_helper_sve_ldqq_le_zd, + gen_helper_sve_ldqq_be_zd }, + { gen_helper_sve_ldqq_le_zd_mte, + gen_helper_sve_ldqq_be_zd_mte } +}; + static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a) { gen_helper_gvec_mem_scatter *fn = NULL; bool be = s->be_data == MO_BE; bool mte = s->mte_active[0]; - if (!dc_isar_feature(aa64_sve, s)) { + if (a->esz < MO_128 + ? !dc_isar_feature(aa64_sve, s) + : !dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; @@ -XXX,XX +XXX,XX @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a) case MO_64: fn = gather_load_fn64[mte][be][a->ff][a->xs][a->u][a->msz]; break; + case MO_128: + assert(!a->ff && a->u && a->xs == 2 && a->msz == MO_128); + fn = gather_load_fn128[mte][be]; + break; + default: + g_assert_not_reached(); } assert(fn != NULL); @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem_scatter * const scatter_store_fn64[2][2][3][4] = { gen_helper_sve_stdd_be_zd_mte, } } }, }; +static gen_helper_gvec_mem_scatter * const +scatter_store_fn128[2][2] = { + { gen_helper_sve_stqq_le_zd, + gen_helper_sve_stqq_be_zd }, + { gen_helper_sve_stqq_le_zd_mte, + gen_helper_sve_stqq_be_zd_mte } +}; + static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a) { gen_helper_gvec_mem_scatter *fn; @@ -XXX,XX +XXX,XX @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a) if (a->esz < a->msz || (a->msz == 0 && a->scale)) { return false; } - if (!dc_isar_feature(aa64_sve, s)) { + if (a->esz < MO_128 + ? !dc_isar_feature(aa64_sve, s) + : !dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; @@ -XXX,XX +XXX,XX @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a) case MO_64: fn = scatter_store_fn64[mte][be][a->xs][a->msz]; break; + case MO_128: + assert(a->xs == 2 && a->msz == MO_128); + fn = scatter_store_fn128[mte][be]; + break; default: g_assert_not_reached(); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-100-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 6 ++++ target/arm/tcg/sme.decode | 36 ++++++++++++++++++++ target/arm/tcg/sme_helper.c | 60 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 46 +++++++++++++++++++------- 4 files changed, 137 insertions(+), 11 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_mova_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_mova_cz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sme_ld1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ MOVA_za2 11000000 00 00011 00 .. 010 00 off:3 zr:4 0 \ MOVA_za4 11000000 00 00011 00 .. 011 00 off:3 zr:3 00 \ &mova_a rv=%mova_rv +### SME Move and Zero + +MOVAZ_za2 11000000 00000110 0 .. 01010 off:3 zr:4 0 \ + &mova_a rv=%mova_rv +MOVAZ_za4 11000000 00000110 0 .. 01110 off:3 zr:3 00 \ + &mova_a rv=%mova_rv + +MOVAZ_zt 11000000 00 00001 0 v:1 .. 0001 off:4 zr:5 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVAZ_zt 11000000 01 00001 0 v:1 .. 0001 za:1 off:3 zr:5 \ + &mova_t rs=%mova_rs esz=1 +MOVAZ_zt 11000000 10 00001 0 v:1 .. 0001 za:2 off:2 zr:5 \ + &mova_t rs=%mova_rs esz=2 +MOVAZ_zt 11000000 11 00001 0 v:1 .. 0001 za:3 off:1 zr:5 \ + &mova_t rs=%mova_rs esz=3 +MOVAZ_zt 11000000 11 00001 1 v:1 .. 0001 za:4 zr:5 \ + &mova_t rs=%mova_rs esz=4 off=0 + +MOVAZ_zt2 11000000 00 00011 0 v:1 .. 00010 off:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVAZ_zt2 11000000 01 00011 0 v:1 .. 00010 za:1 off:2 zr:4 0 \ + &mova_t rs=%mova_rs esz=1 +MOVAZ_zt2 11000000 10 00011 0 v:1 .. 00010 za:2 off:1 zr:4 0 \ + &mova_t rs=%mova_rs esz=2 +MOVAZ_zt2 11000000 11 00011 0 v:1 .. 00010 za:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVAZ_zt4 11000000 00 00011 0 v:1 .. 001100 off:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVAZ_zt4 11000000 01 00011 0 v:1 .. 001100 za:1 off:1 zr:3 00 \ + &mova_t rs=%mova_rs esz=1 +MOVAZ_zt4 11000000 10 00011 0 v:1 .. 001100 za:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=2 off=0 +MOVAZ_zt4 11000000 11 00011 0 v:1 .. 00110 za:3 zr:3 00 \ + &mova_t rs=%mova_rs esz=3 off=0 + ### SME Move into/from ZT0 MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_mova_zc_d)(void *vdst, void *vsrc, uint32_t desc) } } +void HELPER(sme2p1_movaz_zc_b)(void *vdst, void *vsrc, uint32_t desc) +{ + uint8_t *src = vsrc; + uint8_t *dst = vdst; + size_t i, n = simd_oprsz(desc); + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_h)(void *vdst, void *vsrc, uint32_t desc) +{ + uint16_t *src = vsrc; + uint16_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 2; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_s)(void *vdst, void *vsrc, uint32_t desc) +{ + uint32_t *src = vsrc; + uint32_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 4; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_d)(void *vdst, void *vsrc, uint32_t desc) +{ + uint64_t *src = vsrc; + uint64_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 8; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_q)(void *vdst, void *vsrc, uint32_t desc) +{ + Int128 *src = vsrc; + Int128 *dst = vdst; + size_t i, n = simd_oprsz(desc) / 16; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + memset(&src[tile_vslice_index(i)], 0, 16); + } +} + /* * Clear elements in a tile slice comprising len bytes. */ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) TRANS_FEAT(MOVA_tz, aa64_sme, do_mova_tile, a, false) TRANS_FEAT(MOVA_zt, aa64_sme, do_mova_tile, a, true) -static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) +static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, + bool to_vec, bool zero) { static gen_helper_gvec_2 * const cz_fns[] = { gen_helper_sme2_mova_cz_b, gen_helper_sme2_mova_cz_h, @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) gen_helper_sme2_mova_zc_b, gen_helper_sme2_mova_zc_h, gen_helper_sme2_mova_zc_s, gen_helper_sme2_mova_zc_d, }; + static gen_helper_gvec_2 * const zc_z_fns[] = { + gen_helper_sme2p1_movaz_zc_b, gen_helper_sme2p1_movaz_zc_h, + gen_helper_sme2p1_movaz_zc_s, gen_helper_sme2p1_movaz_zc_d, + gen_helper_sme2p1_movaz_zc_q, + }; TCGv_ptr t_za; int svl, bytes_per_op = n << a->esz; @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) return true; } + assert(a->esz <= MO_64 + zero); + if (!sme_smza_enabled_check(s)) { return true; } @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) TCGv_ptr t_zr = vec_full_reg_ptr(s, a->zr * n + i); t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off * n + i, 1, n, a->v); - if (to_vec) { + if (zero) { + zc_z_fns[a->esz](t_zr, t_za, t_desc); + } else if (to_vec) { zc_fns[a->esz](t_zr, t_za, t_desc); } else { cz_fns[a->esz](t_za, t_zr, t_desc); @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) a->off * n + i, 1, n, a->v); if (to_vec) { tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, 0, svl, svl); + if (zero) { + tcg_gen_gvec_dup_imm_var(MO_8, t_za, 0, svl, svl, 0); + } } else { tcg_gen_gvec_mov_var(MO_8, t_za, 0, tcg_env, o_zr, svl, svl); } @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) return true; } -TRANS_FEAT(MOVA_tz2, aa64_sme2, do_mova_tile_n, a, 2, false) -TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false) -TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true) -TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true) +TRANS_FEAT(MOVA_tz2, aa64_sme2, do_mova_tile_n, a, 2, false, false) +TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false, false) +TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true, false) +TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true, false) -static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) +TRANS_FEAT(MOVAZ_zt, aa64_sme2p1, do_mova_tile_n, a, 1, true, true) +TRANS_FEAT(MOVAZ_zt2, aa64_sme2p1, do_mova_tile_n, a, 2, true, true) +TRANS_FEAT(MOVAZ_zt4, aa64_sme2p1, do_mova_tile_n, a, 4, true, true) + +static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, + bool to_vec, bool zero) { TCGv_ptr t_za; int svl; @@ -XXX,XX +XXX,XX @@ static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) if (to_vec) { tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, o_za, svl, svl); + if (zero) { + tcg_gen_gvec_dup_imm_var(MO_8, t_za, o_za, svl, svl, 0); + } } else { tcg_gen_gvec_mov_var(MO_8, t_za, o_za, tcg_env, o_zr, svl, svl); } @@ -XXX,XX +XXX,XX @@ static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) return true; } -TRANS_FEAT(MOVA_az2, aa64_sme2, do_mova_array_n, a, 2, false) -TRANS_FEAT(MOVA_az4, aa64_sme2, do_mova_array_n, a, 4, false) -TRANS_FEAT(MOVA_za2, aa64_sme2, do_mova_array_n, a, 2, true) -TRANS_FEAT(MOVA_za4, aa64_sme2, do_mova_array_n, a, 4, true) +TRANS_FEAT(MOVA_az2, aa64_sme2, do_mova_array_n, a, 2, false, false) +TRANS_FEAT(MOVA_az4, aa64_sme2, do_mova_array_n, a, 4, false, false) +TRANS_FEAT(MOVA_za2, aa64_sme2, do_mova_array_n, a, 2, true, false) +TRANS_FEAT(MOVA_za4, aa64_sme2, do_mova_array_n, a, 4, true, false) + +TRANS_FEAT(MOVAZ_za2, aa64_sme2p1, do_mova_array_n, a, 2, true, true) +TRANS_FEAT(MOVAZ_za4, aa64_sme2p1, do_mova_array_n, a, 4, true, true) static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-101-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 24 ++++++++++ target/arm/tcg/sme.decode | 42 ++++++++++++++++ target/arm/tcg/translate-sme.c | 56 ++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 88 ++++++++++++++++++++++++++++++++++ 4 files changed, 210 insertions(+) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_uminp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_urecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_ursqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_luti2_1b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_1h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_1s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti2_2b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_2h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_2s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti2_4b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_4h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_4s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti4_1b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_1h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_1s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti4_2b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_2h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_2s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti4_4b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_4h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_4s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_za 11000000 000011 110 .. 0000000000 00. \ &zero_za ngrp=2 nvec=4 rv=%mova_rv off=%off1_x4 ZERO_za 11000000 000011 111 .. 0000000000 00. \ &zero_za ngrp=4 nvec=4 rv=%mova_rv off=%off1_x4 + +### SME Lookup Table Read + +&lut zd zn idx + +# LUTI2, consecutive +LUTI2_c_1b 1100 0000 1100 11 idx:4 00 00 zn:5 zd:5 &lut +LUTI2_c_1h 1100 0000 1100 11 idx:4 01 00 zn:5 zd:5 &lut +LUTI2_c_1s 1100 0000 1100 11 idx:4 10 00 zn:5 zd:5 &lut + +LUTI2_c_2b 1100 0000 1000 11 idx:3 1 00 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI2_c_2h 1100 0000 1000 11 idx:3 1 01 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI2_c_2s 1100 0000 1000 11 idx:3 1 10 00 zn:5 .... 0 &lut zd=%zd_ax2 + +LUTI2_c_4b 1100 0000 1000 11 idx:2 10 00 00 zn:5 ... 00 &lut zd=%zd_ax4 +LUTI2_c_4h 1100 0000 1000 11 idx:2 10 01 00 zn:5 ... 00 &lut zd=%zd_ax4 +LUTI2_c_4s 1100 0000 1000 11 idx:2 10 10 00 zn:5 ... 00 &lut zd=%zd_ax4 + +# LUTI2, strided (must check zd alignment) +LUTI2_s_2b 1100 0000 1001 11 idx:3 1 00 00 zn:5 zd:5 &lut +LUTI2_s_2h 1100 0000 1001 11 idx:3 1 01 00 zn:5 zd:5 &lut + +LUTI2_s_4b 1100 0000 1001 11 idx:2 10 00 00 zn:5 zd:5 &lut +LUTI2_s_4h 1100 0000 1001 11 idx:2 10 01 00 zn:5 zd:5 &lut + +# LUTI4, consecutive +LUTI4_c_1b 1100 0000 1100 101 idx:3 00 00 zn:5 zd:5 &lut +LUTI4_c_1h 1100 0000 1100 101 idx:3 01 00 zn:5 zd:5 &lut +LUTI4_c_1s 1100 0000 1100 101 idx:3 10 00 zn:5 zd:5 &lut + +LUTI4_c_2b 1100 0000 1000 101 idx:2 1 00 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI4_c_2h 1100 0000 1000 101 idx:2 1 01 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI4_c_2s 1100 0000 1000 101 idx:2 1 10 00 zn:5 .... 0 &lut zd=%zd_ax2 + +LUTI4_c_4h 1100 0000 1000 101 idx:1 10 01 00 zn:5 ... 00 &lut zd=%zd_ax4 +LUTI4_c_4s 1100 0000 1000 101 idx:1 10 10 00 zn:5 ... 00 &lut zd=%zd_ax4 + +# LUTI4, strided (must check zd alignment) +LUTI4_s_2b 1100 0000 1001 101 idx:2 1 00 00 zn:5 zd:5 &lut +LUTI4_s_2h 1100 0000 1001 101 idx:2 1 01 00 zn:5 zd:5 &lut + +LUTI4_s_4h 1100 0000 1001 101 idx:1 10 01 00 zn:5 zd:5 &lut diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_SEL(DisasContext *s, arg_SEL *a) } return true; } + +static bool do_lut(DisasContext *s, arg_lut *a, + gen_helper_gvec_2_ptr *fn, bool strided) +{ + if (sme_sm_enabled_check(s) && sme2_zt0_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + tcg_env, svl, svl, strided | (a->idx << 1), fn); + } + return true; +} + +TRANS_FEAT(LUTI2_c_1b, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_1b, false) +TRANS_FEAT(LUTI2_c_1h, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_1h, false) +TRANS_FEAT(LUTI2_c_1s, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_1s, false) + +TRANS_FEAT(LUTI2_c_2b, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_2b, false) +TRANS_FEAT(LUTI2_c_2h, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_2h, false) +TRANS_FEAT(LUTI2_c_2s, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_2s, false) + +TRANS_FEAT(LUTI2_c_4b, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_4b, false) +TRANS_FEAT(LUTI2_c_4h, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_4h, false) +TRANS_FEAT(LUTI2_c_4s, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_4s, false) + +TRANS_FEAT(LUTI4_c_1b, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_1b, false) +TRANS_FEAT(LUTI4_c_1h, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_1h, false) +TRANS_FEAT(LUTI4_c_1s, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_1s, false) + +TRANS_FEAT(LUTI4_c_2b, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_2b, false) +TRANS_FEAT(LUTI4_c_2h, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_2h, false) +TRANS_FEAT(LUTI4_c_2s, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_2s, false) + +TRANS_FEAT(LUTI4_c_4h, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_4h, false) +TRANS_FEAT(LUTI4_c_4s, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_4s, false) + +static bool do_lut_s4(DisasContext *s, arg_lut *a, gen_helper_gvec_2_ptr *fn) +{ + return !(a->zd & 0b01100) && do_lut(s, a, fn, true); +} + +static bool do_lut_s8(DisasContext *s, arg_lut *a, gen_helper_gvec_2_ptr *fn) +{ + return !(a->zd & 0b01000) && do_lut(s, a, fn, true); +} + +TRANS_FEAT(LUTI2_s_2b, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti2_2b) +TRANS_FEAT(LUTI2_s_2h, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti2_2h) + +TRANS_FEAT(LUTI2_s_4b, aa64_sme2p1, do_lut_s4, a, gen_helper_sme2_luti2_4b) +TRANS_FEAT(LUTI2_s_4h, aa64_sme2p1, do_lut_s4, a, gen_helper_sme2_luti2_4h) + +TRANS_FEAT(LUTI4_s_2b, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti4_2b) +TRANS_FEAT(LUTI4_s_2h, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti4_2h) + +TRANS_FEAT(LUTI4_s_4h, aa64_sme2p1, do_lut_s4, a, gen_helper_sme2_luti4_4h) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_ursqrte_s)(void *vd, void *vn, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +static inline void do_lut_b(void *zd, uint64_t *indexes, uint64_t *table, + unsigned elements, unsigned segbase, + unsigned dstride, unsigned isize, + unsigned tsize, unsigned nreg) +{ + for (unsigned r = 0; r < nreg; ++r) { + uint8_t *dst = zd + dstride * r; + unsigned base = segbase + r * elements; + + for (unsigned e = 0; e < elements; ++e) { + unsigned index = extractn(indexes, (base + e) * isize, isize); + dst[H1(e)] = extractn(table, index * tsize, 8); + } + } +} + +static inline void do_lut_h(void *zd, uint64_t *indexes, uint64_t *table, + unsigned elements, unsigned segbase, + unsigned dstride, unsigned isize, + unsigned tsize, unsigned nreg) +{ + for (unsigned r = 0; r < nreg; ++r) { + uint16_t *dst = zd + dstride * r; + unsigned base = segbase + r * elements; + + for (unsigned e = 0; e < elements; ++e) { + unsigned index = extractn(indexes, (base + e) * isize, isize); + dst[H2(e)] = extractn(table, index * tsize, 16); + } + } +} + +static inline void do_lut_s(void *zd, uint64_t *indexes, uint32_t *table, + unsigned elements, unsigned segbase, + unsigned dstride, unsigned isize, + unsigned tsize, unsigned nreg) +{ + for (unsigned r = 0; r < nreg; ++r) { + uint32_t *dst = zd + dstride * r; + unsigned base = segbase + r * elements; + + for (unsigned e = 0; e < elements; ++e) { + unsigned index = extractn(indexes, (base + e) * isize, isize); + dst[H4(e)] = table[H4(index)]; + } + } +} + +#define DO_SME2_LUT(ISIZE, NREG, SUFF, ESIZE) \ +void helper_sme2_luti##ISIZE##_##NREG##SUFF \ + (void *zd, void *zn, CPUARMState *env, uint32_t desc) \ +{ \ + unsigned vl = simd_oprsz(desc); \ + unsigned strided = extract32(desc, SIMD_DATA_SHIFT, 1); \ + unsigned idx = extract32(desc, SIMD_DATA_SHIFT + 1, 4); \ + unsigned elements = vl / ESIZE; \ + unsigned dstride = (!strided ? 1 : NREG == 4 ? 4 : 8); \ + unsigned segments = (ESIZE * 8) / (ISIZE * NREG); \ + unsigned segment = idx & (segments - 1); \ + ARMVectorReg indexes; \ + memcpy(&indexes, zn, vl); \ + do_lut_##SUFF(zd, indexes.d, (void *)env->za_state.zt0, elements, \ + segment * NREG * elements, \ + dstride * sizeof(ARMVectorReg), ISIZE, 32, NREG); \ +} + +DO_SME2_LUT(2,1,b, 1) +DO_SME2_LUT(2,1,h, 2) +DO_SME2_LUT(2,1,s, 4) +DO_SME2_LUT(2,2,b, 1) +DO_SME2_LUT(2,2,h, 2) +DO_SME2_LUT(2,2,s, 4) +DO_SME2_LUT(2,4,b, 1) +DO_SME2_LUT(2,4,h, 2) +DO_SME2_LUT(2,4,s, 4) + +DO_SME2_LUT(4,1,b, 1) +DO_SME2_LUT(4,1,h, 2) +DO_SME2_LUT(4,1,s, 4) +DO_SME2_LUT(4,2,b, 1) +DO_SME2_LUT(4,2,h, 2) +DO_SME2_LUT(4,2,s, 4) +DO_SME2_LUT(4,4,b, 1) +DO_SME2_LUT(4,4,h, 2) +DO_SME2_LUT(4,4,s, 4) + +#undef DO_SME2_LUT -- 2.43.0
The pattern we currently have as FMOPA_h is the "widening" insn that takes fp16 inputs and produces single-precision outputs. This is unlike FMOPA_s and FMOPA_d, which are non-widening produce outputs the same size as their inputs. SME2 introduces a non-widening fp16 FMOPA operation; rename FMOPA_h to FMOPA_w_h (for 'widening'), so we can use FMOPA_h for the non-widening version, giving it a name in line with the other non-widening ops FMOPA_s and FMOPA_d. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-102-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 +- target/arm/tcg/sme.decode | 2 +- target/arm/tcg/sme_helper.c | 4 ++-- target/arm/tcg/translate-sme.c | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_7(sme_fmopa_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 BFMOPA 10000001 100 ..... ... ... ..... . 00 .. @op_32 -FMOPA_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 +FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 SMOPA_s 1010000 0 10 0 ..... ... ... ..... . 00 .. @op_32 SUMOPA_s 1010000 0 10 1 ..... ... ... ..... . 00 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, return float32_add(sum, t32, s_std); } -void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, CPUARMState *env, uint32_t desc) +void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, return true; } -TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a, - MO_32, gen_helper_sme_fmopa_h) +TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, + MO_32, gen_helper_sme_fmopa_w_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_ZA, gen_helper_sme_fmopa_s) TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, -- 2.43.0
Our current BFMOPA opcode pattern is the widening version of the insn. Rename it to BFMOPA_w, to make way for the non-widening version added in SME2. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-103-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 +- target/arm/tcg/sme.decode | 2 +- target/arm/tcg/sme_helper.c | 4 ++-- target/arm/tcg/translate-sme.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_7(sme_bfmopa, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ADDVA_d 11000000 11 01000 1 ... ... ..... 00 ... @adda_64 FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 -BFMOPA 10000001 100 ..... ... ... ..... . 00 .. @op_32 +BFMOPA_w 10000001 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 SMOPA_s 1010000 0 10 0 ..... ... ... ..... . 00 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fvdot_idx_h)(void *vd, void *vn, void *vm, void *va, } } -void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, - void *vpn, void *vpm, CPUARMState *env, uint32_t desc) +void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, + void *vpn, void *vpm, CPUARMState *env, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_ZA, gen_helper_sme_fmopa_d) -TRANS_FEAT(BFMOPA, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa) +TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa_w) TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s) TRANS_FEAT(UMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_umopa_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> For non-widening, we can use float_muladd_negate_product, For widening, which uses dot-product, we need to handle the negation explicitly. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-104-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 19 +++++ target/arm/tcg/vec_internal.h | 5 ++ target/arm/tcg/sme_helper.c | 141 +++++++++++++++++++++++++++------ target/arm/tcg/translate-sme.c | 27 ++++--- 4 files changed, 160 insertions(+), 32 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_7(sme_fmops_w_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_fmops_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_fmops_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_bfmops_w, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_7(sme_ah_fmops_w_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_ah_fmops_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_ah_fmops_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_ah_bfmops_w, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) + DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme_umopa_s, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp); /* * Negate as for FPCR.AH=1 -- do not negate NaNs. */ +static inline float16 bfloat16_ah_chs(float16 a) +{ + return bfloat16_is_any_nan(a) ? a : bfloat16_chs(a); +} + static inline float16 float16_ah_chs(float16 a) { return float16_is_any_nan(a) ? a : float16_chs(a); diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn, } } -void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, float_status *fpst, uint32_t desc) +static void do_fmopa_s(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, float_status *fpst, uint32_t desc, + uint32_t negx, int negf) { intptr_t row, col, oprsz = simd_maxsz(desc); - uint32_t neg = simd_data(desc) << 31; - uint16_t *pn = vpn, *pm = vpm; for (row = 0; row < oprsz; ) { uint16_t pa = pn[H2(row >> 4)]; do { if (pa & 1) { void *vza_row = vza + tile_vslice_offset(row); - uint32_t n = *(uint32_t *)(vzn + H1_4(row)) ^ neg; + uint32_t n = *(uint32_t *)(vzn + H1_4(row)) ^ negx; for (col = 0; col < oprsz; ) { uint16_t pb = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, if (pb & 1) { uint32_t *a = vza_row + H1_4(col); uint32_t *m = vzm + H1_4(col); - *a = float32_muladd(n, *m, *a, 0, fpst); + *a = float32_muladd(n, *m, *a, negf, fpst); } col += 4; pb >>= 4; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, } } -void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, +void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_s(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_fmops_s)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_s(vza, vzn, vzm, vpn, vpm, fpst, desc, 1u << 31, 0); +} + +void HELPER(sme_ah_fmops_s)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_s(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + +static void do_fmopa_d(uint64_t *za, uint64_t *zn, uint64_t *zm, uint8_t *pn, + uint8_t *pm, float_status *fpst, uint32_t desc, + uint64_t negx, int negf) { intptr_t row, col, oprsz = simd_oprsz(desc) / 8; - uint64_t neg = (uint64_t)simd_data(desc) << 63; - uint64_t *za = vza, *zn = vzn, *zm = vzm; - uint8_t *pn = vpn, *pm = vpm; for (row = 0; row < oprsz; ++row) { if (pn[H1(row)] & 1) { uint64_t *za_row = &za[tile_vslice_index(row)]; - uint64_t n = zn[row] ^ neg; + uint64_t n = zn[row] ^ negx; for (col = 0; col < oprsz; ++col) { if (pm[H1(col)] & 1) { uint64_t *a = &za_row[col]; - *a = float64_muladd(n, zm[col], *a, 0, fpst); + *a = float64_muladd(n, zm[col], *a, negf, fpst); } } } } } +void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_d(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_fmops_d)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_d(vza, vzn, vzm, vpn, vpm, fpst, desc, 1ull << 63, 0); +} + +void HELPER(sme_ah_fmops_d)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_d(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + /* * Alter PAIR as needed for controlling predicates being false, * and for NEG on an enabled row element. @@ -XXX,XX +XXX,XX @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg) return pair; } +static inline uint32_t f16mop_ah_neg_adj_pair(uint32_t pair, uint32_t pg) +{ + uint32_t l = pg & 1 ? float16_ah_chs(pair) : 0; + uint32_t h = pg & 4 ? float16_ah_chs(pair >> 16) : 0; + return l | (h << 16); +} + +static inline uint32_t bf16mop_ah_neg_adj_pair(uint32_t pair, uint32_t pg) +{ + uint32_t l = pg & 1 ? bfloat16_ah_chs(pair) : 0; + uint32_t h = pg & 4 ? bfloat16_ah_chs(pair >> 16) : 0; + return l | (h << 16); +} + static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *s_f16, float_status *s_std, float_status *s_odd) @@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, return float32_add(sum, t32, s_std); } -void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, CPUARMState *env, uint32_t desc) +static void do_fmopa_w_h(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, CPUARMState *env, uint32_t desc, + uint32_t negx, bool ah_neg) { intptr_t row, col, oprsz = simd_maxsz(desc); - uint32_t neg = simd_data(desc) * 0x80008000u; - uint16_t *pn = vpn, *pm = vpm; float_status fpst_odd = env->vfp.fp_status[FPST_ZA]; set_float_rounding_mode(float_round_to_odd, &fpst_odd); @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, void *vza_row = vza + tile_vslice_offset(row); uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + if (ah_neg) { + n = f16mop_ah_neg_adj_pair(n, prow); + } else { + n = f16mop_adj_pair(n, prow, negx); + } for (col = 0; col < oprsz; ) { uint16_t pcol = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, } } +void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_fmopa_w_h(vza, vzn, vzm, vpn, vpm, env, desc, 0, false); +} + +void HELPER(sme_fmops_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_fmopa_w_h(vza, vzn, vzm, vpn, vpm, env, desc, 0x80008000u, false); +} + +void HELPER(sme_ah_fmops_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_fmopa_w_h(vza, vzn, vzm, vpn, vpm, env, desc, 0, true); +} + void HELPER(sme2_fdot_h)(void *vd, void *vn, void *vm, void *va, CPUARMState *env, uint32_t desc) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fvdot_idx_h)(void *vd, void *vn, void *vm, void *va, } } -void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, - void *vpn, void *vpm, CPUARMState *env, uint32_t desc) +static void do_bfmopa_w(void *vza, void *vzn, void *vzm, + uint16_t *pn, uint16_t *pm, CPUARMState *env, + uint32_t desc, uint32_t negx, bool ah_neg) { intptr_t row, col, oprsz = simd_maxsz(desc); - uint32_t neg = simd_data(desc) * 0x80008000u; - uint16_t *pn = vpn, *pm = vpm; float_status fpst, fpst_odd; if (is_ebf(env, &fpst, &fpst_odd)) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, void *vza_row = vza + tile_vslice_offset(row); uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + if (ah_neg) { + n = bf16mop_ah_neg_adj_pair(n, prow); + } else { + n = f16mop_adj_pair(n, prow, negx); + } for (col = 0; col < oprsz; ) { uint16_t pcol = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, void *vza_row = vza + tile_vslice_offset(row); uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + if (ah_neg) { + n = bf16mop_ah_neg_adj_pair(n, prow); + } else { + n = f16mop_adj_pair(n, prow, negx); + } for (col = 0; col < oprsz; ) { uint16_t pcol = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, } } +void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_bfmopa_w(vza, vzn, vzm, vpn, vpm, env, desc, 0, false); +} + +void HELPER(sme_bfmops_w)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_bfmopa_w(vza, vzn, vzm, vpn, vpm, env, desc, 0x80008000u, false); +} + +void HELPER(sme_ah_bfmops_w)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_bfmopa_w(vza, vzn, vzm, vpn, vpm, env, desc, 0, true); +} + typedef uint32_t IMOPFn32(uint32_t, uint32_t, uint32_t, uint8_t, bool); static inline void do_imopa_s(uint32_t *za, uint32_t *zn, uint32_t *zm, uint8_t *pn, uint8_t *pm, diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, gen_helper_gvec_5_ptr *fn) { int svl = streaming_vec_reg_size(s); - uint32_t desc = simd_desc(svl, svl, a->sub); + uint32_t desc = simd_desc(svl, svl, 0); TCGv_ptr za, zn, zm, pn, pm, fpst; if (!sme_smza_enabled_check(s)) { @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, gen_helper_gvec_5_ptr *fn) { int svl = streaming_vec_reg_size(s); - uint32_t desc = simd_desc(svl, svl, a->sub); + uint32_t desc = simd_desc(svl, svl, 0); TCGv_ptr za, zn, zm, pn, pm; if (!sme_smza_enabled_check(s)) { @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, return true; } -TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, - MO_32, gen_helper_sme_fmopa_w_h) -TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, - MO_32, FPST_ZA, gen_helper_sme_fmopa_s) -TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, - MO_64, FPST_ZA, gen_helper_sme_fmopa_d) +TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, MO_32, + !a->sub ? gen_helper_sme_fmopa_w_h + : !s->fpcr_ah ? gen_helper_sme_fmops_w_h + : gen_helper_sme_ah_fmops_w_h) +TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_ZA, + !a->sub ? gen_helper_sme_fmopa_s + : !s->fpcr_ah ? gen_helper_sme_fmops_s + : gen_helper_sme_ah_fmops_s) +TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_ZA, + !a->sub ? gen_helper_sme_fmopa_d + : !s->fpcr_ah ? gen_helper_sme_fmops_d + : gen_helper_sme_ah_fmops_d) -TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa_w) +TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, + !a->sub ? gen_helper_sme_bfmopa_w + : !s->fpcr_ah ? gen_helper_sme_bfmops_w + : gen_helper_sme_ah_bfmops_w) TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s) TRANS_FEAT(UMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_umopa_s) -- 2.43.0
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-105-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 6 ++++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 51 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 4 +++ 4 files changed, 63 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_7(sme_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_fmops_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmops_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmops_d, TCG_CALL_NO_RWG, @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_bfmops_w, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_7(sme_ah_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_ah_fmops_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_fmops_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_fmops_d, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ADDVA_d 11000000 11 01000 1 ... ... ..... 00 ... @adda_64 ### SME Outer Product &op zad zn zm pm pn sub:bool +@op_16 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 ... zad:1 &op @op_32 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 .. zad:2 &op @op_64 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 . zad:3 &op +FMOPA_h 10000001 100 ..... ... ... ..... . 100 . @op_16 FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn, } } +static void do_fmopa_h(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, float_status *fpst, uint32_t desc, + uint16_t negx, int negf) +{ + intptr_t row, col, oprsz = simd_maxsz(desc); + + for (row = 0; row < oprsz; ) { + uint16_t pa = pn[H2(row >> 4)]; + do { + if (pa & 1) { + void *vza_row = vza + tile_vslice_offset(row); + uint16_t n = *(uint32_t *)(vzn + H1_2(row)) ^ negx; + + for (col = 0; col < oprsz; ) { + uint16_t pb = pm[H2(col >> 4)]; + do { + if (pb & 1) { + uint16_t *a = vza_row + H1_2(col); + uint16_t *m = vzm + H1_2(col); + *a = float16_muladd(n, *m, *a, negf, fpst); + } + col += 2; + pb >>= 2; + } while (col & 15); + } + } + row += 2; + pa >>= 2; + } while (row & 15); + } +} + +void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_h(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_fmops_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_h(vza, vzn, vzm, vpn, vpm, fpst, desc, 1u << 15, 0); +} + +void HELPER(sme_ah_fmops_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_h(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + static void do_fmopa_s(void *vza, void *vzn, void *vzm, uint16_t *pn, uint16_t *pm, float_status *fpst, uint32_t desc, uint32_t negx, int negf) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, MO_32, !a->sub ? gen_helper_sme_fmopa_w_h : !s->fpcr_ah ? gen_helper_sme_fmops_w_h : gen_helper_sme_ah_fmops_w_h) +TRANS_FEAT(FMOPA_h, aa64_sme_f16f16, do_outprod_fpst, a, MO_16, FPST_ZA_F16, + !a->sub ? gen_helper_sme_fmopa_h + : !s->fpcr_ah ? gen_helper_sme_fmops_h + : gen_helper_sme_ah_fmops_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_ZA, !a->sub ? gen_helper_sme_fmopa_s : !s->fpcr_ah ? gen_helper_sme_fmops_s -- 2.43.0
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-106-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 6 ++++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 51 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 5 ++++ 4 files changed, 64 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_bfmopa, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmops_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_bfmops_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_bfmops, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_ah_fmops_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_bfmops_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_ah_bfmops, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FMOPA_h 10000001 100 ..... ... ... ..... . 100 . @op_16 FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 +BFMOPA 10000001 101 ..... ... ... ..... . 100 . @op_16 + BFMOPA_w 10000001 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_ah_fmops_d)(void *vza, void *vzn, void *vzm, void *vpn, float_muladd_negate_product); } +static void do_bfmopa(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, float_status *fpst, uint32_t desc, + uint16_t negx, int negf) +{ + intptr_t row, col, oprsz = simd_maxsz(desc); + + for (row = 0; row < oprsz; ) { + uint16_t pa = pn[H2(row >> 4)]; + do { + if (pa & 1) { + void *vza_row = vza + tile_vslice_offset(row); + uint16_t n = *(uint32_t *)(vzn + H1_2(row)) ^ negx; + + for (col = 0; col < oprsz; ) { + uint16_t pb = pm[H2(col >> 4)]; + do { + if (pb & 1) { + uint16_t *a = vza_row + H1_2(col); + uint16_t *m = vzm + H1_2(col); + *a = bfloat16_muladd(n, *m, *a, negf, fpst); + } + col += 2; + pb >>= 2; + } while (col & 15); + } + } + row += 2; + pa >>= 2; + } while (row & 15); + } +} + +void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_bfmopa(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_bfmops)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_bfmopa(vza, vzn, vzm, vpn, vpm, fpst, desc, 1u << 15, 0); +} + +void HELPER(sme_ah_bfmops)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_bfmopa(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + /* * Alter PAIR as needed for controlling predicates being false, * and for NEG on an enabled row element. diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_ZA, : !s->fpcr_ah ? gen_helper_sme_fmops_d : gen_helper_sme_ah_fmops_d) +TRANS_FEAT(BFMOPA, aa64_sme_b16b16, do_outprod_fpst, a, MO_16, FPST_ZA, + !a->sub ? gen_helper_sme_bfmopa + : !s->fpcr_ah ? gen_helper_sme_bfmops + : gen_helper_sme_ah_bfmops) + TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, !a->sub ? gen_helper_sme_bfmopa_w : !s->fpcr_ah ? gen_helper_sme_bfmops_w -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-107-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/emulation.rst | 6 ++++++ target/arm/tcg/cpu64.c | 10 ++++++++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/arm/emulation.rst +++ b/docs/system/arm/emulation.rst @@ -XXX,XX +XXX,XX @@ the following architecture extensions: - FEAT_SM3 (Advanced SIMD SM3 instructions) - FEAT_SM4 (Advanced SIMD SM4 instructions) - FEAT_SME (Scalable Matrix Extension) +- FEAT_SME2 (Scalable Matrix Extension version 2) +- FEAT_SME2p1 (Scalable Matrix Extension version 2.1) +- FEAT_SME_B16B16 (Non-widening BFloat16 arithmetic for SME2) - FEAT_SME_FA64 (Full A64 instruction set in Streaming SVE mode) +- FEAT_SME_F16F16 (Non-widening half-precision FP16 arithmetic for SME2) - FEAT_SME_F64F64 (Double-precision floating-point outer product instructions) - FEAT_SME_I16I64 (16-bit to 64-bit integer widening outer product instructions) - FEAT_SVE (Scalable Vector Extension) - FEAT_SVE_AES (Scalable Vector AES instructions) +- FEAT_SVE_B16B16 (Non-widening BFloat16 arithmetic for SVE2) - FEAT_SVE_BitPerm (Scalable Vector Bit Permutes instructions) - FEAT_SVE_PMULL128 (Scalable Vector PMULL instructions) - FEAT_SVE_SHA3 (Scalable Vector SHA3 instructions) - FEAT_SVE_SM4 (Scalable Vector SM4 instructions) - FEAT_SVE2 (Scalable Vector Extension version 2) +- FEAT_SVE2p1 (Scalable Vector Extension version 2.1) - FEAT_SPECRES (Speculation restriction instructions) - FEAT_SSBS (Speculative Store Bypass Safe) - FEAT_SSBS2 (MRS and MSR instructions for SSBS version 2) diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/cpu64.c +++ b/target/arm/tcg/cpu64.c @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) */ t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3); /* FEAT_MTE3 */ t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 0); /* FEAT_RASv1p1 + FEAT_DoubleFault */ - t = FIELD_DP64(t, ID_AA64PFR1, SME, 1); /* FEAT_SME */ + t = FIELD_DP64(t, ID_AA64PFR1, SME, 2); /* FEAT_SME2 */ t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_3 */ t = FIELD_DP64(t, ID_AA64PFR1, NMI, 1); /* FEAT_NMI */ SET_IDREG(isar, ID_AA64PFR1, t); @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) FIELD_DP64_IDREG(isar, ID_AA64MMFR3, SPEC_FPACC, 1); /* FEAT_FPACC_SPEC */ t = GET_IDREG(isar, ID_AA64ZFR0); - t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1); + t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 2); /* FEAT_SVE2p1 */ t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2); /* FEAT_SVE_PMULL128 */ t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1); /* FEAT_SVE_BitPerm */ t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 2); /* FEAT_BF16, FEAT_EBF16 */ + t = FIELD_DP64(t, ID_AA64ZFR0, B16B16, 1); /* FEAT_SVE_B16B16 */ t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1); /* FEAT_SVE_SHA3 */ t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1); /* FEAT_SVE_SM4 */ t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1); /* FEAT_I8MM */ @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) t = GET_IDREG(isar, ID_AA64SMFR0); t = FIELD_DP64(t, ID_AA64SMFR0, F32F32, 1); /* FEAT_SME */ + t = FIELD_DP64(t, ID_AA64SMFR0, BI32I32, 1); /* FEAT_SME2 */ t = FIELD_DP64(t, ID_AA64SMFR0, B16F32, 1); /* FEAT_SME */ t = FIELD_DP64(t, ID_AA64SMFR0, F16F32, 1); /* FEAT_SME */ t = FIELD_DP64(t, ID_AA64SMFR0, I8I32, 0xf); /* FEAT_SME */ + t = FIELD_DP64(t, ID_AA64SMFR0, F16F16, 1); /* FEAT_SME_F16F16 */ + t = FIELD_DP64(t, ID_AA64SMFR0, B16B16, 1); /* FEAT_SME_B16B16 */ + t = FIELD_DP64(t, ID_AA64SMFR0, I16I32, 5); /* FEAT_SME2 */ t = FIELD_DP64(t, ID_AA64SMFR0, F64F64, 1); /* FEAT_SME_F64F64 */ t = FIELD_DP64(t, ID_AA64SMFR0, I16I64, 0xf); /* FEAT_SME_I16I64 */ + t = FIELD_DP64(t, ID_AA64SMFR0, SMEVER, 2); /* FEAT_SME2p1 */ t = FIELD_DP64(t, ID_AA64SMFR0, FA64, 1); /* FEAT_SME_FA64 */ SET_IDREG(isar, ID_AA64SMFR0, t); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-108-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- linux-user/elfload.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index XXXXXXX..XXXXXXX 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -XXX,XX +XXX,XX @@ uint64_t get_elf_hwcap2(void) GET_FEATURE_ID(aa64_sme_fa64, ARM_HWCAP2_A64_SME_FA64); GET_FEATURE_ID(aa64_hbc, ARM_HWCAP2_A64_HBC); GET_FEATURE_ID(aa64_mops, ARM_HWCAP2_A64_MOPS); + GET_FEATURE_ID(aa64_sve2p1, ARM_HWCAP2_A64_SVE2P1); + GET_FEATURE_ID(aa64_sme2, (ARM_HWCAP2_A64_SME2 | + ARM_HWCAP2_A64_SME_I16I32 | + ARM_HWCAP2_A64_SME_BI32I32)); + GET_FEATURE_ID(aa64_sme2p1, ARM_HWCAP2_A64_SME2P1); + GET_FEATURE_ID(aa64_sme_b16b16, ARM_HWCAP2_A64_SME_B16B16); + GET_FEATURE_ID(aa64_sme_f16f16, ARM_HWCAP2_A64_SME_F16F16); + GET_FEATURE_ID(aa64_sve_b16b16, ARM_HWCAP2_A64_SVE_B16B16); return hwcaps; } -- 2.43.0