:p
atchew
Login
A large arm pullreq, mostly because of 3 series: * aspeed 2600 support * semihosting v2.0 support * transaction-based ptimers thanks -- PMM The following changes since commit 22dbfdecc3c52228d3489da3fe81da92b21197bf: Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20191010.0' into staging (2019-10-14 15:09:08 +0100) are available in the Git repository at: https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20191014 for you to fetch changes up to bca1936f8f66c5f8a111569ffd14969de208bf3b: hw/misc/bcm2835_mbox: Add trace events (2019-10-14 16:48:56 +0100) ---------------------------------------------------------------- target-arm queue: * Add Aspeed AST2600 SoC and board support * aspeed/wdt: Check correct register for clock source * bcm2835: code cleanups, better logging, trace events * implement v2.0 of the Arm semihosting specification * provide new 'transaction-based' ptimer API and use it for the Arm devices that use ptimers * ARM: KVM: support more than 256 CPUs ---------------------------------------------------------------- Amithash Prasad (1): aspeed/wdt: Check correct register for clock source Cédric Le Goater (15): aspeed/timer: Introduce an object class per SoC aspeed/timer: Add support for control register 3 aspeed/timer: Add AST2600 support aspeed/timer: Add support for IRQ status register on the AST2600 aspeed/sdmc: Introduce an object class per SoC watchdog/aspeed: Introduce an object class per SoC aspeed/smc: Introduce segment operations aspeed/smc: Add AST2600 support aspeed/i2c: Introduce an object class per SoC aspeed/i2c: Add AST2600 support aspeed: Introduce an object class per SoC aspeed/soc: Add AST2600 support m25p80: Add support for w25q512jv aspeed: Add an AST2600 eval board aspeed: add support for the Aspeed MII controller of the AST2600 Eddie James (1): hw/sd/aspeed_sdhci: New device Eric Auger (3): linux headers: update against v5.4-rc1 intc/arm_gic: Support IRQ injection for more than 256 vpus ARM: KVM: Check KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 for smp_cpus > 256 Joel Stanley (5): hw: aspeed_scu: Add AST2600 support aspeed/sdmc: Add AST2600 support hw: wdt_aspeed: Add AST2600 support aspeed: Parameterise number of MACs aspeed/soc: Add ASPEED Video stub Peter Maydell (36): ptimer: Rename ptimer_init() to ptimer_init_with_bh() ptimer: Provide new transaction-based API tests/ptimer-test: Switch to transaction-based ptimer API hw/timer/arm_timer.c: Switch to transaction-based ptimer API hw/arm/musicpal.c: Switch to transaction-based ptimer API hw/timer/allwinner-a10-pit.c: Switch to transaction-based ptimer API hw/timer/arm_mptimer.c: Switch to transaction-based ptimer API hw/timer/cmsdk-apb-dualtimer.c: Switch to transaction-based ptimer API hw/timer/cmsdk-apb-timer.c: Switch to transaction-based ptimer API hw/timer/digic-timer.c: Switch to transaction-based ptimer API hw/timer/exynos4210_mct.c: Switch GFRC to transaction-based ptimer API hw/timer/exynos4210_mct.c: Switch LFRC to transaction-based ptimer API hw/timer/exynos4210_mct.c: Switch ltick to transaction-based ptimer API hw/timer/exynos4210_pwm.c: Switch to transaction-based ptimer API hw/timer/exynos4210_rtc.c: Switch 1Hz ptimer to transaction-based API hw/timer/exynos4210_rtc.c: Switch main ptimer to transaction-based API hw/timer/imx_epit.c: Switch to transaction-based ptimer API hw/timer/imx_gpt.c: Switch to transaction-based ptimer API hw/timer/mss-timerc: Switch to transaction-based ptimer API hw/watchdog/cmsdk-apb-watchdog.c: Switch to transaction-based ptimer API hw/net/lan9118.c: Switch to transaction-based ptimer API target/arm/arm-semi: Capture errno in softmmu version of set_swi_errno() target/arm/arm-semi: Always set some kind of errno for failed calls target/arm/arm-semi: Correct comment about gdb syscall races target/arm/arm-semi: Make semihosting code hand out its own file descriptors target/arm/arm-semi: Restrict use of TaskState* target/arm/arm-semi: Use set_swi_errno() in gdbstub callback functions target/arm/arm-semi: Factor out implementation of SYS_CLOSE target/arm/arm-semi: Factor out implementation of SYS_WRITE target/arm/arm-semi: Factor out implementation of SYS_READ target/arm/arm-semi: Factor out implementation of SYS_ISTTY target/arm/arm-semi: Factor out implementation of SYS_SEEK target/arm/arm-semi: Factor out implementation of SYS_FLEN target/arm/arm-semi: Implement support for semihosting feature detection target/arm/arm-semi: Implement SH_EXT_EXIT_EXTENDED extension target/arm/arm-semi: Implement SH_EXT_STDOUT_STDERR extension Philippe Mathieu-Daudé (6): hw/arm/raspi: Use the IEC binary prefix definitions hw/arm/bcm2835_peripherals: Improve logging hw/arm/bcm2835_peripherals: Name various address spaces hw/arm/bcm2835: Rename some definitions hw/arm/bcm2835: Add various unimplemented peripherals hw/misc/bcm2835_mbox: Add trace events Rashmica Gupta (1): hw/gpio: Add in AST2600 specific implementation hw/arm/Makefile.objs | 2 +- hw/sd/Makefile.objs | 1 + include/hw/arm/aspeed.h | 1 + include/hw/arm/aspeed_soc.h | 29 +- include/hw/arm/bcm2835_peripherals.h | 15 + include/hw/arm/raspi_platform.h | 24 +- include/hw/i2c/aspeed_i2c.h | 20 +- include/hw/misc/aspeed_scu.h | 7 +- include/hw/misc/aspeed_sdmc.h | 20 +- include/hw/net/ftgmac100.h | 17 + include/hw/ptimer.h | 83 ++- include/hw/sd/aspeed_sdhci.h | 34 ++ include/hw/ssi/aspeed_smc.h | 4 + include/hw/timer/aspeed_timer.h | 18 + include/hw/timer/mss-timer.h | 1 - include/hw/watchdog/wdt_aspeed.h | 19 +- include/standard-headers/asm-x86/bootparam.h | 2 + include/standard-headers/asm-x86/kvm_para.h | 1 + include/standard-headers/linux/ethtool.h | 24 + include/standard-headers/linux/pci_regs.h | 19 +- include/standard-headers/linux/virtio_fs.h | 19 + include/standard-headers/linux/virtio_ids.h | 2 + include/standard-headers/linux/virtio_iommu.h | 165 ++++++ include/standard-headers/linux/virtio_pmem.h | 6 +- linux-headers/asm-arm/kvm.h | 16 +- linux-headers/asm-arm/unistd-common.h | 2 + linux-headers/asm-arm64/kvm.h | 21 +- linux-headers/asm-generic/mman-common.h | 18 +- linux-headers/asm-generic/mman.h | 10 +- linux-headers/asm-generic/unistd.h | 10 +- linux-headers/asm-mips/mman.h | 3 + linux-headers/asm-mips/unistd_n32.h | 1 + linux-headers/asm-mips/unistd_n64.h | 1 + linux-headers/asm-mips/unistd_o32.h | 1 + linux-headers/asm-powerpc/mman.h | 6 +- linux-headers/asm-powerpc/unistd_32.h | 2 + linux-headers/asm-powerpc/unistd_64.h | 2 + linux-headers/asm-s390/kvm.h | 6 + linux-headers/asm-s390/unistd_32.h | 2 + linux-headers/asm-s390/unistd_64.h | 2 + linux-headers/asm-x86/kvm.h | 28 +- linux-headers/asm-x86/unistd.h | 2 +- linux-headers/asm-x86/unistd_32.h | 2 + linux-headers/asm-x86/unistd_64.h | 2 + linux-headers/asm-x86/unistd_x32.h | 2 + linux-headers/linux/kvm.h | 12 +- linux-headers/linux/psp-sev.h | 5 +- linux-headers/linux/vfio.h | 71 ++- target/arm/kvm_arm.h | 1 + hw/arm/aspeed.c | 42 +- hw/arm/aspeed_ast2600.c | 523 +++++++++++++++++++ hw/arm/aspeed_soc.c | 199 +++++--- hw/arm/bcm2835_peripherals.c | 38 +- hw/arm/bcm2836.c | 2 +- hw/arm/musicpal.c | 16 +- hw/arm/raspi.c | 4 +- hw/block/m25p80.c | 1 + hw/char/bcm2835_aux.c | 5 +- hw/core/ptimer.c | 154 +++++- hw/display/bcm2835_fb.c | 2 +- hw/dma/bcm2835_dma.c | 10 +- hw/dma/xilinx_axidma.c | 2 +- hw/gpio/aspeed_gpio.c | 142 +++++- hw/i2c/aspeed_i2c.c | 106 +++- hw/intc/arm_gic_kvm.c | 7 +- hw/intc/bcm2836_control.c | 7 +- hw/m68k/mcf5206.c | 2 +- hw/m68k/mcf5208.c | 2 +- hw/misc/aspeed_scu.c | 194 ++++++- hw/misc/aspeed_sdmc.c | 250 ++++++--- hw/misc/bcm2835_mbox.c | 14 +- hw/misc/bcm2835_property.c | 20 +- hw/net/fsl_etsec/etsec.c | 2 +- hw/net/ftgmac100.c | 162 ++++++ hw/net/lan9118.c | 11 +- hw/sd/aspeed_sdhci.c | 198 ++++++++ hw/ssi/aspeed_smc.c | 177 ++++++- hw/timer/allwinner-a10-pit.c | 12 +- hw/timer/altera_timer.c | 2 +- hw/timer/arm_mptimer.c | 18 +- hw/timer/arm_timer.c | 16 +- hw/timer/aspeed_timer.c | 213 +++++++- hw/timer/cmsdk-apb-dualtimer.c | 14 +- hw/timer/cmsdk-apb-timer.c | 15 +- hw/timer/digic-timer.c | 16 +- hw/timer/etraxfs_timer.c | 6 +- hw/timer/exynos4210_mct.c | 107 +++- hw/timer/exynos4210_pwm.c | 17 +- hw/timer/exynos4210_rtc.c | 22 +- hw/timer/grlib_gptimer.c | 2 +- hw/timer/imx_epit.c | 32 +- hw/timer/imx_gpt.c | 21 +- hw/timer/lm32_timer.c | 2 +- hw/timer/milkymist-sysctl.c | 4 +- hw/timer/mss-timer.c | 11 +- hw/timer/puv3_ost.c | 2 +- hw/timer/sh_timer.c | 2 +- hw/timer/slavio_timer.c | 2 +- hw/timer/xilinx_timer.c | 2 +- hw/watchdog/cmsdk-apb-watchdog.c | 13 +- hw/watchdog/wdt_aspeed.c | 153 +++--- target/arm/arm-semi.c | 707 +++++++++++++++++++++----- target/arm/cpu.c | 10 +- target/arm/kvm.c | 22 +- tests/ptimer-test.c | 106 +++- hw/misc/trace-events | 6 + 106 files changed, 3958 insertions(+), 650 deletions(-) create mode 100644 include/hw/sd/aspeed_sdhci.h create mode 100644 include/standard-headers/linux/virtio_fs.h create mode 100644 include/standard-headers/linux/virtio_iommu.h create mode 100644 hw/arm/aspeed_ast2600.c create mode 100644 hw/sd/aspeed_sdhci.c
From: Eric Auger <eric.auger@redhat.com> Update the headers against commit: 0f1a7b3fac05 ("timer-of: don't use conditional expression with mixed 'void' types") Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-id: 20191003154640.22451-2-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/standard-headers/asm-x86/bootparam.h | 2 + include/standard-headers/asm-x86/kvm_para.h | 1 + include/standard-headers/linux/ethtool.h | 24 +++ include/standard-headers/linux/pci_regs.h | 19 +- include/standard-headers/linux/virtio_fs.h | 19 ++ include/standard-headers/linux/virtio_ids.h | 2 + include/standard-headers/linux/virtio_iommu.h | 165 ++++++++++++++++++ include/standard-headers/linux/virtio_pmem.h | 6 +- linux-headers/asm-arm/kvm.h | 16 +- linux-headers/asm-arm/unistd-common.h | 2 + linux-headers/asm-arm64/kvm.h | 21 ++- linux-headers/asm-generic/mman-common.h | 18 +- linux-headers/asm-generic/mman.h | 10 +- linux-headers/asm-generic/unistd.h | 10 +- linux-headers/asm-mips/mman.h | 3 + linux-headers/asm-mips/unistd_n32.h | 1 + linux-headers/asm-mips/unistd_n64.h | 1 + linux-headers/asm-mips/unistd_o32.h | 1 + linux-headers/asm-powerpc/mman.h | 6 +- linux-headers/asm-powerpc/unistd_32.h | 2 + linux-headers/asm-powerpc/unistd_64.h | 2 + linux-headers/asm-s390/kvm.h | 6 + linux-headers/asm-s390/unistd_32.h | 2 + linux-headers/asm-s390/unistd_64.h | 2 + linux-headers/asm-x86/kvm.h | 28 ++- linux-headers/asm-x86/unistd.h | 2 +- linux-headers/asm-x86/unistd_32.h | 2 + linux-headers/asm-x86/unistd_64.h | 2 + linux-headers/asm-x86/unistd_x32.h | 2 + linux-headers/linux/kvm.h | 12 +- linux-headers/linux/psp-sev.h | 5 +- linux-headers/linux/vfio.h | 71 +++++--- 32 files changed, 406 insertions(+), 59 deletions(-) create mode 100644 include/standard-headers/linux/virtio_fs.h create mode 100644 include/standard-headers/linux/virtio_iommu.h diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h index XXXXXXX..XXXXXXX 100644 --- a/include/standard-headers/asm-x86/bootparam.h +++ b/include/standard-headers/asm-x86/bootparam.h @@ -XXX,XX +XXX,XX @@ #define XLF_EFI_HANDOVER_32 (1<<2) #define XLF_EFI_HANDOVER_64 (1<<3) #define XLF_EFI_KEXEC (1<<4) +#define XLF_5LEVEL (1<<5) +#define XLF_5LEVEL_ENABLED (1<<6) #endif /* _ASM_X86_BOOTPARAM_H */ diff --git a/include/standard-headers/asm-x86/kvm_para.h b/include/standard-headers/asm-x86/kvm_para.h index XXXXXXX..XXXXXXX 100644 --- a/include/standard-headers/asm-x86/kvm_para.h +++ b/include/standard-headers/asm-x86/kvm_para.h @@ -XXX,XX +XXX,XX @@ #define KVM_FEATURE_ASYNC_PF_VMEXIT 10 #define KVM_FEATURE_PV_SEND_IPI 11 #define KVM_FEATURE_POLL_CONTROL 12 +#define KVM_FEATURE_PV_SCHED_YIELD 13 #define KVM_HINTS_REALTIME 0 diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h index XXXXXXX..XXXXXXX 100644 --- a/include/standard-headers/linux/ethtool.h +++ b/include/standard-headers/linux/ethtool.h @@ -XXX,XX +XXX,XX @@ struct ethtool_tunable { #define ETHTOOL_PHY_FAST_LINK_DOWN_ON 0 #define ETHTOOL_PHY_FAST_LINK_DOWN_OFF 0xff +/* Energy Detect Power Down (EDPD) is a feature supported by some PHYs, where + * the PHY's RX & TX blocks are put into a low-power mode when there is no + * link detected (typically cable is un-plugged). For RX, only a minimal + * link-detection is available, and for TX the PHY wakes up to send link pulses + * to avoid any lock-ups in case the peer PHY may also be running in EDPD mode. + * + * Some PHYs may support configuration of the wake-up interval for TX pulses, + * and some PHYs may support only disabling TX pulses entirely. For the latter + * a special value is required (ETHTOOL_PHY_EDPD_NO_TX) so that this can be + * configured from userspace (should the user want it). + * + * The interval units for TX wake-up are in milliseconds, since this should + * cover a reasonable range of intervals: + * - from 1 millisecond, which does not sound like much of a power-saver + * - to ~65 seconds which is quite a lot to wait for a link to come up when + * plugging a cable + */ +#define ETHTOOL_PHY_EDPD_DFLT_TX_MSECS 0xffff +#define ETHTOOL_PHY_EDPD_NO_TX 0xfffe +#define ETHTOOL_PHY_EDPD_DISABLE 0 + enum phy_tunable_id { ETHTOOL_PHY_ID_UNSPEC, ETHTOOL_PHY_DOWNSHIFT, ETHTOOL_PHY_FAST_LINK_DOWN, + ETHTOOL_PHY_EDPD, /* * Add your fresh new phy tunable attribute above and remember to update * phy_tunable_strings[] in net/core/ethtool.c @@ -XXX,XX +XXX,XX @@ enum ethtool_link_mode_bit_indices { ETHTOOL_LINK_MODE_200000baseLR4_ER4_FR4_Full_BIT = 64, ETHTOOL_LINK_MODE_200000baseDR4_Full_BIT = 65, ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT = 66, + ETHTOOL_LINK_MODE_100baseT1_Full_BIT = 67, + ETHTOOL_LINK_MODE_1000baseT1_Full_BIT = 68, /* must be last entry */ __ETHTOOL_LINK_MODE_MASK_NBITS diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h index XXXXXXX..XXXXXXX 100644 --- a/include/standard-headers/linux/pci_regs.h +++ b/include/standard-headers/linux/pci_regs.h @@ -XXX,XX +XXX,XX @@ #define PCI_EXP_LNKCAP_SLS_5_0GB 0x00000002 /* LNKCAP2 SLS Vector bit 1 */ #define PCI_EXP_LNKCAP_SLS_8_0GB 0x00000003 /* LNKCAP2 SLS Vector bit 2 */ #define PCI_EXP_LNKCAP_SLS_16_0GB 0x00000004 /* LNKCAP2 SLS Vector bit 3 */ +#define PCI_EXP_LNKCAP_SLS_32_0GB 0x00000005 /* LNKCAP2 SLS Vector bit 4 */ #define PCI_EXP_LNKCAP_MLW 0x000003f0 /* Maximum Link Width */ #define PCI_EXP_LNKCAP_ASPMS 0x00000c00 /* ASPM Support */ #define PCI_EXP_LNKCAP_L0SEL 0x00007000 /* L0s Exit Latency */ @@ -XXX,XX +XXX,XX @@ #define PCI_EXP_LNKSTA_CLS_5_0GB 0x0002 /* Current Link Speed 5.0GT/s */ #define PCI_EXP_LNKSTA_CLS_8_0GB 0x0003 /* Current Link Speed 8.0GT/s */ #define PCI_EXP_LNKSTA_CLS_16_0GB 0x0004 /* Current Link Speed 16.0GT/s */ +#define PCI_EXP_LNKSTA_CLS_32_0GB 0x0005 /* Current Link Speed 32.0GT/s */ #define PCI_EXP_LNKSTA_NLW 0x03f0 /* Negotiated Link Width */ #define PCI_EXP_LNKSTA_NLW_X1 0x0010 /* Current Link Width x1 */ #define PCI_EXP_LNKSTA_NLW_X2 0x0020 /* Current Link Width x2 */ @@ -XXX,XX +XXX,XX @@ #define PCI_EXP_SLTCTL_CCIE 0x0010 /* Command Completed Interrupt Enable */ #define PCI_EXP_SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */ #define PCI_EXP_SLTCTL_AIC 0x00c0 /* Attention Indicator Control */ +#define PCI_EXP_SLTCTL_ATTN_IND_SHIFT 6 /* Attention Indicator shift */ #define PCI_EXP_SLTCTL_ATTN_IND_ON 0x0040 /* Attention Indicator on */ #define PCI_EXP_SLTCTL_ATTN_IND_BLINK 0x0080 /* Attention Indicator blinking */ #define PCI_EXP_SLTCTL_ATTN_IND_OFF 0x00c0 /* Attention Indicator off */ @@ -XXX,XX +XXX,XX @@ #define PCI_EXP_LNKCAP2_SLS_5_0GB 0x00000004 /* Supported Speed 5GT/s */ #define PCI_EXP_LNKCAP2_SLS_8_0GB 0x00000008 /* Supported Speed 8GT/s */ #define PCI_EXP_LNKCAP2_SLS_16_0GB 0x00000010 /* Supported Speed 16GT/s */ +#define PCI_EXP_LNKCAP2_SLS_32_0GB 0x00000020 /* Supported Speed 32GT/s */ #define PCI_EXP_LNKCAP2_CROSSLINK 0x00000100 /* Crosslink supported */ #define PCI_EXP_LNKCTL2 48 /* Link Control 2 */ #define PCI_EXP_LNKCTL2_TLS 0x000f @@ -XXX,XX +XXX,XX @@ #define PCI_EXP_LNKCTL2_TLS_5_0GT 0x0002 /* Supported Speed 5GT/s */ #define PCI_EXP_LNKCTL2_TLS_8_0GT 0x0003 /* Supported Speed 8GT/s */ #define PCI_EXP_LNKCTL2_TLS_16_0GT 0x0004 /* Supported Speed 16GT/s */ +#define PCI_EXP_LNKCTL2_TLS_32_0GT 0x0005 /* Supported Speed 32GT/s */ #define PCI_EXP_LNKSTA2 50 /* Link Status 2 */ #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here */ #define PCI_EXP_SLTCAP2 52 /* Slot Capabilities 2 */ @@ -XXX,XX +XXX,XX @@ #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ -#define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PTM +#define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ +#define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ +#define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT #define PCI_EXT_CAP_DSN_SIZEOF 12 #define PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF 40 @@ -XXX,XX +XXX,XX @@ #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ +/* Data Link Feature */ +#define PCI_DLF_CAP 0x04 /* Capabilities Register */ +#define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ + +/* Physical Layer 16.0 GT/s */ +#define PCI_PL_16GT_LE_CTRL 0x20 /* Lane Equalization Control Register */ +#define PCI_PL_16GT_LE_CTRL_DSP_TX_PRESET_MASK 0x0000000F +#define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK 0x000000F0 +#define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT 4 + #endif /* LINUX_PCI_REGS_H */ diff --git a/include/standard-headers/linux/virtio_fs.h b/include/standard-headers/linux/virtio_fs.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/standard-headers/linux/virtio_fs.h @@ -XXX,XX +XXX,XX @@ +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */ + +#ifndef _LINUX_VIRTIO_FS_H +#define _LINUX_VIRTIO_FS_H + +#include "standard-headers/linux/types.h" +#include "standard-headers/linux/virtio_ids.h" +#include "standard-headers/linux/virtio_config.h" +#include "standard-headers/linux/virtio_types.h" + +struct virtio_fs_config { + /* Filesystem name (UTF-8, not NUL-terminated, padded with NULs) */ + uint8_t tag[36]; + + /* Number of request queues */ + uint32_t num_request_queues; +} QEMU_PACKED; + +#endif /* _LINUX_VIRTIO_FS_H */ diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h index XXXXXXX..XXXXXXX 100644 --- a/include/standard-headers/linux/virtio_ids.h +++ b/include/standard-headers/linux/virtio_ids.h @@ -XXX,XX +XXX,XX @@ #define VIRTIO_ID_INPUT 18 /* virtio input */ #define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */ #define VIRTIO_ID_CRYPTO 20 /* virtio crypto */ +#define VIRTIO_ID_IOMMU 23 /* virtio IOMMU */ +#define VIRTIO_ID_FS 26 /* virtio filesystem */ #define VIRTIO_ID_PMEM 27 /* virtio pmem */ #endif /* _LINUX_VIRTIO_IDS_H */ diff --git a/include/standard-headers/linux/virtio_iommu.h b/include/standard-headers/linux/virtio_iommu.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/standard-headers/linux/virtio_iommu.h @@ -XXX,XX +XXX,XX @@ +/* SPDX-License-Identifier: BSD-3-Clause */ +/* + * Virtio-iommu definition v0.12 + * + * Copyright (C) 2019 Arm Ltd. + */ +#ifndef _LINUX_VIRTIO_IOMMU_H +#define _LINUX_VIRTIO_IOMMU_H + +#include "standard-headers/linux/types.h" + +/* Feature bits */ +#define VIRTIO_IOMMU_F_INPUT_RANGE 0 +#define VIRTIO_IOMMU_F_DOMAIN_RANGE 1 +#define VIRTIO_IOMMU_F_MAP_UNMAP 2 +#define VIRTIO_IOMMU_F_BYPASS 3 +#define VIRTIO_IOMMU_F_PROBE 4 +#define VIRTIO_IOMMU_F_MMIO 5 + +struct virtio_iommu_range_64 { + uint64_t start; + uint64_t end; +}; + +struct virtio_iommu_range_32 { + uint32_t start; + uint32_t end; +}; + +struct virtio_iommu_config { + /* Supported page sizes */ + uint64_t page_size_mask; + /* Supported IOVA range */ + struct virtio_iommu_range_64 input_range; + /* Max domain ID size */ + struct virtio_iommu_range_32 domain_range; + /* Probe buffer size */ + uint32_t probe_size; +}; + +/* Request types */ +#define VIRTIO_IOMMU_T_ATTACH 0x01 +#define VIRTIO_IOMMU_T_DETACH 0x02 +#define VIRTIO_IOMMU_T_MAP 0x03 +#define VIRTIO_IOMMU_T_UNMAP 0x04 +#define VIRTIO_IOMMU_T_PROBE 0x05 + +/* Status types */ +#define VIRTIO_IOMMU_S_OK 0x00 +#define VIRTIO_IOMMU_S_IOERR 0x01 +#define VIRTIO_IOMMU_S_UNSUPP 0x02 +#define VIRTIO_IOMMU_S_DEVERR 0x03 +#define VIRTIO_IOMMU_S_INVAL 0x04 +#define VIRTIO_IOMMU_S_RANGE 0x05 +#define VIRTIO_IOMMU_S_NOENT 0x06 +#define VIRTIO_IOMMU_S_FAULT 0x07 +#define VIRTIO_IOMMU_S_NOMEM 0x08 + +struct virtio_iommu_req_head { + uint8_t type; + uint8_t reserved[3]; +}; + +struct virtio_iommu_req_tail { + uint8_t status; + uint8_t reserved[3]; +}; + +struct virtio_iommu_req_attach { + struct virtio_iommu_req_head head; + uint32_t domain; + uint32_t endpoint; + uint8_t reserved[8]; + struct virtio_iommu_req_tail tail; +}; + +struct virtio_iommu_req_detach { + struct virtio_iommu_req_head head; + uint32_t domain; + uint32_t endpoint; + uint8_t reserved[8]; + struct virtio_iommu_req_tail tail; +}; + +#define VIRTIO_IOMMU_MAP_F_READ (1 << 0) +#define VIRTIO_IOMMU_MAP_F_WRITE (1 << 1) +#define VIRTIO_IOMMU_MAP_F_MMIO (1 << 2) + +#define VIRTIO_IOMMU_MAP_F_MASK (VIRTIO_IOMMU_MAP_F_READ | \ + VIRTIO_IOMMU_MAP_F_WRITE | \ + VIRTIO_IOMMU_MAP_F_MMIO) + +struct virtio_iommu_req_map { + struct virtio_iommu_req_head head; + uint32_t domain; + uint64_t virt_start; + uint64_t virt_end; + uint64_t phys_start; + uint32_t flags; + struct virtio_iommu_req_tail tail; +}; + +struct virtio_iommu_req_unmap { + struct virtio_iommu_req_head head; + uint32_t domain; + uint64_t virt_start; + uint64_t virt_end; + uint8_t reserved[4]; + struct virtio_iommu_req_tail tail; +}; + +#define VIRTIO_IOMMU_PROBE_T_NONE 0 +#define VIRTIO_IOMMU_PROBE_T_RESV_MEM 1 + +#define VIRTIO_IOMMU_PROBE_T_MASK 0xfff + +struct virtio_iommu_probe_property { + uint16_t type; + uint16_t length; +}; + +#define VIRTIO_IOMMU_RESV_MEM_T_RESERVED 0 +#define VIRTIO_IOMMU_RESV_MEM_T_MSI 1 + +struct virtio_iommu_probe_resv_mem { + struct virtio_iommu_probe_property head; + uint8_t subtype; + uint8_t reserved[3]; + uint64_t start; + uint64_t end; +}; + +struct virtio_iommu_req_probe { + struct virtio_iommu_req_head head; + uint32_t endpoint; + uint8_t reserved[64]; + + uint8_t properties[]; + + /* + * Tail follows the variable-length properties array. No padding, + * property lengths are all aligned on 8 bytes. + */ +}; + +/* Fault types */ +#define VIRTIO_IOMMU_FAULT_R_UNKNOWN 0 +#define VIRTIO_IOMMU_FAULT_R_DOMAIN 1 +#define VIRTIO_IOMMU_FAULT_R_MAPPING 2 + +#define VIRTIO_IOMMU_FAULT_F_READ (1 << 0) +#define VIRTIO_IOMMU_FAULT_F_WRITE (1 << 1) +#define VIRTIO_IOMMU_FAULT_F_EXEC (1 << 2) +#define VIRTIO_IOMMU_FAULT_F_ADDRESS (1 << 8) + +struct virtio_iommu_fault { + uint8_t reason; + uint8_t reserved[3]; + uint32_t flags; + uint32_t endpoint; + uint8_t reserved2[4]; + uint64_t address; +}; + +#endif diff --git a/include/standard-headers/linux/virtio_pmem.h b/include/standard-headers/linux/virtio_pmem.h index XXXXXXX..XXXXXXX 100644 --- a/include/standard-headers/linux/virtio_pmem.h +++ b/include/standard-headers/linux/virtio_pmem.h @@ -XXX,XX +XXX,XX @@ -/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ +/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause */ /* * Definitions for virtio-pmem devices. * @@ -XXX,XX +XXX,XX @@ * Author(s): Pankaj Gupta <pagupta@redhat.com> */ -#ifndef _UAPI_LINUX_VIRTIO_PMEM_H -#define _UAPI_LINUX_VIRTIO_PMEM_H +#ifndef _LINUX_VIRTIO_PMEM_H +#define _LINUX_VIRTIO_PMEM_H #include "standard-headers/linux/types.h" #include "standard-headers/linux/virtio_ids.h" diff --git a/linux-headers/asm-arm/kvm.h b/linux-headers/asm-arm/kvm.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-arm/kvm.h +++ b/linux-headers/asm-arm/kvm.h @@ -XXX,XX +XXX,XX @@ struct kvm_vcpu_events { #define KVM_REG_ARM_FW_REG(r) (KVM_REG_ARM | KVM_REG_SIZE_U64 | \ KVM_REG_ARM_FW | ((r) & 0xffff)) #define KVM_REG_ARM_PSCI_VERSION KVM_REG_ARM_FW_REG(0) +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1 KVM_REG_ARM_FW_REG(1) + /* Higher values mean better protection. */ +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL 0 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL 1 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED 2 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2 KVM_REG_ARM_FW_REG(2) + /* Higher values mean better protection. */ +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL 0 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN 1 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL 2 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4) /* Device Control API: ARM VGIC */ #define KVM_DEV_ARM_VGIC_GRP_ADDR 0 @@ -XXX,XX +XXX,XX @@ struct kvm_vcpu_events { #define KVM_DEV_ARM_ITS_CTRL_RESET 4 /* KVM_IRQ_LINE irq field index values */ +#define KVM_ARM_IRQ_VCPU2_SHIFT 28 +#define KVM_ARM_IRQ_VCPU2_MASK 0xf #define KVM_ARM_IRQ_TYPE_SHIFT 24 -#define KVM_ARM_IRQ_TYPE_MASK 0xff +#define KVM_ARM_IRQ_TYPE_MASK 0xf #define KVM_ARM_IRQ_VCPU_SHIFT 16 #define KVM_ARM_IRQ_VCPU_MASK 0xff #define KVM_ARM_IRQ_NUM_SHIFT 0 diff --git a/linux-headers/asm-arm/unistd-common.h b/linux-headers/asm-arm/unistd-common.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-arm/unistd-common.h +++ b/linux-headers/asm-arm/unistd-common.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig (__NR_SYSCALL_BASE + 431) #define __NR_fsmount (__NR_SYSCALL_BASE + 432) #define __NR_fspick (__NR_SYSCALL_BASE + 433) +#define __NR_pidfd_open (__NR_SYSCALL_BASE + 434) +#define __NR_clone3 (__NR_SYSCALL_BASE + 435) #endif /* _ASM_ARM_UNISTD_COMMON_H */ diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-arm64/kvm.h +++ b/linux-headers/asm-arm64/kvm.h @@ -XXX,XX +XXX,XX @@ struct kvm_vcpu_events { #define KVM_REG_ARM_FW_REG(r) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ KVM_REG_ARM_FW | ((r) & 0xffff)) #define KVM_REG_ARM_PSCI_VERSION KVM_REG_ARM_FW_REG(0) +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1 KVM_REG_ARM_FW_REG(1) +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_AVAIL 0 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_AVAIL 1 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1_NOT_REQUIRED 2 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2 KVM_REG_ARM_FW_REG(2) +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_AVAIL 0 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_UNKNOWN 1 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_AVAIL 2 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED 3 +#define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED (1U << 4) /* SVE registers */ #define KVM_REG_ARM64_SVE (0x15 << KVM_REG_ARM_COPROC_SHIFT) @@ -XXX,XX +XXX,XX @@ struct kvm_vcpu_events { KVM_REG_SIZE_U256 | \ ((i) & (KVM_ARM64_SVE_MAX_SLICES - 1))) +/* + * Register values for KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() and + * KVM_REG_ARM64_SVE_FFR() are represented in memory in an endianness- + * invariant layout which differs from the layout used for the FPSIMD + * V-registers on big-endian systems: see sigcontext.h for more explanation. + */ + #define KVM_ARM64_SVE_VQ_MIN __SVE_VQ_MIN #define KVM_ARM64_SVE_VQ_MAX __SVE_VQ_MAX @@ -XXX,XX +XXX,XX @@ struct kvm_vcpu_events { #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1 /* KVM_IRQ_LINE irq field index values */ +#define KVM_ARM_IRQ_VCPU2_SHIFT 28 +#define KVM_ARM_IRQ_VCPU2_MASK 0xf #define KVM_ARM_IRQ_TYPE_SHIFT 24 -#define KVM_ARM_IRQ_TYPE_MASK 0xff +#define KVM_ARM_IRQ_TYPE_MASK 0xf #define KVM_ARM_IRQ_VCPU_SHIFT 16 #define KVM_ARM_IRQ_VCPU_MASK 0xff #define KVM_ARM_IRQ_NUM_SHIFT 0 diff --git a/linux-headers/asm-generic/mman-common.h b/linux-headers/asm-generic/mman-common.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-generic/mman-common.h +++ b/linux-headers/asm-generic/mman-common.h @@ -XXX,XX +XXX,XX @@ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */ #define MAP_ANONYMOUS 0x20 /* don't use a file */ -#ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED -# define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be uninitialized */ -#else -# define MAP_UNINITIALIZED 0x0 /* Don't support this flag */ -#endif -/* 0x0100 - 0x80000 flags are defined in asm-generic/mman.h */ +/* 0x0100 - 0x4000 flags are defined in asm-generic/mman.h */ +#define MAP_POPULATE 0x008000 /* populate (prefault) pagetables */ +#define MAP_NONBLOCK 0x010000 /* do not block on IO */ +#define MAP_STACK 0x020000 /* give out an address that is best suited for process/thread stacks */ +#define MAP_HUGETLB 0x040000 /* create a huge page mapping */ +#define MAP_SYNC 0x080000 /* perform synchronous page faults for the mapping */ #define MAP_FIXED_NOREPLACE 0x100000 /* MAP_FIXED which doesn't unmap underlying mapping */ +#define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be + * uninitialized */ + /* * Flags for mlock */ @@ -XXX,XX +XXX,XX @@ #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */ +#define MADV_COLD 20 /* deactivate these pages */ +#define MADV_PAGEOUT 21 /* reclaim these pages */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/linux-headers/asm-generic/mman.h b/linux-headers/asm-generic/mman.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-generic/mman.h +++ b/linux-headers/asm-generic/mman.h @@ -XXX,XX +XXX,XX @@ #define MAP_EXECUTABLE 0x1000 /* mark it as an executable */ #define MAP_LOCKED 0x2000 /* pages are locked */ #define MAP_NORESERVE 0x4000 /* don't check for reservations */ -#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ -#define MAP_NONBLOCK 0x10000 /* do not block on IO */ -#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */ -#define MAP_HUGETLB 0x40000 /* create a huge page mapping */ -#define MAP_SYNC 0x80000 /* perform synchronous page faults for the mapping */ -/* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */ +/* + * Bits [26:31] are reserved, see asm-generic/hugetlb_encode.h + * for MAP_HUGETLB usage + */ #define MCL_CURRENT 1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-generic/unistd.h +++ b/linux-headers/asm-generic/unistd.h @@ -XXX,XX +XXX,XX @@ __SYSCALL(__NR_semget, sys_semget) __SC_COMP(__NR_semctl, sys_semctl, compat_sys_semctl) #if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32 #define __NR_semtimedop 192 -__SC_COMP(__NR_semtimedop, sys_semtimedop, sys_semtimedop_time32) +__SC_3264(__NR_semtimedop, sys_semtimedop_time32, sys_semtimedop) #endif #define __NR_semop 193 __SYSCALL(__NR_semop, sys_semop) @@ -XXX,XX +XXX,XX @@ __SYSCALL(__NR_fsconfig, sys_fsconfig) __SYSCALL(__NR_fsmount, sys_fsmount) #define __NR_fspick 433 __SYSCALL(__NR_fspick, sys_fspick) +#define __NR_pidfd_open 434 +__SYSCALL(__NR_pidfd_open, sys_pidfd_open) +#ifdef __ARCH_WANT_SYS_CLONE3 +#define __NR_clone3 435 +__SYSCALL(__NR_clone3, sys_clone3) +#endif #undef __NR_syscalls -#define __NR_syscalls 434 +#define __NR_syscalls 436 /* * 32 bit systems traditionally used different diff --git a/linux-headers/asm-mips/mman.h b/linux-headers/asm-mips/mman.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-mips/mman.h +++ b/linux-headers/asm-mips/mman.h @@ -XXX,XX +XXX,XX @@ #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */ +#define MADV_COLD 20 /* deactivate these pages */ +#define MADV_PAGEOUT 21 /* reclaim these pages */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/linux-headers/asm-mips/unistd_n32.h b/linux-headers/asm-mips/unistd_n32.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-mips/unistd_n32.h +++ b/linux-headers/asm-mips/unistd_n32.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig (__NR_Linux + 431) #define __NR_fsmount (__NR_Linux + 432) #define __NR_fspick (__NR_Linux + 433) +#define __NR_pidfd_open (__NR_Linux + 434) #endif /* _ASM_MIPS_UNISTD_N32_H */ diff --git a/linux-headers/asm-mips/unistd_n64.h b/linux-headers/asm-mips/unistd_n64.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-mips/unistd_n64.h +++ b/linux-headers/asm-mips/unistd_n64.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig (__NR_Linux + 431) #define __NR_fsmount (__NR_Linux + 432) #define __NR_fspick (__NR_Linux + 433) +#define __NR_pidfd_open (__NR_Linux + 434) #endif /* _ASM_MIPS_UNISTD_N64_H */ diff --git a/linux-headers/asm-mips/unistd_o32.h b/linux-headers/asm-mips/unistd_o32.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-mips/unistd_o32.h +++ b/linux-headers/asm-mips/unistd_o32.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig (__NR_Linux + 431) #define __NR_fsmount (__NR_Linux + 432) #define __NR_fspick (__NR_Linux + 433) +#define __NR_pidfd_open (__NR_Linux + 434) #endif /* _ASM_MIPS_UNISTD_O32_H */ diff --git a/linux-headers/asm-powerpc/mman.h b/linux-headers/asm-powerpc/mman.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-powerpc/mman.h +++ b/linux-headers/asm-powerpc/mman.h @@ -XXX,XX +XXX,XX @@ #define MAP_DENYWRITE 0x0800 /* ETXTBSY */ #define MAP_EXECUTABLE 0x1000 /* mark it as an executable */ + #define MCL_CURRENT 0x2000 /* lock all currently mapped pages */ #define MCL_FUTURE 0x4000 /* lock all additions to address space */ #define MCL_ONFAULT 0x8000 /* lock all pages that are faulted in */ -#define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ -#define MAP_NONBLOCK 0x10000 /* do not block on IO */ -#define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */ -#define MAP_HUGETLB 0x40000 /* create a huge page mapping */ - /* Override any generic PKEY permission defines */ #define PKEY_DISABLE_EXECUTE 0x4 #undef PKEY_ACCESS_MASK diff --git a/linux-headers/asm-powerpc/unistd_32.h b/linux-headers/asm-powerpc/unistd_32.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-powerpc/unistd_32.h +++ b/linux-headers/asm-powerpc/unistd_32.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig 431 #define __NR_fsmount 432 #define __NR_fspick 433 +#define __NR_pidfd_open 434 +#define __NR_clone3 435 #endif /* _ASM_POWERPC_UNISTD_32_H */ diff --git a/linux-headers/asm-powerpc/unistd_64.h b/linux-headers/asm-powerpc/unistd_64.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-powerpc/unistd_64.h +++ b/linux-headers/asm-powerpc/unistd_64.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig 431 #define __NR_fsmount 432 #define __NR_fspick 433 +#define __NR_pidfd_open 434 +#define __NR_clone3 435 #endif /* _ASM_POWERPC_UNISTD_64_H */ diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-s390/kvm.h +++ b/linux-headers/asm-s390/kvm.h @@ -XXX,XX +XXX,XX @@ struct kvm_guest_debug_arch { #define KVM_SYNC_GSCB (1UL << 9) #define KVM_SYNC_BPBC (1UL << 10) #define KVM_SYNC_ETOKEN (1UL << 11) + +#define KVM_SYNC_S390_VALID_FIELDS \ + (KVM_SYNC_PREFIX | KVM_SYNC_GPRS | KVM_SYNC_ACRS | KVM_SYNC_CRS | \ + KVM_SYNC_ARCH0 | KVM_SYNC_PFAULT | KVM_SYNC_VRS | KVM_SYNC_RICCB | \ + KVM_SYNC_FPRS | KVM_SYNC_GSCB | KVM_SYNC_BPBC | KVM_SYNC_ETOKEN) + /* length and alignment of the sdnx as a power of two */ #define SDNXC 8 #define SDNXL (1UL << SDNXC) diff --git a/linux-headers/asm-s390/unistd_32.h b/linux-headers/asm-s390/unistd_32.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-s390/unistd_32.h +++ b/linux-headers/asm-s390/unistd_32.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig 431 #define __NR_fsmount 432 #define __NR_fspick 433 +#define __NR_pidfd_open 434 +#define __NR_clone3 435 #endif /* _ASM_S390_UNISTD_32_H */ diff --git a/linux-headers/asm-s390/unistd_64.h b/linux-headers/asm-s390/unistd_64.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-s390/unistd_64.h +++ b/linux-headers/asm-s390/unistd_64.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig 431 #define __NR_fsmount 432 #define __NR_fspick 433 +#define __NR_pidfd_open 434 +#define __NR_clone3 435 #endif /* _ASM_S390_UNISTD_64_H */ diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-x86/kvm.h +++ b/linux-headers/asm-x86/kvm.h @@ -XXX,XX +XXX,XX @@ struct kvm_sync_regs { struct kvm_vcpu_events events; }; -#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0) -#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1) -#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2) -#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3) +#define KVM_X86_QUIRK_LINT0_REENABLED (1 << 0) +#define KVM_X86_QUIRK_CD_NW_CLEARED (1 << 1) +#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE (1 << 2) +#define KVM_X86_QUIRK_OUT_7E_INC_RIP (1 << 3) +#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4) #define KVM_STATE_NESTED_FORMAT_VMX 0 -#define KVM_STATE_NESTED_FORMAT_SVM 1 +#define KVM_STATE_NESTED_FORMAT_SVM 1 /* unused */ #define KVM_STATE_NESTED_GUEST_MODE 0x00000001 #define KVM_STATE_NESTED_RUN_PENDING 0x00000002 #define KVM_STATE_NESTED_EVMCS 0x00000004 -#define KVM_STATE_NESTED_VMX_VMCS_SIZE 0x1000 - #define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001 #define KVM_STATE_NESTED_SMM_VMXON 0x00000002 +#define KVM_STATE_NESTED_VMX_VMCS_SIZE 0x1000 + struct kvm_vmx_nested_state_data { __u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE]; __u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE]; @@ -XXX,XX +XXX,XX @@ struct kvm_nested_state { } data; }; +/* for KVM_CAP_PMU_EVENT_FILTER */ +struct kvm_pmu_event_filter { + __u32 action; + __u32 nevents; + __u32 fixed_counter_bitmap; + __u32 flags; + __u32 pad[4]; + __u64 events[0]; +}; + +#define KVM_PMU_EVENT_ALLOW 0 +#define KVM_PMU_EVENT_DENY 1 + #endif /* _ASM_X86_KVM_H */ diff --git a/linux-headers/asm-x86/unistd.h b/linux-headers/asm-x86/unistd.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-x86/unistd.h +++ b/linux-headers/asm-x86/unistd.h @@ -XXX,XX +XXX,XX @@ #define _ASM_X86_UNISTD_H /* x32 syscall flag bit */ -#define __X32_SYSCALL_BIT 0x40000000 +#define __X32_SYSCALL_BIT 0x40000000UL # ifdef __i386__ # include <asm/unistd_32.h> diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-x86/unistd_32.h +++ b/linux-headers/asm-x86/unistd_32.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig 431 #define __NR_fsmount 432 #define __NR_fspick 433 +#define __NR_pidfd_open 434 +#define __NR_clone3 435 #endif /* _ASM_X86_UNISTD_32_H */ diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-x86/unistd_64.h +++ b/linux-headers/asm-x86/unistd_64.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig 431 #define __NR_fsmount 432 #define __NR_fspick 433 +#define __NR_pidfd_open 434 +#define __NR_clone3 435 #endif /* _ASM_X86_UNISTD_64_H */ diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/asm-x86/unistd_x32.h +++ b/linux-headers/asm-x86/unistd_x32.h @@ -XXX,XX +XXX,XX @@ #define __NR_fsconfig (__X32_SYSCALL_BIT + 431) #define __NR_fsmount (__X32_SYSCALL_BIT + 432) #define __NR_fspick (__X32_SYSCALL_BIT + 433) +#define __NR_pidfd_open (__X32_SYSCALL_BIT + 434) +#define __NR_clone3 (__X32_SYSCALL_BIT + 435) #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512) #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513) #define __NR_ioctl (__X32_SYSCALL_BIT + 514) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -XXX,XX +XXX,XX @@ struct kvm_irq_level { * ACPI gsi notion of irq. * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47.. * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23.. - * For ARM: See Documentation/virtual/kvm/api.txt + * For ARM: See Documentation/virt/kvm/api.txt */ union { __u32 irq; @@ -XXX,XX +XXX,XX @@ struct kvm_hyperv_exit { #define KVM_INTERNAL_ERROR_SIMUL_EX 2 /* Encounter unexpected vm-exit due to delivery event. */ #define KVM_INTERNAL_ERROR_DELIVERY_EV 3 +/* Encounter unexpected vm-exit reason */ +#define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON 4 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */ struct kvm_run { @@ -XXX,XX +XXX,XX @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_ARM_SVE 170 #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 +#define KVM_CAP_PMU_EVENT_FILTER 173 +#define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174 +#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175 #ifdef KVM_CAP_IRQ_ROUTING @@ -XXX,XX +XXX,XX @@ struct kvm_xen_hvm_config { * * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies * the irqfd to operate in resampling mode for level triggered interrupt - * emulation. See Documentation/virtual/kvm/api.txt. + * emulation. See Documentation/virt/kvm/api.txt. */ #define KVM_IRQFD_FLAG_RESAMPLE (1 << 1) @@ -XXX,XX +XXX,XX @@ struct kvm_dirty_tlb { #define KVM_REG_S390 0x5000000000000000ULL #define KVM_REG_ARM64 0x6000000000000000ULL #define KVM_REG_MIPS 0x7000000000000000ULL +#define KVM_REG_RISCV 0x8000000000000000ULL #define KVM_REG_SIZE_SHIFT 52 #define KVM_REG_SIZE_MASK 0x00f0000000000000ULL @@ -XXX,XX +XXX,XX @@ struct kvm_s390_ucas_mapping { #define KVM_PPC_GET_RMMU_INFO _IOW(KVMIO, 0xb0, struct kvm_ppc_rmmu_info) /* Available with KVM_CAP_PPC_GET_CPU_CHAR */ #define KVM_PPC_GET_CPU_CHAR _IOR(KVMIO, 0xb1, struct kvm_ppc_cpu_char) +/* Available with KVM_CAP_PMU_EVENT_FILTER */ +#define KVM_SET_PMU_EVENT_FILTER _IOW(KVMIO, 0xb2, struct kvm_pmu_event_filter) /* ioctl for vm fd */ #define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device) diff --git a/linux-headers/linux/psp-sev.h b/linux-headers/linux/psp-sev.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/linux/psp-sev.h +++ b/linux-headers/linux/psp-sev.h @@ -XXX,XX +XXX,XX @@ +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */ /* * Userspace interface for AMD Secure Encrypted Virtualization (SEV) * platform management commands. @@ -XXX,XX +XXX,XX @@ * Author: Brijesh Singh <brijesh.singh@amd.com> * * SEV API specification is available at: https://developer.amd.com/sev/ - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. */ #ifndef __PSP_SEV_USER_H__ diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h index XXXXXXX..XXXXXXX 100644 --- a/linux-headers/linux/vfio.h +++ b/linux-headers/linux/vfio.h @@ -XXX,XX +XXX,XX @@ struct vfio_region_info_cap_type { __u32 subtype; /* type specific */ }; +/* + * List of region types, global per bus driver. + * If you introduce a new type, please add it here. + */ + +/* PCI region type containing a PCI vendor part */ #define VFIO_REGION_TYPE_PCI_VENDOR_TYPE (1 << 31) #define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0xffff) +#define VFIO_REGION_TYPE_GFX (1) +#define VFIO_REGION_TYPE_CCW (2) -/* 8086 Vendor sub-types */ +/* sub-types for VFIO_REGION_TYPE_PCI_* */ + +/* 8086 vendor PCI sub-types */ #define VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION (1) #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG (2) #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG (3) -#define VFIO_REGION_TYPE_GFX (1) +/* 10de vendor PCI sub-types */ +/* + * NVIDIA GPU NVlink2 RAM is coherent RAM mapped onto the host address space. + */ +#define VFIO_REGION_SUBTYPE_NVIDIA_NVLINK2_RAM (1) + +/* 1014 vendor PCI sub-types */ +/* + * IBM NPU NVlink2 ATSD (Address Translation Shootdown) register of NPU + * to do TLB invalidation on a GPU. + */ +#define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD (1) + +/* sub-types for VFIO_REGION_TYPE_GFX */ #define VFIO_REGION_SUBTYPE_GFX_EDID (1) /** @@ -XXX,XX +XXX,XX @@ struct vfio_region_gfx_edid { #define VFIO_DEVICE_GFX_LINK_STATE_DOWN 2 }; -#define VFIO_REGION_TYPE_CCW (2) -/* ccw sub-types */ +/* sub-types for VFIO_REGION_TYPE_CCW */ #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD (1) -/* - * 10de vendor sub-type - * - * NVIDIA GPU NVlink2 RAM is coherent RAM mapped onto the host address space. - */ -#define VFIO_REGION_SUBTYPE_NVIDIA_NVLINK2_RAM (1) - -/* - * 1014 vendor sub-type - * - * IBM NPU NVlink2 ATSD (Address Translation Shootdown) register of NPU - * to do TLB invalidation on a GPU. - */ -#define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD (1) - /* * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped * which allows direct access to non-MSIX registers which happened to be within @@ -XXX,XX +XXX,XX @@ struct vfio_iommu_type1_info { __u32 argsz; __u32 flags; #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */ - __u64 iova_pgsizes; /* Bitmap of supported page sizes */ +#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */ + __u64 iova_pgsizes; /* Bitmap of supported page sizes */ + __u32 cap_offset; /* Offset within info struct of first cap */ +}; + +/* + * The IOVA capability allows to report the valid IOVA range(s) + * excluding any non-relaxable reserved regions exposed by + * devices attached to the container. Any DMA map attempt + * outside the valid iova range will return error. + * + * The structures below define version 1 of this capability. + */ +#define VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE 1 + +struct vfio_iova_range { + __u64 start; + __u64 end; +}; + +struct vfio_iommu_type1_info_cap_iova_range { + struct vfio_info_cap_header header; + __u32 nr_iovas; + __u32 reserved; + struct vfio_iova_range iova_ranges[]; }; #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) -- 2.20.1
From: Eric Auger <eric.auger@redhat.com> Host kernels that expose the KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 capability allow injection of interrupts along with vcpu ids larger than 255. Let's encode the vpcu id on 12 bits according to the upgraded KVM_IRQ_LINE ABI when needed. Given that we have two callsites that need to assemble the value for kvm_set_irq(), a new helper routine, kvm_arm_set_irq is introduced. Without that patch qemu exits with "kvm_set_irq: Invalid argument" message. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reported-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-id: 20191003154640.22451-3-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/kvm_arm.h | 1 + hw/intc/arm_gic_kvm.c | 7 ++----- target/arm/cpu.c | 10 ++++------ target/arm/kvm.c | 12 ++++++++++++ 4 files changed, 19 insertions(+), 11 deletions(-) diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/kvm_arm.h +++ b/target/arm/kvm_arm.h @@ -XXX,XX +XXX,XX @@ int kvm_arm_vgic_probe(void); void kvm_arm_pmu_set_irq(CPUState *cs, int irq); void kvm_arm_pmu_init(CPUState *cs); +int kvm_arm_set_irq(int cpu, int irqtype, int irq, int level); #else diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c index XXXXXXX..XXXXXXX 100644 --- a/hw/intc/arm_gic_kvm.c +++ b/hw/intc/arm_gic_kvm.c @@ -XXX,XX +XXX,XX @@ void kvm_arm_gic_set_irq(uint32_t num_irq, int irq, int level) * has separate fields in the irq number for type, * CPU number and interrupt number. */ - int kvm_irq, irqtype, cpu; + int irqtype, cpu; if (irq < (num_irq - GIC_INTERNAL)) { /* External interrupt. The kernel numbers these like the GIC @@ -XXX,XX +XXX,XX @@ void kvm_arm_gic_set_irq(uint32_t num_irq, int irq, int level) cpu = irq / GIC_INTERNAL; irq %= GIC_INTERNAL; } - kvm_irq = (irqtype << KVM_ARM_IRQ_TYPE_SHIFT) - | (cpu << KVM_ARM_IRQ_VCPU_SHIFT) | irq; - - kvm_set_irq(kvm_state, kvm_irq, !!level); + kvm_arm_set_irq(cpu, irqtype, irq, !!level); } static void kvm_arm_gicv2_set_irq(void *opaque, int irq, int level) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ static void arm_cpu_kvm_set_irq(void *opaque, int irq, int level) ARMCPU *cpu = opaque; CPUARMState *env = &cpu->env; CPUState *cs = CPU(cpu); - int kvm_irq = KVM_ARM_IRQ_TYPE_CPU << KVM_ARM_IRQ_TYPE_SHIFT; uint32_t linestate_bit; + int irq_id; switch (irq) { case ARM_CPU_IRQ: - kvm_irq |= KVM_ARM_IRQ_CPU_IRQ; + irq_id = KVM_ARM_IRQ_CPU_IRQ; linestate_bit = CPU_INTERRUPT_HARD; break; case ARM_CPU_FIQ: - kvm_irq |= KVM_ARM_IRQ_CPU_FIQ; + irq_id = KVM_ARM_IRQ_CPU_FIQ; linestate_bit = CPU_INTERRUPT_FIQ; break; default: @@ -XXX,XX +XXX,XX @@ static void arm_cpu_kvm_set_irq(void *opaque, int irq, int level) } else { env->irq_line_state &= ~linestate_bit; } - - kvm_irq |= cs->cpu_index << KVM_ARM_IRQ_VCPU_SHIFT; - kvm_set_irq(kvm_state, kvm_irq, level ? 1 : 0); + kvm_arm_set_irq(cs->cpu_index, KVM_ARM_IRQ_TYPE_CPU, irq_id, !!level); #endif } diff --git a/target/arm/kvm.c b/target/arm/kvm.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -XXX,XX +XXX,XX @@ int kvm_arm_vgic_probe(void) } } +int kvm_arm_set_irq(int cpu, int irqtype, int irq, int level) +{ + int kvm_irq = (irqtype << KVM_ARM_IRQ_TYPE_SHIFT) | irq; + int cpu_idx1 = cpu % 256; + int cpu_idx2 = cpu / 256; + + kvm_irq |= (cpu_idx1 << KVM_ARM_IRQ_VCPU_SHIFT) | + (cpu_idx2 << KVM_ARM_IRQ_VCPU2_SHIFT); + + return kvm_set_irq(kvm_state, kvm_irq, !!level); +} + int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route, uint64_t address, uint32_t data, PCIDevice *dev) { -- 2.20.1
From: Eric Auger <eric.auger@redhat.com> Host kernel within [4.18, 5.3] report an erroneous KVM_MAX_VCPUS=512 for ARM. The actual capability to instantiate more than 256 vcpus was fixed in 5.4 with the upgrade of the KVM_IRQ_LINE ABI to support vcpu id encoded on 12 bits instead of 8 and a redistributor consuming a single KVM IO device instead of 2. So let's check this capability when attempting to use more than 256 vcpus within any ARM kvm accelerated machine. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-id: 20191003154640.22451-4-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/kvm.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/target/arm/kvm.c b/target/arm/kvm.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -XXX,XX +XXX,XX @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms) int kvm_arch_init(MachineState *ms, KVMState *s) { + int ret = 0; /* For ARM interrupt delivery is always asynchronous, * whether we are using an in-kernel VGIC or not. */ @@ -XXX,XX +XXX,XX @@ int kvm_arch_init(MachineState *ms, KVMState *s) cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE); - return 0; + if (ms->smp.cpus > 256 && + !kvm_check_extension(s, KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)) { + error_report("Using more than 256 vcpus requires a host kernel " + "with KVM_CAP_ARM_IRQ_LINE_LAYOUT_2"); + ret = -EINVAL; + } + + return ret; } unsigned long kvm_arch_vcpu_id(CPUState *cpu) -- 2.20.1
Currently the ptimer design uses a QEMU bottom-half as its mechanism for calling back into the device model using the ptimer when the timer has expired. Unfortunately this design is fatally flawed, because it means that there is a lag between the ptimer updating its own state and the device callback function updating device state, and guest accesses to device registers between the two can return inconsistent device state. We want to replace the bottom-half design with one where the guest device's callback is called either immediately (when the ptimer triggers by timeout) or when the device model code closes a transaction-begin/end section (when the ptimer triggers because the device model changed the ptimer's count value or other state). As the first step, rename ptimer_init() to ptimer_init_with_bh(), to free up the ptimer_init() name for the new API. We can then convert all the ptimer users away from ptimer_init_with_bh() before removing it entirely. (Commit created with git grep -l ptimer_init | xargs sed -i -e 's/ptimer_init/ptimer_init_with_bh/' and three overlong lines folded by hand.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-2-peter.maydell@linaro.org --- include/hw/ptimer.h | 11 ++++++----- hw/arm/musicpal.c | 2 +- hw/core/ptimer.c | 2 +- hw/dma/xilinx_axidma.c | 2 +- hw/m68k/mcf5206.c | 2 +- hw/m68k/mcf5208.c | 2 +- hw/net/fsl_etsec/etsec.c | 2 +- hw/net/lan9118.c | 2 +- hw/timer/allwinner-a10-pit.c | 2 +- hw/timer/altera_timer.c | 2 +- hw/timer/arm_mptimer.c | 6 +++--- hw/timer/arm_timer.c | 2 +- hw/timer/cmsdk-apb-dualtimer.c | 2 +- hw/timer/cmsdk-apb-timer.c | 2 +- hw/timer/digic-timer.c | 2 +- hw/timer/etraxfs_timer.c | 6 +++--- hw/timer/exynos4210_mct.c | 7 ++++--- hw/timer/exynos4210_pwm.c | 2 +- hw/timer/exynos4210_rtc.c | 4 ++-- hw/timer/grlib_gptimer.c | 2 +- hw/timer/imx_epit.c | 4 ++-- hw/timer/imx_gpt.c | 2 +- hw/timer/lm32_timer.c | 2 +- hw/timer/milkymist-sysctl.c | 4 ++-- hw/timer/mss-timer.c | 2 +- hw/timer/puv3_ost.c | 2 +- hw/timer/sh_timer.c | 2 +- hw/timer/slavio_timer.c | 2 +- hw/timer/xilinx_timer.c | 2 +- hw/watchdog/cmsdk-apb-watchdog.c | 2 +- tests/ptimer-test.c | 22 +++++++++++----------- 31 files changed, 56 insertions(+), 54 deletions(-) diff --git a/include/hw/ptimer.h b/include/hw/ptimer.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/ptimer.h +++ b/include/hw/ptimer.h @@ -XXX,XX +XXX,XX @@ * ptimer_set_count() or ptimer_set_limit() will not trigger the timer * (though it will cause a reload). Only a counter decrement to "0" * will cause a trigger. Not compatible with NO_IMMEDIATE_TRIGGER; - * ptimer_init() will assert() that you don't set both. + * ptimer_init_with_bh() will assert() that you don't set both. */ #define PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT (1 << 5) @@ -XXX,XX +XXX,XX @@ typedef struct ptimer_state ptimer_state; typedef void (*ptimer_cb)(void *opaque); /** - * ptimer_init - Allocate and return a new ptimer + * ptimer_init_with_bh - Allocate and return a new ptimer * @bh: QEMU bottom half which is run on timer expiry * @policy: PTIMER_POLICY_* bits specifying behaviour * @@ -XXX,XX +XXX,XX @@ typedef void (*ptimer_cb)(void *opaque); * The ptimer takes ownership of @bh and will delete it * when the ptimer is eventually freed. */ -ptimer_state *ptimer_init(QEMUBH *bh, uint8_t policy_mask); +ptimer_state *ptimer_init_with_bh(QEMUBH *bh, uint8_t policy_mask); /** * ptimer_free - Free a ptimer * @s: timer to free * - * Free a ptimer created using ptimer_init() (including + * Free a ptimer created using ptimer_init_with_bh() (including * deleting the bottom half which it is using). */ void ptimer_free(ptimer_state *s); @@ -XXX,XX +XXX,XX @@ void ptimer_set_count(ptimer_state *s, uint64_t count); * @oneshot: non-zero if this timer should only count down once * * Start a ptimer counting down; when it reaches zero the bottom half - * passed to ptimer_init() will be invoked. If the @oneshot argument is zero, + * passed to ptimer_init_with_bh() will be invoked. + * If the @oneshot argument is zero, * the counter value will then be reloaded from the limit and it will * start counting down again. If @oneshot is non-zero, then the counter * will disable itself when it reaches zero. diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/musicpal.c +++ b/hw/arm/musicpal.c @@ -XXX,XX +XXX,XX @@ static void mv88w8618_timer_init(SysBusDevice *dev, mv88w8618_timer_state *s, s->freq = freq; bh = qemu_bh_new(mv88w8618_timer_tick, s); - s->ptimer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->ptimer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); } static uint64_t mv88w8618_pit_read(void *opaque, hwaddr offset, diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/core/ptimer.c +++ b/hw/core/ptimer.c @@ -XXX,XX +XXX,XX @@ const VMStateDescription vmstate_ptimer = { } }; -ptimer_state *ptimer_init(QEMUBH *bh, uint8_t policy_mask) +ptimer_state *ptimer_init_with_bh(QEMUBH *bh, uint8_t policy_mask) { ptimer_state *s; diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c index XXXXXXX..XXXXXXX 100644 --- a/hw/dma/xilinx_axidma.c +++ b/hw/dma/xilinx_axidma.c @@ -XXX,XX +XXX,XX @@ static void xilinx_axidma_realize(DeviceState *dev, Error **errp) st->nr = i; st->bh = qemu_bh_new(timer_hit, st); - st->ptimer = ptimer_init(st->bh, PTIMER_POLICY_DEFAULT); + st->ptimer = ptimer_init_with_bh(st->bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(st->ptimer, s->freqhz); } return; diff --git a/hw/m68k/mcf5206.c b/hw/m68k/mcf5206.c index XXXXXXX..XXXXXXX 100644 --- a/hw/m68k/mcf5206.c +++ b/hw/m68k/mcf5206.c @@ -XXX,XX +XXX,XX @@ static m5206_timer_state *m5206_timer_init(qemu_irq irq) s = g_new0(m5206_timer_state, 1); bh = qemu_bh_new(m5206_timer_trigger, s); - s->timer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); s->irq = irq; m5206_timer_reset(s); return s; diff --git a/hw/m68k/mcf5208.c b/hw/m68k/mcf5208.c index XXXXXXX..XXXXXXX 100644 --- a/hw/m68k/mcf5208.c +++ b/hw/m68k/mcf5208.c @@ -XXX,XX +XXX,XX @@ static void mcf5208_sys_init(MemoryRegion *address_space, qemu_irq *pic) for (i = 0; i < 2; i++) { s = g_new0(m5208_timer_state, 1); bh = qemu_bh_new(m5208_timer_trigger, s); - s->timer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); memory_region_init_io(&s->iomem, NULL, &m5208_timer_ops, s, "m5208-timer", 0x00004000); memory_region_add_subregion(address_space, 0xfc080000 + 0x4000 * i, diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c index XXXXXXX..XXXXXXX 100644 --- a/hw/net/fsl_etsec/etsec.c +++ b/hw/net/fsl_etsec/etsec.c @@ -XXX,XX +XXX,XX @@ static void etsec_realize(DeviceState *dev, Error **errp) etsec->bh = qemu_bh_new(etsec_timer_hit, etsec); - etsec->ptimer = ptimer_init(etsec->bh, PTIMER_POLICY_DEFAULT); + etsec->ptimer = ptimer_init_with_bh(etsec->bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(etsec->ptimer, 100); } diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c index XXXXXXX..XXXXXXX 100644 --- a/hw/net/lan9118.c +++ b/hw/net/lan9118.c @@ -XXX,XX +XXX,XX @@ static void lan9118_realize(DeviceState *dev, Error **errp) s->txp = &s->tx_packet; bh = qemu_bh_new(lan9118_tick, s); - s->timer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(s->timer, 10000); ptimer_set_limit(s->timer, 0xffff, 1); } diff --git a/hw/timer/allwinner-a10-pit.c b/hw/timer/allwinner-a10-pit.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/allwinner-a10-pit.c +++ b/hw/timer/allwinner-a10-pit.c @@ -XXX,XX +XXX,XX @@ static void a10_pit_init(Object *obj) tc->container = s; tc->index = i; bh[i] = qemu_bh_new(a10_pit_timer_cb, tc); - s->timer[i] = ptimer_init(bh[i], PTIMER_POLICY_DEFAULT); + s->timer[i] = ptimer_init_with_bh(bh[i], PTIMER_POLICY_DEFAULT); } } diff --git a/hw/timer/altera_timer.c b/hw/timer/altera_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/altera_timer.c +++ b/hw/timer/altera_timer.c @@ -XXX,XX +XXX,XX @@ static void altera_timer_realize(DeviceState *dev, Error **errp) } t->bh = qemu_bh_new(timer_hit, t); - t->ptimer = ptimer_init(t->bh, PTIMER_POLICY_DEFAULT); + t->ptimer = ptimer_init_with_bh(t->bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(t->ptimer, t->freq_hz); memory_region_init_io(&t->mmio, OBJECT(t), &timer_ops, t, diff --git a/hw/timer/arm_mptimer.c b/hw/timer/arm_mptimer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/arm_mptimer.c +++ b/hw/timer/arm_mptimer.c @@ -XXX,XX +XXX,XX @@ static void arm_mptimer_reset(DeviceState *dev) } } -static void arm_mptimer_init(Object *obj) +static void arm_mptimer_init_with_bh(Object *obj) { ARMMPTimerState *s = ARM_MPTIMER(obj); @@ -XXX,XX +XXX,XX @@ static void arm_mptimer_realize(DeviceState *dev, Error **errp) for (i = 0; i < s->num_cpu; i++) { TimerBlock *tb = &s->timerblock[i]; QEMUBH *bh = qemu_bh_new(timerblock_tick, tb); - tb->timer = ptimer_init(bh, PTIMER_POLICY); + tb->timer = ptimer_init_with_bh(bh, PTIMER_POLICY); sysbus_init_irq(sbd, &tb->irq); memory_region_init_io(&tb->iomem, OBJECT(s), &timerblock_ops, tb, "arm_mptimer_timerblock", 0x20); @@ -XXX,XX +XXX,XX @@ static const TypeInfo arm_mptimer_info = { .name = TYPE_ARM_MPTIMER, .parent = TYPE_SYS_BUS_DEVICE, .instance_size = sizeof(ARMMPTimerState), - .instance_init = arm_mptimer_init, + .instance_init = arm_mptimer_init_with_bh, .class_init = arm_mptimer_class_init, }; diff --git a/hw/timer/arm_timer.c b/hw/timer/arm_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/arm_timer.c +++ b/hw/timer/arm_timer.c @@ -XXX,XX +XXX,XX @@ static arm_timer_state *arm_timer_init(uint32_t freq) s->control = TIMER_CTRL_IE; bh = qemu_bh_new(arm_timer_tick, s); - s->timer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); vmstate_register(NULL, -1, &vmstate_arm_timer, s); return s; } diff --git a/hw/timer/cmsdk-apb-dualtimer.c b/hw/timer/cmsdk-apb-dualtimer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/cmsdk-apb-dualtimer.c +++ b/hw/timer/cmsdk-apb-dualtimer.c @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_dualtimer_realize(DeviceState *dev, Error **errp) QEMUBH *bh = qemu_bh_new(cmsdk_dualtimermod_tick, m); m->parent = s; - m->timer = ptimer_init(bh, + m->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT | PTIMER_POLICY_NO_IMMEDIATE_RELOAD | diff --git a/hw/timer/cmsdk-apb-timer.c b/hw/timer/cmsdk-apb-timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/cmsdk-apb-timer.c +++ b/hw/timer/cmsdk-apb-timer.c @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_timer_realize(DeviceState *dev, Error **errp) } bh = qemu_bh_new(cmsdk_apb_timer_tick, s); - s->timer = ptimer_init(bh, + s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT | PTIMER_POLICY_NO_IMMEDIATE_RELOAD | diff --git a/hw/timer/digic-timer.c b/hw/timer/digic-timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/digic-timer.c +++ b/hw/timer/digic-timer.c @@ -XXX,XX +XXX,XX @@ static void digic_timer_init(Object *obj) { DigicTimerState *s = DIGIC_TIMER(obj); - s->ptimer = ptimer_init(NULL, PTIMER_POLICY_DEFAULT); + s->ptimer = ptimer_init_with_bh(NULL, PTIMER_POLICY_DEFAULT); /* * FIXME: there is no documentation on Digic timer diff --git a/hw/timer/etraxfs_timer.c b/hw/timer/etraxfs_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/etraxfs_timer.c +++ b/hw/timer/etraxfs_timer.c @@ -XXX,XX +XXX,XX @@ static void etraxfs_timer_realize(DeviceState *dev, Error **errp) t->bh_t0 = qemu_bh_new(timer0_hit, t); t->bh_t1 = qemu_bh_new(timer1_hit, t); t->bh_wd = qemu_bh_new(watchdog_hit, t); - t->ptimer_t0 = ptimer_init(t->bh_t0, PTIMER_POLICY_DEFAULT); - t->ptimer_t1 = ptimer_init(t->bh_t1, PTIMER_POLICY_DEFAULT); - t->ptimer_wd = ptimer_init(t->bh_wd, PTIMER_POLICY_DEFAULT); + t->ptimer_t0 = ptimer_init_with_bh(t->bh_t0, PTIMER_POLICY_DEFAULT); + t->ptimer_t1 = ptimer_init_with_bh(t->bh_t1, PTIMER_POLICY_DEFAULT); + t->ptimer_wd = ptimer_init_with_bh(t->bh_wd, PTIMER_POLICY_DEFAULT); sysbus_init_irq(sbd, &t->irq); sysbus_init_irq(sbd, &t->nmi); diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/exynos4210_mct.c +++ b/hw/timer/exynos4210_mct.c @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_init(Object *obj) /* Global timer */ bh[0] = qemu_bh_new(exynos4210_gfrc_event, s); - s->g_timer.ptimer_frc = ptimer_init(bh[0], PTIMER_POLICY_DEFAULT); + s->g_timer.ptimer_frc = ptimer_init_with_bh(bh[0], PTIMER_POLICY_DEFAULT); memset(&s->g_timer.reg, 0, sizeof(struct gregs)); /* Local timers */ @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_init(Object *obj) bh[0] = qemu_bh_new(exynos4210_ltick_event, &s->l_timer[i]); bh[1] = qemu_bh_new(exynos4210_lfrc_event, &s->l_timer[i]); s->l_timer[i].tick_timer.ptimer_tick = - ptimer_init(bh[0], PTIMER_POLICY_DEFAULT); - s->l_timer[i].ptimer_frc = ptimer_init(bh[1], PTIMER_POLICY_DEFAULT); + ptimer_init_with_bh(bh[0], PTIMER_POLICY_DEFAULT); + s->l_timer[i].ptimer_frc = + ptimer_init_with_bh(bh[1], PTIMER_POLICY_DEFAULT); s->l_timer[i].id = i; } diff --git a/hw/timer/exynos4210_pwm.c b/hw/timer/exynos4210_pwm.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/exynos4210_pwm.c +++ b/hw/timer/exynos4210_pwm.c @@ -XXX,XX +XXX,XX @@ static void exynos4210_pwm_init(Object *obj) for (i = 0; i < EXYNOS4210_PWM_TIMERS_NUM; i++) { bh = qemu_bh_new(exynos4210_pwm_tick, &s->timer[i]); sysbus_init_irq(dev, &s->timer[i].irq); - s->timer[i].ptimer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->timer[i].ptimer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); s->timer[i].id = i; s->timer[i].parent = s; } diff --git a/hw/timer/exynos4210_rtc.c b/hw/timer/exynos4210_rtc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/exynos4210_rtc.c +++ b/hw/timer/exynos4210_rtc.c @@ -XXX,XX +XXX,XX @@ static void exynos4210_rtc_init(Object *obj) QEMUBH *bh; bh = qemu_bh_new(exynos4210_rtc_tick, s); - s->ptimer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->ptimer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(s->ptimer, RTC_BASE_FREQ); exynos4210_rtc_update_freq(s, 0); bh = qemu_bh_new(exynos4210_rtc_1Hz_tick, s); - s->ptimer_1Hz = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->ptimer_1Hz = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(s->ptimer_1Hz, RTC_BASE_FREQ); sysbus_init_irq(dev, &s->alm_irq); diff --git a/hw/timer/grlib_gptimer.c b/hw/timer/grlib_gptimer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/grlib_gptimer.c +++ b/hw/timer/grlib_gptimer.c @@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_realize(DeviceState *dev, Error **errp) timer->unit = unit; timer->bh = qemu_bh_new(grlib_gptimer_hit, timer); - timer->ptimer = ptimer_init(timer->bh, PTIMER_POLICY_DEFAULT); + timer->ptimer = ptimer_init_with_bh(timer->bh, PTIMER_POLICY_DEFAULT); timer->id = i; /* One IRQ line for each timer */ diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/imx_epit.c +++ b/hw/timer/imx_epit.c @@ -XXX,XX +XXX,XX @@ static void imx_epit_realize(DeviceState *dev, Error **errp) 0x00001000); sysbus_init_mmio(sbd, &s->iomem); - s->timer_reload = ptimer_init(NULL, PTIMER_POLICY_DEFAULT); + s->timer_reload = ptimer_init_with_bh(NULL, PTIMER_POLICY_DEFAULT); bh = qemu_bh_new(imx_epit_cmp, s); - s->timer_cmp = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->timer_cmp = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); } static void imx_epit_class_init(ObjectClass *klass, void *data) diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/imx_gpt.c +++ b/hw/timer/imx_gpt.c @@ -XXX,XX +XXX,XX @@ static void imx_gpt_realize(DeviceState *dev, Error **errp) sysbus_init_mmio(sbd, &s->iomem); bh = qemu_bh_new(imx_gpt_timeout, s); - s->timer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); } static void imx_gpt_class_init(ObjectClass *klass, void *data) diff --git a/hw/timer/lm32_timer.c b/hw/timer/lm32_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/lm32_timer.c +++ b/hw/timer/lm32_timer.c @@ -XXX,XX +XXX,XX @@ static void lm32_timer_realize(DeviceState *dev, Error **errp) LM32TimerState *s = LM32_TIMER(dev); s->bh = qemu_bh_new(timer_hit, s); - s->ptimer = ptimer_init(s->bh, PTIMER_POLICY_DEFAULT); + s->ptimer = ptimer_init_with_bh(s->bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(s->ptimer, s->freq_hz); } diff --git a/hw/timer/milkymist-sysctl.c b/hw/timer/milkymist-sysctl.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/milkymist-sysctl.c +++ b/hw/timer/milkymist-sysctl.c @@ -XXX,XX +XXX,XX @@ static void milkymist_sysctl_realize(DeviceState *dev, Error **errp) s->bh0 = qemu_bh_new(timer0_hit, s); s->bh1 = qemu_bh_new(timer1_hit, s); - s->ptimer0 = ptimer_init(s->bh0, PTIMER_POLICY_DEFAULT); - s->ptimer1 = ptimer_init(s->bh1, PTIMER_POLICY_DEFAULT); + s->ptimer0 = ptimer_init_with_bh(s->bh0, PTIMER_POLICY_DEFAULT); + s->ptimer1 = ptimer_init_with_bh(s->bh1, PTIMER_POLICY_DEFAULT); ptimer_set_freq(s->ptimer0, s->freq_hz); ptimer_set_freq(s->ptimer1, s->freq_hz); diff --git a/hw/timer/mss-timer.c b/hw/timer/mss-timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/mss-timer.c +++ b/hw/timer/mss-timer.c @@ -XXX,XX +XXX,XX @@ static void mss_timer_init(Object *obj) struct Msf2Timer *st = &t->timers[i]; st->bh = qemu_bh_new(timer_hit, st); - st->ptimer = ptimer_init(st->bh, PTIMER_POLICY_DEFAULT); + st->ptimer = ptimer_init_with_bh(st->bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(st->ptimer, t->freq_hz); sysbus_init_irq(SYS_BUS_DEVICE(obj), &st->irq); } diff --git a/hw/timer/puv3_ost.c b/hw/timer/puv3_ost.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/puv3_ost.c +++ b/hw/timer/puv3_ost.c @@ -XXX,XX +XXX,XX @@ static void puv3_ost_realize(DeviceState *dev, Error **errp) sysbus_init_irq(sbd, &s->irq); s->bh = qemu_bh_new(puv3_ost_tick, s); - s->ptimer = ptimer_init(s->bh, PTIMER_POLICY_DEFAULT); + s->ptimer = ptimer_init_with_bh(s->bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(s->ptimer, 50 * 1000 * 1000); memory_region_init_io(&s->iomem, OBJECT(s), &puv3_ost_ops, s, "puv3_ost", diff --git a/hw/timer/sh_timer.c b/hw/timer/sh_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/sh_timer.c +++ b/hw/timer/sh_timer.c @@ -XXX,XX +XXX,XX @@ static void *sh_timer_init(uint32_t freq, int feat, qemu_irq irq) s->irq = irq; bh = qemu_bh_new(sh_timer_tick, s); - s->timer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); sh_timer_write(s, OFFSET_TCOR >> 2, s->tcor); sh_timer_write(s, OFFSET_TCNT >> 2, s->tcnt); diff --git a/hw/timer/slavio_timer.c b/hw/timer/slavio_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/slavio_timer.c +++ b/hw/timer/slavio_timer.c @@ -XXX,XX +XXX,XX @@ static void slavio_timer_init(Object *obj) tc->timer_index = i; bh = qemu_bh_new(slavio_timer_irq, tc); - s->cputimer[i].timer = ptimer_init(bh, PTIMER_POLICY_DEFAULT); + s->cputimer[i].timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); ptimer_set_period(s->cputimer[i].timer, TIMER_PERIOD); size = i == 0 ? SYS_TIMER_SIZE : CPU_TIMER_SIZE; diff --git a/hw/timer/xilinx_timer.c b/hw/timer/xilinx_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/xilinx_timer.c +++ b/hw/timer/xilinx_timer.c @@ -XXX,XX +XXX,XX @@ static void xilinx_timer_realize(DeviceState *dev, Error **errp) xt->parent = t; xt->nr = i; xt->bh = qemu_bh_new(timer_hit, xt); - xt->ptimer = ptimer_init(xt->bh, PTIMER_POLICY_DEFAULT); + xt->ptimer = ptimer_init_with_bh(xt->bh, PTIMER_POLICY_DEFAULT); ptimer_set_freq(xt->ptimer, t->freq_hz); } diff --git a/hw/watchdog/cmsdk-apb-watchdog.c b/hw/watchdog/cmsdk-apb-watchdog.c index XXXXXXX..XXXXXXX 100644 --- a/hw/watchdog/cmsdk-apb-watchdog.c +++ b/hw/watchdog/cmsdk-apb-watchdog.c @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_watchdog_realize(DeviceState *dev, Error **errp) } bh = qemu_bh_new(cmsdk_apb_watchdog_tick, s); - s->timer = ptimer_init(bh, + s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT | PTIMER_POLICY_NO_IMMEDIATE_RELOAD | diff --git a/tests/ptimer-test.c b/tests/ptimer-test.c index XXXXXXX..XXXXXXX 100644 --- a/tests/ptimer-test.c +++ b/tests/ptimer-test.c @@ -XXX,XX +XXX,XX @@ static void check_set_count(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); triggered = false; @@ -XXX,XX +XXX,XX @@ static void check_set_limit(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); triggered = false; @@ -XXX,XX +XXX,XX @@ static void check_oneshot(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); bool no_round_down = (*policy & PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); triggered = false; @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); bool wrap_policy = (*policy & PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD); bool no_immediate_trigger = (*policy & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER); bool no_immediate_reload = (*policy & PTIMER_POLICY_NO_IMMEDIATE_RELOAD); @@ -XXX,XX +XXX,XX @@ static void check_on_the_fly_mode_change(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); bool wrap_policy = (*policy & PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD); bool no_round_down = (*policy & PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); @@ -XXX,XX +XXX,XX @@ static void check_on_the_fly_period_change(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); bool no_round_down = (*policy & PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); triggered = false; @@ -XXX,XX +XXX,XX @@ static void check_on_the_fly_freq_change(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); bool no_round_down = (*policy & PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); triggered = false; @@ -XXX,XX +XXX,XX @@ static void check_run_with_period_0(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); triggered = false; @@ -XXX,XX +XXX,XX @@ static void check_run_with_delta_0(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); bool wrap_policy = (*policy & PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD); bool no_immediate_trigger = (*policy & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER); bool no_immediate_reload = (*policy & PTIMER_POLICY_NO_IMMEDIATE_RELOAD); @@ -XXX,XX +XXX,XX @@ static void check_periodic_with_load_0(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); bool continuous_trigger = (*policy & PTIMER_POLICY_CONTINUOUS_TRIGGER); bool no_immediate_trigger = (*policy & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER); bool trig_only_on_dec = (*policy & PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT); @@ -XXX,XX +XXX,XX @@ static void check_oneshot_with_load_0(gconstpointer arg) { const uint8_t *policy = arg; QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init(bh, *policy); + ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); bool no_immediate_trigger = (*policy & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER); bool trig_only_on_dec = (*policy & PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT); -- 2.20.1
Provide the new transaction-based API. If a ptimer is created using ptimer_init() rather than ptimer_init_with_bh(), then instead of providing a QEMUBH, it provides a pointer to the callback function directly, and has opted into the transaction API. All calls to functions which modify ptimer state: - ptimer_set_period() - ptimer_set_freq() - ptimer_set_limit() - ptimer_set_count() - ptimer_run() - ptimer_stop() must be between matched calls to ptimer_transaction_begin() and ptimer_transaction_commit(). When ptimer_transaction_commit() is called it will evaluate the state of the timer after all the changes in the transaction, and call the callback if necessary. In the old API the individual update functions generally would call ptimer_trigger() immediately, which would schedule the QEMUBH. In the new API the update functions will instead defer the "set s->next_event and call ptimer_reload()" work to ptimer_transaction_commit(). Because ptimer_trigger() can now immediately call into the device code which may then call other ptimer functions that update ptimer_state fields, we must be more careful in ptimer_reload() not to cache fields from ptimer_state across the ptimer_trigger() call. (This was harmless with the QEMUBH mechanism as the BH would not be invoked until much later.) We use assertions to check that: * the functions modifying ptimer state are not called outside a transaction block * ptimer_transaction_begin() and _commit() calls are paired * the transaction API is not used with a QEMUBH ptimer There is some slight repetition of code: * most of the set functions have similar looking "if s->bh call ptimer_reload, otherwise set s->need_reload" code * ptimer_init() and ptimer_init_with_bh() have similar code We deliberately don't try to avoid this repetition, because it will all be deleted when the QEMUBH version of the API is removed. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-3-peter.maydell@linaro.org --- include/hw/ptimer.h | 72 +++++++++++++++++++++ hw/core/ptimer.c | 152 +++++++++++++++++++++++++++++++++++++++----- 2 files changed, 209 insertions(+), 15 deletions(-) diff --git a/include/hw/ptimer.h b/include/hw/ptimer.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/ptimer.h +++ b/include/hw/ptimer.h @@ -XXX,XX +XXX,XX @@ typedef void (*ptimer_cb)(void *opaque); */ ptimer_state *ptimer_init_with_bh(QEMUBH *bh, uint8_t policy_mask); +/** + * ptimer_init - Allocate and return a new ptimer + * @callback: function to call on ptimer expiry + * @callback_opaque: opaque pointer passed to @callback + * @policy: PTIMER_POLICY_* bits specifying behaviour + * + * The ptimer returned must be freed using ptimer_free(). + * + * If a ptimer is created using this API then will use the + * transaction-based API for modifying ptimer state: all calls + * to functions which modify ptimer state: + * - ptimer_set_period() + * - ptimer_set_freq() + * - ptimer_set_limit() + * - ptimer_set_count() + * - ptimer_run() + * - ptimer_stop() + * must be between matched calls to ptimer_transaction_begin() + * and ptimer_transaction_commit(). When ptimer_transaction_commit() + * is called it will evaluate the state of the timer after all the + * changes in the transaction, and call the callback if necessary. + * + * The callback function is always called from within a transaction + * begin/commit block, so the callback should not call the + * ptimer_transaction_begin() function itself. If the callback changes + * the ptimer state such that another ptimer expiry is triggered, then + * the callback will be called a second time after the first call returns. + */ +ptimer_state *ptimer_init(ptimer_cb callback, + void *callback_opaque, + uint8_t policy_mask); + /** * ptimer_free - Free a ptimer * @s: timer to free @@ -XXX,XX +XXX,XX @@ ptimer_state *ptimer_init_with_bh(QEMUBH *bh, uint8_t policy_mask); */ void ptimer_free(ptimer_state *s); +/** + * ptimer_transaction_begin() - Start a ptimer modification transaction + * + * This function must be called before making any calls to functions + * which modify the ptimer's state (see the ptimer_init() documentation + * for a list of these), and must always have a matched call to + * ptimer_transaction_commit(). + * It is an error to call this function for a BH-based ptimer; + * attempting to do this will trigger an assert. + */ +void ptimer_transaction_begin(ptimer_state *s); + +/** + * ptimer_transaction_commit() - Commit a ptimer modification transaction + * + * This function must be called after calls to functions which modify + * the ptimer's state, and completes the update of the ptimer. If the + * ptimer state now means that we should trigger the timer expiry + * callback, it will be called directly. + */ +void ptimer_transaction_commit(ptimer_state *s); + /** * ptimer_set_period - Set counter increment interval in nanoseconds * @s: ptimer to configure @@ -XXX,XX +XXX,XX @@ void ptimer_free(ptimer_state *s); * Note that if your counter behaviour is specified as having a * particular frequency rather than a period then ptimer_set_freq() * may be more appropriate. + * + * This function will assert if it is called outside a + * ptimer_transaction_begin/commit block, unless this is a bottom-half ptimer. */ void ptimer_set_period(ptimer_state *s, int64_t period); @@ -XXX,XX +XXX,XX @@ void ptimer_set_period(ptimer_state *s, int64_t period); * as setting the frequency then this function is more appropriate, * because it allows specifying an effective period which is * precise to fractions of a nanosecond, avoiding rounding errors. + * + * This function will assert if it is called outside a + * ptimer_transaction_begin/commit block, unless this is a bottom-half ptimer. */ void ptimer_set_freq(ptimer_state *s, uint32_t freq); @@ -XXX,XX +XXX,XX @@ uint64_t ptimer_get_limit(ptimer_state *s); * Set the limit value of the down-counter. The @reload flag can * be used to emulate the behaviour of timers which immediately * reload the counter when their reload register is written to. + * + * This function will assert if it is called outside a + * ptimer_transaction_begin/commit block, unless this is a bottom-half ptimer. */ void ptimer_set_limit(ptimer_state *s, uint64_t limit, int reload); @@ -XXX,XX +XXX,XX @@ uint64_t ptimer_get_count(ptimer_state *s); * Set the value of the down-counter. If the counter is currently * enabled this will arrange for a timer callback at the appropriate * point in the future. + * + * This function will assert if it is called outside a + * ptimer_transaction_begin/commit block, unless this is a bottom-half ptimer. */ void ptimer_set_count(ptimer_state *s, uint64_t count); @@ -XXX,XX +XXX,XX @@ void ptimer_set_count(ptimer_state *s, uint64_t count); * the counter value will then be reloaded from the limit and it will * start counting down again. If @oneshot is non-zero, then the counter * will disable itself when it reaches zero. + * + * This function will assert if it is called outside a + * ptimer_transaction_begin/commit block, unless this is a bottom-half ptimer. */ void ptimer_run(ptimer_state *s, int oneshot); @@ -XXX,XX +XXX,XX @@ void ptimer_run(ptimer_state *s, int oneshot); * * Note that this can cause it to "lose" time, even if it is immediately * restarted. + * + * This function will assert if it is called outside a + * ptimer_transaction_begin/commit block, unless this is a bottom-half ptimer. */ void ptimer_stop(ptimer_state *s); diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/core/ptimer.c +++ b/hw/core/ptimer.c @@ -XXX,XX +XXX,XX @@ struct ptimer_state uint8_t policy_mask; QEMUBH *bh; QEMUTimer *timer; + ptimer_cb callback; + void *callback_opaque; + /* + * These track whether we're in a transaction block, and if we + * need to do a timer reload when the block finishes. They don't + * need to be migrated because migration can never happen in the + * middle of a transaction block. + */ + bool in_transaction; + bool need_reload; }; /* Use a bottom-half routine to avoid reentrancy issues. */ @@ -XXX,XX +XXX,XX @@ static void ptimer_trigger(ptimer_state *s) if (s->bh) { replay_bh_schedule_event(s->bh); } + if (s->callback) { + s->callback(s->callback_opaque); + } } static void ptimer_reload(ptimer_state *s, int delta_adjust) { - uint32_t period_frac = s->period_frac; - uint64_t period = s->period; - uint64_t delta = s->delta; + uint32_t period_frac; + uint64_t period; + uint64_t delta; bool suppress_trigger = false; /* @@ -XXX,XX +XXX,XX @@ static void ptimer_reload(ptimer_state *s, int delta_adjust) (s->policy_mask & PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT)) { suppress_trigger = true; } - if (delta == 0 && !(s->policy_mask & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER) + if (s->delta == 0 && !(s->policy_mask & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER) && !suppress_trigger) { ptimer_trigger(s); } + /* + * Note that ptimer_trigger() might call the device callback function, + * which can then modify timer state, so we must not cache any fields + * from ptimer_state until after we have called it. + */ + delta = s->delta; + period = s->period; + period_frac = s->period_frac; + if (delta == 0 && !(s->policy_mask & PTIMER_POLICY_NO_IMMEDIATE_RELOAD)) { delta = s->delta = s->limit; } @@ -XXX,XX +XXX,XX @@ static void ptimer_tick(void *opaque) ptimer_state *s = (ptimer_state *)opaque; bool trigger = true; + /* + * We perform all the tick actions within a begin/commit block + * because the callback function that ptimer_trigger() calls + * might make calls into the ptimer APIs that provoke another + * trigger, and we want that to cause the callback function + * to be called iteratively, not recursively. + */ + ptimer_transaction_begin(s); + if (s->enabled == 2) { s->delta = 0; s->enabled = 0; @@ -XXX,XX +XXX,XX @@ static void ptimer_tick(void *opaque) if (trigger) { ptimer_trigger(s); } + + ptimer_transaction_commit(s); } uint64_t ptimer_get_count(ptimer_state *s) @@ -XXX,XX +XXX,XX @@ uint64_t ptimer_get_count(ptimer_state *s) void ptimer_set_count(ptimer_state *s, uint64_t count) { + assert(s->in_transaction || !s->callback); s->delta = count; if (s->enabled) { - s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); - ptimer_reload(s, 0); + if (!s->callback) { + s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); + ptimer_reload(s, 0); + } else { + s->need_reload = true; + } } } @@ -XXX,XX +XXX,XX @@ void ptimer_run(ptimer_state *s, int oneshot) { bool was_disabled = !s->enabled; + assert(s->in_transaction || !s->callback); + if (was_disabled && s->period == 0) { if (!qtest_enabled()) { fprintf(stderr, "Timer with period zero, disabling\n"); @@ -XXX,XX +XXX,XX @@ void ptimer_run(ptimer_state *s, int oneshot) } s->enabled = oneshot ? 2 : 1; if (was_disabled) { - s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); - ptimer_reload(s, 0); + if (!s->callback) { + s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); + ptimer_reload(s, 0); + } else { + s->need_reload = true; + } } } @@ -XXX,XX +XXX,XX @@ void ptimer_run(ptimer_state *s, int oneshot) is immediately restarted. */ void ptimer_stop(ptimer_state *s) { + assert(s->in_transaction || !s->callback); + if (!s->enabled) return; s->delta = ptimer_get_count(s); timer_del(s->timer); s->enabled = 0; + if (s->callback) { + s->need_reload = false; + } } /* Set counter increment interval in nanoseconds. */ void ptimer_set_period(ptimer_state *s, int64_t period) { + assert(s->in_transaction || !s->callback); s->delta = ptimer_get_count(s); s->period = period; s->period_frac = 0; if (s->enabled) { - s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); - ptimer_reload(s, 0); + if (!s->callback) { + s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); + ptimer_reload(s, 0); + } else { + s->need_reload = true; + } } } /* Set counter frequency in Hz. */ void ptimer_set_freq(ptimer_state *s, uint32_t freq) { + assert(s->in_transaction || !s->callback); s->delta = ptimer_get_count(s); s->period = 1000000000ll / freq; s->period_frac = (1000000000ll << 32) / freq; if (s->enabled) { - s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); - ptimer_reload(s, 0); + if (!s->callback) { + s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); + ptimer_reload(s, 0); + } else { + s->need_reload = true; + } } } @@ -XXX,XX +XXX,XX @@ void ptimer_set_freq(ptimer_state *s, uint32_t freq) count = limit. */ void ptimer_set_limit(ptimer_state *s, uint64_t limit, int reload) { + assert(s->in_transaction || !s->callback); s->limit = limit; if (reload) s->delta = limit; if (s->enabled && reload) { - s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); - ptimer_reload(s, 0); + if (!s->callback) { + s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); + ptimer_reload(s, 0); + } else { + s->need_reload = true; + } } } @@ -XXX,XX +XXX,XX @@ uint64_t ptimer_get_limit(ptimer_state *s) return s->limit; } +void ptimer_transaction_begin(ptimer_state *s) +{ + assert(!s->in_transaction || !s->callback); + s->in_transaction = true; + s->need_reload = false; +} + +void ptimer_transaction_commit(ptimer_state *s) +{ + assert(s->in_transaction); + /* + * We must loop here because ptimer_reload() can call the callback + * function, which might then update ptimer state in a way that + * means we need to do another reload and possibly another callback. + * A disabled timer never needs reloading (and if we don't check + * this then we loop forever if ptimer_reload() disables the timer). + */ + while (s->need_reload && s->enabled) { + s->need_reload = false; + s->next_event = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); + ptimer_reload(s, 0); + } + /* Now we've finished reload we can leave the transaction block. */ + s->in_transaction = false; +} + const VMStateDescription vmstate_ptimer = { .name = "ptimer", .version_id = 1, @@ -XXX,XX +XXX,XX @@ ptimer_state *ptimer_init_with_bh(QEMUBH *bh, uint8_t policy_mask) return s; } +ptimer_state *ptimer_init(ptimer_cb callback, void *callback_opaque, + uint8_t policy_mask) +{ + ptimer_state *s; + + /* + * The callback function is mandatory; so we use it to distinguish + * old-style QEMUBH ptimers from new transaction API ptimers. + * (ptimer_init_with_bh() allows a NULL bh pointer and at least + * one device (digic-timer) passes NULL, so it's not the case + * that either s->bh != NULL or s->callback != NULL.) + */ + assert(callback); + + s = g_new0(ptimer_state, 1); + s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, ptimer_tick, s); + s->policy_mask = policy_mask; + s->callback = callback; + s->callback_opaque = callback_opaque; + + /* + * These two policies are incompatible -- trigger-on-decrement implies + * a timer trigger when the count becomes 0, but no-immediate-trigger + * implies a trigger when the count stops being 0. + */ + assert(!((policy_mask & PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT) && + (policy_mask & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER))); + return s; +} + void ptimer_free(ptimer_state *s) { - qemu_bh_delete(s->bh); + if (s->bh) { + qemu_bh_delete(s->bh); + } timer_free(s->timer); g_free(s); } -- 2.20.1
Convert the ptimer test cases to the transaction-based ptimer API, by changing to ptimer_init(), dropping the now-unused QEMUBH variables, and surrounding each set of changes to the ptimer state in ptimer_transaction_begin/commit calls. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-4-peter.maydell@linaro.org --- tests/ptimer-test.c | 106 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 84 insertions(+), 22 deletions(-) diff --git a/tests/ptimer-test.c b/tests/ptimer-test.c index XXXXXXX..XXXXXXX 100644 --- a/tests/ptimer-test.c +++ b/tests/ptimer-test.c @@ -XXX,XX +XXX,XX @@ static void qemu_clock_step(uint64_t ns) static void check_set_count(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 1000); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 1000); g_assert_false(triggered); ptimer_free(ptimer); @@ -XXX,XX +XXX,XX @@ static void check_set_count(gconstpointer arg) static void check_set_limit(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_limit(ptimer, 1000, 0); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 0); g_assert_cmpuint(ptimer_get_limit(ptimer), ==, 1000); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_set_limit(ptimer, 2000, 1); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 2000); g_assert_cmpuint(ptimer_get_limit(ptimer), ==, 2000); g_assert_false(triggered); @@ -XXX,XX +XXX,XX @@ static void check_set_limit(gconstpointer arg) static void check_oneshot(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); bool no_round_down = (*policy & PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_period(ptimer, 2000000); ptimer_set_count(ptimer, 10); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 * 2 + 1); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 8 : 7); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_stop(ptimer); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 8 : 7); g_assert_false(triggered); @@ -XXX,XX +XXX,XX @@ static void check_oneshot(gconstpointer arg) g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 8 : 7); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 * 7 + 1); @@ -XXX,XX +XXX,XX @@ static void check_oneshot(gconstpointer arg) g_assert_cmpuint(ptimer_get_count(ptimer), ==, 0); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 10); + ptimer_transaction_commit(ptimer); qemu_clock_step(20000000 + 1); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 10); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_set_limit(ptimer, 9, 1); + ptimer_transaction_commit(ptimer); qemu_clock_step(20000000 + 1); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 9); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 + 1); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 8 : 7); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 20); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 * 19 + 1); @@ -XXX,XX +XXX,XX @@ static void check_oneshot(gconstpointer arg) g_assert_cmpuint(ptimer_get_count(ptimer), ==, 0); g_assert_true(triggered); + ptimer_transaction_begin(ptimer); ptimer_stop(ptimer); + ptimer_transaction_commit(ptimer); triggered = false; @@ -XXX,XX +XXX,XX @@ static void check_oneshot(gconstpointer arg) static void check_periodic(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); bool wrap_policy = (*policy & PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD); bool no_immediate_trigger = (*policy & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER); bool no_immediate_reload = (*policy & PTIMER_POLICY_NO_IMMEDIATE_RELOAD); @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_period(ptimer, 2000000); ptimer_set_limit(ptimer, 10, 1); ptimer_run(ptimer, 0); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 10); g_assert_false(triggered); @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) (no_round_down ? 9 : 8) + (wrap_policy ? 1 : 0)); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 20); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 20); g_assert_false(triggered); @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 3); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 3); g_assert_false(triggered); @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) (no_round_down ? 9 : 8) + (wrap_policy ? 1 : 0)); g_assert_true(triggered); + ptimer_transaction_begin(ptimer); ptimer_stop(ptimer); + ptimer_transaction_commit(ptimer); triggered = false; qemu_clock_step(2000000); @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) (no_round_down ? 9 : 8) + (wrap_policy ? 1 : 0)); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 3); ptimer_run(ptimer, 0); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 * 3 + 1); @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) (no_round_down ? 9 : 8) + (wrap_policy ? 1 : 0)); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 0); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_immediate_reload ? 0 : 10); @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) (no_round_down ? 8 : 7) + (wrap_policy ? 1 : 0)); g_assert_true(triggered); + ptimer_transaction_begin(ptimer); ptimer_stop(ptimer); + ptimer_transaction_commit(ptimer); triggered = false; @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) (no_round_down ? 8 : 7) + (wrap_policy ? 1 : 0)); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_run(ptimer, 0); + ptimer_transaction_commit(ptimer); + + ptimer_transaction_begin(ptimer); ptimer_set_period(ptimer, 0); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 + 1); @@ -XXX,XX +XXX,XX @@ static void check_periodic(gconstpointer arg) static void check_on_the_fly_mode_change(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); bool wrap_policy = (*policy & PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD); bool no_round_down = (*policy & PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_period(ptimer, 2000000); ptimer_set_limit(ptimer, 10, 1); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 * 9 + 1); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 1 : 0); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_run(ptimer, 0); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 1 : 0); g_assert_false(triggered); @@ -XXX,XX +XXX,XX @@ static void check_on_the_fly_mode_change(gconstpointer arg) qemu_clock_step(2000000 * 9); + ptimer_transaction_begin(ptimer); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, (no_round_down ? 1 : 0) + (wrap_policy ? 1 : 0)); @@ -XXX,XX +XXX,XX @@ static void check_on_the_fly_mode_change(gconstpointer arg) static void check_on_the_fly_period_change(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); bool no_round_down = (*policy & PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_period(ptimer, 2000000); ptimer_set_limit(ptimer, 8, 1); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 * 4 + 1); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 4 : 3); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_set_period(ptimer, 4000000); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 4 : 3); qemu_clock_step(4000000 * 2 + 1); @@ -XXX,XX +XXX,XX @@ static void check_on_the_fly_period_change(gconstpointer arg) static void check_on_the_fly_freq_change(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); bool no_round_down = (*policy & PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_freq(ptimer, 500); ptimer_set_limit(ptimer, 8, 1); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 * 4 + 1); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 4 : 3); g_assert_false(triggered); + ptimer_transaction_begin(ptimer); ptimer_set_freq(ptimer, 250); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_round_down ? 4 : 3); qemu_clock_step(2000000 * 4 + 1); @@ -XXX,XX +XXX,XX @@ static void check_on_the_fly_freq_change(gconstpointer arg) static void check_run_with_period_0(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 99); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); qemu_clock_step(10 * NANOSECONDS_PER_SECOND); @@ -XXX,XX +XXX,XX @@ static void check_run_with_period_0(gconstpointer arg) static void check_run_with_delta_0(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); bool wrap_policy = (*policy & PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD); bool no_immediate_trigger = (*policy & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER); bool no_immediate_reload = (*policy & PTIMER_POLICY_NO_IMMEDIATE_RELOAD); @@ -XXX,XX +XXX,XX @@ static void check_run_with_delta_0(gconstpointer arg) triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_period(ptimer, 2000000); ptimer_set_limit(ptimer, 99, 0); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_immediate_reload ? 0 : 99); @@ -XXX,XX +XXX,XX @@ static void check_run_with_delta_0(gconstpointer arg) g_assert_false(triggered); } + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 99); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); } qemu_clock_step(2000000 + 1); @@ -XXX,XX +XXX,XX @@ static void check_run_with_delta_0(gconstpointer arg) triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 0); ptimer_run(ptimer, 0); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, no_immediate_reload ? 0 : 99); @@ -XXX,XX +XXX,XX @@ static void check_run_with_delta_0(gconstpointer arg) wrap_policy ? 0 : (no_round_down ? 99 : 98)); g_assert_true(triggered); + ptimer_transaction_begin(ptimer); ptimer_stop(ptimer); + ptimer_transaction_commit(ptimer); ptimer_free(ptimer); } static void check_periodic_with_load_0(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); bool continuous_trigger = (*policy & PTIMER_POLICY_CONTINUOUS_TRIGGER); bool no_immediate_trigger = (*policy & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER); bool trig_only_on_dec = (*policy & PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT); triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_period(ptimer, 2000000); ptimer_run(ptimer, 0); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 0); @@ -XXX,XX +XXX,XX @@ static void check_periodic_with_load_0(gconstpointer arg) triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_count(ptimer, 10); ptimer_run(ptimer, 0); + ptimer_transaction_commit(ptimer); qemu_clock_step(2000000 * 10 + 1); @@ -XXX,XX +XXX,XX @@ static void check_periodic_with_load_0(gconstpointer arg) g_assert_false(triggered); } + ptimer_transaction_begin(ptimer); ptimer_stop(ptimer); + ptimer_transaction_commit(ptimer); ptimer_free(ptimer); } static void check_oneshot_with_load_0(gconstpointer arg) { const uint8_t *policy = arg; - QEMUBH *bh = qemu_bh_new(ptimer_trigger, NULL); - ptimer_state *ptimer = ptimer_init_with_bh(bh, *policy); + ptimer_state *ptimer = ptimer_init(ptimer_trigger, NULL, *policy); bool no_immediate_trigger = (*policy & PTIMER_POLICY_NO_IMMEDIATE_TRIGGER); bool trig_only_on_dec = (*policy & PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT); triggered = false; + ptimer_transaction_begin(ptimer); ptimer_set_period(ptimer, 2000000); ptimer_run(ptimer, 1); + ptimer_transaction_commit(ptimer); g_assert_cmpuint(ptimer_get_count(ptimer), ==, 0); -- 2.20.1
Switch the arm_timer.c code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various arms of arm_timer_write() that modify the ptimer state, and using the new ptimer_init() function to create the timer. Fixes: https://bugs.launchpad.net/qemu/+bug/1777777 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-5-peter.maydell@linaro.org --- hw/timer/arm_timer.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/hw/timer/arm_timer.c b/hw/timer/arm_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/arm_timer.c +++ b/hw/timer/arm_timer.c @@ -XXX,XX +XXX,XX @@ #include "hw/irq.h" #include "hw/ptimer.h" #include "hw/qdev-properties.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "qemu/log.h" @@ -XXX,XX +XXX,XX @@ static uint32_t arm_timer_read(void *opaque, hwaddr offset) } } -/* Reset the timer limit after settings have changed. */ +/* + * Reset the timer limit after settings have changed. + * May only be called from inside a ptimer transaction block. + */ static void arm_timer_recalibrate(arm_timer_state *s, int reload) { uint32_t limit; @@ -XXX,XX +XXX,XX @@ static void arm_timer_write(void *opaque, hwaddr offset, switch (offset >> 2) { case 0: /* TimerLoad */ s->limit = value; + ptimer_transaction_begin(s->timer); arm_timer_recalibrate(s, 1); + ptimer_transaction_commit(s->timer); break; case 1: /* TimerValue */ /* ??? Linux seems to want to write to this readonly register. Ignore it. */ break; case 2: /* TimerControl */ + ptimer_transaction_begin(s->timer); if (s->control & TIMER_CTRL_ENABLE) { /* Pause the timer if it is running. This may cause some inaccuracy dure to rounding, but avoids a whole lot of other @@ -XXX,XX +XXX,XX @@ static void arm_timer_write(void *opaque, hwaddr offset, /* Restart the timer if still enabled. */ ptimer_run(s->timer, (s->control & TIMER_CTRL_ONESHOT) != 0); } + ptimer_transaction_commit(s->timer); break; case 3: /* TimerIntClr */ s->int_level = 0; break; case 6: /* TimerBGLoad */ s->limit = value; + ptimer_transaction_begin(s->timer); arm_timer_recalibrate(s, 0); + ptimer_transaction_commit(s->timer); break; default: qemu_log_mask(LOG_GUEST_ERROR, @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_arm_timer = { static arm_timer_state *arm_timer_init(uint32_t freq) { arm_timer_state *s; - QEMUBH *bh; s = (arm_timer_state *)g_malloc0(sizeof(arm_timer_state)); s->freq = freq; s->control = TIMER_CTRL_IE; - bh = qemu_bh_new(arm_timer_tick, s); - s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); + s->timer = ptimer_init(arm_timer_tick, s, PTIMER_POLICY_DEFAULT); vmstate_register(NULL, -1, &vmstate_arm_timer, s); return s; } -- 2.20.1
Switch the musicpal code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-6-peter.maydell@linaro.org --- hw/arm/musicpal.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/musicpal.c +++ b/hw/arm/musicpal.c @@ -XXX,XX +XXX,XX @@ static void mv88w8618_timer_tick(void *opaque) static void mv88w8618_timer_init(SysBusDevice *dev, mv88w8618_timer_state *s, uint32_t freq) { - QEMUBH *bh; - sysbus_init_irq(dev, &s->irq); s->freq = freq; - bh = qemu_bh_new(mv88w8618_timer_tick, s); - s->ptimer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); + s->ptimer = ptimer_init(mv88w8618_timer_tick, s, PTIMER_POLICY_DEFAULT); } static uint64_t mv88w8618_pit_read(void *opaque, hwaddr offset, @@ -XXX,XX +XXX,XX @@ static void mv88w8618_pit_write(void *opaque, hwaddr offset, case MP_PIT_TIMER1_LENGTH ... MP_PIT_TIMER4_LENGTH: t = &s->timer[offset >> 2]; t->limit = value; + ptimer_transaction_begin(t->ptimer); if (t->limit > 0) { ptimer_set_limit(t->ptimer, t->limit, 1); } else { ptimer_stop(t->ptimer); } + ptimer_transaction_commit(t->ptimer); break; case MP_PIT_CONTROL: for (i = 0; i < 4; i++) { t = &s->timer[i]; + ptimer_transaction_begin(t->ptimer); if (value & 0xf && t->limit > 0) { ptimer_set_limit(t->ptimer, t->limit, 0); ptimer_set_freq(t->ptimer, t->freq); @@ -XXX,XX +XXX,XX @@ static void mv88w8618_pit_write(void *opaque, hwaddr offset, } else { ptimer_stop(t->ptimer); } + ptimer_transaction_commit(t->ptimer); value >>= 4; } break; @@ -XXX,XX +XXX,XX @@ static void mv88w8618_pit_reset(DeviceState *d) int i; for (i = 0; i < 4; i++) { - ptimer_stop(s->timer[i].ptimer); - s->timer[i].limit = 0; + mv88w8618_timer_state *t = &s->timer[i]; + ptimer_transaction_begin(t->ptimer); + ptimer_stop(t->ptimer); + ptimer_transaction_commit(t->ptimer); + t->limit = 0; } } -- 2.20.1
Switch the allwinner-a10-pit code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-7-peter.maydell@linaro.org --- hw/timer/allwinner-a10-pit.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/hw/timer/allwinner-a10-pit.c b/hw/timer/allwinner-a10-pit.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/allwinner-a10-pit.c +++ b/hw/timer/allwinner-a10-pit.c @@ -XXX,XX +XXX,XX @@ #include "hw/timer/allwinner-a10-pit.h" #include "migration/vmstate.h" #include "qemu/log.h" -#include "qemu/main-loop.h" #include "qemu/module.h" static void a10_pit_update_irq(AwA10PITState *s) @@ -XXX,XX +XXX,XX @@ static uint64_t a10_pit_read(void *opaque, hwaddr offset, unsigned size) return 0; } +/* Must be called inside a ptimer transaction block for s->timer[index] */ static void a10_pit_set_freq(AwA10PITState *s, int index) { uint32_t prescaler, source, source_freq; @@ -XXX,XX +XXX,XX @@ static void a10_pit_write(void *opaque, hwaddr offset, uint64_t value, switch (offset & 0x0f) { case AW_A10_PIT_TIMER_CONTROL: s->control[index] = value; + ptimer_transaction_begin(s->timer[index]); a10_pit_set_freq(s, index); if (s->control[index] & AW_A10_PIT_TIMER_RELOAD) { ptimer_set_count(s->timer[index], s->interval[index]); @@ -XXX,XX +XXX,XX @@ static void a10_pit_write(void *opaque, hwaddr offset, uint64_t value, } else { ptimer_stop(s->timer[index]); } + ptimer_transaction_commit(s->timer[index]); break; case AW_A10_PIT_TIMER_INTERVAL: s->interval[index] = value; + ptimer_transaction_begin(s->timer[index]); ptimer_set_limit(s->timer[index], s->interval[index], 1); + ptimer_transaction_commit(s->timer[index]); break; case AW_A10_PIT_TIMER_COUNT: s->count[index] = value; @@ -XXX,XX +XXX,XX @@ static void a10_pit_reset(DeviceState *dev) s->control[i] = AW_A10_PIT_DEFAULT_CLOCK; s->interval[i] = 0; s->count[i] = 0; + ptimer_transaction_begin(s->timer[i]); ptimer_stop(s->timer[i]); a10_pit_set_freq(s, i); + ptimer_transaction_commit(s->timer[i]); } s->watch_dog_mode = 0; s->watch_dog_control = 0; @@ -XXX,XX +XXX,XX @@ static void a10_pit_init(Object *obj) { AwA10PITState *s = AW_A10_PIT(obj); SysBusDevice *sbd = SYS_BUS_DEVICE(obj); - QEMUBH * bh[AW_A10_PIT_TIMER_NR]; uint8_t i; for (i = 0; i < AW_A10_PIT_TIMER_NR; i++) { @@ -XXX,XX +XXX,XX @@ static void a10_pit_init(Object *obj) tc->container = s; tc->index = i; - bh[i] = qemu_bh_new(a10_pit_timer_cb, tc); - s->timer[i] = ptimer_init_with_bh(bh[i], PTIMER_POLICY_DEFAULT); + s->timer[i] = ptimer_init(a10_pit_timer_cb, tc, PTIMER_POLICY_DEFAULT); } } -- 2.20.1
Switch the arm_mptimer.c code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-8-peter.maydell@linaro.org --- hw/timer/arm_mptimer.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/hw/timer/arm_mptimer.c b/hw/timer/arm_mptimer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/arm_mptimer.c +++ b/hw/timer/arm_mptimer.c @@ -XXX,XX +XXX,XX @@ #include "hw/timer/arm_mptimer.h" #include "migration/vmstate.h" #include "qapi/error.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "hw/core/cpu.h" @@ -XXX,XX +XXX,XX @@ static inline uint32_t timerblock_scale(uint32_t control) return (((control >> 8) & 0xff) + 1) * 10; } +/* Must be called within a ptimer transaction block */ static inline void timerblock_set_count(struct ptimer_state *timer, uint32_t control, uint64_t *count) { @@ -XXX,XX +XXX,XX @@ static inline void timerblock_set_count(struct ptimer_state *timer, ptimer_set_count(timer, *count); } +/* Must be called within a ptimer transaction block */ static inline void timerblock_run(struct ptimer_state *timer, uint32_t control, uint32_t load) { @@ -XXX,XX +XXX,XX @@ static void timerblock_write(void *opaque, hwaddr addr, uint32_t control = tb->control; switch (addr) { case 0: /* Load */ + ptimer_transaction_begin(tb->timer); /* Setting load to 0 stops the timer without doing the tick if * prescaler = 0. */ @@ -XXX,XX +XXX,XX @@ static void timerblock_write(void *opaque, hwaddr addr, } ptimer_set_limit(tb->timer, value, 1); timerblock_run(tb->timer, control, value); + ptimer_transaction_commit(tb->timer); break; case 4: /* Counter. */ + ptimer_transaction_begin(tb->timer); /* Setting counter to 0 stops the one-shot timer, or periodic with * load = 0, without doing the tick if prescaler = 0. */ @@ -XXX,XX +XXX,XX @@ static void timerblock_write(void *opaque, hwaddr addr, } timerblock_set_count(tb->timer, control, &value); timerblock_run(tb->timer, control, value); + ptimer_transaction_commit(tb->timer); break; case 8: /* Control. */ + ptimer_transaction_begin(tb->timer); if ((control & 3) != (value & 3)) { ptimer_stop(tb->timer); } @@ -XXX,XX +XXX,XX @@ static void timerblock_write(void *opaque, hwaddr addr, timerblock_run(tb->timer, value, count); } tb->control = value; + ptimer_transaction_commit(tb->timer); break; case 12: /* Interrupt status. */ tb->status &= ~value; @@ -XXX,XX +XXX,XX @@ static void timerblock_reset(TimerBlock *tb) tb->control = 0; tb->status = 0; if (tb->timer) { + ptimer_transaction_begin(tb->timer); ptimer_stop(tb->timer); ptimer_set_limit(tb->timer, 0, 1); ptimer_set_period(tb->timer, timerblock_scale(0)); + ptimer_transaction_commit(tb->timer); } } @@ -XXX,XX +XXX,XX @@ static void arm_mptimer_realize(DeviceState *dev, Error **errp) */ for (i = 0; i < s->num_cpu; i++) { TimerBlock *tb = &s->timerblock[i]; - QEMUBH *bh = qemu_bh_new(timerblock_tick, tb); - tb->timer = ptimer_init_with_bh(bh, PTIMER_POLICY); + tb->timer = ptimer_init(timerblock_tick, tb, PTIMER_POLICY); sysbus_init_irq(sbd, &tb->irq); memory_region_init_io(&tb->iomem, OBJECT(s), &timerblock_ops, tb, "arm_mptimer_timerblock", 0x20); -- 2.20.1
Switch the cmsdk-apb-dualtimer code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-9-peter.maydell@linaro.org --- hw/timer/cmsdk-apb-dualtimer.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/hw/timer/cmsdk-apb-dualtimer.c b/hw/timer/cmsdk-apb-dualtimer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/cmsdk-apb-dualtimer.c +++ b/hw/timer/cmsdk-apb-dualtimer.c @@ -XXX,XX +XXX,XX @@ #include "qemu/log.h" #include "trace.h" #include "qapi/error.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "hw/sysbus.h" #include "hw/irq.h" @@ -XXX,XX +XXX,XX @@ static void cmsdk_dualtimermod_write_control(CMSDKAPBDualTimerModule *m, /* Handle a write to the CONTROL register */ uint32_t changed; + ptimer_transaction_begin(m->timer); + newctrl &= R_CONTROL_VALID_MASK; changed = m->control ^ newctrl; @@ -XXX,XX +XXX,XX @@ static void cmsdk_dualtimermod_write_control(CMSDKAPBDualTimerModule *m, } m->control = newctrl; + + ptimer_transaction_commit(m->timer); } static uint64_t cmsdk_apb_dualtimer_read(void *opaque, hwaddr offset, @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_dualtimer_write(void *opaque, hwaddr offset, if (!(m->control & R_CONTROL_SIZE_MASK)) { value &= 0xffff; } + ptimer_transaction_begin(m->timer); if (!(m->control & R_CONTROL_MODE_MASK)) { /* * In free-running mode this won't set the limit but will @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_dualtimer_write(void *opaque, hwaddr offset, ptimer_run(m->timer, 1); } } + ptimer_transaction_commit(m->timer); break; case A_TIMER1BGLOAD: /* Set the limit, but not the current count */ @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_dualtimer_write(void *opaque, hwaddr offset, if (!(m->control & R_CONTROL_SIZE_MASK)) { value &= 0xffff; } + ptimer_transaction_begin(m->timer); ptimer_set_limit(m->timer, value, 0); + ptimer_transaction_commit(m->timer); break; case A_TIMER1CONTROL: cmsdk_dualtimermod_write_control(m, value); @@ -XXX,XX +XXX,XX @@ static void cmsdk_dualtimermod_reset(CMSDKAPBDualTimerModule *m) m->intstatus = 0; m->load = 0; m->value = 0xffffffff; + ptimer_transaction_begin(m->timer); ptimer_stop(m->timer); /* * We start in free-running mode, with VALUE at 0xffffffff, and @@ -XXX,XX +XXX,XX @@ static void cmsdk_dualtimermod_reset(CMSDKAPBDualTimerModule *m) */ ptimer_set_limit(m->timer, 0xffff, 1); ptimer_set_freq(m->timer, m->parent->pclk_frq); + ptimer_transaction_commit(m->timer); } static void cmsdk_apb_dualtimer_reset(DeviceState *dev) @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_dualtimer_realize(DeviceState *dev, Error **errp) for (i = 0; i < ARRAY_SIZE(s->timermod); i++) { CMSDKAPBDualTimerModule *m = &s->timermod[i]; - QEMUBH *bh = qemu_bh_new(cmsdk_dualtimermod_tick, m); m->parent = s; - m->timer = ptimer_init_with_bh(bh, + m->timer = ptimer_init(cmsdk_dualtimermod_tick, m, PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT | PTIMER_POLICY_NO_IMMEDIATE_RELOAD | -- 2.20.1
Switch the cmsdk-apb-timer code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-10-peter.maydell@linaro.org --- hw/timer/cmsdk-apb-timer.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/hw/timer/cmsdk-apb-timer.c b/hw/timer/cmsdk-apb-timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/cmsdk-apb-timer.c +++ b/hw/timer/cmsdk-apb-timer.c @@ -XXX,XX +XXX,XX @@ #include "qemu/osdep.h" #include "qemu/log.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "qapi/error.h" #include "trace.h" @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_timer_write(void *opaque, hwaddr offset, uint64_t value, "CMSDK APB timer: EXTIN input not supported\n"); } s->ctrl = value & 0xf; + ptimer_transaction_begin(s->timer); if (s->ctrl & R_CTRL_EN_MASK) { ptimer_run(s->timer, ptimer_get_limit(s->timer) == 0); } else { ptimer_stop(s->timer); } + ptimer_transaction_commit(s->timer); break; case A_RELOAD: /* Writing to reload also sets the current timer value */ + ptimer_transaction_begin(s->timer); if (!value) { ptimer_stop(s->timer); } @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_timer_write(void *opaque, hwaddr offset, uint64_t value, */ ptimer_run(s->timer, 0); } + ptimer_transaction_commit(s->timer); break; case A_VALUE: + ptimer_transaction_begin(s->timer); if (!value && !ptimer_get_limit(s->timer)) { ptimer_stop(s->timer); } @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_timer_write(void *opaque, hwaddr offset, uint64_t value, if (value && (s->ctrl & R_CTRL_EN_MASK)) { ptimer_run(s->timer, ptimer_get_limit(s->timer) == 0); } + ptimer_transaction_commit(s->timer); break; case A_INTSTATUS: /* Just one bit, which is W1C. */ @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_timer_reset(DeviceState *dev) trace_cmsdk_apb_timer_reset(); s->ctrl = 0; s->intstatus = 0; + ptimer_transaction_begin(s->timer); ptimer_stop(s->timer); /* Set the limit and the count */ ptimer_set_limit(s->timer, 0, 1); + ptimer_transaction_commit(s->timer); } static void cmsdk_apb_timer_init(Object *obj) @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_timer_init(Object *obj) static void cmsdk_apb_timer_realize(DeviceState *dev, Error **errp) { CMSDKAPBTIMER *s = CMSDK_APB_TIMER(dev); - QEMUBH *bh; if (s->pclk_frq == 0) { error_setg(errp, "CMSDK APB timer: pclk-frq property must be set"); return; } - bh = qemu_bh_new(cmsdk_apb_timer_tick, s); - s->timer = ptimer_init_with_bh(bh, + s->timer = ptimer_init(cmsdk_apb_timer_tick, s, PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT | PTIMER_POLICY_NO_IMMEDIATE_RELOAD | PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); + ptimer_transaction_begin(s->timer); ptimer_set_freq(s->timer, s->pclk_frq); + ptimer_transaction_commit(s->timer); } static const VMStateDescription cmsdk_apb_timer_vmstate = { -- 2.20.1
Switch the digic-timer.c code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-11-peter.maydell@linaro.org --- hw/timer/digic-timer.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/hw/timer/digic-timer.c b/hw/timer/digic-timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/digic-timer.c +++ b/hw/timer/digic-timer.c @@ -XXX,XX +XXX,XX @@ #include "qemu/osdep.h" #include "hw/sysbus.h" #include "hw/ptimer.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "qemu/log.h" @@ -XXX,XX +XXX,XX @@ static void digic_timer_reset(DeviceState *dev) { DigicTimerState *s = DIGIC_TIMER(dev); + ptimer_transaction_begin(s->ptimer); ptimer_stop(s->ptimer); + ptimer_transaction_commit(s->ptimer); s->control = 0; s->relvalue = 0; } @@ -XXX,XX +XXX,XX @@ static void digic_timer_write(void *opaque, hwaddr offset, break; } + ptimer_transaction_begin(s->ptimer); if (value & DIGIC_TIMER_CONTROL_EN) { ptimer_run(s->ptimer, 0); } s->control = (uint32_t)value; + ptimer_transaction_commit(s->ptimer); break; case DIGIC_TIMER_RELVALUE: s->relvalue = extract32(value, 0, 16); + ptimer_transaction_begin(s->ptimer); ptimer_set_limit(s->ptimer, s->relvalue, 1); + ptimer_transaction_commit(s->ptimer); break; case DIGIC_TIMER_VALUE: @@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps digic_timer_ops = { .endianness = DEVICE_NATIVE_ENDIAN, }; +static void digic_timer_tick(void *opaque) +{ + /* Nothing to do on timer rollover */ +} + static void digic_timer_init(Object *obj) { DigicTimerState *s = DIGIC_TIMER(obj); - s->ptimer = ptimer_init_with_bh(NULL, PTIMER_POLICY_DEFAULT); + s->ptimer = ptimer_init(digic_timer_tick, NULL, PTIMER_POLICY_DEFAULT); /* * FIXME: there is no documentation on Digic timer * frequency setup so let it always run at 1 MHz */ + ptimer_transaction_begin(s->ptimer); ptimer_set_freq(s->ptimer, 1 * 1000 * 1000); + ptimer_transaction_commit(s->ptimer); memory_region_init_io(&s->iomem, OBJECT(s), &digic_timer_ops, s, TYPE_DIGIC_TIMER, 0x100); -- 2.20.1
We want to switch the exynos MCT code away from bottom-half based ptimers to the new transaction-based ptimer API. The MCT is complicated and uses multiple different ptimers, so it's clearer to switch it a piece at a time. Here we change over only the GFRC. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-12-peter.maydell@linaro.org --- hw/timer/exynos4210_mct.c | 48 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 45 insertions(+), 3 deletions(-) diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/exynos4210_mct.c +++ b/hw/timer/exynos4210_mct.c @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_update_freq(Exynos4210MCTState *s); /* * Set counter of FRC global timer. + * Must be called within exynos4210_gfrc_tx_begin/commit block. */ static void exynos4210_gfrc_set_count(Exynos4210MCTGT *s, uint64_t count) { @@ -XXX,XX +XXX,XX @@ static uint64_t exynos4210_gfrc_get_count(Exynos4210MCTGT *s) /* * Stop global FRC timer + * Must be called within exynos4210_gfrc_tx_begin/commit block. */ static void exynos4210_gfrc_stop(Exynos4210MCTGT *s) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_gfrc_stop(Exynos4210MCTGT *s) /* * Start global FRC timer + * Must be called within exynos4210_gfrc_tx_begin/commit block. */ static void exynos4210_gfrc_start(Exynos4210MCTGT *s) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_gfrc_start(Exynos4210MCTGT *s) ptimer_run(s->ptimer_frc, 1); } +/* + * Start ptimer transaction for global FRC timer; this is just for + * consistency with the way we wrap operations like stop and run. + */ +static void exynos4210_gfrc_tx_begin(Exynos4210MCTGT *s) +{ + ptimer_transaction_begin(s->ptimer_frc); +} + +/* Commit ptimer transaction for global FRC timer. */ +static void exynos4210_gfrc_tx_commit(Exynos4210MCTGT *s) +{ + ptimer_transaction_commit(s->ptimer_frc); +} + /* * Find next nearest Comparator. If current Comparator value equals to other * Comparator value, skip them both @@ -XXX,XX +XXX,XX @@ static uint64_t exynos4210_gcomp_get_distance(Exynos4210MCTState *s, int32_t id) /* * Restart global FRC timer + * Must be called within exynos4210_gfrc_tx_begin/commit block. */ static void exynos4210_gfrc_restart(Exynos4210MCTState *s) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_ltick_event(void *opaque) exynos4210_ltick_int_start(&s->tick_timer); } +static void tx_ptimer_set_freq(ptimer_state *s, uint32_t freq) +{ + /* + * callers of exynos4210_mct_update_freq() never do anything + * else that needs to be in the same ptimer transaction, so + * to avoid a lot of repetition we have a convenience function + * for begin/set_freq/commit. + */ + ptimer_transaction_begin(s); + ptimer_set_freq(s, freq); + ptimer_transaction_commit(s); +} + /* update timer frequency */ static void exynos4210_mct_update_freq(Exynos4210MCTState *s) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_update_freq(Exynos4210MCTState *s) DPRINTF("freq=%dHz\n", s->freq); /* global timer */ - ptimer_set_freq(s->g_timer.ptimer_frc, s->freq); + tx_ptimer_set_freq(s->g_timer.ptimer_frc, s->freq); /* local timer */ ptimer_set_freq(s->l_timer[0].tick_timer.ptimer_tick, s->freq); @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_reset(DeviceState *d) /* global timer */ memset(&s->g_timer.reg, 0, sizeof(s->g_timer.reg)); + exynos4210_gfrc_tx_begin(&s->g_timer); exynos4210_gfrc_stop(&s->g_timer); + exynos4210_gfrc_tx_commit(&s->g_timer); /* local timer */ memset(s->l_timer[0].reg.cnt, 0, sizeof(s->l_timer[0].reg.cnt)); @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_write(void *opaque, hwaddr offset, } s->g_timer.reg.cnt = new_frc; + exynos4210_gfrc_tx_begin(&s->g_timer); exynos4210_gfrc_restart(s); + exynos4210_gfrc_tx_commit(&s->g_timer); break; case G_CNT_WSTAT: @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_write(void *opaque, hwaddr offset, s->g_timer.reg.wstat |= G_WSTAT_COMP_L(index); } + exynos4210_gfrc_tx_begin(&s->g_timer); exynos4210_gfrc_restart(s); + exynos4210_gfrc_tx_commit(&s->g_timer); break; case G_TCON: @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_write(void *opaque, hwaddr offset, DPRINTF("global timer write to reg.g_tcon %llx\n", value); + exynos4210_gfrc_tx_begin(&s->g_timer); + /* Start FRC if transition from disabled to enabled */ if ((value & G_TCON_TIMER_ENABLE) > (old_val & G_TCON_TIMER_ENABLE)) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_write(void *opaque, hwaddr offset, exynos4210_gfrc_restart(s); } } + + exynos4210_gfrc_tx_commit(&s->g_timer); break; case G_INT_CSTAT: @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_init(Object *obj) QEMUBH *bh[2]; /* Global timer */ - bh[0] = qemu_bh_new(exynos4210_gfrc_event, s); - s->g_timer.ptimer_frc = ptimer_init_with_bh(bh[0], PTIMER_POLICY_DEFAULT); + s->g_timer.ptimer_frc = ptimer_init(exynos4210_gfrc_event, s, + PTIMER_POLICY_DEFAULT); memset(&s->g_timer.reg, 0, sizeof(struct gregs)); /* Local timers */ -- 2.20.1
Switch the exynos MCT LFRC timers over to the ptimer transaction API. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-13-peter.maydell@linaro.org --- hw/timer/exynos4210_mct.c | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/exynos4210_mct.c +++ b/hw/timer/exynos4210_mct.c @@ -XXX,XX +XXX,XX @@ static uint64_t exynos4210_lfrc_get_count(Exynos4210MCTLT *s) /* * Set counter of FRC local timer. + * Must be called from within exynos4210_lfrc_tx_begin/commit block. */ static void exynos4210_lfrc_update_count(Exynos4210MCTLT *s) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_lfrc_update_count(Exynos4210MCTLT *s) /* * Start local FRC timer + * Must be called from within exynos4210_lfrc_tx_begin/commit block. */ static void exynos4210_lfrc_start(Exynos4210MCTLT *s) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_lfrc_start(Exynos4210MCTLT *s) /* * Stop local FRC timer + * Must be called from within exynos4210_lfrc_tx_begin/commit block. */ static void exynos4210_lfrc_stop(Exynos4210MCTLT *s) { ptimer_stop(s->ptimer_frc); } +/* Start ptimer transaction for local FRC timer */ +static void exynos4210_lfrc_tx_begin(Exynos4210MCTLT *s) +{ + ptimer_transaction_begin(s->ptimer_frc); +} + +/* Commit ptimer transaction for local FRC timer */ +static void exynos4210_lfrc_tx_commit(Exynos4210MCTLT *s) +{ + ptimer_transaction_commit(s->ptimer_frc); +} + /* * Local timer free running counter tick handler */ @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_update_freq(Exynos4210MCTState *s) /* local timer */ ptimer_set_freq(s->l_timer[0].tick_timer.ptimer_tick, s->freq); - ptimer_set_freq(s->l_timer[0].ptimer_frc, s->freq); + tx_ptimer_set_freq(s->l_timer[0].ptimer_frc, s->freq); ptimer_set_freq(s->l_timer[1].tick_timer.ptimer_tick, s->freq); - ptimer_set_freq(s->l_timer[1].ptimer_frc, s->freq); + tx_ptimer_set_freq(s->l_timer[1].ptimer_frc, s->freq); } } @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_reset(DeviceState *d) s->l_timer[i].tick_timer.count = 0; s->l_timer[i].tick_timer.distance = 0; s->l_timer[i].tick_timer.progress = 0; + exynos4210_lfrc_tx_begin(&s->l_timer[i]); ptimer_stop(s->l_timer[i].ptimer_frc); + exynos4210_lfrc_tx_commit(&s->l_timer[i]); exynos4210_ltick_timer_init(&s->l_timer[i].tick_timer); } @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_write(void *opaque, hwaddr offset, } /* Start or Stop local FRC if TCON changed */ + exynos4210_lfrc_tx_begin(&s->l_timer[lt_i]); if ((value & L_TCON_FRC_START) > (s->l_timer[lt_i].reg.tcon & L_TCON_FRC_START)) { DPRINTF("local timer[%d] start frc\n", lt_i); @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_write(void *opaque, hwaddr offset, DPRINTF("local timer[%d] stop frc\n", lt_i); exynos4210_lfrc_stop(&s->l_timer[lt_i]); } + exynos4210_lfrc_tx_commit(&s->l_timer[lt_i]); break; case L0_TCNTB: case L1_TCNTB: @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_init(Object *obj) /* Local timers */ for (i = 0; i < 2; i++) { bh[0] = qemu_bh_new(exynos4210_ltick_event, &s->l_timer[i]); - bh[1] = qemu_bh_new(exynos4210_lfrc_event, &s->l_timer[i]); s->l_timer[i].tick_timer.ptimer_tick = ptimer_init_with_bh(bh[0], PTIMER_POLICY_DEFAULT); s->l_timer[i].ptimer_frc = - ptimer_init_with_bh(bh[1], PTIMER_POLICY_DEFAULT); + ptimer_init(exynos4210_lfrc_event, &s->l_timer[i], + PTIMER_POLICY_DEFAULT); s->l_timer[i].id = i; } -- 2.20.1
Switch the ltick ptimer over to the ptimer transaction API. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-14-peter.maydell@linaro.org --- hw/timer/exynos4210_mct.c | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/exynos4210_mct.c +++ b/hw/timer/exynos4210_mct.c @@ -XXX,XX +XXX,XX @@ #include "hw/sysbus.h" #include "migration/vmstate.h" #include "qemu/timer.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "hw/ptimer.h" @@ -XXX,XX +XXX,XX @@ static uint32_t exynos4210_ltick_int_get_cnto(struct tick_timer *s) /* * Start local tick cnt timer. + * Must be called within exynos4210_ltick_tx_begin/commit block. */ static void exynos4210_ltick_cnt_start(struct tick_timer *s) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_ltick_cnt_start(struct tick_timer *s) /* * Stop local tick cnt timer. + * Must be called within exynos4210_ltick_tx_begin/commit block. */ static void exynos4210_ltick_cnt_stop(struct tick_timer *s) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_ltick_cnt_stop(struct tick_timer *s) } } +/* Start ptimer transaction for local tick timer */ +static void exynos4210_ltick_tx_begin(struct tick_timer *s) +{ + ptimer_transaction_begin(s->ptimer_tick); +} + +/* Commit ptimer transaction for local tick timer */ +static void exynos4210_ltick_tx_commit(struct tick_timer *s) +{ + ptimer_transaction_commit(s->ptimer_tick); +} + /* * Get counter for CNT timer */ @@ -XXX,XX +XXX,XX @@ static uint32_t exynos4210_ltick_cnt_get_cnto(struct tick_timer *s) /* * Set new values of counters for CNT and INT timers + * Must be called within exynos4210_ltick_tx_begin/commit block. */ static void exynos4210_ltick_set_cntb(struct tick_timer *s, uint32_t new_cnt, uint32_t new_int) @@ -XXX,XX +XXX,XX @@ static void exynos4210_ltick_recalc_count(struct tick_timer *s) static void exynos4210_ltick_timer_init(struct tick_timer *s) { exynos4210_ltick_int_stop(s); + exynos4210_ltick_tx_begin(s); exynos4210_ltick_cnt_stop(s); + exynos4210_ltick_tx_commit(s); s->count = 0; s->distance = 0; @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_update_freq(Exynos4210MCTState *s) tx_ptimer_set_freq(s->g_timer.ptimer_frc, s->freq); /* local timer */ - ptimer_set_freq(s->l_timer[0].tick_timer.ptimer_tick, s->freq); + tx_ptimer_set_freq(s->l_timer[0].tick_timer.ptimer_tick, s->freq); tx_ptimer_set_freq(s->l_timer[0].ptimer_frc, s->freq); - ptimer_set_freq(s->l_timer[1].tick_timer.ptimer_tick, s->freq); + tx_ptimer_set_freq(s->l_timer[1].tick_timer.ptimer_tick, s->freq); tx_ptimer_set_freq(s->l_timer[1].ptimer_frc, s->freq); } } @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_write(void *opaque, hwaddr offset, s->l_timer[lt_i].reg.wstat |= L_WSTAT_TCON_WRITE; s->l_timer[lt_i].reg.tcon = value; + exynos4210_ltick_tx_begin(&s->l_timer[lt_i].tick_timer); /* Stop local CNT */ if ((value & L_TCON_TICK_START) < (old_val & L_TCON_TICK_START)) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_write(void *opaque, hwaddr offset, DPRINTF("local timer[%d] start int\n", lt_i); exynos4210_ltick_int_start(&s->l_timer[lt_i].tick_timer); } + exynos4210_ltick_tx_commit(&s->l_timer[lt_i].tick_timer); /* Start or Stop local FRC if TCON changed */ exynos4210_lfrc_tx_begin(&s->l_timer[lt_i]); @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_write(void *opaque, hwaddr offset, * Due to this we should reload timer to nearest moment when CNT is * expired and then in event handler update tcntb to new TCNTB value. */ + exynos4210_ltick_tx_begin(&s->l_timer[lt_i].tick_timer); exynos4210_ltick_set_cntb(&s->l_timer[lt_i].tick_timer, value, s->l_timer[lt_i].tick_timer.icntb); + exynos4210_ltick_tx_commit(&s->l_timer[lt_i].tick_timer); s->l_timer[lt_i].reg.wstat |= L_WSTAT_TCNTB_WRITE; s->l_timer[lt_i].reg.cnt[L_REG_CNT_TCNTB] = value; @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_init(Object *obj) int i; Exynos4210MCTState *s = EXYNOS4210_MCT(obj); SysBusDevice *dev = SYS_BUS_DEVICE(obj); - QEMUBH *bh[2]; /* Global timer */ s->g_timer.ptimer_frc = ptimer_init(exynos4210_gfrc_event, s, @@ -XXX,XX +XXX,XX @@ static void exynos4210_mct_init(Object *obj) /* Local timers */ for (i = 0; i < 2; i++) { - bh[0] = qemu_bh_new(exynos4210_ltick_event, &s->l_timer[i]); s->l_timer[i].tick_timer.ptimer_tick = - ptimer_init_with_bh(bh[0], PTIMER_POLICY_DEFAULT); + ptimer_init(exynos4210_ltick_event, &s->l_timer[i], + PTIMER_POLICY_DEFAULT); s->l_timer[i].ptimer_frc = ptimer_init(exynos4210_lfrc_event, &s->l_timer[i], PTIMER_POLICY_DEFAULT); -- 2.20.1
Switch the exynos4210_pwm code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-15-peter.maydell@linaro.org --- hw/timer/exynos4210_pwm.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/hw/timer/exynos4210_pwm.c b/hw/timer/exynos4210_pwm.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/exynos4210_pwm.c +++ b/hw/timer/exynos4210_pwm.c @@ -XXX,XX +XXX,XX @@ #include "hw/sysbus.h" #include "migration/vmstate.h" #include "qemu/timer.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "hw/ptimer.h" @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_exynos4210_pwm_state = { }; /* - * PWM update frequency + * PWM update frequency. + * Must be called within a ptimer_transaction_begin/commit block + * for s->timer[id].ptimer. */ static void exynos4210_pwm_update_freq(Exynos4210PWMState *s, uint32_t id) { @@ -XXX,XX +XXX,XX @@ static void exynos4210_pwm_write(void *opaque, hwaddr offset, /* update timers frequencies */ for (i = 0; i < EXYNOS4210_PWM_TIMERS_NUM; i++) { + ptimer_transaction_begin(s->timer[i].ptimer); exynos4210_pwm_update_freq(s, s->timer[i].id); + ptimer_transaction_commit(s->timer[i].ptimer); } break; case TCON: for (i = 0; i < EXYNOS4210_PWM_TIMERS_NUM; i++) { + ptimer_transaction_begin(s->timer[i].ptimer); if ((value & TCON_TIMER_MANUAL_UPD(i)) > (s->reg_tcon & TCON_TIMER_MANUAL_UPD(i))) { /* @@ -XXX,XX +XXX,XX @@ static void exynos4210_pwm_write(void *opaque, hwaddr offset, ptimer_stop(s->timer[i].ptimer); DPRINTF("stop timer %d\n", i); } + ptimer_transaction_commit(s->timer[i].ptimer); } s->reg_tcon = value; break; @@ -XXX,XX +XXX,XX @@ static void exynos4210_pwm_reset(DeviceState *d) s->timer[i].reg_tcmpb = 0; s->timer[i].reg_tcntb = 0; + ptimer_transaction_begin(s->timer[i].ptimer); exynos4210_pwm_update_freq(s, s->timer[i].id); ptimer_stop(s->timer[i].ptimer); + ptimer_transaction_commit(s->timer[i].ptimer); } } @@ -XXX,XX +XXX,XX @@ static void exynos4210_pwm_init(Object *obj) Exynos4210PWMState *s = EXYNOS4210_PWM(obj); SysBusDevice *dev = SYS_BUS_DEVICE(obj); int i; - QEMUBH *bh; for (i = 0; i < EXYNOS4210_PWM_TIMERS_NUM; i++) { - bh = qemu_bh_new(exynos4210_pwm_tick, &s->timer[i]); sysbus_init_irq(dev, &s->timer[i].irq); - s->timer[i].ptimer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); + s->timer[i].ptimer = ptimer_init(exynos4210_pwm_tick, + &s->timer[i], + PTIMER_POLICY_DEFAULT); s->timer[i].id = i; s->timer[i].parent = s; } -- 2.20.1
Switch the exynos41210_rtc 1Hz ptimer over to the transaction-based API. (We will switch the other ptimer used by this device in a separate commit.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-16-peter.maydell@linaro.org --- hw/timer/exynos4210_rtc.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/timer/exynos4210_rtc.c b/hw/timer/exynos4210_rtc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/exynos4210_rtc.c +++ b/hw/timer/exynos4210_rtc.c @@ -XXX,XX +XXX,XX @@ static void exynos4210_rtc_write(void *opaque, hwaddr offset, } break; case RTCCON: + ptimer_transaction_begin(s->ptimer_1Hz); if (value & RTC_ENABLE) { exynos4210_rtc_update_freq(s, value); } @@ -XXX,XX +XXX,XX @@ static void exynos4210_rtc_write(void *opaque, hwaddr offset, ptimer_stop(s->ptimer); } } + ptimer_transaction_commit(s->ptimer_1Hz); s->reg_rtccon = value; break; case TICCNT: @@ -XXX,XX +XXX,XX @@ static void exynos4210_rtc_reset(DeviceState *d) exynos4210_rtc_update_freq(s, s->reg_rtccon); ptimer_stop(s->ptimer); + ptimer_transaction_begin(s->ptimer_1Hz); ptimer_stop(s->ptimer_1Hz); + ptimer_transaction_commit(s->ptimer_1Hz); } static const MemoryRegionOps exynos4210_rtc_ops = { @@ -XXX,XX +XXX,XX @@ static void exynos4210_rtc_init(Object *obj) ptimer_set_freq(s->ptimer, RTC_BASE_FREQ); exynos4210_rtc_update_freq(s, 0); - bh = qemu_bh_new(exynos4210_rtc_1Hz_tick, s); - s->ptimer_1Hz = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); + s->ptimer_1Hz = ptimer_init(exynos4210_rtc_1Hz_tick, + s, PTIMER_POLICY_DEFAULT); + ptimer_transaction_begin(s->ptimer_1Hz); ptimer_set_freq(s->ptimer_1Hz, RTC_BASE_FREQ); + ptimer_transaction_commit(s->ptimer_1Hz); sysbus_init_irq(dev, &s->alm_irq); sysbus_init_irq(dev, &s->tick_irq); -- 2.20.1
Switch the exynos41210_rtc main ptimer over to the transaction-based API, completing the transition for this device. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-17-peter.maydell@linaro.org --- hw/timer/exynos4210_rtc.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/hw/timer/exynos4210_rtc.c b/hw/timer/exynos4210_rtc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/exynos4210_rtc.c +++ b/hw/timer/exynos4210_rtc.c @@ -XXX,XX +XXX,XX @@ #include "qemu/osdep.h" #include "qemu-common.h" #include "qemu/log.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "hw/sysbus.h" #include "migration/vmstate.h" @@ -XXX,XX +XXX,XX @@ static void check_alarm_raise(Exynos4210RTCState *s) * RTC update frequency * Parameters: * reg_value - current RTCCON register or his new value + * Must be called within a ptimer_transaction_begin/commit block for s->ptimer. */ static void exynos4210_rtc_update_freq(Exynos4210RTCState *s, uint32_t reg_value) @@ -XXX,XX +XXX,XX @@ static void exynos4210_rtc_write(void *opaque, hwaddr offset, break; case RTCCON: ptimer_transaction_begin(s->ptimer_1Hz); + ptimer_transaction_begin(s->ptimer); if (value & RTC_ENABLE) { exynos4210_rtc_update_freq(s, value); } @@ -XXX,XX +XXX,XX @@ static void exynos4210_rtc_write(void *opaque, hwaddr offset, } } ptimer_transaction_commit(s->ptimer_1Hz); + ptimer_transaction_commit(s->ptimer); s->reg_rtccon = value; break; case TICCNT: @@ -XXX,XX +XXX,XX @@ static void exynos4210_rtc_reset(DeviceState *d) s->reg_curticcnt = 0; + ptimer_transaction_begin(s->ptimer); exynos4210_rtc_update_freq(s, s->reg_rtccon); ptimer_stop(s->ptimer); + ptimer_transaction_commit(s->ptimer); ptimer_transaction_begin(s->ptimer_1Hz); ptimer_stop(s->ptimer_1Hz); ptimer_transaction_commit(s->ptimer_1Hz); @@ -XXX,XX +XXX,XX @@ static void exynos4210_rtc_init(Object *obj) { Exynos4210RTCState *s = EXYNOS4210_RTC(obj); SysBusDevice *dev = SYS_BUS_DEVICE(obj); - QEMUBH *bh; - bh = qemu_bh_new(exynos4210_rtc_tick, s); - s->ptimer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); + s->ptimer = ptimer_init(exynos4210_rtc_tick, s, PTIMER_POLICY_DEFAULT); + ptimer_transaction_begin(s->ptimer); ptimer_set_freq(s->ptimer, RTC_BASE_FREQ); exynos4210_rtc_update_freq(s, 0); + ptimer_transaction_commit(s->ptimer); s->ptimer_1Hz = ptimer_init(exynos4210_rtc_1Hz_tick, s, PTIMER_POLICY_DEFAULT); -- 2.20.1
Switch the imx_epit.c code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-18-peter.maydell@linaro.org --- hw/timer/imx_epit.c | 32 +++++++++++++++++++++++++++----- 1 file changed, 27 insertions(+), 5 deletions(-) diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/imx_epit.c +++ b/hw/timer/imx_epit.c @@ -XXX,XX +XXX,XX @@ #include "migration/vmstate.h" #include "hw/irq.h" #include "hw/misc/imx_ccm.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "qemu/log.h" @@ -XXX,XX +XXX,XX @@ static void imx_epit_update_int(IMXEPITState *s) } } +/* + * Must be called from within a ptimer_transaction_begin/commit block + * for both s->timer_cmp and s->timer_reload. + */ static void imx_epit_set_freq(IMXEPITState *s) { uint32_t clksrc; @@ -XXX,XX +XXX,XX @@ static void imx_epit_reset(DeviceState *dev) s->lr = EPIT_TIMER_MAX; s->cmp = 0; s->cnt = 0; + ptimer_transaction_begin(s->timer_cmp); + ptimer_transaction_begin(s->timer_reload); /* stop both timers */ ptimer_stop(s->timer_cmp); ptimer_stop(s->timer_reload); @@ -XXX,XX +XXX,XX @@ static void imx_epit_reset(DeviceState *dev) /* if the timer is still enabled, restart it */ ptimer_run(s->timer_reload, 0); } + ptimer_transaction_commit(s->timer_cmp); + ptimer_transaction_commit(s->timer_reload); } static uint32_t imx_epit_update_count(IMXEPITState *s) @@ -XXX,XX +XXX,XX @@ static uint64_t imx_epit_read(void *opaque, hwaddr offset, unsigned size) return reg_value; } +/* Must be called from ptimer_transaction_begin/commit block for s->timer_cmp */ static void imx_epit_reload_compare_timer(IMXEPITState *s) { if ((s->cr & (CR_EN | CR_OCIEN)) == (CR_EN | CR_OCIEN)) { @@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value, switch (offset >> 2) { case 0: /* CR */ + ptimer_transaction_begin(s->timer_cmp); + ptimer_transaction_begin(s->timer_reload); oldcr = s->cr; s->cr = value & 0x03ffffff; @@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value, } else { ptimer_stop(s->timer_cmp); } + + ptimer_transaction_commit(s->timer_cmp); + ptimer_transaction_commit(s->timer_reload); break; case 1: /* SR - ACK*/ @@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value, case 2: /* LR - set ticks */ s->lr = value; + ptimer_transaction_begin(s->timer_cmp); + ptimer_transaction_begin(s->timer_reload); if (s->cr & CR_RLD) { /* Also set the limit if the LRD bit is set */ /* If IOVW bit is set then set the timer value */ @@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value, } imx_epit_reload_compare_timer(s); + ptimer_transaction_commit(s->timer_cmp); + ptimer_transaction_commit(s->timer_reload); break; case 3: /* CMP */ s->cmp = value; + ptimer_transaction_begin(s->timer_cmp); imx_epit_reload_compare_timer(s); + ptimer_transaction_commit(s->timer_cmp); break; @@ -XXX,XX +XXX,XX @@ static void imx_epit_cmp(void *opaque) imx_epit_update_int(s); } +static void imx_epit_reload(void *opaque) +{ + /* No action required on rollover of timer_reload */ +} + static const MemoryRegionOps imx_epit_ops = { .read = imx_epit_read, .write = imx_epit_write, @@ -XXX,XX +XXX,XX @@ static void imx_epit_realize(DeviceState *dev, Error **errp) { IMXEPITState *s = IMX_EPIT(dev); SysBusDevice *sbd = SYS_BUS_DEVICE(dev); - QEMUBH *bh; DPRINTF("\n"); @@ -XXX,XX +XXX,XX @@ static void imx_epit_realize(DeviceState *dev, Error **errp) 0x00001000); sysbus_init_mmio(sbd, &s->iomem); - s->timer_reload = ptimer_init_with_bh(NULL, PTIMER_POLICY_DEFAULT); + s->timer_reload = ptimer_init(imx_epit_reload, s, PTIMER_POLICY_DEFAULT); - bh = qemu_bh_new(imx_epit_cmp, s); - s->timer_cmp = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); + s->timer_cmp = ptimer_init(imx_epit_cmp, s, PTIMER_POLICY_DEFAULT); } static void imx_epit_class_init(ObjectClass *klass, void *data) -- 2.20.1
Switch the imx_epit.c code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-19-peter.maydell@linaro.org --- hw/timer/imx_gpt.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/imx_gpt.c +++ b/hw/timer/imx_gpt.c @@ -XXX,XX +XXX,XX @@ #include "hw/irq.h" #include "hw/timer/imx_gpt.h" #include "migration/vmstate.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "qemu/log.h" @@ -XXX,XX +XXX,XX @@ static const IMXClk imx7_gpt_clocks[] = { CLK_NONE, /* 111 not defined */ }; +/* Must be called from within ptimer_transaction_begin/commit block */ static void imx_gpt_set_freq(IMXGPTState *s) { uint32_t clksrc = extract32(s->cr, GPT_CR_CLKSRC_SHIFT, 3); @@ -XXX,XX +XXX,XX @@ static inline uint32_t imx_gpt_find_limit(uint32_t count, uint32_t reg, return timeout; } +/* Must be called from within ptimer_transaction_begin/commit block */ static void imx_gpt_compute_next_timeout(IMXGPTState *s, bool event) { uint32_t timeout = GPT_TIMER_MAX; @@ -XXX,XX +XXX,XX @@ static uint64_t imx_gpt_read(void *opaque, hwaddr offset, unsigned size) static void imx_gpt_reset_common(IMXGPTState *s, bool is_soft_reset) { + ptimer_transaction_begin(s->timer); /* stop timer */ ptimer_stop(s->timer); @@ -XXX,XX +XXX,XX @@ static void imx_gpt_reset_common(IMXGPTState *s, bool is_soft_reset) if (s->freq && (s->cr & GPT_CR_EN)) { ptimer_run(s->timer, 1); } + ptimer_transaction_commit(s->timer); } static void imx_gpt_soft_reset(DeviceState *dev) @@ -XXX,XX +XXX,XX @@ static void imx_gpt_write(void *opaque, hwaddr offset, uint64_t value, imx_gpt_soft_reset(DEVICE(s)); } else { /* set our freq, as the source might have changed */ + ptimer_transaction_begin(s->timer); imx_gpt_set_freq(s); if ((oldreg ^ s->cr) & GPT_CR_EN) { @@ -XXX,XX +XXX,XX @@ static void imx_gpt_write(void *opaque, hwaddr offset, uint64_t value, ptimer_stop(s->timer); } } + ptimer_transaction_commit(s->timer); } break; case 1: /* Prescaler */ s->pr = value & 0xfff; + ptimer_transaction_begin(s->timer); imx_gpt_set_freq(s); + ptimer_transaction_commit(s->timer); break; case 2: /* SR */ @@ -XXX,XX +XXX,XX @@ static void imx_gpt_write(void *opaque, hwaddr offset, uint64_t value, s->ir = value & 0x3f; imx_gpt_update_int(s); + ptimer_transaction_begin(s->timer); imx_gpt_compute_next_timeout(s, false); + ptimer_transaction_commit(s->timer); break; case 4: /* OCR1 -- output compare register */ s->ocr1 = value; + ptimer_transaction_begin(s->timer); /* In non-freerun mode, reset count when this register is written */ if (!(s->cr & GPT_CR_FRR)) { s->next_timeout = GPT_TIMER_MAX; @@ -XXX,XX +XXX,XX @@ static void imx_gpt_write(void *opaque, hwaddr offset, uint64_t value, /* compute the new timeout */ imx_gpt_compute_next_timeout(s, false); + ptimer_transaction_commit(s->timer); break; @@ -XXX,XX +XXX,XX @@ static void imx_gpt_write(void *opaque, hwaddr offset, uint64_t value, s->ocr2 = value; /* compute the new timeout */ + ptimer_transaction_begin(s->timer); imx_gpt_compute_next_timeout(s, false); + ptimer_transaction_commit(s->timer); break; @@ -XXX,XX +XXX,XX @@ static void imx_gpt_write(void *opaque, hwaddr offset, uint64_t value, s->ocr3 = value; /* compute the new timeout */ + ptimer_transaction_begin(s->timer); imx_gpt_compute_next_timeout(s, false); + ptimer_transaction_commit(s->timer); break; @@ -XXX,XX +XXX,XX @@ static void imx_gpt_realize(DeviceState *dev, Error **errp) { IMXGPTState *s = IMX_GPT(dev); SysBusDevice *sbd = SYS_BUS_DEVICE(dev); - QEMUBH *bh; sysbus_init_irq(sbd, &s->irq); memory_region_init_io(&s->iomem, OBJECT(s), &imx_gpt_ops, s, TYPE_IMX_GPT, 0x00001000); sysbus_init_mmio(sbd, &s->iomem); - bh = qemu_bh_new(imx_gpt_timeout, s); - s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); + s->timer = ptimer_init(imx_gpt_timeout, s, PTIMER_POLICY_DEFAULT); } static void imx_gpt_class_init(ObjectClass *klass, void *data) -- 2.20.1
Switch the mss-timer code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-20-peter.maydell@linaro.org --- include/hw/timer/mss-timer.h | 1 - hw/timer/mss-timer.c | 11 ++++++++--- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/include/hw/timer/mss-timer.h b/include/hw/timer/mss-timer.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/timer/mss-timer.h +++ b/include/hw/timer/mss-timer.h @@ -XXX,XX +XXX,XX @@ #define R_TIM1_MAX 6 struct Msf2Timer { - QEMUBH *bh; ptimer_state *ptimer; uint32_t regs[R_TIM1_MAX]; diff --git a/hw/timer/mss-timer.c b/hw/timer/mss-timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/mss-timer.c +++ b/hw/timer/mss-timer.c @@ -XXX,XX +XXX,XX @@ */ #include "qemu/osdep.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "qemu/log.h" #include "hw/irq.h" @@ -XXX,XX +XXX,XX @@ static void timer_update_irq(struct Msf2Timer *st) qemu_set_irq(st->irq, (ier && isr)); } +/* Must be called from within a ptimer_transaction_begin/commit block */ static void timer_update(struct Msf2Timer *st) { uint64_t count; @@ -XXX,XX +XXX,XX @@ timer_write(void *opaque, hwaddr offset, switch (addr) { case R_TIM_CTRL: st->regs[R_TIM_CTRL] = value; + ptimer_transaction_begin(st->ptimer); timer_update(st); + ptimer_transaction_commit(st->ptimer); break; case R_TIM_RIS: @@ -XXX,XX +XXX,XX @@ timer_write(void *opaque, hwaddr offset, case R_TIM_LOADVAL: st->regs[R_TIM_LOADVAL] = value; if (st->regs[R_TIM_CTRL] & TIMER_CTRL_ENBL) { + ptimer_transaction_begin(st->ptimer); timer_update(st); + ptimer_transaction_commit(st->ptimer); } break; @@ -XXX,XX +XXX,XX @@ static void mss_timer_init(Object *obj) for (i = 0; i < NUM_TIMERS; i++) { struct Msf2Timer *st = &t->timers[i]; - st->bh = qemu_bh_new(timer_hit, st); - st->ptimer = ptimer_init_with_bh(st->bh, PTIMER_POLICY_DEFAULT); + st->ptimer = ptimer_init(timer_hit, st, PTIMER_POLICY_DEFAULT); + ptimer_transaction_begin(st->ptimer); ptimer_set_freq(st->ptimer, t->freq_hz); + ptimer_transaction_commit(st->ptimer); sysbus_init_irq(SYS_BUS_DEVICE(obj), &st->irq); } -- 2.20.1
Switch the cmsdk-apb-watchdog code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-21-peter.maydell@linaro.org --- hw/watchdog/cmsdk-apb-watchdog.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/hw/watchdog/cmsdk-apb-watchdog.c b/hw/watchdog/cmsdk-apb-watchdog.c index XXXXXXX..XXXXXXX 100644 --- a/hw/watchdog/cmsdk-apb-watchdog.c +++ b/hw/watchdog/cmsdk-apb-watchdog.c @@ -XXX,XX +XXX,XX @@ #include "qemu/log.h" #include "trace.h" #include "qapi/error.h" -#include "qemu/main-loop.h" #include "qemu/module.h" #include "sysemu/watchdog.h" #include "hw/sysbus.h" @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_watchdog_write(void *opaque, hwaddr offset, * Reset the load value and the current count, and make sure * we're counting. */ + ptimer_transaction_begin(s->timer); ptimer_set_limit(s->timer, value, 1); ptimer_run(s->timer, 0); + ptimer_transaction_commit(s->timer); break; case A_WDOGCONTROL: if (s->is_luminary && 0 != (R_WDOGCONTROL_INTEN_MASK & s->control)) { @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_watchdog_write(void *opaque, hwaddr offset, break; case A_WDOGINTCLR: s->intstatus = 0; + ptimer_transaction_begin(s->timer); ptimer_set_count(s->timer, ptimer_get_limit(s->timer)); + ptimer_transaction_commit(s->timer); cmsdk_apb_watchdog_update(s); break; case A_WDOGLOCK: @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_watchdog_reset(DeviceState *dev) s->itop = 0; s->resetstatus = 0; /* Set the limit and the count */ + ptimer_transaction_begin(s->timer); ptimer_set_limit(s->timer, 0xffffffff, 1); ptimer_run(s->timer, 0); + ptimer_transaction_commit(s->timer); } static void cmsdk_apb_watchdog_init(Object *obj) @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_watchdog_init(Object *obj) static void cmsdk_apb_watchdog_realize(DeviceState *dev, Error **errp) { CMSDKAPBWatchdog *s = CMSDK_APB_WATCHDOG(dev); - QEMUBH *bh; if (s->wdogclk_frq == 0) { error_setg(errp, @@ -XXX,XX +XXX,XX @@ static void cmsdk_apb_watchdog_realize(DeviceState *dev, Error **errp) return; } - bh = qemu_bh_new(cmsdk_apb_watchdog_tick, s); - s->timer = ptimer_init_with_bh(bh, + s->timer = ptimer_init(cmsdk_apb_watchdog_tick, s, PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | PTIMER_POLICY_TRIGGER_ONLY_ON_DECREMENT | PTIMER_POLICY_NO_IMMEDIATE_RELOAD | PTIMER_POLICY_NO_COUNTER_ROUND_DOWN); + ptimer_transaction_begin(s->timer); ptimer_set_freq(s->timer, s->wdogclk_frq); + ptimer_transaction_commit(s->timer); } static const VMStateDescription cmsdk_apb_watchdog_vmstate = { -- 2.20.1
Switch the cmsdk-apb-watchdog code away from bottom-half based ptimers to the new transaction-based ptimer API. This just requires adding begin/commit calls around the various places that modify the ptimer state, and using the new ptimer_init() function to create the timer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191008171740.9679-22-peter.maydell@linaro.org --- hw/net/lan9118.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c index XXXXXXX..XXXXXXX 100644 --- a/hw/net/lan9118.c +++ b/hw/net/lan9118.c @@ -XXX,XX +XXX,XX @@ #include "hw/ptimer.h" #include "hw/qdev-properties.h" #include "qemu/log.h" -#include "qemu/main-loop.h" #include "qemu/module.h" /* For crc32 */ #include <zlib.h> @@ -XXX,XX +XXX,XX @@ static void lan9118_reset(DeviceState *d) s->e2p_data = 0; s->free_timer_start = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) / 40; + ptimer_transaction_begin(s->timer); ptimer_stop(s->timer); ptimer_set_count(s->timer, 0xffff); + ptimer_transaction_commit(s->timer); s->gpt_cfg = 0xffff; s->mac_cr = MAC_CR_PRMS; @@ -XXX,XX +XXX,XX @@ static void lan9118_writel(void *opaque, hwaddr offset, break; case CSR_GPT_CFG: if ((s->gpt_cfg ^ val) & GPT_TIMER_EN) { + ptimer_transaction_begin(s->timer); if (val & GPT_TIMER_EN) { ptimer_set_count(s->timer, val & 0xffff); ptimer_run(s->timer, 0); @@ -XXX,XX +XXX,XX @@ static void lan9118_writel(void *opaque, hwaddr offset, ptimer_stop(s->timer); ptimer_set_count(s->timer, 0xffff); } + ptimer_transaction_commit(s->timer); } s->gpt_cfg = val & (GPT_TIMER_EN | 0xffff); break; @@ -XXX,XX +XXX,XX @@ static void lan9118_realize(DeviceState *dev, Error **errp) { SysBusDevice *sbd = SYS_BUS_DEVICE(dev); lan9118_state *s = LAN9118(dev); - QEMUBH *bh; int i; const MemoryRegionOps *mem_ops = s->mode_16bit ? &lan9118_16bit_mem_ops : &lan9118_mem_ops; @@ -XXX,XX +XXX,XX @@ static void lan9118_realize(DeviceState *dev, Error **errp) s->pmt_ctrl = 1; s->txp = &s->tx_packet; - bh = qemu_bh_new(lan9118_tick, s); - s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT); + s->timer = ptimer_init(lan9118_tick, s, PTIMER_POLICY_DEFAULT); + ptimer_transaction_begin(s->timer); ptimer_set_freq(s->timer, 10000); ptimer_set_limit(s->timer, 0xffff, 1); + ptimer_transaction_commit(s->timer); } static Property lan9118_properties[] = { -- 2.20.1
The set_swi_errno() function is called to capture the errno from a host system call, so that we can return -1 from the semihosting function and later allow the guest to get a more specific error code with the SYS_ERRNO function. It comes in two versions, one for user-only and one for softmmu. We forgot to capture the errno in the softmmu version; fix the error. (Semihosting calls directed to gdb are unaffected because they go through a different code path that captures the error return from the gdbstub call in arm_semi_cb() or arm_semi_flen_cb().) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20190916141544.17540-2-peter.maydell@linaro.org --- target/arm/arm-semi.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static inline uint32_t set_swi_errno(TaskState *ts, uint32_t code) return code; } #else +static target_ulong syscall_err; + static inline uint32_t set_swi_errno(CPUARMState *env, uint32_t code) { + if (code == (uint32_t)-1) { + syscall_err = errno; + } return code; } @@ -XXX,XX +XXX,XX @@ static inline uint32_t set_swi_errno(CPUARMState *env, uint32_t code) static target_ulong arm_semi_syscall_len; -#if !defined(CONFIG_USER_ONLY) -static target_ulong syscall_err; -#endif - static void arm_semi_cb(CPUState *cs, target_ulong ret, target_ulong err) { ARMCPU *cpu = ARM_CPU(cs); -- 2.20.1
If we fail a semihosting call we should always set the semihosting errno to something; we were failing to do this for some of the "check inputs for sanity" cases. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20190916141544.17540-3-peter.maydell@linaro.org --- target/arm/arm-semi.c | 45 ++++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static target_ulong arm_gdb_syscall(ARMCPU *cpu, gdb_syscall_complete_cb cb, #define GET_ARG(n) do { \ if (is_a64(env)) { \ if (get_user_u64(arg ## n, args + (n) * 8)) { \ - return -1; \ + errno = EFAULT; \ + return set_swi_errno(ts, -1); \ } \ } else { \ if (get_user_u32(arg ## n, args + (n) * 4)) { \ - return -1; \ + errno = EFAULT; \ + return set_swi_errno(ts, -1); \ } \ } \ } while (0) @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) GET_ARG(2); s = lock_user_string(arg0); if (!s) { - /* FIXME - should this error code be -TARGET_EFAULT ? */ - return (uint32_t)-1; + errno = EFAULT; + return set_swi_errno(ts, -1); } if (arg1 >= 12) { unlock_user(s, arg0, 0); - return (uint32_t)-1; + errno = EINVAL; + return set_swi_errno(ts, -1); } if (strcmp(s, ":tt") == 0) { int result_fileno = arg1 < 4 ? STDIN_FILENO : STDOUT_FILENO; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) } else { s = lock_user_string(arg0); if (!s) { - /* FIXME - should this error code be -TARGET_EFAULT ? */ - return (uint32_t)-1; + errno = EFAULT; + return set_swi_errno(ts, -1); } ret = set_swi_errno(ts, remove(s)); unlock_user(s, arg0, 0); @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) char *s2; s = lock_user_string(arg0); s2 = lock_user_string(arg2); - if (!s || !s2) - /* FIXME - should this error code be -TARGET_EFAULT ? */ - ret = (uint32_t)-1; - else + if (!s || !s2) { + errno = EFAULT; + ret = set_swi_errno(ts, -1); + } else { ret = set_swi_errno(ts, rename(s, s2)); + } if (s2) unlock_user(s2, arg2, 0); if (s) @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) } else { s = lock_user_string(arg0); if (!s) { - /* FIXME - should this error code be -TARGET_EFAULT ? */ - return (uint32_t)-1; + errno = EFAULT; + return set_swi_errno(ts, -1); } ret = set_swi_errno(ts, system(s)); unlock_user(s, arg0, 0); @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) if (output_size > input_size) { /* Not enough space to store command-line arguments. */ - return -1; + errno = E2BIG; + return set_swi_errno(ts, -1); } /* Adjust the command-line length. */ if (SET_ARG(1, output_size - 1)) { /* Couldn't write back to argument block */ - return -1; + errno = EFAULT; + return set_swi_errno(ts, -1); } /* Lock the buffer on the ARM side. */ output_buffer = lock_user(VERIFY_WRITE, arg0, output_size, 0); if (!output_buffer) { - return -1; + errno = EFAULT; + return set_swi_errno(ts, -1); } /* Copy the command-line arguments. */ @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) if (copy_from_user(output_buffer, ts->info->arg_start, output_size)) { - status = -1; + errno = EFAULT; + status = set_swi_errno(ts, -1); goto out; } @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) if (fail) { /* Couldn't write back to argument block */ - return -1; + errno = EFAULT; + return set_swi_errno(ts, -1); } } return 0; -- 2.20.1
In arm_gdb_syscall() we have a comment suggesting a race because the syscall completion callback might not happen before the gdb_do_syscallv() call returns. The comment is correct that the callback may not happen but incorrect about the effects. Correct it and note the important caveat that callers must never do any work of any kind after return from arm_gdb_syscall() that depends on its return value. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20190916141544.17540-4-peter.maydell@linaro.org --- target/arm/arm-semi.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static target_ulong arm_gdb_syscall(ARMCPU *cpu, gdb_syscall_complete_cb cb, gdb_do_syscallv(cb, fmt, va); va_end(va); - /* FIXME: we are implicitly relying on the syscall completing - * before this point, which is not guaranteed. We should - * put in an explicit synchronization between this and - * the callback function. + /* + * FIXME: in softmmu mode, the gdbstub will schedule our callback + * to occur, but will not actually call it to complete the syscall + * until after this function has returned and we are back in the + * CPU main loop. Therefore callers to this function must not + * do anything with its return value, because it is not necessarily + * the result of the syscall, but could just be the old value of X0. + * The only thing safe to do with this is that the callers of + * do_arm_semihosting() will write it straight back into X0. + * (In linux-user mode, the callback will have happened before + * gdb_do_syscallv() returns.) + * + * We should tidy this up so neither this function nor + * do_arm_semihosting() return a value, so the mistake of + * doing something with the return value is not possible to make. */ return is_a64(env) ? env->xregs[0] : env->regs[0]; -- 2.20.1
Currently the Arm semihosting code returns the guest file descriptors (handles) which are simply the fd values from the host OS or the remote gdbstub. Part of the semihosting 2.0 specification requires that we implement special handling of opening a ":semihosting-features" filename. Guest fds which result from opening the special file won't correspond to host fds, so to ensure that we don't end up with duplicate fds we need to have QEMU code control the allocation of the fd values we give the guest. Add in an abstraction layer which lets us allocate new guest FD values, and translate from a guest FD value back to the host one. This also fixes an odd hole where a semihosting guest could use the semihosting API to read, write or close file descriptors that it had never allocated but which were being used by QEMU itself. (This isn't a security hole, because enabling semihosting permits the guest to do arbitrary file access to the whole host filesystem, and so should only be done if the guest is completely trusted.) Currently the only kind of guest fd is one which maps to a host fd, but in a following commit we will add one which maps to the :semihosting-features magic data. If the guest is migrated with an open semihosting file descriptor then subsequent attempts to use the fd will all fail; this is not a change from the previous situation (where the host fd being used on the source end would not be re-opened on the destination end). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20190916141544.17540-5-peter.maydell@linaro.org --- target/arm/arm-semi.c | 232 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 216 insertions(+), 16 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static int open_modeflags[12] = { O_RDWR | O_CREAT | O_APPEND | O_BINARY }; +typedef enum GuestFDType { + GuestFDUnused = 0, + GuestFDHost = 1, +} GuestFDType; + +/* + * Guest file descriptors are integer indexes into an array of + * these structures (we will dynamically resize as necessary). + */ +typedef struct GuestFD { + GuestFDType type; + int hostfd; +} GuestFD; + +static GArray *guestfd_array; + +/* + * Allocate a new guest file descriptor and return it; if we + * couldn't allocate a new fd then return -1. + * This is a fairly simplistic implementation because we don't + * expect that most semihosting guest programs will make very + * heavy use of opening and closing fds. + */ +static int alloc_guestfd(void) +{ + guint i; + + if (!guestfd_array) { + /* New entries zero-initialized, i.e. type GuestFDUnused */ + guestfd_array = g_array_new(FALSE, TRUE, sizeof(GuestFD)); + } + + for (i = 0; i < guestfd_array->len; i++) { + GuestFD *gf = &g_array_index(guestfd_array, GuestFD, i); + + if (gf->type == GuestFDUnused) { + return i; + } + } + + /* All elements already in use: expand the array */ + g_array_set_size(guestfd_array, i + 1); + return i; +} + +/* + * Look up the guestfd in the data structure; return NULL + * for out of bounds, but don't check whether the slot is unused. + * This is used internally by the other guestfd functions. + */ +static GuestFD *do_get_guestfd(int guestfd) +{ + if (!guestfd_array) { + return NULL; + } + + if (guestfd < 0 || guestfd >= guestfd_array->len) { + return NULL; + } + + return &g_array_index(guestfd_array, GuestFD, guestfd); +} + +/* + * Associate the specified guest fd (which must have been + * allocated via alloc_fd() and not previously used) with + * the specified host fd. + */ +static void associate_guestfd(int guestfd, int hostfd) +{ + GuestFD *gf = do_get_guestfd(guestfd); + + assert(gf); + gf->type = GuestFDHost; + gf->hostfd = hostfd; +} + +/* + * Deallocate the specified guest file descriptor. This doesn't + * close the host fd, it merely undoes the work of alloc_fd(). + */ +static void dealloc_guestfd(int guestfd) +{ + GuestFD *gf = do_get_guestfd(guestfd); + + assert(gf); + gf->type = GuestFDUnused; +} + +/* + * Given a guest file descriptor, get the associated struct. + * If the fd is not valid, return NULL. This is the function + * used by the various semihosting calls to validate a handle + * from the guest. + * Note: calling alloc_guestfd() or dealloc_guestfd() will + * invalidate any GuestFD* obtained by calling this function. + */ +static GuestFD *get_guestfd(int guestfd) +{ + GuestFD *gf = do_get_guestfd(guestfd); + + if (!gf || gf->type == GuestFDUnused) { + return NULL; + } + return gf; +} + #ifdef CONFIG_USER_ONLY static inline uint32_t set_swi_errno(TaskState *ts, uint32_t code) { @@ -XXX,XX +XXX,XX @@ static void arm_semi_flen_cb(CPUState *cs, target_ulong ret, target_ulong err) #endif } +static int arm_semi_open_guestfd; + +static void arm_semi_open_cb(CPUState *cs, target_ulong ret, target_ulong err) +{ + ARMCPU *cpu = ARM_CPU(cs); + CPUARMState *env = &cpu->env; +#ifdef CONFIG_USER_ONLY + TaskState *ts = cs->opaque; +#endif + if (ret == (target_ulong)-1) { +#ifdef CONFIG_USER_ONLY + ts->swi_errno = err; +#else + syscall_err = err; +#endif + dealloc_guestfd(arm_semi_open_guestfd); + } else { + associate_guestfd(arm_semi_open_guestfd, ret); + ret = arm_semi_open_guestfd; + } + + if (is_a64(env)) { + env->xregs[0] = ret; + } else { + env->regs[0] = ret; + } +} + static target_ulong arm_gdb_syscall(ARMCPU *cpu, gdb_syscall_complete_cb cb, const char *fmt, ...) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) #else CPUARMState *ts = env; #endif + GuestFD *gf; if (is_a64(env)) { /* Note that the syscall number is in W0, not X0 */ @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) switch (nr) { case TARGET_SYS_OPEN: + { + int guestfd; + GET_ARG(0); GET_ARG(1); GET_ARG(2); @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) errno = EINVAL; return set_swi_errno(ts, -1); } + + guestfd = alloc_guestfd(); + if (guestfd < 0) { + unlock_user(s, arg0, 0); + errno = EMFILE; + return set_swi_errno(ts, -1); + } + if (strcmp(s, ":tt") == 0) { int result_fileno = arg1 < 4 ? STDIN_FILENO : STDOUT_FILENO; + associate_guestfd(guestfd, result_fileno); unlock_user(s, arg0, 0); - return result_fileno; + return guestfd; } if (use_gdb_syscalls()) { - ret = arm_gdb_syscall(cpu, arm_semi_cb, "open,%s,%x,1a4", arg0, + arm_semi_open_guestfd = guestfd; + ret = arm_gdb_syscall(cpu, arm_semi_open_cb, "open,%s,%x,1a4", arg0, (int)arg2+1, gdb_open_modeflags[arg1]); } else { ret = set_swi_errno(ts, open(s, open_modeflags[arg1], 0644)); + if (ret == (uint32_t)-1) { + dealloc_guestfd(guestfd); + } else { + associate_guestfd(guestfd, ret); + ret = guestfd; + } } unlock_user(s, arg0, 0); return ret; + } case TARGET_SYS_CLOSE: GET_ARG(0); - if (use_gdb_syscalls()) { - return arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", arg0); - } else { - return set_swi_errno(ts, close(arg0)); + + gf = get_guestfd(arg0); + if (!gf) { + errno = EBADF; + return set_swi_errno(ts, -1); } + + if (use_gdb_syscalls()) { + ret = arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", gf->hostfd); + } else { + ret = set_swi_errno(ts, close(gf->hostfd)); + } + dealloc_guestfd(arg0); + return ret; case TARGET_SYS_WRITEC: qemu_semihosting_console_outc(env, args); return 0xdeadbeef; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) GET_ARG(1); GET_ARG(2); len = arg2; + + gf = get_guestfd(arg0); + if (!gf) { + errno = EBADF; + return set_swi_errno(ts, -1); + } + if (use_gdb_syscalls()) { arm_semi_syscall_len = len; return arm_gdb_syscall(cpu, arm_semi_cb, "write,%x,%x,%x", - arg0, arg1, len); + gf->hostfd, arg1, len); } else { s = lock_user(VERIFY_READ, arg1, len, 1); if (!s) { /* Return bytes not written on error */ return len; } - ret = set_swi_errno(ts, write(arg0, s, len)); + ret = set_swi_errno(ts, write(gf->hostfd, s, len)); unlock_user(s, arg1, 0); if (ret == (uint32_t)-1) { ret = 0; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) GET_ARG(1); GET_ARG(2); len = arg2; + + gf = get_guestfd(arg0); + if (!gf) { + errno = EBADF; + return set_swi_errno(ts, -1); + } + if (use_gdb_syscalls()) { arm_semi_syscall_len = len; return arm_gdb_syscall(cpu, arm_semi_cb, "read,%x,%x,%x", - arg0, arg1, len); + gf->hostfd, arg1, len); } else { s = lock_user(VERIFY_WRITE, arg1, len, 0); if (!s) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return len; } do { - ret = set_swi_errno(ts, read(arg0, s, len)); + ret = set_swi_errno(ts, read(gf->hostfd, s, len)); } while (ret == -1 && errno == EINTR); unlock_user(s, arg1, len); if (ret == (uint32_t)-1) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return 0; case TARGET_SYS_ISTTY: GET_ARG(0); + + gf = get_guestfd(arg0); + if (!gf) { + errno = EBADF; + return set_swi_errno(ts, -1); + } + if (use_gdb_syscalls()) { - return arm_gdb_syscall(cpu, arm_semi_cb, "isatty,%x", arg0); + return arm_gdb_syscall(cpu, arm_semi_cb, "isatty,%x", gf->hostfd); } else { - return isatty(arg0); + return isatty(gf->hostfd); } case TARGET_SYS_SEEK: GET_ARG(0); GET_ARG(1); + + gf = get_guestfd(arg0); + if (!gf) { + errno = EBADF; + return set_swi_errno(ts, -1); + } + if (use_gdb_syscalls()) { return arm_gdb_syscall(cpu, arm_semi_cb, "lseek,%x,%x,0", - arg0, arg1); + gf->hostfd, arg1); } else { - ret = set_swi_errno(ts, lseek(arg0, arg1, SEEK_SET)); + ret = set_swi_errno(ts, lseek(gf->hostfd, arg1, SEEK_SET)); if (ret == (uint32_t)-1) return -1; return 0; } case TARGET_SYS_FLEN: GET_ARG(0); + + gf = get_guestfd(arg0); + if (!gf) { + errno = EBADF; + return set_swi_errno(ts, -1); + } + if (use_gdb_syscalls()) { return arm_gdb_syscall(cpu, arm_semi_flen_cb, "fstat,%x,%x", - arg0, arm_flen_buf(cpu)); + gf->hostfd, arm_flen_buf(cpu)); } else { struct stat buf; - ret = set_swi_errno(ts, fstat(arg0, &buf)); + ret = set_swi_errno(ts, fstat(gf->hostfd, &buf)); if (ret == (uint32_t)-1) return -1; return buf.st_size; -- 2.20.1
The semihosting code needs accuss to the linux-user only TaskState pointer so it can set the semihosting errno per-thread for linux-user mode. At the moment we do this by having some ifdefs so that we define a 'ts' local in do_arm_semihosting() which is either a real TaskState * or just a CPUARMState *, depending on which mode we're compiling for. This is awkward if we want to refactor do_arm_semihosting() into other functions which might need to be passed the TaskState. Restrict usage of the TaskState local by: * making set_swi_errno() always take the CPUARMState pointer and (for the linux-user version) get TaskState from that * creating a new get_swi_errno() which reads the errno * having the two semihosting calls which need the TaskState for other purposes (SYS_GET_CMDLINE and SYS_HEAPINFO) define a variable with scope restricted to just that code Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20190916141544.17540-6-peter.maydell@linaro.org --- target/arm/arm-semi.c | 111 ++++++++++++++++++++++++------------------ 1 file changed, 63 insertions(+), 48 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static GuestFD *get_guestfd(int guestfd) return gf; } -#ifdef CONFIG_USER_ONLY -static inline uint32_t set_swi_errno(TaskState *ts, uint32_t code) -{ - if (code == (uint32_t)-1) - ts->swi_errno = errno; - return code; -} -#else +/* + * The semihosting API has no concept of its errno being thread-safe, + * as the API design predates SMP CPUs and was intended as a simple + * real-hardware set of debug functionality. For QEMU, we make the + * errno be per-thread in linux-user mode; in softmmu it is a simple + * global, and we assume that the guest takes care of avoiding any races. + */ +#ifndef CONFIG_USER_ONLY static target_ulong syscall_err; +#include "exec/softmmu-semi.h" +#endif + static inline uint32_t set_swi_errno(CPUARMState *env, uint32_t code) { if (code == (uint32_t)-1) { +#ifdef CONFIG_USER_ONLY + CPUState *cs = env_cpu(env); + TaskState *ts = cs->opaque; + + ts->swi_errno = errno; +#else syscall_err = errno; +#endif } return code; } -#include "exec/softmmu-semi.h" +static inline uint32_t get_swi_errno(CPUARMState *env) +{ +#ifdef CONFIG_USER_ONLY + CPUState *cs = env_cpu(env); + TaskState *ts = cs->opaque; + + return ts->swi_errno; +#else + return syscall_err; #endif +} static target_ulong arm_semi_syscall_len; @@ -XXX,XX +XXX,XX @@ static target_ulong arm_gdb_syscall(ARMCPU *cpu, gdb_syscall_complete_cb cb, if (is_a64(env)) { \ if (get_user_u64(arg ## n, args + (n) * 8)) { \ errno = EFAULT; \ - return set_swi_errno(ts, -1); \ + return set_swi_errno(env, -1); \ } \ } else { \ if (get_user_u32(arg ## n, args + (n) * 4)) { \ errno = EFAULT; \ - return set_swi_errno(ts, -1); \ + return set_swi_errno(env, -1); \ } \ } \ } while (0) @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) int nr; uint32_t ret; uint32_t len; -#ifdef CONFIG_USER_ONLY - TaskState *ts = cs->opaque; -#else - CPUARMState *ts = env; -#endif GuestFD *gf; if (is_a64(env)) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) s = lock_user_string(arg0); if (!s) { errno = EFAULT; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } if (arg1 >= 12) { unlock_user(s, arg0, 0); errno = EINVAL; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } guestfd = alloc_guestfd(); if (guestfd < 0) { unlock_user(s, arg0, 0); errno = EMFILE; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } if (strcmp(s, ":tt") == 0) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) ret = arm_gdb_syscall(cpu, arm_semi_open_cb, "open,%s,%x,1a4", arg0, (int)arg2+1, gdb_open_modeflags[arg1]); } else { - ret = set_swi_errno(ts, open(s, open_modeflags[arg1], 0644)); + ret = set_swi_errno(env, open(s, open_modeflags[arg1], 0644)); if (ret == (uint32_t)-1) { dealloc_guestfd(guestfd); } else { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) gf = get_guestfd(arg0); if (!gf) { errno = EBADF; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } if (use_gdb_syscalls()) { ret = arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", gf->hostfd); } else { - ret = set_swi_errno(ts, close(gf->hostfd)); + ret = set_swi_errno(env, close(gf->hostfd)); } dealloc_guestfd(arg0); return ret; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) gf = get_guestfd(arg0); if (!gf) { errno = EBADF; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } if (use_gdb_syscalls()) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) /* Return bytes not written on error */ return len; } - ret = set_swi_errno(ts, write(gf->hostfd, s, len)); + ret = set_swi_errno(env, write(gf->hostfd, s, len)); unlock_user(s, arg1, 0); if (ret == (uint32_t)-1) { ret = 0; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) gf = get_guestfd(arg0); if (!gf) { errno = EBADF; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } if (use_gdb_syscalls()) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return len; } do { - ret = set_swi_errno(ts, read(gf->hostfd, s, len)); + ret = set_swi_errno(env, read(gf->hostfd, s, len)); } while (ret == -1 && errno == EINTR); unlock_user(s, arg1, len); if (ret == (uint32_t)-1) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) gf = get_guestfd(arg0); if (!gf) { errno = EBADF; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } if (use_gdb_syscalls()) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) gf = get_guestfd(arg0); if (!gf) { errno = EBADF; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } if (use_gdb_syscalls()) { return arm_gdb_syscall(cpu, arm_semi_cb, "lseek,%x,%x,0", gf->hostfd, arg1); } else { - ret = set_swi_errno(ts, lseek(gf->hostfd, arg1, SEEK_SET)); + ret = set_swi_errno(env, lseek(gf->hostfd, arg1, SEEK_SET)); if (ret == (uint32_t)-1) return -1; return 0; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) gf = get_guestfd(arg0); if (!gf) { errno = EBADF; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } if (use_gdb_syscalls()) { @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) gf->hostfd, arm_flen_buf(cpu)); } else { struct stat buf; - ret = set_swi_errno(ts, fstat(gf->hostfd, &buf)); + ret = set_swi_errno(env, fstat(gf->hostfd, &buf)); if (ret == (uint32_t)-1) return -1; return buf.st_size; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) s = lock_user_string(arg0); if (!s) { errno = EFAULT; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } - ret = set_swi_errno(ts, remove(s)); + ret = set_swi_errno(env, remove(s)); unlock_user(s, arg0, 0); } return ret; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) s2 = lock_user_string(arg2); if (!s || !s2) { errno = EFAULT; - ret = set_swi_errno(ts, -1); + ret = set_swi_errno(env, -1); } else { - ret = set_swi_errno(ts, rename(s, s2)); + ret = set_swi_errno(env, rename(s, s2)); } if (s2) unlock_user(s2, arg2, 0); @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) case TARGET_SYS_CLOCK: return clock() / (CLOCKS_PER_SEC / 100); case TARGET_SYS_TIME: - return set_swi_errno(ts, time(NULL)); + return set_swi_errno(env, time(NULL)); case TARGET_SYS_SYSTEM: GET_ARG(0); GET_ARG(1); @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) s = lock_user_string(arg0); if (!s) { errno = EFAULT; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } - ret = set_swi_errno(ts, system(s)); + ret = set_swi_errno(env, system(s)); unlock_user(s, arg0, 0); return ret; } case TARGET_SYS_ERRNO: -#ifdef CONFIG_USER_ONLY - return ts->swi_errno; -#else - return syscall_err; -#endif + return get_swi_errno(env); case TARGET_SYS_GET_CMDLINE: { /* Build a command-line from the original argv. @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) int status = 0; #if !defined(CONFIG_USER_ONLY) const char *cmdline; +#else + TaskState *ts = cs->opaque; #endif GET_ARG(0); GET_ARG(1); @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) if (output_size > input_size) { /* Not enough space to store command-line arguments. */ errno = E2BIG; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } /* Adjust the command-line length. */ if (SET_ARG(1, output_size - 1)) { /* Couldn't write back to argument block */ errno = EFAULT; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } /* Lock the buffer on the ARM side. */ output_buffer = lock_user(VERIFY_WRITE, arg0, output_size, 0); if (!output_buffer) { errno = EFAULT; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } /* Copy the command-line arguments. */ @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) if (copy_from_user(output_buffer, ts->info->arg_start, output_size)) { errno = EFAULT; - status = set_swi_errno(ts, -1); + status = set_swi_errno(env, -1); goto out; } @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) target_ulong retvals[4]; target_ulong limit; int i; +#ifdef CONFIG_USER_ONLY + TaskState *ts = cs->opaque; +#endif GET_ARG(0); @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) if (fail) { /* Couldn't write back to argument block */ errno = EFAULT; - return set_swi_errno(ts, -1); + return set_swi_errno(env, -1); } } return 0; -- 2.20.1
When we are routing semihosting operations through the gdbstub, the work of sorting out the return value and setting errno if necessary is done by callback functions which are invoked by the gdbstub code. Clean up some ifdeffery in those functions by having them call set_swi_errno() to set the semihosting errno. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20190916141544.17540-7-peter.maydell@linaro.org --- target/arm/arm-semi.c | 27 ++++++--------------------- 1 file changed, 6 insertions(+), 21 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static void arm_semi_cb(CPUState *cs, target_ulong ret, target_ulong err) { ARMCPU *cpu = ARM_CPU(cs); CPUARMState *env = &cpu->env; -#ifdef CONFIG_USER_ONLY - TaskState *ts = cs->opaque; -#endif target_ulong reg0 = is_a64(env) ? env->xregs[0] : env->regs[0]; if (ret == (target_ulong)-1) { -#ifdef CONFIG_USER_ONLY - ts->swi_errno = err; -#else - syscall_err = err; -#endif + errno = err; + set_swi_errno(env, -1); reg0 = ret; } else { /* Fixup syscalls that use nonstardard return conventions. */ @@ -XXX,XX +XXX,XX @@ static void arm_semi_flen_cb(CPUState *cs, target_ulong ret, target_ulong err) } else { env->regs[0] = size; } -#ifdef CONFIG_USER_ONLY - ((TaskState *)cs->opaque)->swi_errno = err; -#else - syscall_err = err; -#endif + errno = err; + set_swi_errno(env, -1); } static int arm_semi_open_guestfd; @@ -XXX,XX +XXX,XX @@ static void arm_semi_open_cb(CPUState *cs, target_ulong ret, target_ulong err) { ARMCPU *cpu = ARM_CPU(cs); CPUARMState *env = &cpu->env; -#ifdef CONFIG_USER_ONLY - TaskState *ts = cs->opaque; -#endif if (ret == (target_ulong)-1) { -#ifdef CONFIG_USER_ONLY - ts->swi_errno = err; -#else - syscall_err = err; -#endif + errno = err; + set_swi_errno(env, -1); dealloc_guestfd(arm_semi_open_guestfd); } else { associate_guestfd(arm_semi_open_guestfd, ret); -- 2.20.1
Currently for the semihosting calls which take a file descriptor (SYS_CLOSE, SYS_WRITE, SYS_READ, SYS_ISTTY, SYS_SEEK, SYS_FLEN) we have effectively two implementations, one for real host files and one for when we indirect via the gdbstub. We want to add a third one to deal with the magic :semihosting-features file. Instead of having a three-way if statement in each of these cases, factor out the implementation of the calls to separate functions which we dispatch to via function pointers selected via the GuestFDType for the guest fd. In this commit, we set up the framework for the dispatch, and convert the SYS_CLOSE call to use it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20190916141544.17540-8-peter.maydell@linaro.org --- target/arm/arm-semi.c | 44 ++++++++++++++++++++++++++++++++++++------- 1 file changed, 37 insertions(+), 7 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static int open_modeflags[12] = { typedef enum GuestFDType { GuestFDUnused = 0, GuestFDHost = 1, + GuestFDGDB = 2, } GuestFDType; /* @@ -XXX,XX +XXX,XX @@ static GuestFD *do_get_guestfd(int guestfd) /* * Associate the specified guest fd (which must have been * allocated via alloc_fd() and not previously used) with - * the specified host fd. + * the specified host/gdb fd. */ static void associate_guestfd(int guestfd, int hostfd) { GuestFD *gf = do_get_guestfd(guestfd); assert(gf); - gf->type = GuestFDHost; + gf->type = use_gdb_syscalls() ? GuestFDGDB : GuestFDHost; gf->hostfd = hostfd; } @@ -XXX,XX +XXX,XX @@ static target_ulong arm_gdb_syscall(ARMCPU *cpu, gdb_syscall_complete_cb cb, return is_a64(env) ? env->xregs[0] : env->regs[0]; } +/* + * Types for functions implementing various semihosting calls + * for specific types of guest file descriptor. These must all + * do the work and return the required return value for the guest, + * setting the guest errno if appropriate. + */ +typedef uint32_t sys_closefn(ARMCPU *cpu, GuestFD *gf); + +static uint32_t host_closefn(ARMCPU *cpu, GuestFD *gf) +{ + CPUARMState *env = &cpu->env; + + return set_swi_errno(env, close(gf->hostfd)); +} + +static uint32_t gdb_closefn(ARMCPU *cpu, GuestFD *gf) +{ + return arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", gf->hostfd); +} + +typedef struct GuestFDFunctions { + sys_closefn *closefn; +} GuestFDFunctions; + +static const GuestFDFunctions guestfd_fns[] = { + [GuestFDHost] = { + .closefn = host_closefn, + }, + [GuestFDGDB] = { + .closefn = gdb_closefn, + }, +}; + /* Read the input value from the argument block; fail the semihosting * call if the memory read fails. */ @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return set_swi_errno(env, -1); } - if (use_gdb_syscalls()) { - ret = arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", gf->hostfd); - } else { - ret = set_swi_errno(env, close(gf->hostfd)); - } + ret = guestfd_fns[gf->type].closefn(cpu, gf); dealloc_guestfd(arg0); return ret; case TARGET_SYS_WRITEC: -- 2.20.1
Factor out the implementation of SYS_WRITE via the new function tables. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20190916141544.17540-9-peter.maydell@linaro.org --- target/arm/arm-semi.c | 51 ++++++++++++++++++++++++++++--------------- 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static target_ulong arm_gdb_syscall(ARMCPU *cpu, gdb_syscall_complete_cb cb, * setting the guest errno if appropriate. */ typedef uint32_t sys_closefn(ARMCPU *cpu, GuestFD *gf); +typedef uint32_t sys_writefn(ARMCPU *cpu, GuestFD *gf, + target_ulong buf, uint32_t len); static uint32_t host_closefn(ARMCPU *cpu, GuestFD *gf) { @@ -XXX,XX +XXX,XX @@ static uint32_t host_closefn(ARMCPU *cpu, GuestFD *gf) return set_swi_errno(env, close(gf->hostfd)); } +static uint32_t host_writefn(ARMCPU *cpu, GuestFD *gf, + target_ulong buf, uint32_t len) +{ + uint32_t ret; + CPUARMState *env = &cpu->env; + char *s = lock_user(VERIFY_READ, buf, len, 1); + if (!s) { + /* Return bytes not written on error */ + return len; + } + ret = set_swi_errno(env, write(gf->hostfd, s, len)); + unlock_user(s, buf, 0); + if (ret == (uint32_t)-1) { + ret = 0; + } + /* Return bytes not written */ + return len - ret; +} + static uint32_t gdb_closefn(ARMCPU *cpu, GuestFD *gf) { return arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", gf->hostfd); } +static uint32_t gdb_writefn(ARMCPU *cpu, GuestFD *gf, + target_ulong buf, uint32_t len) +{ + arm_semi_syscall_len = len; + return arm_gdb_syscall(cpu, arm_semi_cb, "write,%x,%x,%x", + gf->hostfd, buf, len); +} + typedef struct GuestFDFunctions { sys_closefn *closefn; + sys_writefn *writefn; } GuestFDFunctions; static const GuestFDFunctions guestfd_fns[] = { [GuestFDHost] = { .closefn = host_closefn, + .writefn = host_writefn, }, [GuestFDGDB] = { .closefn = gdb_closefn, + .writefn = gdb_writefn, }, }; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return set_swi_errno(env, -1); } - if (use_gdb_syscalls()) { - arm_semi_syscall_len = len; - return arm_gdb_syscall(cpu, arm_semi_cb, "write,%x,%x,%x", - gf->hostfd, arg1, len); - } else { - s = lock_user(VERIFY_READ, arg1, len, 1); - if (!s) { - /* Return bytes not written on error */ - return len; - } - ret = set_swi_errno(env, write(gf->hostfd, s, len)); - unlock_user(s, arg1, 0); - if (ret == (uint32_t)-1) { - ret = 0; - } - /* Return bytes not written */ - return len - ret; - } + return guestfd_fns[gf->type].writefn(cpu, gf, arg1, len); case TARGET_SYS_READ: GET_ARG(0); GET_ARG(1); -- 2.20.1
Factor out the implementation of SYS_READ via the new function tables. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190916141544.17540-10-peter.maydell@linaro.org --- target/arm/arm-semi.c | 55 +++++++++++++++++++++++++++---------------- 1 file changed, 35 insertions(+), 20 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static target_ulong arm_gdb_syscall(ARMCPU *cpu, gdb_syscall_complete_cb cb, typedef uint32_t sys_closefn(ARMCPU *cpu, GuestFD *gf); typedef uint32_t sys_writefn(ARMCPU *cpu, GuestFD *gf, target_ulong buf, uint32_t len); +typedef uint32_t sys_readfn(ARMCPU *cpu, GuestFD *gf, + target_ulong buf, uint32_t len); static uint32_t host_closefn(ARMCPU *cpu, GuestFD *gf) { @@ -XXX,XX +XXX,XX @@ static uint32_t host_writefn(ARMCPU *cpu, GuestFD *gf, return len - ret; } +static uint32_t host_readfn(ARMCPU *cpu, GuestFD *gf, + target_ulong buf, uint32_t len) +{ + uint32_t ret; + CPUARMState *env = &cpu->env; + char *s = lock_user(VERIFY_WRITE, buf, len, 0); + if (!s) { + /* return bytes not read */ + return len; + } + do { + ret = set_swi_errno(env, read(gf->hostfd, s, len)); + } while (ret == -1 && errno == EINTR); + unlock_user(s, buf, len); + if (ret == (uint32_t)-1) { + ret = 0; + } + /* Return bytes not read */ + return len - ret; +} + static uint32_t gdb_closefn(ARMCPU *cpu, GuestFD *gf) { return arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", gf->hostfd); @@ -XXX,XX +XXX,XX @@ static uint32_t gdb_writefn(ARMCPU *cpu, GuestFD *gf, gf->hostfd, buf, len); } +static uint32_t gdb_readfn(ARMCPU *cpu, GuestFD *gf, + target_ulong buf, uint32_t len) +{ + arm_semi_syscall_len = len; + return arm_gdb_syscall(cpu, arm_semi_cb, "read,%x,%x,%x", + gf->hostfd, buf, len); +} + typedef struct GuestFDFunctions { sys_closefn *closefn; sys_writefn *writefn; + sys_readfn *readfn; } GuestFDFunctions; static const GuestFDFunctions guestfd_fns[] = { [GuestFDHost] = { .closefn = host_closefn, .writefn = host_writefn, + .readfn = host_readfn, }, [GuestFDGDB] = { .closefn = gdb_closefn, .writefn = gdb_writefn, + .readfn = gdb_readfn, }, }; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return set_swi_errno(env, -1); } - if (use_gdb_syscalls()) { - arm_semi_syscall_len = len; - return arm_gdb_syscall(cpu, arm_semi_cb, "read,%x,%x,%x", - gf->hostfd, arg1, len); - } else { - s = lock_user(VERIFY_WRITE, arg1, len, 0); - if (!s) { - /* return bytes not read */ - return len; - } - do { - ret = set_swi_errno(env, read(gf->hostfd, s, len)); - } while (ret == -1 && errno == EINTR); - unlock_user(s, arg1, len); - if (ret == (uint32_t)-1) { - ret = 0; - } - /* Return bytes not read */ - return len - ret; - } + return guestfd_fns[gf->type].readfn(cpu, gf, arg1, len); case TARGET_SYS_READC: qemu_log_mask(LOG_UNIMP, "%s: SYS_READC not implemented", __func__); return 0; -- 2.20.1
Factor out the implementation of SYS_ISTTY via the new function tables. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190916141544.17540-11-peter.maydell@linaro.org --- target/arm/arm-semi.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ typedef uint32_t sys_writefn(ARMCPU *cpu, GuestFD *gf, target_ulong buf, uint32_t len); typedef uint32_t sys_readfn(ARMCPU *cpu, GuestFD *gf, target_ulong buf, uint32_t len); +typedef uint32_t sys_isattyfn(ARMCPU *cpu, GuestFD *gf); static uint32_t host_closefn(ARMCPU *cpu, GuestFD *gf) { @@ -XXX,XX +XXX,XX @@ static uint32_t host_readfn(ARMCPU *cpu, GuestFD *gf, return len - ret; } +static uint32_t host_isattyfn(ARMCPU *cpu, GuestFD *gf) +{ + return isatty(gf->hostfd); +} + static uint32_t gdb_closefn(ARMCPU *cpu, GuestFD *gf) { return arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", gf->hostfd); @@ -XXX,XX +XXX,XX @@ static uint32_t gdb_readfn(ARMCPU *cpu, GuestFD *gf, gf->hostfd, buf, len); } +static uint32_t gdb_isattyfn(ARMCPU *cpu, GuestFD *gf) +{ + return arm_gdb_syscall(cpu, arm_semi_cb, "isatty,%x", gf->hostfd); +} + typedef struct GuestFDFunctions { sys_closefn *closefn; sys_writefn *writefn; sys_readfn *readfn; + sys_isattyfn *isattyfn; } GuestFDFunctions; static const GuestFDFunctions guestfd_fns[] = { @@ -XXX,XX +XXX,XX @@ static const GuestFDFunctions guestfd_fns[] = { .closefn = host_closefn, .writefn = host_writefn, .readfn = host_readfn, + .isattyfn = host_isattyfn, }, [GuestFDGDB] = { .closefn = gdb_closefn, .writefn = gdb_writefn, .readfn = gdb_readfn, + .isattyfn = gdb_isattyfn, }, }; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return set_swi_errno(env, -1); } - if (use_gdb_syscalls()) { - return arm_gdb_syscall(cpu, arm_semi_cb, "isatty,%x", gf->hostfd); - } else { - return isatty(gf->hostfd); - } + return guestfd_fns[gf->type].isattyfn(cpu, gf); case TARGET_SYS_SEEK: GET_ARG(0); GET_ARG(1); -- 2.20.1
Factor out the implementation of SYS_SEEK via the new function tables. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190916141544.17540-12-peter.maydell@linaro.org --- target/arm/arm-semi.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ typedef uint32_t sys_writefn(ARMCPU *cpu, GuestFD *gf, typedef uint32_t sys_readfn(ARMCPU *cpu, GuestFD *gf, target_ulong buf, uint32_t len); typedef uint32_t sys_isattyfn(ARMCPU *cpu, GuestFD *gf); +typedef uint32_t sys_seekfn(ARMCPU *cpu, GuestFD *gf, + target_ulong offset); static uint32_t host_closefn(ARMCPU *cpu, GuestFD *gf) { @@ -XXX,XX +XXX,XX @@ static uint32_t host_isattyfn(ARMCPU *cpu, GuestFD *gf) return isatty(gf->hostfd); } +static uint32_t host_seekfn(ARMCPU *cpu, GuestFD *gf, target_ulong offset) +{ + CPUARMState *env = &cpu->env; + uint32_t ret = set_swi_errno(env, lseek(gf->hostfd, offset, SEEK_SET)); + if (ret == (uint32_t)-1) { + return -1; + } + return 0; +} + static uint32_t gdb_closefn(ARMCPU *cpu, GuestFD *gf) { return arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", gf->hostfd); @@ -XXX,XX +XXX,XX @@ static uint32_t gdb_isattyfn(ARMCPU *cpu, GuestFD *gf) return arm_gdb_syscall(cpu, arm_semi_cb, "isatty,%x", gf->hostfd); } +static uint32_t gdb_seekfn(ARMCPU *cpu, GuestFD *gf, target_ulong offset) +{ + return arm_gdb_syscall(cpu, arm_semi_cb, "lseek,%x,%x,0", + gf->hostfd, offset); +} + typedef struct GuestFDFunctions { sys_closefn *closefn; sys_writefn *writefn; sys_readfn *readfn; sys_isattyfn *isattyfn; + sys_seekfn *seekfn; } GuestFDFunctions; static const GuestFDFunctions guestfd_fns[] = { @@ -XXX,XX +XXX,XX @@ static const GuestFDFunctions guestfd_fns[] = { .writefn = host_writefn, .readfn = host_readfn, .isattyfn = host_isattyfn, + .seekfn = host_seekfn, }, [GuestFDGDB] = { .closefn = gdb_closefn, .writefn = gdb_writefn, .readfn = gdb_readfn, .isattyfn = gdb_isattyfn, + .seekfn = gdb_seekfn, }, }; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return set_swi_errno(env, -1); } - if (use_gdb_syscalls()) { - return arm_gdb_syscall(cpu, arm_semi_cb, "lseek,%x,%x,0", - gf->hostfd, arg1); - } else { - ret = set_swi_errno(env, lseek(gf->hostfd, arg1, SEEK_SET)); - if (ret == (uint32_t)-1) - return -1; - return 0; - } + return guestfd_fns[gf->type].seekfn(cpu, gf, arg1); case TARGET_SYS_FLEN: GET_ARG(0); -- 2.20.1
Factor out the implementation of SYS_FLEN via the new function tables. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190916141544.17540-13-peter.maydell@linaro.org --- target/arm/arm-semi.c | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ typedef uint32_t sys_readfn(ARMCPU *cpu, GuestFD *gf, typedef uint32_t sys_isattyfn(ARMCPU *cpu, GuestFD *gf); typedef uint32_t sys_seekfn(ARMCPU *cpu, GuestFD *gf, target_ulong offset); +typedef uint32_t sys_flenfn(ARMCPU *cpu, GuestFD *gf); static uint32_t host_closefn(ARMCPU *cpu, GuestFD *gf) { @@ -XXX,XX +XXX,XX @@ static uint32_t host_seekfn(ARMCPU *cpu, GuestFD *gf, target_ulong offset) return 0; } +static uint32_t host_flenfn(ARMCPU *cpu, GuestFD *gf) +{ + CPUARMState *env = &cpu->env; + struct stat buf; + uint32_t ret = set_swi_errno(env, fstat(gf->hostfd, &buf)); + if (ret == (uint32_t)-1) { + return -1; + } + return buf.st_size; +} + static uint32_t gdb_closefn(ARMCPU *cpu, GuestFD *gf) { return arm_gdb_syscall(cpu, arm_semi_cb, "close,%x", gf->hostfd); @@ -XXX,XX +XXX,XX @@ static uint32_t gdb_seekfn(ARMCPU *cpu, GuestFD *gf, target_ulong offset) gf->hostfd, offset); } +static uint32_t gdb_flenfn(ARMCPU *cpu, GuestFD *gf) +{ + return arm_gdb_syscall(cpu, arm_semi_flen_cb, "fstat,%x,%x", + gf->hostfd, arm_flen_buf(cpu)); +} + typedef struct GuestFDFunctions { sys_closefn *closefn; sys_writefn *writefn; sys_readfn *readfn; sys_isattyfn *isattyfn; sys_seekfn *seekfn; + sys_flenfn *flenfn; } GuestFDFunctions; static const GuestFDFunctions guestfd_fns[] = { @@ -XXX,XX +XXX,XX @@ static const GuestFDFunctions guestfd_fns[] = { .readfn = host_readfn, .isattyfn = host_isattyfn, .seekfn = host_seekfn, + .flenfn = host_flenfn, }, [GuestFDGDB] = { .closefn = gdb_closefn, @@ -XXX,XX +XXX,XX @@ static const GuestFDFunctions guestfd_fns[] = { .readfn = gdb_readfn, .isattyfn = gdb_isattyfn, .seekfn = gdb_seekfn, + .flenfn = gdb_flenfn, }, }; @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return set_swi_errno(env, -1); } - if (use_gdb_syscalls()) { - return arm_gdb_syscall(cpu, arm_semi_flen_cb, "fstat,%x,%x", - gf->hostfd, arm_flen_buf(cpu)); - } else { - struct stat buf; - ret = set_swi_errno(env, fstat(gf->hostfd, &buf)); - if (ret == (uint32_t)-1) - return -1; - return buf.st_size; - } + return guestfd_fns[gf->type].flenfn(cpu, gf); case TARGET_SYS_TMPNAM: qemu_log_mask(LOG_UNIMP, "%s: SYS_TMPNAM not implemented", __func__); return -1; -- 2.20.1
Version 2.0 of the semihosting specification added support for allowing a guest to detect whether the implementation supported particular features. This works by the guest opening a magic file ":semihosting-features", which contains a fixed set of data with some magic numbers followed by a sequence of bytes with feature flags. The file is expected to behave sensibly for the various semihosting calls which operate on files (SYS_FLEN, SYS_SEEK, etc). Implement this as another kind of guest FD using our function table dispatch mechanism. Initially we report no extended features, so we have just one feature flag byte which is zero. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20190916141544.17540-14-peter.maydell@linaro.org --- target/arm/arm-semi.c | 109 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 108 insertions(+), 1 deletion(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ typedef enum GuestFDType { GuestFDUnused = 0, GuestFDHost = 1, GuestFDGDB = 2, + GuestFDFeatureFile = 3, } GuestFDType; /* @@ -XXX,XX +XXX,XX @@ typedef enum GuestFDType { */ typedef struct GuestFD { GuestFDType type; - int hostfd; + union { + int hostfd; + target_ulong featurefile_offset; + }; } GuestFD; static GArray *guestfd_array; @@ -XXX,XX +XXX,XX @@ static uint32_t gdb_flenfn(ARMCPU *cpu, GuestFD *gf) gf->hostfd, arm_flen_buf(cpu)); } +#define SHFB_MAGIC_0 0x53 +#define SHFB_MAGIC_1 0x48 +#define SHFB_MAGIC_2 0x46 +#define SHFB_MAGIC_3 0x42 + +static const uint8_t featurefile_data[] = { + SHFB_MAGIC_0, + SHFB_MAGIC_1, + SHFB_MAGIC_2, + SHFB_MAGIC_3, + 0, /* Feature byte 0 */ +}; + +static void init_featurefile_guestfd(int guestfd) +{ + GuestFD *gf = do_get_guestfd(guestfd); + + assert(gf); + gf->type = GuestFDFeatureFile; + gf->featurefile_offset = 0; +} + +static uint32_t featurefile_closefn(ARMCPU *cpu, GuestFD *gf) +{ + /* Nothing to do */ + return 0; +} + +static uint32_t featurefile_writefn(ARMCPU *cpu, GuestFD *gf, + target_ulong buf, uint32_t len) +{ + /* This fd can never be open for writing */ + CPUARMState *env = &cpu->env; + + errno = EBADF; + return set_swi_errno(env, -1); +} + +static uint32_t featurefile_readfn(ARMCPU *cpu, GuestFD *gf, + target_ulong buf, uint32_t len) +{ + uint32_t i; +#ifndef CONFIG_USER_ONLY + CPUARMState *env = &cpu->env; +#endif + char *s; + + s = lock_user(VERIFY_WRITE, buf, len, 0); + if (!s) { + return len; + } + + for (i = 0; i < len; i++) { + if (gf->featurefile_offset >= sizeof(featurefile_data)) { + break; + } + s[i] = featurefile_data[gf->featurefile_offset]; + gf->featurefile_offset++; + } + + unlock_user(s, buf, len); + + /* Return number of bytes not read */ + return len - i; +} + +static uint32_t featurefile_isattyfn(ARMCPU *cpu, GuestFD *gf) +{ + return 0; +} + +static uint32_t featurefile_seekfn(ARMCPU *cpu, GuestFD *gf, + target_ulong offset) +{ + gf->featurefile_offset = offset; + return 0; +} + +static uint32_t featurefile_flenfn(ARMCPU *cpu, GuestFD *gf) +{ + return sizeof(featurefile_data); +} + typedef struct GuestFDFunctions { sys_closefn *closefn; sys_writefn *writefn; @@ -XXX,XX +XXX,XX @@ static const GuestFDFunctions guestfd_fns[] = { .seekfn = gdb_seekfn, .flenfn = gdb_flenfn, }, + [GuestFDFeatureFile] = { + .closefn = featurefile_closefn, + .writefn = featurefile_writefn, + .readfn = featurefile_readfn, + .isattyfn = featurefile_isattyfn, + .seekfn = featurefile_seekfn, + .flenfn = featurefile_flenfn, + }, }; /* Read the input value from the argument block; fail the semihosting @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) unlock_user(s, arg0, 0); return guestfd; } + if (strcmp(s, ":semihosting-features") == 0) { + unlock_user(s, arg0, 0); + /* We must fail opens for modes other than 0 ('r') or 1 ('rb') */ + if (arg1 != 0 && arg1 != 1) { + dealloc_guestfd(guestfd); + errno = EACCES; + return set_swi_errno(env, -1); + } + init_featurefile_guestfd(guestfd); + return guestfd; + } + if (use_gdb_syscalls()) { arm_semi_open_guestfd = guestfd; ret = arm_gdb_syscall(cpu, arm_semi_open_cb, "open,%s,%x,1a4", arg0, -- 2.20.1
SH_EXT_EXIT_EXTENDED is a v2.0 semihosting extension: it indicates that the implementation supports the SYS_EXIT_EXTENDED function. This function allows both A64 and A32/T32 guests to exit with a specified exit status, unlike the older SYS_EXIT function which only allowed this for A64 guests. Implement this extension. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20190916141544.17540-15-peter.maydell@linaro.org --- target/arm/arm-semi.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ #define TARGET_SYS_HEAPINFO 0x16 #define TARGET_SYS_EXIT 0x18 #define TARGET_SYS_SYNCCACHE 0x19 +#define TARGET_SYS_EXIT_EXTENDED 0x20 /* ADP_Stopped_ApplicationExit is used for exit(0), * anything else is implemented as exit(1) */ @@ -XXX,XX +XXX,XX @@ static uint32_t gdb_flenfn(ARMCPU *cpu, GuestFD *gf) #define SHFB_MAGIC_2 0x46 #define SHFB_MAGIC_3 0x42 +/* Feature bits reportable in feature byte 0 */ +#define SH_EXT_EXIT_EXTENDED (1 << 0) + static const uint8_t featurefile_data[] = { SHFB_MAGIC_0, SHFB_MAGIC_1, SHFB_MAGIC_2, SHFB_MAGIC_3, - 0, /* Feature byte 0 */ + SH_EXT_EXIT_EXTENDED, /* Feature byte 0 */ }; static void init_featurefile_guestfd(int guestfd) @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) return 0; } case TARGET_SYS_EXIT: - if (is_a64(env)) { + case TARGET_SYS_EXIT_EXTENDED: + if (nr == TARGET_SYS_EXIT_EXTENDED || is_a64(env)) { /* - * The A64 version of this call takes a parameter block, + * The A64 version of SYS_EXIT takes a parameter block, * so the application-exit type can return a subcode which * is the exit status code from the application. + * SYS_EXIT_EXTENDED is an a new-in-v2.0 optional function + * which allows A32/T32 guests to also provide a status code. */ GET_ARG(0); GET_ARG(1); @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) } } else { /* - * ARM specifies only Stopped_ApplicationExit as normal - * exit, everything else is considered an error + * The A32/T32 version of SYS_EXIT specifies only + * Stopped_ApplicationExit as normal exit, but does not + * allow the guest to specify the exit status code. + * Everything else is considered an error. */ ret = (args == ADP_Stopped_ApplicationExit) ? 0 : 1; } -- 2.20.1
SH_EXT_STDOUT_STDERR is a v2.0 semihosting extension: the guest can open ":tt" with a file mode requesting append access in order to open stderr, in addition to the existing "open for read for stdin or write for stdout". Implement this and report it via the :semihosting-features data. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20190916141544.17540-16-peter.maydell@linaro.org --- target/arm/arm-semi.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/target/arm/arm-semi.c b/target/arm/arm-semi.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/arm-semi.c +++ b/target/arm/arm-semi.c @@ -XXX,XX +XXX,XX @@ static uint32_t gdb_flenfn(ARMCPU *cpu, GuestFD *gf) /* Feature bits reportable in feature byte 0 */ #define SH_EXT_EXIT_EXTENDED (1 << 0) +#define SH_EXT_STDOUT_STDERR (1 << 1) static const uint8_t featurefile_data[] = { SHFB_MAGIC_0, SHFB_MAGIC_1, SHFB_MAGIC_2, SHFB_MAGIC_3, - SH_EXT_EXIT_EXTENDED, /* Feature byte 0 */ + SH_EXT_EXIT_EXTENDED | SH_EXT_STDOUT_STDERR, /* Feature byte 0 */ }; static void init_featurefile_guestfd(int guestfd) @@ -XXX,XX +XXX,XX @@ target_ulong do_arm_semihosting(CPUARMState *env) } if (strcmp(s, ":tt") == 0) { - int result_fileno = arg1 < 4 ? STDIN_FILENO : STDOUT_FILENO; + int result_fileno; + + /* + * We implement SH_EXT_STDOUT_STDERR, so: + * open for read == stdin + * open for write == stdout + * open for append == stderr + */ + if (arg1 < 4) { + result_fileno = STDIN_FILENO; + } else if (arg1 < 8) { + result_fileno = STDOUT_FILENO; + } else { + result_fileno = STDERR_FILENO; + } associate_guestfd(guestfd, result_fileno); unlock_user(s, arg0, 0); return guestfd; -- 2.20.1
From: Amithash Prasad <amithash@fb.com> When WDT_RESTART is written, the data is not the contents of the WDT_CTRL register. Hence ensure we are looking at WDT_CTRL to check if bit WDT_CTRL_1MHZ_CLK is set or not. Signed-off-by: Amithash Prasad <amithash@fb.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-id: 20190925143248.10000-2-clg@kaod.org [clg: improved Suject prefix ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/watchdog/wdt_aspeed.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c index XXXXXXX..XXXXXXX 100644 --- a/hw/watchdog/wdt_aspeed.c +++ b/hw/watchdog/wdt_aspeed.c @@ -XXX,XX +XXX,XX @@ static void aspeed_wdt_write(void *opaque, hwaddr offset, uint64_t data, case WDT_RESTART: if ((data & 0xFFFF) == WDT_RESTART_MAGIC) { s->regs[WDT_STATUS] = s->regs[WDT_RELOAD_VALUE]; - aspeed_wdt_reload(s, !(data & WDT_CTRL_1MHZ_CLK)); + aspeed_wdt_reload(s, !(s->regs[WDT_CTRL] & WDT_CTRL_1MHZ_CLK)); } break; case WDT_CTRL: -- 2.20.1
From: Eddie James <eajames@linux.ibm.com> The Aspeed SOCs have two SD/MMC controllers. Add a device that encapsulates both of these controllers and models the Aspeed-specific registers and behavior. Tested by reading from mmcblk0 in Linux: qemu-system-arm -machine romulus-bmc -nographic \ -drive file=flash-romulus,format=raw,if=mtd \ -device sd-card,drive=sd0 -drive file=_tmp/kernel,format=raw,if=sd,id=sd0 Signed-off-by: Eddie James <eajames@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-id: 20190925143248.10000-3-clg@kaod.org [clg: - changed the controller MMIO window size to 0x1000 - moved the MMIO mapping of the SDHCI slots at the SoC level - merged code to add SD drives on the SD buses at the machine level ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/sd/Makefile.objs | 1 + include/hw/arm/aspeed_soc.h | 3 + include/hw/sd/aspeed_sdhci.h | 34 ++++++ hw/arm/aspeed.c | 15 ++- hw/arm/aspeed_soc.c | 23 ++++ hw/sd/aspeed_sdhci.c | 198 +++++++++++++++++++++++++++++++++++ 6 files changed, 273 insertions(+), 1 deletion(-) create mode 100644 include/hw/sd/aspeed_sdhci.h create mode 100644 hw/sd/aspeed_sdhci.c diff --git a/hw/sd/Makefile.objs b/hw/sd/Makefile.objs index XXXXXXX..XXXXXXX 100644 --- a/hw/sd/Makefile.objs +++ b/hw/sd/Makefile.objs @@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_MILKYMIST) += milkymist-memcard.o obj-$(CONFIG_OMAP) += omap_mmc.o obj-$(CONFIG_PXA2XX) += pxa2xx_mmci.o obj-$(CONFIG_RASPI) += bcm2835_sdhost.o +obj-$(CONFIG_ASPEED_SOC) += aspeed_sdhci.o diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/aspeed_soc.h +++ b/include/hw/arm/aspeed_soc.h @@ -XXX,XX +XXX,XX @@ #include "hw/net/ftgmac100.h" #include "target/arm/cpu.h" #include "hw/gpio/aspeed_gpio.h" +#include "hw/sd/aspeed_sdhci.h" #define ASPEED_SPIS_NUM 2 #define ASPEED_WDTS_NUM 3 @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSoCState { AspeedWDTState wdt[ASPEED_WDTS_NUM]; FTGMAC100State ftgmac100[ASPEED_MACS_NUM]; AspeedGPIOState gpio; + AspeedSDHCIState sdhci; } AspeedSoCState; #define TYPE_ASPEED_SOC "aspeed-soc" @@ -XXX,XX +XXX,XX @@ enum { ASPEED_SCU, ASPEED_ADC, ASPEED_SRAM, + ASPEED_SDHCI, ASPEED_GPIO, ASPEED_RTC, ASPEED_TIMER1, diff --git a/include/hw/sd/aspeed_sdhci.h b/include/hw/sd/aspeed_sdhci.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/include/hw/sd/aspeed_sdhci.h @@ -XXX,XX +XXX,XX @@ +/* + * Aspeed SD Host Controller + * Eddie James <eajames@linux.ibm.com> + * + * Copyright (C) 2019 IBM Corp + * SPDX-License-Identifer: GPL-2.0-or-later + */ + +#ifndef ASPEED_SDHCI_H +#define ASPEED_SDHCI_H + +#include "hw/sd/sdhci.h" + +#define TYPE_ASPEED_SDHCI "aspeed.sdhci" +#define ASPEED_SDHCI(obj) OBJECT_CHECK(AspeedSDHCIState, (obj), \ + TYPE_ASPEED_SDHCI) + +#define ASPEED_SDHCI_CAPABILITIES 0x01E80080 +#define ASPEED_SDHCI_NUM_SLOTS 2 +#define ASPEED_SDHCI_NUM_REGS (ASPEED_SDHCI_REG_SIZE / sizeof(uint32_t)) +#define ASPEED_SDHCI_REG_SIZE 0x100 + +typedef struct AspeedSDHCIState { + SysBusDevice parent; + + SDHCIState slots[ASPEED_SDHCI_NUM_SLOTS]; + + MemoryRegion iomem; + qemu_irq irq; + + uint32_t regs[ASPEED_SDHCI_NUM_REGS]; +} AspeedSDHCIState; + +#endif /* ASPEED_SDHCI_H */ diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -XXX,XX +XXX,XX @@ static void aspeed_board_init(MachineState *machine, AspeedSoCClass *sc; DriveInfo *drive0 = drive_get(IF_MTD, 0, 0); ram_addr_t max_ram_size; + int i; bmc = g_new0(AspeedBoardState, 1); @@ -XXX,XX +XXX,XX @@ static void aspeed_board_init(MachineState *machine, cfg->i2c_init(bmc); } + for (i = 0; i < ARRAY_SIZE(bmc->soc.sdhci.slots); i++) { + SDHCIState *sdhci = &bmc->soc.sdhci.slots[i]; + DriveInfo *dinfo = drive_get_next(IF_SD); + BlockBackend *blk; + DeviceState *card; + + blk = dinfo ? blk_by_legacy_dinfo(dinfo) : NULL; + card = qdev_create(qdev_get_child_bus(DEVICE(sdhci), "sd-bus"), + TYPE_SD_CARD); + qdev_prop_set_drive(card, "drive", blk, &error_fatal); + object_property_set_bool(OBJECT(card), true, "realized", &error_fatal); + } + arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo); } @@ -XXX,XX +XXX,XX @@ static void aspeed_machine_class_init(ObjectClass *oc, void *data) mc->desc = board->desc; mc->init = aspeed_machine_init; mc->max_cpus = ASPEED_CPUS_NUM; - mc->no_sdcard = 1; mc->no_floppy = 1; mc->no_cdrom = 1; mc->no_parallel = 1; diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_soc.c +++ b/hw/arm/aspeed_soc.c @@ -XXX,XX +XXX,XX @@ static const hwaddr aspeed_soc_ast2400_memmap[] = { [ASPEED_XDMA] = 0x1E6E7000, [ASPEED_ADC] = 0x1E6E9000, [ASPEED_SRAM] = 0x1E720000, + [ASPEED_SDHCI] = 0x1E740000, [ASPEED_GPIO] = 0x1E780000, [ASPEED_RTC] = 0x1E781000, [ASPEED_TIMER1] = 0x1E782000, @@ -XXX,XX +XXX,XX @@ static const hwaddr aspeed_soc_ast2500_memmap[] = { [ASPEED_XDMA] = 0x1E6E7000, [ASPEED_ADC] = 0x1E6E9000, [ASPEED_SRAM] = 0x1E720000, + [ASPEED_SDHCI] = 0x1E740000, [ASPEED_GPIO] = 0x1E780000, [ASPEED_RTC] = 0x1E781000, [ASPEED_TIMER1] = 0x1E782000, @@ -XXX,XX +XXX,XX @@ static const int aspeed_soc_ast2400_irqmap[] = { [ASPEED_ETH1] = 2, [ASPEED_ETH2] = 3, [ASPEED_XDMA] = 6, + [ASPEED_SDHCI] = 26, }; #define aspeed_soc_ast2500_irqmap aspeed_soc_ast2400_irqmap @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) snprintf(typename, sizeof(typename), "aspeed.gpio-%s", socname); sysbus_init_child_obj(obj, "gpio", OBJECT(&s->gpio), sizeof(s->gpio), typename); + + sysbus_init_child_obj(obj, "sdc", OBJECT(&s->sdhci), sizeof(s->sdhci), + TYPE_ASPEED_SDHCI); + + /* Init sd card slot class here so that they're under the correct parent */ + for (i = 0; i < ASPEED_SDHCI_NUM_SLOTS; ++i) { + sysbus_init_child_obj(obj, "sdhci[*]", OBJECT(&s->sdhci.slots[i]), + sizeof(s->sdhci.slots[i]), TYPE_SYSBUS_SDHCI); + } } static void aspeed_soc_realize(DeviceState *dev, Error **errp) @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio), 0, sc->info->memmap[ASPEED_GPIO]); sysbus_connect_irq(SYS_BUS_DEVICE(&s->gpio), 0, aspeed_soc_get_irq(s, ASPEED_GPIO)); + + /* SDHCI */ + object_property_set_bool(OBJECT(&s->sdhci), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->sdhci), 0, + sc->info->memmap[ASPEED_SDHCI]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->sdhci), 0, + aspeed_soc_get_irq(s, ASPEED_SDHCI)); } static Property aspeed_soc_properties[] = { DEFINE_PROP_UINT32("num-cpus", AspeedSoCState, num_cpus, 0), diff --git a/hw/sd/aspeed_sdhci.c b/hw/sd/aspeed_sdhci.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/sd/aspeed_sdhci.c @@ -XXX,XX +XXX,XX @@ +/* + * Aspeed SD Host Controller + * Eddie James <eajames@linux.ibm.com> + * + * Copyright (C) 2019 IBM Corp + * SPDX-License-Identifer: GPL-2.0-or-later + */ + +#include "qemu/osdep.h" +#include "qemu/log.h" +#include "qemu/error-report.h" +#include "hw/sd/aspeed_sdhci.h" +#include "qapi/error.h" +#include "hw/irq.h" +#include "migration/vmstate.h" + +#define ASPEED_SDHCI_INFO 0x00 +#define ASPEED_SDHCI_INFO_RESET 0x00030000 +#define ASPEED_SDHCI_DEBOUNCE 0x04 +#define ASPEED_SDHCI_DEBOUNCE_RESET 0x00000005 +#define ASPEED_SDHCI_BUS 0x08 +#define ASPEED_SDHCI_SDIO_140 0x10 +#define ASPEED_SDHCI_SDIO_148 0x18 +#define ASPEED_SDHCI_SDIO_240 0x20 +#define ASPEED_SDHCI_SDIO_248 0x28 +#define ASPEED_SDHCI_WP_POL 0xec +#define ASPEED_SDHCI_CARD_DET 0xf0 +#define ASPEED_SDHCI_IRQ_STAT 0xfc + +#define TO_REG(addr) ((addr) / sizeof(uint32_t)) + +static uint64_t aspeed_sdhci_read(void *opaque, hwaddr addr, unsigned int size) +{ + uint32_t val = 0; + AspeedSDHCIState *sdhci = opaque; + + switch (addr) { + case ASPEED_SDHCI_SDIO_140: + val = (uint32_t)sdhci->slots[0].capareg; + break; + case ASPEED_SDHCI_SDIO_148: + val = (uint32_t)sdhci->slots[0].maxcurr; + break; + case ASPEED_SDHCI_SDIO_240: + val = (uint32_t)sdhci->slots[1].capareg; + break; + case ASPEED_SDHCI_SDIO_248: + val = (uint32_t)sdhci->slots[1].maxcurr; + break; + default: + if (addr < ASPEED_SDHCI_REG_SIZE) { + val = sdhci->regs[TO_REG(addr)]; + } else { + qemu_log_mask(LOG_GUEST_ERROR, + "%s: Out-of-bounds read at 0x%" HWADDR_PRIx "\n", + __func__, addr); + } + } + + return (uint64_t)val; +} + +static void aspeed_sdhci_write(void *opaque, hwaddr addr, uint64_t val, + unsigned int size) +{ + AspeedSDHCIState *sdhci = opaque; + + switch (addr) { + case ASPEED_SDHCI_SDIO_140: + sdhci->slots[0].capareg = (uint64_t)(uint32_t)val; + break; + case ASPEED_SDHCI_SDIO_148: + sdhci->slots[0].maxcurr = (uint64_t)(uint32_t)val; + break; + case ASPEED_SDHCI_SDIO_240: + sdhci->slots[1].capareg = (uint64_t)(uint32_t)val; + break; + case ASPEED_SDHCI_SDIO_248: + sdhci->slots[1].maxcurr = (uint64_t)(uint32_t)val; + break; + default: + if (addr < ASPEED_SDHCI_REG_SIZE) { + sdhci->regs[TO_REG(addr)] = (uint32_t)val; + } else { + qemu_log_mask(LOG_GUEST_ERROR, + "%s: Out-of-bounds write at 0x%" HWADDR_PRIx "\n", + __func__, addr); + } + } +} + +static const MemoryRegionOps aspeed_sdhci_ops = { + .read = aspeed_sdhci_read, + .write = aspeed_sdhci_write, + .endianness = DEVICE_NATIVE_ENDIAN, + .valid.min_access_size = 4, + .valid.max_access_size = 4, +}; + +static void aspeed_sdhci_set_irq(void *opaque, int n, int level) +{ + AspeedSDHCIState *sdhci = opaque; + + if (level) { + sdhci->regs[TO_REG(ASPEED_SDHCI_IRQ_STAT)] |= BIT(n); + + qemu_irq_raise(sdhci->irq); + } else { + sdhci->regs[TO_REG(ASPEED_SDHCI_IRQ_STAT)] &= ~BIT(n); + + qemu_irq_lower(sdhci->irq); + } +} + +static void aspeed_sdhci_realize(DeviceState *dev, Error **errp) +{ + Error *err = NULL; + SysBusDevice *sbd = SYS_BUS_DEVICE(dev); + AspeedSDHCIState *sdhci = ASPEED_SDHCI(dev); + + /* Create input irqs for the slots */ + qdev_init_gpio_in_named_with_opaque(DEVICE(sbd), aspeed_sdhci_set_irq, + sdhci, NULL, ASPEED_SDHCI_NUM_SLOTS); + + sysbus_init_irq(sbd, &sdhci->irq); + memory_region_init_io(&sdhci->iomem, OBJECT(sdhci), &aspeed_sdhci_ops, + sdhci, TYPE_ASPEED_SDHCI, 0x1000); + sysbus_init_mmio(sbd, &sdhci->iomem); + + for (int i = 0; i < ASPEED_SDHCI_NUM_SLOTS; ++i) { + Object *sdhci_slot = OBJECT(&sdhci->slots[i]); + SysBusDevice *sbd_slot = SYS_BUS_DEVICE(&sdhci->slots[i]); + + object_property_set_int(sdhci_slot, 2, "sd-spec-version", &err); + if (err) { + error_propagate(errp, err); + return; + } + + object_property_set_uint(sdhci_slot, ASPEED_SDHCI_CAPABILITIES, + "capareg", &err); + if (err) { + error_propagate(errp, err); + return; + } + + object_property_set_bool(sdhci_slot, true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + + sysbus_connect_irq(sbd_slot, 0, qdev_get_gpio_in(DEVICE(sbd), i)); + memory_region_add_subregion(&sdhci->iomem, (i + 1) * 0x100, + &sdhci->slots[i].iomem); + } +} + +static void aspeed_sdhci_reset(DeviceState *dev) +{ + AspeedSDHCIState *sdhci = ASPEED_SDHCI(dev); + + memset(sdhci->regs, 0, ASPEED_SDHCI_REG_SIZE); + sdhci->regs[TO_REG(ASPEED_SDHCI_INFO)] = ASPEED_SDHCI_INFO_RESET; + sdhci->regs[TO_REG(ASPEED_SDHCI_DEBOUNCE)] = ASPEED_SDHCI_DEBOUNCE_RESET; +} + +static const VMStateDescription vmstate_aspeed_sdhci = { + .name = TYPE_ASPEED_SDHCI, + .version_id = 1, + .fields = (VMStateField[]) { + VMSTATE_UINT32_ARRAY(regs, AspeedSDHCIState, ASPEED_SDHCI_NUM_REGS), + VMSTATE_END_OF_LIST(), + }, +}; + +static void aspeed_sdhci_class_init(ObjectClass *classp, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(classp); + + dc->realize = aspeed_sdhci_realize; + dc->reset = aspeed_sdhci_reset; + dc->vmsd = &vmstate_aspeed_sdhci; +} + +static TypeInfo aspeed_sdhci_info = { + .name = TYPE_ASPEED_SDHCI, + .parent = TYPE_SYS_BUS_DEVICE, + .instance_size = sizeof(AspeedSDHCIState), + .class_init = aspeed_sdhci_class_init, +}; + +static void aspeed_sdhci_register_types(void) +{ + type_register_static(&aspeed_sdhci_info); +} + +type_init(aspeed_sdhci_register_types) -- 2.20.1
From: Joel Stanley <joel@jms.id.au> The SCU controller on the AST2600 SoC has extra registers. Increase the number of regs of the model and introduce a new field in the class to customize the MemoryRegion operations depending on the SoC model. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-id: 20190925143248.10000-4-clg@kaod.org [clg: - improved commit log - changed vmstate version - reworked model integration into new object class - included AST2600_HPLL_PARAM value ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/misc/aspeed_scu.h | 7 +- hw/misc/aspeed_scu.c | 192 +++++++++++++++++++++++++++++++++-- 2 files changed, 191 insertions(+), 8 deletions(-) diff --git a/include/hw/misc/aspeed_scu.h b/include/hw/misc/aspeed_scu.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/misc/aspeed_scu.h +++ b/include/hw/misc/aspeed_scu.h @@ -XXX,XX +XXX,XX @@ #define ASPEED_SCU(obj) OBJECT_CHECK(AspeedSCUState, (obj), TYPE_ASPEED_SCU) #define TYPE_ASPEED_2400_SCU TYPE_ASPEED_SCU "-ast2400" #define TYPE_ASPEED_2500_SCU TYPE_ASPEED_SCU "-ast2500" +#define TYPE_ASPEED_2600_SCU TYPE_ASPEED_SCU "-ast2600" #define ASPEED_SCU_NR_REGS (0x1A8 >> 2) +#define ASPEED_AST2600_SCU_NR_REGS (0xE20 >> 2) typedef struct AspeedSCUState { /*< private >*/ @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSCUState { /*< public >*/ MemoryRegion iomem; - uint32_t regs[ASPEED_SCU_NR_REGS]; + uint32_t regs[ASPEED_AST2600_SCU_NR_REGS]; uint32_t silicon_rev; uint32_t hw_strap1; uint32_t hw_strap2; @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSCUState { #define AST2400_A1_SILICON_REV 0x02010303U #define AST2500_A0_SILICON_REV 0x04000303U #define AST2500_A1_SILICON_REV 0x04010303U +#define AST2600_A0_SILICON_REV 0x05000303U #define ASPEED_IS_AST2500(si_rev) ((((si_rev) >> 24) & 0xff) == 0x04) @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSCUClass { const uint32_t *resets; uint32_t (*calc_hpll)(AspeedSCUState *s, uint32_t hpll_reg); uint32_t apb_divider; + uint32_t nr_regs; + const MemoryRegionOps *ops; } AspeedSCUClass; #define ASPEED_SCU_PROT_KEY 0x1688A8A8 diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/aspeed_scu.c +++ b/hw/misc/aspeed_scu.c @@ -XXX,XX +XXX,XX @@ #define BMC_REV TO_REG(0x19C) #define BMC_DEV_ID TO_REG(0x1A4) +#define AST2600_PROT_KEY TO_REG(0x00) +#define AST2600_SILICON_REV TO_REG(0x04) +#define AST2600_SILICON_REV2 TO_REG(0x14) +#define AST2600_SYS_RST_CTRL TO_REG(0x40) +#define AST2600_SYS_RST_CTRL_CLR TO_REG(0x44) +#define AST2600_SYS_RST_CTRL2 TO_REG(0x50) +#define AST2600_SYS_RST_CTRL2_CLR TO_REG(0x54) +#define AST2600_CLK_STOP_CTRL TO_REG(0x80) +#define AST2600_CLK_STOP_CTRL_CLR TO_REG(0x84) +#define AST2600_CLK_STOP_CTRL2 TO_REG(0x90) +#define AST2600_CLK_STOP_CTR2L_CLR TO_REG(0x94) +#define AST2600_HPLL_PARAM TO_REG(0x200) +#define AST2600_HPLL_EXT TO_REG(0x204) +#define AST2600_MPLL_EXT TO_REG(0x224) +#define AST2600_EPLL_EXT TO_REG(0x244) +#define AST2600_CLK_SEL TO_REG(0x300) +#define AST2600_CLK_SEL2 TO_REG(0x304) +#define AST2600_CLK_SEL3 TO_REG(0x310) +#define AST2600_HW_STRAP1 TO_REG(0x500) +#define AST2600_HW_STRAP1_CLR TO_REG(0x504) +#define AST2600_HW_STRAP1_PROT TO_REG(0x508) +#define AST2600_HW_STRAP2 TO_REG(0x510) +#define AST2600_HW_STRAP2_CLR TO_REG(0x514) +#define AST2600_HW_STRAP2_PROT TO_REG(0x518) +#define AST2600_RNG_CTRL TO_REG(0x524) +#define AST2600_RNG_DATA TO_REG(0x540) + +#define AST2600_CLK TO_REG(0x40) + #define SCU_IO_REGION_SIZE 0x1000 static const uint32_t ast2400_a0_resets[ASPEED_SCU_NR_REGS] = { @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_scu_read(void *opaque, hwaddr offset, unsigned size) AspeedSCUState *s = ASPEED_SCU(opaque); int reg = TO_REG(offset); - if (reg >= ARRAY_SIZE(s->regs)) { + if (reg >= ASPEED_SCU_NR_REGS) { qemu_log_mask(LOG_GUEST_ERROR, "%s: Out-of-bounds read at offset 0x%" HWADDR_PRIx "\n", __func__, offset); @@ -XXX,XX +XXX,XX @@ static void aspeed_scu_write(void *opaque, hwaddr offset, uint64_t data, AspeedSCUState *s = ASPEED_SCU(opaque); int reg = TO_REG(offset); - if (reg >= ARRAY_SIZE(s->regs)) { + if (reg >= ASPEED_SCU_NR_REGS) { qemu_log_mask(LOG_GUEST_ERROR, "%s: Out-of-bounds write at offset 0x%" HWADDR_PRIx "\n", __func__, offset); @@ -XXX,XX +XXX,XX @@ static void aspeed_scu_reset(DeviceState *dev) AspeedSCUState *s = ASPEED_SCU(dev); AspeedSCUClass *asc = ASPEED_SCU_GET_CLASS(dev); - memcpy(s->regs, asc->resets, sizeof(s->regs)); + memcpy(s->regs, asc->resets, asc->nr_regs * 4); s->regs[SILICON_REV] = s->silicon_rev; s->regs[HW_STRAP1] = s->hw_strap1; s->regs[HW_STRAP2] = s->hw_strap2; @@ -XXX,XX +XXX,XX @@ static uint32_t aspeed_silicon_revs[] = { AST2400_A1_SILICON_REV, AST2500_A0_SILICON_REV, AST2500_A1_SILICON_REV, + AST2600_A0_SILICON_REV, }; bool is_supported_silicon_rev(uint32_t silicon_rev) @@ -XXX,XX +XXX,XX @@ static void aspeed_scu_realize(DeviceState *dev, Error **errp) { SysBusDevice *sbd = SYS_BUS_DEVICE(dev); AspeedSCUState *s = ASPEED_SCU(dev); + AspeedSCUClass *asc = ASPEED_SCU_GET_CLASS(dev); if (!is_supported_silicon_rev(s->silicon_rev)) { error_setg(errp, "Unknown silicon revision: 0x%" PRIx32, @@ -XXX,XX +XXX,XX @@ static void aspeed_scu_realize(DeviceState *dev, Error **errp) return; } - memory_region_init_io(&s->iomem, OBJECT(s), &aspeed_scu_ops, s, + memory_region_init_io(&s->iomem, OBJECT(s), asc->ops, s, TYPE_ASPEED_SCU, SCU_IO_REGION_SIZE); sysbus_init_mmio(sbd, &s->iomem); @@ -XXX,XX +XXX,XX @@ static void aspeed_scu_realize(DeviceState *dev, Error **errp) static const VMStateDescription vmstate_aspeed_scu = { .name = "aspeed.scu", - .version_id = 1, - .minimum_version_id = 1, + .version_id = 2, + .minimum_version_id = 2, .fields = (VMStateField[]) { - VMSTATE_UINT32_ARRAY(regs, AspeedSCUState, ASPEED_SCU_NR_REGS), + VMSTATE_UINT32_ARRAY(regs, AspeedSCUState, ASPEED_AST2600_SCU_NR_REGS), VMSTATE_END_OF_LIST() } }; @@ -XXX,XX +XXX,XX @@ static void aspeed_2400_scu_class_init(ObjectClass *klass, void *data) asc->resets = ast2400_a0_resets; asc->calc_hpll = aspeed_2400_scu_calc_hpll; asc->apb_divider = 2; + asc->nr_regs = ASPEED_SCU_NR_REGS; + asc->ops = &aspeed_scu_ops; } static const TypeInfo aspeed_2400_scu_info = { @@ -XXX,XX +XXX,XX @@ static void aspeed_2500_scu_class_init(ObjectClass *klass, void *data) asc->resets = ast2500_a1_resets; asc->calc_hpll = aspeed_2500_scu_calc_hpll; asc->apb_divider = 4; + asc->nr_regs = ASPEED_SCU_NR_REGS; + asc->ops = &aspeed_scu_ops; } static const TypeInfo aspeed_2500_scu_info = { @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_2500_scu_info = { .class_init = aspeed_2500_scu_class_init, }; +static uint64_t aspeed_ast2600_scu_read(void *opaque, hwaddr offset, + unsigned size) +{ + AspeedSCUState *s = ASPEED_SCU(opaque); + int reg = TO_REG(offset); + + if (reg >= ASPEED_AST2600_SCU_NR_REGS) { + qemu_log_mask(LOG_GUEST_ERROR, + "%s: Out-of-bounds read at offset 0x%" HWADDR_PRIx "\n", + __func__, offset); + return 0; + } + + switch (reg) { + case AST2600_HPLL_EXT: + case AST2600_EPLL_EXT: + case AST2600_MPLL_EXT: + /* PLLs are always "locked" */ + return s->regs[reg] | BIT(31); + case AST2600_RNG_DATA: + /* + * On hardware, RNG_DATA works regardless of the state of the + * enable bit in RNG_CTRL + * + * TODO: Check this is true for ast2600 + */ + s->regs[AST2600_RNG_DATA] = aspeed_scu_get_random(); + break; + } + + return s->regs[reg]; +} + +static void aspeed_ast2600_scu_write(void *opaque, hwaddr offset, uint64_t data, + unsigned size) +{ + AspeedSCUState *s = ASPEED_SCU(opaque); + int reg = TO_REG(offset); + + if (reg >= ASPEED_AST2600_SCU_NR_REGS) { + qemu_log_mask(LOG_GUEST_ERROR, + "%s: Out-of-bounds write at offset 0x%" HWADDR_PRIx "\n", + __func__, offset); + return; + } + + if (reg > PROT_KEY && !s->regs[PROT_KEY]) { + qemu_log_mask(LOG_GUEST_ERROR, "%s: SCU is locked!\n", __func__); + } + + trace_aspeed_scu_write(offset, size, data); + + switch (reg) { + case AST2600_PROT_KEY: + s->regs[reg] = (data == ASPEED_SCU_PROT_KEY) ? 1 : 0; + return; + case AST2600_HW_STRAP1: + case AST2600_HW_STRAP2: + if (s->regs[reg + 2]) { + return; + } + /* fall through */ + case AST2600_SYS_RST_CTRL: + case AST2600_SYS_RST_CTRL2: + /* W1S (Write 1 to set) registers */ + s->regs[reg] |= data; + return; + case AST2600_SYS_RST_CTRL_CLR: + case AST2600_SYS_RST_CTRL2_CLR: + case AST2600_HW_STRAP1_CLR: + case AST2600_HW_STRAP2_CLR: + /* W1C (Write 1 to clear) registers */ + s->regs[reg] &= ~data; + return; + + case AST2600_RNG_DATA: + case AST2600_SILICON_REV: + case AST2600_SILICON_REV2: + /* Add read only registers here */ + qemu_log_mask(LOG_GUEST_ERROR, + "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n", + __func__, offset); + return; + } + + s->regs[reg] = data; +} + +static const MemoryRegionOps aspeed_ast2600_scu_ops = { + .read = aspeed_ast2600_scu_read, + .write = aspeed_ast2600_scu_write, + .endianness = DEVICE_LITTLE_ENDIAN, + .valid.min_access_size = 4, + .valid.max_access_size = 4, + .valid.unaligned = false, +}; + +static const uint32_t ast2600_a0_resets[ASPEED_AST2600_SCU_NR_REGS] = { + [AST2600_SILICON_REV] = AST2600_SILICON_REV, + [AST2600_SILICON_REV2] = AST2600_SILICON_REV, + [AST2600_SYS_RST_CTRL] = 0xF7CFFEDC | 0x100, + [AST2600_SYS_RST_CTRL2] = 0xFFFFFFFC, + [AST2600_CLK_STOP_CTRL] = 0xEFF43E8B, + [AST2600_CLK_STOP_CTRL2] = 0xFFF0FFF0, + [AST2600_HPLL_PARAM] = 0x1000405F, +}; + +static void aspeed_ast2600_scu_reset(DeviceState *dev) +{ + AspeedSCUState *s = ASPEED_SCU(dev); + AspeedSCUClass *asc = ASPEED_SCU_GET_CLASS(dev); + + memcpy(s->regs, asc->resets, asc->nr_regs * 4); + + s->regs[AST2600_SILICON_REV] = s->silicon_rev; + s->regs[AST2600_SILICON_REV2] = s->silicon_rev; + s->regs[AST2600_HW_STRAP1] = s->hw_strap1; + s->regs[AST2600_HW_STRAP2] = s->hw_strap2; + s->regs[PROT_KEY] = s->hw_prot_key; +} + +static void aspeed_2600_scu_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedSCUClass *asc = ASPEED_SCU_CLASS(klass); + + dc->desc = "ASPEED 2600 System Control Unit"; + dc->reset = aspeed_ast2600_scu_reset; + asc->resets = ast2600_a0_resets; + asc->calc_hpll = aspeed_2500_scu_calc_hpll; /* No change since AST2500 */ + asc->apb_divider = 4; + asc->nr_regs = ASPEED_AST2600_SCU_NR_REGS; + asc->ops = &aspeed_ast2600_scu_ops; +} + +static const TypeInfo aspeed_2600_scu_info = { + .name = TYPE_ASPEED_2600_SCU, + .parent = TYPE_ASPEED_SCU, + .instance_size = sizeof(AspeedSCUState), + .class_init = aspeed_2600_scu_class_init, +}; + static void aspeed_scu_register_types(void) { type_register_static(&aspeed_scu_info); type_register_static(&aspeed_2400_scu_info); type_register_static(&aspeed_2500_scu_info); + type_register_static(&aspeed_2600_scu_info); } type_init(aspeed_scu_register_types); -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> The most important changes will be on the register range 0x34 - 0x3C memops. Introduce class read/write operations to handle the differences between SoCs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-5-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/timer/aspeed_timer.h | 15 +++++ hw/arm/aspeed_soc.c | 3 +- hw/timer/aspeed_timer.c | 107 ++++++++++++++++++++++++++++---- 3 files changed, 113 insertions(+), 12 deletions(-) diff --git a/include/hw/timer/aspeed_timer.h b/include/hw/timer/aspeed_timer.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/timer/aspeed_timer.h +++ b/include/hw/timer/aspeed_timer.h @@ -XXX,XX +XXX,XX @@ #define ASPEED_TIMER(obj) \ OBJECT_CHECK(AspeedTimerCtrlState, (obj), TYPE_ASPEED_TIMER); #define TYPE_ASPEED_TIMER "aspeed.timer" +#define TYPE_ASPEED_2400_TIMER TYPE_ASPEED_TIMER "-ast2400" +#define TYPE_ASPEED_2500_TIMER TYPE_ASPEED_TIMER "-ast2500" + #define ASPEED_TIMER_NR_TIMERS 8 typedef struct AspeedTimer { @@ -XXX,XX +XXX,XX @@ typedef struct AspeedTimerCtrlState { AspeedSCUState *scu; } AspeedTimerCtrlState; +#define ASPEED_TIMER_CLASS(klass) \ + OBJECT_CLASS_CHECK(AspeedTimerClass, (klass), TYPE_ASPEED_TIMER) +#define ASPEED_TIMER_GET_CLASS(obj) \ + OBJECT_GET_CLASS(AspeedTimerClass, (obj), TYPE_ASPEED_TIMER) + +typedef struct AspeedTimerClass { + SysBusDeviceClass parent_class; + + uint64_t (*read)(AspeedTimerCtrlState *s, hwaddr offset); + void (*write)(AspeedTimerCtrlState *s, hwaddr offset, uint64_t value); +} AspeedTimerClass; + #endif /* ASPEED_TIMER_H */ diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_soc.c +++ b/hw/arm/aspeed_soc.c @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) sysbus_init_child_obj(obj, "rtc", OBJECT(&s->rtc), sizeof(s->rtc), TYPE_ASPEED_RTC); + snprintf(typename, sizeof(typename), "aspeed.timer-%s", socname); sysbus_init_child_obj(obj, "timerctrl", OBJECT(&s->timerctrl), - sizeof(s->timerctrl), TYPE_ASPEED_TIMER); + sizeof(s->timerctrl), typename); object_property_add_const_link(OBJECT(&s->timerctrl), "scu", OBJECT(&s->scu), &error_abort); diff --git a/hw/timer/aspeed_timer.c b/hw/timer/aspeed_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/aspeed_timer.c +++ b/hw/timer/aspeed_timer.c @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_timer_read(void *opaque, hwaddr offset, unsigned size) case 0x40 ... 0x8c: /* Timers 5 - 8 */ value = aspeed_timer_get_value(&s->timers[(offset >> 4) - 1], reg); break; - /* Illegal */ - case 0x38: - case 0x3C: default: - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", - __func__, offset); - value = 0; + value = ASPEED_TIMER_GET_CLASS(s)->read(s, offset); break; } trace_aspeed_timer_read(offset, size, value); @@ -XXX,XX +XXX,XX @@ static void aspeed_timer_write(void *opaque, hwaddr offset, uint64_t value, case 0x40 ... 0x8c: aspeed_timer_set_value(s, (offset >> TIMER_NR_REGS) - 1, reg, tv); break; - /* Illegal */ - case 0x38: - case 0x3C: default: - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", - __func__, offset); + ASPEED_TIMER_GET_CLASS(s)->write(s, offset, value); break; } } @@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps aspeed_timer_ops = { .valid.unaligned = false, }; +static uint64_t aspeed_2400_timer_read(AspeedTimerCtrlState *s, hwaddr offset) +{ + uint64_t value; + + switch (offset) { + case 0x38: + case 0x3C: + default: + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", + __func__, offset); + value = 0; + break; + } + return value; +} + +static void aspeed_2400_timer_write(AspeedTimerCtrlState *s, hwaddr offset, + uint64_t value) +{ + switch (offset) { + case 0x38: + case 0x3C: + default: + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", + __func__, offset); + break; + } +} + +static uint64_t aspeed_2500_timer_read(AspeedTimerCtrlState *s, hwaddr offset) +{ + uint64_t value; + + switch (offset) { + case 0x38: + case 0x3C: + default: + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", + __func__, offset); + value = 0; + break; + } + return value; +} + +static void aspeed_2500_timer_write(AspeedTimerCtrlState *s, hwaddr offset, + uint64_t value) +{ + switch (offset) { + case 0x38: + case 0x3C: + default: + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", + __func__, offset); + break; + } +} + static void aspeed_init_one_timer(AspeedTimerCtrlState *s, uint8_t id) { AspeedTimer *t = &s->timers[id]; @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_timer_info = { .parent = TYPE_SYS_BUS_DEVICE, .instance_size = sizeof(AspeedTimerCtrlState), .class_init = timer_class_init, + .class_size = sizeof(AspeedTimerClass), + .abstract = true, +}; + +static void aspeed_2400_timer_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedTimerClass *awc = ASPEED_TIMER_CLASS(klass); + + dc->desc = "ASPEED 2400 Timer"; + awc->read = aspeed_2400_timer_read; + awc->write = aspeed_2400_timer_write; +} + +static const TypeInfo aspeed_2400_timer_info = { + .name = TYPE_ASPEED_2400_TIMER, + .parent = TYPE_ASPEED_TIMER, + .class_init = aspeed_2400_timer_class_init, +}; + +static void aspeed_2500_timer_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedTimerClass *awc = ASPEED_TIMER_CLASS(klass); + + dc->desc = "ASPEED 2500 Timer"; + awc->read = aspeed_2500_timer_read; + awc->write = aspeed_2500_timer_write; +} + +static const TypeInfo aspeed_2500_timer_info = { + .name = TYPE_ASPEED_2500_TIMER, + .parent = TYPE_ASPEED_TIMER, + .class_init = aspeed_2500_timer_class_init, }; static void aspeed_timer_register_types(void) { type_register_static(&aspeed_timer_info); + type_register_static(&aspeed_2400_timer_info); + type_register_static(&aspeed_2500_timer_info); } type_init(aspeed_timer_register_types) -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> The AST2500 timer has a third control register that is used to implement a set-to-clear feature for the main control register. This models the behaviour expected by the AST2500 while maintaining the same behaviour for the AST2400. The vmstate version is not increased yet because the structure is modified again in the following patches. Based on previous work from Joel Stanley. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-6-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/timer/aspeed_timer.h | 1 + hw/timer/aspeed_timer.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/include/hw/timer/aspeed_timer.h b/include/hw/timer/aspeed_timer.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/timer/aspeed_timer.h +++ b/include/hw/timer/aspeed_timer.h @@ -XXX,XX +XXX,XX @@ typedef struct AspeedTimerCtrlState { uint32_t ctrl; uint32_t ctrl2; + uint32_t ctrl3; AspeedTimer timers[ASPEED_TIMER_NR_TIMERS]; AspeedSCUState *scu; diff --git a/hw/timer/aspeed_timer.c b/hw/timer/aspeed_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/aspeed_timer.c +++ b/hw/timer/aspeed_timer.c @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_2500_timer_read(AspeedTimerCtrlState *s, hwaddr offset) switch (offset) { case 0x38: + value = s->ctrl3 & BIT(0); + break; case 0x3C: default: qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_2500_timer_read(AspeedTimerCtrlState *s, hwaddr offset) static void aspeed_2500_timer_write(AspeedTimerCtrlState *s, hwaddr offset, uint64_t value) { + const uint32_t tv = (uint32_t)(value & 0xFFFFFFFF); + uint8_t command; + switch (offset) { case 0x38: + command = (value >> 1) & 0xFF; + if (command == 0xAE) { + s->ctrl3 = 0x1; + } else if (command == 0xEA) { + s->ctrl3 = 0x0; + } + break; case 0x3C: + if (s->ctrl3 & BIT(0)) { + aspeed_timer_set_ctrl(s, s->ctrl & ~tv); + } + break; + default: qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", __func__, offset); @@ -XXX,XX +XXX,XX @@ static void aspeed_timer_reset(DeviceState *dev) } s->ctrl = 0; s->ctrl2 = 0; + s->ctrl3 = 0; } static const VMStateDescription vmstate_aspeed_timer = { @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_aspeed_timer_state = { .fields = (VMStateField[]) { VMSTATE_UINT32(ctrl, AspeedTimerCtrlState), VMSTATE_UINT32(ctrl2, AspeedTimerCtrlState), + VMSTATE_UINT32(ctrl3, AspeedTimerCtrlState), VMSTATE_STRUCT_ARRAY(timers, AspeedTimerCtrlState, ASPEED_TIMER_NR_TIMERS, 1, vmstate_aspeed_timer, AspeedTimer), -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> The AST2600 timer has a third control register that is used to implement a set-to-clear feature for the main control register. On the AST2600, it is not configurable via 0x38 (control register 3) as it is on the AST2500. Based on previous work from Joel Stanley. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-7-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/timer/aspeed_timer.h | 1 + hw/timer/aspeed_timer.c | 51 +++++++++++++++++++++++++++++++++ 2 files changed, 52 insertions(+) diff --git a/include/hw/timer/aspeed_timer.h b/include/hw/timer/aspeed_timer.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/timer/aspeed_timer.h +++ b/include/hw/timer/aspeed_timer.h @@ -XXX,XX +XXX,XX @@ #define TYPE_ASPEED_TIMER "aspeed.timer" #define TYPE_ASPEED_2400_TIMER TYPE_ASPEED_TIMER "-ast2400" #define TYPE_ASPEED_2500_TIMER TYPE_ASPEED_TIMER "-ast2500" +#define TYPE_ASPEED_2600_TIMER TYPE_ASPEED_TIMER "-ast2600" #define ASPEED_TIMER_NR_TIMERS 8 diff --git a/hw/timer/aspeed_timer.c b/hw/timer/aspeed_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/aspeed_timer.c +++ b/hw/timer/aspeed_timer.c @@ -XXX,XX +XXX,XX @@ static void aspeed_2500_timer_write(AspeedTimerCtrlState *s, hwaddr offset, } } +static uint64_t aspeed_2600_timer_read(AspeedTimerCtrlState *s, hwaddr offset) +{ + uint64_t value; + + switch (offset) { + case 0x38: + case 0x3C: + default: + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", + __func__, offset); + value = 0; + break; + } + return value; +} + +static void aspeed_2600_timer_write(AspeedTimerCtrlState *s, hwaddr offset, + uint64_t value) +{ + const uint32_t tv = (uint32_t)(value & 0xFFFFFFFF); + + switch (offset) { + case 0x3C: + aspeed_timer_set_ctrl(s, s->ctrl & ~tv); + break; + + case 0x38: + default: + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%" HWADDR_PRIx "\n", + __func__, offset); + break; + } +} + static void aspeed_init_one_timer(AspeedTimerCtrlState *s, uint8_t id) { AspeedTimer *t = &s->timers[id]; @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_2500_timer_info = { .class_init = aspeed_2500_timer_class_init, }; +static void aspeed_2600_timer_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedTimerClass *awc = ASPEED_TIMER_CLASS(klass); + + dc->desc = "ASPEED 2600 Timer"; + awc->read = aspeed_2600_timer_read; + awc->write = aspeed_2600_timer_write; +} + +static const TypeInfo aspeed_2600_timer_info = { + .name = TYPE_ASPEED_2600_TIMER, + .parent = TYPE_ASPEED_TIMER, + .class_init = aspeed_2600_timer_class_init, +}; + static void aspeed_timer_register_types(void) { type_register_static(&aspeed_timer_info); type_register_static(&aspeed_2400_timer_info); type_register_static(&aspeed_2500_timer_info); + type_register_static(&aspeed_2600_timer_info); } type_init(aspeed_timer_register_types) -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> The AST2600 timer replaces control register 2 with a interrupt status register. It is set by hardware when an IRQ occurs and cleared by software. Modify the vmstate version to take into account the new fields. Based on previous work from Joel Stanley. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-8-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/timer/aspeed_timer.h | 1 + hw/timer/aspeed_timer.c | 36 +++++++++++++++++++++++++-------- 2 files changed, 29 insertions(+), 8 deletions(-) diff --git a/include/hw/timer/aspeed_timer.h b/include/hw/timer/aspeed_timer.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/timer/aspeed_timer.h +++ b/include/hw/timer/aspeed_timer.h @@ -XXX,XX +XXX,XX @@ typedef struct AspeedTimerCtrlState { uint32_t ctrl; uint32_t ctrl2; uint32_t ctrl3; + uint32_t irq_sts; AspeedTimer timers[ASPEED_TIMER_NR_TIMERS]; AspeedSCUState *scu; diff --git a/hw/timer/aspeed_timer.c b/hw/timer/aspeed_timer.c index XXXXXXX..XXXXXXX 100644 --- a/hw/timer/aspeed_timer.c +++ b/hw/timer/aspeed_timer.c @@ -XXX,XX +XXX,XX @@ static uint64_t calculate_next(struct AspeedTimer *t) timer_del(&t->timer); if (timer_overflow_interrupt(t)) { + AspeedTimerCtrlState *s = timer_to_ctrl(t); t->level = !t->level; + s->irq_sts |= BIT(t->id); qemu_set_irq(t->irq, t->level); } @@ -XXX,XX +XXX,XX @@ static void aspeed_timer_expire(void *opaque) } if (interrupt) { + AspeedTimerCtrlState *s = timer_to_ctrl(t); t->level = !t->level; + s->irq_sts |= BIT(t->id); qemu_set_irq(t->irq, t->level); } @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_timer_read(void *opaque, hwaddr offset, unsigned size) case 0x30: /* Control Register */ value = s->ctrl; break; - case 0x34: /* Control Register 2 */ - value = s->ctrl2; - break; case 0x00 ... 0x2c: /* Timers 1 - 4 */ value = aspeed_timer_get_value(&s->timers[(offset >> 4)], reg); break; @@ -XXX,XX +XXX,XX @@ static void aspeed_timer_write(void *opaque, hwaddr offset, uint64_t value, case 0x30: aspeed_timer_set_ctrl(s, tv); break; - case 0x34: - aspeed_timer_set_ctrl2(s, tv); - break; /* Timer Registers */ case 0x00 ... 0x2c: aspeed_timer_set_value(s, (offset >> TIMER_NR_REGS), reg, tv); @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_2400_timer_read(AspeedTimerCtrlState *s, hwaddr offset) uint64_t value; switch (offset) { + case 0x34: + value = s->ctrl2; + break; case 0x38: case 0x3C: default: @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_2400_timer_read(AspeedTimerCtrlState *s, hwaddr offset) static void aspeed_2400_timer_write(AspeedTimerCtrlState *s, hwaddr offset, uint64_t value) { + const uint32_t tv = (uint32_t)(value & 0xFFFFFFFF); + switch (offset) { + case 0x34: + aspeed_timer_set_ctrl2(s, tv); + break; case 0x38: case 0x3C: default: @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_2500_timer_read(AspeedTimerCtrlState *s, hwaddr offset) uint64_t value; switch (offset) { + case 0x34: + value = s->ctrl2; + break; case 0x38: value = s->ctrl3 & BIT(0); break; @@ -XXX,XX +XXX,XX @@ static void aspeed_2500_timer_write(AspeedTimerCtrlState *s, hwaddr offset, uint8_t command; switch (offset) { + case 0x34: + aspeed_timer_set_ctrl2(s, tv); + break; case 0x38: command = (value >> 1) & 0xFF; if (command == 0xAE) { @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_2600_timer_read(AspeedTimerCtrlState *s, hwaddr offset) uint64_t value; switch (offset) { + case 0x34: + value = s->irq_sts; + break; case 0x38: case 0x3C: default: @@ -XXX,XX +XXX,XX @@ static void aspeed_2600_timer_write(AspeedTimerCtrlState *s, hwaddr offset, const uint32_t tv = (uint32_t)(value & 0xFFFFFFFF); switch (offset) { + case 0x34: + s->irq_sts &= tv; + break; case 0x3C: aspeed_timer_set_ctrl(s, s->ctrl & ~tv); break; @@ -XXX,XX +XXX,XX @@ static void aspeed_timer_reset(DeviceState *dev) s->ctrl = 0; s->ctrl2 = 0; s->ctrl3 = 0; + s->irq_sts = 0; } static const VMStateDescription vmstate_aspeed_timer = { @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_aspeed_timer = { static const VMStateDescription vmstate_aspeed_timer_state = { .name = "aspeed.timerctrl", - .version_id = 1, - .minimum_version_id = 1, + .version_id = 2, + .minimum_version_id = 2, .fields = (VMStateField[]) { VMSTATE_UINT32(ctrl, AspeedTimerCtrlState), VMSTATE_UINT32(ctrl2, AspeedTimerCtrlState), VMSTATE_UINT32(ctrl3, AspeedTimerCtrlState), + VMSTATE_UINT32(irq_sts, AspeedTimerCtrlState), VMSTATE_STRUCT_ARRAY(timers, AspeedTimerCtrlState, ASPEED_TIMER_NR_TIMERS, 1, vmstate_aspeed_timer, AspeedTimer), -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> Use class handlers and class constants to differentiate the characteristics of the memory controller and remove the 'silicon_rev' property. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-9-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/misc/aspeed_sdmc.h | 19 +++- hw/arm/aspeed_soc.c | 5 +- hw/misc/aspeed_sdmc.c | 168 +++++++++++++++++++++------------- 3 files changed, 122 insertions(+), 70 deletions(-) diff --git a/include/hw/misc/aspeed_sdmc.h b/include/hw/misc/aspeed_sdmc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/misc/aspeed_sdmc.h +++ b/include/hw/misc/aspeed_sdmc.h @@ -XXX,XX +XXX,XX @@ #define TYPE_ASPEED_SDMC "aspeed.sdmc" #define ASPEED_SDMC(obj) OBJECT_CHECK(AspeedSDMCState, (obj), TYPE_ASPEED_SDMC) +#define TYPE_ASPEED_2400_SDMC TYPE_ASPEED_SDMC "-ast2400" +#define TYPE_ASPEED_2500_SDMC TYPE_ASPEED_SDMC "-ast2500" #define ASPEED_SDMC_NR_REGS (0x174 >> 2) @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSDMCState { MemoryRegion iomem; uint32_t regs[ASPEED_SDMC_NR_REGS]; - uint32_t silicon_rev; - uint32_t ram_bits; uint64_t ram_size; uint64_t max_ram_size; - uint32_t fixed_conf; - } AspeedSDMCState; +#define ASPEED_SDMC_CLASS(klass) \ + OBJECT_CLASS_CHECK(AspeedSDMCClass, (klass), TYPE_ASPEED_SDMC) +#define ASPEED_SDMC_GET_CLASS(obj) \ + OBJECT_GET_CLASS(AspeedSDMCClass, (obj), TYPE_ASPEED_SDMC) + +typedef struct AspeedSDMCClass { + SysBusDeviceClass parent_class; + + uint64_t max_ram_size; + uint32_t (*compute_conf)(AspeedSDMCState *s, uint32_t data); + void (*write)(AspeedSDMCState *s, uint32_t reg, uint32_t data); +} AspeedSDMCClass; + #endif /* ASPEED_SDMC_H */ diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_soc.c +++ b/hw/arm/aspeed_soc.c @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) sizeof(s->spi[i]), typename); } + snprintf(typename, sizeof(typename), "aspeed.sdmc-%s", socname); sysbus_init_child_obj(obj, "sdmc", OBJECT(&s->sdmc), sizeof(s->sdmc), - TYPE_ASPEED_SDMC); - qdev_prop_set_uint32(DEVICE(&s->sdmc), "silicon-rev", - sc->info->silicon_rev); + typename); object_property_add_alias(obj, "ram-size", OBJECT(&s->sdmc), "ram-size", &error_abort); object_property_add_alias(obj, "max-ram-size", OBJECT(&s->sdmc), diff --git a/hw/misc/aspeed_sdmc.c b/hw/misc/aspeed_sdmc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/aspeed_sdmc.c +++ b/hw/misc/aspeed_sdmc.c @@ -XXX,XX +XXX,XX @@ static void aspeed_sdmc_write(void *opaque, hwaddr addr, uint64_t data, unsigned int size) { AspeedSDMCState *s = ASPEED_SDMC(opaque); + AspeedSDMCClass *asc = ASPEED_SDMC_GET_CLASS(s); addr >>= 2; @@ -XXX,XX +XXX,XX @@ static void aspeed_sdmc_write(void *opaque, hwaddr addr, uint64_t data, return; } - if (addr == R_CONF) { - /* Make sure readonly bits are kept */ - switch (s->silicon_rev) { - case AST2400_A0_SILICON_REV: - case AST2400_A1_SILICON_REV: - data &= ~ASPEED_SDMC_READONLY_MASK; - data |= s->fixed_conf; - break; - case AST2500_A0_SILICON_REV: - case AST2500_A1_SILICON_REV: - data &= ~ASPEED_SDMC_AST2500_READONLY_MASK; - data |= s->fixed_conf; - break; - default: - g_assert_not_reached(); - } - } - if (s->silicon_rev == AST2500_A0_SILICON_REV || - s->silicon_rev == AST2500_A1_SILICON_REV) { - switch (addr) { - case R_STATUS1: - /* Will never return 'busy' */ - data &= ~PHY_BUSY_STATE; - break; - case R_ECC_TEST_CTRL: - /* Always done, always happy */ - data |= ECC_TEST_FINISHED; - data &= ~ECC_TEST_FAIL; - break; - default: - break; - } - } - - s->regs[addr] = data; + asc->write(s, addr, data); } static const MemoryRegionOps aspeed_sdmc_ops = { @@ -XXX,XX +XXX,XX @@ static int ast2500_rambits(AspeedSDMCState *s) static void aspeed_sdmc_reset(DeviceState *dev) { AspeedSDMCState *s = ASPEED_SDMC(dev); + AspeedSDMCClass *asc = ASPEED_SDMC_GET_CLASS(s); memset(s->regs, 0, sizeof(s->regs)); /* Set ram size bit and defaults values */ - s->regs[R_CONF] = s->fixed_conf; + s->regs[R_CONF] = asc->compute_conf(s, 0); } static void aspeed_sdmc_realize(DeviceState *dev, Error **errp) { SysBusDevice *sbd = SYS_BUS_DEVICE(dev); AspeedSDMCState *s = ASPEED_SDMC(dev); + AspeedSDMCClass *asc = ASPEED_SDMC_GET_CLASS(s); - if (!is_supported_silicon_rev(s->silicon_rev)) { - error_setg(errp, "Unknown silicon revision: 0x%" PRIx32, - s->silicon_rev); - return; - } - - switch (s->silicon_rev) { - case AST2400_A0_SILICON_REV: - case AST2400_A1_SILICON_REV: - s->ram_bits = ast2400_rambits(s); - s->max_ram_size = 512 << 20; - s->fixed_conf = ASPEED_SDMC_VGA_COMPAT | - ASPEED_SDMC_DRAM_SIZE(s->ram_bits); - break; - case AST2500_A0_SILICON_REV: - case AST2500_A1_SILICON_REV: - s->ram_bits = ast2500_rambits(s); - s->max_ram_size = 1024 << 20; - s->fixed_conf = ASPEED_SDMC_HW_VERSION(1) | - ASPEED_SDMC_VGA_APERTURE(ASPEED_SDMC_VGA_64MB) | - ASPEED_SDMC_CACHE_INITIAL_DONE | - ASPEED_SDMC_DRAM_SIZE(s->ram_bits); - break; - default: - g_assert_not_reached(); - } + s->max_ram_size = asc->max_ram_size; memory_region_init_io(&s->iomem, OBJECT(s), &aspeed_sdmc_ops, s, TYPE_ASPEED_SDMC, 0x1000); @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_aspeed_sdmc = { }; static Property aspeed_sdmc_properties[] = { - DEFINE_PROP_UINT32("silicon-rev", AspeedSDMCState, silicon_rev, 0), DEFINE_PROP_UINT64("ram-size", AspeedSDMCState, ram_size, 0), DEFINE_PROP_UINT64("max-ram-size", AspeedSDMCState, max_ram_size, 0), DEFINE_PROP_END_OF_LIST(), @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_sdmc_info = { .parent = TYPE_SYS_BUS_DEVICE, .instance_size = sizeof(AspeedSDMCState), .class_init = aspeed_sdmc_class_init, + .class_size = sizeof(AspeedSDMCClass), + .abstract = true, +}; + +static uint32_t aspeed_2400_sdmc_compute_conf(AspeedSDMCState *s, uint32_t data) +{ + uint32_t fixed_conf = ASPEED_SDMC_VGA_COMPAT | + ASPEED_SDMC_DRAM_SIZE(ast2400_rambits(s)); + + /* Make sure readonly bits are kept */ + data &= ~ASPEED_SDMC_READONLY_MASK; + + return data | fixed_conf; +} + +static void aspeed_2400_sdmc_write(AspeedSDMCState *s, uint32_t reg, + uint32_t data) +{ + switch (reg) { + case R_CONF: + data = aspeed_2400_sdmc_compute_conf(s, data); + break; + default: + break; + } + + s->regs[reg] = data; +} + +static void aspeed_2400_sdmc_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedSDMCClass *asc = ASPEED_SDMC_CLASS(klass); + + dc->desc = "ASPEED 2400 SDRAM Memory Controller"; + asc->max_ram_size = 512 << 20; + asc->compute_conf = aspeed_2400_sdmc_compute_conf; + asc->write = aspeed_2400_sdmc_write; +} + +static const TypeInfo aspeed_2400_sdmc_info = { + .name = TYPE_ASPEED_2400_SDMC, + .parent = TYPE_ASPEED_SDMC, + .class_init = aspeed_2400_sdmc_class_init, +}; + +static uint32_t aspeed_2500_sdmc_compute_conf(AspeedSDMCState *s, uint32_t data) +{ + uint32_t fixed_conf = ASPEED_SDMC_HW_VERSION(1) | + ASPEED_SDMC_VGA_APERTURE(ASPEED_SDMC_VGA_64MB) | + ASPEED_SDMC_CACHE_INITIAL_DONE | + ASPEED_SDMC_DRAM_SIZE(ast2500_rambits(s)); + + /* Make sure readonly bits are kept */ + data &= ~ASPEED_SDMC_AST2500_READONLY_MASK; + + return data | fixed_conf; +} + +static void aspeed_2500_sdmc_write(AspeedSDMCState *s, uint32_t reg, + uint32_t data) +{ + switch (reg) { + case R_CONF: + data = aspeed_2500_sdmc_compute_conf(s, data); + break; + case R_STATUS1: + /* Will never return 'busy' */ + data &= ~PHY_BUSY_STATE; + break; + case R_ECC_TEST_CTRL: + /* Always done, always happy */ + data |= ECC_TEST_FINISHED; + data &= ~ECC_TEST_FAIL; + break; + default: + break; + } + + s->regs[reg] = data; +} + +static void aspeed_2500_sdmc_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedSDMCClass *asc = ASPEED_SDMC_CLASS(klass); + + dc->desc = "ASPEED 2500 SDRAM Memory Controller"; + asc->max_ram_size = 1024 << 20; + asc->compute_conf = aspeed_2500_sdmc_compute_conf; + asc->write = aspeed_2500_sdmc_write; +} + +static const TypeInfo aspeed_2500_sdmc_info = { + .name = TYPE_ASPEED_2500_SDMC, + .parent = TYPE_ASPEED_SDMC, + .class_init = aspeed_2500_sdmc_class_init, }; static void aspeed_sdmc_register_types(void) { type_register_static(&aspeed_sdmc_info); + type_register_static(&aspeed_2400_sdmc_info); + type_register_static(&aspeed_2500_sdmc_info); } type_init(aspeed_sdmc_register_types); -- 2.20.1
From: Joel Stanley <joel@jms.id.au> The AST2600 SDMC controller is slightly different from its predecessor (DRAM training). Max memory is now 2G on the AST2600. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-id: 20190925143248.10000-10-clg@kaod.org [clg: - improved commit log - reworked model integration into new object class ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/misc/aspeed_sdmc.h | 1 + hw/misc/aspeed_scu.c | 2 + hw/misc/aspeed_sdmc.c | 82 +++++++++++++++++++++++++++++++++++ 3 files changed, 85 insertions(+) diff --git a/include/hw/misc/aspeed_sdmc.h b/include/hw/misc/aspeed_sdmc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/misc/aspeed_sdmc.h +++ b/include/hw/misc/aspeed_sdmc.h @@ -XXX,XX +XXX,XX @@ #define ASPEED_SDMC(obj) OBJECT_CHECK(AspeedSDMCState, (obj), TYPE_ASPEED_SDMC) #define TYPE_ASPEED_2400_SDMC TYPE_ASPEED_SDMC "-ast2400" #define TYPE_ASPEED_2500_SDMC TYPE_ASPEED_SDMC "-ast2500" +#define TYPE_ASPEED_2600_SDMC TYPE_ASPEED_SDMC "-ast2600" #define ASPEED_SDMC_NR_REGS (0x174 >> 2) diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/aspeed_scu.c +++ b/hw/misc/aspeed_scu.c @@ -XXX,XX +XXX,XX @@ #define AST2600_CLK_STOP_CTRL_CLR TO_REG(0x84) #define AST2600_CLK_STOP_CTRL2 TO_REG(0x90) #define AST2600_CLK_STOP_CTR2L_CLR TO_REG(0x94) +#define AST2600_SDRAM_HANDSHAKE TO_REG(0x100) #define AST2600_HPLL_PARAM TO_REG(0x200) #define AST2600_HPLL_EXT TO_REG(0x204) #define AST2600_MPLL_EXT TO_REG(0x224) @@ -XXX,XX +XXX,XX @@ static const uint32_t ast2600_a0_resets[ASPEED_AST2600_SCU_NR_REGS] = { [AST2600_SYS_RST_CTRL2] = 0xFFFFFFFC, [AST2600_CLK_STOP_CTRL] = 0xEFF43E8B, [AST2600_CLK_STOP_CTRL2] = 0xFFF0FFF0, + [AST2600_SDRAM_HANDSHAKE] = 0x00000040, /* SoC completed DRAM init */ [AST2600_HPLL_PARAM] = 0x1000405F, }; diff --git a/hw/misc/aspeed_sdmc.c b/hw/misc/aspeed_sdmc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/aspeed_sdmc.c +++ b/hw/misc/aspeed_sdmc.c @@ -XXX,XX +XXX,XX @@ /* Control/Status Register #1 (ast2500) */ #define R_STATUS1 (0x60 / 4) #define PHY_BUSY_STATE BIT(0) +#define PHY_PLL_LOCK_STATUS BIT(4) #define R_ECC_TEST_CTRL (0x70 / 4) #define ECC_TEST_FINISHED BIT(12) @@ -XXX,XX +XXX,XX @@ #define ASPEED_SDMC_AST2500_512MB 0x2 #define ASPEED_SDMC_AST2500_1024MB 0x3 +#define ASPEED_SDMC_AST2600_256MB 0x0 +#define ASPEED_SDMC_AST2600_512MB 0x1 +#define ASPEED_SDMC_AST2600_1024MB 0x2 +#define ASPEED_SDMC_AST2600_2048MB 0x3 + #define ASPEED_SDMC_AST2500_READONLY_MASK \ (ASPEED_SDMC_HW_VERSION(0xf) | ASPEED_SDMC_CACHE_INITIAL_DONE | \ ASPEED_SDMC_AST2500_RESERVED | ASPEED_SDMC_VGA_COMPAT | \ @@ -XXX,XX +XXX,XX @@ static int ast2500_rambits(AspeedSDMCState *s) return ASPEED_SDMC_AST2500_512MB; } +static int ast2600_rambits(AspeedSDMCState *s) +{ + switch (s->ram_size >> 20) { + case 256: + return ASPEED_SDMC_AST2600_256MB; + case 512: + return ASPEED_SDMC_AST2600_512MB; + case 1024: + return ASPEED_SDMC_AST2600_1024MB; + case 2048: + return ASPEED_SDMC_AST2600_2048MB; + default: + break; + } + + /* use a common default */ + warn_report("Invalid RAM size 0x%" PRIx64 ". Using default 512M", + s->ram_size); + s->ram_size = 512 << 20; + return ASPEED_SDMC_AST2600_512MB; +} + static void aspeed_sdmc_reset(DeviceState *dev) { AspeedSDMCState *s = ASPEED_SDMC(dev); @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_2500_sdmc_info = { .class_init = aspeed_2500_sdmc_class_init, }; +static uint32_t aspeed_2600_sdmc_compute_conf(AspeedSDMCState *s, uint32_t data) +{ + uint32_t fixed_conf = ASPEED_SDMC_HW_VERSION(3) | + ASPEED_SDMC_VGA_APERTURE(ASPEED_SDMC_VGA_64MB) | + ASPEED_SDMC_DRAM_SIZE(ast2600_rambits(s)); + + /* Make sure readonly bits are kept (use ast2500 mask) */ + data &= ~ASPEED_SDMC_AST2500_READONLY_MASK; + + return data | fixed_conf; +} + +static void aspeed_2600_sdmc_write(AspeedSDMCState *s, uint32_t reg, + uint32_t data) +{ + switch (reg) { + case R_CONF: + data = aspeed_2600_sdmc_compute_conf(s, data); + break; + case R_STATUS1: + /* Will never return 'busy'. 'lock status' is always set */ + data &= ~PHY_BUSY_STATE; + data |= PHY_PLL_LOCK_STATUS; + break; + case R_ECC_TEST_CTRL: + /* Always done, always happy */ + data |= ECC_TEST_FINISHED; + data &= ~ECC_TEST_FAIL; + break; + default: + break; + } + + s->regs[reg] = data; +} + +static void aspeed_2600_sdmc_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedSDMCClass *asc = ASPEED_SDMC_CLASS(klass); + + dc->desc = "ASPEED 2600 SDRAM Memory Controller"; + asc->max_ram_size = 2048 << 20; + asc->compute_conf = aspeed_2600_sdmc_compute_conf; + asc->write = aspeed_2600_sdmc_write; +} + +static const TypeInfo aspeed_2600_sdmc_info = { + .name = TYPE_ASPEED_2600_SDMC, + .parent = TYPE_ASPEED_SDMC, + .class_init = aspeed_2600_sdmc_class_init, +}; + static void aspeed_sdmc_register_types(void) { type_register_static(&aspeed_sdmc_info); type_register_static(&aspeed_2400_sdmc_info); type_register_static(&aspeed_2500_sdmc_info); + type_register_static(&aspeed_2600_sdmc_info); } type_init(aspeed_sdmc_register_types); -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> It cleanups the current models for the Aspeed AST2400 and AST2500 SoCs and prepares ground for future SoCs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-11-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/watchdog/wdt_aspeed.h | 18 ++++- hw/arm/aspeed_soc.c | 9 ++- hw/watchdog/wdt_aspeed.c | 122 ++++++++++++++++--------------- 3 files changed, 86 insertions(+), 63 deletions(-) diff --git a/include/hw/watchdog/wdt_aspeed.h b/include/hw/watchdog/wdt_aspeed.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/watchdog/wdt_aspeed.h +++ b/include/hw/watchdog/wdt_aspeed.h @@ -XXX,XX +XXX,XX @@ #define TYPE_ASPEED_WDT "aspeed.wdt" #define ASPEED_WDT(obj) \ OBJECT_CHECK(AspeedWDTState, (obj), TYPE_ASPEED_WDT) +#define TYPE_ASPEED_2400_WDT TYPE_ASPEED_WDT "-ast2400" +#define TYPE_ASPEED_2500_WDT TYPE_ASPEED_WDT "-ast2500" #define ASPEED_WDT_REGS_MAX (0x20 / 4) @@ -XXX,XX +XXX,XX @@ typedef struct AspeedWDTState { AspeedSCUState *scu; uint32_t pclk_freq; - uint32_t silicon_rev; - uint32_t ext_pulse_width_mask; } AspeedWDTState; +#define ASPEED_WDT_CLASS(klass) \ + OBJECT_CLASS_CHECK(AspeedWDTClass, (klass), TYPE_ASPEED_WDT) +#define ASPEED_WDT_GET_CLASS(obj) \ + OBJECT_GET_CLASS(AspeedWDTClass, (obj), TYPE_ASPEED_WDT) + +typedef struct AspeedWDTClass { + SysBusDeviceClass parent_class; + + uint32_t offset; + uint32_t ext_pulse_width_mask; + uint32_t reset_ctrl_reg; + void (*reset_pulse)(AspeedWDTState *s, uint32_t property); +} AspeedWDTClass; + #endif /* WDT_ASPEED_H */ diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_soc.c +++ b/hw/arm/aspeed_soc.c @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) "max-ram-size", &error_abort); for (i = 0; i < sc->info->wdts_num; i++) { + snprintf(typename, sizeof(typename), "aspeed.wdt-%s", socname); sysbus_init_child_obj(obj, "wdt[*]", OBJECT(&s->wdt[i]), - sizeof(s->wdt[i]), TYPE_ASPEED_WDT); - qdev_prop_set_uint32(DEVICE(&s->wdt[i]), "silicon-rev", - sc->info->silicon_rev); + sizeof(s->wdt[i]), typename); object_property_add_const_link(OBJECT(&s->wdt[i]), "scu", OBJECT(&s->scu), &error_abort); } @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) /* Watch dog */ for (i = 0; i < sc->info->wdts_num; i++) { + AspeedWDTClass *awc = ASPEED_WDT_GET_CLASS(&s->wdt[i]); + object_property_set_bool(OBJECT(&s->wdt[i]), true, "realized", &err); if (err) { error_propagate(errp, err); return; } sysbus_mmio_map(SYS_BUS_DEVICE(&s->wdt[i]), 0, - sc->info->memmap[ASPEED_WDT] + i * 0x20); + sc->info->memmap[ASPEED_WDT] + i * awc->offset); } /* Net */ diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c index XXXXXXX..XXXXXXX 100644 --- a/hw/watchdog/wdt_aspeed.c +++ b/hw/watchdog/wdt_aspeed.c @@ -XXX,XX +XXX,XX @@ static bool aspeed_wdt_is_enabled(const AspeedWDTState *s) return s->regs[WDT_CTRL] & WDT_CTRL_ENABLE; } -static bool is_ast2500(const AspeedWDTState *s) -{ - switch (s->silicon_rev) { - case AST2500_A0_SILICON_REV: - case AST2500_A1_SILICON_REV: - return true; - case AST2400_A0_SILICON_REV: - case AST2400_A1_SILICON_REV: - default: - break; - } - - return false; -} - static uint64_t aspeed_wdt_read(void *opaque, hwaddr offset, unsigned size) { AspeedWDTState *s = ASPEED_WDT(opaque); @@ -XXX,XX +XXX,XX @@ static void aspeed_wdt_write(void *opaque, hwaddr offset, uint64_t data, unsigned size) { AspeedWDTState *s = ASPEED_WDT(opaque); + AspeedWDTClass *awc = ASPEED_WDT_GET_CLASS(s); bool enable = data & WDT_CTRL_ENABLE; offset >>= 2; @@ -XXX,XX +XXX,XX @@ static void aspeed_wdt_write(void *opaque, hwaddr offset, uint64_t data, } break; case WDT_RESET_WIDTH: - { - uint32_t property = data & WDT_POLARITY_MASK; - - if (property && is_ast2500(s)) { - if (property == WDT_ACTIVE_HIGH_MAGIC) { - s->regs[WDT_RESET_WIDTH] |= WDT_RESET_WIDTH_ACTIVE_HIGH; - } else if (property == WDT_ACTIVE_LOW_MAGIC) { - s->regs[WDT_RESET_WIDTH] &= ~WDT_RESET_WIDTH_ACTIVE_HIGH; - } else if (property == WDT_PUSH_PULL_MAGIC) { - s->regs[WDT_RESET_WIDTH] |= WDT_RESET_WIDTH_PUSH_PULL; - } else if (property == WDT_OPEN_DRAIN_MAGIC) { - s->regs[WDT_RESET_WIDTH] &= ~WDT_RESET_WIDTH_PUSH_PULL; - } + if (awc->reset_pulse) { + awc->reset_pulse(s, data & WDT_POLARITY_MASK); } - s->regs[WDT_RESET_WIDTH] &= ~s->ext_pulse_width_mask; - s->regs[WDT_RESET_WIDTH] |= data & s->ext_pulse_width_mask; + s->regs[WDT_RESET_WIDTH] &= ~awc->ext_pulse_width_mask; + s->regs[WDT_RESET_WIDTH] |= data & awc->ext_pulse_width_mask; break; - } + case WDT_TIMEOUT_STATUS: case WDT_TIMEOUT_CLEAR: qemu_log_mask(LOG_UNIMP, @@ -XXX,XX +XXX,XX @@ static void aspeed_wdt_reset(DeviceState *dev) static void aspeed_wdt_timer_expired(void *dev) { AspeedWDTState *s = ASPEED_WDT(dev); + uint32_t reset_ctrl_reg = ASPEED_WDT_GET_CLASS(s)->reset_ctrl_reg; /* Do not reset on SDRAM controller reset */ - if (s->scu->regs[SCU_RESET_CONTROL1] & SCU_RESET_SDRAM) { + if (s->scu->regs[reset_ctrl_reg] & SCU_RESET_SDRAM) { timer_del(s->timer); s->regs[WDT_CTRL] = 0; return; @@ -XXX,XX +XXX,XX @@ static void aspeed_wdt_realize(DeviceState *dev, Error **errp) } s->scu = ASPEED_SCU(obj); - if (!is_supported_silicon_rev(s->silicon_rev)) { - error_setg(errp, "Unknown silicon revision: 0x%" PRIx32, - s->silicon_rev); - return; - } - - switch (s->silicon_rev) { - case AST2400_A0_SILICON_REV: - case AST2400_A1_SILICON_REV: - s->ext_pulse_width_mask = 0xff; - break; - case AST2500_A0_SILICON_REV: - case AST2500_A1_SILICON_REV: - s->ext_pulse_width_mask = 0xfffff; - break; - default: - g_assert_not_reached(); - } - s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, aspeed_wdt_timer_expired, dev); /* FIXME: This setting should be derived from the SCU hw strapping @@ -XXX,XX +XXX,XX @@ static void aspeed_wdt_realize(DeviceState *dev, Error **errp) sysbus_init_mmio(sbd, &s->iomem); } -static Property aspeed_wdt_properties[] = { - DEFINE_PROP_UINT32("silicon-rev", AspeedWDTState, silicon_rev, 0), - DEFINE_PROP_END_OF_LIST(), -}; - static void aspeed_wdt_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); + dc->desc = "ASPEED Watchdog Controller"; dc->realize = aspeed_wdt_realize; dc->reset = aspeed_wdt_reset; set_bit(DEVICE_CATEGORY_MISC, dc->categories); dc->vmsd = &vmstate_aspeed_wdt; - dc->props = aspeed_wdt_properties; } static const TypeInfo aspeed_wdt_info = { @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_wdt_info = { .name = TYPE_ASPEED_WDT, .instance_size = sizeof(AspeedWDTState), .class_init = aspeed_wdt_class_init, + .class_size = sizeof(AspeedWDTClass), + .abstract = true, +}; + +static void aspeed_2400_wdt_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedWDTClass *awc = ASPEED_WDT_CLASS(klass); + + dc->desc = "ASPEED 2400 Watchdog Controller"; + awc->offset = 0x20; + awc->ext_pulse_width_mask = 0xff; + awc->reset_ctrl_reg = SCU_RESET_CONTROL1; +} + +static const TypeInfo aspeed_2400_wdt_info = { + .name = TYPE_ASPEED_2400_WDT, + .parent = TYPE_ASPEED_WDT, + .instance_size = sizeof(AspeedWDTState), + .class_init = aspeed_2400_wdt_class_init, +}; + +static void aspeed_2500_wdt_reset_pulse(AspeedWDTState *s, uint32_t property) +{ + if (property) { + if (property == WDT_ACTIVE_HIGH_MAGIC) { + s->regs[WDT_RESET_WIDTH] |= WDT_RESET_WIDTH_ACTIVE_HIGH; + } else if (property == WDT_ACTIVE_LOW_MAGIC) { + s->regs[WDT_RESET_WIDTH] &= ~WDT_RESET_WIDTH_ACTIVE_HIGH; + } else if (property == WDT_PUSH_PULL_MAGIC) { + s->regs[WDT_RESET_WIDTH] |= WDT_RESET_WIDTH_PUSH_PULL; + } else if (property == WDT_OPEN_DRAIN_MAGIC) { + s->regs[WDT_RESET_WIDTH] &= ~WDT_RESET_WIDTH_PUSH_PULL; + } + } +} + +static void aspeed_2500_wdt_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedWDTClass *awc = ASPEED_WDT_CLASS(klass); + + dc->desc = "ASPEED 2500 Watchdog Controller"; + awc->offset = 0x20; + awc->ext_pulse_width_mask = 0xfffff; + awc->reset_ctrl_reg = SCU_RESET_CONTROL1; + awc->reset_pulse = aspeed_2500_wdt_reset_pulse; +} + +static const TypeInfo aspeed_2500_wdt_info = { + .name = TYPE_ASPEED_2500_WDT, + .parent = TYPE_ASPEED_WDT, + .instance_size = sizeof(AspeedWDTState), + .class_init = aspeed_2500_wdt_class_init, }; static void wdt_aspeed_register_types(void) { watchdog_add_model(&model); type_register_static(&aspeed_wdt_info); + type_register_static(&aspeed_2400_wdt_info); + type_register_static(&aspeed_2500_wdt_info); } type_init(wdt_aspeed_register_types) -- 2.20.1
From: Joel Stanley <joel@jms.id.au> The AST2600 has four watchdogs, and they each have a 0x40 of registers. When running as part of an ast2600 system we must check a different offset for the system reset control register in the SCU. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-id: 20190925143248.10000-12-clg@kaod.org [clg: - reworked model integration into new object class ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/aspeed_soc.h | 2 +- include/hw/watchdog/wdt_aspeed.h | 1 + hw/watchdog/wdt_aspeed.c | 29 +++++++++++++++++++++++++++++ 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/aspeed_soc.h +++ b/include/hw/arm/aspeed_soc.h @@ -XXX,XX +XXX,XX @@ #include "hw/sd/aspeed_sdhci.h" #define ASPEED_SPIS_NUM 2 -#define ASPEED_WDTS_NUM 3 +#define ASPEED_WDTS_NUM 4 #define ASPEED_CPUS_NUM 2 #define ASPEED_MACS_NUM 2 diff --git a/include/hw/watchdog/wdt_aspeed.h b/include/hw/watchdog/wdt_aspeed.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/watchdog/wdt_aspeed.h +++ b/include/hw/watchdog/wdt_aspeed.h @@ -XXX,XX +XXX,XX @@ OBJECT_CHECK(AspeedWDTState, (obj), TYPE_ASPEED_WDT) #define TYPE_ASPEED_2400_WDT TYPE_ASPEED_WDT "-ast2400" #define TYPE_ASPEED_2500_WDT TYPE_ASPEED_WDT "-ast2500" +#define TYPE_ASPEED_2600_WDT TYPE_ASPEED_WDT "-ast2600" #define ASPEED_WDT_REGS_MAX (0x20 / 4) diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c index XXXXXXX..XXXXXXX 100644 --- a/hw/watchdog/wdt_aspeed.c +++ b/hw/watchdog/wdt_aspeed.c @@ -XXX,XX +XXX,XX @@ #define WDT_DRIVE_TYPE_MASK (0xFF << 24) #define WDT_PUSH_PULL_MAGIC (0xA8 << 24) #define WDT_OPEN_DRAIN_MAGIC (0x8A << 24) +#define WDT_RESET_MASK1 (0x1c / 4) #define WDT_TIMEOUT_STATUS (0x10 / 4) #define WDT_TIMEOUT_CLEAR (0x14 / 4) #define WDT_RESTART_MAGIC 0x4755 +#define AST2600_SCU_RESET_CONTROL1 (0x40 / 4) #define SCU_RESET_CONTROL1 (0x04 / 4) #define SCU_RESET_SDRAM BIT(0) @@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_wdt_read(void *opaque, hwaddr offset, unsigned size) return s->regs[WDT_CTRL]; case WDT_RESET_WIDTH: return s->regs[WDT_RESET_WIDTH]; + case WDT_RESET_MASK1: + return s->regs[WDT_RESET_MASK1]; case WDT_TIMEOUT_STATUS: case WDT_TIMEOUT_CLEAR: qemu_log_mask(LOG_UNIMP, @@ -XXX,XX +XXX,XX @@ static void aspeed_wdt_write(void *opaque, hwaddr offset, uint64_t data, s->regs[WDT_RESET_WIDTH] |= data & awc->ext_pulse_width_mask; break; + case WDT_RESET_MASK1: + /* TODO: implement */ + s->regs[WDT_RESET_MASK1] = data; + break; + case WDT_TIMEOUT_STATUS: case WDT_TIMEOUT_CLEAR: qemu_log_mask(LOG_UNIMP, @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_2500_wdt_info = { .class_init = aspeed_2500_wdt_class_init, }; +static void aspeed_2600_wdt_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedWDTClass *awc = ASPEED_WDT_CLASS(klass); + + dc->desc = "ASPEED 2600 Watchdog Controller"; + awc->offset = 0x40; + awc->ext_pulse_width_mask = 0xfffff; /* TODO */ + awc->reset_ctrl_reg = AST2600_SCU_RESET_CONTROL1; + awc->reset_pulse = aspeed_2500_wdt_reset_pulse; +} + +static const TypeInfo aspeed_2600_wdt_info = { + .name = TYPE_ASPEED_2600_WDT, + .parent = TYPE_ASPEED_WDT, + .instance_size = sizeof(AspeedWDTState), + .class_init = aspeed_2600_wdt_class_init, +}; + static void wdt_aspeed_register_types(void) { watchdog_add_model(&model); type_register_static(&aspeed_wdt_info); type_register_static(&aspeed_2400_wdt_info); type_register_static(&aspeed_2500_wdt_info); + type_register_static(&aspeed_2600_wdt_info); } type_init(wdt_aspeed_register_types) -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> AST2600 will use a different encoding for the addresses defined in the Segment Register. Signed-off-by: Cédric Le Goater <clg@kaod.org> Acked-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-13-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/ssi/aspeed_smc.h | 4 ++++ hw/ssi/aspeed_smc.c | 45 ++++++++++++++++++++++++------------- 2 files changed, 34 insertions(+), 15 deletions(-) diff --git a/include/hw/ssi/aspeed_smc.h b/include/hw/ssi/aspeed_smc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/ssi/aspeed_smc.h +++ b/include/hw/ssi/aspeed_smc.h @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSMCController { hwaddr dma_flash_mask; hwaddr dma_dram_mask; uint32_t nregs; + uint32_t (*segment_to_reg)(const struct AspeedSMCState *s, + const AspeedSegments *seg); + void (*reg_to_segment)(const struct AspeedSMCState *s, uint32_t reg, + AspeedSegments *seg); } AspeedSMCController; typedef struct AspeedSMCFlash { diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/ssi/aspeed_smc.c +++ b/hw/ssi/aspeed_smc.c @@ -XXX,XX +XXX,XX @@ static const AspeedSegments aspeed_segments_ast2500_spi2[] = { { 0x38000000, 32 * 1024 * 1024 }, /* start address is readonly */ { 0x3A000000, 96 * 1024 * 1024 }, /* end address is readonly */ }; +static uint32_t aspeed_smc_segment_to_reg(const AspeedSMCState *s, + const AspeedSegments *seg); +static void aspeed_smc_reg_to_segment(const AspeedSMCState *s, uint32_t reg, + AspeedSegments *seg); static const AspeedSMCController controllers[] = { { @@ -XXX,XX +XXX,XX @@ static const AspeedSMCController controllers[] = { .flash_window_size = 0x6000000, .has_dma = false, .nregs = ASPEED_SMC_R_SMC_MAX, + .segment_to_reg = aspeed_smc_segment_to_reg, + .reg_to_segment = aspeed_smc_reg_to_segment, }, { .name = "aspeed.fmc-ast2400", .r_conf = R_CONF, @@ -XXX,XX +XXX,XX @@ static const AspeedSMCController controllers[] = { .dma_flash_mask = 0x0FFFFFFC, .dma_dram_mask = 0x1FFFFFFC, .nregs = ASPEED_SMC_R_MAX, + .segment_to_reg = aspeed_smc_segment_to_reg, + .reg_to_segment = aspeed_smc_reg_to_segment, }, { .name = "aspeed.spi1-ast2400", .r_conf = R_SPI_CONF, @@ -XXX,XX +XXX,XX @@ static const AspeedSMCController controllers[] = { .flash_window_size = 0x10000000, .has_dma = false, .nregs = ASPEED_SMC_R_SPI_MAX, + .segment_to_reg = aspeed_smc_segment_to_reg, + .reg_to_segment = aspeed_smc_reg_to_segment, }, { .name = "aspeed.fmc-ast2500", .r_conf = R_CONF, @@ -XXX,XX +XXX,XX @@ static const AspeedSMCController controllers[] = { .dma_flash_mask = 0x0FFFFFFC, .dma_dram_mask = 0x3FFFFFFC, .nregs = ASPEED_SMC_R_MAX, + .segment_to_reg = aspeed_smc_segment_to_reg, + .reg_to_segment = aspeed_smc_reg_to_segment, }, { .name = "aspeed.spi1-ast2500", .r_conf = R_CONF, @@ -XXX,XX +XXX,XX @@ static const AspeedSMCController controllers[] = { .flash_window_size = 0x8000000, .has_dma = false, .nregs = ASPEED_SMC_R_MAX, + .segment_to_reg = aspeed_smc_segment_to_reg, + .reg_to_segment = aspeed_smc_reg_to_segment, }, { .name = "aspeed.spi2-ast2500", .r_conf = R_CONF, @@ -XXX,XX +XXX,XX @@ static const AspeedSMCController controllers[] = { .flash_window_size = 0x8000000, .has_dma = false, .nregs = ASPEED_SMC_R_MAX, + .segment_to_reg = aspeed_smc_segment_to_reg, + .reg_to_segment = aspeed_smc_reg_to_segment, }, }; /* - * The Segment Register uses a 8MB unit to encode the start address - * and the end address of the mapping window of a flash SPI slave : - * - * | byte 1 | byte 2 | byte 3 | byte 4 | - * +--------+--------+--------+--------+ - * | end | start | 0 | 0 | - * + * The Segment Registers of the AST2400 and AST2500 have a 8MB + * unit. The address range of a flash SPI slave is encoded with + * absolute addresses which should be part of the overall controller + * window. */ -static inline uint32_t aspeed_smc_segment_to_reg(const AspeedSegments *seg) +static uint32_t aspeed_smc_segment_to_reg(const AspeedSMCState *s, + const AspeedSegments *seg) { uint32_t reg = 0; reg |= ((seg->addr >> 23) & SEG_START_MASK) << SEG_START_SHIFT; @@ -XXX,XX +XXX,XX @@ static inline uint32_t aspeed_smc_segment_to_reg(const AspeedSegments *seg) return reg; } -static inline void aspeed_smc_reg_to_segment(uint32_t reg, AspeedSegments *seg) +static void aspeed_smc_reg_to_segment(const AspeedSMCState *s, + uint32_t reg, AspeedSegments *seg) { seg->addr = ((reg >> SEG_START_SHIFT) & SEG_START_MASK) << 23; seg->size = (((reg >> SEG_END_SHIFT) & SEG_END_MASK) << 23) - seg->addr; @@ -XXX,XX +XXX,XX @@ static bool aspeed_smc_flash_overlap(const AspeedSMCState *s, continue; } - aspeed_smc_reg_to_segment(s->regs[R_SEG_ADDR0 + i], &seg); + s->ctrl->reg_to_segment(s, s->regs[R_SEG_ADDR0 + i], &seg); if (new->addr + new->size > seg.addr && new->addr < seg.addr + seg.size) { @@ -XXX,XX +XXX,XX @@ static void aspeed_smc_flash_set_segment(AspeedSMCState *s, int cs, AspeedSMCFlash *fl = &s->flashes[cs]; AspeedSegments seg; - aspeed_smc_reg_to_segment(new, &seg); + s->ctrl->reg_to_segment(s, new, &seg); /* The start address of CS0 is read-only */ if (cs == 0 && seg.addr != s->ctrl->flash_window_base) { @@ -XXX,XX +XXX,XX @@ static void aspeed_smc_flash_set_segment(AspeedSMCState *s, int cs, "%s: Tried to change CS0 start address to 0x%" HWADDR_PRIx "\n", s->ctrl->name, seg.addr); seg.addr = s->ctrl->flash_window_base; - new = aspeed_smc_segment_to_reg(&seg); + new = s->ctrl->segment_to_reg(s, &seg); } /* @@ -XXX,XX +XXX,XX @@ static void aspeed_smc_flash_set_segment(AspeedSMCState *s, int cs, HWADDR_PRIx "\n", s->ctrl->name, cs, seg.addr + seg.size); seg.size = s->ctrl->segments[cs].addr + s->ctrl->segments[cs].size - seg.addr; - new = aspeed_smc_segment_to_reg(&seg); + new = s->ctrl->segment_to_reg(s, &seg); } /* Keep the segment in the overall flash window */ @@ -XXX,XX +XXX,XX @@ static uint32_t aspeed_smc_check_segment_addr(const AspeedSMCFlash *fl, const AspeedSMCState *s = fl->controller; AspeedSegments seg; - aspeed_smc_reg_to_segment(s->regs[R_SEG_ADDR0 + fl->id], &seg); + s->ctrl->reg_to_segment(s, s->regs[R_SEG_ADDR0 + fl->id], &seg); if ((addr % seg.size) != addr) { qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid address 0x%08x for CS%d segment : " @@ -XXX,XX +XXX,XX @@ static void aspeed_smc_reset(DeviceState *d) /* setup default segment register values for all */ for (i = 0; i < s->ctrl->max_slaves; ++i) { s->regs[R_SEG_ADDR0 + i] = - aspeed_smc_segment_to_reg(&s->ctrl->segments[i]); + s->ctrl->segment_to_reg(s, &s->ctrl->segments[i]); } /* HW strapping flash type for FMC controllers */ -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> The AST2600 SoC SMC controller is a SPI only controller now and has a few extensions which we will need to take into account when SW requires it. This is enough to support u-boot and Linux. Signed-off-by: Cédric Le Goater <clg@kaod.org> Acked-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-14-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/ssi/aspeed_smc.c | 132 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 128 insertions(+), 4 deletions(-) diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/ssi/aspeed_smc.c +++ b/hw/ssi/aspeed_smc.c @@ -XXX,XX +XXX,XX @@ #include "qemu/error-report.h" #include "qapi/error.h" #include "exec/address-spaces.h" +#include "qemu/units.h" #include "hw/irq.h" #include "hw/qdev-properties.h" @@ -XXX,XX +XXX,XX @@ #define CONF_FLASH_TYPE0 0 #define CONF_FLASH_TYPE_NOR 0x0 #define CONF_FLASH_TYPE_NAND 0x1 -#define CONF_FLASH_TYPE_SPI 0x2 +#define CONF_FLASH_TYPE_SPI 0x2 /* AST2600 is SPI only */ /* CE Control Register */ #define R_CE_CTRL (0x04 / 4) @@ -XXX,XX +XXX,XX @@ /* CEx Control Register */ #define R_CTRL0 (0x10 / 4) +#define CTRL_IO_QPI (1 << 31) +#define CTRL_IO_QUAD_DATA (1 << 30) #define CTRL_IO_DUAL_DATA (1 << 29) #define CTRL_IO_DUAL_ADDR_DATA (1 << 28) /* Includes dummies */ +#define CTRL_IO_QUAD_ADDR_DATA (1 << 28) /* Includes dummies */ #define CTRL_CMD_SHIFT 16 #define CTRL_CMD_MASK 0xff #define CTRL_DUMMY_HIGH_SHIFT 14 @@ -XXX,XX +XXX,XX @@ /* Misc Control Register #2 */ #define R_TIMINGS (0x94 / 4) -/* SPI controller registers and bits */ +/* SPI controller registers and bits (AST2400) */ #define R_SPI_CONF (0x00 / 4) #define SPI_CONF_ENABLE_W0 0 #define R_SPI_CTRL0 (0x4 / 4) @@ -XXX,XX +XXX,XX @@ static uint32_t aspeed_smc_segment_to_reg(const AspeedSMCState *s, static void aspeed_smc_reg_to_segment(const AspeedSMCState *s, uint32_t reg, AspeedSegments *seg); +/* + * AST2600 definitions + */ +#define ASPEED26_SOC_FMC_FLASH_BASE 0x20000000 +#define ASPEED26_SOC_SPI_FLASH_BASE 0x30000000 +#define ASPEED26_SOC_SPI2_FLASH_BASE 0x50000000 + +static const AspeedSegments aspeed_segments_ast2600_fmc[] = { + { 0x0, 128 * MiB }, /* start address is readonly */ + { 0x0, 0 }, /* disabled */ + { 0x0, 0 }, /* disabled */ +}; + +static const AspeedSegments aspeed_segments_ast2600_spi1[] = { + { 0x0, 128 * MiB }, /* start address is readonly */ + { 0x0, 0 }, /* disabled */ +}; + +static const AspeedSegments aspeed_segments_ast2600_spi2[] = { + { 0x0, 128 * MiB }, /* start address is readonly */ + { 0x0, 0 }, /* disabled */ + { 0x0, 0 }, /* disabled */ +}; + +static uint32_t aspeed_2600_smc_segment_to_reg(const AspeedSMCState *s, + const AspeedSegments *seg); +static void aspeed_2600_smc_reg_to_segment(const AspeedSMCState *s, + uint32_t reg, AspeedSegments *seg); + static const AspeedSMCController controllers[] = { { .name = "aspeed.smc-ast2400", @@ -XXX,XX +XXX,XX @@ static const AspeedSMCController controllers[] = { .nregs = ASPEED_SMC_R_MAX, .segment_to_reg = aspeed_smc_segment_to_reg, .reg_to_segment = aspeed_smc_reg_to_segment, + }, { + .name = "aspeed.fmc-ast2600", + .r_conf = R_CONF, + .r_ce_ctrl = R_CE_CTRL, + .r_ctrl0 = R_CTRL0, + .r_timings = R_TIMINGS, + .conf_enable_w0 = CONF_ENABLE_W0, + .max_slaves = 3, + .segments = aspeed_segments_ast2600_fmc, + .flash_window_base = ASPEED26_SOC_FMC_FLASH_BASE, + .flash_window_size = 0x10000000, + .has_dma = true, + .nregs = ASPEED_SMC_R_MAX, + .segment_to_reg = aspeed_2600_smc_segment_to_reg, + .reg_to_segment = aspeed_2600_smc_reg_to_segment, + }, { + .name = "aspeed.spi1-ast2600", + .r_conf = R_CONF, + .r_ce_ctrl = R_CE_CTRL, + .r_ctrl0 = R_CTRL0, + .r_timings = R_TIMINGS, + .conf_enable_w0 = CONF_ENABLE_W0, + .max_slaves = 2, + .segments = aspeed_segments_ast2600_spi1, + .flash_window_base = ASPEED26_SOC_SPI_FLASH_BASE, + .flash_window_size = 0x10000000, + .has_dma = false, + .nregs = ASPEED_SMC_R_MAX, + .segment_to_reg = aspeed_2600_smc_segment_to_reg, + .reg_to_segment = aspeed_2600_smc_reg_to_segment, + }, { + .name = "aspeed.spi2-ast2600", + .r_conf = R_CONF, + .r_ce_ctrl = R_CE_CTRL, + .r_ctrl0 = R_CTRL0, + .r_timings = R_TIMINGS, + .conf_enable_w0 = CONF_ENABLE_W0, + .max_slaves = 3, + .segments = aspeed_segments_ast2600_spi2, + .flash_window_base = ASPEED26_SOC_SPI2_FLASH_BASE, + .flash_window_size = 0x10000000, + .has_dma = false, + .nregs = ASPEED_SMC_R_MAX, + .segment_to_reg = aspeed_2600_smc_segment_to_reg, + .reg_to_segment = aspeed_2600_smc_reg_to_segment, }, }; @@ -XXX,XX +XXX,XX @@ static void aspeed_smc_reg_to_segment(const AspeedSMCState *s, seg->size = (((reg >> SEG_END_SHIFT) & SEG_END_MASK) << 23) - seg->addr; } +/* + * The Segment Registers of the AST2600 have a 1MB unit. The address + * range of a flash SPI slave is encoded with offsets in the overall + * controller window. The previous SoC AST2400 and AST2500 used + * absolute addresses. Only bits [27:20] are relevant and the end + * address is an upper bound limit. + */ +#define AST2600_SEG_ADDR_MASK 0x0ff00000 + +static uint32_t aspeed_2600_smc_segment_to_reg(const AspeedSMCState *s, + const AspeedSegments *seg) +{ + uint32_t reg = 0; + + /* Disabled segments have a nil register */ + if (!seg->size) { + return 0; + } + + reg |= (seg->addr & AST2600_SEG_ADDR_MASK) >> 16; /* start offset */ + reg |= (seg->addr + seg->size - 1) & AST2600_SEG_ADDR_MASK; /* end offset */ + return reg; +} + +static void aspeed_2600_smc_reg_to_segment(const AspeedSMCState *s, + uint32_t reg, AspeedSegments *seg) +{ + uint32_t start_offset = (reg << 16) & AST2600_SEG_ADDR_MASK; + uint32_t end_offset = reg & AST2600_SEG_ADDR_MASK; + + seg->addr = s->ctrl->flash_window_base + start_offset; + seg->size = end_offset + MiB - start_offset; +} + static bool aspeed_smc_flash_overlap(const AspeedSMCState *s, const AspeedSegments *new, int cs) @@ -XXX,XX +XXX,XX @@ static inline int aspeed_smc_flash_cmd(const AspeedSMCFlash *fl) const AspeedSMCState *s = fl->controller; int cmd = (s->regs[s->r_ctrl0 + fl->id] >> CTRL_CMD_SHIFT) & CTRL_CMD_MASK; - /* In read mode, the default SPI command is READ (0x3). In other - * modes, the command should necessarily be defined */ + /* + * In read mode, the default SPI command is READ (0x3). In other + * modes, the command should necessarily be defined + * + * TODO: add support for READ4 (0x13) on AST2600 + */ if (aspeed_smc_flash_mode(fl) == CTRL_READMODE) { cmd = SPI_OP_READ; } @@ -XXX,XX +XXX,XX @@ static void aspeed_smc_reset(DeviceState *d) s->ctrl->segment_to_reg(s, &s->ctrl->segments[i]); } + /* HW strapping flash type for the AST2600 controllers */ + if (s->ctrl->segments == aspeed_segments_ast2600_fmc) { + /* flash type is fixed to SPI for all */ + s->regs[s->r_conf] |= (CONF_FLASH_TYPE_SPI << CONF_FLASH_TYPE0); + s->regs[s->r_conf] |= (CONF_FLASH_TYPE_SPI << CONF_FLASH_TYPE1); + s->regs[s->r_conf] |= (CONF_FLASH_TYPE_SPI << CONF_FLASH_TYPE2); + } + /* HW strapping flash type for FMC controllers */ if (s->ctrl->segments == aspeed_segments_ast2500_fmc) { /* flash type is fixed to SPI for CE0 and CE1 */ -- 2.20.1
From: Rashmica Gupta <rashmica.g@gmail.com> The AST2600 has the same sets of 3.6v gpios as the AST2400 plus an addtional two sets of 1.8V gpios. Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Acked-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-15-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/gpio/aspeed_gpio.c | 142 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 137 insertions(+), 5 deletions(-) diff --git a/hw/gpio/aspeed_gpio.c b/hw/gpio/aspeed_gpio.c index XXXXXXX..XXXXXXX 100644 --- a/hw/gpio/aspeed_gpio.c +++ b/hw/gpio/aspeed_gpio.c @@ -XXX,XX +XXX,XX @@ #define GPIO_3_6V_MEM_SIZE 0x1F0 #define GPIO_3_6V_REG_ARRAY_SIZE (GPIO_3_6V_MEM_SIZE >> 2) +/* AST2600 only - 1.8V gpios */ +/* + * The AST2600 has same 3.6V gpios as the AST2400 (memory offsets 0x0-0x198) + * and addtional 1.8V gpios (memory offsets 0x800-0x9D4). + */ +#define GPIO_1_8V_REG_OFFSET 0x800 +#define GPIO_1_8V_ABCD_DATA_VALUE ((0x800 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_DIRECTION ((0x804 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_INT_ENABLE ((0x808 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_INT_SENS_0 ((0x80C - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_INT_SENS_1 ((0x810 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_INT_SENS_2 ((0x814 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_INT_STATUS ((0x818 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_RESET_TOLERANT ((0x81C - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_DATA_VALUE ((0x820 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_DIRECTION ((0x824 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_INT_ENABLE ((0x828 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_INT_SENS_0 ((0x82C - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_INT_SENS_1 ((0x830 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_INT_SENS_2 ((0x834 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_INT_STATUS ((0x838 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_RESET_TOLERANT ((0x83C - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_DEBOUNCE_1 ((0x840 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_DEBOUNCE_2 ((0x844 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_DEBOUNCE_1 ((0x848 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_DEBOUNCE_2 ((0x84C - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_DEBOUNCE_TIME_1 ((0x850 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_DEBOUNCE_TIME_2 ((0x854 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_DEBOUNCE_TIME_3 ((0x858 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_COMMAND_SRC_0 ((0x860 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_COMMAND_SRC_1 ((0x864 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_COMMAND_SRC_0 ((0x868 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_COMMAND_SRC_1 ((0x86C - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_DATA_READ ((0x8C0 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_DATA_READ ((0x8C4 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_ABCD_INPUT_MASK ((0x9D0 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_E_INPUT_MASK ((0x9D4 - GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_1_8V_MEM_SIZE 0x9D8 +#define GPIO_1_8V_REG_ARRAY_SIZE ((GPIO_1_8V_MEM_SIZE - \ + GPIO_1_8V_REG_OFFSET) >> 2) +#define GPIO_MAX_MEM_SIZE MAX(GPIO_3_6V_MEM_SIZE, GPIO_1_8V_MEM_SIZE) + static int aspeed_evaluate_irq(GPIOSets *regs, int gpio_prev_high, int gpio) { uint32_t falling_edge = 0, rising_edge = 0; @@ -XXX,XX +XXX,XX @@ static const AspeedGPIOReg aspeed_3_6v_gpios[GPIO_3_6V_REG_ARRAY_SIZE] = { [GPIO_AC_INPUT_MASK] = { 7, gpio_reg_input_mask }, }; +static const AspeedGPIOReg aspeed_1_8v_gpios[GPIO_1_8V_REG_ARRAY_SIZE] = { + /* 1.8V Set ABCD */ + [GPIO_1_8V_ABCD_DATA_VALUE] = {0, gpio_reg_data_value}, + [GPIO_1_8V_ABCD_DIRECTION] = {0, gpio_reg_direction}, + [GPIO_1_8V_ABCD_INT_ENABLE] = {0, gpio_reg_int_enable}, + [GPIO_1_8V_ABCD_INT_SENS_0] = {0, gpio_reg_int_sens_0}, + [GPIO_1_8V_ABCD_INT_SENS_1] = {0, gpio_reg_int_sens_1}, + [GPIO_1_8V_ABCD_INT_SENS_2] = {0, gpio_reg_int_sens_2}, + [GPIO_1_8V_ABCD_INT_STATUS] = {0, gpio_reg_int_status}, + [GPIO_1_8V_ABCD_RESET_TOLERANT] = {0, gpio_reg_reset_tolerant}, + [GPIO_1_8V_ABCD_DEBOUNCE_1] = {0, gpio_reg_debounce_1}, + [GPIO_1_8V_ABCD_DEBOUNCE_2] = {0, gpio_reg_debounce_2}, + [GPIO_1_8V_ABCD_COMMAND_SRC_0] = {0, gpio_reg_cmd_source_0}, + [GPIO_1_8V_ABCD_COMMAND_SRC_1] = {0, gpio_reg_cmd_source_1}, + [GPIO_1_8V_ABCD_DATA_READ] = {0, gpio_reg_data_read}, + [GPIO_1_8V_ABCD_INPUT_MASK] = {0, gpio_reg_input_mask}, + /* 1.8V Set E */ + [GPIO_1_8V_E_DATA_VALUE] = {1, gpio_reg_data_value}, + [GPIO_1_8V_E_DIRECTION] = {1, gpio_reg_direction}, + [GPIO_1_8V_E_INT_ENABLE] = {1, gpio_reg_int_enable}, + [GPIO_1_8V_E_INT_SENS_0] = {1, gpio_reg_int_sens_0}, + [GPIO_1_8V_E_INT_SENS_1] = {1, gpio_reg_int_sens_1}, + [GPIO_1_8V_E_INT_SENS_2] = {1, gpio_reg_int_sens_2}, + [GPIO_1_8V_E_INT_STATUS] = {1, gpio_reg_int_status}, + [GPIO_1_8V_E_RESET_TOLERANT] = {1, gpio_reg_reset_tolerant}, + [GPIO_1_8V_E_DEBOUNCE_1] = {1, gpio_reg_debounce_1}, + [GPIO_1_8V_E_DEBOUNCE_2] = {1, gpio_reg_debounce_2}, + [GPIO_1_8V_E_COMMAND_SRC_0] = {1, gpio_reg_cmd_source_0}, + [GPIO_1_8V_E_COMMAND_SRC_1] = {1, gpio_reg_cmd_source_1}, + [GPIO_1_8V_E_DATA_READ] = {1, gpio_reg_data_read}, + [GPIO_1_8V_E_INPUT_MASK] = {1, gpio_reg_input_mask}, +}; + static uint64_t aspeed_gpio_read(void *opaque, hwaddr offset, uint32_t size) { AspeedGPIOState *s = ASPEED_GPIO(opaque); @@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_get_pin(Object *obj, Visitor *v, const char *name, int set_idx, group_idx = 0; if (sscanf(name, "gpio%2[A-Z]%1d", group, &pin) != 2) { - error_setg(errp, "%s: error reading %s", __func__, name); - return; + /* 1.8V gpio */ + if (sscanf(name, "gpio%3s%1d", group, &pin) != 2) { + error_setg(errp, "%s: error reading %s", __func__, name); + return; + } } set_idx = get_set_idx(s, group, &group_idx); if (set_idx == -1) { @@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_set_pin(Object *obj, Visitor *v, const char *name, return; } if (sscanf(name, "gpio%2[A-Z]%1d", group, &pin) != 2) { - error_setg(errp, "%s: error reading %s", __func__, name); - return; + /* 1.8V gpio */ + if (sscanf(name, "gpio%3s%1d", group, &pin) != 2) { + error_setg(errp, "%s: error reading %s", __func__, name); + return; + } } set_idx = get_set_idx(s, group, &group_idx); if (set_idx == -1) { @@ -XXX,XX +XXX,XX @@ static const GPIOSetProperties ast2500_set_props[] = { [7] = {0x000000ff, 0x000000ff, {"AC"} }, }; +static GPIOSetProperties ast2600_3_6v_set_props[] = { + [0] = {0xffffffff, 0xffffffff, {"A", "B", "C", "D"} }, + [1] = {0xffffffff, 0xffffffff, {"E", "F", "G", "H"} }, + [2] = {0xffffffff, 0xffffffff, {"I", "J", "K", "L"} }, + [3] = {0xffffffff, 0xffffffff, {"M", "N", "O", "P"} }, + [4] = {0xffffffff, 0xffffffff, {"Q", "R", "S", "T"} }, + [5] = {0xffffffff, 0x0000ffff, {"U", "V", "W", "X"} }, + [6] = {0xffff0000, 0x0fff0000, {"Y", "Z", "", ""} }, +}; + +static GPIOSetProperties ast2600_1_8v_set_props[] = { + [0] = {0xffffffff, 0xffffffff, {"18A", "18B", "18C", "18D"} }, + [1] = {0x0000000f, 0x0000000f, {"18E"} }, +}; + static const MemoryRegionOps aspeed_gpio_ops = { .read = aspeed_gpio_read, .write = aspeed_gpio_write, @@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_realize(DeviceState *dev, Error **errp) } memory_region_init_io(&s->iomem, OBJECT(s), &aspeed_gpio_ops, s, - TYPE_ASPEED_GPIO, GPIO_3_6V_MEM_SIZE); + TYPE_ASPEED_GPIO, GPIO_MAX_MEM_SIZE); sysbus_init_mmio(sbd, &s->iomem); } @@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_2500_class_init(ObjectClass *klass, void *data) agc->reg_table = aspeed_3_6v_gpios; } +static void aspeed_gpio_ast2600_3_6v_class_init(ObjectClass *klass, void *data) +{ + AspeedGPIOClass *agc = ASPEED_GPIO_CLASS(klass); + + agc->props = ast2600_3_6v_set_props; + agc->nr_gpio_pins = 208; + agc->nr_gpio_sets = 7; + agc->reg_table = aspeed_3_6v_gpios; +} + +static void aspeed_gpio_ast2600_1_8v_class_init(ObjectClass *klass, void *data) +{ + AspeedGPIOClass *agc = ASPEED_GPIO_CLASS(klass); + + agc->props = ast2600_1_8v_set_props; + agc->nr_gpio_pins = 36; + agc->nr_gpio_sets = 2; + agc->reg_table = aspeed_1_8v_gpios; +} + static const TypeInfo aspeed_gpio_info = { .name = TYPE_ASPEED_GPIO, .parent = TYPE_SYS_BUS_DEVICE, @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_gpio_ast2500_info = { .instance_init = aspeed_gpio_init, }; +static const TypeInfo aspeed_gpio_ast2600_3_6v_info = { + .name = TYPE_ASPEED_GPIO "-ast2600", + .parent = TYPE_ASPEED_GPIO, + .class_init = aspeed_gpio_ast2600_3_6v_class_init, + .instance_init = aspeed_gpio_init, +}; + +static const TypeInfo aspeed_gpio_ast2600_1_8v_info = { + .name = TYPE_ASPEED_GPIO "-ast2600-1_8v", + .parent = TYPE_ASPEED_GPIO, + .class_init = aspeed_gpio_ast2600_1_8v_class_init, + .instance_init = aspeed_gpio_init, +}; + static void aspeed_gpio_register_types(void) { type_register_static(&aspeed_gpio_info); type_register_static(&aspeed_gpio_ast2400_info); type_register_static(&aspeed_gpio_ast2500_info); + type_register_static(&aspeed_gpio_ast2600_3_6v_info); + type_register_static(&aspeed_gpio_ast2600_1_8v_info); } type_init(aspeed_gpio_register_types); -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> It prepares ground for register differences between SoCs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-16-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/i2c/aspeed_i2c.h | 15 ++++++++++ hw/arm/aspeed_soc.c | 3 +- hw/i2c/aspeed_i2c.c | 60 ++++++++++++++++++++++++++++++++----- 3 files changed, 69 insertions(+), 9 deletions(-) diff --git a/include/hw/i2c/aspeed_i2c.h b/include/hw/i2c/aspeed_i2c.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/i2c/aspeed_i2c.h +++ b/include/hw/i2c/aspeed_i2c.h @@ -XXX,XX +XXX,XX @@ #include "hw/sysbus.h" #define TYPE_ASPEED_I2C "aspeed.i2c" +#define TYPE_ASPEED_2400_I2C TYPE_ASPEED_I2C "-ast2400" +#define TYPE_ASPEED_2500_I2C TYPE_ASPEED_I2C "-ast2500" #define ASPEED_I2C(obj) \ OBJECT_CHECK(AspeedI2CState, (obj), TYPE_ASPEED_I2C) @@ -XXX,XX +XXX,XX @@ typedef struct AspeedI2CState { AspeedI2CBus busses[ASPEED_I2C_NR_BUSSES]; } AspeedI2CState; +#define ASPEED_I2C_CLASS(klass) \ + OBJECT_CLASS_CHECK(AspeedI2CClass, (klass), TYPE_ASPEED_I2C) +#define ASPEED_I2C_GET_CLASS(obj) \ + OBJECT_GET_CLASS(AspeedI2CClass, (obj), TYPE_ASPEED_I2C) + +typedef struct AspeedI2CClass { + SysBusDeviceClass parent_class; + + uint8_t num_busses; + uint8_t reg_size; + uint8_t gap; +} AspeedI2CClass; + I2CBus *aspeed_i2c_get_bus(DeviceState *dev, int busnr); #endif /* ASPEED_I2C_H */ diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_soc.c +++ b/hw/arm/aspeed_soc.c @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) object_property_add_const_link(OBJECT(&s->timerctrl), "scu", OBJECT(&s->scu), &error_abort); + snprintf(typename, sizeof(typename), "aspeed.i2c-%s", socname); sysbus_init_child_obj(obj, "i2c", OBJECT(&s->i2c), sizeof(s->i2c), - TYPE_ASPEED_I2C); + typename); snprintf(typename, sizeof(typename), "aspeed.fmc-%s", socname); sysbus_init_child_obj(obj, "fmc", OBJECT(&s->fmc), sizeof(s->fmc), diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c index XXXXXXX..XXXXXXX 100644 --- a/hw/i2c/aspeed_i2c.c +++ b/hw/i2c/aspeed_i2c.c @@ -XXX,XX +XXX,XX @@ static void aspeed_i2c_reset(DeviceState *dev) { int i; AspeedI2CState *s = ASPEED_I2C(dev); + AspeedI2CClass *aic = ASPEED_I2C_GET_CLASS(s); s->intr_status = 0; - for (i = 0; i < ASPEED_I2C_NR_BUSSES; i++) { + for (i = 0; i < aic->num_busses; i++) { s->busses[i].intr_ctrl = 0; s->busses[i].intr_status = 0; s->busses[i].cmd = 0; @@ -XXX,XX +XXX,XX @@ static void aspeed_i2c_reset(DeviceState *dev) } /* - * Address Definitions + * Address Definitions (AST2400 and AST2500) * * 0x000 ... 0x03F: Global Register * 0x040 ... 0x07F: Device 1 @@ -XXX,XX +XXX,XX @@ static void aspeed_i2c_realize(DeviceState *dev, Error **errp) int i; SysBusDevice *sbd = SYS_BUS_DEVICE(dev); AspeedI2CState *s = ASPEED_I2C(dev); + AspeedI2CClass *aic = ASPEED_I2C_GET_CLASS(s); sysbus_init_irq(sbd, &s->irq); memory_region_init_io(&s->iomem, OBJECT(s), &aspeed_i2c_ctrl_ops, s, "aspeed.i2c", 0x1000); sysbus_init_mmio(sbd, &s->iomem); - for (i = 0; i < ASPEED_I2C_NR_BUSSES; i++) { - char name[16]; - int offset = i < 7 ? 1 : 5; + for (i = 0; i < aic->num_busses; i++) { + char name[32]; + int offset = i < aic->gap ? 1 : 5; snprintf(name, sizeof(name), "aspeed.i2c.%d", i); s->busses[i].controller = s; s->busses[i].id = i; s->busses[i].bus = i2c_init_bus(dev, name); memory_region_init_io(&s->busses[i].mr, OBJECT(dev), - &aspeed_i2c_bus_ops, &s->busses[i], name, 0x40); - memory_region_add_subregion(&s->iomem, 0x40 * (i + offset), + &aspeed_i2c_bus_ops, &s->busses[i], name, + aic->reg_size); + memory_region_add_subregion(&s->iomem, aic->reg_size * (i + offset), &s->busses[i].mr); } } @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_i2c_info = { .parent = TYPE_SYS_BUS_DEVICE, .instance_size = sizeof(AspeedI2CState), .class_init = aspeed_i2c_class_init, + .class_size = sizeof(AspeedI2CClass), + .abstract = true, +}; + +static void aspeed_2400_i2c_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedI2CClass *aic = ASPEED_I2C_CLASS(klass); + + dc->desc = "ASPEED 2400 I2C Controller"; + + aic->num_busses = 14; + aic->reg_size = 0x40; + aic->gap = 7; +} + +static const TypeInfo aspeed_2400_i2c_info = { + .name = TYPE_ASPEED_2400_I2C, + .parent = TYPE_ASPEED_I2C, + .class_init = aspeed_2400_i2c_class_init, +}; + +static void aspeed_2500_i2c_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedI2CClass *aic = ASPEED_I2C_CLASS(klass); + + dc->desc = "ASPEED 2500 I2C Controller"; + + aic->num_busses = 14; + aic->reg_size = 0x40; + aic->gap = 7; +} + +static const TypeInfo aspeed_2500_i2c_info = { + .name = TYPE_ASPEED_2500_I2C, + .parent = TYPE_ASPEED_I2C, + .class_init = aspeed_2500_i2c_class_init, }; static void aspeed_i2c_register_types(void) { type_register_static(&aspeed_i2c_info); + type_register_static(&aspeed_2400_i2c_info); + type_register_static(&aspeed_2500_i2c_info); } type_init(aspeed_i2c_register_types) @@ -XXX,XX +XXX,XX @@ type_init(aspeed_i2c_register_types) I2CBus *aspeed_i2c_get_bus(DeviceState *dev, int busnr) { AspeedI2CState *s = ASPEED_I2C(dev); + AspeedI2CClass *aic = ASPEED_I2C_GET_CLASS(s); I2CBus *bus = NULL; - if (busnr >= 0 && busnr < ASPEED_I2C_NR_BUSSES) { + if (busnr >= 0 && busnr < aic->num_busses) { bus = s->busses[busnr].bus; } -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> The I2C controller of the AST2400 and AST2500 SoCs have one IRQ shared by all I2C busses. The AST2600 SoC I2C controller has one IRQ per bus and 16 busses. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-17-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/i2c/aspeed_i2c.h | 5 +++- hw/i2c/aspeed_i2c.c | 46 +++++++++++++++++++++++++++++++++++-- 2 files changed, 48 insertions(+), 3 deletions(-) diff --git a/include/hw/i2c/aspeed_i2c.h b/include/hw/i2c/aspeed_i2c.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/i2c/aspeed_i2c.h +++ b/include/hw/i2c/aspeed_i2c.h @@ -XXX,XX +XXX,XX @@ #define TYPE_ASPEED_I2C "aspeed.i2c" #define TYPE_ASPEED_2400_I2C TYPE_ASPEED_I2C "-ast2400" #define TYPE_ASPEED_2500_I2C TYPE_ASPEED_I2C "-ast2500" +#define TYPE_ASPEED_2600_I2C TYPE_ASPEED_I2C "-ast2600" #define ASPEED_I2C(obj) \ OBJECT_CHECK(AspeedI2CState, (obj), TYPE_ASPEED_I2C) -#define ASPEED_I2C_NR_BUSSES 14 +#define ASPEED_I2C_NR_BUSSES 16 struct AspeedI2CState; @@ -XXX,XX +XXX,XX @@ typedef struct AspeedI2CBus { I2CBus *bus; uint8_t id; + qemu_irq irq; uint32_t ctrl; uint32_t timing[2]; @@ -XXX,XX +XXX,XX @@ typedef struct AspeedI2CClass { uint8_t num_busses; uint8_t reg_size; uint8_t gap; + qemu_irq (*bus_get_irq)(AspeedI2CBus *); } AspeedI2CClass; I2CBus *aspeed_i2c_get_bus(DeviceState *dev, int busnr); diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c index XXXXXXX..XXXXXXX 100644 --- a/hw/i2c/aspeed_i2c.c +++ b/hw/i2c/aspeed_i2c.c @@ -XXX,XX +XXX,XX @@ static inline bool aspeed_i2c_bus_is_enabled(AspeedI2CBus *bus) static inline void aspeed_i2c_bus_raise_interrupt(AspeedI2CBus *bus) { + AspeedI2CClass *aic = ASPEED_I2C_GET_CLASS(bus->controller); + bus->intr_status &= bus->intr_ctrl; if (bus->intr_status) { bus->controller->intr_status |= 1 << bus->id; - qemu_irq_raise(bus->controller->irq); + qemu_irq_raise(aic->bus_get_irq(bus)); } } @@ -XXX,XX +XXX,XX @@ static void aspeed_i2c_bus_write(void *opaque, hwaddr offset, uint64_t value, unsigned size) { AspeedI2CBus *bus = opaque; + AspeedI2CClass *aic = ASPEED_I2C_GET_CLASS(bus->controller); bool handle_rx; switch (offset) { @@ -XXX,XX +XXX,XX @@ static void aspeed_i2c_bus_write(void *opaque, hwaddr offset, bus->intr_status &= ~(value & 0x7FFF); if (!bus->intr_status) { bus->controller->intr_status &= ~(1 << bus->id); - qemu_irq_lower(bus->controller->irq); + qemu_irq_lower(aic->bus_get_irq(bus)); } if (handle_rx && (bus->cmd & (I2CD_M_RX_CMD | I2CD_M_S_RX_CMD_LAST))) { aspeed_i2c_handle_rx_cmd(bus); @@ -XXX,XX +XXX,XX @@ static void aspeed_i2c_realize(DeviceState *dev, Error **errp) for (i = 0; i < aic->num_busses; i++) { char name[32]; int offset = i < aic->gap ? 1 : 5; + + sysbus_init_irq(sbd, &s->busses[i].irq); snprintf(name, sizeof(name), "aspeed.i2c.%d", i); s->busses[i].controller = s; s->busses[i].id = i; @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_i2c_info = { .abstract = true, }; +static qemu_irq aspeed_2400_i2c_bus_get_irq(AspeedI2CBus *bus) +{ + return bus->controller->irq; +} + static void aspeed_2400_i2c_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -XXX,XX +XXX,XX @@ static void aspeed_2400_i2c_class_init(ObjectClass *klass, void *data) aic->num_busses = 14; aic->reg_size = 0x40; aic->gap = 7; + aic->bus_get_irq = aspeed_2400_i2c_bus_get_irq; } static const TypeInfo aspeed_2400_i2c_info = { @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_2400_i2c_info = { .class_init = aspeed_2400_i2c_class_init, }; +static qemu_irq aspeed_2500_i2c_bus_get_irq(AspeedI2CBus *bus) +{ + return bus->controller->irq; +} + static void aspeed_2500_i2c_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); @@ -XXX,XX +XXX,XX @@ static void aspeed_2500_i2c_class_init(ObjectClass *klass, void *data) aic->num_busses = 14; aic->reg_size = 0x40; aic->gap = 7; + aic->bus_get_irq = aspeed_2500_i2c_bus_get_irq; } static const TypeInfo aspeed_2500_i2c_info = { @@ -XXX,XX +XXX,XX @@ static const TypeInfo aspeed_2500_i2c_info = { .class_init = aspeed_2500_i2c_class_init, }; +static qemu_irq aspeed_2600_i2c_bus_get_irq(AspeedI2CBus *bus) +{ + return bus->irq; +} + +static void aspeed_2600_i2c_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + AspeedI2CClass *aic = ASPEED_I2C_CLASS(klass); + + dc->desc = "ASPEED 2600 I2C Controller"; + + aic->num_busses = 16; + aic->reg_size = 0x80; + aic->gap = -1; /* no gap */ + aic->bus_get_irq = aspeed_2600_i2c_bus_get_irq; +} + +static const TypeInfo aspeed_2600_i2c_info = { + .name = TYPE_ASPEED_2600_I2C, + .parent = TYPE_ASPEED_I2C, + .class_init = aspeed_2600_i2c_class_init, +}; + static void aspeed_i2c_register_types(void) { type_register_static(&aspeed_i2c_info); type_register_static(&aspeed_2400_i2c_info); type_register_static(&aspeed_2500_i2c_info); + type_register_static(&aspeed_2600_i2c_info); } type_init(aspeed_i2c_register_types) -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> It prepares ground for the AST2600. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-18-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/aspeed_soc.h | 9 +-- hw/arm/aspeed.c | 4 +- hw/arm/aspeed_soc.c | 148 +++++++++++++++++++----------------- 3 files changed, 84 insertions(+), 77 deletions(-) diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/aspeed_soc.h +++ b/include/hw/arm/aspeed_soc.h @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSoCState { #define TYPE_ASPEED_SOC "aspeed-soc" #define ASPEED_SOC(obj) OBJECT_CHECK(AspeedSoCState, (obj), TYPE_ASPEED_SOC) -typedef struct AspeedSoCInfo { +typedef struct AspeedSoCClass { + DeviceClass parent_class; + const char *name; const char *cpu_type; uint32_t silicon_rev; @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSoCInfo { const int *irqmap; const hwaddr *memmap; uint32_t num_cpus; -} AspeedSoCInfo; - -typedef struct AspeedSoCClass { - DeviceClass parent_class; - AspeedSoCInfo *info; } AspeedSoCClass; #define ASPEED_SOC_CLASS(klass) \ diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -XXX,XX +XXX,XX @@ static void aspeed_board_init(MachineState *machine, memory_region_allocate_system_memory(&bmc->ram, NULL, "ram", ram_size); memory_region_add_subregion(&bmc->ram_container, 0, &bmc->ram); memory_region_add_subregion(get_system_memory(), - sc->info->memmap[ASPEED_SDRAM], + sc->memmap[ASPEED_SDRAM], &bmc->ram_container); max_ram_size = object_property_get_uint(OBJECT(&bmc->soc), "max-ram-size", @@ -XXX,XX +XXX,XX @@ static void aspeed_board_init(MachineState *machine, } aspeed_board_binfo.ram_size = ram_size; - aspeed_board_binfo.loader_start = sc->info->memmap[ASPEED_SDRAM]; + aspeed_board_binfo.loader_start = sc->memmap[ASPEED_SDRAM]; aspeed_board_binfo.nb_cpus = bmc->soc.num_cpus; if (cfg->i2c_init) { diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_soc.c +++ b/hw/arm/aspeed_soc.c @@ -XXX,XX +XXX,XX @@ static const int aspeed_soc_ast2400_irqmap[] = { #define aspeed_soc_ast2500_irqmap aspeed_soc_ast2400_irqmap -static const AspeedSoCInfo aspeed_socs[] = { - { - .name = "ast2400-a1", - .cpu_type = ARM_CPU_TYPE_NAME("arm926"), - .silicon_rev = AST2400_A1_SILICON_REV, - .sram_size = 0x8000, - .spis_num = 1, - .wdts_num = 2, - .irqmap = aspeed_soc_ast2400_irqmap, - .memmap = aspeed_soc_ast2400_memmap, - .num_cpus = 1, - }, { - .name = "ast2500-a1", - .cpu_type = ARM_CPU_TYPE_NAME("arm1176"), - .silicon_rev = AST2500_A1_SILICON_REV, - .sram_size = 0x9000, - .spis_num = 2, - .wdts_num = 3, - .irqmap = aspeed_soc_ast2500_irqmap, - .memmap = aspeed_soc_ast2500_memmap, - .num_cpus = 1, - }, -}; - static qemu_irq aspeed_soc_get_irq(AspeedSoCState *s, int ctrl) { AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s); - return qdev_get_gpio_in(DEVICE(&s->vic), sc->info->irqmap[ctrl]); + return qdev_get_gpio_in(DEVICE(&s->vic), sc->irqmap[ctrl]); } static void aspeed_soc_init(Object *obj) @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) char socname[8]; char typename[64]; - if (sscanf(sc->info->name, "%7s", socname) != 1) { + if (sscanf(sc->name, "%7s", socname) != 1) { g_assert_not_reached(); } - for (i = 0; i < sc->info->num_cpus; i++) { + for (i = 0; i < sc->num_cpus; i++) { object_initialize_child(obj, "cpu[*]", OBJECT(&s->cpu[i]), - sizeof(s->cpu[i]), sc->info->cpu_type, + sizeof(s->cpu[i]), sc->cpu_type, &error_abort, NULL); } @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) sysbus_init_child_obj(obj, "scu", OBJECT(&s->scu), sizeof(s->scu), typename); qdev_prop_set_uint32(DEVICE(&s->scu), "silicon-rev", - sc->info->silicon_rev); + sc->silicon_rev); object_property_add_alias(obj, "hw-strap1", OBJECT(&s->scu), "hw-strap1", &error_abort); object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scu), @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) object_property_add_alias(obj, "dram", OBJECT(&s->fmc), "dram", &error_abort); - for (i = 0; i < sc->info->spis_num; i++) { + for (i = 0; i < sc->spis_num; i++) { snprintf(typename, sizeof(typename), "aspeed.spi%d-%s", i + 1, socname); sysbus_init_child_obj(obj, "spi[*]", OBJECT(&s->spi[i]), sizeof(s->spi[i]), typename); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) object_property_add_alias(obj, "max-ram-size", OBJECT(&s->sdmc), "max-ram-size", &error_abort); - for (i = 0; i < sc->info->wdts_num; i++) { + for (i = 0; i < sc->wdts_num; i++) { snprintf(typename, sizeof(typename), "aspeed.wdt-%s", socname); sysbus_init_child_obj(obj, "wdt[*]", OBJECT(&s->wdt[i]), sizeof(s->wdt[i]), typename); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) Error *err = NULL, *local_err = NULL; /* IO space */ - create_unimplemented_device("aspeed_soc.io", sc->info->memmap[ASPEED_IOMEM], + create_unimplemented_device("aspeed_soc.io", sc->memmap[ASPEED_IOMEM], ASPEED_SOC_IOMEM_SIZE); - if (s->num_cpus > sc->info->num_cpus) { + if (s->num_cpus > sc->num_cpus) { warn_report("%s: invalid number of CPUs %d, using default %d", - sc->info->name, s->num_cpus, sc->info->num_cpus); - s->num_cpus = sc->info->num_cpus; + sc->name, s->num_cpus, sc->num_cpus); + s->num_cpus = sc->num_cpus; } /* CPU */ @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) /* SRAM */ memory_region_init_ram(&s->sram, OBJECT(dev), "aspeed.sram", - sc->info->sram_size, &err); + sc->sram_size, &err); if (err) { error_propagate(errp, err); return; } memory_region_add_subregion(get_system_memory(), - sc->info->memmap[ASPEED_SRAM], &s->sram); + sc->memmap[ASPEED_SRAM], &s->sram); /* SCU */ object_property_set_bool(OBJECT(&s->scu), true, "realized", &err); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) error_propagate(errp, err); return; } - sysbus_mmio_map(SYS_BUS_DEVICE(&s->scu), 0, sc->info->memmap[ASPEED_SCU]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->scu), 0, sc->memmap[ASPEED_SCU]); /* VIC */ object_property_set_bool(OBJECT(&s->vic), true, "realized", &err); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) error_propagate(errp, err); return; } - sysbus_mmio_map(SYS_BUS_DEVICE(&s->vic), 0, sc->info->memmap[ASPEED_VIC]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->vic), 0, sc->memmap[ASPEED_VIC]); sysbus_connect_irq(SYS_BUS_DEVICE(&s->vic), 0, qdev_get_gpio_in(DEVICE(&s->cpu), ARM_CPU_IRQ)); sysbus_connect_irq(SYS_BUS_DEVICE(&s->vic), 1, @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) error_propagate(errp, err); return; } - sysbus_mmio_map(SYS_BUS_DEVICE(&s->rtc), 0, sc->info->memmap[ASPEED_RTC]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->rtc), 0, sc->memmap[ASPEED_RTC]); sysbus_connect_irq(SYS_BUS_DEVICE(&s->rtc), 0, aspeed_soc_get_irq(s, ASPEED_RTC)); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) return; } sysbus_mmio_map(SYS_BUS_DEVICE(&s->timerctrl), 0, - sc->info->memmap[ASPEED_TIMER1]); + sc->memmap[ASPEED_TIMER1]); for (i = 0; i < ASPEED_TIMER_NR_TIMERS; i++) { qemu_irq irq = aspeed_soc_get_irq(s, ASPEED_TIMER1 + i); sysbus_connect_irq(SYS_BUS_DEVICE(&s->timerctrl), i, irq); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) /* UART - attach an 8250 to the IO space as our UART5 */ if (serial_hd(0)) { qemu_irq uart5 = aspeed_soc_get_irq(s, ASPEED_UART5); - serial_mm_init(get_system_memory(), sc->info->memmap[ASPEED_UART5], 2, + serial_mm_init(get_system_memory(), sc->memmap[ASPEED_UART5], 2, uart5, 38400, serial_hd(0), DEVICE_LITTLE_ENDIAN); } @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) error_propagate(errp, err); return; } - sysbus_mmio_map(SYS_BUS_DEVICE(&s->i2c), 0, sc->info->memmap[ASPEED_I2C]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->i2c), 0, sc->memmap[ASPEED_I2C]); sysbus_connect_irq(SYS_BUS_DEVICE(&s->i2c), 0, aspeed_soc_get_irq(s, ASPEED_I2C)); /* FMC, The number of CS is set at the board level */ - object_property_set_int(OBJECT(&s->fmc), sc->info->memmap[ASPEED_SDRAM], + object_property_set_int(OBJECT(&s->fmc), sc->memmap[ASPEED_SDRAM], "sdram-base", &err); if (err) { error_propagate(errp, err); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) error_propagate(errp, err); return; } - sysbus_mmio_map(SYS_BUS_DEVICE(&s->fmc), 0, sc->info->memmap[ASPEED_FMC]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->fmc), 0, sc->memmap[ASPEED_FMC]); sysbus_mmio_map(SYS_BUS_DEVICE(&s->fmc), 1, s->fmc.ctrl->flash_window_base); sysbus_connect_irq(SYS_BUS_DEVICE(&s->fmc), 0, aspeed_soc_get_irq(s, ASPEED_FMC)); /* SPI */ - for (i = 0; i < sc->info->spis_num; i++) { + for (i = 0; i < sc->spis_num; i++) { object_property_set_int(OBJECT(&s->spi[i]), 1, "num-cs", &err); object_property_set_bool(OBJECT(&s->spi[i]), true, "realized", &local_err); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) return; } sysbus_mmio_map(SYS_BUS_DEVICE(&s->spi[i]), 0, - sc->info->memmap[ASPEED_SPI1 + i]); + sc->memmap[ASPEED_SPI1 + i]); sysbus_mmio_map(SYS_BUS_DEVICE(&s->spi[i]), 1, s->spi[i].ctrl->flash_window_base); } @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) error_propagate(errp, err); return; } - sysbus_mmio_map(SYS_BUS_DEVICE(&s->sdmc), 0, sc->info->memmap[ASPEED_SDMC]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->sdmc), 0, sc->memmap[ASPEED_SDMC]); /* Watch dog */ - for (i = 0; i < sc->info->wdts_num; i++) { + for (i = 0; i < sc->wdts_num; i++) { AspeedWDTClass *awc = ASPEED_WDT_GET_CLASS(&s->wdt[i]); object_property_set_bool(OBJECT(&s->wdt[i]), true, "realized", &err); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) return; } sysbus_mmio_map(SYS_BUS_DEVICE(&s->wdt[i]), 0, - sc->info->memmap[ASPEED_WDT] + i * awc->offset); + sc->memmap[ASPEED_WDT] + i * awc->offset); } /* Net */ @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) return; } sysbus_mmio_map(SYS_BUS_DEVICE(&s->ftgmac100[i]), 0, - sc->info->memmap[ASPEED_ETH1 + i]); + sc->memmap[ASPEED_ETH1 + i]); sysbus_connect_irq(SYS_BUS_DEVICE(&s->ftgmac100[i]), 0, aspeed_soc_get_irq(s, ASPEED_ETH1 + i)); } @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) return; } sysbus_mmio_map(SYS_BUS_DEVICE(&s->xdma), 0, - sc->info->memmap[ASPEED_XDMA]); + sc->memmap[ASPEED_XDMA]); sysbus_connect_irq(SYS_BUS_DEVICE(&s->xdma), 0, aspeed_soc_get_irq(s, ASPEED_XDMA)); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) error_propagate(errp, err); return; } - sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio), 0, sc->info->memmap[ASPEED_GPIO]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio), 0, sc->memmap[ASPEED_GPIO]); sysbus_connect_irq(SYS_BUS_DEVICE(&s->gpio), 0, aspeed_soc_get_irq(s, ASPEED_GPIO)); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) return; } sysbus_mmio_map(SYS_BUS_DEVICE(&s->sdhci), 0, - sc->info->memmap[ASPEED_SDHCI]); + sc->memmap[ASPEED_SDHCI]); sysbus_connect_irq(SYS_BUS_DEVICE(&s->sdhci), 0, aspeed_soc_get_irq(s, ASPEED_SDHCI)); } @@ -XXX,XX +XXX,XX @@ static Property aspeed_soc_properties[] = { static void aspeed_soc_class_init(ObjectClass *oc, void *data) { DeviceClass *dc = DEVICE_CLASS(oc); - AspeedSoCClass *sc = ASPEED_SOC_CLASS(oc); - sc->info = (AspeedSoCInfo *) data; dc->realize = aspeed_soc_realize; /* Reason: Uses serial_hds and nd_table in realize() directly */ dc->user_creatable = false; @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_class_init(ObjectClass *oc, void *data) static const TypeInfo aspeed_soc_type_info = { .name = TYPE_ASPEED_SOC, .parent = TYPE_DEVICE, - .instance_init = aspeed_soc_init, .instance_size = sizeof(AspeedSoCState), .class_size = sizeof(AspeedSoCClass), + .class_init = aspeed_soc_class_init, .abstract = true, }; -static void aspeed_soc_register_types(void) +static void aspeed_soc_ast2400_class_init(ObjectClass *oc, void *data) { - int i; + AspeedSoCClass *sc = ASPEED_SOC_CLASS(oc); - type_register_static(&aspeed_soc_type_info); - for (i = 0; i < ARRAY_SIZE(aspeed_socs); ++i) { - TypeInfo ti = { - .name = aspeed_socs[i].name, - .parent = TYPE_ASPEED_SOC, - .class_init = aspeed_soc_class_init, - .class_data = (void *) &aspeed_socs[i], - }; - type_register(&ti); - } + sc->name = "ast2400-a1"; + sc->cpu_type = ARM_CPU_TYPE_NAME("arm926"); + sc->silicon_rev = AST2400_A1_SILICON_REV; + sc->sram_size = 0x8000; + sc->spis_num = 1; + sc->wdts_num = 2; + sc->irqmap = aspeed_soc_ast2400_irqmap; + sc->memmap = aspeed_soc_ast2400_memmap; + sc->num_cpus = 1; } +static const TypeInfo aspeed_soc_ast2400_type_info = { + .name = "ast2400-a1", + .parent = TYPE_ASPEED_SOC, + .instance_init = aspeed_soc_init, + .instance_size = sizeof(AspeedSoCState), + .class_init = aspeed_soc_ast2400_class_init, +}; + +static void aspeed_soc_ast2500_class_init(ObjectClass *oc, void *data) +{ + AspeedSoCClass *sc = ASPEED_SOC_CLASS(oc); + + sc->name = "ast2500-a1"; + sc->cpu_type = ARM_CPU_TYPE_NAME("arm1176"); + sc->silicon_rev = AST2500_A1_SILICON_REV; + sc->sram_size = 0x9000; + sc->spis_num = 2; + sc->wdts_num = 3; + sc->irqmap = aspeed_soc_ast2500_irqmap; + sc->memmap = aspeed_soc_ast2500_memmap; + sc->num_cpus = 1; +} + +static const TypeInfo aspeed_soc_ast2500_type_info = { + .name = "ast2500-a1", + .parent = TYPE_ASPEED_SOC, + .instance_init = aspeed_soc_init, + .instance_size = sizeof(AspeedSoCState), + .class_init = aspeed_soc_ast2500_class_init, +}; +static void aspeed_soc_register_types(void) +{ + type_register_static(&aspeed_soc_type_info); + type_register_static(&aspeed_soc_ast2400_type_info); + type_register_static(&aspeed_soc_ast2500_type_info); +}; + type_init(aspeed_soc_register_types) -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> Initial definitions for a simple machine using an AST2600 SoC (Cortex CPU). The Cortex CPU and its interrupt controller are too complex to handle in the common Aspeed SoC framework. We introduce a new Aspeed SoC class with instance_init and realize handlers to handle the differences with the AST2400 and the AST2500 SoCs. This will add extra work to keep in sync both models with future extensions but it makes the code clearer. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-19-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/arm/Makefile.objs | 2 +- include/hw/arm/aspeed_soc.h | 4 + hw/arm/aspeed_ast2600.c | 492 ++++++++++++++++++++++++++++++++++++ 3 files changed, 497 insertions(+), 1 deletion(-) create mode 100644 hw/arm/aspeed_ast2600.c diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/Makefile.objs +++ b/hw/arm/Makefile.objs @@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_XLNX_VERSAL) += xlnx-versal.o xlnx-versal-virt.o obj-$(CONFIG_FSL_IMX25) += fsl-imx25.o imx25_pdk.o obj-$(CONFIG_FSL_IMX31) += fsl-imx31.o kzm.o obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o -obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o +obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o aspeed_ast2600.o obj-$(CONFIG_MPS2) += mps2.o obj-$(CONFIG_MPS2) += mps2-tz.o obj-$(CONFIG_MSF2) += msf2-soc.o diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/aspeed_soc.h +++ b/include/hw/arm/aspeed_soc.h @@ -XXX,XX +XXX,XX @@ #ifndef ASPEED_SOC_H #define ASPEED_SOC_H +#include "hw/cpu/a15mpcore.h" #include "hw/intc/aspeed_vic.h" #include "hw/misc/aspeed_scu.h" #include "hw/misc/aspeed_sdmc.h" @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSoCState { /*< public >*/ ARMCPU cpu[ASPEED_CPUS_NUM]; uint32_t num_cpus; + A15MPPrivState a7mpcore; MemoryRegion sram; AspeedVICState vic; AspeedRtcState rtc; @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSoCState { AspeedWDTState wdt[ASPEED_WDTS_NUM]; FTGMAC100State ftgmac100[ASPEED_MACS_NUM]; AspeedGPIOState gpio; + AspeedGPIOState gpio_1_8v; AspeedSDHCIState sdhci; } AspeedSoCState; @@ -XXX,XX +XXX,XX @@ enum { ASPEED_SRAM, ASPEED_SDHCI, ASPEED_GPIO, + ASPEED_GPIO_1_8V, ASPEED_RTC, ASPEED_TIMER1, ASPEED_TIMER2, diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/hw/arm/aspeed_ast2600.c @@ -XXX,XX +XXX,XX @@ +/* + * ASPEED SoC 2600 family + * + * Copyright (c) 2016-2019, IBM Corporation. + * + * This code is licensed under the GPL version 2 or later. See + * the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "cpu.h" +#include "exec/address-spaces.h" +#include "hw/misc/unimp.h" +#include "hw/arm/aspeed_soc.h" +#include "hw/char/serial.h" +#include "qemu/log.h" +#include "qemu/module.h" +#include "qemu/error-report.h" +#include "hw/i2c/aspeed_i2c.h" +#include "net/net.h" +#include "sysemu/sysemu.h" + +#define ASPEED_SOC_IOMEM_SIZE 0x00200000 + +static const hwaddr aspeed_soc_ast2600_memmap[] = { + [ASPEED_SRAM] = 0x10000000, + /* 0x16000000 0x17FFFFFF : AHB BUS do LPC Bus bridge */ + [ASPEED_IOMEM] = 0x1E600000, + [ASPEED_PWM] = 0x1E610000, + [ASPEED_FMC] = 0x1E620000, + [ASPEED_SPI1] = 0x1E630000, + [ASPEED_SPI2] = 0x1E641000, + [ASPEED_ETH1] = 0x1E660000, + [ASPEED_ETH2] = 0x1E680000, + [ASPEED_VIC] = 0x1E6C0000, + [ASPEED_SDMC] = 0x1E6E0000, + [ASPEED_SCU] = 0x1E6E2000, + [ASPEED_XDMA] = 0x1E6E7000, + [ASPEED_ADC] = 0x1E6E9000, + [ASPEED_SDHCI] = 0x1E740000, + [ASPEED_GPIO] = 0x1E780000, + [ASPEED_GPIO_1_8V] = 0x1E780800, + [ASPEED_RTC] = 0x1E781000, + [ASPEED_TIMER1] = 0x1E782000, + [ASPEED_WDT] = 0x1E785000, + [ASPEED_LPC] = 0x1E789000, + [ASPEED_IBT] = 0x1E789140, + [ASPEED_I2C] = 0x1E78A000, + [ASPEED_UART1] = 0x1E783000, + [ASPEED_UART5] = 0x1E784000, + [ASPEED_VUART] = 0x1E787000, + [ASPEED_SDRAM] = 0x80000000, +}; + +#define ASPEED_A7MPCORE_ADDR 0x40460000 + +#define ASPEED_SOC_AST2600_MAX_IRQ 128 + +static const int aspeed_soc_ast2600_irqmap[] = { + [ASPEED_UART1] = 47, + [ASPEED_UART2] = 48, + [ASPEED_UART3] = 49, + [ASPEED_UART4] = 50, + [ASPEED_UART5] = 8, + [ASPEED_VUART] = 8, + [ASPEED_FMC] = 39, + [ASPEED_SDMC] = 0, + [ASPEED_SCU] = 12, + [ASPEED_ADC] = 78, + [ASPEED_XDMA] = 6, + [ASPEED_SDHCI] = 43, + [ASPEED_GPIO] = 40, + [ASPEED_GPIO_1_8V] = 11, + [ASPEED_RTC] = 13, + [ASPEED_TIMER1] = 16, + [ASPEED_TIMER2] = 17, + [ASPEED_TIMER3] = 18, + [ASPEED_TIMER4] = 19, + [ASPEED_TIMER5] = 20, + [ASPEED_TIMER6] = 21, + [ASPEED_TIMER7] = 22, + [ASPEED_TIMER8] = 23, + [ASPEED_WDT] = 24, + [ASPEED_PWM] = 44, + [ASPEED_LPC] = 35, + [ASPEED_IBT] = 35, /* LPC */ + [ASPEED_I2C] = 110, /* 110 -> 125 */ + [ASPEED_ETH1] = 2, + [ASPEED_ETH2] = 3, +}; + +static qemu_irq aspeed_soc_get_irq(AspeedSoCState *s, int ctrl) +{ + AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s); + + return qdev_get_gpio_in(DEVICE(&s->a7mpcore), sc->irqmap[ctrl]); +} + +static void aspeed_soc_ast2600_init(Object *obj) +{ + AspeedSoCState *s = ASPEED_SOC(obj); + AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s); + int i; + char socname[8]; + char typename[64]; + + if (sscanf(sc->name, "%7s", socname) != 1) { + g_assert_not_reached(); + } + + for (i = 0; i < sc->num_cpus; i++) { + object_initialize_child(obj, "cpu[*]", OBJECT(&s->cpu[i]), + sizeof(s->cpu[i]), sc->cpu_type, + &error_abort, NULL); + } + + snprintf(typename, sizeof(typename), "aspeed.scu-%s", socname); + sysbus_init_child_obj(obj, "scu", OBJECT(&s->scu), sizeof(s->scu), + typename); + qdev_prop_set_uint32(DEVICE(&s->scu), "silicon-rev", + sc->silicon_rev); + object_property_add_alias(obj, "hw-strap1", OBJECT(&s->scu), + "hw-strap1", &error_abort); + object_property_add_alias(obj, "hw-strap2", OBJECT(&s->scu), + "hw-strap2", &error_abort); + object_property_add_alias(obj, "hw-prot-key", OBJECT(&s->scu), + "hw-prot-key", &error_abort); + + sysbus_init_child_obj(obj, "a7mpcore", &s->a7mpcore, + sizeof(s->a7mpcore), TYPE_A15MPCORE_PRIV); + + sysbus_init_child_obj(obj, "rtc", OBJECT(&s->rtc), sizeof(s->rtc), + TYPE_ASPEED_RTC); + + snprintf(typename, sizeof(typename), "aspeed.timer-%s", socname); + sysbus_init_child_obj(obj, "timerctrl", OBJECT(&s->timerctrl), + sizeof(s->timerctrl), typename); + object_property_add_const_link(OBJECT(&s->timerctrl), "scu", + OBJECT(&s->scu), &error_abort); + + snprintf(typename, sizeof(typename), "aspeed.i2c-%s", socname); + sysbus_init_child_obj(obj, "i2c", OBJECT(&s->i2c), sizeof(s->i2c), + typename); + + snprintf(typename, sizeof(typename), "aspeed.fmc-%s", socname); + sysbus_init_child_obj(obj, "fmc", OBJECT(&s->fmc), sizeof(s->fmc), + typename); + object_property_add_alias(obj, "num-cs", OBJECT(&s->fmc), "num-cs", + &error_abort); + object_property_add_alias(obj, "dram", OBJECT(&s->fmc), "dram", + &error_abort); + + for (i = 0; i < sc->spis_num; i++) { + snprintf(typename, sizeof(typename), "aspeed.spi%d-%s", i + 1, socname); + sysbus_init_child_obj(obj, "spi[*]", OBJECT(&s->spi[i]), + sizeof(s->spi[i]), typename); + } + + snprintf(typename, sizeof(typename), "aspeed.sdmc-%s", socname); + sysbus_init_child_obj(obj, "sdmc", OBJECT(&s->sdmc), sizeof(s->sdmc), + typename); + object_property_add_alias(obj, "ram-size", OBJECT(&s->sdmc), + "ram-size", &error_abort); + object_property_add_alias(obj, "max-ram-size", OBJECT(&s->sdmc), + "max-ram-size", &error_abort); + + for (i = 0; i < sc->wdts_num; i++) { + snprintf(typename, sizeof(typename), "aspeed.wdt-%s", socname); + sysbus_init_child_obj(obj, "wdt[*]", OBJECT(&s->wdt[i]), + sizeof(s->wdt[i]), typename); + object_property_add_const_link(OBJECT(&s->wdt[i]), "scu", + OBJECT(&s->scu), &error_abort); + } + + for (i = 0; i < ASPEED_MACS_NUM; i++) { + sysbus_init_child_obj(obj, "ftgmac100[*]", OBJECT(&s->ftgmac100[i]), + sizeof(s->ftgmac100[i]), TYPE_FTGMAC100); + } + + sysbus_init_child_obj(obj, "xdma", OBJECT(&s->xdma), sizeof(s->xdma), + TYPE_ASPEED_XDMA); + + snprintf(typename, sizeof(typename), "aspeed.gpio-%s", socname); + sysbus_init_child_obj(obj, "gpio", OBJECT(&s->gpio), sizeof(s->gpio), + typename); + + snprintf(typename, sizeof(typename), "aspeed.gpio-%s-1_8v", socname); + sysbus_init_child_obj(obj, "gpio_1_8v", OBJECT(&s->gpio_1_8v), + sizeof(s->gpio_1_8v), typename); + + sysbus_init_child_obj(obj, "sdc", OBJECT(&s->sdhci), sizeof(s->sdhci), + TYPE_ASPEED_SDHCI); + + /* Init sd card slot class here so that they're under the correct parent */ + for (i = 0; i < ASPEED_SDHCI_NUM_SLOTS; ++i) { + sysbus_init_child_obj(obj, "sdhci[*]", OBJECT(&s->sdhci.slots[i]), + sizeof(s->sdhci.slots[i]), TYPE_SYSBUS_SDHCI); + } +} + +/* + * ASPEED ast2600 has 0xf as cluster ID + * + * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/CIHEBGFG.html + */ +static uint64_t aspeed_calc_affinity(int cpu) +{ + return (0xf << ARM_AFF1_SHIFT) | cpu; +} + +static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp) +{ + int i; + AspeedSoCState *s = ASPEED_SOC(dev); + AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s); + Error *err = NULL, *local_err = NULL; + qemu_irq irq; + + /* IO space */ + create_unimplemented_device("aspeed_soc.io", sc->memmap[ASPEED_IOMEM], + ASPEED_SOC_IOMEM_SIZE); + + if (s->num_cpus > sc->num_cpus) { + warn_report("%s: invalid number of CPUs %d, using default %d", + sc->name, s->num_cpus, sc->num_cpus); + s->num_cpus = sc->num_cpus; + } + + /* CPU */ + for (i = 0; i < s->num_cpus; i++) { + object_property_set_int(OBJECT(&s->cpu[i]), QEMU_PSCI_CONDUIT_SMC, + "psci-conduit", &error_abort); + if (s->num_cpus > 1) { + object_property_set_int(OBJECT(&s->cpu[i]), + ASPEED_A7MPCORE_ADDR, + "reset-cbar", &error_abort); + } + object_property_set_int(OBJECT(&s->cpu[i]), aspeed_calc_affinity(i), + "mp-affinity", &error_abort); + + /* + * TODO: the secondary CPUs are started and a boot helper + * is needed when using -kernel + */ + + object_property_set_bool(OBJECT(&s->cpu[i]), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + } + + /* A7MPCORE */ + object_property_set_int(OBJECT(&s->a7mpcore), s->num_cpus, "num-cpu", + &error_abort); + object_property_set_int(OBJECT(&s->a7mpcore), + ASPEED_SOC_AST2600_MAX_IRQ + GIC_INTERNAL, + "num-irq", &error_abort); + + object_property_set_bool(OBJECT(&s->a7mpcore), true, "realized", + &error_abort); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->a7mpcore), 0, ASPEED_A7MPCORE_ADDR); + + for (i = 0; i < s->num_cpus; i++) { + SysBusDevice *sbd = SYS_BUS_DEVICE(&s->a7mpcore); + DeviceState *d = DEVICE(qemu_get_cpu(i)); + + irq = qdev_get_gpio_in(d, ARM_CPU_IRQ); + sysbus_connect_irq(sbd, i, irq); + irq = qdev_get_gpio_in(d, ARM_CPU_FIQ); + sysbus_connect_irq(sbd, i + s->num_cpus, irq); + irq = qdev_get_gpio_in(d, ARM_CPU_VIRQ); + sysbus_connect_irq(sbd, i + 2 * s->num_cpus, irq); + irq = qdev_get_gpio_in(d, ARM_CPU_VFIQ); + sysbus_connect_irq(sbd, i + 3 * s->num_cpus, irq); + } + + /* SRAM */ + memory_region_init_ram(&s->sram, OBJECT(dev), "aspeed.sram", + sc->sram_size, &err); + if (err) { + error_propagate(errp, err); + return; + } + memory_region_add_subregion(get_system_memory(), + sc->memmap[ASPEED_SRAM], &s->sram); + + /* SCU */ + object_property_set_bool(OBJECT(&s->scu), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->scu), 0, sc->memmap[ASPEED_SCU]); + + /* RTC */ + object_property_set_bool(OBJECT(&s->rtc), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->rtc), 0, sc->memmap[ASPEED_RTC]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->rtc), 0, + aspeed_soc_get_irq(s, ASPEED_RTC)); + + /* Timer */ + object_property_set_bool(OBJECT(&s->timerctrl), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->timerctrl), 0, + sc->memmap[ASPEED_TIMER1]); + for (i = 0; i < ASPEED_TIMER_NR_TIMERS; i++) { + qemu_irq irq = aspeed_soc_get_irq(s, ASPEED_TIMER1 + i); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->timerctrl), i, irq); + } + + /* UART - attach an 8250 to the IO space as our UART5 */ + if (serial_hd(0)) { + qemu_irq uart5 = aspeed_soc_get_irq(s, ASPEED_UART5); + serial_mm_init(get_system_memory(), sc->memmap[ASPEED_UART5], 2, + uart5, 38400, serial_hd(0), DEVICE_LITTLE_ENDIAN); + } + + /* I2C */ + object_property_set_bool(OBJECT(&s->i2c), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->i2c), 0, sc->memmap[ASPEED_I2C]); + for (i = 0; i < ASPEED_I2C_GET_CLASS(&s->i2c)->num_busses; i++) { + qemu_irq irq = qdev_get_gpio_in(DEVICE(&s->a7mpcore), + sc->irqmap[ASPEED_I2C] + i); + /* + * The AST2600 SoC has one IRQ per I2C bus. Skip the common + * IRQ (AST2400 and AST2500) and connect all bussses. + */ + sysbus_connect_irq(SYS_BUS_DEVICE(&s->i2c), i + 1, irq); + } + + /* FMC, The number of CS is set at the board level */ + object_property_set_int(OBJECT(&s->fmc), sc->memmap[ASPEED_SDRAM], + "sdram-base", &err); + if (err) { + error_propagate(errp, err); + return; + } + object_property_set_bool(OBJECT(&s->fmc), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->fmc), 0, sc->memmap[ASPEED_FMC]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->fmc), 1, + s->fmc.ctrl->flash_window_base); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->fmc), 0, + aspeed_soc_get_irq(s, ASPEED_FMC)); + + /* SPI */ + for (i = 0; i < sc->spis_num; i++) { + object_property_set_int(OBJECT(&s->spi[i]), 1, "num-cs", &err); + object_property_set_bool(OBJECT(&s->spi[i]), true, "realized", + &local_err); + error_propagate(&err, local_err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->spi[i]), 0, + sc->memmap[ASPEED_SPI1 + i]); + sysbus_mmio_map(SYS_BUS_DEVICE(&s->spi[i]), 1, + s->spi[i].ctrl->flash_window_base); + } + + /* SDMC - SDRAM Memory Controller */ + object_property_set_bool(OBJECT(&s->sdmc), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->sdmc), 0, sc->memmap[ASPEED_SDMC]); + + /* Watch dog */ + for (i = 0; i < sc->wdts_num; i++) { + AspeedWDTClass *awc = ASPEED_WDT_GET_CLASS(&s->wdt[i]); + + object_property_set_bool(OBJECT(&s->wdt[i]), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->wdt[i]), 0, + sc->memmap[ASPEED_WDT] + i * awc->offset); + } + + /* Net */ + for (i = 0; i < nb_nics; i++) { + qdev_set_nic_properties(DEVICE(&s->ftgmac100[i]), &nd_table[i]); + object_property_set_bool(OBJECT(&s->ftgmac100[i]), true, "aspeed", + &err); + object_property_set_bool(OBJECT(&s->ftgmac100[i]), true, "realized", + &local_err); + error_propagate(&err, local_err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->ftgmac100[i]), 0, + sc->memmap[ASPEED_ETH1 + i]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->ftgmac100[i]), 0, + aspeed_soc_get_irq(s, ASPEED_ETH1 + i)); + } + + /* XDMA */ + object_property_set_bool(OBJECT(&s->xdma), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->xdma), 0, + sc->memmap[ASPEED_XDMA]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->xdma), 0, + aspeed_soc_get_irq(s, ASPEED_XDMA)); + + /* GPIO */ + object_property_set_bool(OBJECT(&s->gpio), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio), 0, sc->memmap[ASPEED_GPIO]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->gpio), 0, + aspeed_soc_get_irq(s, ASPEED_GPIO)); + + object_property_set_bool(OBJECT(&s->gpio_1_8v), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio_1_8v), 0, + sc->memmap[ASPEED_GPIO_1_8V]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->gpio_1_8v), 0, + aspeed_soc_get_irq(s, ASPEED_GPIO_1_8V)); + + /* SDHCI */ + object_property_set_bool(OBJECT(&s->sdhci), true, "realized", &err); + if (err) { + error_propagate(errp, err); + return; + } + sysbus_mmio_map(SYS_BUS_DEVICE(&s->sdhci), 0, + sc->memmap[ASPEED_SDHCI]); + sysbus_connect_irq(SYS_BUS_DEVICE(&s->sdhci), 0, + aspeed_soc_get_irq(s, ASPEED_SDHCI)); +} + +static void aspeed_soc_ast2600_class_init(ObjectClass *oc, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(oc); + AspeedSoCClass *sc = ASPEED_SOC_CLASS(oc); + + dc->realize = aspeed_soc_ast2600_realize; + + sc->name = "ast2600-a0"; + sc->cpu_type = ARM_CPU_TYPE_NAME("cortex-a7"); + sc->silicon_rev = AST2600_A0_SILICON_REV; + sc->sram_size = 0x10000; + sc->spis_num = 2; + sc->wdts_num = 4; + sc->irqmap = aspeed_soc_ast2600_irqmap; + sc->memmap = aspeed_soc_ast2600_memmap; + sc->num_cpus = 2; +} + +static const TypeInfo aspeed_soc_ast2600_type_info = { + .name = "ast2600-a0", + .parent = TYPE_ASPEED_SOC, + .instance_size = sizeof(AspeedSoCState), + .instance_init = aspeed_soc_ast2600_init, + .class_init = aspeed_soc_ast2600_class_init, + .class_size = sizeof(AspeedSoCClass), +}; + +static void aspeed_soc_register_types(void) +{ + type_register_static(&aspeed_soc_ast2600_type_info); +}; + +type_init(aspeed_soc_register_types) -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-20-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/block/m25p80.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -XXX,XX +XXX,XX @@ static const FlashPartInfo known_devices[] = { { INFO("w25q80", 0xef5014, 0, 64 << 10, 16, ER_4K) }, { INFO("w25q80bl", 0xef4014, 0, 64 << 10, 16, ER_4K) }, { INFO("w25q256", 0xef4019, 0, 64 << 10, 512, ER_4K) }, + { INFO("w25q512jv", 0xef4020, 0, 64 << 10, 1024, ER_4K) }, }; typedef enum { -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-21-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/aspeed.h | 1 + hw/arm/aspeed.c | 23 +++++++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/include/hw/arm/aspeed.h b/include/hw/arm/aspeed.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/aspeed.h +++ b/include/hw/arm/aspeed.h @@ -XXX,XX +XXX,XX @@ typedef struct AspeedBoardConfig { const char *desc; const char *soc_name; uint32_t hw_strap1; + uint32_t hw_strap2; const char *fmc_model; const char *spi_model; uint32_t num_cs; diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -XXX,XX +XXX,XX @@ struct AspeedBoardState { /* Witherspoon hardware value: 0xF10AD216 (but use romulus definition) */ #define WITHERSPOON_BMC_HW_STRAP1 ROMULUS_BMC_HW_STRAP1 +/* AST2600 evb hardware value */ +#define AST2600_EVB_HW_STRAP1 0x000000C0 +#define AST2600_EVB_HW_STRAP2 0x00000003 + /* * The max ram region is for firmwares that scan the address space * with load/store to guess how much RAM the SoC has. @@ -XXX,XX +XXX,XX @@ static void aspeed_board_init(MachineState *machine, &error_abort); object_property_set_int(OBJECT(&bmc->soc), cfg->hw_strap1, "hw-strap1", &error_abort); + object_property_set_int(OBJECT(&bmc->soc), cfg->hw_strap2, "hw-strap2", + &error_abort); object_property_set_int(OBJECT(&bmc->soc), cfg->num_cs, "num-cs", &error_abort); object_property_set_int(OBJECT(&bmc->soc), machine->smp.cpus, "num-cpus", @@ -XXX,XX +XXX,XX @@ static void ast2500_evb_i2c_init(AspeedBoardState *bmc) i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 11), "ds1338", 0x32); } +static void ast2600_evb_i2c_init(AspeedBoardState *bmc) +{ + /* Start with some devices on our I2C busses */ + ast2500_evb_i2c_init(bmc); +} + static void romulus_bmc_i2c_init(AspeedBoardState *bmc) { AspeedSoCState *soc = &bmc->soc; @@ -XXX,XX +XXX,XX @@ static const AspeedBoardConfig aspeed_boards[] = { .num_cs = 2, .i2c_init = witherspoon_bmc_i2c_init, .ram = 512 * MiB, + }, { + .name = MACHINE_TYPE_NAME("ast2600-evb"), + .desc = "Aspeed AST2600 EVB (Cortex A7)", + .soc_name = "ast2600-a0", + .hw_strap1 = AST2600_EVB_HW_STRAP1, + .hw_strap2 = AST2600_EVB_HW_STRAP2, + .fmc_model = "w25q512jv", + .spi_model = "mx66u51235f", + .num_cs = 1, + .i2c_init = ast2600_evb_i2c_init, + .ram = 2 * GiB, }, }; -- 2.20.1
From: Joel Stanley <joel@jms.id.au> To support the ast2600's four MACs allow SoCs to specify the number they have, and create that many. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-id: 20190925143248.10000-22-clg@kaod.org [clg: - included a check on sc->macs_num when realizing the macs - included interrupt definitions for the AST2600 ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/aspeed_soc.h | 5 ++++- hw/arm/aspeed_ast2600.c | 10 ++++++++-- hw/arm/aspeed_soc.c | 6 ++++-- 3 files changed, 16 insertions(+), 5 deletions(-) diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/aspeed_soc.h +++ b/include/hw/arm/aspeed_soc.h @@ -XXX,XX +XXX,XX @@ #define ASPEED_SPIS_NUM 2 #define ASPEED_WDTS_NUM 4 #define ASPEED_CPUS_NUM 2 -#define ASPEED_MACS_NUM 2 +#define ASPEED_MACS_NUM 4 typedef struct AspeedSoCState { /*< private >*/ @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSoCClass { uint64_t sram_size; int spis_num; int wdts_num; + int macs_num; const int *irqmap; const hwaddr *memmap; uint32_t num_cpus; @@ -XXX,XX +XXX,XX @@ enum { ASPEED_I2C, ASPEED_ETH1, ASPEED_ETH2, + ASPEED_ETH3, + ASPEED_ETH4, ASPEED_SDRAM, ASPEED_XDMA, }; diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_ast2600.c +++ b/hw/arm/aspeed_ast2600.c @@ -XXX,XX +XXX,XX @@ static const hwaddr aspeed_soc_ast2600_memmap[] = { [ASPEED_SPI1] = 0x1E630000, [ASPEED_SPI2] = 0x1E641000, [ASPEED_ETH1] = 0x1E660000, + [ASPEED_ETH3] = 0x1E670000, [ASPEED_ETH2] = 0x1E680000, + [ASPEED_ETH4] = 0x1E690000, [ASPEED_VIC] = 0x1E6C0000, [ASPEED_SDMC] = 0x1E6E0000, [ASPEED_SCU] = 0x1E6E2000, @@ -XXX,XX +XXX,XX @@ static const int aspeed_soc_ast2600_irqmap[] = { [ASPEED_I2C] = 110, /* 110 -> 125 */ [ASPEED_ETH1] = 2, [ASPEED_ETH2] = 3, + [ASPEED_ETH3] = 32, + [ASPEED_ETH4] = 33, + }; static qemu_irq aspeed_soc_get_irq(AspeedSoCState *s, int ctrl) @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_ast2600_init(Object *obj) OBJECT(&s->scu), &error_abort); } - for (i = 0; i < ASPEED_MACS_NUM; i++) { + for (i = 0; i < sc->macs_num; i++) { sysbus_init_child_obj(obj, "ftgmac100[*]", OBJECT(&s->ftgmac100[i]), sizeof(s->ftgmac100[i]), TYPE_FTGMAC100); } @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp) } /* Net */ - for (i = 0; i < nb_nics; i++) { + for (i = 0; i < nb_nics && i < sc->macs_num; i++) { qdev_set_nic_properties(DEVICE(&s->ftgmac100[i]), &nd_table[i]); object_property_set_bool(OBJECT(&s->ftgmac100[i]), true, "aspeed", &err); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_ast2600_class_init(ObjectClass *oc, void *data) sc->sram_size = 0x10000; sc->spis_num = 2; sc->wdts_num = 4; + sc->macs_num = 4; sc->irqmap = aspeed_soc_ast2600_irqmap; sc->memmap = aspeed_soc_ast2600_memmap; sc->num_cpus = 2; diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_soc.c +++ b/hw/arm/aspeed_soc.c @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_init(Object *obj) OBJECT(&s->scu), &error_abort); } - for (i = 0; i < ASPEED_MACS_NUM; i++) { + for (i = 0; i < sc->macs_num; i++) { sysbus_init_child_obj(obj, "ftgmac100[*]", OBJECT(&s->ftgmac100[i]), sizeof(s->ftgmac100[i]), TYPE_FTGMAC100); } @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) } /* Net */ - for (i = 0; i < nb_nics; i++) { + for (i = 0; i < nb_nics && i < sc->macs_num; i++) { qdev_set_nic_properties(DEVICE(&s->ftgmac100[i]), &nd_table[i]); object_property_set_bool(OBJECT(&s->ftgmac100[i]), true, "aspeed", &err); @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_ast2400_class_init(ObjectClass *oc, void *data) sc->sram_size = 0x8000; sc->spis_num = 1; sc->wdts_num = 2; + sc->macs_num = 2; sc->irqmap = aspeed_soc_ast2400_irqmap; sc->memmap = aspeed_soc_ast2400_memmap; sc->num_cpus = 1; @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_ast2500_class_init(ObjectClass *oc, void *data) sc->sram_size = 0x9000; sc->spis_num = 2; sc->wdts_num = 3; + sc->macs_num = 2; sc->irqmap = aspeed_soc_ast2500_irqmap; sc->memmap = aspeed_soc_ast2500_memmap; sc->num_cpus = 1; -- 2.20.1
From: Cédric Le Goater <clg@kaod.org> The AST2600 SoC has an extra controller to set the PHY registers. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Message-id: 20190925143248.10000-23-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/aspeed_soc.h | 5 ++ include/hw/net/ftgmac100.h | 17 ++++ hw/arm/aspeed_ast2600.c | 20 +++++ hw/net/ftgmac100.c | 162 ++++++++++++++++++++++++++++++++++++ 4 files changed, 204 insertions(+) diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/aspeed_soc.h +++ b/include/hw/arm/aspeed_soc.h @@ -XXX,XX +XXX,XX @@ typedef struct AspeedSoCState { AspeedSDMCState sdmc; AspeedWDTState wdt[ASPEED_WDTS_NUM]; FTGMAC100State ftgmac100[ASPEED_MACS_NUM]; + AspeedMiiState mii[ASPEED_MACS_NUM]; AspeedGPIOState gpio; AspeedGPIOState gpio_1_8v; AspeedSDHCIState sdhci; @@ -XXX,XX +XXX,XX @@ enum { ASPEED_ETH2, ASPEED_ETH3, ASPEED_ETH4, + ASPEED_MII1, + ASPEED_MII2, + ASPEED_MII3, + ASPEED_MII4, ASPEED_SDRAM, ASPEED_XDMA, }; diff --git a/include/hw/net/ftgmac100.h b/include/hw/net/ftgmac100.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/net/ftgmac100.h +++ b/include/hw/net/ftgmac100.h @@ -XXX,XX +XXX,XX @@ typedef struct FTGMAC100State { uint32_t rxdes0_edorr; } FTGMAC100State; +#define TYPE_ASPEED_MII "aspeed-mmi" +#define ASPEED_MII(obj) OBJECT_CHECK(AspeedMiiState, (obj), TYPE_ASPEED_MII) + +/* + * AST2600 MII controller + */ +typedef struct AspeedMiiState { + /*< private >*/ + SysBusDevice parent_obj; + + FTGMAC100State *nic; + + MemoryRegion iomem; + uint32_t phycr; + uint32_t phydata; +} AspeedMiiState; + #endif diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_ast2600.c +++ b/hw/arm/aspeed_ast2600.c @@ -XXX,XX +XXX,XX @@ static const hwaddr aspeed_soc_ast2600_memmap[] = { [ASPEED_FMC] = 0x1E620000, [ASPEED_SPI1] = 0x1E630000, [ASPEED_SPI2] = 0x1E641000, + [ASPEED_MII1] = 0x1E650000, + [ASPEED_MII2] = 0x1E650008, + [ASPEED_MII3] = 0x1E650010, + [ASPEED_MII4] = 0x1E650018, [ASPEED_ETH1] = 0x1E660000, [ASPEED_ETH3] = 0x1E670000, [ASPEED_ETH2] = 0x1E680000, @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_ast2600_init(Object *obj) for (i = 0; i < sc->macs_num; i++) { sysbus_init_child_obj(obj, "ftgmac100[*]", OBJECT(&s->ftgmac100[i]), sizeof(s->ftgmac100[i]), TYPE_FTGMAC100); + + sysbus_init_child_obj(obj, "mii[*]", &s->mii[i], sizeof(s->mii[i]), + TYPE_ASPEED_MII); + object_property_add_const_link(OBJECT(&s->mii[i]), "nic", + OBJECT(&s->ftgmac100[i]), + &error_abort); } sysbus_init_child_obj(obj, "xdma", OBJECT(&s->xdma), sizeof(s->xdma), @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp) sc->memmap[ASPEED_ETH1 + i]); sysbus_connect_irq(SYS_BUS_DEVICE(&s->ftgmac100[i]), 0, aspeed_soc_get_irq(s, ASPEED_ETH1 + i)); + + object_property_set_bool(OBJECT(&s->mii[i]), true, "realized", + &err); + if (err) { + error_propagate(errp, err); + return; + } + + sysbus_mmio_map(SYS_BUS_DEVICE(&s->mii[i]), 0, + sc->memmap[ASPEED_MII1 + i]); } /* XDMA */ diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c index XXXXXXX..XXXXXXX 100644 --- a/hw/net/ftgmac100.c +++ b/hw/net/ftgmac100.c @@ -XXX,XX +XXX,XX @@ #include "hw/irq.h" #include "hw/net/ftgmac100.h" #include "sysemu/dma.h" +#include "qapi/error.h" #include "qemu/log.h" #include "qemu/module.h" #include "net/checksum.h" @@ -XXX,XX +XXX,XX @@ static const TypeInfo ftgmac100_info = { .class_init = ftgmac100_class_init, }; +/* + * AST2600 MII controller + */ +#define ASPEED_MII_PHYCR_FIRE BIT(31) +#define ASPEED_MII_PHYCR_ST_22 BIT(28) +#define ASPEED_MII_PHYCR_OP(x) ((x) & (ASPEED_MII_PHYCR_OP_WRITE | \ + ASPEED_MII_PHYCR_OP_READ)) +#define ASPEED_MII_PHYCR_OP_WRITE BIT(26) +#define ASPEED_MII_PHYCR_OP_READ BIT(27) +#define ASPEED_MII_PHYCR_DATA(x) (x & 0xffff) +#define ASPEED_MII_PHYCR_PHY(x) (((x) >> 21) & 0x1f) +#define ASPEED_MII_PHYCR_REG(x) (((x) >> 16) & 0x1f) + +#define ASPEED_MII_PHYDATA_IDLE BIT(16) + +static void aspeed_mii_transition(AspeedMiiState *s, bool fire) +{ + if (fire) { + s->phycr |= ASPEED_MII_PHYCR_FIRE; + s->phydata &= ~ASPEED_MII_PHYDATA_IDLE; + } else { + s->phycr &= ~ASPEED_MII_PHYCR_FIRE; + s->phydata |= ASPEED_MII_PHYDATA_IDLE; + } +} + +static void aspeed_mii_do_phy_ctl(AspeedMiiState *s) +{ + uint8_t reg; + uint16_t data; + + if (!(s->phycr & ASPEED_MII_PHYCR_ST_22)) { + aspeed_mii_transition(s, !ASPEED_MII_PHYCR_FIRE); + qemu_log_mask(LOG_UNIMP, "%s: unsupported ST code\n", __func__); + return; + } + + /* Nothing to do */ + if (!(s->phycr & ASPEED_MII_PHYCR_FIRE)) { + return; + } + + reg = ASPEED_MII_PHYCR_REG(s->phycr); + data = ASPEED_MII_PHYCR_DATA(s->phycr); + + switch (ASPEED_MII_PHYCR_OP(s->phycr)) { + case ASPEED_MII_PHYCR_OP_WRITE: + do_phy_write(s->nic, reg, data); + break; + case ASPEED_MII_PHYCR_OP_READ: + s->phydata = (s->phydata & ~0xffff) | do_phy_read(s->nic, reg); + break; + default: + qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid OP code %08x\n", + __func__, s->phycr); + } + + aspeed_mii_transition(s, !ASPEED_MII_PHYCR_FIRE); +} + +static uint64_t aspeed_mii_read(void *opaque, hwaddr addr, unsigned size) +{ + AspeedMiiState *s = ASPEED_MII(opaque); + + switch (addr) { + case 0x0: + return s->phycr; + case 0x4: + return s->phydata; + default: + g_assert_not_reached(); + } +} + +static void aspeed_mii_write(void *opaque, hwaddr addr, + uint64_t value, unsigned size) +{ + AspeedMiiState *s = ASPEED_MII(opaque); + + switch (addr) { + case 0x0: + s->phycr = value & ~(s->phycr & ASPEED_MII_PHYCR_FIRE); + break; + case 0x4: + s->phydata = value & ~(0xffff | ASPEED_MII_PHYDATA_IDLE); + break; + default: + g_assert_not_reached(); + } + + aspeed_mii_transition(s, !!(s->phycr & ASPEED_MII_PHYCR_FIRE)); + aspeed_mii_do_phy_ctl(s); +} + +static const MemoryRegionOps aspeed_mii_ops = { + .read = aspeed_mii_read, + .write = aspeed_mii_write, + .valid.min_access_size = 4, + .valid.max_access_size = 4, + .endianness = DEVICE_LITTLE_ENDIAN, +}; + +static void aspeed_mii_reset(DeviceState *dev) +{ + AspeedMiiState *s = ASPEED_MII(dev); + + s->phycr = 0; + s->phydata = 0; + + aspeed_mii_transition(s, !!(s->phycr & ASPEED_MII_PHYCR_FIRE)); +}; + +static void aspeed_mii_realize(DeviceState *dev, Error **errp) +{ + AspeedMiiState *s = ASPEED_MII(dev); + SysBusDevice *sbd = SYS_BUS_DEVICE(dev); + Object *obj; + Error *local_err = NULL; + + obj = object_property_get_link(OBJECT(dev), "nic", &local_err); + if (!obj) { + error_propagate(errp, local_err); + error_prepend(errp, "required link 'nic' not found: "); + return; + } + + s->nic = FTGMAC100(obj); + + memory_region_init_io(&s->iomem, OBJECT(dev), &aspeed_mii_ops, s, + TYPE_ASPEED_MII, 0x8); + sysbus_init_mmio(sbd, &s->iomem); +} + +static const VMStateDescription vmstate_aspeed_mii = { + .name = TYPE_ASPEED_MII, + .version_id = 1, + .minimum_version_id = 1, + .fields = (VMStateField[]) { + VMSTATE_UINT32(phycr, FTGMAC100State), + VMSTATE_UINT32(phydata, FTGMAC100State), + VMSTATE_END_OF_LIST() + } +}; +static void aspeed_mii_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + dc->vmsd = &vmstate_aspeed_mii; + dc->reset = aspeed_mii_reset; + dc->realize = aspeed_mii_realize; + dc->desc = "Aspeed MII controller"; +} + +static const TypeInfo aspeed_mii_info = { + .name = TYPE_ASPEED_MII, + .parent = TYPE_SYS_BUS_DEVICE, + .instance_size = sizeof(AspeedMiiState), + .class_init = aspeed_mii_class_init, +}; + static void ftgmac100_register_types(void) { type_register_static(&ftgmac100_info); + type_register_static(&aspeed_mii_info); } type_init(ftgmac100_register_types) -- 2.20.1
From: Joel Stanley <joel@jms.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-id: 20190925143248.10000-24-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/aspeed_soc.h | 1 + hw/arm/aspeed_ast2600.c | 5 +++++ hw/arm/aspeed_soc.c | 6 ++++++ 3 files changed, 12 insertions(+) diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/aspeed_soc.h +++ b/include/hw/arm/aspeed_soc.h @@ -XXX,XX +XXX,XX @@ enum { ASPEED_SDMC, ASPEED_SCU, ASPEED_ADC, + ASPEED_VIDEO, ASPEED_SRAM, ASPEED_SDHCI, ASPEED_GPIO, diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_ast2600.c +++ b/hw/arm/aspeed_ast2600.c @@ -XXX,XX +XXX,XX @@ static const hwaddr aspeed_soc_ast2600_memmap[] = { [ASPEED_SCU] = 0x1E6E2000, [ASPEED_XDMA] = 0x1E6E7000, [ASPEED_ADC] = 0x1E6E9000, + [ASPEED_VIDEO] = 0x1E700000, [ASPEED_SDHCI] = 0x1E740000, [ASPEED_GPIO] = 0x1E780000, [ASPEED_GPIO_1_8V] = 0x1E780800, @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, Error **errp) create_unimplemented_device("aspeed_soc.io", sc->memmap[ASPEED_IOMEM], ASPEED_SOC_IOMEM_SIZE); + /* Video engine stub */ + create_unimplemented_device("aspeed.video", sc->memmap[ASPEED_VIDEO], + 0x1000); + if (s->num_cpus > sc->num_cpus) { warn_report("%s: invalid number of CPUs %d, using default %d", sc->name, s->num_cpus, sc->num_cpus); diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/aspeed_soc.c +++ b/hw/arm/aspeed_soc.c @@ -XXX,XX +XXX,XX @@ static const hwaddr aspeed_soc_ast2400_memmap[] = { [ASPEED_SDMC] = 0x1E6E0000, [ASPEED_SCU] = 0x1E6E2000, [ASPEED_XDMA] = 0x1E6E7000, + [ASPEED_VIDEO] = 0x1E700000, [ASPEED_ADC] = 0x1E6E9000, [ASPEED_SRAM] = 0x1E720000, [ASPEED_SDHCI] = 0x1E740000, @@ -XXX,XX +XXX,XX @@ static const hwaddr aspeed_soc_ast2500_memmap[] = { [ASPEED_SCU] = 0x1E6E2000, [ASPEED_XDMA] = 0x1E6E7000, [ASPEED_ADC] = 0x1E6E9000, + [ASPEED_VIDEO] = 0x1E700000, [ASPEED_SRAM] = 0x1E720000, [ASPEED_SDHCI] = 0x1E740000, [ASPEED_GPIO] = 0x1E780000, @@ -XXX,XX +XXX,XX @@ static void aspeed_soc_realize(DeviceState *dev, Error **errp) create_unimplemented_device("aspeed_soc.io", sc->memmap[ASPEED_IOMEM], ASPEED_SOC_IOMEM_SIZE); + /* Video engine stub */ + create_unimplemented_device("aspeed.video", sc->memmap[ASPEED_VIDEO], + 0x1000); + if (s->num_cpus > sc->num_cpus) { warn_report("%s: invalid number of CPUs %d, using default %d", sc->name, s->num_cpus, sc->num_cpus); -- 2.20.1
From: Philippe Mathieu-Daudé <f4bug@amsat.org> IEC binary prefixes ease code review: the unit is explicit. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Cleber Rosa <crosa@redhat.com> Message-id: 20190926173428.10713-2-f4bug@amsat.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/arm/raspi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/raspi.c +++ b/hw/arm/raspi.c @@ -XXX,XX +XXX,XX @@ static void raspi2_machine_init(MachineClass *mc) mc->max_cpus = BCM283X_NCPUS; mc->min_cpus = BCM283X_NCPUS; mc->default_cpus = BCM283X_NCPUS; - mc->default_ram_size = 1024 * 1024 * 1024; + mc->default_ram_size = 1 * GiB; mc->ignore_memory_transaction_failures = true; }; DEFINE_MACHINE("raspi2", raspi2_machine_init) @@ -XXX,XX +XXX,XX @@ static void raspi3_machine_init(MachineClass *mc) mc->max_cpus = BCM283X_NCPUS; mc->min_cpus = BCM283X_NCPUS; mc->default_cpus = BCM283X_NCPUS; - mc->default_ram_size = 1024 * 1024 * 1024; + mc->default_ram_size = 1 * GiB; } DEFINE_MACHINE("raspi3", raspi3_machine_init) #endif -- 2.20.1
From: Philippe Mathieu-Daudé <f4bug@amsat.org> Various logging improvements as once: - Use 0x prefix for hex numbers - Display value written during write accesses - Move some logs from GUEST_ERROR to UNIMP Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Cleber Rosa <crosa@redhat.com> Message-id: 20190926173428.10713-3-f4bug@amsat.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/char/bcm2835_aux.c | 5 +++-- hw/dma/bcm2835_dma.c | 8 ++++---- hw/intc/bcm2836_control.c | 7 ++++--- hw/misc/bcm2835_mbox.c | 7 ++++--- hw/misc/bcm2835_property.c | 16 ++++++++++------ 5 files changed, 25 insertions(+), 18 deletions(-) diff --git a/hw/char/bcm2835_aux.c b/hw/char/bcm2835_aux.c index XXXXXXX..XXXXXXX 100644 --- a/hw/char/bcm2835_aux.c +++ b/hw/char/bcm2835_aux.c @@ -XXX,XX +XXX,XX @@ static void bcm2835_aux_write(void *opaque, hwaddr offset, uint64_t value, switch (offset) { case AUX_ENABLES: if (value != 1) { - qemu_log_mask(LOG_UNIMP, "%s: unsupported attempt to enable SPI " - "or disable UART\n", __func__); + qemu_log_mask(LOG_UNIMP, "%s: unsupported attempt to enable SPI" + " or disable UART: 0x%"PRIx64"\n", + __func__, value); } break; diff --git a/hw/dma/bcm2835_dma.c b/hw/dma/bcm2835_dma.c index XXXXXXX..XXXXXXX 100644 --- a/hw/dma/bcm2835_dma.c +++ b/hw/dma/bcm2835_dma.c @@ -XXX,XX +XXX,XX @@ static uint64_t bcm2835_dma_read(BCM2835DMAState *s, hwaddr offset, res = ch->debug; break; default: - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset %"HWADDR_PRIx"\n", + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%"HWADDR_PRIx"\n", __func__, offset); break; } @@ -XXX,XX +XXX,XX @@ static void bcm2835_dma_write(BCM2835DMAState *s, hwaddr offset, ch->debug = value; break; default: - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset %"HWADDR_PRIx"\n", + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%"HWADDR_PRIx"\n", __func__, offset); break; } @@ -XXX,XX +XXX,XX @@ static uint64_t bcm2835_dma0_read(void *opaque, hwaddr offset, unsigned size) case BCM2708_DMA_ENABLE: return s->enable; default: - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset %"HWADDR_PRIx"\n", + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%"HWADDR_PRIx"\n", __func__, offset); return 0; } @@ -XXX,XX +XXX,XX @@ static void bcm2835_dma0_write(void *opaque, hwaddr offset, uint64_t value, s->enable = (value & 0xffff); break; default: - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset %"HWADDR_PRIx"\n", + qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%"HWADDR_PRIx"\n", __func__, offset); } } diff --git a/hw/intc/bcm2836_control.c b/hw/intc/bcm2836_control.c index XXXXXXX..XXXXXXX 100644 --- a/hw/intc/bcm2836_control.c +++ b/hw/intc/bcm2836_control.c @@ -XXX,XX +XXX,XX @@ static uint64_t bcm2836_control_read(void *opaque, hwaddr offset, unsigned size) } else if (offset >= REG_MBOX0_RDCLR && offset < REG_LIMIT) { return s->mailboxes[(offset - REG_MBOX0_RDCLR) >> 2]; } else { - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset %"HWADDR_PRIx"\n", + qemu_log_mask(LOG_UNIMP, "%s: Unsupported offset 0x%"HWADDR_PRIx"\n", __func__, offset); return 0; } @@ -XXX,XX +XXX,XX @@ static void bcm2836_control_write(void *opaque, hwaddr offset, } else if (offset >= REG_MBOX0_RDCLR && offset < REG_LIMIT) { s->mailboxes[(offset - REG_MBOX0_RDCLR) >> 2] &= ~val; } else { - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset %"HWADDR_PRIx"\n", - __func__, offset); + qemu_log_mask(LOG_UNIMP, "%s: Unsupported offset 0x%"HWADDR_PRIx + " value 0x%"PRIx64"\n", + __func__, offset, val); return; } diff --git a/hw/misc/bcm2835_mbox.c b/hw/misc/bcm2835_mbox.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/bcm2835_mbox.c +++ b/hw/misc/bcm2835_mbox.c @@ -XXX,XX +XXX,XX @@ static uint64_t bcm2835_mbox_read(void *opaque, hwaddr offset, unsigned size) break; default: - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset %"HWADDR_PRIx"\n", + qemu_log_mask(LOG_UNIMP, "%s: Unsupported offset 0x%"HWADDR_PRIx"\n", __func__, offset); return 0; } @@ -XXX,XX +XXX,XX @@ static void bcm2835_mbox_write(void *opaque, hwaddr offset, break; default: - qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset %"HWADDR_PRIx"\n", - __func__, offset); + qemu_log_mask(LOG_UNIMP, "%s: Unsupported offset 0x%"HWADDR_PRIx + " value 0x%"PRIx64"\n", + __func__, offset, value); return; } diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/bcm2835_property.c +++ b/hw/misc/bcm2835_property.c @@ -XXX,XX +XXX,XX @@ static void bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value) break; case 0x00010001: /* Get board model */ qemu_log_mask(LOG_UNIMP, - "bcm2835_property: %x get board model NYI\n", tag); + "bcm2835_property: 0x%08x get board model NYI\n", + tag); resplen = 4; break; case 0x00010002: /* Get board revision */ @@ -XXX,XX +XXX,XX @@ static void bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value) break; case 0x00010004: /* Get board serial */ qemu_log_mask(LOG_UNIMP, - "bcm2835_property: %x get board serial NYI\n", tag); + "bcm2835_property: 0x%08x get board serial NYI\n", + tag); resplen = 8; break; case 0x00010005: /* Get ARM memory */ @@ -XXX,XX +XXX,XX @@ static void bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value) case 0x00038001: /* Set clock state */ qemu_log_mask(LOG_UNIMP, - "bcm2835_property: %x set clock state NYI\n", tag); + "bcm2835_property: 0x%08x set clock state NYI\n", + tag); resplen = 8; break; @@ -XXX,XX +XXX,XX @@ static void bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value) case 0x00038004: /* Set max clock rate */ case 0x00038007: /* Set min clock rate */ qemu_log_mask(LOG_UNIMP, - "bcm2835_property: %x set clock rates NYI\n", tag); + "bcm2835_property: 0x%08x set clock rate NYI\n", + tag); resplen = 8; break; @@ -XXX,XX +XXX,XX @@ static void bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value) break; default: - qemu_log_mask(LOG_GUEST_ERROR, - "bcm2835_property: unhandled tag %08x\n", tag); + qemu_log_mask(LOG_UNIMP, + "bcm2835_property: unhandled tag 0x%08x\n", tag); break; } -- 2.20.1
From: Philippe Mathieu-Daudé <f4bug@amsat.org> Various address spaces from the BCM2835 are reported as 'anonymous' in memory tree: (qemu) info mtree address-space: anonymous 0000000000000000-000000000000008f (prio 0, i/o): bcm2835-mbox 0000000000000010-000000000000001f (prio 0, i/o): bcm2835-fb 0000000000000080-000000000000008f (prio 0, i/o): bcm2835-property address-space: anonymous 0000000000000000-00000000ffffffff (prio 0, i/o): bcm2835-gpu 0000000000000000-000000003fffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 0000000040000000-000000007fffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 000000007e000000-000000007effffff (prio 1, i/o): alias bcm2835-peripherals @bcm2835-peripherals 0000000000000000-0000000000ffffff 0000000080000000-00000000bfffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 00000000c0000000-00000000ffffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff [...] Since the address_space_init() function takes a 'name' argument, set it to correctly describe each address space: (qemu) info mtree address-space: bcm2835-mbox-memory 0000000000000000-000000000000008f (prio 0, i/o): bcm2835-mbox 0000000000000010-000000000000001f (prio 0, i/o): bcm2835-fb 0000000000000080-000000000000008f (prio 0, i/o): bcm2835-property address-space: bcm2835-fb-memory 0000000000000000-00000000ffffffff (prio 0, i/o): bcm2835-gpu 0000000000000000-000000003fffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 0000000040000000-000000007fffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 000000007e000000-000000007effffff (prio 1, i/o): alias bcm2835-peripherals @bcm2835-peripherals 0000000000000000-0000000000ffffff 0000000080000000-00000000bfffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 00000000c0000000-00000000ffffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff address-space: bcm2835-property-memory 0000000000000000-00000000ffffffff (prio 0, i/o): bcm2835-gpu 0000000000000000-000000003fffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 0000000040000000-000000007fffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 000000007e000000-000000007effffff (prio 1, i/o): alias bcm2835-peripherals @bcm2835-peripherals 0000000000000000-0000000000ffffff 0000000080000000-00000000bfffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 00000000c0000000-00000000ffffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff address-space: bcm2835-dma-memory 0000000000000000-00000000ffffffff (prio 0, i/o): bcm2835-gpu 0000000000000000-000000003fffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 0000000040000000-000000007fffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 000000007e000000-000000007effffff (prio 1, i/o): alias bcm2835-peripherals @bcm2835-peripherals 0000000000000000-0000000000ffffff 0000000080000000-00000000bfffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff 00000000c0000000-00000000ffffffff (prio 0, i/o): alias bcm2835-gpu-ram-alias[*] @ram 0000000000000000-000000003fffffff Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Cleber Rosa <crosa@redhat.com> Message-id: 20190926173428.10713-4-f4bug@amsat.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/display/bcm2835_fb.c | 2 +- hw/dma/bcm2835_dma.c | 2 +- hw/misc/bcm2835_mbox.c | 2 +- hw/misc/bcm2835_property.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/display/bcm2835_fb.c b/hw/display/bcm2835_fb.c index XXXXXXX..XXXXXXX 100644 --- a/hw/display/bcm2835_fb.c +++ b/hw/display/bcm2835_fb.c @@ -XXX,XX +XXX,XX @@ static void bcm2835_fb_realize(DeviceState *dev, Error **errp) s->initial_config.base = s->vcram_base + BCM2835_FB_OFFSET; s->dma_mr = MEMORY_REGION(obj); - address_space_init(&s->dma_as, s->dma_mr, NULL); + address_space_init(&s->dma_as, s->dma_mr, TYPE_BCM2835_FB "-memory"); bcm2835_fb_reset(dev); diff --git a/hw/dma/bcm2835_dma.c b/hw/dma/bcm2835_dma.c index XXXXXXX..XXXXXXX 100644 --- a/hw/dma/bcm2835_dma.c +++ b/hw/dma/bcm2835_dma.c @@ -XXX,XX +XXX,XX @@ static void bcm2835_dma_realize(DeviceState *dev, Error **errp) } s->dma_mr = MEMORY_REGION(obj); - address_space_init(&s->dma_as, s->dma_mr, NULL); + address_space_init(&s->dma_as, s->dma_mr, TYPE_BCM2835_DMA "-memory"); bcm2835_dma_reset(dev); } diff --git a/hw/misc/bcm2835_mbox.c b/hw/misc/bcm2835_mbox.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/bcm2835_mbox.c +++ b/hw/misc/bcm2835_mbox.c @@ -XXX,XX +XXX,XX @@ static void bcm2835_mbox_realize(DeviceState *dev, Error **errp) } s->mbox_mr = MEMORY_REGION(obj); - address_space_init(&s->mbox_as, s->mbox_mr, NULL); + address_space_init(&s->mbox_as, s->mbox_mr, TYPE_BCM2835_MBOX "-memory"); bcm2835_mbox_reset(dev); } diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/bcm2835_property.c +++ b/hw/misc/bcm2835_property.c @@ -XXX,XX +XXX,XX @@ static void bcm2835_property_realize(DeviceState *dev, Error **errp) } s->dma_mr = MEMORY_REGION(obj); - address_space_init(&s->dma_as, s->dma_mr, NULL); + address_space_init(&s->dma_as, s->dma_mr, TYPE_BCM2835_PROPERTY "-memory"); /* TODO: connect to MAC address of USB NIC device, once we emulate it */ qemu_macaddr_default_if_unset(&s->macaddr); -- 2.20.1
From: Philippe Mathieu-Daudé <f4bug@amsat.org> The UART1 is part of the AUX peripheral, the PCM_CLOCK (yet unimplemented) is part of the CPRMAN. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20190926173428.10713-5-f4bug@amsat.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/raspi_platform.h | 16 +++++++--------- hw/arm/bcm2835_peripherals.c | 7 ++++--- hw/arm/bcm2836.c | 2 +- 3 files changed, 12 insertions(+), 13 deletions(-) diff --git a/include/hw/arm/raspi_platform.h b/include/hw/arm/raspi_platform.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/raspi_platform.h +++ b/include/hw/arm/raspi_platform.h @@ -XXX,XX +XXX,XX @@ #ifndef HW_ARM_RASPI_PLATFORM_H #define HW_ARM_RASPI_PLATFORM_H -#define MCORE_OFFSET 0x0000 /* Fake frame buffer device - * (the multicore sync block) */ +#define MSYNC_OFFSET 0x0000 /* Multicore Sync Block */ #define IC0_OFFSET 0x2000 #define ST_OFFSET 0x3000 /* System Timer */ #define MPHI_OFFSET 0x6000 /* Message-based Parallel Host Intf. */ @@ -XXX,XX +XXX,XX @@ #define ARMCTRL_TIMER0_1_OFFSET (ARM_OFFSET + 0x400) /* Timer 0 and 1 */ #define ARMCTRL_0_SBM_OFFSET (ARM_OFFSET + 0x800) /* User 0 (ARM) Semaphores * Doorbells & Mailboxes */ -#define PM_OFFSET 0x100000 /* Power Management, Reset controller - * and Watchdog registers */ -#define PCM_CLOCK_OFFSET 0x101098 +#define CPRMAN_OFFSET 0x100000 /* Power Management, Watchdog */ +#define CM_OFFSET 0x101000 /* Clock Management */ #define RNG_OFFSET 0x104000 #define GPIO_OFFSET 0x200000 #define UART0_OFFSET 0x201000 @@ -XXX,XX +XXX,XX @@ #define I2S_OFFSET 0x203000 #define SPI0_OFFSET 0x204000 #define BSC0_OFFSET 0x205000 /* BSC0 I2C/TWI */ -#define UART1_OFFSET 0x215000 -#define EMMC_OFFSET 0x300000 +#define AUX_OFFSET 0x215000 /* AUX: UART1/SPI1/SPI2 */ +#define EMMC1_OFFSET 0x300000 #define SMI_OFFSET 0x600000 #define BSC1_OFFSET 0x804000 /* BSC1 I2C/TWI */ -#define USB_OFFSET 0x980000 /* DTC_OTG USB controller */ +#define USB_OTG_OFFSET 0x980000 /* DTC_OTG USB controller */ #define DMA15_OFFSET 0xE05000 /* DMA controller, channel 15 */ /* GPU interrupts */ @@ -XXX,XX +XXX,XX @@ #define INTERRUPT_SPI 54 #define INTERRUPT_I2SPCM 55 #define INTERRUPT_SDIO 56 -#define INTERRUPT_UART 57 +#define INTERRUPT_UART0 57 #define INTERRUPT_SLIMBUS 58 #define INTERRUPT_VEC 59 #define INTERRUPT_CPG 60 diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/bcm2835_peripherals.c +++ b/hw/arm/bcm2835_peripherals.c @@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp) sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->uart0), 0)); sysbus_connect_irq(SYS_BUS_DEVICE(&s->uart0), 0, qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ, - INTERRUPT_UART)); + INTERRUPT_UART0)); + /* AUX / UART1 */ qdev_prop_set_chr(DEVICE(&s->aux), "chardev", serial_hd(1)); @@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp) return; } - memory_region_add_subregion(&s->peri_mr, UART1_OFFSET, + memory_region_add_subregion(&s->peri_mr, AUX_OFFSET, sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->aux), 0)); sysbus_connect_irq(SYS_BUS_DEVICE(&s->aux), 0, qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ, @@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp) return; } - memory_region_add_subregion(&s->peri_mr, EMMC_OFFSET, + memory_region_add_subregion(&s->peri_mr, EMMC1_OFFSET, sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->sdhci), 0)); sysbus_connect_irq(SYS_BUS_DEVICE(&s->sdhci), 0, qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ, diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/bcm2836.c +++ b/hw/arm/bcm2836.c @@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp) /* set periphbase/CBAR value for CPU-local registers */ object_property_set_int(OBJECT(&s->cpus[n]), - BCM2836_PERI_BASE + MCORE_OFFSET, + BCM2836_PERI_BASE + MSYNC_OFFSET, "reset-cbar", &err); if (err) { error_propagate(errp, err); -- 2.20.1
From: Philippe Mathieu-Daudé <f4bug@amsat.org> Base addresses and sizes taken from the "BCM2835 ARM Peripherals" datasheet from February 06 2012: https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20190926173428.10713-6-f4bug@amsat.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- include/hw/arm/bcm2835_peripherals.h | 15 ++++++++++++++ include/hw/arm/raspi_platform.h | 8 +++++++ hw/arm/bcm2835_peripherals.c | 31 ++++++++++++++++++++++++++++ 3 files changed, 54 insertions(+) diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/bcm2835_peripherals.h +++ b/include/hw/arm/bcm2835_peripherals.h @@ -XXX,XX +XXX,XX @@ #include "hw/sd/sdhci.h" #include "hw/sd/bcm2835_sdhost.h" #include "hw/gpio/bcm2835_gpio.h" +#include "hw/misc/unimp.h" #define TYPE_BCM2835_PERIPHERALS "bcm2835-peripherals" #define BCM2835_PERIPHERALS(obj) \ @@ -XXX,XX +XXX,XX @@ typedef struct BCM2835PeripheralState { MemoryRegion ram_alias[4]; qemu_irq irq, fiq; + UnimplementedDeviceState systmr; + UnimplementedDeviceState armtmr; + UnimplementedDeviceState cprman; + UnimplementedDeviceState a2w; PL011State uart0; BCM2835AuxState aux; BCM2835FBState fb; @@ -XXX,XX +XXX,XX @@ typedef struct BCM2835PeripheralState { SDHCIState sdhci; BCM2835SDHostState sdhost; BCM2835GpioState gpio; + UnimplementedDeviceState i2s; + UnimplementedDeviceState spi[1]; + UnimplementedDeviceState i2c[3]; + UnimplementedDeviceState otp; + UnimplementedDeviceState dbus; + UnimplementedDeviceState ave0; + UnimplementedDeviceState bscsl; + UnimplementedDeviceState smi; + UnimplementedDeviceState dwc2; + UnimplementedDeviceState sdramc; } BCM2835PeripheralState; #endif /* BCM2835_PERIPHERALS_H */ diff --git a/include/hw/arm/raspi_platform.h b/include/hw/arm/raspi_platform.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/arm/raspi_platform.h +++ b/include/hw/arm/raspi_platform.h @@ -XXX,XX +XXX,XX @@ * Doorbells & Mailboxes */ #define CPRMAN_OFFSET 0x100000 /* Power Management, Watchdog */ #define CM_OFFSET 0x101000 /* Clock Management */ +#define A2W_OFFSET 0x102000 /* Reset controller */ +#define AVS_OFFSET 0x103000 /* Audio Video Standard */ #define RNG_OFFSET 0x104000 #define GPIO_OFFSET 0x200000 #define UART0_OFFSET 0x201000 @@ -XXX,XX +XXX,XX @@ #define I2S_OFFSET 0x203000 #define SPI0_OFFSET 0x204000 #define BSC0_OFFSET 0x205000 /* BSC0 I2C/TWI */ +#define OTP_OFFSET 0x20f000 +#define BSC_SL_OFFSET 0x214000 /* SPI slave */ #define AUX_OFFSET 0x215000 /* AUX: UART1/SPI1/SPI2 */ #define EMMC1_OFFSET 0x300000 #define SMI_OFFSET 0x600000 #define BSC1_OFFSET 0x804000 /* BSC1 I2C/TWI */ +#define BSC2_OFFSET 0x805000 /* BSC2 I2C/TWI */ +#define DBUS_OFFSET 0x900000 +#define AVE0_OFFSET 0x910000 #define USB_OTG_OFFSET 0x980000 /* DTC_OTG USB controller */ +#define SDRAMC_OFFSET 0xe00000 #define DMA15_OFFSET 0xE05000 /* DMA controller, channel 15 */ /* GPU interrupts */ diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/bcm2835_peripherals.c +++ b/hw/arm/bcm2835_peripherals.c @@ -XXX,XX +XXX,XX @@ /* Capabilities for SD controller: no DMA, high-speed, default clocks etc. */ #define BCM2835_SDHC_CAPAREG 0x52134b4 +static void create_unimp(BCM2835PeripheralState *ps, + UnimplementedDeviceState *uds, + const char *name, hwaddr ofs, hwaddr size) +{ + sysbus_init_child_obj(OBJECT(ps), name, uds, + sizeof(UnimplementedDeviceState), + TYPE_UNIMPLEMENTED_DEVICE); + qdev_prop_set_string(DEVICE(uds), "name", name); + qdev_prop_set_uint64(DEVICE(uds), "size", size); + object_property_set_bool(OBJECT(uds), true, "realized", &error_fatal); + memory_region_add_subregion_overlap(&ps->peri_mr, ofs, + sysbus_mmio_get_region(SYS_BUS_DEVICE(uds), 0), -1000); +} + static void bcm2835_peripherals_init(Object *obj) { BCM2835PeripheralState *s = BCM2835_PERIPHERALS(obj); @@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp) error_propagate(errp, err); return; } + + create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40); + create_unimp(s, &s->systmr, "bcm2835-systimer", ST_OFFSET, 0x20); + create_unimp(s, &s->cprman, "bcm2835-cprman", CPRMAN_OFFSET, 0x1000); + create_unimp(s, &s->a2w, "bcm2835-a2w", A2W_OFFSET, 0x1000); + create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100); + create_unimp(s, &s->smi, "bcm2835-smi", SMI_OFFSET, 0x100); + create_unimp(s, &s->spi[0], "bcm2835-spi0", SPI0_OFFSET, 0x20); + create_unimp(s, &s->bscsl, "bcm2835-spis", BSC_SL_OFFSET, 0x100); + create_unimp(s, &s->i2c[0], "bcm2835-i2c0", BSC0_OFFSET, 0x20); + create_unimp(s, &s->i2c[1], "bcm2835-i2c1", BSC1_OFFSET, 0x20); + create_unimp(s, &s->i2c[2], "bcm2835-i2c2", BSC2_OFFSET, 0x20); + create_unimp(s, &s->otp, "bcm2835-otp", OTP_OFFSET, 0x80); + create_unimp(s, &s->dbus, "bcm2835-dbus", DBUS_OFFSET, 0x8000); + create_unimp(s, &s->ave0, "bcm2835-ave0", AVE0_OFFSET, 0x8000); + create_unimp(s, &s->dwc2, "dwc-usb2", USB_OTG_OFFSET, 0x1000); + create_unimp(s, &s->sdramc, "bcm2835-sdramc", SDRAMC_OFFSET, 0x100); } static void bcm2835_peripherals_class_init(ObjectClass *oc, void *data) -- 2.20.1
From: Philippe Mathieu-Daudé <f4bug@amsat.org> Add trace events for read/write accesses and IRQ. Properties are structures used for the ARM particular MBOX. Since one call in bcm2835_property.c concerns the mbox block, name this trace event in the same bcm2835_mbox* namespace. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20190926173428.10713-8-f4bug@amsat.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- hw/misc/bcm2835_mbox.c | 5 +++++ hw/misc/bcm2835_property.c | 2 ++ hw/misc/trace-events | 6 ++++++ 3 files changed, 13 insertions(+) diff --git a/hw/misc/bcm2835_mbox.c b/hw/misc/bcm2835_mbox.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/bcm2835_mbox.c +++ b/hw/misc/bcm2835_mbox.c @@ -XXX,XX +XXX,XX @@ #include "migration/vmstate.h" #include "qemu/log.h" #include "qemu/module.h" +#include "trace.h" #define MAIL0_PEEK 0x90 #define MAIL0_SENDER 0x94 @@ -XXX,XX +XXX,XX @@ static void bcm2835_mbox_update(BCM2835MboxState *s) set = true; } } + trace_bcm2835_mbox_irq(set); qemu_set_irq(s->arm_irq, set); } @@ -XXX,XX +XXX,XX @@ static uint64_t bcm2835_mbox_read(void *opaque, hwaddr offset, unsigned size) default: qemu_log_mask(LOG_UNIMP, "%s: Unsupported offset 0x%"HWADDR_PRIx"\n", __func__, offset); + trace_bcm2835_mbox_read(size, offset, res); return 0; } + trace_bcm2835_mbox_read(size, offset, res); bcm2835_mbox_update(s); @@ -XXX,XX +XXX,XX @@ static void bcm2835_mbox_write(void *opaque, hwaddr offset, offset &= 0xff; + trace_bcm2835_mbox_write(size, offset, value); switch (offset) { case MAIL0_SENDER: break; diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/bcm2835_property.c +++ b/hw/misc/bcm2835_property.c @@ -XXX,XX +XXX,XX @@ #include "sysemu/dma.h" #include "qemu/log.h" #include "qemu/module.h" +#include "trace.h" /* https://github.com/raspberrypi/firmware/wiki/Mailbox-property-interface */ @@ -XXX,XX +XXX,XX @@ static void bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value) break; } + trace_bcm2835_mbox_property(tag, bufsize, resplen); if (tag == 0) { break; } diff --git a/hw/misc/trace-events b/hw/misc/trace-events index XXXXXXX..XXXXXXX 100644 --- a/hw/misc/trace-events +++ b/hw/misc/trace-events @@ -XXX,XX +XXX,XX @@ armsse_mhu_write(uint64_t offset, uint64_t data, unsigned size) "SSE-200 MHU wri # aspeed_xdma.c aspeed_xdma_write(uint64_t offset, uint64_t data) "XDMA write: offset 0x%" PRIx64 " data 0x%" PRIx64 + +# bcm2835_mbox.c +bcm2835_mbox_write(unsigned int size, uint64_t addr, uint64_t value) "mbox write sz:%u addr:0x%"PRIx64" data:0x%"PRIx64 +bcm2835_mbox_read(unsigned int size, uint64_t addr, uint64_t value) "mbox read sz:%u addr:0x%"PRIx64" data:0x%"PRIx64 +bcm2835_mbox_irq(unsigned level) "mbox irq:ARM level:%u" +bcm2835_mbox_property(uint32_t tag, uint32_t bufsize, size_t resplen) "mbox property tag:0x%08x in_sz:%u out_sz:%zu" -- 2.20.1
Hi; here's a target-arm pullreq. Apologies for the size: this is because it has all RTH's work to enable emulation of SME2p1 and SVE2p1. thanks -- PMM The following changes since commit c77283dd5d79149f4e7e9edd00f65416c648ee59: Merge tag 'pull-request-2025-07-02' of https://gitlab.com/thuth/qemu into staging (2025-07-03 06:01:41 -0400) are available in the Git repository at: https://gitlab.com/pm215/qemu.git tags/pull-target-arm-20250704 for you to fetch changes up to 083fef73585dfa03f72055ace6de8dec4912d0b0: linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1 (2025-07-04 15:53:23 +0100) ---------------------------------------------------------------- target-arm queue: * Implement emulation of SME2p1 and SVE2p1 * Correctly enforce alignment checks for v8M loads and stores done via helper functions * Mark the "highbank" and the "midway" machine as deprecated ---------------------------------------------------------------- Peter Maydell (4): target/arm: Rename FMOPA_h to FMOPA_w_h target/arm: Rename BFMOPA to BFMOPA_w target/arm: Implement FMOPA (non-widening) for fp16 target/arm: Implement SME2 BFMOPA (non-widening) Richard Henderson (103): target/arm: Fix SME vs AdvSIMD exception priority target/arm: Fix sve_access_check for SME target/arm: Fix 128-bit element ZIP, UZP, TRN target/arm: Replace @rda_rn_rm_e0 in sve.decode target/arm: Fix FMMLA (64-bit element) for 128-bit VL target/arm: Disable FEAT_F64MM if maximum SVE vector size too small target/arm: Fix PSEL size operands to tcg_gen_gvec_ands target/arm: Fix f16_dotadd vs nan selection target/arm: Fix bfdotadd_ebf vs nan selection target/arm: Remove CPUARMState.vfp.scratch target/arm: Introduce FPST_ZA, FPST_ZA_F16 target/arm: Use FPST_ZA for sme_fmopa_[hsd] target/arm: Rename zarray to za_state.za target/arm: Add isar feature tests for SME2p1, SVE2p1 target/arm: Add ZT0 target/arm: Add zt0_excp_el to DisasContext target/arm: Implement SME2 ZERO ZT0 target/arm: Add alignment argument to gen_sve_{ldr, str} target/arm: Implement SME2 LDR/STR ZT0 target/arm: Implement SME2 MOVT target/arm: Split get_tile_rowcol argument tile_index target/arm: Rename MOVA for translate target/arm: Split out get_zarray target/arm: Introduce ARMCPU.sme_max_vq target/arm: Implement SME2 MOVA to/from tile, multiple registers target/arm: Implement SME2 MOVA to/from array, multiple registers target/arm: Implement SME2 BMOPA target/arm: Implement SME2 SMOPS, UMOPS (2-way) target/arm: Introduce gen_gvec_sve2_sqdmulh target/arm: Implement SME2 Multiple and Single SVE Destructive target/arm: Implement SME2 Multiple Vectors SVE Destructive target/arm: Implement SME2 ADD/SUB (array results, multiple and single vector) target/arm: Implement SME2 ADD/SUB (array results, multiple vectors) target/arm: Pass ZA to helper_sve2_fmlal_zz[zx]w_s target/arm: Add helper_gvec{_ah}_bfmlsl{_nx} target/arm: Implement SME2 FMLAL, BFMLAL target/arm: Implement SME2 FDOT target/arm: Implement SME2 BFDOT target/arm: Implement SME2 FVDOT, BFVDOT target/arm: Rename helper_gvec_*dot_[bh] to *_4[bh] target/arm: Implemement SME2 SDOT, UDOT, USDOT, SUDOT target/arm: Rename SVE SDOT and UDOT patterns target/arm: Tighten USDOT (vectors) decode target/arm: Implement SDOT, UDOT (2-way) for SME2/SVE2p1 target/arm: Implement SME2 SVDOT, UVDOT, SUVDOT, USVDOT target/arm: Implement SME2 SMLAL, SMLSL, UMLAL, UMLSL target/arm: Rename gvec_fml[as]_[hs] with _nf_ infix target/arm: Implement SME2 FMLA, FMLS target/arm: Implement SME2 BFMLA, BFMLS target/arm: Implement SME2 FADD, FSUB, BFADD, BFSUB target/arm: Implement SME2 ADD/SUB (array accumulator) target/arm: Implement SME2 BFCVT, BFCVTN, FCVT, FCVTN target/arm: Implement SME2 FCVT (widening), FCVTL target/arm: Implement SME2 FCVTZS, FCVTZU target/arm: Implement SME2 SCVTF, UCVTF target/arm: Implement SME2 FRINTN, FRINTP, FRINTM, FRINTA target/arm: Introduce do_[us]sat_[bhs] macros target/arm: Use do_[us]sat_[bhs] in sve_helper.c target/arm: Implement SME2 SQCVT, UQCVT, SQCVTU target/arm: Implement SQCVTN, UQCVTN, SQCVTUN for SME2/SVE2p1 target/arm: Implement SME2 SUNPK, UUNPK target/arm: Implement SME2 ZIP, UZP (four registers) target/arm: Move do_urshr, do_srshr to vec_internal.h target/arm: Implement SME2 SQRSHR, UQRSHR, SQRSHRN target/arm: Implement SME2 ZIP, UZP (two registers) target/arm: Implement SME2 FCLAMP, SCLAMP, UCLAMP target/arm: Enable SCLAMP, UCLAMP for SVE2p1 target/arm: Implement FCLAMP for SME2, SVE2p1 target/arm: Implement SME2p1 Multiple Zero target/arm: Introduce pred_count_test target/arm: Fold predtest_ones into helper_sve_brkns target/arm: Expand do_zero inline target/arm: Split out do_whilel from helper_sve_whilel target/arm: Split out do_whileg from helper_sve_whileg target/arm: Move scale by esz into helper_sve_while* target/arm: Split trans_WHILE to lt and gt target/arm: Enable PSEL for SVE2p1 target/arm: Implement SVE2p1 WHILE (predicate pair) target/arm: Implement SVE2p1 WHILE (predicate as counter) target/arm: Implement SVE2p1 PTRUE (predicate as counter) target/arm: Implement {ADD, SMIN, SMAX, UMIN, UMAX}QV for SVE2p1 target/arm: Implement SVE2p1 PEXT target/arm: Implement SME2 SEL target/arm: Implement ANDQV, ORQV, EORQV for SVE2p1 target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1 target/arm: Implement BFMLSLB{L, T} for SME2/SVE2p1 target/arm: Implement CNTP (predicate as counter) for SME2/SVE2p1 target/arm: Implement DUPQ for SME2p1/SVE2p1 target/arm: Implement EXTQ for SME2p1/SVE2p1 target/arm: Implement PMOV for SME2p1/SVE2p1 target/arm: Implement ZIPQ, UZPQ for SME2p1/SVE2p1 target/arm: Implement TBLQ, TBXQ for SME2p1/SVE2p1 target/arm: Implement SME2 counted predicate register load/store target/arm: Split the ST_zpri and ST_zprr patterns target/arm: Implement {LD1, ST1}{W, D} (128-bit element) for SVE2p1 target/arm: Move ld1qq and st1qq primitives to sve_ldst_internal.h target/arm: Implement {LD, ST}[234]Q for SME2p1/SVE2p1 target/arm: Implement LD1Q, ST1Q for SVE2p1 target/arm: Implement MOVAZ for SME2p1 target/arm: Implement LUTI2, LUTI4 for SME2/SME2p1 target/arm: Support FPCR.AH in SME FMOPS, BFMOPS target/arm: Enable FEAT_SME2p1 on -cpu max linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1 Thomas Huth (1): hw/arm/highbank: Mark the "highbank" and the "midway" machine as deprecated William Kosasih (11): target/arm: Bring VLSTM/VLLDM helper store/load closer to the ARM pseudocode target/arm: Fix BLXNS helper store alignment checks target/arm: Fix function_return helper load alignment checks target/arm: Fix VLDR helper load alignment checks target/arm: Fix VSTR helper store alignment checks target/arm: Fix VLDR_SG helper load alignment checks target/arm: Fix VSTR_SG helper store alignment checks target/arm: Fix VLD4 helper load alignment checks target/arm: Fix VLD2 helper load alignment checks target/arm: Fix VST4 helper store alignment checks target/arm: Fix VST2 helper store alignment checks docs/about/deprecated.rst | 7 + docs/system/arm/emulation.rst | 6 + target/arm/cpu-features.h | 63 ++ target/arm/cpu.h | 70 +- target/arm/syndrome.h | 1 + target/arm/tcg/helper-sme.h | 215 ++++- target/arm/tcg/helper-sve.h | 212 +++++ target/arm/tcg/helper.h | 91 +- target/arm/tcg/sve_ldst_internal.h | 89 ++ target/arm/tcg/translate-a64.h | 10 +- target/arm/tcg/translate.h | 9 + target/arm/tcg/vec_internal.h | 148 ++++ target/arm/tcg/sme.decode | 937 +++++++++++++++++++- target/arm/tcg/sve.decode | 327 +++++-- hw/arm/highbank.c | 2 + linux-user/aarch64/signal.c | 4 +- linux-user/elfload.c | 8 + target/arm/cpu.c | 11 +- target/arm/cpu64.c | 8 + target/arm/helper.c | 8 +- target/arm/machine.c | 22 +- target/arm/tcg/cpu64.c | 10 +- target/arm/tcg/gengvec64.c | 11 + target/arm/tcg/helper-a64.c | 2 + target/arm/tcg/hflags.c | 34 +- target/arm/tcg/m_helper.c | 33 +- target/arm/tcg/mve_helper.c | 183 ++-- target/arm/tcg/neon_helper.c | 30 + target/arm/tcg/sme_helper.c | 1674 +++++++++++++++++++++++++++++++++--- target/arm/tcg/sve_helper.c | 1201 +++++++++++++++++++++----- target/arm/tcg/translate-a64.c | 45 +- target/arm/tcg/translate-neon.c | 18 +- target/arm/tcg/translate-sme.c | 1480 ++++++++++++++++++++++++++++++- target/arm/tcg/translate-sve.c | 1019 +++++++++++++++++++--- target/arm/tcg/vec_helper.c | 384 +++++++-- target/arm/tcg/vfp_helper.c | 12 +- 36 files changed, 7605 insertions(+), 779 deletions(-)
From: Thomas Huth <thuth@redhat.com> We don't have any automatic regression tests for these machines and when asking the usual suspects on the mailing list we came to the conclusion that nobody tests these machines manually, too, so it seems like this is currently just completely unused code. Mark them as depre- cated to see whether anybody still speaks up during the deprecation period, otherwise we can likely remove these two machines in a couple of releases. Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Message-id: 20250702113051.46483-1-thuth@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: tweaked deprecation.rst text] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/about/deprecated.rst | 7 +++++++ hw/arm/highbank.c | 2 ++ 2 files changed, 9 insertions(+) diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/about/deprecated.rst +++ b/docs/about/deprecated.rst @@ -XXX,XX +XXX,XX @@ they want to use and avoids confusion. Existing users of the ``spike`` machine must ensure that they're setting the ``spike`` machine in the command line (``-M spike``). +Arm ``highbank`` and ``midway`` machines (since 10.1) +''''''''''''''''''''''''''''''''''''''''''''''''''''' + +There are no known users left for these machines (if you still use it, +please write a mail to the qemu-devel mailing list). If you just want to +boot a Cortex-A15 or Cortex-A9 Linux, use the ``virt`` machine instead. + System emulator binaries ------------------------ diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c index XXXXXXX..XXXXXXX 100644 --- a/hw/arm/highbank.c +++ b/hw/arm/highbank.c @@ -XXX,XX +XXX,XX @@ static void highbank_class_init(ObjectClass *oc, const void *data) mc->max_cpus = 4; mc->ignore_memory_transaction_failures = true; mc->default_ram_id = "highbank.dram"; + mc->deprecation_reason = "no known users left for this machine"; } static const TypeInfo highbank_type = { @@ -XXX,XX +XXX,XX @@ static void midway_class_init(ObjectClass *oc, const void *data) mc->max_cpus = 4; mc->ignore_memory_transaction_failures = true; mc->default_ram_id = "highbank.dram"; + mc->deprecation_reason = "no known users left for this machine"; } static const TypeInfo midway_type = { -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch brings the VLSTM and VLLDM helper functions closer to the ARM pseudocode by adding MO_ALIGN to the MemOpIdx of the associated store (`cpu_stl_mmu`) operations and load (`cpu_ldl_mmu`) operations. Note that this is not a bug fix: an 8-byte alignment check already exists and remains in place, enforcing stricter alignment than the 4 bytes requirement in the individual loads and stores. This change merely makes the helper implementations closer to the ARM pseudocode. That said, as a side effect, the MMU index is now resolved once instead of on every `cpu_*_data_ra` call, reducing redundant lookups Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-2-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/m_helper.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/m_helper.c +++ b/target/arm/tcg/m_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr) bool s = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK; bool lspact = env->v7m.fpccr[s] & R_V7M_FPCCR_LSPACT_MASK; uintptr_t ra = GETPC(); + ARMMMUIdx mmu_idx = arm_mmu_idx(env); + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, + arm_to_core_mmu_idx(mmu_idx)); assert(env->v7m.secure); @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr) * Note that we do not use v7m_stack_write() here, because the * accesses should not set the FSR bits for stacking errors if they * fail. (In pseudocode terms, they are AccType_NORMAL, not AccType_STACK - * or AccType_LAZYFP). Faults in cpu_stl_data_ra() will throw exceptions + * or AccType_LAZYFP). Faults in cpu_stl_mmu() will throw exceptions * and longjmp out. */ if (!(env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_LSPEN_MASK)) { @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlstm)(CPUARMState *env, uint32_t fptr) if (i >= 16) { faddr += 8; /* skip the slot for the FPSCR */ } - cpu_stl_data_ra(env, faddr, slo, ra); - cpu_stl_data_ra(env, faddr + 4, shi, ra); + cpu_stl_mmu(env, faddr, slo, oi, ra); + cpu_stl_mmu(env, faddr + 4, shi, oi, ra); } - cpu_stl_data_ra(env, fptr + 0x40, vfp_get_fpscr(env), ra); + cpu_stl_mmu(env, fptr + 0x40, vfp_get_fpscr(env), oi, ra); if (cpu_isar_feature(aa32_mve, cpu)) { - cpu_stl_data_ra(env, fptr + 0x44, env->v7m.vpr, ra); + cpu_stl_mmu(env, fptr + 0x44, env->v7m.vpr, oi, ra); } /* @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr) { ARMCPU *cpu = env_archcpu(env); uintptr_t ra = GETPC(); + ARMMMUIdx mmu_idx = arm_mmu_idx(env); + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, + arm_to_core_mmu_idx(mmu_idx)); /* fptr is the value of Rn, the frame pointer we load the FP regs from */ assert(env->v7m.secure); @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_vlldm)(CPUARMState *env, uint32_t fptr) faddr += 8; /* skip the slot for the FPSCR and VPR */ } - slo = cpu_ldl_data_ra(env, faddr, ra); - shi = cpu_ldl_data_ra(env, faddr + 4, ra); + slo = cpu_ldl_mmu(env, faddr, oi, ra); + shi = cpu_ldl_mmu(env, faddr + 4, oi, ra); dn = (uint64_t) shi << 32 | slo; *aa32_vfp_dreg(env, i / 2) = dn; } - fpscr = cpu_ldl_data_ra(env, fptr + 0x40, ra); + fpscr = cpu_ldl_mmu(env, fptr + 0x40, oi, ra); vfp_set_fpscr(env, fpscr); if (cpu_isar_feature(aa32_mve, cpu)) { - env->v7m.vpr = cpu_ldl_data_ra(env, fptr + 0x44, ra); + env->v7m.vpr = cpu_ldl_mmu(env, fptr + 0x44, oi, ra); } } -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations (when stacking the return pc and psr) in the BLXNS instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-3-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/m_helper.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/m_helper.c +++ b/target/arm/tcg/m_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest) } /* Note that these stores can throw exceptions on MPU faults */ - cpu_stl_data_ra(env, sp, nextinst, GETPC()); - cpu_stl_data_ra(env, sp + 4, saved_psr, GETPC()); + ARMMMUIdx mmu_idx = arm_mmu_idx(env); + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, + arm_to_core_mmu_idx(mmu_idx)); + cpu_stl_mmu(env, sp, nextinst, oi, GETPC()); + cpu_stl_mmu(env, sp + 4, saved_psr, oi, GETPC()); env->regs[13] = sp; env->regs[14] = 0xfeffffff; -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations (when unstacking the return pc and psr) in the FunctionReturn pseudocode. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-4-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/m_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/tcg/m_helper.c b/target/arm/tcg/m_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/m_helper.c +++ b/target/arm/tcg/m_helper.c @@ -XXX,XX +XXX,XX @@ static bool do_v7m_function_return(ARMCPU *cpu) * do them as secure, so work out what MMU index that is. */ mmu_idx = arm_v7m_mmu_idx_for_secstate(env, true); - oi = make_memop_idx(MO_LEUL, arm_to_core_mmu_idx(mmu_idx)); + oi = make_memop_idx(MO_LEUL | MO_ALIGN, arm_to_core_mmu_idx(mmu_idx)); newpc = cpu_ldl_mmu(env, frameptr, oi, 0); newpsr = cpu_ldl_mmu(env, frameptr + 4, oi, 0); -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLDR instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-5-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) } /* For loads, predicated lanes are zeroed instead of keeping their old values */ -#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE) \ +#define DO_VLDR(OP, MFLAG, MSIZE, MTYPE, LDTYPE, ESIZE, TYPE) \ void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \ { \ TYPE *d = vd; \ uint16_t mask = mve_element_mask(env); \ uint16_t eci_mask = mve_eci_mask(env); \ unsigned b, e; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ /* \ * R_SXTM allows the dest reg to become UNKNOWN for abandoned \ * beats so we don't care if we update part of the dest and \ @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \ if (eci_mask & (1 << b)) { \ d[H##ESIZE(e)] = (mask & (1 << b)) ? \ - cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \ + (MTYPE)cpu_##LDTYPE##_mmu(env, addr, oi, GETPC()) : 0;\ } \ addr += MSIZE; \ } \ @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) mve_advance_vpt(env); \ } -DO_VLDR(vldrb, 1, ldub, 1, uint8_t) -DO_VLDR(vldrh, 2, lduw, 2, uint16_t) -DO_VLDR(vldrw, 4, ldl, 4, uint32_t) +DO_VLDR(vldrb, MO_UB, 1, uint8_t, ldb, 1, uint8_t) +DO_VLDR(vldrh, MO_TEUW, 2, uint16_t, ldw, 2, uint16_t) +DO_VLDR(vldrw, MO_TEUL, 4, uint32_t, ldl, 4, uint32_t) DO_VSTR(vstrb, 1, stb, 1, uint8_t) DO_VSTR(vstrh, 2, stw, 2, uint16_t) DO_VSTR(vstrw, 4, stl, 4, uint32_t) -DO_VLDR(vldrb_sh, 1, ldsb, 2, int16_t) -DO_VLDR(vldrb_sw, 1, ldsb, 4, int32_t) -DO_VLDR(vldrb_uh, 1, ldub, 2, uint16_t) -DO_VLDR(vldrb_uw, 1, ldub, 4, uint32_t) -DO_VLDR(vldrh_sw, 2, ldsw, 4, int32_t) -DO_VLDR(vldrh_uw, 2, lduw, 4, uint32_t) +DO_VLDR(vldrb_sh, MO_SB, 1, int8_t, ldb, 2, int16_t) +DO_VLDR(vldrb_sw, MO_SB, 1, int8_t, ldb, 4, int32_t) +DO_VLDR(vldrb_uh, MO_UB, 1, uint8_t, ldb, 2, uint16_t) +DO_VLDR(vldrb_uw, MO_UB, 1, uint8_t, ldb, 4, uint32_t) +DO_VLDR(vldrh_sw, MO_TESW, 2, int16_t, ldw, 4, int32_t) +DO_VLDR(vldrh_uw, MO_TEUW, 2, uint16_t, ldw, 4, uint32_t) DO_VSTR(vstrb_h, 1, stb, 2, int16_t) DO_VSTR(vstrb_w, 1, stb, 4, int32_t) -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VSTR instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-6-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env) mve_advance_vpt(env); \ } -#define DO_VSTR(OP, MSIZE, STTYPE, ESIZE, TYPE) \ +#define DO_VSTR(OP, MFLAG, MSIZE, STTYPE, ESIZE, TYPE) \ void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \ { \ TYPE *d = vd; \ uint16_t mask = mve_element_mask(env); \ unsigned b, e; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \ if (mask & (1 << b)) { \ - cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \ + cpu_##STTYPE##_mmu(env, addr, d[H##ESIZE(e)], oi, GETPC()); \ } \ addr += MSIZE; \ } \ @@ -XXX,XX +XXX,XX @@ DO_VLDR(vldrb, MO_UB, 1, uint8_t, ldb, 1, uint8_t) DO_VLDR(vldrh, MO_TEUW, 2, uint16_t, ldw, 2, uint16_t) DO_VLDR(vldrw, MO_TEUL, 4, uint32_t, ldl, 4, uint32_t) -DO_VSTR(vstrb, 1, stb, 1, uint8_t) -DO_VSTR(vstrh, 2, stw, 2, uint16_t) -DO_VSTR(vstrw, 4, stl, 4, uint32_t) +DO_VSTR(vstrb, MO_UB, 1, stb, 1, uint8_t) +DO_VSTR(vstrh, MO_TEUW, 2, stw, 2, uint16_t) +DO_VSTR(vstrw, MO_TEUL, 4, stl, 4, uint32_t) DO_VLDR(vldrb_sh, MO_SB, 1, int8_t, ldb, 2, int16_t) DO_VLDR(vldrb_sw, MO_SB, 1, int8_t, ldb, 4, int32_t) @@ -XXX,XX +XXX,XX @@ DO_VLDR(vldrb_uw, MO_UB, 1, uint8_t, ldb, 4, uint32_t) DO_VLDR(vldrh_sw, MO_TESW, 2, int16_t, ldw, 4, int32_t) DO_VLDR(vldrh_uw, MO_TEUW, 2, uint16_t, ldw, 4, uint32_t) -DO_VSTR(vstrb_h, 1, stb, 2, int16_t) -DO_VSTR(vstrb_w, 1, stb, 4, int32_t) -DO_VSTR(vstrh_w, 2, stw, 4, int32_t) +DO_VSTR(vstrb_h, MO_UB, 1, stb, 2, int16_t) +DO_VSTR(vstrb_w, MO_UB, 1, stb, 4, int32_t) +DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) #undef DO_VLDR #undef DO_VSTR -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLDR_SG instructions. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-7-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 42 ++++++++++++++++++++++--------------- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) * For loads, predicated lanes are zeroed instead of retaining * their previous values. */ -#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB) \ +#define DO_VLDR_SG(OP, MFLAG, MTYPE, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB)\ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \ uint32_t base) \ { \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \ if (!(eci_mask & 1)) { \ continue; \ } \ addr = ADDRFN(base, m[H##ESIZE(e)]); \ d[H##ESIZE(e)] = (mask & 1) ? \ - cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \ + (MTYPE)cpu_##LDTYPE##_mmu(env, addr, oi, GETPC()) : 0; \ if (WB) { \ m[H##ESIZE(e)] = addr; \ } \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \ if (!(eci_mask & 1)) { \ continue; \ } \ addr = ADDRFN(base, m[H4(e & ~1)]); \ addr += 4 * (e & 1); \ - d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \ + d[H4(e)] = (mask & 1) ? cpu_ldl_mmu(env, addr, oi, GETPC()) : 0; \ if (WB && (e & 1)) { \ m[H4(e & ~1)] = addr - 4; \ } \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) #define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2)) #define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3)) -DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_sh, MO_SB, int8_t, ldb, 2, int16_t, uint16_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_sw, MO_SB, int8_t, ldb, 4, int32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrh_sg_sw, MO_TESW, int16_t, ldw, 4, int32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD, false) -DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD, false) -DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_ub, MO_UB, uint8_t, ldb, 1, uint8_t, uint8_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_uh, MO_UB, uint8_t, ldb, 2, uint16_t, uint16_t, ADDR_ADD, false) +DO_VLDR_SG(vldrb_sg_uw, MO_UB, uint8_t, ldb, 4, uint32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrh_sg_uh, MO_TEUW, uint16_t, ldw, 2, uint16_t, uint16_t, ADDR_ADD, false) +DO_VLDR_SG(vldrh_sg_uw, MO_TEUW, uint16_t, ldw, 4, uint32_t, uint32_t, ADDR_ADD, false) +DO_VLDR_SG(vldrw_sg_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false) DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD, false) -DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH, false) -DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH, false) -DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH, false) -DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false) +DO_VLDR_SG(vldrh_sg_os_sw, MO_TESW, int16_t, ldw, 4, + int32_t, uint32_t, ADDR_ADD_OSH, false) +DO_VLDR_SG(vldrh_sg_os_uh, MO_TEUW, uint16_t, ldw, 2, + uint16_t, uint16_t, ADDR_ADD_OSH, false) +DO_VLDR_SG(vldrh_sg_os_uw, MO_TEUW, uint16_t, ldw, 4, + uint32_t, uint32_t, ADDR_ADD_OSH, false) +DO_VLDR_SG(vldrw_sg_os_uw, MO_TEUL, uint32_t, ldl, 4, + uint32_t, uint32_t, ADDR_ADD_OSW, false) DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false) DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false) @@ -XXX,XX +XXX,XX @@ DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false) DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false) DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false) -DO_VLDR_SG(vldrw_sg_wb_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true) +DO_VLDR_SG(vldrw_sg_wb_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true) DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true) DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true) DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VSTR_SG instructions. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-8-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) } /* We know here TYPE is unsigned so always the same as the offset type */ -#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN, WB) \ +#define DO_VSTR_SG(OP, MFLAG, STTYPE, ESIZE, TYPE, ADDRFN, WB) \ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \ uint32_t base) \ { \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MFLAG | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \ if (!(eci_mask & 1)) { \ continue; \ } \ addr = ADDRFN(base, m[H##ESIZE(e)]); \ if (mask & 1) { \ - cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \ + cpu_##STTYPE##_mmu(env, addr, d[H##ESIZE(e)], oi, GETPC()); \ } \ if (WB) { \ m[H##ESIZE(e)] = addr; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) uint16_t eci_mask = mve_eci_mask(env); \ unsigned e; \ uint32_t addr; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \ if (!(eci_mask & 1)) { \ continue; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, MO_TEUW, 2, stw, 4, int32_t) addr = ADDRFN(base, m[H4(e & ~1)]); \ addr += 4 * (e & 1); \ if (mask & 1) { \ - cpu_stl_data_ra(env, addr, d[H4(e)], GETPC()); \ + cpu_stl_mmu(env, addr, d[H4(e)], oi, GETPC()); \ } \ if (WB && (e & 1)) { \ m[H4(e & ~1)] = addr - 4; \ @@ -XXX,XX +XXX,XX @@ DO_VLDR_SG(vldrw_sg_os_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false) DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false) -DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false) -DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD, false) -DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD, false) -DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD, false) -DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD, false) -DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD, false) +DO_VSTR_SG(vstrb_sg_ub, MO_UB, stb, 1, uint8_t, ADDR_ADD, false) +DO_VSTR_SG(vstrb_sg_uh, MO_UB, stb, 2, uint16_t, ADDR_ADD, false) +DO_VSTR_SG(vstrb_sg_uw, MO_UB, stb, 4, uint32_t, ADDR_ADD, false) +DO_VSTR_SG(vstrh_sg_uh, MO_TEUW, stw, 2, uint16_t, ADDR_ADD, false) +DO_VSTR_SG(vstrh_sg_uw, MO_TEUW, stw, 4, uint32_t, ADDR_ADD, false) +DO_VSTR_SG(vstrw_sg_uw, MO_TEUL, stl, 4, uint32_t, ADDR_ADD, false) DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD, false) -DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH, false) -DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false) -DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false) +DO_VSTR_SG(vstrh_sg_os_uh, MO_TEUW, stw, 2, uint16_t, ADDR_ADD_OSH, false) +DO_VSTR_SG(vstrh_sg_os_uw, MO_TEUW, stw, 4, uint32_t, ADDR_ADD_OSH, false) +DO_VSTR_SG(vstrw_sg_os_uw, MO_TEUL, stl, 4, uint32_t, ADDR_ADD_OSW, false) DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false) DO_VLDR_SG(vldrw_sg_wb_uw, MO_TEUL, uint32_t, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true) DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true) -DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true) +DO_VSTR_SG(vstrw_sg_wb_uw, MO_TEUL, stl, 4, uint32_t, ADDR_ADD, true) DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) /* -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLD4 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-9-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) uint16_t mask = mve_eci_mask(env); \ static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ for (e = 0; e < 4; e++, data >>= 8) { \ uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \ qd[H1(off[beat])] = data; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) uint32_t addr, data; \ int y; /* y counts 0 2 0 2 */ \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 8 + (beat & 1) * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y); \ qd[H2(off[beat])] = data; \ data >>= 16; \ @@ -XXX,XX +XXX,XX @@ DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true) uint32_t addr, data; \ uint32_t *qd; \ int y; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ y = (beat + (O1 & 2)) & 3; \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \ qd[H4(off[beat] >> 2)] = data; \ -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the load operations in the VLD2 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-10-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VLD4W(vld43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint8_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 2; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ for (e = 0; e < 4; e++, data >>= 8) { \ qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \ qd[H1(off[beat] + (e >> 1))] = data; \ @@ -XXX,XX +XXX,XX @@ DO_VLD4W(vld43w, 6, 7, 8, 9) uint32_t addr, data; \ int e; \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat] * 4; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ for (e = 0; e < 2; e++, data >>= 16) { \ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \ qd[H2(off[beat])] = data; \ @@ -XXX,XX +XXX,XX @@ DO_VLD4W(vld43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint32_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ continue; \ } \ addr = base + off[beat]; \ - data = cpu_ldl_le_data_ra(env, addr, GETPC()); \ + data = cpu_ldl_mmu(env, addr, oi, GETPC()); \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \ qd[H4(off[beat] >> 3)] = data; \ } \ -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VST4 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-11-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint16_t mask = mve_eci_mask(env); \ static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \ data = (data << 8) | qd[H1(off[beat])]; \ } \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint32_t addr, data; \ int y; /* y counts 0 2 0 2 */ \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) data = qd[H2(off[beat])]; \ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1); \ data |= qd[H2(off[beat])] << 16; \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) uint32_t addr, data; \ uint32_t *qd; \ int y; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VLD2W(vld21w, 8, 12, 16, 20) y = (beat + (O1 & 2)) & 3; \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \ data = qd[H4(off[beat] >> 2)]; \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } -- 2.43.0
From: William Kosasih <kosasihwilliam4@gmail.com> This patch adds alignment checks in the store operations in the VST2 instruction. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1154 Signed-off-by: William Kosasih <kosasihwilliam4@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250703085604.154449-12-kosasihwilliam4@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/mve_helper.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint8_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \ data = (data << 8) | qd[H1(off[beat] + (e >> 1))]; \ } \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) uint32_t addr, data; \ int e; \ uint16_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \ data = (data << 16) | qd[H2(off[beat])]; \ } \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) static const uint8_t off[4] = { O1, O2, O3, O4 }; \ uint32_t addr, data; \ uint32_t *qd; \ + int mmu_idx = arm_to_core_mmu_idx(arm_mmu_idx(env)); \ + MemOpIdx oi = make_memop_idx(MO_TEUL | MO_ALIGN, mmu_idx); \ for (beat = 0; beat < 4; beat++, mask >>= 4) { \ if ((mask & 1) == 0) { \ /* ECI says skip this beat */ \ @@ -XXX,XX +XXX,XX @@ DO_VST4W(vst43w, 6, 7, 8, 9) addr = base + off[beat]; \ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \ data = qd[H4(off[beat] >> 3)]; \ - cpu_stl_le_data_ra(env, addr, data, GETPC()); \ + cpu_stl_mmu(env, addr, data, oi, GETPC()); \ } \ } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> We failed to raise an exception when sme_excp_el == 0 and fp_excp_el == 1. Cc: qemu-stable@nongnu.org Fixes: 3d74825f4d6 ("target/arm: Add SME enablement checks") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ bool sme_enabled_check(DisasContext *s) * to be zero when fp_excp_el has priority. This is because we need * sme_excp_el by itself for cpregs access checks. */ - if (!s->fp_excp_el || s->sme_excp_el < s->fp_excp_el) { + if (s->sme_excp_el + && (!s->fp_excp_el || s->sme_excp_el <= s->fp_excp_el)) { bool ret = sme_access_check(s); s->fp_access_checked = (ret ? 1 : -1); return ret; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Do not assume SME implies SVE. Ensure that the non-streaming check is present along the SME path, since it is not implied by sme_*_enabled_check. Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.c | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool fp_access_check_only(DisasContext *s) return true; } -static bool fp_access_check(DisasContext *s) +static bool nonstreaming_check(DisasContext *s) { - if (!fp_access_check_only(s)) { - return false; - } if (s->sme_trap_nonstreaming && s->is_nonstreaming) { gen_exception_insn(s, 0, EXCP_UDEF, syn_smetrap(SME_ET_Streaming, false)); @@ -XXX,XX +XXX,XX @@ static bool fp_access_check(DisasContext *s) return true; } +static bool fp_access_check(DisasContext *s) +{ + return fp_access_check_only(s) && nonstreaming_check(s); +} + /* * Return <0 for non-supported element sizes, with MO_16 controlled by * FEAT_FP16; return 0 for fp disabled; otherwise return >0 for success. @@ -XXX,XX +XXX,XX @@ static int fp_access_check_vector_hsd(DisasContext *s, bool is_q, MemOp esz) */ bool sve_access_check(DisasContext *s) { - if (s->pstate_sm || !dc_isar_feature(aa64_sve, s)) { + if (dc_isar_feature(aa64_sme, s)) { bool ret; - assert(dc_isar_feature(aa64_sme, s)); - ret = sme_sm_enabled_check(s); + if (s->pstate_sm) { + ret = sme_enabled_check(s); + } else if (dc_isar_feature(aa64_sve, s)) { + goto continue_sve; + } else { + ret = sme_sm_enabled_check(s); + } + if (ret) { + ret = nonstreaming_check(s); + } s->sve_access_checked = (ret ? 1 : -1); return ret; } + + continue_sve: if (s->sve_excp_el) { /* Assert that we only raise one exception per instruction. */ assert(!s->sve_access_checked); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> We missed the instructions UDEF when the vector size is too small. We missed marking the instructions non-streaming with SME. Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 43 ++++++++++++++++++++++++---------- 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(PUNPKHI, aa64_sve, do_perm_pred2, a, 1, gen_helper_sve_punpk_p) *** SVE Permute - Interleaving Group */ +static bool do_interleave_q(DisasContext *s, gen_helper_gvec_3 *fn, + arg_rrr_esz *a, int data) +{ + if (sve_access_check(s)) { + unsigned vsz = vec_full_reg_size(s); + if (vsz < 32) { + unallocated_encoding(s); + } else { + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vec_full_reg_offset(s, a->rm), + vsz, vsz, data, fn); + } + } + return true; +} + static gen_helper_gvec_3 * const zip_fns[4] = { gen_helper_sve_zip_b, gen_helper_sve_zip_h, gen_helper_sve_zip_s, gen_helper_sve_zip_d, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ZIP1_z, aa64_sve, gen_gvec_ool_arg_zzz, TRANS_FEAT(ZIP2_z, aa64_sve, gen_gvec_ool_arg_zzz, zip_fns[a->esz], a, vec_full_reg_size(s) / 2) -TRANS_FEAT(ZIP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_zip_q, a, 0) -TRANS_FEAT(ZIP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_zip_q, a, - QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2) +TRANS_FEAT_NONSTREAMING(ZIP1_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_zip_q, a, 0) +TRANS_FEAT_NONSTREAMING(ZIP2_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_zip_q, a, + QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2) static gen_helper_gvec_3 * const uzp_fns[4] = { gen_helper_sve_uzp_b, gen_helper_sve_uzp_h, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UZP1_z, aa64_sve, gen_gvec_ool_arg_zzz, TRANS_FEAT(UZP2_z, aa64_sve, gen_gvec_ool_arg_zzz, uzp_fns[a->esz], a, 1 << a->esz) -TRANS_FEAT(UZP1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_uzp_q, a, 0) -TRANS_FEAT(UZP2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_uzp_q, a, 16) +TRANS_FEAT_NONSTREAMING(UZP1_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_uzp_q, a, 0) +TRANS_FEAT_NONSTREAMING(UZP2_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_uzp_q, a, 16) static gen_helper_gvec_3 * const trn_fns[4] = { gen_helper_sve_trn_b, gen_helper_sve_trn_h, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(TRN1_z, aa64_sve, gen_gvec_ool_arg_zzz, TRANS_FEAT(TRN2_z, aa64_sve, gen_gvec_ool_arg_zzz, trn_fns[a->esz], a, 1 << a->esz) -TRANS_FEAT(TRN1_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_trn_q, a, 0) -TRANS_FEAT(TRN2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz, - gen_helper_sve2_trn_q, a, 16) +TRANS_FEAT_NONSTREAMING(TRN1_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_trn_q, a, 0) +TRANS_FEAT_NONSTREAMING(TRN2_q, aa64_sve_f64mm, do_interleave_q, + gen_helper_sve2_trn_q, a, 16) /* *** SVE Permute Vector - Predicated Group -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Replace @rda_rn_rm_e0 with @rda_rn_rm_ex, and require users to supply an explicit esz. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 48 +++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ @rda_rn_rm ........ esz:2 . rm:5 ... ... rn:5 rd:5 \ &rrrr_esz ra=%reg_movprfx -# Four operand with unused vector element size -@rda_rn_rm_e0 ........ ... rm:5 ... ... rn:5 rd:5 \ - &rrrr_esz esz=0 ra=%reg_movprfx -@rdn_ra_rm_e0 ........ ... rm:5 ... ... ra:5 rd:5 \ - &rrrr_esz esz=0 rn=%reg_movprfx +# Four operand with explicit vector element size +@rda_rn_rm_ex ........ ... rm:5 ... ... rn:5 rd:5 \ + &rrrr_esz ra=%reg_movprfx +@rdn_ra_rm_ex ........ ... rm:5 ... ... ra:5 rd:5 \ + &rrrr_esz rn=%reg_movprfx # Three operand with "memory" size, aka immediate left shift @rd_rn_msz_rm ........ ... rm:5 .... imm:2 rn:5 rd:5 &rrri @@ -XXX,XX +XXX,XX @@ XAR 00000100 .. 1 ..... 001 101 rm:5 rd:5 &rrri_esz \ rn=%reg_movprfx esz=%tszimm16_esz imm=%tszimm16_shr # SVE2 bitwise ternary operations -EOR3 00000100 00 1 ..... 001 110 ..... ..... @rdn_ra_rm_e0 -BSL 00000100 00 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 -BCAX 00000100 01 1 ..... 001 110 ..... ..... @rdn_ra_rm_e0 -BSL1N 00000100 01 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 -BSL2N 00000100 10 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 -NBSL 00000100 11 1 ..... 001 111 ..... ..... @rdn_ra_rm_e0 +EOR3 00000100 00 1 ..... 001 110 ..... ..... @rdn_ra_rm_ex esz=0 +BSL 00000100 00 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 +BCAX 00000100 01 1 ..... 001 110 ..... ..... @rdn_ra_rm_ex esz=0 +BSL1N 00000100 01 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 +BSL2N 00000100 10 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 +NBSL 00000100 11 1 ..... 001 111 ..... ..... @rdn_ra_rm_ex esz=0 ### SVE Index Generation Group @@ -XXX,XX +XXX,XX @@ EORTB 01000101 .. 0 ..... 10010 1 ..... ..... @rd_rn_rm ## SVE integer matrix multiply accumulate -SMMLA 01000101 00 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0 -USMMLA 01000101 10 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0 -UMMLA 01000101 11 0 ..... 10011 0 ..... ..... @rda_rn_rm_e0 +SMMLA 01000101 00 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2 +USMMLA 01000101 10 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2 +UMMLA 01000101 11 0 ..... 10011 0 ..... ..... @rda_rn_rm_ex esz=2 ## SVE2 bitwise permute @@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx USDOT_zzzz 01000100 .. 0 ..... 011 110 ..... ..... @rda_rn_rm ### SVE2 floating point matrix multiply accumulate -BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 -FMMLA_s 01100100 10 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 -FMMLA_d 01100100 11 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 +BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=1 +FMMLA_s 01100100 10 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=2 +FMMLA_d 01100100 11 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=3 ### SVE2 Memory Gather Load Group @@ -XXX,XX +XXX,XX @@ FCVTLT_sd 01100100 11 0010 11 101 ... ..... ..... @rd_pg_rn_e0 FLOGB 01100101 00 011 esz:2 0101 pg:3 rn:5 rd:5 &rpr_esz ### SVE2 floating-point multiply-add long (vectors) -FMLALB_zzzw 01100100 10 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 -FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0 -FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_e0 -FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_e0 +FMLALB_zzzw 01100100 10 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 +FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 +FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_ex esz=2 +FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 -BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 -BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0 +BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 +BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point bfloat16 dot-product -BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 +BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point multiply-add long (indexed) FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-6-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_ZPZZ_FP(FMINNMP, aa64_sve2, sve2_fminnmp_zpzz) DO_ZPZZ_FP(FMAXP, aa64_sve2, sve2_fmaxp_zpzz) DO_ZPZZ_FP(FMINP, aa64_sve2, sve2_fminp_zpzz) +static bool do_fmmla(DisasContext *s, arg_rrrr_esz *a, + gen_helper_gvec_4_ptr *fn) +{ + if (sve_access_check(s)) { + if (vec_full_reg_size(s) < 4 * memop_size(a->esz)) { + unallocated_encoding(s); + } else { + gen_gvec_fpst_zzzz(s, fn, a->rd, a->rn, a->rm, a->ra, 0, FPST_A64); + } + } + return true; +} + +TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, do_fmmla, a, gen_helper_fmmla_s) +TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, do_fmmla, a, gen_helper_fmmla_d) + /* * SVE Integer Multiply-Add (unpredicated) */ -TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz, - gen_helper_fmmla_s, a->rd, a->rn, a->rm, a->ra, - 0, FPST_A64) -TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz, - gen_helper_fmmla_d, a->rd, a->rn, a->rm, a->ra, - 0, FPST_A64) - static gen_helper_gvec_4 * const sqdmlal_zzzw_fns[] = { NULL, gen_helper_sve2_sqdmlal_zzzw_h, gen_helper_sve2_sqdmlal_zzzw_s, gen_helper_sve2_sqdmlal_zzzw_d, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> All F64MM instructions operate on a 256-bit vector. If only 128-bit vectors is supported by the cpu, then the cpu cannot enable F64MM. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu64.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) /* From now on sve_max_vq is the actual maximum supported length. */ cpu->sve_max_vq = max_vq; cpu->sve_vq.map = vq_map; + + /* FEAT_F64MM requires the existence of a 256-bit vector size. */ + if (max_vq < 2) { + uint64_t t = GET_IDREG(&cpu->isar, ID_AA64ZFR0); + t = FIELD_DP64(t, ID_AA64ZFR0, F64MM, 0); + SET_IDREG(&cpu->isar, ID_AA64ZFR0, t); + } } /* -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Gvec only operates on size 8 and multiples of 16. Predicates may be any multiple of 2. Round up the size using the appropriate function. Cc: qemu-stable@nongnu.org Fixes: 598ab0b24c0 ("target/arm: Implement PSEL") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-8-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 1 + 1 file changed, 1 insertion(+) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_PSEL(DisasContext *s, arg_psel *a) tcg_gen_neg_i64(tmp, tmp); /* Apply to either copy the source, or write zeros. */ + pl = size_for_gvec(pl); tcg_gen_gvec_ands(MO_64, pred_full_reg_offset(s, a->pd), pred_full_reg_offset(s, a->pn), tmp, pl, pl); return true; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Implement FPProcessNaNs4 within f16_dotadd, rather than simply letting NaNs propagate through the function. Cc: qemu-stable@nongnu.org Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-9-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme_helper.c | 62 +++++++++++++++++++++++++++---------- 1 file changed, 46 insertions(+), 16 deletions(-) diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, * - we have pre-set-up copy of s_std which is set to round-to-odd, * for the multiply (see below) */ - float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16); - float64 e1c = float16_to_float64(e1 >> 16, true, s_f16); - float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16); - float64 e2c = float16_to_float64(e2 >> 16, true, s_f16); - float64 t64; + float16 h1r = e1 & 0xffff; + float16 h1c = e1 >> 16; + float16 h2r = e2 & 0xffff; + float16 h2c = e2 >> 16; float32 t32; - /* - * The ARM pseudocode function FPDot performs both multiplies - * and the add with a single rounding operation. Emulate this - * by performing the first multiply in round-to-odd, then doing - * the second multiply as fused multiply-add, and rounding to - * float32 all in one step. - */ - t64 = float64_mul(e1r, e2r, s_odd); - t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std); + /* C.f. FPProcessNaNs4 */ + if (float16_is_any_nan(h1r) || float16_is_any_nan(h1c) || + float16_is_any_nan(h2r) || float16_is_any_nan(h2c)) { + float16 t16; - /* This conversion is exact, because we've already rounded. */ - t32 = float64_to_float32(t64, s_std); + if (float16_is_signaling_nan(h1r, s_f16)) { + t16 = h1r; + } else if (float16_is_signaling_nan(h1c, s_f16)) { + t16 = h1c; + } else if (float16_is_signaling_nan(h2r, s_f16)) { + t16 = h2r; + } else if (float16_is_signaling_nan(h2c, s_f16)) { + t16 = h2c; + } else if (float16_is_any_nan(h1r)) { + t16 = h1r; + } else if (float16_is_any_nan(h1c)) { + t16 = h1c; + } else if (float16_is_any_nan(h2r)) { + t16 = h2r; + } else { + t16 = h2c; + } + t32 = float16_to_float32(t16, true, s_f16); + } else { + float64 e1r = float16_to_float64(h1r, true, s_f16); + float64 e1c = float16_to_float64(h1c, true, s_f16); + float64 e2r = float16_to_float64(h2r, true, s_f16); + float64 e2c = float16_to_float64(h2c, true, s_f16); + float64 t64; + + /* + * The ARM pseudocode function FPDot performs both multiplies + * and the add with a single rounding operation. Emulate this + * by performing the first multiply in round-to-odd, then doing + * the second multiply as fused multiply-add, and rounding to + * float32 all in one step. + */ + t64 = float64_mul(e1r, e2r, s_odd); + t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std); + + /* This conversion is exact, because we've already rounded. */ + t32 = float64_to_float32(t64, s_std); + } /* The final accumulation step is not fused. */ return float32_add(sum, t32, s_std); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Implement FPProcessNaNs4 within bfdotadd_ebf, rather than simply letting NaNs propagate through the function. Cc: qemu-stable@nongnu.org Fixes: 0e1850182a1 ("target/arm: Implement FPCR.EBF=1 semantics for bfdotadd()") Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-10-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/vec_helper.c | 75 ++++++++++++++++++++++++++----------- 1 file changed, 53 insertions(+), 22 deletions(-) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst) float32 bfdotadd_ebf(float32 sum, uint32_t e1, uint32_t e2, float_status *fpst, float_status *fpst_odd) { - /* - * Compare f16_dotadd() in sme_helper.c, but here we have - * bfloat16 inputs. In particular that means that we do not - * want the FPCR.FZ16 flush semantics, so we use the normal - * float_status for the input handling here. - */ - float64 e1r = float32_to_float64(e1 << 16, fpst); - float64 e1c = float32_to_float64(e1 & 0xffff0000u, fpst); - float64 e2r = float32_to_float64(e2 << 16, fpst); - float64 e2c = float32_to_float64(e2 & 0xffff0000u, fpst); - float64 t64; + float32 s1r = e1 << 16; + float32 s1c = e1 & 0xffff0000u; + float32 s2r = e2 << 16; + float32 s2c = e2 & 0xffff0000u; float32 t32; - /* - * The ARM pseudocode function FPDot performs both multiplies - * and the add with a single rounding operation. Emulate this - * by performing the first multiply in round-to-odd, then doing - * the second multiply as fused multiply-add, and rounding to - * float32 all in one step. - */ - t64 = float64_mul(e1r, e2r, fpst_odd); - t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst); + /* C.f. FPProcessNaNs4 */ + if (float32_is_any_nan(s1r) || float32_is_any_nan(s1c) || + float32_is_any_nan(s2r) || float32_is_any_nan(s2c)) { + if (float32_is_signaling_nan(s1r, fpst)) { + t32 = s1r; + } else if (float32_is_signaling_nan(s1c, fpst)) { + t32 = s1c; + } else if (float32_is_signaling_nan(s2r, fpst)) { + t32 = s2r; + } else if (float32_is_signaling_nan(s2c, fpst)) { + t32 = s2c; + } else if (float32_is_any_nan(s1r)) { + t32 = s1r; + } else if (float32_is_any_nan(s1c)) { + t32 = s1c; + } else if (float32_is_any_nan(s2r)) { + t32 = s2r; + } else { + t32 = s2c; + } + /* + * FPConvertNaN(FPProcessNaN(t32)) will be done as part + * of the final addition below. + */ + } else { + /* + * Compare f16_dotadd() in sme_helper.c, but here we have + * bfloat16 inputs. In particular that means that we do not + * want the FPCR.FZ16 flush semantics, so we use the normal + * float_status for the input handling here. + */ + float64 e1r = float32_to_float64(s1r, fpst); + float64 e1c = float32_to_float64(s1c, fpst); + float64 e2r = float32_to_float64(s2r, fpst); + float64 e2c = float32_to_float64(s2c, fpst); + float64 t64; - /* This conversion is exact, because we've already rounded. */ - t32 = float64_to_float32(t64, fpst); + /* + * The ARM pseudocode function FPDot performs both multiplies + * and the add with a single rounding operation. Emulate this + * by performing the first multiply in round-to-odd, then doing + * the second multiply as fused multiply-add, and rounding to + * float32 all in one step. + */ + t64 = float64_mul(e1r, e2r, fpst_odd); + t64 = float64r32_muladd(e1c, e2c, t64, 0, fpst); + + /* This conversion is exact, because we've already rounded. */ + t32 = float64_to_float32(t64, fpst); + } /* The final accumulation step is not fused. */ return float32_add(sum, t32, fpst); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> The last use of this field was removed in b2fc7be972b9. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-11-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint32_t xregs[16]; - /* Scratch space for aa32 neon expansion. */ - uint32_t scratch[8]; - /* There are a number of distinct float control structures. */ float_status fp_status[FPST_COUNT]; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Rather than repeatedly copying FPST_FPCR to locals and setting default nan mode, create dedicated float_status. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-12-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 12 +++++++++++- target/arm/cpu.c | 4 ++++ target/arm/tcg/vfp_helper.c | 12 +++++++++++- 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState; * when FPCR.AH == 1 (bfloat16 conversions and multiplies, * and the reciprocal and square root estimate/step insns); * for half-precision + * ZA: the "streaming sve" fp status. + * ZA_F16: likewise for half-precision. * * Half-precision operations are governed by a separate * flush-to-zero control bit in FPSCR:FZ16. We pass a separate @@ -XXX,XX +XXX,XX @@ typedef struct NVICState NVICState; * they ignore FPCR.RMode. But they don't ignore FPCR.FZ16, * which means we need an FPST_AH_F16 as well. * + * The "ZA" float_status are for Streaming SVE operations which use + * default-NaN and do not generate fp exceptions, which means that they + * do not accumulate exception bits back into FPCR. + * See e.g. FPAdd vs FPAdd_ZA pseudocode functions, and the setting + * of fpcr.DN and fpexec parameters. + * * To avoid having to transfer exception bits around, we simply * say that the FPSCR cumulative exception flags are the logical * OR of the flags in the four fp statuses. This relies on the @@ -XXX,XX +XXX,XX @@ typedef enum ARMFPStatusFlavour { FPST_A64_F16, FPST_AH, FPST_AH_F16, + FPST_ZA, + FPST_ZA_F16, FPST_STD, FPST_STD_F16, } ARMFPStatusFlavour; -#define FPST_COUNT 8 +#define FPST_COUNT 10 typedef struct CPUArchState { /* Regs for current mode. */ diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type) set_flush_inputs_to_zero(1, &env->vfp.fp_status[FPST_STD]); set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD]); set_default_nan_mode(1, &env->vfp.fp_status[FPST_STD_F16]); + set_default_nan_mode(1, &env->vfp.fp_status[FPST_ZA]); + set_default_nan_mode(1, &env->vfp.fp_status[FPST_ZA_F16]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A32_F16]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA_F16]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_STD_F16]); arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_AH]); set_flush_to_zero(1, &env->vfp.fp_status[FPST_AH]); diff --git a/target/arm/tcg/vfp_helper.c b/target/arm/tcg/vfp_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vfp_helper.c +++ b/target/arm/tcg/vfp_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpsr_from_host(CPUARMState *env) a64_flags |= (get_float_exception_flags(&env->vfp.fp_status[FPST_A64_F16]) & ~(float_flag_input_denormal_flushed | float_flag_input_denormal_used)); /* - * We do not merge in flags from FPST_AH or FPST_AH_F16, because + * We do not merge in flags from FPST_{AH,ZA} or FPST_{AH,ZA}_F16, because * they are used for insns that must not set the cumulative exception bits. */ @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64]); set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A32_F16]); set_float_rounding_mode(i, &env->vfp.fp_status[FPST_A64_F16]); + set_float_rounding_mode(i, &env->vfp.fp_status[FPST_ZA]); + set_float_rounding_mode(i, &env->vfp.fp_status[FPST_ZA_F16]); } if (changed & FPCR_FZ16) { bool ftz_enabled = val & FPCR_FZ16; @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]); set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]); set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]); + set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_ZA_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_STD_F16]); set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_AH_F16]); + set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_ZA_F16]); } if (changed & FPCR_FZ) { bool ftz_enabled = val & FPCR_FZ; set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]); set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A64]); + set_flush_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_ZA]); /* FIZ is A64 only so FZ always makes A32 code flush inputs to zero */ set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status[FPST_A32]); } @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) bool fitz_enabled = (val & FPCR_FIZ) || (val & (FPCR_FZ | FPCR_AH)) == FPCR_FZ; set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status[FPST_A64]); + set_flush_inputs_to_zero(fitz_enabled, &env->vfp.fp_status[FPST_ZA]); } if (changed & FPCR_DN) { bool dnan_enabled = val & FPCR_DN; @@ -XXX,XX +XXX,XX @@ void vfp_set_fpcr_to_host(CPUARMState *env, uint32_t val, uint32_t mask) /* Change behaviours for A64 FP operations */ arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64]); arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]); + arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_ZA]); + arm_set_ah_fp_behaviours(&env->vfp.fp_status[FPST_ZA_F16]); } else { arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64]); arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_A64_F16]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA]); + arm_set_default_fp_behaviours(&env->vfp.fp_status[FPST_ZA_F16]); } } /* -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-13-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme_helper.c | 37 ++++++++-------------------------- target/arm/tcg/translate-sme.c | 4 ++-- 2 files changed, 10 insertions(+), 31 deletions(-) diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn, } void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, float_status *fpst_in, uint32_t desc) + void *vpm, float_status *fpst, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) << 31; uint16_t *pn = vpn, *pm = vpm; - float_status fpst; - - /* - * Make a copy of float_status because this operation does not - * update the cumulative fp exception status. It also produces - * default nans. - */ - fpst = *fpst_in; - set_default_nan_mode(true, &fpst); for (row = 0; row < oprsz; ) { uint16_t pa = pn[H2(row >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, if (pb & 1) { uint32_t *a = vza_row + H1_4(col); uint32_t *m = vzm + H1_4(col); - *a = float32_muladd(n, *m, *a, 0, &fpst); + *a = float32_muladd(n, *m, *a, 0, fpst); } col += 4; pb >>= 4; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, } void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, float_status *fpst_in, uint32_t desc) + void *vpm, float_status *fpst, uint32_t desc) { intptr_t row, col, oprsz = simd_oprsz(desc) / 8; uint64_t neg = (uint64_t)simd_data(desc) << 63; uint64_t *za = vza, *zn = vzn, *zm = vzm; uint8_t *pn = vpn, *pm = vpm; - float_status fpst = *fpst_in; - - set_default_nan_mode(true, &fpst); for (row = 0; row < oprsz; ++row) { if (pn[H1(row)] & 1) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, for (col = 0; col < oprsz; ++col) { if (pm[H1(col)] & 1) { uint64_t *a = &za_row[col]; - *a = float64_muladd(n, zm[col], *a, 0, &fpst); + *a = float64_muladd(n, zm[col], *a, 0, fpst); } } } @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; uint16_t *pn = vpn, *pm = vpm; - float_status fpst_odd, fpst_std, fpst_f16; + float_status fpst_odd = env->vfp.fp_status[FPST_ZA]; - /* - * Make copies of the fp status fields we use, because this operation - * does not update the cumulative fp exception status. It also - * produces default NaNs. We also need a second copy of fp_status with - * round-to-odd -- see above. - */ - fpst_f16 = env->vfp.fp_status[FPST_A64_F16]; - fpst_std = env->vfp.fp_status[FPST_A64]; - set_default_nan_mode(true, &fpst_std); - set_default_nan_mode(true, &fpst_f16); - fpst_odd = fpst_std; set_float_rounding_mode(float_round_to_odd, &fpst_odd); for (row = 0; row < oprsz; ) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, m = f16mop_adj_pair(m, pcol, 0); *a = f16_dotadd(*a, n, m, - &fpst_f16, &fpst_std, &fpst_odd); + &env->vfp.fp_status[FPST_ZA_F16], + &env->vfp.fp_status[FPST_ZA], + &fpst_odd); } col += 4; pcol >>= 4; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_fmopa_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, - MO_32, FPST_A64, gen_helper_sme_fmopa_s) + MO_32, FPST_ZA, gen_helper_sme_fmopa_s) TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, - MO_64, FPST_A64, gen_helper_sme_fmopa_d) + MO_64, FPST_ZA, gen_helper_sme_fmopa_d) TRANS_FEAT(BFMOPA, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> The whole ZA state will also contain ZT0. Make things easier in aarch64_set_svcr to zero both by wrapping them in a common structure. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-14-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 48 +++++++++++++++++++--------------- linux-user/aarch64/signal.c | 4 +-- target/arm/cpu.c | 4 +-- target/arm/helper.c | 2 +- target/arm/machine.c | 2 +- target/arm/tcg/sme_helper.c | 6 ++--- target/arm/tcg/translate-sme.c | 4 +-- 7 files changed, 38 insertions(+), 32 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint64_t scxtnum_el[4]; - /* - * SME ZA storage -- 256 x 256 byte array, with bytes in host word order, - * as we do with vfp.zregs[]. This corresponds to the architectural ZA - * array, where ZA[N] is in the least-significant bytes of env->zarray[N]. - * When SVL is less than the architectural maximum, the accessible - * storage is restricted, such that if the SVL is X bytes the guest can - * see only the bottom X elements of zarray[], and only the least - * significant X bytes of each element of the array. (In other words, - * the observable part is always square.) - * - * The ZA storage can also be considered as a set of square tiles of - * elements of different sizes. The mapping from tiles to the ZA array - * is architecturally defined, such that for tiles of elements of esz - * bytes, the Nth row (or "horizontal slice") of tile T is in - * ZA[T + N * esz]. Note that this means that each tile is not contiguous - * in the ZA storage, because its rows are striped through the ZA array. - * - * Because this is so large, keep this toward the end of the reset area, - * to keep the offsets into the rest of the structure smaller. - */ - ARMVectorReg zarray[ARM_MAX_VQ * 16]; + struct { + /* + * SME ZA storage -- 256 x 256 byte array, with bytes in host + * word order, as we do with vfp.zregs[]. This corresponds to + * the architectural ZA array, where ZA[N] is in the least + * significant bytes of env->za_state.za[N]. + * + * When SVL is less than the architectural maximum, the accessible + * storage is restricted, such that if the SVL is X bytes the guest + * can see only the bottom X elements of zarray[], and only the least + * significant X bytes of each element of the array. (In other words, + * the observable part is always square.) + * + * The ZA storage can also be considered as a set of square tiles of + * elements of different sizes. The mapping from tiles to the ZA array + * is architecturally defined, such that for tiles of elements of esz + * bytes, the Nth row (or "horizontal slice") of tile T is in + * ZA[T + N * esz]. Note that this means that each tile is not + * contiguous in the ZA storage, because its rows are striped through + * the ZA array. + * + * Because this is so large, keep this toward the end of the + * reset area, to keep the offsets into the rest of the structure + * smaller. + */ + ARMVectorReg za[ARM_MAX_VQ * 16]; + } za_state; struct CPUBreakpoint *cpu_breakpoint[16]; struct CPUWatchpoint *cpu_watchpoint[16]; diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c index XXXXXXX..XXXXXXX 100644 --- a/linux-user/aarch64/signal.c +++ b/linux-user/aarch64/signal.c @@ -XXX,XX +XXX,XX @@ static void target_setup_za_record(struct target_za_context *za, for (i = 0; i < vl; ++i) { uint64_t *z = (void *)za + TARGET_ZA_SIG_ZAV_OFFSET(vq, i); for (j = 0; j < vq * 2; ++j) { - __put_user_e(env->zarray[i].d[j], z + j, le); + __put_user_e(env->za_state.za[i].d[j], z + j, le); } } } @@ -XXX,XX +XXX,XX @@ static bool target_restore_za_record(CPUARMState *env, for (i = 0; i < vl; ++i) { uint64_t *z = (void *)za + TARGET_ZA_SIG_ZAV_OFFSET(vq, i); for (j = 0; j < vq * 2; ++j) { - __get_user_e(env->zarray[i].d[j], z + j, le); + __get_user_e(env->za_state.za[i].d[j], z + j, le); } } return true; diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags) qemu_fprintf(f, "ZA[%0*d]=", svl_lg10, i); for (j = zcr_len; j >= 0; --j) { qemu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%c", - env->zarray[i].d[2 * j + 1], - env->zarray[i].d[2 * j], + env->za_state.za[i].d[2 * j + 1], + env->za_state.za[i].d[2 * j], j ? ':' : '\n'); } } diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask) * when disabled either. */ if (change & new & R_SVCR_ZA_MASK) { - memset(env->zarray, 0, sizeof(env->zarray)); + memset(&env->za_state, 0, sizeof(env->za_state)); } if (tcg_enabled()) { diff --git a/target/arm/machine.c b/target/arm/machine.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/machine.c +++ b/target/arm/machine.c @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_za = { .minimum_version_id = 1, .needed = za_needed, .fields = (const VMStateField[]) { - VMSTATE_STRUCT_ARRAY(env.zarray, ARMCPU, ARM_MAX_VQ * 16, 0, + VMSTATE_STRUCT_ARRAY(env.za_state.za, ARMCPU, ARM_MAX_VQ * 16, 0, vmstate_vreg, ARMVectorReg), VMSTATE_END_OF_LIST() } diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl) uint32_t i; /* - * Special case clearing the entire ZA space. + * Special case clearing the entire ZArray. * This falls into the CONSTRAINED UNPREDICTABLE zeroing of any * parts of the ZA storage outside of SVL. */ if (imm == 0xff) { - memset(env->zarray, 0, sizeof(env->zarray)); + memset(env->za_state.za, 0, sizeof(env->za_state.za)); return; } @@ -XXX,XX +XXX,XX @@ void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl) */ for (i = 0; i < svl; i++) { if (imm & (1 << (i % 8))) { - memset(&env->zarray[i], 0, svl); + memset(&env->za_state.za[i], 0, svl); } } } diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, offset = tile * sizeof(ARMVectorReg); /* Include the byte offset of zarray to make this relative to env. */ - offset += offsetof(CPUARMState, zarray); + offset += offsetof(CPUARMState, za_state.za); tcg_gen_addi_i32(tmp, tmp, offset); /* Add the byte offset to env to produce the final pointer. */ @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile(DisasContext *s, int esz, int tile) TCGv_ptr addr = tcg_temp_new_ptr(); int offset; - offset = tile * sizeof(ARMVectorReg) + offsetof(CPUARMState, zarray); + offset = tile * sizeof(ARMVectorReg) + offsetof(CPUARMState, za_state.za); tcg_gen_addi_ptr(addr, tcg_env, offset); return addr; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-15-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu-features.h | 63 +++++++++++++++++++++++++++++++++++++++ target/arm/cpu.h | 1 + 2 files changed, 64 insertions(+) diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu-features.h +++ b/target/arm/cpu-features.h @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_rpres(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64ISAR2, RPRES); } +static inline bool isar_feature_aa64_lut(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64ISAR2, LUT); +} + static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id) { /* We always set the AdvSIMD and FP fields identically. */ @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve2(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64ZFR0, SVEVER) != 0; } +static inline bool isar_feature_aa64_sve2p1(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64ZFR0, SVEVER) >=2; +} + static inline bool isar_feature_aa64_sve2_aes(const ARMISARegisters *id) { return FIELD_EX64_IDREG(id, ID_AA64ZFR0, AES) != 0; @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve_f64mm(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64ZFR0, F64MM) != 0; } +static inline bool isar_feature_aa64_sve_b16b16(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64ZFR0, B16B16); +} + +static inline bool isar_feature_aa64_sme_b16b16(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, B16B16); +} + +static inline bool isar_feature_aa64_sme_f16f16(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, F16F16); +} + static inline bool isar_feature_aa64_sme_f64f64(const ARMISARegisters *id) { return FIELD_EX64_IDREG(id, ID_AA64SMFR0, F64F64); @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sme_fa64(const ARMISARegisters *id) return FIELD_EX64_IDREG(id, ID_AA64SMFR0, FA64); } +static inline bool isar_feature_aa64_sme2(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, SMEVER) != 0; +} + +static inline bool isar_feature_aa64_sme2p1(const ARMISARegisters *id) +{ + return FIELD_EX64_IDREG(id, ID_AA64SMFR0, SMEVER) >= 2; +} + +/* + * Combinations of feature tests, for ease of use with TRANS_FEAT. + */ +static inline bool isar_feature_aa64_sme_or_sve2p1(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme(id) || isar_feature_aa64_sve2p1(id); +} + +static inline bool isar_feature_aa64_sme2_or_sve2p1(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2(id) || isar_feature_aa64_sve2p1(id); +} + +static inline bool isar_feature_aa64_sme2p1_or_sve2p1(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2p1(id) || isar_feature_aa64_sve2p1(id); +} + +static inline bool isar_feature_aa64_sme2_i16i64(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2(id) && isar_feature_aa64_sme_i16i64(id); +} + +static inline bool isar_feature_aa64_sme2_f64f64(const ARMISARegisters *id) +{ + return isar_feature_aa64_sme2(id) && isar_feature_aa64_sme_f64f64(id); +} + /* * Feature tests for "does this exist in either 32-bit or 64-bit?" */ diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64ISAR2, SYSINSTR_128, 36, 4) FIELD(ID_AA64ISAR2, PRFMSLC, 40, 4) FIELD(ID_AA64ISAR2, RPRFM, 48, 4) FIELD(ID_AA64ISAR2, CSSC, 52, 4) +FIELD(ID_AA64ISAR2, LUT, 56, 4) FIELD(ID_AA64ISAR2, ATS1A, 60, 4) FIELD(ID_AA64PFR0, EL0, 0, 4) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> This is a 512-bit array introduced with SME2. Save it only when ZA is in use. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-16-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 3 +++ target/arm/machine.c | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState { uint64_t scxtnum_el[4]; struct { + /* SME2 ZT0 -- 512 bit array, with data ordered like ARMVectorReg. */ + uint64_t zt0[512 / 64] QEMU_ALIGNED(16); + /* * SME ZA storage -- 256 x 256 byte array, with bytes in host * word order, as we do with vfp.zregs[]. This corresponds to diff --git a/target/arm/machine.c b/target/arm/machine.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/machine.c +++ b/target/arm/machine.c @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_za = { } }; +static bool zt0_needed(void *opaque) +{ + ARMCPU *cpu = opaque; + + return za_needed(cpu) && cpu_isar_feature(aa64_sme2, cpu); +} + +static const VMStateDescription vmstate_zt0 = { + .name = "cpu/zt0", + .version_id = 1, + .minimum_version_id = 1, + .needed = zt0_needed, + .fields = (VMStateField[]) { + VMSTATE_UINT64_ARRAY(env.za_state.zt0, ARMCPU, + ARRAY_SIZE(((CPUARMState *)0)->za_state.zt0)), + VMSTATE_END_OF_LIST() + } +}; + static bool serror_needed(void *opaque) { ARMCPU *cpu = opaque; @@ -XXX,XX +XXX,XX @@ const VMStateDescription vmstate_arm_cpu = { &vmstate_m_security, &vmstate_sve, &vmstate_za, + &vmstate_zt0, &vmstate_serror, &vmstate_irq_line_state, &vmstate_wfxt_timer, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Pipe the value through from SMCR_ELx through hflags and into the disassembly context. Enable EZT0 in smcr_write. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-17-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 2 ++ target/arm/tcg/translate.h | 1 + target/arm/cpu.c | 3 +++ target/arm/helper.c | 6 +++++- target/arm/tcg/hflags.c | 34 +++++++++++++++++++++++++++++++++- target/arm/tcg/translate-a64.c | 1 + 6 files changed, 45 insertions(+), 2 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ FIELD(SVCR, ZA, 1, 1) /* Fields for SMCR_ELx. */ FIELD(SMCR, LEN, 0, 4) +FIELD(SMCR, EZT0, 30, 1) FIELD(SMCR, FA64, 31, 1) /* Write a new value to v7m.exception, thus transitioning into or out @@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, NV2_MEM_E20, 35, 1) FIELD(TBFLAG_A64, NV2_MEM_BE, 36, 1) FIELD(TBFLAG_A64, AH, 37, 1) /* FPCR.AH */ FIELD(TBFLAG_A64, NEP, 38, 1) /* FPCR.NEP */ +FIELD(TBFLAG_A64, ZT0EXC_EL, 39, 2) /* * Helpers for using the above. Note that only the A64 accessors use diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { int fp_excp_el; /* FP exception EL or 0 if enabled */ int sve_excp_el; /* SVE exception EL or 0 if enabled */ int sme_excp_el; /* SME exception EL or 0 if enabled */ + int zt0_excp_el; /* ZT0 exception EL or 0 if enabled */ int vl; /* current vector length in bytes */ int svl; /* current streaming vector length in bytes */ bool vfp_enabled; /* FP enabled via FPSCR.EN */ diff --git a/target/arm/cpu.c b/target/arm/cpu.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -XXX,XX +XXX,XX @@ void arm_emulate_firmware_reset(CPUState *cpustate, int target_el) env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK; env->cp15.scr_el3 |= SCR_ENTP2; env->vfp.smcr_el[3] = 0xf; + if (cpu_isar_feature(aa64_sme2, cpu)) { + env->vfp.smcr_el[3] |= R_SMCR_EZT0_MASK; + } } if (cpu_isar_feature(aa64_hcx, cpu)) { env->cp15.scr_el3 |= SCR_HXEN; diff --git a/target/arm/helper.c b/target/arm/helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -XXX,XX +XXX,XX @@ static void smcr_write(CPUARMState *env, const ARMCPRegInfo *ri, { int cur_el = arm_current_el(env); int old_len = sve_vqm1_for_el(env, cur_el); + uint64_t valid_mask = R_SMCR_LEN_MASK | R_SMCR_FA64_MASK; int new_len; QEMU_BUILD_BUG_ON(ARM_MAX_VQ > R_SMCR_LEN_MASK + 1); - value &= R_SMCR_LEN_MASK | R_SMCR_FA64_MASK; + if (cpu_isar_feature(aa64_sme2, env_archcpu(env))) { + valid_mask |= R_SMCR_EZT0_MASK; + } + value &= valid_mask; raw_write(env, ri, value); /* diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/hflags.c +++ b/target/arm/tcg/hflags.c @@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el, return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags); } +/* + * Return the exception level to which exceptions should be taken for ZT0. + * C.f. the ARM pseudocode function CheckSMEZT0Enabled, after the ZA check. + */ +static int zt0_exception_el(CPUARMState *env, int el) +{ +#ifndef CONFIG_USER_ONLY + if (el <= 1 + && !el_is_in_host(env, el) + && !FIELD_EX64(env->vfp.smcr_el[1], SMCR, EZT0)) { + return 1; + } + if (el <= 2 + && arm_is_el2_enabled(env) + && !FIELD_EX64(env->vfp.smcr_el[2], SMCR, EZT0)) { + return 2; + } + if (arm_feature(env, ARM_FEATURE_EL3) + && !FIELD_EX64(env->vfp.smcr_el[3], SMCR, EZT0)) { + return 3; + } +#endif + return 0; +} + static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, ARMMMUIdx mmu_idx) { @@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el, DP_TBFLAG_A64(flags, PSTATE_SM, 1); DP_TBFLAG_A64(flags, SME_TRAP_NONSTREAMING, !sme_fa64(env, el)); } - DP_TBFLAG_A64(flags, PSTATE_ZA, FIELD_EX64(env->svcr, SVCR, ZA)); + + if (FIELD_EX64(env->svcr, SVCR, ZA)) { + DP_TBFLAG_A64(flags, PSTATE_ZA, 1); + if (cpu_isar_feature(aa64_sme2, env_archcpu(env))) { + int zt0_el = zt0_exception_el(env, el); + DP_TBFLAG_A64(flags, ZT0EXC_EL, zt0_el); + } + } } sctlr = regime_sctlr(env, stage1); diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, dc->trap_eret = EX_TBFLAG_A64(tb_flags, TRAP_ERET); dc->sve_excp_el = EX_TBFLAG_A64(tb_flags, SVEEXC_EL); dc->sme_excp_el = EX_TBFLAG_A64(tb_flags, SMEEXC_EL); + dc->zt0_excp_el = EX_TBFLAG_A64(tb_flags, ZT0EXC_EL); dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16; dc->svl = (EX_TBFLAG_A64(tb_flags, SVL) + 1) * 16; dc->pauth_active = EX_TBFLAG_A64(tb_flags, PAUTH_ACTIVE); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-18-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/syndrome.h | 1 + target/arm/tcg/sme.decode | 1 + target/arm/tcg/translate-sme.c | 26 ++++++++++++++++++++++++++ 3 files changed, 28 insertions(+) diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/syndrome.h +++ b/target/arm/syndrome.h @@ -XXX,XX +XXX,XX @@ typedef enum { SME_ET_Streaming, SME_ET_NotStreaming, SME_ET_InactiveZA, + SME_ET_InaccessibleZT0, } SMEExceptionType; #define ARM_EL_EC_LENGTH 6 diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ### SME Misc ZERO 11000000 00 001 00000000000 imm:8 +ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ #include "decode-sme.c.inc" +static bool sme2_zt0_enabled_check(DisasContext *s) +{ + if (!sme_za_enabled_check(s)) { + return false; + } + if (s->zt0_excp_el) { + gen_exception_insn_el(s, 0, EXCP_UDEF, + syn_smetrap(SME_ET_InaccessibleZT0, false), + s->zt0_excp_el); + return false; + } + return true; +} /* * Resolve tile.size[index] to a host pointer, where tile and index @@ -XXX,XX +XXX,XX @@ static bool trans_ZERO(DisasContext *s, arg_ZERO *a) return true; } +static bool trans_ZERO_zt0(DisasContext *s, arg_ZERO_zt0 *a) +{ + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + if (sme_enabled_check(s) && sme2_zt0_enabled_check(s)) { + tcg_gen_gvec_dup_imm(MO_64, offsetof(CPUARMState, za_state.zt0), + sizeof_field(CPUARMState, za_state.zt0), + sizeof_field(CPUARMState, za_state.zt0), 0); + } + return true; +} + static bool trans_MOVA(DisasContext *s, arg_MOVA *a) { static gen_helper_gvec_4 * const h_fns[5] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Honor AlignmentEnforced() for LDR/STR (vector), (predicate), and (array vector). Within the expansion functions, clear @align when we're done emitting loads at the largest size. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-19-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.h | 6 ++-- target/arm/tcg/translate-sme.c | 5 ++-- target/arm/tcg/translate-sve.c | 50 ++++++++++++++++++++++++---------- 3 files changed, 42 insertions(+), 19 deletions(-) diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.h +++ b/target/arm/tcg/translate-a64.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz); -void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm); -void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm); +void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, + int len, int rn, int imm, MemOp align); +void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, + int len, int rn, int imm, MemOp align); #endif /* TARGET_ARM_TRANSLATE_A64_H */ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) return true; } -typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int); +typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int, MemOp); static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) { @@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) /* ZA[n] equates to ZA0H.B[n]. */ base = get_tile_rowcol(s, MO_8, a->rv, imm, false); - fn(s, base, 0, svl, a->rn, imm * svl); + fn(s, base, 0, svl, a->rn, imm * svl, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); return true; } diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz, */ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, - int len, int rn, int imm) + int len, int rn, int imm, MemOp align) { int len_align = QEMU_ALIGN_DOWN(len, 16); int len_remain = len % 16; @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, for (i = 0; i < len_align; i += 16) { tcg_gen_qemu_ld_i128(t16, clean_addr, midx, - MO_LE | MO_128 | MO_ATOM_NONE); + MO_LE | MO_128 | MO_ATOM_NONE | align); tcg_gen_extr_i128_i64(t0, t1, t16); tcg_gen_st_i64(t0, base, vofs + i); tcg_gen_st_i64(t1, base, vofs + i + 8); tcg_gen_addi_i64(clean_addr, clean_addr, 16); } + if (len_align) { + align = MO_UNALN; + } } else { TCGLabel *loop = gen_new_label(); TCGv_ptr tp, i = tcg_temp_new_ptr(); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, t16 = tcg_temp_new_i128(); tcg_gen_qemu_ld_i128(t16, clean_addr, midx, - MO_LE | MO_128 | MO_ATOM_NONE); + MO_LE | MO_128 | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 16); tp = tcg_temp_new_ptr(); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_st_i64(t1, tp, vofs + 8); tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop); + align = MO_UNALN; } /* @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, */ if (len_remain >= 8) { t0 = tcg_temp_new_i64(); - tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_qemu_ld_i64(t0, clean_addr, midx, + MO_LEUQ | MO_ATOM_NONE | align); + align = MO_UNALN; tcg_gen_st_i64(t0, base, vofs + len_align); len_remain -= 8; len_align += 8; @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, case 4: case 8: tcg_gen_qemu_ld_i64(t0, clean_addr, midx, - MO_LE | ctz32(len_remain) | MO_ATOM_NONE); + MO_LE | ctz32(len_remain) + | MO_ATOM_NONE | align); break; case 6: t1 = tcg_temp_new_i64(); - tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_qemu_ld_i64(t0, clean_addr, midx, + MO_LEUL | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 4); tcg_gen_qemu_ld_i64(t1, clean_addr, midx, MO_LEUW | MO_ATOM_NONE); tcg_gen_deposit_i64(t0, t0, t1, 32, 32); @@ -XXX,XX +XXX,XX @@ void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs, /* Similarly for stores. */ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, - int len, int rn, int imm) + int len, int rn, int imm, MemOp align) { int len_align = QEMU_ALIGN_DOWN(len, 16); int len_remain = len % 16; @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_ld_i64(t1, base, vofs + i + 8); tcg_gen_concat_i64_i128(t16, t0, t1); tcg_gen_qemu_st_i128(t16, clean_addr, midx, - MO_LE | MO_128 | MO_ATOM_NONE); + MO_LE | MO_128 | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 16); } + if (len_align) { + align = MO_UNALN; + } } else { TCGLabel *loop = gen_new_label(); TCGv_ptr tp, i = tcg_temp_new_ptr(); @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, tcg_gen_addi_i64(clean_addr, clean_addr, 16); tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop); + align = MO_UNALN; } /* Predicate register stores can be any multiple of 2. */ if (len_remain >= 8) { t0 = tcg_temp_new_i64(); tcg_gen_ld_i64(t0, base, vofs + len_align); - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE); + tcg_gen_qemu_st_i64(t0, clean_addr, midx, + MO_LEUQ | MO_ATOM_NONE | align); + align = MO_UNALN; len_remain -= 8; len_align += 8; if (len_remain) { @@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs, case 4: case 8: tcg_gen_qemu_st_i64(t0, clean_addr, midx, - MO_LE | ctz32(len_remain) | MO_ATOM_NONE); + MO_LE | ctz32(len_remain) + | MO_ATOM_NONE | align); break; case 6: - tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUL | MO_ATOM_NONE); + tcg_gen_qemu_st_i64(t0, clean_addr, midx, + MO_LEUL | MO_ATOM_NONE | align); tcg_gen_addi_i64(clean_addr, clean_addr, 4); tcg_gen_shri_i64(t0, t0, 32); tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUW | MO_ATOM_NONE); @@ -XXX,XX +XXX,XX @@ static bool trans_LDR_zri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = vec_full_reg_size(s); int off = vec_full_reg_offset(s, a->rd); - gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); } return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_LDR_pri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = pred_full_reg_size(s); int off = pred_full_reg_offset(s, a->rd); - gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_ldr(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_2 : MO_UNALN); } return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_STR_zri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = vec_full_reg_size(s); int off = vec_full_reg_offset(s, a->rd); - gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); } return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_STR_pri(DisasContext *s, arg_rri *a) if (sve_access_check(s)) { int size = pred_full_reg_size(s); int off = pred_full_reg_offset(s, a->rd); - gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size); + gen_sve_str(s, tcg_env, off, size, a->rn, a->imm * size, + s->align_mem ? MO_ALIGN_2 : MO_UNALN); } return true; } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-20-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 6 ++++++ target/arm/tcg/translate-sme.c | 13 +++++++++++++ 2 files changed, 19 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ LDST1 1110000 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \ LDR 1110000 100 0 000000 .. 000 ..... 0 .... @ldstr STR 1110000 100 1 000000 .. 000 ..... 0 .... @ldstr +&ldstzt0 rn +@ldstzt0 ....... ... . ...... .. ... rn:5 ..... &ldstzt0 + +LDR_zt0 1110000 100 0 111111 00 000 ..... 00000 @ldstzt0 +STR_zt0 1110000 100 1 111111 00 000 ..... 00000 @ldstzt0 + ### SME Add Vector to Array &adda zad zn pm pn diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) TRANS_FEAT(LDR, aa64_sme, do_ldst_r, a, gen_sve_ldr) TRANS_FEAT(STR, aa64_sme, do_ldst_r, a, gen_sve_str) +static bool do_ldst_zt0(DisasContext *s, arg_ldstzt0 *a, GenLdStR *fn) +{ + if (sme2_zt0_enabled_check(s)) { + fn(s, tcg_env, offsetof(CPUARMState, za_state.zt0), + sizeof_field(CPUARMState, za_state.zt0), a->rn, 0, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); + } + return true; +} + +TRANS_FEAT(LDR_zt0, aa64_sme2, do_ldst_zt0, a, gen_sve_ldr) +TRANS_FEAT(STR_zt0, aa64_sme2, do_ldst_zt0, a, gen_sve_str) + static bool do_adda(DisasContext *s, arg_adda *a, MemOp esz, gen_helper_gvec_4 *fn) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-21-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 5 +++++ target/arm/tcg/translate-sme.c | 13 +++++++++++++ 2 files changed, 18 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ MOVA 11000000 esz:2 00001 0 v:1 .. pg:3 0 za_imm:4 zr:5 \ MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za_imm:4 zr:5 \ &mova to_vec=1 rs=%mova_rs esz=4 +### SME Move into/from ZT0 + +MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 +MOVT_ztr 1100 0000 0100 1110 0 off:3 00 11111 rt:5 + ### SME Memory &ldst esz rs pg rn rm za_imm v:bool st:bool diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) return true; } +static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, + void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) +{ + if (sme2_zt0_enabled_check(s)) { + func(cpu_reg(s, a->rt), tcg_env, + offsetof(CPUARMState, za_state.zt0) + a->off * 8); + } + return true; +} + +TRANS_FEAT(MOVT_rzt, aa64_sme2, do_movt, a, tcg_gen_ld_i64) +TRANS_FEAT(MOVT_ztr, aa64_sme2, do_movt, a, tcg_gen_st_i64) + static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) { typedef void GenLdSt1(TCGv_env, TCGv_ptr, TCGv_ptr, TCGv, TCGv_i32); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Decode tile number and index offset beforehand and separately. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-22-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 46 +++++++++++++++++++++++----------- target/arm/tcg/translate-sme.c | 17 +++++-------- 2 files changed, 38 insertions(+), 25 deletions(-) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array %mova_rs 13:2 !function=plus_12 -&mova esz rs pg zr za_imm v:bool to_vec:bool +&mova esz rs pg zr za off v:bool to_vec:bool -MOVA 11000000 esz:2 00000 0 v:1 .. pg:3 zr:5 0 za_imm:4 \ - &mova to_vec=0 rs=%mova_rs -MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za_imm:4 \ - &mova to_vec=0 rs=%mova_rs esz=4 +MOVA 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ + &mova to_vec=0 rs=%mova_rs esz=0 za=0 +MOVA 11000000 01 00000 0 v:1 .. pg:3 zr:5 0 za:1 off:3 \ + &mova to_vec=0 rs=%mova_rs esz=1 +MOVA 11000000 10 00000 0 v:1 .. pg:3 zr:5 0 za:2 off:2 \ + &mova to_vec=0 rs=%mova_rs esz=2 +MOVA 11000000 11 00000 0 v:1 .. pg:3 zr:5 0 za:3 off:1 \ + &mova to_vec=0 rs=%mova_rs esz=3 +MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za:4 \ + &mova to_vec=0 rs=%mova_rs esz=4 off=0 -MOVA 11000000 esz:2 00001 0 v:1 .. pg:3 0 za_imm:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs -MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za_imm:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=4 +MOVA 11000000 00 00001 0 v:1 .. pg:3 0 off:4 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=0 za=0 +MOVA 11000000 01 00001 0 v:1 .. pg:3 0 za:1 off:3 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=1 +MOVA 11000000 10 00001 0 v:1 .. pg:3 0 za:2 off:2 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=2 +MOVA 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=3 +MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ + &mova to_vec=1 rs=%mova_rs esz=4 off=0 ### SME Move into/from ZT0 @@ -XXX,XX +XXX,XX @@ MOVT_ztr 1100 0000 0100 1110 0 off:3 00 11111 rt:5 ### SME Memory -&ldst esz rs pg rn rm za_imm v:bool st:bool +&ldst esz rs pg rn rm za off v:bool st:bool -LDST1 1110000 0 esz:2 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \ - &ldst rs=%mova_rs -LDST1 1110000 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \ - &ldst esz=4 rs=%mova_rs +LDST1 1110000 0 00 st:1 rm:5 v:1 .. pg:3 rn:5 0 off:4 \ + &ldst rs=%mova_rs esz=0 za=0 +LDST1 1110000 0 01 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:1 off:3 \ + &ldst rs=%mova_rs esz=1 +LDST1 1110000 0 10 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:2 off:2 \ + &ldst rs=%mova_rs esz=2 +LDST1 1110000 0 11 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:3 off:1 \ + &ldst rs=%mova_rs esz=3 +LDST1 1110000 1 11 st:1 rm:5 v:1 .. pg:3 rn:5 0 za:4 \ + &ldst rs=%mova_rs esz=4 off=0 &ldstr rv rn imm @ldstr ....... ... . ...... .. ... rn:5 . imm:4 \ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool sme2_zt0_enabled_check(DisasContext *s) return true; } -/* - * Resolve tile.size[index] to a host pointer, where tile and index - * are always decoded together, dependent on the element size. - */ +/* Resolve tile.size[rs+imm] to a host pointer. */ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, - int tile_index, bool vertical) + int tile, int imm, bool vertical) { - int tile = tile_index >> (4 - esz); - int index = esz == MO_128 ? 0 : extract32(tile_index, 0, 4 - esz); int pos, len, offset; TCGv_i32 tmp; TCGv_ptr addr; @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, /* Compute the final index, which is Rs+imm. */ tmp = tcg_temp_new_i32(); tcg_gen_trunc_tl_i32(tmp, cpu_reg(s, rs)); - tcg_gen_addi_i32(tmp, tmp, index); + tcg_gen_addi_i32(tmp, tmp, imm); /* Prepare a power-of-two modulo via extraction of @len bits. */ len = ctz32(streaming_vec_reg_size(s)) - esz; @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za_imm, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); t_zr = vec_full_reg_ptr(s, a->zr); t_pg = pred_full_reg_ptr(s, a->pg); @@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za_imm, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); t_pg = pred_full_reg_ptr(s, a->pg); addr = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) } /* ZA[n] equates to ZA0H.B[n]. */ - base = get_tile_rowcol(s, MO_8, a->rv, imm, false); + base = get_tile_rowcol(s, MO_8, a->rv, 0, imm, false); fn(s, base, 0, svl, a->rn, imm * svl, s->align_mem ? MO_ALIGN_16 : MO_UNALN); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Prepare for more kinds of MOVA from SME2 by renaming the existing SME1 MOVA to indicate tile to/from vector. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-23-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 42 +++++++++++++++++----------------- target/arm/tcg/translate-sme.c | 12 +++++----- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array %mova_rs 13:2 !function=plus_12 -&mova esz rs pg zr za off v:bool to_vec:bool +&mova_p esz rs pg zr za off v:bool -MOVA 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ - &mova to_vec=0 rs=%mova_rs esz=0 za=0 -MOVA 11000000 01 00000 0 v:1 .. pg:3 zr:5 0 za:1 off:3 \ - &mova to_vec=0 rs=%mova_rs esz=1 -MOVA 11000000 10 00000 0 v:1 .. pg:3 zr:5 0 za:2 off:2 \ - &mova to_vec=0 rs=%mova_rs esz=2 -MOVA 11000000 11 00000 0 v:1 .. pg:3 zr:5 0 za:3 off:1 \ - &mova to_vec=0 rs=%mova_rs esz=3 -MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za:4 \ - &mova to_vec=0 rs=%mova_rs esz=4 off=0 +MOVA_tz 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ + &mova_p rs=%mova_rs esz=0 za=0 +MOVA_tz 11000000 01 00000 0 v:1 .. pg:3 zr:5 0 za:1 off:3 \ + &mova_p rs=%mova_rs esz=1 +MOVA_tz 11000000 10 00000 0 v:1 .. pg:3 zr:5 0 za:2 off:2 \ + &mova_p rs=%mova_rs esz=2 +MOVA_tz 11000000 11 00000 0 v:1 .. pg:3 zr:5 0 za:3 off:1 \ + &mova_p rs=%mova_rs esz=3 +MOVA_tz 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za:4 \ + &mova_p rs=%mova_rs esz=4 off=0 -MOVA 11000000 00 00001 0 v:1 .. pg:3 0 off:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=0 za=0 -MOVA 11000000 01 00001 0 v:1 .. pg:3 0 za:1 off:3 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=1 -MOVA 11000000 10 00001 0 v:1 .. pg:3 0 za:2 off:2 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=2 -MOVA 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=3 -MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ - &mova to_vec=1 rs=%mova_rs esz=4 off=0 +MOVA_zt 11000000 00 00001 0 v:1 .. pg:3 0 off:4 zr:5 \ + &mova_p rs=%mova_rs esz=0 za=0 +MOVA_zt 11000000 01 00001 0 v:1 .. pg:3 0 za:1 off:3 zr:5 \ + &mova_p rs=%mova_rs esz=1 +MOVA_zt 11000000 10 00001 0 v:1 .. pg:3 0 za:2 off:2 zr:5 \ + &mova_p rs=%mova_rs esz=2 +MOVA_zt 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ + &mova_p rs=%mova_rs esz=3 +MOVA_zt 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ + &mova_p rs=%mova_rs esz=4 off=0 ### SME Move into/from ZT0 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_ZERO_zt0(DisasContext *s, arg_ZERO_zt0 *a) return true; } -static bool trans_MOVA(DisasContext *s, arg_MOVA *a) +static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) { static gen_helper_gvec_4 * const h_fns[5] = { gen_helper_sve_sel_zpzz_b, gen_helper_sve_sel_zpzz_h, @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) TCGv_i32 t_desc; int svl; - if (!dc_isar_feature(aa64_sme, s)) { - return false; - } if (!sme_smza_enabled_check(s)) { return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) if (a->v) { /* Vertical slice -- use sme mova helpers. */ - if (a->to_vec) { + if (to_vec) { zc_fns[a->esz](t_zr, t_za, t_pg, t_desc); } else { cz_fns[a->esz](t_za, t_zr, t_pg, t_desc); } } else { /* Horizontal slice -- reuse sve sel helpers. */ - if (a->to_vec) { + if (to_vec) { h_fns[a->esz](t_zr, t_za, t_zr, t_pg, t_desc); } else { h_fns[a->esz](t_za, t_zr, t_za, t_pg, t_desc); @@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a) return true; } +TRANS_FEAT(MOVA_tz, aa64_sme, do_mova_tile, a, false) +TRANS_FEAT(MOVA_zt, aa64_sme, do_mova_tile, a, true) + static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Prepare for MOVA array to/from vector with multiple registers by adding a div_len parameter, herein always 1, and a vec_mod parameter, herein always 0. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-24-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sme.c | 47 +++++++++++++++++++++++----------- 1 file changed, 32 insertions(+), 15 deletions(-) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool sme2_zt0_enabled_check(DisasContext *s) /* Resolve tile.size[rs+imm] to a host pointer. */ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, - int tile, int imm, bool vertical) + int tile, int imm, int div_len, + int vec_mod, bool vertical) { int pos, len, offset; TCGv_i32 tmp; @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, /* Compute the final index, which is Rs+imm. */ tmp = tcg_temp_new_i32(); tcg_gen_trunc_tl_i32(tmp, cpu_reg(s, rs)); + /* + * Round the vector index down to a multiple of vec_mod if necessary. + * We do this before adding the offset, to handle cases like + * MOVA (tile to vector, 2 registers) where we want to call this + * several times in a loop with an increasing offset. We rely on + * the instruction encodings always forcing the initial offset in + * [rs + offset] to be a multiple of vec_mod. The pseudocode usually + * does the round-down after adding the offset rather than before, + * but MOVA is an exception. + */ + if (vec_mod > 1) { + tcg_gen_andc_i32(tmp, tmp, tcg_constant_i32(vec_mod - 1)); + } tcg_gen_addi_i32(tmp, tmp, imm); /* Prepare a power-of-two modulo via extraction of @len bits. */ - len = ctz32(streaming_vec_reg_size(s)) - esz; + len = ctz32(streaming_vec_reg_size(s) / div_len) - esz; if (!len) { /* @@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs, return addr; } +/* Resolve ZArray[rs+imm] to a host pointer. */ +static TCGv_ptr get_zarray(DisasContext *s, int rs, int imm, + int div_len, int vec_mod) +{ + /* ZA[n] equates to ZA0H.B[n]. */ + return get_tile_rowcol(s, MO_8, rs, 0, imm, div_len, vec_mod, false); +} + /* * Resolve tile.size[0] to a host pointer. * Used by e.g. outer product insns where we require the entire tile. @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, 1, 0, a->v); t_zr = vec_full_reg_ptr(s, a->zr); t_pg = pred_full_reg_ptr(s, a->pg); @@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a) return true; } - t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, a->v); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off, 1, 0, a->v); t_pg = pred_full_reg_ptr(s, a->pg); addr = tcg_temp_new_i64(); @@ -XXX,XX +XXX,XX @@ typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int, MemOp); static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn) { - int svl = streaming_vec_reg_size(s); - int imm = a->imm; - TCGv_ptr base; + if (sme_za_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int imm = a->imm; + TCGv_ptr base = get_zarray(s, a->rv, imm, 1, 0); - if (!sme_za_enabled_check(s)) { - return true; + fn(s, base, 0, svl, a->rn, imm * svl, + s->align_mem ? MO_ALIGN_16 : MO_UNALN); } - - /* ZA[n] equates to ZA0H.B[n]. */ - base = get_tile_rowcol(s, MO_8, a->rv, 0, imm, false); - - fn(s, base, 0, svl, a->rn, imm * svl, - s->align_mem ? MO_ALIGN_16 : MO_UNALN); return true; } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-25-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/cpu.h | 1 + target/arm/cpu64.c | 1 + 2 files changed, 2 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -XXX,XX +XXX,XX @@ struct ArchCPU { /* Used to set the maximum vector length the cpu will support. */ uint32_t sve_max_vq; + uint32_t sme_max_vq; #ifdef CONFIG_USER_ONLY /* Used to set the default vector length at process start. */ diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -XXX,XX +XXX,XX @@ void arm_cpu_sme_finalize(ARMCPU *cpu, Error **errp) } cpu->sme_vq.map = vq_map; + cpu->sme_max_vq = 32 - clz32(vq_map); } static bool cpu_arm_get_sme(Object *obj, Error **errp) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-26-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 9 +++++ target/arm/tcg/translate.h | 1 + target/arm/tcg/sme.decode | 37 ++++++++++++++++++ target/arm/tcg/sme_helper.c | 64 ++++++++++++++++++++++++++++++ target/arm/tcg/translate-a64.c | 1 + target/arm/tcg/translate-sme.c | 71 ++++++++++++++++++++++++++++++++++ 6 files changed, 183 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme_mova_cz_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme_mova_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_cz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sme_ld1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef struct DisasContext { int zt0_excp_el; /* ZT0 exception EL or 0 if enabled */ int vl; /* current vector length in bytes */ int svl; /* current streaming vector length in bytes */ + int max_svl; /* maximum implemented streaming vector length */ bool vfp_enabled; /* FP enabled via FPSCR.EN */ int vec_len; int vec_stride; diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 %mova_rs 13:2 !function=plus_12 &mova_p esz rs pg zr za off v:bool +&mova_t esz rs zr za off v:bool MOVA_tz 11000000 00 00000 0 v:1 .. pg:3 zr:5 0 off:4 \ &mova_p rs=%mova_rs esz=0 za=0 @@ -XXX,XX +XXX,XX @@ MOVA_zt 11000000 11 00001 0 v:1 .. pg:3 0 za:3 off:1 zr:5 \ MOVA_zt 11000000 11 00001 1 v:1 .. pg:3 0 za:4 zr:5 \ &mova_p rs=%mova_rs esz=4 off=0 +MOVA_tz2 11000000 00 00010 0 v:1 .. 000 zr:4 0 00 off:3 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_tz2 11000000 01 00010 0 v:1 .. 000 zr:4 0 00 za:1 off:2 \ + &mova_t rs=%mova_rs esz=1 +MOVA_tz2 11000000 10 00010 0 v:1 .. 000 zr:4 0 00 za:2 off:1 \ + &mova_t rs=%mova_rs esz=2 +MOVA_tz2 11000000 11 00010 0 v:1 .. 000 zr:4 0 00 za:3 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVA_zt2 11000000 00 00011 0 v:1 .. 000 00 off:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_zt2 11000000 01 00011 0 v:1 .. 000 00 za:1 off:2 zr:4 0 \ + &mova_t rs=%mova_rs esz=1 +MOVA_zt2 11000000 10 00011 0 v:1 .. 000 00 za:2 off:1 zr:4 0 \ + &mova_t rs=%mova_rs esz=2 +MOVA_zt2 11000000 11 00011 0 v:1 .. 000 00 za:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVA_tz4 11000000 00 00010 0 v:1 .. 001 zr:3 00 000 off:2 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_tz4 11000000 01 00010 0 v:1 .. 001 zr:3 00 000 za:1 off:1 \ + &mova_t rs=%mova_rs esz=1 +MOVA_tz4 11000000 10 00010 0 v:1 .. 001 zr:3 00 000 za:2 \ + &mova_t rs=%mova_rs esz=2 off=0 +MOVA_tz4 11000000 11 00010 0 v:1 .. 001 zr:3 00 00 za:3 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVA_zt4 11000000 00 00011 0 v:1 .. 001 000 off:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVA_zt4 11000000 01 00011 0 v:1 .. 001 000 za:1 off:1 zr:3 00 \ + &mova_t rs=%mova_rs esz=1 +MOVA_zt4 11000000 10 00011 0 v:1 .. 001 000 za:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=2 off=0 +MOVA_zt4 11000000 11 00011 0 v:1 .. 001 00 za:3 zr:3 00 \ + &mova_t rs=%mova_rs esz=3 off=0 + ### SME Move into/from ZT0 MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_mova_zc_q)(void *vd, void *za, void *vg, uint32_t desc) #undef DO_MOVA_Z +void HELPER(sme2_mova_zc_b)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint8_t *src = vsrc; + uint8_t *dst = vdst; + size_t i, n = simd_oprsz(desc); + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + +void HELPER(sme2_mova_zc_h)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint16_t *src = vsrc; + uint16_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 2; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + +void HELPER(sme2_mova_zc_s)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint32_t *src = vsrc; + uint32_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 4; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + +void HELPER(sme2_mova_zc_d)(void *vdst, void *vsrc, uint32_t desc) +{ + const uint64_t *src = vsrc; + uint64_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 8; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + } +} + /* * Clear elements in a tile slice comprising len bytes. */ @@ -XXX,XX +XXX,XX @@ static void copy_vertical_q(void *vdst, const void *vsrc, size_t len) } } +void HELPER(sme2_mova_cz_b)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_b(vdst, vsrc, simd_oprsz(desc)); +} + +void HELPER(sme2_mova_cz_h)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_h(vdst, vsrc, simd_oprsz(desc)); +} + +void HELPER(sme2_mova_cz_s)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_s(vdst, vsrc, simd_oprsz(desc)); +} + +void HELPER(sme2_mova_cz_d)(void *vdst, void *vsrc, uint32_t desc) +{ + copy_vertical_d(vdst, vsrc, simd_oprsz(desc)); +} + /* * Host and TLB primitives for vertical tile slice addressing. */ diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, dc->zt0_excp_el = EX_TBFLAG_A64(tb_flags, ZT0EXC_EL); dc->vl = (EX_TBFLAG_A64(tb_flags, VL) + 1) * 16; dc->svl = (EX_TBFLAG_A64(tb_flags, SVL) + 1) * 16; + dc->max_svl = arm_cpu->sme_max_vq * 16; dc->pauth_active = EX_TBFLAG_A64(tb_flags, PAUTH_ACTIVE); dc->bt = EX_TBFLAG_A64(tb_flags, BT); dc->btype = EX_TBFLAG_A64(tb_flags, BTYPE); diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) TRANS_FEAT(MOVA_tz, aa64_sme, do_mova_tile, a, false) TRANS_FEAT(MOVA_zt, aa64_sme, do_mova_tile, a, true) +static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) +{ + static gen_helper_gvec_2 * const cz_fns[] = { + gen_helper_sme2_mova_cz_b, gen_helper_sme2_mova_cz_h, + gen_helper_sme2_mova_cz_s, gen_helper_sme2_mova_cz_d, + }; + static gen_helper_gvec_2 * const zc_fns[] = { + gen_helper_sme2_mova_zc_b, gen_helper_sme2_mova_zc_h, + gen_helper_sme2_mova_zc_s, gen_helper_sme2_mova_zc_d, + }; + TCGv_ptr t_za; + int svl, bytes_per_op = n << a->esz; + + /* + * The MaxImplementedSVL check happens in the decode pseudocode, + * before the SM+ZA enabled check in the operation pseudocode. + * This will (currently) only fail for NREG=4, ESZ=MO_64. + */ + if (s->max_svl < bytes_per_op) { + unallocated_encoding(s); + return true; + } + + if (!sme_smza_enabled_check(s)) { + return true; + } + + svl = streaming_vec_reg_size(s); + + /* + * The CurrentVL check happens in the operation pseudocode, + * after the SM+ZA enabled check. + */ + if (svl < bytes_per_op) { + unallocated_encoding(s); + return true; + } + + if (a->v) { + TCGv_i32 t_desc = tcg_constant_i32(simd_desc(svl, svl, 0)); + + for (int i = 0; i < n; ++i) { + TCGv_ptr t_zr = vec_full_reg_ptr(s, a->zr * n + i); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, + a->off * n + i, 1, n, a->v); + if (to_vec) { + zc_fns[a->esz](t_zr, t_za, t_desc); + } else { + cz_fns[a->esz](t_za, t_zr, t_desc); + } + } + } else { + for (int i = 0; i < n; ++i) { + int o_zr = vec_full_reg_offset(s, a->zr * n + i); + t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, + a->off * n + i, 1, n, a->v); + if (to_vec) { + tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, 0, svl, svl); + } else { + tcg_gen_gvec_mov_var(MO_8, t_za, 0, tcg_env, o_zr, svl, svl); + } + } + } + return true; +} + +TRANS_FEAT(MOVA_tz2, aa64_sme2, do_mova_tile_n, a, 2, false) +TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false) +TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true) +TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true) + static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-27-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 5 +++++ target/arm/tcg/sme.decode | 12 ++++++++++++ target/arm/tcg/translate-sme.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ static inline int plus_2(DisasContext *s, int x) return x + 2; } +static inline int plus_8(DisasContext *s, int x) +{ + return x + 8; +} + static inline int plus_12(DisasContext *s, int x) { return x + 12; diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_zt0 11000000 01 001 00000000000 00000001 ### SME Move into/from Array %mova_rs 13:2 !function=plus_12 +%mova_rv 13:2 !function=plus_8 +&mova_a rv zr off &mova_p esz rs pg zr za off v:bool &mova_t esz rs zr za off v:bool @@ -XXX,XX +XXX,XX @@ MOVA_zt4 11000000 10 00011 0 v:1 .. 001 000 za:2 zr:3 00 \ MOVA_zt4 11000000 11 00011 0 v:1 .. 001 00 za:3 zr:3 00 \ &mova_t rs=%mova_rs esz=3 off=0 +MOVA_az2 11000000 00 00010 00 .. 010 zr:4 000 off:3 \ + &mova_a rv=%mova_rv +MOVA_az4 11000000 00 00010 00 .. 011 zr:3 0000 off:3 \ + &mova_a rv=%mova_rv + +MOVA_za2 11000000 00 00011 00 .. 010 00 off:3 zr:4 0 \ + &mova_a rv=%mova_rv +MOVA_za4 11000000 00 00011 00 .. 011 00 off:3 zr:3 00 \ + &mova_a rv=%mova_rv + ### SME Move into/from ZT0 MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false) TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true) TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true) +static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) +{ + TCGv_ptr t_za; + int svl; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + svl = streaming_vec_reg_size(s); + t_za = get_zarray(s, a->rv, a->off, n, 0); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zr = vec_full_reg_offset(s, a->zr * n + i); + + if (to_vec) { + tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, o_za, svl, svl); + } else { + tcg_gen_gvec_mov_var(MO_8, t_za, o_za, tcg_env, o_zr, svl, svl); + } + } + return true; +} + +TRANS_FEAT(MOVA_az2, aa64_sme2, do_mova_array_n, a, 2, false) +TRANS_FEAT(MOVA_az4, aa64_sme2, do_mova_array_n, a, 4, false) +TRANS_FEAT(MOVA_za2, aa64_sme2, do_mova_array_n, a, 2, true) +TRANS_FEAT(MOVA_za4, aa64_sme2, do_mova_array_n, a, 4, true) + static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-28-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 3 +++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 34 ++++++++++++++++++++++++---------- target/arm/tcg/translate-sme.c | 2 ++ 4 files changed, 31 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme_sumopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme_usmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_6(sme2_bmopa_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SMOPA_d 1010000 0 11 0 ..... ... ... ..... . 0 ... @op_64 SUMOPA_d 1010000 0 11 1 ..... ... ... ..... . 0 ... @op_64 USMOPA_d 1010000 1 11 0 ..... ... ... ..... . 0 ... @op_64 UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64 + +BMOPA 1000000 0 10 0 ..... ... ... ..... . 10 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DEF_IMOP_64(umopa_d, uint16_t, uint16_t) DEF_IMOP_64(sumopa_d, int16_t, uint16_t) DEF_IMOP_64(usmopa_d, uint16_t, int16_t) -#define DEF_IMOPH(NAME, S) \ - void HELPER(sme_##NAME##_##S)(void *vza, void *vzn, void *vzm, \ +#define DEF_IMOPH(P, NAME, S) \ + void HELPER(P##_##NAME##_##S)(void *vza, void *vzn, void *vzm, \ void *vpn, void *vpm, uint32_t desc) \ { do_imopa_##S(vza, vzn, vzm, vpn, vpm, desc, NAME##_##S); } -DEF_IMOPH(smopa, s) -DEF_IMOPH(umopa, s) -DEF_IMOPH(sumopa, s) -DEF_IMOPH(usmopa, s) +DEF_IMOPH(sme, smopa, s) +DEF_IMOPH(sme, umopa, s) +DEF_IMOPH(sme, sumopa, s) +DEF_IMOPH(sme, usmopa, s) -DEF_IMOPH(smopa, d) -DEF_IMOPH(umopa, d) -DEF_IMOPH(sumopa, d) -DEF_IMOPH(usmopa, d) +DEF_IMOPH(sme, smopa, d) +DEF_IMOPH(sme, umopa, d) +DEF_IMOPH(sme, sumopa, d) +DEF_IMOPH(sme, usmopa, d) + +static uint32_t bmopa_s(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) +{ + uint32_t sum = ctpop32(~(n ^ m)); + if (neg) { + sum = -sum; + } + if (!(p & 1)) { + sum = 0; + } + return a + sum; +} + +DEF_IMOPH(sme2, bmopa, s) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_smopa_ TRANS_FEAT(UMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_umopa_d) TRANS_FEAT(SUMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_sumopa_d) TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmopa_d) + +TRANS_FEAT(BMOPA, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_bmopa_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-29-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 4 ++++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 37 +++++++++++++++++++++++++--------- target/arm/tcg/translate-sme.c | 2 ++ 4 files changed, 35 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme_usmopa_d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(sme2_bmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sme2_smopa2_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_6(sme2_umopa2_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USMOPA_d 1010000 1 11 0 ..... ... ... ..... . 0 ... @op_64 UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64 BMOPA 1000000 0 10 0 ..... ... ... ..... . 10 .. @op_32 +SMOPA2_s 1010000 0 10 0 ..... ... ... ..... . 10 .. @op_32 +UMOPA2_s 1010000 1 10 0 ..... ... ... ..... . 10 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static inline void do_imopa_d(uint64_t *za, uint64_t *zn, uint64_t *zm, } } -#define DEF_IMOP_32(NAME, NTYPE, MTYPE) \ +#define DEF_IMOP_8x4_32(NAME, NTYPE, MTYPE) \ static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \ { \ uint32_t sum = 0; \ @@ -XXX,XX +XXX,XX @@ static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \ return neg ? a - sum : a + sum; \ } -#define DEF_IMOP_64(NAME, NTYPE, MTYPE) \ +#define DEF_IMOP_16x4_64(NAME, NTYPE, MTYPE) \ static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \ { \ uint64_t sum = 0; \ @@ -XXX,XX +XXX,XX @@ static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \ return neg ? a - sum : a + sum; \ } -DEF_IMOP_32(smopa_s, int8_t, int8_t) -DEF_IMOP_32(umopa_s, uint8_t, uint8_t) -DEF_IMOP_32(sumopa_s, int8_t, uint8_t) -DEF_IMOP_32(usmopa_s, uint8_t, int8_t) +DEF_IMOP_8x4_32(smopa_s, int8_t, int8_t) +DEF_IMOP_8x4_32(umopa_s, uint8_t, uint8_t) +DEF_IMOP_8x4_32(sumopa_s, int8_t, uint8_t) +DEF_IMOP_8x4_32(usmopa_s, uint8_t, int8_t) -DEF_IMOP_64(smopa_d, int16_t, int16_t) -DEF_IMOP_64(umopa_d, uint16_t, uint16_t) -DEF_IMOP_64(sumopa_d, int16_t, uint16_t) -DEF_IMOP_64(usmopa_d, uint16_t, int16_t) +DEF_IMOP_16x4_64(smopa_d, int16_t, int16_t) +DEF_IMOP_16x4_64(umopa_d, uint16_t, uint16_t) +DEF_IMOP_16x4_64(sumopa_d, int16_t, uint16_t) +DEF_IMOP_16x4_64(usmopa_d, uint16_t, int16_t) #define DEF_IMOPH(P, NAME, S) \ void HELPER(P##_##NAME##_##S)(void *vza, void *vzn, void *vzm, \ @@ -XXX,XX +XXX,XX @@ static uint32_t bmopa_s(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) } DEF_IMOPH(sme2, bmopa, s) + +#define DEF_IMOP_16x2_32(NAME, NTYPE, MTYPE) \ +static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \ +{ \ + uint32_t sum = 0; \ + /* Apply P to N as a mask, making the inactive elements 0. */ \ + n &= expand_pred_h(p); \ + sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); \ + sum += (NTYPE)(n >> 16) * (MTYPE)(m >> 16); \ + return neg ? a - sum : a + sum; \ +} + +DEF_IMOP_16x2_32(smopa2_s, int16_t, int16_t) +DEF_IMOP_16x2_32(umopa2_s, uint16_t, uint16_t) + +DEF_IMOPH(sme2, smopa2, s) +DEF_IMOPH(sme2, umopa2, s) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_sumop TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmopa_d) TRANS_FEAT(BMOPA, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_bmopa_s) +TRANS_FEAT(SMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_smopa2_s) +TRANS_FEAT(UMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_umopa2_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> To be used by both SVE2 and SME2. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-30-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-a64.h | 4 ++++ target/arm/tcg/gengvec64.c | 11 +++++++++++ target/arm/tcg/translate-sve.c | 8 +------- 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.h +++ b/target/arm/tcg/translate-a64.h @@ -XXX,XX +XXX,XX @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_sve2_sqdmulh(unsigned vece, uint32_t rd_ofs, + uint32_t rn_ofs, uint32_t rm_ofs, + uint32_t opr_sz, uint32_t max_sz); + void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm, MemOp align); void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, diff --git a/target/arm/tcg/gengvec64.c b/target/arm/tcg/gengvec64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/gengvec64.c +++ b/target/arm/tcg/gengvec64.c @@ -XXX,XX +XXX,XX @@ void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs, tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc), rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]); } + +void gen_gvec_sve2_sqdmulh(unsigned vece, uint32_t rd_ofs, + uint32_t rn_ofs, uint32_t rm_ofs, + uint32_t opr_sz, uint32_t max_sz) +{ + static gen_helper_gvec_3 * const fns[4] = { + gen_helper_sve2_sqdmulh_b, gen_helper_sve2_sqdmulh_h, + gen_helper_sve2_sqdmulh_s, gen_helper_sve2_sqdmulh_d, + }; + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]); +} diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(MOVPRFX_z, aa64_sve, do_movz_zpz, a->rd, a->rn, a->pg, a->esz, false) */ TRANS_FEAT(MUL_zzz, aa64_sve2, gen_gvec_fn_arg_zzz, tcg_gen_gvec_mul, a) +TRANS_FEAT(SQDMULH_zzz, aa64_sve2, gen_gvec_fn_arg_zzz, gen_gvec_sve2_sqdmulh, a) static gen_helper_gvec_3 * const smulh_zzz_fns[4] = { gen_helper_gvec_smulh_b, gen_helper_gvec_smulh_h, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UMULH_zzz, aa64_sve2, gen_gvec_ool_arg_zzz, TRANS_FEAT(PMUL_zzz, aa64_sve2, gen_gvec_ool_arg_zzz, gen_helper_gvec_pmul_b, a, 0) -static gen_helper_gvec_3 * const sqdmulh_zzz_fns[4] = { - gen_helper_sve2_sqdmulh_b, gen_helper_sve2_sqdmulh_h, - gen_helper_sve2_sqdmulh_s, gen_helper_sve2_sqdmulh_d, -}; -TRANS_FEAT(SQDMULH_zzz, aa64_sve2, gen_gvec_ool_arg_zzz, - sqdmulh_zzz_fns[a->esz], a, 0) - static gen_helper_gvec_3 * const sqrdmulh_zzz_fns[4] = { gen_helper_sve2_sqrdmulh_b, gen_helper_sve2_sqrdmulh_h, gen_helper_sve2_sqrdmulh_s, gen_helper_sve2_sqrdmulh_d, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-31-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 13 ++++ target/arm/tcg/helper.h | 8 ++ target/arm/tcg/vec_internal.h | 4 + target/arm/tcg/sme.decode | 40 ++++++++++ target/arm/tcg/helper-a64.c | 2 + target/arm/tcg/neon_helper.c | 30 ++++++++ target/arm/tcg/translate-sme.c | 137 +++++++++++++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 7 ++ 8 files changed, 241 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme2_smopa2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme2_umopa2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(gvec_fmax_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmin_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_ah_fmax_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_ah_fmin_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmaxnum_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fminnum_b16, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_urshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_urshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(gvec_urshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_srshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_srshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_srshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_urshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_urshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_urshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_2(neon_add_u8, i32, i32, i32) DEF_HELPER_2(neon_add_u16, i32, i32, i32) DEF_HELPER_2(neon_sub_u8, i32, i32, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah) return fpcr_ah && float64_is_any_nan(a) ? a : float64_chs(a); } +/* Not actually called directly as a helper, but uses similar machinery. */ +bfloat16 helper_sme2_ah_fmax_b16(bfloat16 a, bfloat16 b, float_status *fpst); +bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64 BMOPA 1000000 0 10 0 ..... ... ... ..... . 10 .. @op_32 SMOPA2_s 1010000 0 10 0 ..... ... ... ..... . 10 .. @op_32 UMOPA2_s 1010000 1 10 0 ..... ... ... ..... . 10 .. @op_32 + +### SME2 Multi-vector Multiple and Single SVE Destructive + +%zd_ax2 1:4 !function=times_2 +%zd_ax4 2:3 !function=times_4 + +&z2z_en zdn zm esz n +@z2z_2x1 ....... . esz:2 .. zm:4 ....0. ..... .... . \ + &z2z_en n=2 zdn=%zd_ax2 +@z2z_4x1 ....... . esz:2 .. zm:4 ....1. ..... ...0 . \ + &z2z_en n=4 zdn=%zd_ax4 + +SMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 0 @z2z_2x1 +SMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 0 @z2z_4x1 +UMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 1 @z2z_2x1 +UMAX_n1 1100000 1 .. 10 .... 1010.0 00000 .... 1 @z2z_4x1 +SMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 0 @z2z_2x1 +SMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 0 @z2z_4x1 +UMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 1 @z2z_2x1 +UMIN_n1 1100000 1 .. 10 .... 1010.0 00001 .... 1 @z2z_4x1 + +FMAX_n1 1100000 1 .. 10 .... 1010.0 01000 .... 0 @z2z_2x1 +FMAX_n1 1100000 1 .. 10 .... 1010.0 01000 .... 0 @z2z_4x1 +FMIN_n1 1100000 1 .. 10 .... 1010.0 01000 .... 1 @z2z_2x1 +FMIN_n1 1100000 1 .. 10 .... 1010.0 01000 .... 1 @z2z_4x1 +FMAXNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 0 @z2z_2x1 +FMAXNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 0 @z2z_4x1 +FMINNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 1 @z2z_2x1 +FMINNM_n1 1100000 1 .. 10 .... 1010.0 01001 .... 1 @z2z_4x1 + +SRSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 0 @z2z_2x1 +SRSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 0 @z2z_4x1 +URSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 1 @z2z_2x1 +URSHL_n1 1100000 1 .. 10 .... 1010.0 10001 .... 1 @z2z_4x1 + +ADD_n1 1100000 1 .. 10 .... 1010.0 11000 .... 0 @z2z_2x1 +ADD_n1 1100000 1 .. 10 .... 1010.0 11000 .... 0 @z2z_4x1 + +SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_2x1 +SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_4x1 diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -XXX,XX +XXX,XX @@ AH_MINMAX_HELPER(vfp_ah_mind, float64, float64, min) AH_MINMAX_HELPER(vfp_ah_maxh, dh_ctype_f16, float16, max) AH_MINMAX_HELPER(vfp_ah_maxs, float32, float32, max) AH_MINMAX_HELPER(vfp_ah_maxd, float64, float64, max) +AH_MINMAX_HELPER(sme2_ah_fmax_b16, bfloat16, bfloat16, max) +AH_MINMAX_HELPER(sme2_ah_fmin_b16, bfloat16, bfloat16, min) /* 64-bit versions of the CRC helpers. Note that although the operation * (and the prototypes of crc32c() and crc32() mean that only the bottom diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/neon_helper.c +++ b/target/arm/tcg/neon_helper.c @@ -XXX,XX +XXX,XX @@ NEON_VOP(rshl_s16, neon_s16, 2) NEON_GVEC_VOP2(gvec_srshl_h, int16_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_sqrshl_bhs(src1, src2, 16, true, NULL)) +NEON_GVEC_VOP2(sme2_srshl_h, int16_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_sqrshl_bhs(src1, (int8_t)src2, 32, true, NULL)) NEON_GVEC_VOP2(gvec_srshl_s, int32_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_sqrshl_bhs(src1, src2, 32, true, NULL)) +NEON_GVEC_VOP2(sme2_srshl_s, int32_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_sqrshl_d(src1, (int8_t)src2, true, NULL)) NEON_GVEC_VOP2(gvec_srshl_d, int64_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_sqrshl_d(src1, src2, true, NULL)) +NEON_GVEC_VOP2(sme2_srshl_d, int64_t) +#undef NEON_FN + uint32_t HELPER(neon_rshl_s32)(uint32_t val, uint32_t shift) { return do_sqrshl_bhs(val, (int8_t)shift, 32, true, NULL); @@ -XXX,XX +XXX,XX @@ NEON_VOP(rshl_u16, neon_u16, 2) NEON_GVEC_VOP2(gvec_urshl_h, uint16_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_uqrshl_bhs(src1, (int16_t)src2, 16, true, NULL)) +NEON_GVEC_VOP2(sme2_urshl_h, uint16_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_uqrshl_bhs(src1, (int8_t)src2, 32, true, NULL)) NEON_GVEC_VOP2(gvec_urshl_s, int32_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_uqrshl_bhs(src1, src2, 32, true, NULL)) +NEON_GVEC_VOP2(sme2_urshl_s, int32_t) +#undef NEON_FN + #define NEON_FN(dest, src1, src2) \ (dest = do_uqrshl_d(src1, (int8_t)src2, true, NULL)) NEON_GVEC_VOP2(gvec_urshl_d, int64_t) #undef NEON_FN +#define NEON_FN(dest, src1, src2) \ + (dest = do_uqrshl_d(src1, src2, true, NULL)) +NEON_GVEC_VOP2(sme2_urshl_d, int64_t) +#undef NEON_FN + uint32_t HELPER(neon_rshl_u32)(uint32_t val, uint32_t shift) { return do_uqrshl_bhs(val, (int8_t)shift, 32, true, NULL); diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmop TRANS_FEAT(BMOPA, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_bmopa_s) TRANS_FEAT(SMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_smopa2_s) TRANS_FEAT(UMOPA2_s, aa64_sme2, do_outprod, a, MO_32, gen_helper_sme2_umopa2_s) + +static bool do_z2z_n1(DisasContext *s, arg_z2z_en *a, GVecGen3Fn *fn) +{ + int esz, dn, vsz, mofs, n; + bool overlap = false; + + if (!sme_sm_enabled_check(s)) { + return true; + } + + esz = a->esz; + n = a->n; + dn = a->zdn; + mofs = vec_full_reg_offset(s, a->zm); + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + if (dofs == mofs) { + overlap = true; + } else { + fn(esz, dofs, dofs, mofs, vsz, vsz); + } + } + if (overlap) { + fn(esz, mofs, mofs, mofs, vsz, vsz); + } + return true; +} + +static void gen_sme2_srshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz) +{ + static gen_helper_gvec_3 * const fns[] = { + gen_helper_gvec_srshl_b, gen_helper_sme2_srshl_h, + gen_helper_sme2_srshl_s, gen_helper_sme2_srshl_d, + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]); +} + +static void gen_sme2_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, + uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz) +{ + static gen_helper_gvec_3 * const fns[] = { + gen_helper_gvec_urshl_b, gen_helper_sme2_urshl_h, + gen_helper_sme2_urshl_s, gen_helper_sme2_urshl_d, + }; + tcg_debug_assert(vece <= MO_64); + tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]); +} + +TRANS_FEAT(ADD_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_add) +TRANS_FEAT(SMAX_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_smax) +TRANS_FEAT(SMIN_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_smin) +TRANS_FEAT(UMAX_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_umax) +TRANS_FEAT(UMIN_n1, aa64_sme2, do_z2z_n1, a, tcg_gen_gvec_umin) +TRANS_FEAT(SRSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_srshl) +TRANS_FEAT(URSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_urshl) +TRANS_FEAT(SQDMULH_n1, aa64_sme2, do_z2z_n1, a, gen_gvec_sve2_sqdmulh) + +static bool do_z2z_n1_fpst(DisasContext *s, arg_z2z_en *a, + gen_helper_gvec_3_ptr * const fns[4]) +{ + int esz = a->esz, n, dn, vsz, mofs; + bool overlap = false; + gen_helper_gvec_3_ptr *fn; + TCGv_ptr fpst; + + /* These insns use MO_8 to encode BFloat16. */ + if (esz == MO_8 && !dc_isar_feature(aa64_sme_b16b16, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64); + fn = fns[esz]; + n = a->n; + dn = a->zdn; + mofs = vec_full_reg_offset(s, a->zm); + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + if (dofs == mofs) { + overlap = true; + } else { + tcg_gen_gvec_3_ptr(dofs, dofs, mofs, fpst, vsz, vsz, 0, fn); + } + } + if (overlap) { + tcg_gen_gvec_3_ptr(mofs, mofs, mofs, fpst, vsz, vsz, 0, fn); + } + return true; +} + +static gen_helper_gvec_3_ptr * const f_vector_fmax[2][4] = { + { gen_helper_gvec_fmax_b16, + gen_helper_gvec_fmax_h, + gen_helper_gvec_fmax_s, + gen_helper_gvec_fmax_d }, + { gen_helper_gvec_ah_fmax_b16, + gen_helper_gvec_ah_fmax_h, + gen_helper_gvec_ah_fmax_s, + gen_helper_gvec_ah_fmax_d }, +}; +TRANS_FEAT(FMAX_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmax[s->fpcr_ah]) + +static gen_helper_gvec_3_ptr * const f_vector_fmin[2][4] = { + { gen_helper_gvec_fmin_b16, + gen_helper_gvec_fmin_h, + gen_helper_gvec_fmin_s, + gen_helper_gvec_fmin_d }, + { gen_helper_gvec_ah_fmin_b16, + gen_helper_gvec_ah_fmin_h, + gen_helper_gvec_ah_fmin_s, + gen_helper_gvec_ah_fmin_d }, +}; +TRANS_FEAT(FMIN_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmin[s->fpcr_ah]) + +static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[4] = { + gen_helper_gvec_fmaxnum_b16, + gen_helper_gvec_fmaxnum_h, + gen_helper_gvec_fmaxnum_s, + gen_helper_gvec_fmaxnum_d, +}; +TRANS_FEAT(FMAXNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmaxnm) + +static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { + gen_helper_gvec_fminnum_b16, + gen_helper_gvec_fminnum_h, + gen_helper_gvec_fminnum_s, + gen_helper_gvec_fminnum_d, +}; +TRANS_FEAT(FMINNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fminnm) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_ah_fmin_h, helper_vfp_ah_minh, float16) DO_3OP(gvec_ah_fmin_s, helper_vfp_ah_mins, float32) DO_3OP(gvec_ah_fmin_d, helper_vfp_ah_mind, float64) +DO_3OP(gvec_fmax_b16, bfloat16_max, bfloat16) +DO_3OP(gvec_fmin_b16, bfloat16_min, bfloat16) +DO_3OP(gvec_fmaxnum_b16, bfloat16_maxnum, bfloat16) +DO_3OP(gvec_fminnum_b16, bfloat16_minnum, bfloat16) +DO_3OP(gvec_ah_fmax_b16, helper_sme2_ah_fmax_b16, bfloat16) +DO_3OP(gvec_ah_fmin_b16, helper_sme2_ah_fmin_b16, bfloat16) + #endif #undef DO_3OP -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-32-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 36 +++++++++++++++++++ target/arm/tcg/translate-sme.c | 65 ++++++++++++++++++++++++++++++++++ 2 files changed, 101 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ADD_n1 1100000 1 .. 10 .... 1010.0 11000 .... 0 @z2z_4x1 SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_2x1 SQDMULH_n1 1100000 1 .. 10 .... 1010.1 00000 .... 0 @z2z_4x1 + +### SME2 Multi-vector Multiple Vectors SVE Destructive + +%zm_ax2 17:4 !function=times_2 +%zm_ax4 18:3 !function=times_4 + +@z2z_2x2 ....... . esz:2 . ....0 ....0. ..... .... . \ + &z2z_en n=2 zdn=%zd_ax2 zm=%zm_ax2 +@z2z_4x4 ....... . esz:2 . ...00 ....1. ..... ...0 . \ + &z2z_en n=4 zdn=%zd_ax4 zm=%zm_ax4 + +SMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 0 @z2z_2x2 +SMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 0 @z2z_4x4 +UMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 1 @z2z_2x2 +UMAX_nn 1100000 1 .. 1 ..... 1011.0 00000 .... 1 @z2z_4x4 +SMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 0 @z2z_2x2 +SMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 0 @z2z_4x4 +UMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 1 @z2z_2x2 +UMIN_nn 1100000 1 .. 1 ..... 1011.0 00001 .... 1 @z2z_4x4 + +FMAX_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 0 @z2z_2x2 +FMAX_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 0 @z2z_4x4 +FMIN_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 1 @z2z_2x2 +FMIN_nn 1100000 1 .. 1 ..... 1011.0 01000 .... 1 @z2z_4x4 +FMAXNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 0 @z2z_2x2 +FMAXNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 0 @z2z_4x4 +FMINNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 1 @z2z_2x2 +FMINNM_nn 1100000 1 .. 1 ..... 1011.0 01001 .... 1 @z2z_4x4 + +SRSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 0 @z2z_2x2 +SRSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 0 @z2z_4x4 +URSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 1 @z2z_2x2 +URSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 1 @z2z_4x4 + +SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_2x2 +SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_4x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SRSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_srshl) TRANS_FEAT(URSHL_n1, aa64_sme2, do_z2z_n1, a, gen_sme2_urshl) TRANS_FEAT(SQDMULH_n1, aa64_sme2, do_z2z_n1, a, gen_gvec_sve2_sqdmulh) +static bool do_z2z_nn(DisasContext *s, arg_z2z_en *a, GVecGen3Fn *fn) +{ + int esz, dn, dm, vsz, n; + + if (!sme_sm_enabled_check(s)) { + return true; + } + + esz = a->esz; + n = a->n; + dn = a->zdn; + dm = a->zm; + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + int mofs = vec_full_reg_offset(s, dm + i); + + fn(esz, dofs, dofs, mofs, vsz, vsz); + } + return true; +} + +TRANS_FEAT(SMAX_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_smax) +TRANS_FEAT(SMIN_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_smin) +TRANS_FEAT(UMAX_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_umax) +TRANS_FEAT(UMIN_nn, aa64_sme2, do_z2z_nn, a, tcg_gen_gvec_umin) +TRANS_FEAT(SRSHL_nn, aa64_sme2, do_z2z_nn, a, gen_sme2_srshl) +TRANS_FEAT(URSHL_nn, aa64_sme2, do_z2z_nn, a, gen_sme2_urshl) +TRANS_FEAT(SQDMULH_nn, aa64_sme2, do_z2z_nn, a, gen_gvec_sve2_sqdmulh) + static bool do_z2z_n1_fpst(DisasContext *s, arg_z2z_en *a, gen_helper_gvec_3_ptr * const fns[4]) { @@ -XXX,XX +XXX,XX @@ static bool do_z2z_n1_fpst(DisasContext *s, arg_z2z_en *a, return true; } +static bool do_z2z_nn_fpst(DisasContext *s, arg_z2z_en *a, + gen_helper_gvec_3_ptr * const fns[4]) +{ + int esz = a->esz, n, dn, dm, vsz; + gen_helper_gvec_3_ptr *fn; + TCGv_ptr fpst; + + if (esz == MO_8 && !dc_isar_feature(aa64_sme_b16b16, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + fpst = fpstatus_ptr(esz == MO_16 ? FPST_A64_F16 : FPST_A64); + fn = fns[esz]; + n = a->n; + dn = a->zdn; + dm = a->zm; + vsz = streaming_vec_reg_size(s); + + for (int i = 0; i < n; i++) { + int dofs = vec_full_reg_offset(s, dn + i); + int mofs = vec_full_reg_offset(s, dm + i); + + tcg_gen_gvec_3_ptr(dofs, dofs, mofs, fpst, vsz, vsz, 0, fn); + } + return true; +} + static gen_helper_gvec_3_ptr * const f_vector_fmax[2][4] = { { gen_helper_gvec_fmax_b16, gen_helper_gvec_fmax_h, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmax[2][4] = { gen_helper_gvec_ah_fmax_d }, }; TRANS_FEAT(FMAX_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmax[s->fpcr_ah]) +TRANS_FEAT(FMAX_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fmax[s->fpcr_ah]) static gen_helper_gvec_3_ptr * const f_vector_fmin[2][4] = { { gen_helper_gvec_fmin_b16, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmin[2][4] = { gen_helper_gvec_ah_fmin_d }, }; TRANS_FEAT(FMIN_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmin[s->fpcr_ah]) +TRANS_FEAT(FMIN_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fmin[s->fpcr_ah]) static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[4] = { gen_helper_gvec_fmaxnum_b16, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fmaxnm[4] = { gen_helper_gvec_fmaxnum_d, }; TRANS_FEAT(FMAXNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fmaxnm) +TRANS_FEAT(FMAXNM_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fmaxnm) static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { gen_helper_gvec_fminnum_b16, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { gen_helper_gvec_fminnum_d, }; TRANS_FEAT(FMINNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fminnm) +TRANS_FEAT(FMINNM_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fminnm) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-33-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate.h | 2 ++ target/arm/tcg/sme.decode | 15 +++++++++++++++ target/arm/tcg/translate-sme.c | 30 ++++++++++++++++++++++++++++++ 3 files changed, 47 insertions(+) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -XXX,XX +XXX,XX @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t); typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t); +typedef void GVecGen3FnVar(unsigned, TCGv_ptr, uint32_t, TCGv_ptr, uint32_t, + TCGv_ptr, uint32_t, uint32_t, uint32_t); /* Function prototype for gen_ functions for calling Neon helpers */ typedef void NeonGenOneOpFn(TCGv_i32, TCGv_i32); diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ URSHL_nn 1100000 1 .. 1 ..... 1011.0 10001 .... 1 @z2z_4x4 SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_2x2 SQDMULH_nn 1100000 1 .. 1 ..... 1011.1 00000 .... 0 @z2z_4x4 + +### SME2 Multi-vector Multiple and Single Array Vectors + +&azz_n n off rv zn zm +@azz_nx1_o3 ........ .... zm:4 ...... zn:5 .. off:3 &azz_n rv=%mova_rv + +ADD_azz_n1_s 11000001 0010 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=2 +ADD_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=4 +ADD_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=2 +ADD_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 10 ... @azz_nx1_o3 n=4 + +SUB_azz_n1_s 11000001 0010 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 +SUB_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 +SUB_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 +SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const f_vector_fminnm[4] = { }; TRANS_FEAT(FMINNM_n1, aa64_sme2, do_z2z_n1_fpst, a, f_vector_fminnm) TRANS_FEAT(FMINNM_nn, aa64_sme2, do_z2z_nn_fpst, a, f_vector_fminnm) + +/* Add/Sub vector Z[m] to each Z[n*N] with result in ZA[d*N]. */ +static bool do_azz_n1(DisasContext *s, arg_azz_n *a, int esz, + GVecGen3FnVar *fn) +{ + TCGv_ptr t_za; + int svl, n, o_zm; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + n = a->n; + t_za = get_zarray(s, a->rv, a->off, n, 0); + o_zm = vec_full_reg_offset(s, a->zm); + svl = streaming_vec_reg_size(s); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zn = vec_full_reg_offset(s, (a->zn + i) % 32); + + fn(esz, t_za, o_za, tcg_env, o_zn, tcg_env, o_zm, svl, svl); + } + return true; +} + +TRANS_FEAT(ADD_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_sub_var) +TRANS_FEAT(ADD_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_sub_var) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-34-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 20 ++++++++++++++++++++ target/arm/tcg/translate-sme.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_azz_n1_s 11000001 0010 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 SUB_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 SUB_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 + +### SME2 Multi-vector Multiple Array Vectors + +%zn_ax2 6:4 !function=times_2 +%zn_ax4 7:3 !function=times_4 + +@azz_2x2_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &azz_n n=2 rv=%mova_rv zn=%zn_ax2 zm=%zm_ax2 +@azz_4x4_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 + +ADD_azz_nn_s 11000001 101 ....0 0 .. 110 ....0 10 ... @azz_2x2_o3 +ADD_azz_nn_s 11000001 101 ...01 0 .. 110 ...00 10 ... @azz_4x4_o3 +ADD_azz_nn_d 11000001 111 ....0 0 .. 110 ....0 10 ... @azz_2x2_o3 +ADD_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 10 ... @azz_4x4_o3 + +SUB_azz_nn_s 11000001 101 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 +SUB_azz_nn_s 11000001 101 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 +SUB_azz_nn_d 11000001 111 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 +SUB_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ADD_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_n1_s, aa64_sme2, do_azz_n1, a, MO_32, tcg_gen_gvec_sub_var) TRANS_FEAT(ADD_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_n1_d, aa64_sme2_i16i64, do_azz_n1, a, MO_64, tcg_gen_gvec_sub_var) + +/* Add/Sub each vector Z[m*N] to each Z[n*N] with result in ZA[d*N]. */ +static bool do_azz_nn(DisasContext *s, arg_azz_n *a, int esz, + GVecGen3FnVar *fn) +{ + TCGv_ptr t_za; + int svl, n; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + n = a->n; + t_za = get_zarray(s, a->rv, a->off, n, 1); + svl = streaming_vec_reg_size(s); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zn = vec_full_reg_offset(s, a->zn + i); + int o_zm = vec_full_reg_offset(s, a->zm + i); + + fn(esz, t_za, o_za, tcg_env, o_zn, tcg_env, o_zm, svl, svl); + } + return true; +} + +TRANS_FEAT(ADD_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_sub_var) +TRANS_FEAT(ADD_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub_var) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Indicate whether to use FPST_FPCR or FPST_ZA via bit 2 of simd_data(desc). For SVE, this bit remains zero. For do_FMLAL_zzzw, this requires no change. For do_FMLAL_zzxw, move the index up one bit. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-35-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 2 +- target/arm/tcg/vec_helper.c | 8 +++++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_FMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sub, bool sel) { return gen_gvec_ptr_zzzz(s, gen_helper_sve2_fmlal_zzxw_s, a->rd, a->rn, a->rm, a->ra, - (a->index << 2) | (sel << 1) | sub, tcg_env); + (a->index << 3) | (sel << 1) | sub, tcg_env); } TRANS_FEAT(FMLALB_zzxw, aa64_sve2, do_FMLAL_zzxw, a, false, false) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzzw_s)(void *vd, void *vn, void *vm, void *va, intptr_t i, oprsz = simd_oprsz(desc); bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1); intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16); - float_status *status = &env->vfp.fp_status[FPST_A64]; + bool za = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + float_status *status = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; bool fz16 = env->vfp.fpcr & FPCR_FZ16; int negx = 0, negf = 0; @@ -XXX,XX +XXX,XX @@ void HELPER(sve2_fmlal_zzxw_s)(void *vd, void *vn, void *vm, void *va, intptr_t i, j, oprsz = simd_oprsz(desc); bool is_s = extract32(desc, SIMD_DATA_SHIFT, 1); intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 1, 1) * sizeof(float16); - intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 3) * sizeof(float16); - float_status *status = &env->vfp.fp_status[FPST_A64]; + bool za = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 3, 3) * sizeof(float16); + float_status *status = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; bool fz16 = env->vfp.fpcr & FPCR_FZ16; int negx = 0, negf = 0; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-36-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 8 +++++ target/arm/tcg/vec_helper.c | 58 ++++++++++++++++++++++++++++--------- 2 files changed, 53 insertions(+), 13 deletions(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfmmla, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmlsl, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_ah_bfmlsl, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmlsl_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_ah_bfmlsl_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_sclamp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, clear_tail(d, opr_sz, simd_maxsz(desc)); } -void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va, - float_status *stat, uint32_t desc) +static void do_bfmlal(float32 *d, bfloat16 *n, bfloat16 *m, float32 *a, + float_status *stat, uint32_t desc, int negx, int negf) { intptr_t i, opr_sz = simd_oprsz(desc); - intptr_t sel = simd_data(desc); - float32 *d = vd, *a = va; - bfloat16 *n = vn, *m = vm; + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1); for (i = 0; i < opr_sz / 4; ++i) { - float32 nn = n[H2(i * 2 + sel)] << 16; + float32 nn = (negx ^ n[H2(i * 2 + sel)]) << 16; float32 mm = m[H2(i * 2 + sel)] << 16; - d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat); + d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], negf, stat); } clear_tail(d, opr_sz, simd_maxsz(desc)); } -void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm, - void *va, float_status *stat, uint32_t desc) +void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal(vd, vn, vm, va, stat, desc, 0, 0); +} + +void HELPER(gvec_bfmlsl)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal(vd, vn, vm, va, stat, desc, 0x8000, 0); +} + +void HELPER(gvec_ah_bfmlsl)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal(vd, vn, vm, va, stat, desc, 0, float_muladd_negate_product); +} + +static void do_bfmlal_idx(float32 *d, bfloat16 *n, bfloat16 *m, float32 *a, + float_status *stat, uint32_t desc, int negx, int negf) { intptr_t i, j, opr_sz = simd_oprsz(desc); intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1); intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3); intptr_t elements = opr_sz / 4; intptr_t eltspersegment = MIN(16 / 4, elements); - float32 *d = vd, *a = va; - bfloat16 *n = vn, *m = vm; for (i = 0; i < elements; i += eltspersegment) { float32 m_idx = m[H2(2 * i + index)] << 16; for (j = i; j < i + eltspersegment; j++) { - float32 n_j = n[H2(2 * j + sel)] << 16; - d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat); + float32 n_j = (negx ^ n[H2(2 * j + sel)]) << 16; + d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], negf, stat); } } clear_tail(d, opr_sz, simd_maxsz(desc)); } +void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal_idx(vd, vn, vm, va, stat, desc, 0, 0); +} + +void HELPER(gvec_bfmlsl_idx)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal_idx(vd, vn, vm, va, stat, desc, 0x8000, 0); +} + +void HELPER(gvec_ah_bfmlsl_idx)(void *vd, void *vn, void *vm, void *va, + float_status *stat, uint32_t desc) +{ + do_bfmlal_idx(vd, vn, vm, va, stat, desc, 0, float_muladd_negate_product); +} + #define DO_CLAMP(NAME, TYPE) \ void HELPER(NAME)(void *d, void *n, void *m, void *a, uint32_t desc) \ { \ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-37-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 71 ++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 98 ++++++++++++++++++++++++++++++++++ 2 files changed, 169 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_azz_n1_s 11000001 0011 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 SUB_azz_n1_d 11000001 0110 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=2 SUB_azz_n1_d 11000001 0111 .... 0 .. 110 ..... 11 ... @azz_nx1_o3 n=4 +%off3_x2 0:3 !function=times_2 +%off2_x2 0:2 !function=times_2 + +@azz_nx1_o3x2 ........ ... . zm:4 . .. ... zn:5 .. ... \ + &azz_n off=%off3_x2 rv=%mova_rv +@azz_nx1_o2x2 ........ ... . zm:4 . .. ... zn:5 ... .. \ + &azz_n off=%off2_x2 rv=%mova_rv + +FMLAL_n1 11000001 001 0 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1 +FMLAL_n1 11000001 001 0 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=2 +FMLAL_n1 11000001 001 1 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=4 + +FMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 01 ... @azz_nx1_o3x2 n=1 +FMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=2 +FMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=4 + +BFMLAL_n1 11000001 001 0 .... 0 .. 011 ..... 10 ... @azz_nx1_o3x2 n=1 +BFMLAL_n1 11000001 001 0 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=2 +BFMLAL_n1 11000001 001 1 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=4 + +BFMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 11 ... @azz_nx1_o3x2 n=1 +BFMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=2 +BFMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ SUB_azz_nn_s 11000001 101 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 SUB_azz_nn_s 11000001 101 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 SUB_azz_nn_d 11000001 111 ....0 0 .. 110 ....0 11 ... @azz_2x2_o3 SUB_azz_nn_d 11000001 111 ...01 0 .. 110 ...00 11 ... @azz_4x4_o3 + +@azz_2x2_o2x2 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=2 rv=%mova_rv zn=%zn_ax2 zm=%zm_ax2 off=%off2_x2 +@azz_4x4_o2x2 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 off=%off2_x2 + +FMLAL_nn 11000001 101 ....0 0 .. 010 ....0 000 .. @azz_2x2_o2x2 +FMLAL_nn 11000001 101 ...01 0 .. 010 ...00 000 .. @azz_4x4_o2x2 + +FMLSL_nn 11000001 101 ....0 0 .. 010 ....0 010 .. @azz_2x2_o2x2 +FMLSL_nn 11000001 101 ...01 0 .. 010 ...00 010 .. @azz_4x4_o2x2 + +BFMLAL_nn 11000001 101 ....0 0 .. 010 ....0 100 .. @azz_2x2_o2x2 +BFMLAL_nn 11000001 101 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2 + +BFMLSL_nn 11000001 101 ....0 0 .. 010 ....0 110 .. @azz_2x2_o2x2 +BFMLSL_nn 11000001 101 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 + +### SME2 Multi-vector Indexed + +&azx_n n off rv zn zm idx + +%idx3_15_10 15:1 10:2 +%idx2_10_2 10:2 2:1 + +@azx_1x1_o3x2 ........ .... zm:4 . .. . .. zn:5 .. ... \ + &azx_n n=1 rv=%mova_rv off=%off3_x2 idx=%idx3_15_10 +@azx_2x1_o2x2 ........ .... zm:4 . .. . .. ..... .. ... \ + &azx_n n=2 rv=%mova_rv off=%off2_x2 zn=%zn_ax2 idx=%idx2_10_2 +@azx_4x1_o2x2 ........ .... zm:4 . .. . .. ..... .. ... \ + &azx_n n=4 rv=%mova_rv off=%off2_x2 zn=%zn_ax4 idx=%idx2_10_2 + +FMLAL_nx 11000001 1000 .... . .. 1 .. ..... 00 ... @azx_1x1_o3x2 +FMLAL_nx 11000001 1001 .... 0 .. 1 .. ....0 00 ... @azx_2x1_o2x2 +FMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 00 ... @azx_4x1_o2x2 + +FMLSL_nx 11000001 1000 .... . .. 1 .. ..... 01 ... @azx_1x1_o3x2 +FMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 01 ... @azx_2x1_o2x2 +FMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 01 ... @azx_4x1_o2x2 + +BFMLAL_nx 11000001 1000 .... . .. 1 .. ..... 10 ... @azx_1x1_o3x2 +BFMLAL_nx 11000001 1001 .... 0 .. 1 .. ....0 10 ... @azx_2x1_o2x2 +BFMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 10 ... @azx_4x1_o2x2 + +BFMLSL_nx 11000001 1000 .... . .. 1 .. ..... 11 ... @azx_1x1_o3x2 +BFMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 11 ... @azx_2x1_o2x2 +BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ADD_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_sub_var) TRANS_FEAT(ADD_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub_var) + +/* + * Expand array multi-vector single (n1), array multi-vector (nn), + * and array multi-vector indexed (nx), for floating-point accumulate. + * multi: true for nn, false for n1. + * fpst: >= 0 to set ptr argument for FPST_*, < 0 for ENV. + * data: stuff for simd_data, including any index. + */ +#define FPST_ENV -1 + +static bool do_azz_acc_fp(DisasContext *s, int nreg, int nsel, + int rv, int off, int zn, int zm, + int data, int shsel, bool multi, int fpst, + gen_helper_gvec_4_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / nreg; + TCGv_ptr t_za = get_zarray(s, rv, off, nreg, nsel); + TCGv_ptr t, ptr; + + if (fpst >= 0) { + ptr = fpstatus_ptr(fpst); + } else { + ptr = tcg_env; + } + t = tcg_temp_new_ptr(); + + for (int r = 0; r < nreg; ++r) { + TCGv_ptr t_zn = vec_full_reg_ptr(s, zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm); + + for (int i = 0; i < nsel; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, data | (i << shsel)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, t, ptr, tcg_constant_i32(desc)); + } + + /* + * For multiple-and-single vectors, Zn may wrap. + * For multiple vectors, both Zn and Zm are aligned. + */ + zn = (zn + 1) % 32; + zm += multi; + } + } + return true; +} + +static bool do_fmlal(DisasContext *s, arg_azz_n *a, bool sub, bool multi) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + (1 << 2) | sub, 1, + multi, FPST_ENV, gen_helper_sve2_fmlal_zzzw_s); +} + +TRANS_FEAT(FMLAL_n1, aa64_sme2, do_fmlal, a, false, false) +TRANS_FEAT(FMLSL_n1, aa64_sme2, do_fmlal, a, true, false) +TRANS_FEAT(FMLAL_nn, aa64_sme2, do_fmlal, a, false, true) +TRANS_FEAT(FMLSL_nn, aa64_sme2, do_fmlal, a, true, true) + +static bool do_fmlal_nx(DisasContext *s, arg_azx_n *a, bool sub) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + (a->idx << 3) | (1 << 2) | sub, 1, + false, FPST_ENV, gen_helper_sve2_fmlal_zzxw_s); +} + +TRANS_FEAT(FMLAL_nx, aa64_sme2, do_fmlal_nx, a, false) +TRANS_FEAT(FMLSL_nx, aa64_sme2, do_fmlal_nx, a, true) + +static bool do_bfmlal(DisasContext *s, arg_azz_n *a, bool sub, bool multi) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, FPST_ZA, + (!sub ? gen_helper_gvec_bfmlal + : s->fpcr_ah ? gen_helper_gvec_ah_bfmlsl + : gen_helper_gvec_bfmlsl)); +} + +TRANS_FEAT(BFMLAL_n1, aa64_sme2, do_bfmlal, a, false, false) +TRANS_FEAT(BFMLSL_n1, aa64_sme2, do_bfmlal, a, true, false) +TRANS_FEAT(BFMLAL_nn, aa64_sme2, do_bfmlal, a, false, true) +TRANS_FEAT(BFMLSL_nn, aa64_sme2, do_bfmlal, a, true, true) + +static bool do_bfmlal_nx(DisasContext *s, arg_azx_n *a, bool sub) +{ + return do_azz_acc_fp(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + a->idx << 1, 0, false, FPST_ZA, + !sub ? gen_helper_gvec_bfmlal_idx + : s->fpcr_ah ? gen_helper_gvec_ah_bfmlsl_idx + : gen_helper_gvec_bfmlsl_idx); +} + +TRANS_FEAT(BFMLAL_nx, aa64_sme2, do_bfmlal_nx, a, false) +TRANS_FEAT(BFMLSL_nx, aa64_sme2, do_bfmlal_nx, a, true) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-38-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 5 ++++ target/arm/tcg/sme.decode | 14 +++++++++++ target/arm/tcg/sve.decode | 7 ++++-- target/arm/tcg/sme_helper.c | 44 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 18 ++++++++++++++ target/arm/tcg/translate-sve.c | 5 ++++ 6 files changed, 91 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmaxnum_b16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fminnum_b16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_6(sme2_fdot_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_6(sme2_fdot_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ BFMLSL_n1 11000001 001 0 .... 0 .. 011 ..... 11 ... @azz_nx1_o3x2 n=1 BFMLSL_n1 11000001 001 0 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=2 BFMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 +FDOT_n1 11000001 001 0 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=2 +FDOT_n1 11000001 001 1 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ BFMLAL_nn 11000001 101 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2 BFMLSL_nn 11000001 101 ....0 0 .. 010 ....0 110 .. @azz_2x2_o2x2 BFMLSL_nn 11000001 101 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 +FDOT_nn 11000001 101 ....0 0 .. 100 ....0 00 ... @azz_2x2_o3 +FDOT_nn 11000001 101 ...01 0 .. 100 ...00 00 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ BFMLAL_nx 11000001 1001 .... 1 .. 1 .. ...00 10 ... @azx_4x1_o2x2 BFMLSL_nx 11000001 1000 .... . .. 1 .. ..... 11 ... @azx_1x1_o3x2 BFMLSL_nx 11000001 1001 .... 0 .. 1 .. ....0 11 ... @azx_2x1_o2x2 BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 + +@azx_2x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax2 +@azx_4x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax4 + +FDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_i2_o3 +FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 -### SVE2 floating-point bfloat16 dot-product +### SVE2 floating-point dot-product +FDOT_zzzz 01100100 00 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point multiply-add long (indexed) @@ -XXX,XX +XXX,XX @@ FMLSLT_zzxw 01100100 10 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 BFMLALB_zzxw 01100100 11 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 BFMLALT_zzxw 01100100 11 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 -### SVE2 floating-point bfloat16 dot-product (indexed) +### SVE2 floating-point dot-product (indexed) + +FDOT_zzxz 01100100 00 1 ..... 010000 ..... ..... @rrxr_2 esz=2 BFDOT_zzxz 01100100 01 1 ..... 010000 ..... ..... @rrxr_2 esz=2 ### SVE broadcast predicate element diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, } } +void HELPER(sme2_fdot_h)(void *vd, void *vn, void *vm, void *va, + CPUARMState *env, uint32_t desc) +{ + intptr_t i, oprsz = simd_maxsz(desc); + bool za = extract32(desc, SIMD_DATA_SHIFT, 1); + float_status *fpst_std = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; + float_status *fpst_f16 = &env->vfp.fp_status[za ? FPST_ZA_F16 : FPST_A64_F16]; + float_status fpst_odd = *fpst_std; + float32 *d = vd, *a = va; + uint32_t *n = vn, *m = vm; + + set_float_rounding_mode(float_round_to_odd, &fpst_odd); + + for (i = 0; i < oprsz / sizeof(float32); ++i) { + d[H4(i)] = f16_dotadd(a[H4(i)], n[H4(i)], m[H4(i)], + fpst_f16, fpst_std, &fpst_odd); + } +} + +void HELPER(sme2_fdot_idx_h)(void *vd, void *vn, void *vm, void *va, + CPUARMState *env, uint32_t desc) +{ + intptr_t i, j, oprsz = simd_maxsz(desc); + intptr_t elements = oprsz / sizeof(float32); + intptr_t eltspersegment = MIN(4, elements); + int idx = extract32(desc, SIMD_DATA_SHIFT, 2); + bool za = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + float_status *fpst_std = &env->vfp.fp_status[za ? FPST_ZA : FPST_A64]; + float_status *fpst_f16 = &env->vfp.fp_status[za ? FPST_ZA_F16 : FPST_A64_F16]; + float_status fpst_odd = *fpst_std; + float32 *d = vd, *a = va; + uint32_t *n = vn, *m = (uint32_t *)vm + H4(idx); + + set_float_rounding_mode(float_round_to_odd, &fpst_odd); + + for (i = 0; i < elements; i += eltspersegment) { + uint32_t mm = m[i]; + for (j = 0; j < eltspersegment; ++j) { + d[H4(i + j)] = f16_dotadd(a[H4(i + j)], n[H4(i + j)], mm, + fpst_f16, fpst_std, &fpst_odd); + } + } +} + void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, void *vpm, CPUARMState *env, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_bfmlal_nx(DisasContext *s, arg_azx_n *a, bool sub) TRANS_FEAT(BFMLAL_nx, aa64_sme2, do_bfmlal_nx, a, false) TRANS_FEAT(BFMLSL_nx, aa64_sme2, do_bfmlal_nx, a, true) + +static bool do_fdot(DisasContext *s, arg_azz_n *a, bool multi) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, 1, 0, + multi, FPST_ENV, gen_helper_sme2_fdot_h); +} + +TRANS_FEAT(FDOT_n1, aa64_sme2, do_fdot, a, false) +TRANS_FEAT(FDOT_nn, aa64_sme2, do_fdot, a, true) + +static bool do_fdot_nx(DisasContext *s, arg_azx_n *a) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + a->idx | (1 << 2), 0, false, FPST_ENV, + gen_helper_sme2_fdot_idx_h); +} + +TRANS_FEAT(FDOT_nx, aa64_sme2, do_fdot_nx, a) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, TRANS_FEAT_NONSTREAMING(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, gen_helper_gvec_ummla_b, a, 0) +TRANS_FEAT(FDOT_zzzz, aa64_sme2_or_sve2p1, gen_gvec_env_arg_zzzz, + gen_helper_sme2_fdot_h, a, 0) +TRANS_FEAT(FDOT_zzxz, aa64_sme2_or_sve2p1, gen_gvec_env_arg_zzxz, + gen_helper_sme2_fdot_idx_h, a) + TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_env_arg_zzzz, gen_helper_gvec_bfdot, a, 0) TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_env_arg_zzxz, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-39-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 9 +++++++++ target/arm/tcg/translate-sme.c | 17 +++++++++++++++++ 2 files changed, 26 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ BFMLSL_n1 11000001 001 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 FDOT_n1 11000001 001 0 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=2 FDOT_n1 11000001 001 1 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=4 +BFDOT_n1 11000001 001 0 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=2 +BFDOT_n1 11000001 001 1 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ BFMLSL_nn 11000001 101 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 FDOT_nn 11000001 101 ....0 0 .. 100 ....0 00 ... @azz_2x2_o3 FDOT_nn 11000001 101 ...01 0 .. 100 ...00 00 ... @azz_4x4_o3 +BFDOT_nn 11000001 101 ....0 0 .. 100 ....0 10 ... @azz_2x2_o3 +BFDOT_nn 11000001 101 ...01 0 .. 100 ...00 10 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 FDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_i2_o3 FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 + +BFDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_i2_o3 +BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_fdot_nx(DisasContext *s, arg_azx_n *a) } TRANS_FEAT(FDOT_nx, aa64_sme2, do_fdot_nx, a) + +static bool do_bfdot(DisasContext *s, arg_azz_n *a, bool multi) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, 0, 0, + multi, FPST_ENV, gen_helper_gvec_bfdot); +} + +TRANS_FEAT(BFDOT_n1, aa64_sme2, do_bfdot, a, false) +TRANS_FEAT(BFDOT_nn, aa64_sme2, do_bfdot, a, true) + +static bool do_bfdot_nx(DisasContext *s, arg_azx_n *a) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, a->idx, 0, + false, FPST_ENV, gen_helper_gvec_bfdot_idx); +} + +TRANS_FEAT(BFDOT_nx, aa64_sme2, do_bfdot_nx, a) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-40-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 ++ target/arm/tcg/helper.h | 2 ++ target/arm/tcg/sme.decode | 3 +++ target/arm/tcg/sme_helper.c | 30 ++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 24 +++++++++++++++++++++ target/arm/tcg/vec_helper.c | 39 ++++++++++++++++++++++++++++++++++ 6 files changed, 100 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme2_fdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(sme2_fdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_6(sme2_fvdot_idx_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfdot, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(gvec_bfdot_idx, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_6(sme2_bfvdot_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(gvec_bfmmla, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 BFDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_i2_o3 BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3 + +FVDOT 11000001 0101 .... 0 .. 0 .. ....0 01 ... @azx_2x1_i2_o3 +BFVDOT 11000001 0101 .... 0 .. 0 .. ....0 11 ... @azx_2x1_i2_o3 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fdot_idx_h)(void *vd, void *vn, void *vm, void *va, } } +void HELPER(sme2_fvdot_idx_h)(void *vd, void *vn, void *vm, void *va, + CPUARMState *env, uint32_t desc) +{ + intptr_t i, j, oprsz = simd_maxsz(desc); + intptr_t elements = oprsz / sizeof(float32); + intptr_t eltspersegment = MIN(4, elements); + int idx = extract32(desc, SIMD_DATA_SHIFT, 2); + int sel = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + float_status fpst_odd, *fpst_std, *fpst_f16; + float32 *d = vd, *a = va; + uint16_t *n0 = vn; + uint16_t *n1 = vn + sizeof(ARMVectorReg); + uint32_t *m = (uint32_t *)vm + H4(idx); + + fpst_std = &env->vfp.fp_status[FPST_ZA]; + fpst_f16 = &env->vfp.fp_status[FPST_ZA_F16]; + fpst_odd = *fpst_std; + set_float_rounding_mode(float_round_to_odd, &fpst_odd); + + for (i = 0; i < elements; i += eltspersegment) { + uint32_t mm = m[i]; + for (j = 0; j < eltspersegment; ++j) { + uint32_t nn = (n0[H2(2 * (i + j) + sel)]) + | (n1[H2(2 * (i + j) + sel)] << 16); + d[i + H4(j)] = f16_dotadd(a[i + H4(j)], nn, mm, + fpst_f16, fpst_std, &fpst_odd); + } + } +} + void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, void *vpm, CPUARMState *env, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_bfdot_nx(DisasContext *s, arg_azx_n *a) } TRANS_FEAT(BFDOT_nx, aa64_sme2, do_bfdot_nx, a) + +static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / 2; + TCGv_ptr t_za = get_zarray(s, a->rv, a->off, 2, 1); + TCGv_ptr t_zn = vec_full_reg_ptr(s, a->zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, a->zm); + TCGv_ptr t = tcg_temp_new_ptr(); + + for (int i = 0; i < 2; ++i) { + int o_za = i * vstride * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, a->idx | (i << 2)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, t, tcg_env, tcg_constant_i32(desc)); + } + } + return true; +} + +TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h) +TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm, clear_tail(d, opr_sz, simd_maxsz(desc)); } +void HELPER(sme2_bfvdot_idx)(void *vd, void *vn, void *vm, + void *va, CPUARMState *env, uint32_t desc) +{ + intptr_t i, j, opr_sz = simd_oprsz(desc); + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT, 2); + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT + 2, 1); + intptr_t elements = opr_sz / 4; + intptr_t eltspersegment = MIN(16 / 4, elements); + float32 *d = vd, *a = va; + uint16_t *n0 = vn; + uint16_t *n1 = vn + sizeof(ARMVectorReg); + uint32_t *m = vm; + float_status fpst, fpst_odd; + + if (is_ebf(env, &fpst, &fpst_odd)) { + for (i = 0; i < elements; i += eltspersegment) { + uint32_t m_idx = m[i + H4(idx)]; + + for (j = 0; j < eltspersegment; j++) { + uint32_t nn = (n0[H2(2 * (i + j) + sel)]) + | (n1[H2(2 * (i + j) + sel)] << 16); + d[i + H4(j)] = bfdotadd_ebf(a[i + H4(j)], nn, m_idx, + &fpst, &fpst_odd); + } + } + } else { + for (i = 0; i < elements; i += eltspersegment) { + uint32_t m_idx = m[i + H4(idx)]; + + for (j = 0; j < eltspersegment; j++) { + uint32_t nn = (n0[H2(2 * (i + j) + sel)]) + | (n1[H2(2 * (i + j) + sel)] << 16); + d[i + H4(j)] = bfdotadd(a[i + H4(j)], nn, m_idx, &fpst); + } + } + } + clear_tail(d, opr_sz, simd_maxsz(desc)); +} + void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, CPUARMState *env, uint32_t desc) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Emphasize that these are 4-way dot products. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-41-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 22 +++++++++++----------- target/arm/tcg/translate-a64.c | 14 +++++++------- target/arm/tcg/translate-neon.c | 14 +++++++------- target/arm/tcg/translate-sve.c | 18 +++++++++--------- target/arm/tcg/vec_helper.c | 22 +++++++++++----------- 5 files changed, 45 insertions(+), 45 deletions(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_sqrdmlah_d, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(sve2_sqrdmlsh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_usdot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_usdot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_sdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_udot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sdot_idx_h, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_sdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_udot_idx_h, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_udot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_sudot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_sudot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_5(gvec_usdot_idx_b, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_5(gvec_usdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -XXX,XX +XXX,XX @@ static bool do_dot_vector_env(DisasContext *s, arg_qrrr_e *a, return true; } -TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_b) -TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_b) -TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_b) +TRANS_FEAT(SDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_sdot_4b) +TRANS_FEAT(UDOT_v, aa64_dp, do_dot_vector, a, gen_helper_gvec_udot_4b) +TRANS_FEAT(USDOT_v, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_usdot_4b) TRANS_FEAT(BFDOT_v, aa64_bf16, do_dot_vector_env, a, gen_helper_gvec_bfdot) TRANS_FEAT(BFMMLA, aa64_bf16, do_dot_vector_env, a, gen_helper_gvec_bfmmla) TRANS_FEAT(SMMLA, aa64_i8mm, do_dot_vector, a, gen_helper_gvec_smmla_b) @@ -XXX,XX +XXX,XX @@ static bool do_dot_vector_idx_env(DisasContext *s, arg_qrrx_e *a, return true; } -TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_b) -TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_b) +TRANS_FEAT(SDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_sdot_idx_4b) +TRANS_FEAT(UDOT_vi, aa64_dp, do_dot_vector_idx, a, gen_helper_gvec_udot_idx_4b) TRANS_FEAT(SUDOT_vi, aa64_i8mm, do_dot_vector_idx, a, - gen_helper_gvec_sudot_idx_b) + gen_helper_gvec_sudot_idx_4b) TRANS_FEAT(USDOT_vi, aa64_i8mm, do_dot_vector_idx, a, - gen_helper_gvec_usdot_idx_b) + gen_helper_gvec_usdot_idx_4b) TRANS_FEAT(BFDOT_vi, aa64_bf16, do_dot_vector_idx_env, a, gen_helper_gvec_bfdot_idx) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ static bool trans_VSDOT(DisasContext *s, arg_VSDOT *a) return false; } return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_sdot_b); + gen_helper_gvec_sdot_4b); } static bool trans_VUDOT(DisasContext *s, arg_VUDOT *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUDOT(DisasContext *s, arg_VUDOT *a) return false; } return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_udot_b); + gen_helper_gvec_udot_4b); } static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a) return false; } return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, - gen_helper_gvec_usdot_b); + gen_helper_gvec_usdot_4b); } static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VSDOT_scalar(DisasContext *s, arg_VSDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_sdot_idx_b); + gen_helper_gvec_sdot_idx_4b); } static bool trans_VUDOT_scalar(DisasContext *s, arg_VUDOT_scalar *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUDOT_scalar(DisasContext *s, arg_VUDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_udot_idx_b); + gen_helper_gvec_udot_idx_4b); } static bool trans_VUSDOT_scalar(DisasContext *s, arg_VUSDOT_scalar *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT_scalar(DisasContext *s, arg_VUSDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_usdot_idx_b); + gen_helper_gvec_usdot_idx_4b); } static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a) @@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a) return false; } return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, - gen_helper_gvec_sudot_idx_b); + gen_helper_gvec_sudot_idx_4b); } static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_ZZI(UMIN, umin) #undef DO_ZZI static gen_helper_gvec_4 * const dot_fns[2][2] = { - { gen_helper_gvec_sdot_b, gen_helper_gvec_sdot_h }, - { gen_helper_gvec_udot_b, gen_helper_gvec_udot_h } + { gen_helper_gvec_sdot_4b, gen_helper_gvec_sdot_4h }, + { gen_helper_gvec_udot_4b, gen_helper_gvec_udot_4h } }; TRANS_FEAT(DOT_zzzz, aa64_sve, gen_gvec_ool_zzzz, dot_fns[a->u][a->sz], a->rd, a->rn, a->rm, a->ra, 0) @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(DOT_zzzz, aa64_sve, gen_gvec_ool_zzzz, */ TRANS_FEAT(SDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_sdot_idx_b, a) + gen_helper_gvec_sdot_idx_4b, a) TRANS_FEAT(SDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_sdot_idx_h, a) + gen_helper_gvec_sdot_idx_4h, a) TRANS_FEAT(UDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_udot_idx_b, a) + gen_helper_gvec_udot_idx_4b, a) TRANS_FEAT(UDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_udot_idx_h, a) + gen_helper_gvec_udot_idx_4h, a) TRANS_FEAT(SUDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_sudot_idx_b, a) + gen_helper_gvec_sudot_idx_4b, a) TRANS_FEAT(USDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, - gen_helper_gvec_usdot_idx_b, a) + gen_helper_gvec_usdot_idx_4b, a) #define DO_SVE2_RRX(NAME, FUNC) \ TRANS_FEAT(NAME, aa64_sve, gen_gvec_ool_zzz, FUNC, \ @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz, sqrdcmlah_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->rot) TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, - a->esz == 2 ? gen_helper_gvec_usdot_b : NULL, a, 0) + a->esz == 2 ? gen_helper_gvec_usdot_4b : NULL, a, 0) TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz, gen_helper_crypto_aesmc, a->rd, a->rd, 0) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ clear_tail(d, opr_sz, simd_maxsz(desc)); \ } -DO_DOT(gvec_sdot_b, int32_t, int8_t, int8_t) -DO_DOT(gvec_udot_b, uint32_t, uint8_t, uint8_t) -DO_DOT(gvec_usdot_b, uint32_t, uint8_t, int8_t) -DO_DOT(gvec_sdot_h, int64_t, int16_t, int16_t) -DO_DOT(gvec_udot_h, uint64_t, uint16_t, uint16_t) +DO_DOT(gvec_sdot_4b, int32_t, int8_t, int8_t) +DO_DOT(gvec_udot_4b, uint32_t, uint8_t, uint8_t) +DO_DOT(gvec_usdot_4b, uint32_t, uint8_t, int8_t) +DO_DOT(gvec_sdot_4h, int64_t, int16_t, int16_t) +DO_DOT(gvec_udot_4h, uint64_t, uint16_t, uint16_t) #define DO_DOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD) \ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ clear_tail(d, opr_sz, simd_maxsz(desc)); \ } -DO_DOT_IDX(gvec_sdot_idx_b, int32_t, int8_t, int8_t, H4) -DO_DOT_IDX(gvec_udot_idx_b, uint32_t, uint8_t, uint8_t, H4) -DO_DOT_IDX(gvec_sudot_idx_b, int32_t, int8_t, uint8_t, H4) -DO_DOT_IDX(gvec_usdot_idx_b, int32_t, uint8_t, int8_t, H4) -DO_DOT_IDX(gvec_sdot_idx_h, int64_t, int16_t, int16_t, H8) -DO_DOT_IDX(gvec_udot_idx_h, uint64_t, uint16_t, uint16_t, H8) +DO_DOT_IDX(gvec_sdot_idx_4b, int32_t, int8_t, int8_t, H4) +DO_DOT_IDX(gvec_udot_idx_4b, uint32_t, uint8_t, uint8_t, H4) +DO_DOT_IDX(gvec_sudot_idx_4b, int32_t, int8_t, uint8_t, H4) +DO_DOT_IDX(gvec_usdot_idx_4b, int32_t, uint8_t, int8_t, H4) +DO_DOT_IDX(gvec_sdot_idx_4h, int64_t, int16_t, int16_t, H8) +DO_DOT_IDX(gvec_udot_idx_4h, uint64_t, uint16_t, uint16_t, H8) void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm, float_status *fpst, uint32_t desc) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-42-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 8 ++++ target/arm/tcg/sme.decode | 63 ++++++++++++++++++++++++- target/arm/tcg/translate-sme.c | 85 ++++++++++++++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 51 ++++++++++++++++++++ 4 files changed, 206 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_sdot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_udot_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_usdot_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(gvec_sdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(gvec_udot_idx_4b, TCG_CALL_NO_RWG, @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_sudot_idx_4b, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_5(gvec_usdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_sdot_idx_2h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(gvec_udot_idx_2h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(gvec_fcaddh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fcadds, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FDOT_n1 11000001 001 1 .... 0 .. 100 ..... 00 ... @azz_nx1_o3 n=4 BFDOT_n1 11000001 001 0 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=2 BFDOT_n1 11000001 001 1 .... 0 .. 100 ..... 10 ... @azz_nx1_o3 n=4 +USDOT_n1 11000001 001 0 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=2 +USDOT_n1 11000001 001 1 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=4 + +SUDOT_n1 11000001 001 0 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=2 +SUDOT_n1 11000001 001 1 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=4 + +SDOT_n1_4b 11000001 001 0 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=2 +SDOT_n1_4b 11000001 001 1 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=4 +SDOT_n1_4h 11000001 011 0 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=2 +SDOT_n1_4h 11000001 011 1 .... 0 .. 101 ..... 00 ... @azz_nx1_o3 n=4 +SDOT_n1_2h 11000001 011 0 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=2 +SDOT_n1_2h 11000001 011 1 .... 0 .. 101 ..... 01 ... @azz_nx1_o3 n=4 + +UDOT_n1_4b 11000001 001 0 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=2 +UDOT_n1_4b 11000001 001 1 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=4 +UDOT_n1_4h 11000001 011 0 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=2 +UDOT_n1_4h 11000001 011 1 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=4 +UDOT_n1_2h 11000001 011 0 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=2 +UDOT_n1_2h 11000001 011 1 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ FDOT_nn 11000001 101 ...01 0 .. 100 ...00 00 ... @azz_4x4_o3 BFDOT_nn 11000001 101 ....0 0 .. 100 ....0 10 ... @azz_2x2_o3 BFDOT_nn 11000001 101 ...01 0 .. 100 ...00 10 ... @azz_4x4_o3 +USDOT_nn 11000001 101 ....0 0 .. 101 ....0 01 ... @azz_2x2_o3 +USDOT_nn 11000001 101 ...01 0 .. 101 ...00 01 ... @azz_4x4_o3 + +SDOT_nn_4b 11000001 101 ....0 0 .. 101 ....0 00 ... @azz_2x2_o3 +SDOT_nn_4b 11000001 101 ...01 0 .. 101 ...00 00 ... @azz_4x4_o3 +SDOT_nn_4h 11000001 111 ....0 0 .. 101 ....0 00 ... @azz_2x2_o3 +SDOT_nn_4h 11000001 111 ...01 0 .. 101 ...00 00 ... @azz_4x4_o3 +SDOT_nn_2h 11000001 111 ....0 0 .. 101 ....0 01 ... @azz_2x2_o3 +SDOT_nn_2h 11000001 111 ...01 0 .. 101 ...00 01 ... @azz_4x4_o3 + +UDOT_nn_4b 11000001 101 ....0 0 .. 101 ....0 10 ... @azz_2x2_o3 +UDOT_nn_4b 11000001 101 ...01 0 .. 101 ...00 10 ... @azz_4x4_o3 +UDOT_nn_4h 11000001 111 ....0 0 .. 101 ....0 10 ... @azz_2x2_o3 +UDOT_nn_4h 11000001 111 ...01 0 .. 101 ...00 10 ... @azz_4x4_o3 +UDOT_nn_2h 11000001 111 ....0 0 .. 101 ....0 11 ... @azz_2x2_o3 +UDOT_nn_2h 11000001 111 ...01 0 .. 101 ...00 11 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ BFMLSL_nx 11000001 1001 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 @azx_2x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ &azx_n n=2 rv=%mova_rv zn=%zn_ax2 @azx_4x1_i2_o3 ........ .... zm:4 . .. . idx:2 .... ... off:3 \ - &azx_n n=2 rv=%mova_rv zn=%zn_ax4 + &azx_n n=4 rv=%mova_rv zn=%zn_ax4 +@azx_2x1_i1_o3 ........ .... zm:4 . .. .. idx:1 .... ... off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax2 +@azx_4x1_i1_o3 ........ .... zm:4 . .. .. idx:1 .... ... off:3 \ + &azx_n n=4 rv=%mova_rv zn=%zn_ax4 FDOT_nx 11000001 0101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_i2_o3 FDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_i2_o3 @@ -XXX,XX +XXX,XX @@ BFDOT_nx 11000001 0101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_i2_o3 FVDOT 11000001 0101 .... 0 .. 0 .. ....0 01 ... @azx_2x1_i2_o3 BFVDOT 11000001 0101 .... 0 .. 0 .. ....0 11 ... @azx_2x1_i2_o3 + +SDOT_nx_2h 11000001 0101 .... 0 .. 1 .. ....0 00 ... @azx_2x1_i2_o3 +SDOT_nx_2h 11000001 0101 .... 1 .. 1 .. ...00 00 ... @azx_4x1_i2_o3 +SDOT_nx_4b 11000001 0101 .... 0 .. 1 .. ....1 00 ... @azx_2x1_i2_o3 +SDOT_nx_4b 11000001 0101 .... 1 .. 1 .. ...01 00 ... @azx_4x1_i2_o3 +SDOT_nx_4h 11000001 1101 .... 0 .. 00 . ....0 01 ... @azx_2x1_i1_o3 +SDOT_nx_4h 11000001 1101 .... 1 .. 00 . ...00 01 ... @azx_4x1_i1_o3 + +UDOT_nx_2h 11000001 0101 .... 0 .. 1 .. ....0 10 ... @azx_2x1_i2_o3 +UDOT_nx_2h 11000001 0101 .... 1 .. 1 .. ...00 10 ... @azx_4x1_i2_o3 +UDOT_nx_4b 11000001 0101 .... 0 .. 1 .. ....1 10 ... @azx_2x1_i2_o3 +UDOT_nx_4b 11000001 0101 .... 1 .. 1 .. ...01 10 ... @azx_4x1_i2_o3 +UDOT_nx_4h 11000001 1101 .... 0 .. 00 . ....0 11 ... @azx_2x1_i1_o3 +UDOT_nx_4h 11000001 1101 .... 1 .. 00 . ...00 11 ... @azx_4x1_i1_o3 + +USDOT_nx 11000001 0101 .... 0 .. 1 .. ....1 01 ... @azx_2x1_i2_o3 +USDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 01 ... @azx_4x1_i2_o3 + +SUDOT_nx 11000001 0101 .... 0 .. 1 .. ....1 11 ... @azx_2x1_i2_o3 +SUDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 11 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn) TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h) TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx) + +/* + * Expand array multi-vector single (n1), array multi-vector (nn), + * and array multi-vector indexed (nx), for integer accumulate. + * multi: true for nn, false for n1. + * data: stuff for simd_data, including any index. + */ +static bool do_azz_acc(DisasContext *s, int nreg, int nsel, + int rv, int off, int zn, int zm, + int data, int shsel, bool multi, + gen_helper_gvec_4 *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / nreg; + TCGv_ptr t_za = get_zarray(s, rv, off, nreg, nsel); + TCGv_ptr t = tcg_temp_new_ptr(); + + for (int r = 0; r < nreg; ++r) { + TCGv_ptr t_zn = vec_full_reg_ptr(s, zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm); + + for (int i = 0; i < nsel; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, data | (i << shsel)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, t, tcg_constant_i32(desc)); + } + + /* + * For multiple-and-single vectors, Zn may wrap. + * For multiple vectors, both Zn and Zm are aligned. + */ + zn = (zn + 1) % 32; + zm += multi; + } + } + return true; +} + +static bool do_dot(DisasContext *s, arg_azz_n *a, bool multi, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fn); +} + +static void gen_helper_gvec_sudot_4b(TCGv_ptr d, TCGv_ptr n, TCGv_ptr m, + TCGv_ptr a, TCGv_i32 desc) +{ + gen_helper_gvec_usdot_4b(d, m, n, a, desc); +} + +TRANS_FEAT(USDOT_n1, aa64_sme2, do_dot, a, false, gen_helper_gvec_usdot_4b) +TRANS_FEAT(SUDOT_n1, aa64_sme2, do_dot, a, false, gen_helper_gvec_sudot_4b) +TRANS_FEAT(SDOT_n1_2h, aa64_sme2, do_dot, a, false, gen_helper_gvec_sdot_2h) +TRANS_FEAT(UDOT_n1_2h, aa64_sme2, do_dot, a, false, gen_helper_gvec_udot_2h) +TRANS_FEAT(SDOT_n1_4b, aa64_sme2, do_dot, a, false, gen_helper_gvec_sdot_4b) +TRANS_FEAT(UDOT_n1_4b, aa64_sme2, do_dot, a, false, gen_helper_gvec_udot_4b) +TRANS_FEAT(SDOT_n1_4h, aa64_sme2_i16i64, do_dot, a, false, gen_helper_gvec_sdot_4h) +TRANS_FEAT(UDOT_n1_4h, aa64_sme2_i16i64, do_dot, a, false, gen_helper_gvec_udot_4h) + +TRANS_FEAT(USDOT_nn, aa64_sme2, do_dot, a, true, gen_helper_gvec_usdot_4b) +TRANS_FEAT(SDOT_nn_2h, aa64_sme2, do_dot, a, true, gen_helper_gvec_sdot_2h) +TRANS_FEAT(UDOT_nn_2h, aa64_sme2, do_dot, a, true, gen_helper_gvec_udot_2h) +TRANS_FEAT(SDOT_nn_4b, aa64_sme2, do_dot, a, true, gen_helper_gvec_sdot_4b) +TRANS_FEAT(UDOT_nn_4b, aa64_sme2, do_dot, a, true, gen_helper_gvec_udot_4b) +TRANS_FEAT(SDOT_nn_4h, aa64_sme2_i16i64, do_dot, a, true, gen_helper_gvec_sdot_4h) +TRANS_FEAT(UDOT_nn_4h, aa64_sme2_i16i64, do_dot, a, true, gen_helper_gvec_udot_4h) + +static bool do_dot_nx(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + a->idx, 0, false, fn); +} + +TRANS_FEAT(USDOT_nx, aa64_sme2, do_dot_nx, a, gen_helper_gvec_usdot_idx_4b) +TRANS_FEAT(SUDOT_nx, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sudot_idx_4b) +TRANS_FEAT(SDOT_nx_2h, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sdot_idx_2h) +TRANS_FEAT(UDOT_nx_2h, aa64_sme2, do_dot_nx, a, gen_helper_gvec_udot_idx_2h) +TRANS_FEAT(SDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sdot_idx_4b) +TRANS_FEAT(UDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_udot_idx_4b) +TRANS_FEAT(SDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_sdot_idx_4h) +TRANS_FEAT(UDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_udot_idx_4h) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ DO_DOT_IDX(gvec_usdot_idx_4b, int32_t, uint8_t, int8_t, H4) DO_DOT_IDX(gvec_sdot_idx_4h, int64_t, int16_t, int16_t, H8) DO_DOT_IDX(gvec_udot_idx_4h, uint64_t, uint16_t, uint16_t, H8) +#undef DO_DOT +#undef DO_DOT_IDX + +/* Similar for 2-way dot product */ +#define DO_DOT(NAME, TYPED, TYPEN, TYPEM) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t i, opr_sz = simd_oprsz(desc); \ + TYPED *d = vd, *a = va; \ + TYPEN *n = vn; \ + TYPEM *m = vm; \ + for (i = 0; i < opr_sz / sizeof(TYPED); ++i) { \ + d[i] = (a[i] + \ + (TYPED)n[i * 2 + 0] * m[i * 2 + 0] + \ + (TYPED)n[i * 2 + 1] * m[i * 2 + 1]); \ + } \ + clear_tail(d, opr_sz, simd_maxsz(desc)); \ +} + +#define DO_DOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t i = 0, opr_sz = simd_oprsz(desc); \ + intptr_t opr_sz_n = opr_sz / sizeof(TYPED); \ + intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n); \ + intptr_t index = simd_data(desc); \ + TYPED *d = vd, *a = va; \ + TYPEN *n = vn; \ + TYPEM *m_indexed = (TYPEM *)vm + HD(index) * 2; \ + do { \ + TYPED m0 = m_indexed[i * 2 + 0]; \ + TYPED m1 = m_indexed[i * 2 + 1]; \ + do { \ + d[i] = (a[i] + \ + n[i * 2 + 0] * m0 + \ + n[i * 2 + 1] * m1); \ + } while (++i < segend); \ + segend = i + (16 / sizeof(TYPED)); \ + } while (i < opr_sz_n); \ + clear_tail(d, opr_sz, simd_maxsz(desc)); \ +} + +DO_DOT(gvec_sdot_2h, int32_t, int16_t, int16_t) +DO_DOT(gvec_udot_2h, uint32_t, uint16_t, uint16_t) + +DO_DOT_IDX(gvec_sdot_idx_2h, int32_t, int16_t, int16_t, H4) +DO_DOT_IDX(gvec_udot_idx_2h, uint32_t, uint16_t, uint16_t, H4) + +#undef DO_DOT +#undef DO_DOT_IDX + void HELPER(gvec_fcaddh)(void *vd, void *vn, void *vm, float_status *fpst, uint32_t desc) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Emphasize the 4-way nature of these dot products. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-43-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 12 ++++++------ target/arm/tcg/translate-sve.c | 12 ++++++------ 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ CDOT_zzzz 01000100 esz:2 0 rm:5 0001 rot:2 rn:5 rd:5 ra=%reg_movprfx #### SVE Multiply - Indexed # SVE integer dot product (indexed) -SDOT_zzxw_s 01000100 10 1 ..... 000000 ..... ..... @rrxr_2 esz=2 -SDOT_zzxw_d 01000100 11 1 ..... 000000 ..... ..... @rrxr_1 esz=3 -UDOT_zzxw_s 01000100 10 1 ..... 000001 ..... ..... @rrxr_2 esz=2 -UDOT_zzxw_d 01000100 11 1 ..... 000001 ..... ..... @rrxr_1 esz=3 +SDOT_zzxw_4s 01000100 10 1 ..... 000000 ..... ..... @rrxr_2 esz=2 +SDOT_zzxw_4d 01000100 11 1 ..... 000000 ..... ..... @rrxr_1 esz=3 +UDOT_zzxw_4s 01000100 10 1 ..... 000001 ..... ..... @rrxr_2 esz=2 +UDOT_zzxw_4d 01000100 11 1 ..... 000001 ..... ..... @rrxr_1 esz=3 # SVE2 integer multiply-add (indexed) MLA_zzxz_h 01000100 0. 1 ..... 000010 ..... ..... @rrxr_3 esz=1 @@ -XXX,XX +XXX,XX @@ SQRDMLSH_zzxz_s 01000100 10 1 ..... 000101 ..... ..... @rrxr_2 esz=2 SQRDMLSH_zzxz_d 01000100 11 1 ..... 000101 ..... ..... @rrxr_1 esz=3 # SVE mixed sign dot product (indexed) -USDOT_zzxw_s 01000100 10 1 ..... 000110 ..... ..... @rrxr_2 esz=2 -SUDOT_zzxw_s 01000100 10 1 ..... 000111 ..... ..... @rrxr_2 esz=2 +USDOT_zzxw_4s 01000100 10 1 ..... 000110 ..... ..... @rrxr_2 esz=2 +SUDOT_zzxw_4s 01000100 10 1 ..... 000111 ..... ..... @rrxr_2 esz=2 # SVE2 saturating multiply-add (indexed) SQDMLALB_zzxw_s 01000100 10 1 ..... 0010.0 ..... ..... @rrxr_3a esz=2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(DOT_zzzz, aa64_sve, gen_gvec_ool_zzzz, * SVE Multiply - Indexed */ -TRANS_FEAT(SDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(SDOT_zzxw_4s, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_sdot_idx_4b, a) -TRANS_FEAT(SDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(SDOT_zzxw_4d, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_sdot_idx_4h, a) -TRANS_FEAT(UDOT_zzxw_s, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(UDOT_zzxw_4s, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_udot_idx_4b, a) -TRANS_FEAT(UDOT_zzxw_d, aa64_sve, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(UDOT_zzxw_4d, aa64_sve, gen_gvec_ool_arg_zzxz, gen_helper_gvec_udot_idx_4h, a) -TRANS_FEAT(SUDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(SUDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, gen_helper_gvec_sudot_idx_4b, a) -TRANS_FEAT(USDOT_zzxw_s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, +TRANS_FEAT(USDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, gen_helper_gvec_usdot_idx_4b, a) #define DO_SVE2_RRX(NAME, FUNC) \ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Rename to USDOT_zzzz_4s and force size=2 during decode. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-44-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 2 +- target/arm/tcg/translate-sve.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx ## SVE mixed sign dot product -USDOT_zzzz 01000100 .. 0 ..... 011 110 ..... ..... @rda_rn_rm +USDOT_zzzz_4s 01000100 10 0 ..... 011 110 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating point matrix multiply accumulate BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_ex esz=1 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4 * const sqrdcmlah_fns[] = { TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz, sqrdcmlah_fns[a->esz], a->rd, a->rn, a->rm, a->ra, a->rot) -TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, - a->esz == 2 ? gen_helper_gvec_usdot_4b : NULL, a, 0) +TRANS_FEAT(USDOT_zzzz_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, + gen_helper_gvec_usdot_4b, a, 0) TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz, gen_helper_crypto_aesmc, a->rd, a->rd, 0) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-45-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 8 +++++++- target/arm/tcg/translate-sve.c | 10 ++++++++++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SDOT_zzxw_4d 01000100 11 1 ..... 000000 ..... ..... @rrxr_1 esz=3 UDOT_zzxw_4s 01000100 10 1 ..... 000001 ..... ..... @rrxr_2 esz=2 UDOT_zzxw_4d 01000100 11 1 ..... 000001 ..... ..... @rrxr_1 esz=3 +SDOT_zzxw_2s 01000100 10 0 ..... 110010 ..... ..... @rrxr_2 esz=2 +UDOT_zzxw_2s 01000100 10 0 ..... 110011 ..... ..... @rrxr_2 esz=2 + # SVE2 integer multiply-add (indexed) MLA_zzxz_h 01000100 0. 1 ..... 000010 ..... ..... @rrxr_3 esz=1 MLA_zzxz_s 01000100 10 1 ..... 000010 ..... ..... @rrxr_2 esz=2 @@ -XXX,XX +XXX,XX @@ UMLSLT_zzzw 01000100 .. 0 ..... 010 111 ..... ..... @rda_rn_rm CMLA_zzzz 01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5 ra=%reg_movprfx SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx -## SVE mixed sign dot product +## SVE dot product + +SDOT_zzzz_2s 01000100 00 0 ..... 110 010 ..... ..... @rda_rn_rm_ex esz=2 +UDOT_zzzz_2s 01000100 00 0 ..... 110 011 ..... ..... @rda_rn_rm_ex esz=2 USDOT_zzzz_4s 01000100 10 0 ..... 011 110 ..... ..... @rda_rn_rm_ex esz=2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, TRANS_FEAT(USDOT_zzxw_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzxz, gen_helper_gvec_usdot_idx_4b, a) +TRANS_FEAT(SDOT_zzxw_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzxz, + gen_helper_gvec_sdot_idx_2h, a) +TRANS_FEAT(UDOT_zzxw_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzxz, + gen_helper_gvec_udot_idx_2h, a) + #define DO_SVE2_RRX(NAME, FUNC) \ TRANS_FEAT(NAME, aa64_sve, gen_gvec_ool_zzz, FUNC, \ a->rd, a->rn, a->rm, a->index) @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz, TRANS_FEAT(USDOT_zzzz_4s, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz, gen_helper_gvec_usdot_4b, a, 0) +TRANS_FEAT(SDOT_zzzz_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzzz, + gen_helper_gvec_sdot_2h, a, 0) +TRANS_FEAT(UDOT_zzzz_2s, aa64_sme2_or_sve2p1, gen_gvec_ool_arg_zzzz, + gen_helper_gvec_udot_2h, a, 0) + TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz, gen_helper_crypto_aesmc, a->rd, a->rd, 0) TRANS_FEAT_NONSTREAMING(AESIMC, aa64_sve2_aes, gen_gvec_ool_zz, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-46-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 11 +++++++++ target/arm/tcg/sme.decode | 11 +++++++++ target/arm/tcg/sme_helper.c | 42 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 23 +++++++++++++++++++ 4 files changed, 87 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sme2_fdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(sme2_fvdot_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_svdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uvdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_suvdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_usvdot_idx_4b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_svdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uvdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_svdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uvdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 01 ... @azx_4x1_i2_o3 SUDOT_nx 11000001 0101 .... 0 .. 1 .. ....1 11 ... @azx_2x1_i2_o3 SUDOT_nx 11000001 0101 .... 1 .. 1 .. ...01 11 ... @azx_4x1_i2_o3 + +SVDOT_nx_2h 11000001 0101 .... 0 .. 0 .. ....1 00 ... @azx_2x1_i2_o3 +SVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 00 ... @azx_4x1_i2_o3 +SVDOT_nx_4h 11000001 1101 .... 1 .. 01 . ...00 01 ... @azx_4x1_i1_o3 + +UVDOT_nx_2h 11000001 0101 .... 0 .. 0 .. ....1 10 ... @azx_2x1_i2_o3 +UVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 10 ... @azx_4x1_i2_o3 +UVDOT_nx_4h 11000001 1101 .... 1 .. 01 . ...00 11 ... @azx_4x1_i1_o3 + +SUVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 11 ... @azx_4x1_i2_o3 +USVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 01 ... @azx_4x1_i2_o3 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DEF_IMOP_16x2_32(umopa2_s, uint16_t, uint16_t) DEF_IMOPH(sme2, smopa2, s) DEF_IMOPH(sme2, umopa2, s) + +#define DO_VDOT_IDX(NAME, TYPED, TYPEN, TYPEM, HD, HN) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + intptr_t svl = simd_oprsz(desc); \ + intptr_t elements = svl / sizeof(TYPED); \ + intptr_t eltperseg = 16 / sizeof(TYPED); \ + intptr_t nreg = sizeof(TYPED) / sizeof(TYPEN); \ + intptr_t vstride = (svl / nreg) * sizeof(ARMVectorReg); \ + intptr_t zstride = sizeof(ARMVectorReg) / sizeof(TYPEN); \ + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT, 2); \ + TYPEN *n = vn; \ + TYPEM *m = vm; \ + for (intptr_t r = 0; r < nreg; r++) { \ + TYPED *d = vd + r * vstride; \ + for (intptr_t seg = 0; seg < elements; seg += eltperseg) { \ + intptr_t s = seg + idx; \ + for (intptr_t e = seg; e < seg + eltperseg; e++) { \ + TYPED sum = d[HD(e)]; \ + for (intptr_t i = 0; i < nreg; i++) { \ + TYPED nn = n[i * zstride + HN(nreg * e + r)]; \ + TYPED mm = m[HN(nreg * s + i)]; \ + sum += nn * mm; \ + } \ + d[HD(e)] = sum; \ + } \ + } \ + } \ +} + +DO_VDOT_IDX(sme2_svdot_idx_4b, int32_t, int8_t, int8_t, H4, H1) +DO_VDOT_IDX(sme2_uvdot_idx_4b, uint32_t, uint8_t, uint8_t, H4, H1) +DO_VDOT_IDX(sme2_suvdot_idx_4b, int32_t, int8_t, uint8_t, H4, H1) +DO_VDOT_IDX(sme2_usvdot_idx_4b, int32_t, uint8_t, int8_t, H4, H1) + +DO_VDOT_IDX(sme2_svdot_idx_4h, int64_t, int16_t, int16_t, H8, H2) +DO_VDOT_IDX(sme2_uvdot_idx_4h, uint64_t, uint16_t, uint16_t, H8, H2) + +DO_VDOT_IDX(sme2_svdot_idx_2h, int32_t, int16_t, int16_t, H4, H2) +DO_VDOT_IDX(sme2_uvdot_idx_2h, uint32_t, uint16_t, uint16_t, H4, H2) + +#undef DO_VDOT_IDX diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_sdot_idx_4b) TRANS_FEAT(UDOT_nx_4b, aa64_sme2, do_dot_nx, a, gen_helper_gvec_udot_idx_4b) TRANS_FEAT(SDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_sdot_idx_4h) TRANS_FEAT(UDOT_nx_4h, aa64_sme2_i16i64, do_dot_nx, a, gen_helper_gvec_udot_idx_4h) + +static bool do_vdot_nx(DisasContext *s, arg_azx_n *a, gen_helper_gvec_3 *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + fn(get_zarray(s, a->rv, a->off, a->n, 0), + vec_full_reg_ptr(s, a->zn), + vec_full_reg_ptr(s, a->zm), + tcg_constant_i32(simd_desc(svl, svl, a->idx))); + } + return true; +} + +TRANS_FEAT(SVDOT_nx_2h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_svdot_idx_2h) +TRANS_FEAT(SVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_svdot_idx_4b) +TRANS_FEAT(SVDOT_nx_4h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_svdot_idx_4h) + +TRANS_FEAT(UVDOT_nx_2h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_2h) +TRANS_FEAT(UVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_4b) +TRANS_FEAT(UVDOT_nx_4h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_4h) + +TRANS_FEAT(SUVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_suvdot_idx_4b) +TRANS_FEAT(USVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_usvdot_idx_4b) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-47-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 21 +++++ target/arm/tcg/sme.decode | 168 +++++++++++++++++++++++++++++++++ target/arm/tcg/sme_helper.c | 59 ++++++++++++ target/arm/tcg/translate-sme.c | 84 +++++++++++++++++ 4 files changed, 332 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_uvdot_idx_4h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme2_svdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sme2_uvdot_idx_2h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sme2_smlall_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlall_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_usmlall_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sme2_smlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlall_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_smlsll_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlall_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_umlsll_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_usmlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_5(sme2_sumlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UDOT_n1_4h 11000001 011 1 .... 0 .. 101 ..... 10 ... @azz_nx1_o3 n=4 UDOT_n1_2h 11000001 011 0 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=2 UDOT_n1_2h 11000001 011 1 .... 0 .. 101 ..... 11 ... @azz_nx1_o3 n=4 +SMLAL_n1 11000001 011 0 .... 0 .. 011 ..... 00 ... @azz_nx1_o3x2 n=1 +SMLAL_n1 11000001 011 0 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=2 +SMLAL_n1 11000001 011 1 .... 0 .. 010 ..... 000 .. @azz_nx1_o2x2 n=4 + +SMLSL_n1 11000001 011 0 .... 0 .. 011 ..... 01 ... @azz_nx1_o3x2 n=1 +SMLSL_n1 11000001 011 0 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=2 +SMLSL_n1 11000001 011 1 .... 0 .. 010 ..... 010 .. @azz_nx1_o2x2 n=4 + +UMLAL_n1 11000001 011 0 .... 0 .. 011 ..... 10 ... @azz_nx1_o3x2 n=1 +UMLAL_n1 11000001 011 0 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=2 +UMLAL_n1 11000001 011 1 .... 0 .. 010 ..... 100 .. @azz_nx1_o2x2 n=4 + +UMLSL_n1 11000001 011 0 .... 0 .. 011 ..... 11 ... @azz_nx1_o3x2 n=1 +UMLSL_n1 11000001 011 0 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=2 +UMLSL_n1 11000001 011 1 .... 0 .. 010 ..... 110 .. @azz_nx1_o2x2 n=4 + +%off2_x4 0:2 !function=times_4 +%off1_x4 0:1 !function=times_4 + +@azz_nx1_o2x4 ........ ... . zm:4 . .. ... zn:5 ... .. \ + &azz_n off=%off2_x4 rv=%mova_rv +@azz_nx1_o1x4 ........ ... . zm:4 . .. ... zn:5 .... . \ + &azz_n off=%off1_x4 rv=%mova_rv + +SMLALL_n1_s 11000001 001 0 .... 0 .. 001 ..... 000 .. @azz_nx1_o2x4 n=1 +SMLALL_n1_d 11000001 011 0 .... 0 .. 001 ..... 000 .. @azz_nx1_o2x4 n=1 +SMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=2 +SMLALL_n1_d 11000001 011 0 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=2 +SMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=4 +SMLALL_n1_d 11000001 011 1 .... 0 .. 000 ..... 0000 . @azz_nx1_o1x4 n=4 + +SMLSLL_n1_s 11000001 001 0 .... 0 .. 001 ..... 010 .. @azz_nx1_o2x4 n=1 +SMLSLL_n1_d 11000001 011 0 .... 0 .. 001 ..... 010 .. @azz_nx1_o2x4 n=1 +SMLSLL_n1_s 11000001 001 0 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=2 +SMLSLL_n1_d 11000001 011 0 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=2 +SMLSLL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=4 +SMLSLL_n1_d 11000001 011 1 .... 0 .. 000 ..... 0100 . @azz_nx1_o1x4 n=4 + +UMLALL_n1_s 11000001 001 0 .... 0 .. 001 ..... 100 .. @azz_nx1_o2x4 n=1 +UMLALL_n1_d 11000001 011 0 .... 0 .. 001 ..... 100 .. @azz_nx1_o2x4 n=1 +UMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=2 +UMLALL_n1_d 11000001 011 0 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=2 +UMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=4 +UMLALL_n1_d 11000001 011 1 .... 0 .. 000 ..... 1000 . @azz_nx1_o1x4 n=4 + +UMLSLL_n1_s 11000001 001 0 .... 0 .. 001 ..... 110 .. @azz_nx1_o2x4 n=1 +UMLSLL_n1_d 11000001 011 0 .... 0 .. 001 ..... 110 .. @azz_nx1_o2x4 n=1 +UMLSLL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=2 +UMLSLL_n1_d 11000001 011 0 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=2 +UMLSLL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=4 +UMLSLL_n1_d 11000001 011 1 .... 0 .. 000 ..... 1100 . @azz_nx1_o1x4 n=4 + +USMLALL_n1_s 11000001 001 0 .... 0 .. 001 ..... 001 .. @azz_nx1_o2x4 n=1 +USMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=2 +USMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=4 + +SUMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=2 +SUMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ UDOT_nn_4h 11000001 111 ...01 0 .. 101 ...00 10 ... @azz_4x4_o3 UDOT_nn_2h 11000001 111 ....0 0 .. 101 ....0 11 ... @azz_2x2_o3 UDOT_nn_2h 11000001 111 ...01 0 .. 101 ...00 11 ... @azz_4x4_o3 +SMLAL_nn 11000001 111 ....0 0 .. 010 ....0 000 .. @azz_2x2_o2x2 +SMLAL_nn 11000001 111 ...01 0 .. 010 ...00 000 .. @azz_4x4_o2x2 + +SMLSL_nn 11000001 111 ....0 0 .. 010 ....0 010 .. @azz_2x2_o2x2 +SMLSL_nn 11000001 111 ...01 0 .. 010 ...00 010 .. @azz_4x4_o2x2 + +UMLAL_nn 11000001 111 ....0 0 .. 010 ....0 100 .. @azz_2x2_o2x2 +UMLAL_nn 11000001 111 ...01 0 .. 010 ...00 100 .. @azz_4x4_o2x2 + +UMLSL_nn 11000001 111 ....0 0 .. 010 ....0 110 .. @azz_2x2_o2x2 +UMLSL_nn 11000001 111 ...01 0 .. 010 ...00 110 .. @azz_4x4_o2x2 + +@azz_2x2_o1x4 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=2 rv=%mova_rv zn=%zn_ax2 zm=%zm_ax2 off=%off1_x4 +@azz_4x4_o1x4 ........ ... ..... . .. ... ..... ... .. \ + &azz_n n=4 rv=%mova_rv zn=%zn_ax4 zm=%zm_ax4 off=%off1_x4 + +SMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0000 . @azz_2x2_o1x4 +SMLALL_nn_d 11000001 111 ....0 0 .. 000 ....0 0000 . @azz_2x2_o1x4 +SMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0000 . @azz_4x4_o1x4 +SMLALL_nn_d 11000001 111 ...01 0 .. 000 ...00 0000 . @azz_4x4_o1x4 + +SMLSLL_nn_s 11000001 101 ....0 0 .. 000 ....0 0100 . @azz_2x2_o1x4 +SMLSLL_nn_d 11000001 111 ....0 0 .. 000 ....0 0100 . @azz_2x2_o1x4 +SMLSLL_nn_s 11000001 101 ...01 0 .. 000 ...00 0100 . @azz_4x4_o1x4 +SMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 0100 . @azz_4x4_o1x4 + +UMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 1000 . @azz_2x2_o1x4 +UMLALL_nn_d 11000001 111 ....0 0 .. 000 ....0 1000 . @azz_2x2_o1x4 +UMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 1000 . @azz_4x4_o1x4 +UMLALL_nn_d 11000001 111 ...01 0 .. 000 ...00 1000 . @azz_4x4_o1x4 + +UMLSLL_nn_s 11000001 101 ....0 0 .. 000 ....0 1100 . @azz_2x2_o1x4 +UMLSLL_nn_d 11000001 111 ....0 0 .. 000 ....0 1100 . @azz_2x2_o1x4 +UMLSLL_nn_s 11000001 101 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 +UMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 + +USMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0010 . @azz_2x2_o1x4 +USMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0010 . @azz_4x4_o1x4 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ UVDOT_nx_4h 11000001 1101 .... 1 .. 01 . ...00 11 ... @azx_4x1_i1_o3 SUVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 11 ... @azx_4x1_i2_o3 USVDOT_nx_4b 11000001 0101 .... 1 .. 0 .. ...01 01 ... @azx_4x1_i2_o3 + +SMLAL_nx 11000001 1100 .... . .. 1 .. ..... 00 ... @azx_1x1_o3x2 +SMLAL_nx 11000001 1101 .... 0 .. 1 .. ....0 00 ... @azx_2x1_o2x2 +SMLAL_nx 11000001 1101 .... 1 .. 1 .. ...00 00 ... @azx_4x1_o2x2 + +SMLSL_nx 11000001 1100 .... . .. 1 .. ..... 01 ... @azx_1x1_o3x2 +SMLSL_nx 11000001 1101 .... 0 .. 1 .. ....0 01 ... @azx_2x1_o2x2 +SMLSL_nx 11000001 1101 .... 1 .. 1 .. ...00 01 ... @azx_4x1_o2x2 + +UMLAL_nx 11000001 1100 .... . .. 1 .. ..... 10 ... @azx_1x1_o3x2 +UMLAL_nx 11000001 1101 .... 0 .. 1 .. ....0 10 ... @azx_2x1_o2x2 +UMLAL_nx 11000001 1101 .... 1 .. 1 .. ...00 10 ... @azx_4x1_o2x2 + +UMLSL_nx 11000001 1100 .... . .. 1 .. ..... 11 ... @azx_1x1_o3x2 +UMLSL_nx 11000001 1101 .... 0 .. 1 .. ....0 11 ... @azx_2x1_o2x2 +UMLSL_nx 11000001 1101 .... 1 .. 1 .. ...00 11 ... @azx_4x1_o2x2 + +%idx4_15_10 15:1 10:3 +%idx4_10_1 10:2 1:2 +%idx3_10_1 10:1 1:2 + +@azx_1x1_i4_o2 ........ .... zm:4 . .. ... zn:5 ... .. \ + &azx_n n=1 rv=%mova_rv off=%off2_x4 idx=%idx4_15_10 +@azx_1x1_i3_o2 ........ .... zm:4 . .. ... zn:5 ... .. \ + &azx_n n=1 rv=%mova_rv off=%off2_x4 idx=%idx3_15_10 +@azx_2x1_i4_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=2 rv=%mova_rv off=%off1_x4 zn=%zn_ax2 idx=%idx4_10_1 +@azx_2x1_i3_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=2 rv=%mova_rv off=%off1_x4 zn=%zn_ax2 idx=%idx3_10_1 +@azx_4x1_i4_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=4 rv=%mova_rv off=%off1_x4 zn=%zn_ax4 idx=%idx4_10_1 +@azx_4x1_i3_o1 ........ .... zm:4 . .. ... ..... ... .. \ + &azx_n n=4 rv=%mova_rv off=%off1_x4 zn=%zn_ax4 idx=%idx3_10_1 + +SMLALL_nx_s 11000001 0000 .... . .. ... ..... 000 .. @azx_1x1_i4_o2 +SMLALL_nx_d 11000001 1000 .... . .. 0.. ..... 000 .. @azx_1x1_i3_o2 +SMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....0 00 ... @azx_2x1_i4_o1 +SMLALL_nx_d 11000001 1001 .... 0 .. 00. ....0 00 ... @azx_2x1_i3_o1 +SMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...00 00 ... @azx_4x1_i4_o1 +SMLALL_nx_d 11000001 1001 .... 1 .. 00. ...00 00 ... @azx_4x1_i3_o1 + +SMLSLL_nx_s 11000001 0000 .... . .. ... ..... 010 .. @azx_1x1_i4_o2 +SMLSLL_nx_d 11000001 1000 .... . .. 0.. ..... 010 .. @azx_1x1_i3_o2 +SMLSLL_nx_s 11000001 0001 .... 0 .. 0.. ....0 01 ... @azx_2x1_i4_o1 +SMLSLL_nx_d 11000001 1001 .... 0 .. 00. ....0 01 ... @azx_2x1_i3_o1 +SMLSLL_nx_s 11000001 0001 .... 1 .. 0.. ...00 01 ... @azx_4x1_i4_o1 +SMLSLL_nx_d 11000001 1001 .... 1 .. 00. ...00 01 ... @azx_4x1_i3_o1 + +UMLALL_nx_s 11000001 0000 .... . .. ... ..... 100 .. @azx_1x1_i4_o2 +UMLALL_nx_d 11000001 1000 .... . .. 0.. ..... 100 .. @azx_1x1_i3_o2 +UMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....0 10 ... @azx_2x1_i4_o1 +UMLALL_nx_d 11000001 1001 .... 0 .. 00. ....0 10 ... @azx_2x1_i3_o1 +UMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...00 10 ... @azx_4x1_i4_o1 +UMLALL_nx_d 11000001 1001 .... 1 .. 00. ...00 10 ... @azx_4x1_i3_o1 + +UMLSLL_nx_s 11000001 0000 .... . .. ... ..... 110 .. @azx_1x1_i4_o2 +UMLSLL_nx_d 11000001 1000 .... . .. 0.. ..... 110 .. @azx_1x1_i3_o2 +UMLSLL_nx_s 11000001 0001 .... 0 .. 0.. ....0 11 ... @azx_2x1_i4_o1 +UMLSLL_nx_d 11000001 1001 .... 0 .. 00. ....0 11 ... @azx_2x1_i3_o1 +UMLSLL_nx_s 11000001 0001 .... 1 .. 0.. ...00 11 ... @azx_4x1_i4_o1 +UMLSLL_nx_d 11000001 1001 .... 1 .. 00. ...00 11 ... @azx_4x1_i3_o1 + +USMLALL_nx_s 11000001 0000 .... . .. ... ..... 001 .. @azx_1x1_i4_o2 +USMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....1 00 ... @azx_2x1_i4_o1 +USMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 00 ... @azx_4x1_i4_o1 + +SUMLALL_nx_s 11000001 0000 .... . .. ... ..... 101 .. @azx_1x1_i4_o2 +SUMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....1 10 ... @azx_2x1_i4_o1 +SUMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 10 ... @azx_4x1_i4_o1 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DO_VDOT_IDX(sme2_svdot_idx_2h, int32_t, int16_t, int16_t, H4, H2) DO_VDOT_IDX(sme2_uvdot_idx_2h, uint32_t, uint16_t, uint16_t, H4, H2) #undef DO_VDOT_IDX + +#define DO_MLALL(NAME, TYPEW, TYPEN, TYPEM, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t elements = simd_oprsz(desc) / sizeof(TYPEW); \ + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 2); \ + TYPEW *d = vd, *a = va; TYPEN *n = vn; TYPEM *m = vm; \ + for (intptr_t i = 0; i < elements; ++i) { \ + TYPEW nn = n[HN(i * 4 + sel)]; \ + TYPEM mm = m[HN(i * 4 + sel)]; \ + d[HW(i)] = a[HW(i)] OP (nn * mm); \ + } \ +} + +DO_MLALL(sme2_smlall_s, int32_t, int8_t, int8_t, H4, H1, +) +DO_MLALL(sme2_smlall_d, int64_t, int16_t, int16_t, H8, H2, +) +DO_MLALL(sme2_smlsll_s, int32_t, int8_t, int8_t, H4, H1, -) +DO_MLALL(sme2_smlsll_d, int64_t, int16_t, int16_t, H8, H2, -) + +DO_MLALL(sme2_umlall_s, uint32_t, uint8_t, uint8_t, H4, H1, +) +DO_MLALL(sme2_umlall_d, uint64_t, uint16_t, uint16_t, H8, H2, +) +DO_MLALL(sme2_umlsll_s, uint32_t, uint8_t, uint8_t, H4, H1, -) +DO_MLALL(sme2_umlsll_d, uint64_t, uint16_t, uint16_t, H8, H2, -) + +DO_MLALL(sme2_usmlall_s, uint32_t, uint8_t, int8_t, H4, H1, +) + +#undef DO_MLALL + +#define DO_MLALL_IDX(NAME, TYPEW, TYPEN, TYPEM, HW, HN, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \ +{ \ + intptr_t elements = simd_oprsz(desc) / sizeof(TYPEW); \ + intptr_t eltspersegment = 16 / sizeof(TYPEW); \ + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 2); \ + intptr_t idx = extract32(desc, SIMD_DATA_SHIFT + 2, 4); \ + TYPEW *d = vd, *a = va; TYPEN *n = vn; TYPEM *m = vm; \ + for (intptr_t i = 0; i < elements; i += eltspersegment) { \ + TYPEW mm = m[HN(i * 4 + idx)]; \ + for (intptr_t j = 0; j < eltspersegment; ++j) { \ + TYPEN nn = n[HN((i + j) * 4 + sel)]; \ + d[HW(i + j)] = a[HW(i + j)] OP (nn * mm); \ + } \ + } \ +} + +DO_MLALL_IDX(sme2_smlall_idx_s, int32_t, int8_t, int8_t, H4, H1, +) +DO_MLALL_IDX(sme2_smlall_idx_d, int64_t, int16_t, int16_t, H8, H2, +) +DO_MLALL_IDX(sme2_smlsll_idx_s, int32_t, int8_t, int8_t, H4, H1, -) +DO_MLALL_IDX(sme2_smlsll_idx_d, int64_t, int16_t, int16_t, H8, H2, -) + +DO_MLALL_IDX(sme2_umlall_idx_s, uint32_t, uint8_t, uint8_t, H4, H1, +) +DO_MLALL_IDX(sme2_umlall_idx_d, uint64_t, uint16_t, uint16_t, H8, H2, +) +DO_MLALL_IDX(sme2_umlsll_idx_s, uint32_t, uint8_t, uint8_t, H4, H1, -) +DO_MLALL_IDX(sme2_umlsll_idx_d, uint64_t, uint16_t, uint16_t, H8, H2, -) + +DO_MLALL_IDX(sme2_usmlall_idx_s, uint32_t, uint8_t, int8_t, H4, H1, +) +DO_MLALL_IDX(sme2_sumlall_idx_s, uint32_t, int8_t, uint8_t, H4, H1, +) + +#undef DO_MLALL_IDX diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UVDOT_nx_4h, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_uvdot_idx_4h) TRANS_FEAT(SUVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_suvdot_idx_4b) TRANS_FEAT(USVDOT_nx_4b, aa64_sme2, do_vdot_nx, a, gen_helper_sme2_usvdot_idx_4b) + +static bool do_smlal(DisasContext *s, arg_azz_n *a, bool multi, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fn); +} + +TRANS_FEAT(SMLAL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_smlal_zzzw_s) +TRANS_FEAT(SMLSL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_smlsl_zzzw_s) +TRANS_FEAT(UMLAL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_umlal_zzzw_s) +TRANS_FEAT(UMLSL_n1, aa64_sme2, do_smlal, a, false, gen_helper_sve2_umlsl_zzzw_s) + +TRANS_FEAT(SMLAL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_smlal_zzzw_s) +TRANS_FEAT(SMLSL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_smlsl_zzzw_s) +TRANS_FEAT(UMLAL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_umlal_zzzw_s) +TRANS_FEAT(UMLSL_nn, aa64_sme2, do_smlal, a, true, gen_helper_sve2_umlsl_zzzw_s) + +static bool do_smlal_nx(DisasContext *s, arg_azx_n *a, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 2, a->rv, a->off, a->zn, a->zm, + a->idx << 1, 0, false, fn); +} + +TRANS_FEAT(SMLAL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_smlal_idx_s) +TRANS_FEAT(SMLSL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_smlsl_idx_s) +TRANS_FEAT(UMLAL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_umlal_idx_s) +TRANS_FEAT(UMLSL_nx, aa64_sme2, do_smlal_nx, a, gen_helper_sve2_umlsl_idx_s) + +static bool do_smlall(DisasContext *s, arg_azz_n *a, bool multi, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 4, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fn); +} + +static void gen_helper_sme2_sumlall_s(TCGv_ptr d, TCGv_ptr n, TCGv_ptr m, + TCGv_ptr a, TCGv_i32 desc) +{ + gen_helper_sme2_usmlall_s(d, m, n, a, desc); +} + +TRANS_FEAT(SMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_smlall_s) +TRANS_FEAT(SMLSLL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_smlsll_s) +TRANS_FEAT(UMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_umlall_s) +TRANS_FEAT(UMLSLL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_umlsll_s) +TRANS_FEAT(USMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_usmlall_s) +TRANS_FEAT(SUMLALL_n1_s, aa64_sme2, do_smlall, a, false, gen_helper_sme2_sumlall_s) + +TRANS_FEAT(SMLALL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_smlall_d) +TRANS_FEAT(SMLSLL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_smlsll_d) +TRANS_FEAT(UMLALL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_umlall_d) +TRANS_FEAT(UMLSLL_n1_d, aa64_sme2_i16i64, do_smlall, a, false, gen_helper_sme2_umlsll_d) + +TRANS_FEAT(SMLALL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_smlall_s) +TRANS_FEAT(SMLSLL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_smlsll_s) +TRANS_FEAT(UMLALL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_umlall_s) +TRANS_FEAT(UMLSLL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_umlsll_s) +TRANS_FEAT(USMLALL_nn_s, aa64_sme2, do_smlall, a, true, gen_helper_sme2_usmlall_s) + +TRANS_FEAT(SMLALL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_smlall_d) +TRANS_FEAT(SMLSLL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_smlsll_d) +TRANS_FEAT(UMLALL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_umlall_d) +TRANS_FEAT(UMLSLL_nn_d, aa64_sme2_i16i64, do_smlall, a, true, gen_helper_sme2_umlsll_d) + +static bool do_smlall_nx(DisasContext *s, arg_azx_n *a, + gen_helper_gvec_4 *fn) +{ + return do_azz_acc(s, a->n, 4, a->rv, a->off, a->zn, a->zm, + a->idx << 2, 0, false, fn); +} + +TRANS_FEAT(SMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_smlall_idx_s) +TRANS_FEAT(SMLSLL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_smlsll_idx_s) +TRANS_FEAT(UMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_umlall_idx_s) +TRANS_FEAT(UMLSLL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_umlsll_idx_s) +TRANS_FEAT(USMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_usmlall_idx_s) +TRANS_FEAT(SUMLALL_nx_s, aa64_sme2, do_smlall_nx, a, gen_helper_sme2_sumlall_idx_s) + +TRANS_FEAT(SMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlall_idx_d) +TRANS_FEAT(SMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlsll_idx_d) +TRANS_FEAT(UMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlall_idx_d) +TRANS_FEAT(UMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlsll_idx_d) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Emphasize the non-fused nature of these multiply-add. Matches other helpers such as gvec_rsqrts_nf_[hs]. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-48-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 8 ++++---- target/arm/tcg/translate-neon.c | 4 ++-- target/arm/tcg/vec_helper.c | 8 ++++---- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmla_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmla_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_5(gvec_fmls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmls_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_fmls_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VACGE, gen_helper_gvec_facge_s, gen_helper_gvec_facge_h) DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h) DO_3S_FP_GVEC(VMAX, gen_helper_gvec_fmax_s, gen_helper_gvec_fmax_h) DO_3S_FP_GVEC(VMIN, gen_helper_gvec_fmin_s, gen_helper_gvec_fmin_h) -DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h) -DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h) +DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_nf_s, gen_helper_gvec_fmla_nf_h) +DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_nf_s, gen_helper_gvec_fmls_nf_h) DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h) DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h) DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, \ clear_tail(d, oprsz, simd_maxsz(desc)); \ } -DO_MULADD(gvec_fmla_h, float16_muladd_nf, float16) -DO_MULADD(gvec_fmla_s, float32_muladd_nf, float32) +DO_MULADD(gvec_fmla_nf_h, float16_muladd_nf, float16) +DO_MULADD(gvec_fmla_nf_s, float32_muladd_nf, float32) -DO_MULADD(gvec_fmls_h, float16_mulsub_nf, float16) -DO_MULADD(gvec_fmls_s, float32_mulsub_nf, float32) +DO_MULADD(gvec_fmls_nf_h, float16_mulsub_nf, float16) +DO_MULADD(gvec_fmls_nf_s, float32_mulsub_nf, float32) DO_MULADD(gvec_vfma_h, float16_muladd_f, float16) DO_MULADD(gvec_vfma_s, float32_muladd_f, float32) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-49-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 48 +++++++++++++++++ target/arm/tcg/translate-sme.c | 95 ++++++++++++++++++++++++++++++++++ 2 files changed, 143 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=4 SUMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=2 SUMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=4 +FMLA_n1_h 11000001 001 0 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=2 +FMLA_n1_s 11000001 001 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 +FMLA_n1_d 11000001 011 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 +FMLA_n1_h 11000001 001 1 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=4 +FMLA_n1_s 11000001 001 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 +FMLA_n1_d 11000001 011 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 + +FMLS_n1_h 11000001 001 0 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=2 +FMLS_n1_s 11000001 001 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 +FMLS_n1_d 11000001 011 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 +FMLS_n1_h 11000001 001 1 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=4 +FMLS_n1_s 11000001 001 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 +FMLS_n1_d 11000001 011 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 + ### SME2 Multi-vector Multiple Array Vectors %zn_ax2 6:4 !function=times_2 @@ -XXX,XX +XXX,XX @@ UMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 USMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0010 . @azz_2x2_o1x4 USMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0010 . @azz_4x4_o1x4 +FMLA_nn_h 11000001 101 ....0 0 .. 100 ....0 01 ... @azz_2x2_o3 +FMLA_nn_s 11000001 101 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 +FMLA_nn_d 11000001 111 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 +FMLA_nn_h 11000001 101 ...01 0 .. 100 ...00 01 ... @azz_4x4_o3 +FMLA_nn_s 11000001 101 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 +FMLA_nn_d 11000001 111 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 + +FMLS_nn_h 11000001 101 ....0 0 .. 100 ....0 11 ... @azz_2x2_o3 +FMLS_nn_s 11000001 101 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 +FMLS_nn_d 11000001 111 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 +FMLS_nn_h 11000001 101 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 +FMLS_nn_s 11000001 101 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 +FMLS_nn_d 11000001 111 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx @@ -XXX,XX +XXX,XX @@ USMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 00 ... @azx_4x1_i4_o1 SUMLALL_nx_s 11000001 0000 .... . .. ... ..... 101 .. @azx_1x1_i4_o2 SUMLALL_nx_s 11000001 0001 .... 0 .. 0.. ....1 10 ... @azx_2x1_i4_o1 SUMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 10 ... @azx_4x1_i4_o1 + +%idx3_10_3 10:2 3:1 +@azx_2x1_i3_o3 ........ .... zm:4 . .. ... ..... .. off:3 \ + &azx_n n=2 rv=%mova_rv zn=%zn_ax2 idx=%idx3_10_3 +@azx_4x1_i3_o3 ........ .... zm:4 . .. ... ..... .. off:3 \ + &azx_n n=4 rv=%mova_rv zn=%zn_ax4 idx=%idx3_10_3 + +FMLA_nx_h 11000001 0001 .... 0 .. 1.. ....0 0 .... @azx_2x1_i3_o3 +FMLA_nx_s 11000001 0101 .... 0 .. 0.. ....0 00 ... @azx_2x1_i2_o3 +FMLA_nx_d 11000001 1101 .... 0 .. 00. ....0 00 ... @azx_2x1_i1_o3 +FMLA_nx_h 11000001 0001 .... 1 .. 1.. ...00 0 .... @azx_4x1_i3_o3 +FMLA_nx_s 11000001 0101 .... 1 .. 0.. ...00 00 ... @azx_4x1_i2_o3 +FMLA_nx_d 11000001 1101 .... 1 .. 00. ...00 00 ... @azx_4x1_i1_o3 + +FMLS_nx_h 11000001 0001 .... 0 .. 1.. ....0 1 .... @azx_2x1_i3_o3 +FMLS_nx_s 11000001 0101 .... 0 .. 0.. ....0 10 ... @azx_2x1_i2_o3 +FMLS_nx_d 11000001 1101 .... 0 .. 00. ....0 10 ... @azx_2x1_i1_o3 +FMLS_nx_h 11000001 0001 .... 1 .. 1.. ...00 1 .... @azx_4x1_i3_o3 +FMLS_nx_s 11000001 0101 .... 1 .. 0.. ...00 10 ... @azx_4x1_i2_o3 +FMLS_nx_d 11000001 1101 .... 1 .. 00. ...00 10 ... @azx_4x1_i1_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub */ #define FPST_ENV -1 +static bool do_azz_fp(DisasContext *s, int nreg, int nsel, + int rv, int off, int zn, int zm, + int data, int shsel, bool multi, int fpst, + gen_helper_gvec_3_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / nreg; + TCGv_ptr t_za = get_zarray(s, rv, off, nreg, nsel); + TCGv_ptr t, ptr; + + if (fpst >= 0) { + ptr = fpstatus_ptr(fpst); + } else { + ptr = tcg_env; + } + t = tcg_temp_new_ptr(); + + for (int r = 0; r < nreg; ++r) { + TCGv_ptr t_zn = vec_full_reg_ptr(s, zn); + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm); + + for (int i = 0; i < nsel; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, data | (i << shsel)); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t_zn, t_zm, ptr, tcg_constant_i32(desc)); + } + + /* + * For multiple-and-single vectors, Zn may wrap. + * For multiple vectors, both Zn and Zm are aligned. + */ + zn = (zn + 1) % 32; + zm += multi; + } + } + return true; +} + static bool do_azz_acc_fp(DisasContext *s, int nreg, int nsel, int rv, int off, int zn, int zm, int data, int shsel, bool multi, int fpst, @@ -XXX,XX +XXX,XX @@ static bool do_vdot(DisasContext *s, arg_azx_n *a, gen_helper_gvec_4_ptr *fn) TRANS_FEAT(FVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_fvdot_idx_h) TRANS_FEAT(BFVDOT, aa64_sme, do_vdot, a, gen_helper_sme2_bfvdot_idx) +static bool do_fmla(DisasContext *s, arg_azz_n *a, bool multi, + ARMFPStatusFlavour fpst, gen_helper_gvec_3_ptr *fn) +{ + return do_azz_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + 0, 0, multi, fpst, fn); +} + +TRANS_FEAT(FMLA_n1_h, aa64_sme_f16f16, do_fmla, a, false, FPST_ZA_F16, + gen_helper_gvec_vfma_h) +TRANS_FEAT(FMLS_n1_h, aa64_sme_f16f16, do_fmla, a, false, FPST_ZA_F16, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_h : gen_helper_gvec_vfms_h) +TRANS_FEAT(FMLA_nn_h, aa64_sme_f16f16, do_fmla, a, true, FPST_ZA_F16, + gen_helper_gvec_vfma_h) +TRANS_FEAT(FMLS_nn_h, aa64_sme_f16f16, do_fmla, a, true, FPST_ZA_F16, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_h : gen_helper_gvec_vfms_h) + +TRANS_FEAT(FMLA_n1_s, aa64_sme2, do_fmla, a, false, FPST_ZA, + gen_helper_gvec_vfma_s) +TRANS_FEAT(FMLS_n1_s, aa64_sme2, do_fmla, a, false, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_s : gen_helper_gvec_vfms_s) +TRANS_FEAT(FMLA_nn_s, aa64_sme2, do_fmla, a, true, FPST_ZA, + gen_helper_gvec_vfma_s) +TRANS_FEAT(FMLS_nn_s, aa64_sme2, do_fmla, a, true, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_s : gen_helper_gvec_vfms_s) + +TRANS_FEAT(FMLA_n1_d, aa64_sme2_f64f64, do_fmla, a, false, FPST_ZA, + gen_helper_gvec_vfma_d) +TRANS_FEAT(FMLS_n1_d, aa64_sme2_f64f64, do_fmla, a, false, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_d : gen_helper_gvec_vfms_d) +TRANS_FEAT(FMLA_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, + gen_helper_gvec_vfma_d) +TRANS_FEAT(FMLS_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_vfms_d : gen_helper_gvec_vfms_d) + +static bool do_fmla_nx(DisasContext *s, arg_azx_n *a, + ARMFPStatusFlavour fpst, gen_helper_gvec_4_ptr *fn) +{ + return do_azz_acc_fp(s, a->n, 1, a->rv, a->off, a->zn, a->zm, + a->idx, 0, false, fpst, fn); +} + +TRANS_FEAT(FMLA_nx_h, aa64_sme_f16f16, do_fmla_nx, a, FPST_ZA_F16, + gen_helper_gvec_fmla_idx_h) +TRANS_FEAT(FMLS_nx_h, aa64_sme_f16f16, do_fmla_nx, a, FPST_ZA_F16, + s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_h : gen_helper_gvec_fmls_idx_h) +TRANS_FEAT(FMLA_nx_s, aa64_sme2, do_fmla_nx, a, FPST_ZA, + gen_helper_gvec_fmla_idx_s) +TRANS_FEAT(FMLS_nx_s, aa64_sme2, do_fmla_nx, a, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_s : gen_helper_gvec_fmls_idx_s) +TRANS_FEAT(FMLA_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, + gen_helper_gvec_fmla_idx_d) +TRANS_FEAT(FMLS_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_d : gen_helper_gvec_fmls_idx_d) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for integer accumulate. -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-50-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 9 +++++++++ target/arm/tcg/sme.decode | 18 ++++++++++++++++++ target/arm/tcg/translate-sme.c | 14 ++++++++++++++ target/arm/tcg/vec_helper.c | 26 ++++++++++++++++++++++++++ 4 files changed, 67 insertions(+) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmls_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i DEF_HELPER_FLAGS_5(gvec_vfma_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfma_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfmla, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfmls, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ah_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ah_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ah_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_ah_bfmls, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_fmla_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmla_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_fmls_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_fmls_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_fmls_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_bfmls_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(gvec_ah_fmls_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_6(gvec_ah_bfmls_idx, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_uqadd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ USMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 0010 . @azz_nx1_o1x4 n=4 SUMLALL_n1_s 11000001 001 0 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=2 SUMLALL_n1_s 11000001 001 1 .... 0 .. 000 ..... 1010 . @azz_nx1_o1x4 n=4 +BFMLA_n1 11000001 011 0 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=2 FMLA_n1_h 11000001 001 0 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=2 FMLA_n1_s 11000001 001 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 FMLA_n1_d 11000001 011 0 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=2 + +BFMLA_n1 11000001 011 1 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=4 FMLA_n1_h 11000001 001 1 .... 0 .. 111 ..... 00 ... @azz_nx1_o3 n=4 FMLA_n1_s 11000001 001 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 FMLA_n1_d 11000001 011 1 .... 0 .. 110 ..... 00 ... @azz_nx1_o3 n=4 +BFMLS_n1 11000001 011 0 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=2 FMLS_n1_h 11000001 001 0 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=2 FMLS_n1_s 11000001 001 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 FMLS_n1_d 11000001 011 0 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=2 + +BFMLS_n1 11000001 011 1 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=4 FMLS_n1_h 11000001 001 1 .... 0 .. 111 ..... 01 ... @azz_nx1_o3 n=4 FMLS_n1_s 11000001 001 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 FMLS_n1_d 11000001 011 1 .... 0 .. 110 ..... 01 ... @azz_nx1_o3 n=4 @@ -XXX,XX +XXX,XX @@ UMLSLL_nn_d 11000001 111 ...01 0 .. 000 ...00 1100 . @azz_4x4_o1x4 USMLALL_nn_s 11000001 101 ....0 0 .. 000 ....0 0010 . @azz_2x2_o1x4 USMLALL_nn_s 11000001 101 ...01 0 .. 000 ...00 0010 . @azz_4x4_o1x4 +BFMLA_nn 11000001 111 ....0 0 .. 100 ....0 01 ... @azz_2x2_o3 FMLA_nn_h 11000001 101 ....0 0 .. 100 ....0 01 ... @azz_2x2_o3 FMLA_nn_s 11000001 101 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 FMLA_nn_d 11000001 111 ....0 0 .. 110 ....0 00 ... @azz_2x2_o3 + +BFMLA_nn 11000001 111 ...01 0 .. 100 ...00 01 ... @azz_4x4_o3 FMLA_nn_h 11000001 101 ...01 0 .. 100 ...00 01 ... @azz_4x4_o3 FMLA_nn_s 11000001 101 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 FMLA_nn_d 11000001 111 ...01 0 .. 110 ...00 00 ... @azz_4x4_o3 +BFMLS_nn 11000001 111 ....0 0 .. 100 ....0 11 ... @azz_2x2_o3 FMLS_nn_h 11000001 101 ....0 0 .. 100 ....0 11 ... @azz_2x2_o3 FMLS_nn_s 11000001 101 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 FMLS_nn_d 11000001 111 ....0 0 .. 110 ....0 01 ... @azz_2x2_o3 + +BFMLS_nn 11000001 111 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 FMLS_nn_h 11000001 101 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 FMLS_nn_s 11000001 101 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 FMLS_nn_d 11000001 111 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 @@ -XXX,XX +XXX,XX @@ SUMLALL_nx_s 11000001 0001 .... 1 .. 0.. ...01 10 ... @azx_4x1_i4_o1 @azx_4x1_i3_o3 ........ .... zm:4 . .. ... ..... .. off:3 \ &azx_n n=4 rv=%mova_rv zn=%zn_ax4 idx=%idx3_10_3 +BFMLA_nx 11000001 0001 .... 0 .. 1.. ....1 0 .... @azx_2x1_i3_o3 FMLA_nx_h 11000001 0001 .... 0 .. 1.. ....0 0 .... @azx_2x1_i3_o3 FMLA_nx_s 11000001 0101 .... 0 .. 0.. ....0 00 ... @azx_2x1_i2_o3 FMLA_nx_d 11000001 1101 .... 0 .. 00. ....0 00 ... @azx_2x1_i1_o3 + +BFMLA_nx 11000001 0001 .... 1 .. 1.. ...01 0 .... @azx_4x1_i3_o3 FMLA_nx_h 11000001 0001 .... 1 .. 1.. ...00 0 .... @azx_4x1_i3_o3 FMLA_nx_s 11000001 0101 .... 1 .. 0.. ...00 00 ... @azx_4x1_i2_o3 FMLA_nx_d 11000001 1101 .... 1 .. 00. ...00 00 ... @azx_4x1_i1_o3 +BFMLS_nx 11000001 0001 .... 0 .. 1.. ....1 1 .... @azx_2x1_i3_o3 FMLS_nx_h 11000001 0001 .... 0 .. 1.. ....0 1 .... @azx_2x1_i3_o3 FMLS_nx_s 11000001 0101 .... 0 .. 0.. ....0 10 ... @azx_2x1_i2_o3 FMLS_nx_d 11000001 1101 .... 0 .. 00. ....0 10 ... @azx_2x1_i1_o3 + +BFMLS_nx 11000001 0001 .... 1 .. 1.. ...01 1 .... @azx_4x1_i3_o3 FMLS_nx_h 11000001 0001 .... 1 .. 1.. ...00 1 .... @azx_4x1_i3_o3 FMLS_nx_s 11000001 0101 .... 1 .. 0.. ...00 10 ... @azx_4x1_i2_o3 FMLS_nx_d 11000001 1101 .... 1 .. 00. ...00 10 ... @azx_4x1_i1_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMLA_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, TRANS_FEAT(FMLS_nn_d, aa64_sme2_f64f64, do_fmla, a, true, FPST_ZA, s->fpcr_ah ? gen_helper_gvec_ah_vfms_d : gen_helper_gvec_vfms_d) +TRANS_FEAT(BFMLA_n1, aa64_sme_b16b16, do_fmla, a, false, FPST_ZA, + gen_helper_gvec_bfmla) +TRANS_FEAT(BFMLS_n1, aa64_sme_b16b16, do_fmla, a, false, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_bfmls : gen_helper_gvec_bfmls) +TRANS_FEAT(BFMLA_nn, aa64_sme_b16b16, do_fmla, a, true, FPST_ZA, + gen_helper_gvec_bfmla) +TRANS_FEAT(BFMLS_nn, aa64_sme_b16b16, do_fmla, a, true, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_bfmls : gen_helper_gvec_bfmls) + static bool do_fmla_nx(DisasContext *s, arg_azx_n *a, ARMFPStatusFlavour fpst, gen_helper_gvec_4_ptr *fn) { @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMLA_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, TRANS_FEAT(FMLS_nx_d, aa64_sme2_f64f64, do_fmla_nx, a, FPST_ZA, s->fpcr_ah ? gen_helper_gvec_ah_fmls_idx_d : gen_helper_gvec_fmls_idx_d) +TRANS_FEAT(BFMLA_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, + gen_helper_gvec_bfmla_idx) +TRANS_FEAT(BFMLS_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, + s->fpcr_ah ? gen_helper_gvec_ah_bfmls_idx : gen_helper_gvec_bfmls_idx) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for integer accumulate. diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ static float16 float16_muladd_f(float16 dest, float16 op1, float16 op2, return float16_muladd(op1, op2, dest, 0, stat); } +static bfloat16 bfloat16_muladd_f(bfloat16 dest, bfloat16 op1, bfloat16 op2, + float_status *stat) +{ + return bfloat16_muladd(op1, op2, dest, 0, stat); +} + static float32 float32_muladd_f(float32 dest, float32 op1, float32 op2, float_status *stat) { @@ -XXX,XX +XXX,XX @@ static float16 float16_mulsub_f(float16 dest, float16 op1, float16 op2, return float16_muladd(float16_chs(op1), op2, dest, 0, stat); } +static bfloat16 bfloat16_mulsub_f(bfloat16 dest, bfloat16 op1, bfloat16 op2, + float_status *stat) +{ + return bfloat16_muladd(bfloat16_chs(op1), op2, dest, 0, stat); +} + static float32 float32_mulsub_f(float32 dest, float32 op1, float32 op2, float_status *stat) { @@ -XXX,XX +XXX,XX @@ static float16 float16_ah_mulsub_f(float16 dest, float16 op1, float16 op2, return float16_muladd(op1, op2, dest, float_muladd_negate_product, stat); } +static bfloat16 bfloat16_ah_mulsub_f(bfloat16 dest, bfloat16 op1, bfloat16 op2, + float_status *stat) +{ + return bfloat16_muladd(op1, op2, dest, float_muladd_negate_product, stat); +} + static float32 float32_ah_mulsub_f(float32 dest, float32 op1, float32 op2, float_status *stat) { @@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_fmls_nf_s, float32_mulsub_nf, float32) DO_MULADD(gvec_vfma_h, float16_muladd_f, float16) DO_MULADD(gvec_vfma_s, float32_muladd_f, float32) DO_MULADD(gvec_vfma_d, float64_muladd_f, float64) +DO_MULADD(gvec_bfmla, bfloat16_muladd_f, bfloat16) DO_MULADD(gvec_vfms_h, float16_mulsub_f, float16) DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32) DO_MULADD(gvec_vfms_d, float64_mulsub_f, float64) +DO_MULADD(gvec_bfmls, bfloat16_mulsub_f, bfloat16) DO_MULADD(gvec_ah_vfms_h, float16_ah_mulsub_f, float16) DO_MULADD(gvec_ah_vfms_s, float32_ah_mulsub_f, float32) DO_MULADD(gvec_ah_vfms_d, float64_ah_mulsub_f, float64) +DO_MULADD(gvec_ah_bfmls, bfloat16_ah_mulsub_f, bfloat16) + +#undef DO_MULADD /* For the indexed ops, SVE applies the index per 128-bit vector segment. * For AdvSIMD, there is of course only one such vector segment. @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, \ DO_FMLA_IDX(gvec_fmla_idx_h, float16, H2, 0, 0) DO_FMLA_IDX(gvec_fmla_idx_s, float32, H4, 0, 0) DO_FMLA_IDX(gvec_fmla_idx_d, float64, H8, 0, 0) +DO_FMLA_IDX(gvec_bfmla_idx, bfloat16, H2, 0, 0) DO_FMLA_IDX(gvec_fmls_idx_h, float16, H2, INT16_MIN, 0) DO_FMLA_IDX(gvec_fmls_idx_s, float32, H4, INT32_MIN, 0) DO_FMLA_IDX(gvec_fmls_idx_d, float64, H8, INT64_MIN, 0) +DO_FMLA_IDX(gvec_bfmls_idx, bfloat16, H2, INT16_MIN, 0) DO_FMLA_IDX(gvec_ah_fmls_idx_h, float16, H2, 0, float_muladd_negate_product) DO_FMLA_IDX(gvec_ah_fmls_idx_s, float32, H4, 0, float_muladd_negate_product) DO_FMLA_IDX(gvec_ah_fmls_idx_d, float64, H8, 0, float_muladd_negate_product) +DO_FMLA_IDX(gvec_ah_bfmls_idx, bfloat16, H2, 0, float_muladd_negate_product) #undef DO_FMLA_IDX -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-51-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 2 ++ target/arm/tcg/sme.decode | 25 +++++++++++++++++++ target/arm/tcg/translate-sme.c | 44 ++++++++++++++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 2 ++ 4 files changed, 73 insertions(+) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_fclt0_d, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfadd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fsub_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fsub_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(gvec_bfsub, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(gvec_fmul_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FMLS_nn_h 11000001 101 ...01 0 .. 100 ...00 11 ... @azz_4x4_o3 FMLS_nn_s 11000001 101 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 FMLS_nn_d 11000001 111 ...01 0 .. 110 ...00 01 ... @azz_4x4_o3 +&az_n n off rv zm +@az_2x2_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &az_n n=2 rv=%mova_rv zm=%zn_ax2 +@az_4x4_o3 ........ ... ..... . .. ... ..... .. off:3 \ + &az_n n=4 rv=%mova_rv zm=%zn_ax4 + +FADD_nn_h 11000001 101 00100 0 .. 111 ....0 00 ... @az_2x2_o3 +FADD_nn_s 11000001 101 00000 0 .. 111 ....0 00 ... @az_2x2_o3 +FADD_nn_d 11000001 111 00000 0 .. 111 ....0 00 ... @az_2x2_o3 +FADD_nn_h 11000001 101 00101 0 .. 111 ...00 00 ... @az_4x4_o3 +FADD_nn_s 11000001 101 00001 0 .. 111 ...00 00 ... @az_4x4_o3 +FADD_nn_d 11000001 111 00001 0 .. 111 ...00 00 ... @az_4x4_o3 + +FSUB_nn_h 11000001 101 00100 0 .. 111 ....0 01 ... @az_2x2_o3 +FSUB_nn_s 11000001 101 00000 0 .. 111 ....0 01 ... @az_2x2_o3 +FSUB_nn_d 11000001 111 00000 0 .. 111 ....0 01 ... @az_2x2_o3 +FSUB_nn_h 11000001 101 00101 0 .. 111 ...00 01 ... @az_4x4_o3 +FSUB_nn_s 11000001 101 00001 0 .. 111 ...00 01 ... @az_4x4_o3 +FSUB_nn_d 11000001 111 00001 0 .. 111 ...00 01 ... @az_4x4_o3 + +BFADD_nn 11000001 111 00100 0 .. 111 ....0 00 ... @az_2x2_o3 +BFADD_nn 11000001 111 00101 0 .. 111 ...00 00 ... @az_4x4_o3 +BFSUB_nn 11000001 111 00100 0 .. 111 ....0 01 ... @az_2x2_o3 +BFSUB_nn 11000001 111 00101 0 .. 111 ...00 01 ... @az_4x4_o3 + ### SME2 Multi-vector Indexed &azx_n n off rv zn zm idx diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(BFMLA_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, TRANS_FEAT(BFMLS_nx, aa64_sme_b16b16, do_fmla_nx, a, FPST_ZA, s->fpcr_ah ? gen_helper_gvec_ah_bfmls_idx : gen_helper_gvec_bfmls_idx) +static bool do_faddsub(DisasContext *s, arg_az_n *a, ARMFPStatusFlavour fpst, + gen_helper_gvec_3_ptr *fn) +{ + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int n = a->n; + int zm = a->zm; + int vstride = svl / n; + TCGv_ptr t_za = get_zarray(s, a->rv, a->off, n, 0); + TCGv_ptr ptr = fpstatus_ptr(fpst); + TCGv_ptr t = tcg_temp_new_ptr(); + + for (int r = 0; r < n; ++r) { + TCGv_ptr t_zm = vec_full_reg_ptr(s, zm + r); + int o_za = r * vstride * sizeof(ARMVectorReg); + int desc = simd_desc(svl, svl, 0); + + tcg_gen_addi_ptr(t, t_za, o_za); + fn(t, t, t_zm, ptr, tcg_constant_i32(desc)); + } + } + return true; +} + +TRANS_FEAT(FADD_nn_h, aa64_sme_f16f16, do_faddsub, a, + FPST_ZA_F16, gen_helper_gvec_fadd_h) +TRANS_FEAT(FSUB_nn_h, aa64_sme_f16f16, do_faddsub, a, + FPST_ZA_F16, gen_helper_gvec_fsub_h) + +TRANS_FEAT(FADD_nn_s, aa64_sme2, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fadd_s) +TRANS_FEAT(FSUB_nn_s, aa64_sme2, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fsub_s) + +TRANS_FEAT(FADD_nn_d, aa64_sme2_f64f64, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fadd_d) +TRANS_FEAT(FSUB_nn_d, aa64_sme2_f64f64, do_faddsub, a, + FPST_ZA, gen_helper_gvec_fsub_d) + +TRANS_FEAT(BFADD_nn, aa64_sme_b16b16, do_faddsub, a, + FPST_ZA, gen_helper_gvec_bfadd) +TRANS_FEAT(BFSUB_nn, aa64_sme_b16b16, do_faddsub, a, + FPST_ZA, gen_helper_gvec_bfsub) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for integer accumulate. diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, \ DO_3OP(gvec_fadd_h, float16_add, float16) DO_3OP(gvec_fadd_s, float32_add, float32) DO_3OP(gvec_fadd_d, float64_add, float64) +DO_3OP(gvec_bfadd, bfloat16_add, bfloat16) DO_3OP(gvec_fsub_h, float16_sub, float16) DO_3OP(gvec_fsub_s, float32_sub, float32) DO_3OP(gvec_fsub_d, float64_sub, float64) +DO_3OP(gvec_bfsub, bfloat16_sub, bfloat16) DO_3OP(gvec_fmul_h, float16_mul, float16) DO_3OP(gvec_fmul_s, float32_mul, float32) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-52-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 12 ++++++++++++ target/arm/tcg/translate-sme.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ BFMLS_nx 11000001 0001 .... 1 .. 1.. ...01 1 .... @azx_4x1_i3_o3 FMLS_nx_h 11000001 0001 .... 1 .. 1.. ...00 1 .... @azx_4x1_i3_o3 FMLS_nx_s 11000001 0101 .... 1 .. 0.. ...00 10 ... @azx_4x1_i2_o3 FMLS_nx_d 11000001 1101 .... 1 .. 00. ...00 10 ... @azx_4x1_i1_o3 + +### SME2 Add / Sub array accumulators + +ADD_aaz_s 11000001 101 000000 .. 111 ....0 10 ... @az_2x2_o3 +ADD_aaz_s 11000001 101 000010 .. 111 ...00 10 ... @az_4x4_o3 +ADD_aaz_d 11000001 111 000000 .. 111 ....0 10 ... @az_2x2_o3 +ADD_aaz_d 11000001 111 000010 .. 111 ...00 10 ... @az_4x4_o3 + +SUB_aaz_s 11000001 101 000000 .. 111 ....0 11 ... @az_2x2_o3 +SUB_aaz_s 11000001 101 000010 .. 111 ...00 11 ... @az_4x4_o3 +SUB_aaz_d 11000001 111 000000 .. 111 ....0 11 ... @az_2x2_o3 +SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SUB_azz_nn_s, aa64_sme2, do_azz_nn, a, MO_32, tcg_gen_gvec_sub_var) TRANS_FEAT(ADD_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_add_var) TRANS_FEAT(SUB_azz_nn_d, aa64_sme2_i16i64, do_azz_nn, a, MO_64, tcg_gen_gvec_sub_var) +/* Add/Sub each ZA[d*N] += Z[m*N] */ +static bool do_aaz(DisasContext *s, arg_az_n *a, int esz, GVecGen3FnVar *fn) +{ + TCGv_ptr t_za; + int svl, n; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + n = a->n; + t_za = get_zarray(s, a->rv, a->off, n, 0); + svl = streaming_vec_reg_size(s); + + for (int i = 0; i < n; ++i) { + int o_za = (svl / n * sizeof(ARMVectorReg)) * i; + int o_zm = vec_full_reg_offset(s, a->zm + i); + + fn(esz, t_za, o_za, t_za, o_za, tcg_env, o_zm, svl, svl); + } + return true; +} + +TRANS_FEAT(ADD_aaz_s, aa64_sme2, do_aaz, a, MO_32, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_aaz_s, aa64_sme2, do_aaz, a, MO_32, tcg_gen_gvec_sub_var) +TRANS_FEAT(ADD_aaz_d, aa64_sme2_i16i64, do_aaz, a, MO_64, tcg_gen_gvec_add_var) +TRANS_FEAT(SUB_aaz_d, aa64_sme2_i16i64, do_aaz, a, MO_64, tcg_gen_gvec_sub_var) + /* * Expand array multi-vector single (n1), array multi-vector (nn), * and array multi-vector indexed (nx), for floating-point accumulate. -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-53-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 5 +++ target/arm/tcg/vec_internal.h | 2 + target/arm/tcg/sme.decode | 12 ++++++ target/arm/tcg/sme_helper.c | 74 ++++++++++++++++++++++++++++++++++ target/arm/tcg/sve_helper.c | 2 +- target/arm/tcg/translate-sme.c | 25 ++++++++++++ 6 files changed, 119 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme2_umlsll_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, DEF_HELPER_FLAGS_5(sme2_umlsll_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme2_usmlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme2_sumlall_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_bfcvt, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_bfcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvt_n, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah) bfloat16 helper_sme2_ah_fmax_b16(bfloat16 a, bfloat16 b, float_status *fpst); bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); +float16 sve_f32_to_f16(float32 f, float_status *fpst); + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_s 11000001 101 000000 .. 111 ....0 11 ... @az_2x2_o3 SUB_aaz_s 11000001 101 000010 .. 111 ...00 11 ... @az_4x4_o3 SUB_aaz_d 11000001 111 000000 .. 111 ....0 11 ... @az_2x2_o3 SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 + +### SME2 Multi-vector SVE Constructive Unary + +&zz_n zd zn n +@zz_1x2 ........ ... ..... ...... ..... zd:5 \ + &zz_n n=1 zn=%zn_ax2 + +BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 +BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 + +FCVT_n 11000001 001 00000 111000 ....0 ..... @zz_1x2 +FCVTN 11000001 001 00000 111000 ....1 ..... @zz_1x2 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ DO_MLALL_IDX(sme2_usmlall_idx_s, uint32_t, uint8_t, int8_t, H4, H1, +) DO_MLALL_IDX(sme2_sumlall_idx_s, uint32_t, int8_t, uint8_t, H4, H1, +) #undef DO_MLALL_IDX + +/* Convert and compress */ +void HELPER(sme2_bfcvt)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + ARMVectorReg scratch; + size_t oprsz = simd_oprsz(desc); + size_t i, n = oprsz / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + bfloat16 *d = vd; + + if (vd == s1) { + s1 = memcpy(&scratch, s1, oprsz); + } + + for (i = 0; i < n; ++i) { + d[H2(i)] = float32_to_bfloat16(s0[H4(i)], fpst); + } + for (i = 0; i < n; ++i) { + d[H2(i) + n] = float32_to_bfloat16(s1[H4(i)], fpst); + } +} + +void HELPER(sme2_fcvt_n)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + ARMVectorReg scratch; + size_t oprsz = simd_oprsz(desc); + size_t i, n = oprsz / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + float16 *d = vd; + + if (vd == s1) { + s1 = memcpy(&scratch, s1, oprsz); + } + + for (i = 0; i < n; ++i) { + d[H2(i)] = sve_f32_to_f16(s0[H4(i)], fpst); + } + for (i = 0; i < n; ++i) { + d[H2(i) + n] = sve_f32_to_f16(s1[H4(i)], fpst); + } +} + +/* Convert and interleave */ +void HELPER(sme2_bfcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + bfloat16 *d = vd; + + for (i = 0; i < n; ++i) { + bfloat16 d0 = float32_to_bfloat16(s0[H4(i)], fpst); + bfloat16 d1 = float32_to_bfloat16(s1[H4(i)], fpst); + d[H2(i * 2 + 0)] = d0; + d[H2(i * 2 + 1)] = d1; + } +} + +void HELPER(sme2_fcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + float32 *s0 = vs; + float32 *s1 = vs + sizeof(ARMVectorReg); + bfloat16 *d = vd; + + for (i = 0; i < n; ++i) { + bfloat16 d0 = sve_f32_to_f16(s0[H4(i)], fpst); + bfloat16 d1 = sve_f32_to_f16(s1[H4(i)], fpst); + d[H2(i * 2 + 0)] = d0; + d[H2(i * 2 + 1)] = d1; + } +} diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static inline float64 sve_f16_to_f64(float16 f, float_status *fpst) return ret; } -static inline float16 sve_f32_to_f16(float32 f, float_status *fpst) +float16 sve_f32_to_f16(float32 f, float_status *fpst) { bool save = get_flush_to_zero(fpst); float16 ret; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlal TRANS_FEAT(SMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_smlsll_idx_d) TRANS_FEAT(UMLALL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlall_idx_d) TRANS_FEAT(UMLSLL_nx_d, aa64_sme2_i16i64, do_smlall_nx, a, gen_helper_sme2_umlsll_idx_d) + +static bool do_zz_fpst(DisasContext *s, arg_zz_n *a, int data, + ARMFPStatusFlavour type, gen_helper_gvec_2_ptr *fn) +{ + if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + TCGv_ptr fpst = fpstatus_ptr(type); + + for (int i = 0, n = a->n; i < n; ++i) { + tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->zd + i), + vec_full_reg_offset(s, a->zn + i), + fpst, svl, svl, data, fn); + } + } + return true; +} + +TRANS_FEAT(BFCVT, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_bfcvt) +TRANS_FEAT(BFCVTN, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_bfcvtn) +TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_fcvt_n) +TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_fcvtn) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-54-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 ++ target/arm/tcg/vec_internal.h | 1 + target/arm/tcg/sme.decode | 5 ++++ target/arm/tcg/sme_helper.c | 45 ++++++++++++++++++++++++++++++++++ target/arm/tcg/sve_helper.c | 2 +- target/arm/tcg/translate-sme.c | 5 ++++ 6 files changed, 59 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_bfcvt, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_bfcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvt_n, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvt_w, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_fcvtl, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ static inline float64 float64_maybe_ah_chs(float64 a, bool fpcr_ah) bfloat16 helper_sme2_ah_fmax_b16(bfloat16 a, bfloat16 b, float_status *fpst); bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); +float32 sve_f16_to_f32(float16 f, float_status *fpst); float16 sve_f32_to_f16(float32 f, float_status *fpst); #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n zd zn n @zz_1x2 ........ ... ..... ...... ..... zd:5 \ &zz_n n=1 zn=%zn_ax2 +@zz_2x1 ........ ... ..... ...... zn:5 ..... \ + &zz_n n=1 zd=%zd_ax2 BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 FCVT_n 11000001 001 00000 111000 ....0 ..... @zz_1x2 FCVTN 11000001 001 00000 111000 ....1 ..... @zz_1x2 + +FCVT_w 11000001 101 00000 111000 ..... ....0 @zz_2x1 +FCVTL 11000001 101 00000 111000 ..... ....1 @zz_2x1 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ #include "vec_internal.h" #include "sve_ldst_internal.h" + +static bool vectors_overlap(ARMVectorReg *x, unsigned nx, + ARMVectorReg *y, unsigned ny) +{ + return !(x + nx <= y || y + ny <= x); +} + void helper_set_svcr(CPUARMState *env, uint32_t val, uint32_t mask) { aarch64_set_svcr(env, val, mask); @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) d[H2(i * 2 + 1)] = d1; } } + +/* Expand and convert */ +void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + ARMVectorReg scratch; + size_t oprsz = simd_oprsz(desc); + size_t i, n = oprsz / 4; + float16 *s = vs; + float32 *d0 = vd; + float32 *d1 = vd + sizeof(ARMVectorReg); + + if (vectors_overlap(vd, 1, vs, 2)) { + s = memcpy(&scratch, s, oprsz); + } + + for (i = 0; i < n; ++i) { + d0[H4(i)] = sve_f16_to_f32(s[H2(i)], fpst); + } + for (i = 0; i < n; ++i) { + d1[H4(i)] = sve_f16_to_f32(s[H2(n + i)], fpst); + } +} + +/* Deinterleave and convert. */ +void HELPER(sme2_fcvtl)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + float16 *s = vs; + float32 *d0 = vd; + float32 *d1 = vd + sizeof(ARMVectorReg); + + for (i = 0; i < n; ++i) { + float32 v0 = sve_f16_to_f32(s[H2(i * 2 + 0)], fpst); + float32 v1 = sve_f16_to_f32(s[H2(i * 2 + 1)], fpst); + d0[H4(i)] = v0; + d1[H4(i)] = v1; + } +} diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, \ * FZ16. When converting from fp16, this affects flushing input denormals; * when converting to fp16, this affects flushing output denormals. */ -static inline float32 sve_f16_to_f32(float16 f, float_status *fpst) +float32 sve_f16_to_f32(float16 f, float_status *fpst) { bool save = get_flush_inputs_to_zero(fpst); float32 ret; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_n, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_fcvt_n) TRANS_FEAT(FCVTN, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_fcvtn) + +TRANS_FEAT(FCVT_w, aa64_sme_f16f16, do_zz_fpst, a, 0, + FPST_A64_F16, gen_helper_sme2_fcvt_w) +TRANS_FEAT(FCVTL, aa64_sme_f16f16, do_zz_fpst, a, 0, + FPST_A64_F16, gen_helper_sme2_fcvtl) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-55-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 9 +++++++++ target/arm/tcg/translate-sme.c | 5 +++++ 2 files changed, 14 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n n=1 zn=%zn_ax2 @zz_2x1 ........ ... ..... ...... zn:5 ..... \ &zz_n n=1 zd=%zd_ax2 +@zz_2x2 ........ ... ..... ...... .... . ..... \ + &zz_n n=2 zd=%zd_ax2 zn=%zn_ax2 +@zz_4x4 ........ ... ..... ...... .... . ..... \ + &zz_n n=4 zd=%zd_ax4 zn=%zn_ax4 BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 @@ -XXX,XX +XXX,XX @@ FCVTN 11000001 001 00000 111000 ....1 ..... @zz_1x2 FCVT_w 11000001 101 00000 111000 ..... ....0 @zz_2x1 FCVTL 11000001 101 00000 111000 ..... ....1 @zz_2x1 + +FCVTZS 11000001 001 00001 111000 ....0 ....0 @zz_2x2 +FCVTZS 11000001 001 10001 111000 ...00 ...00 @zz_4x4 +FCVTZU 11000001 001 00001 111000 ....1 ....0 @zz_2x2 +FCVTZU 11000001 001 10001 111000 ...01 ...00 @zz_4x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVT_w, aa64_sme_f16f16, do_zz_fpst, a, 0, FPST_A64_F16, gen_helper_sme2_fcvt_w) TRANS_FEAT(FCVTL, aa64_sme_f16f16, do_zz_fpst, a, 0, FPST_A64_F16, gen_helper_sme2_fcvtl) + +TRANS_FEAT(FCVTZS, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_gvec_vcvt_rz_fs) +TRANS_FEAT(FCVTZU, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_gvec_vcvt_rz_fu) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-56-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 ++ target/arm/tcg/sme.decode | 5 +++++ target/arm/tcg/sme_helper.c | 22 ++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 5 +++++ 4 files changed, 34 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_fcvt_n, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtn, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvt_w, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtl, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_scvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_4(sme2_ucvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FCVTZS 11000001 001 00001 111000 ....0 ....0 @zz_2x2 FCVTZS 11000001 001 10001 111000 ...00 ...00 @zz_4x4 FCVTZU 11000001 001 00001 111000 ....1 ....0 @zz_2x2 FCVTZU 11000001 001 10001 111000 ...01 ...00 @zz_4x4 + +SCVTF 11000001 001 00010 111000 ....0 ....0 @zz_2x2 +SCVTF 11000001 001 10010 111000 ...00 ...00 @zz_4x4 +UCVTF 11000001 001 00010 111000 ....1 ....0 @zz_2x2 +UCVTF 11000001 001 10010 111000 ...01 ...00 @zz_4x4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvtl)(void *vd, void *vs, float_status *fpst, uint32_t desc) d1[H4(i)] = v1; } } + +void HELPER(sme2_scvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + int32_t *d = vd; + float32 *s = vs; + + for (i = 0; i < n; ++i) { + d[i] = int32_to_float32(s[i], fpst); + } +} + +void HELPER(sme2_ucvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) +{ + size_t i, n = simd_oprsz(desc) / 4; + uint32_t *d = vd; + float32 *s = vs; + + for (i = 0; i < n; ++i) { + d[i] = uint32_to_float32(s[i], fpst); + } +} diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FCVTZS, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_gvec_vcvt_rz_fs) TRANS_FEAT(FCVTZU, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_gvec_vcvt_rz_fu) + +TRANS_FEAT(SCVTF, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_scvtf) +TRANS_FEAT(UCVTF, aa64_sme2, do_zz_fpst, a, 0, + FPST_A64, gen_helper_sme2_ucvtf) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-57-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 9 +++++++++ target/arm/tcg/translate-sme.c | 9 +++++++++ 2 files changed, 18 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SCVTF 11000001 001 00010 111000 ....0 ....0 @zz_2x2 SCVTF 11000001 001 10010 111000 ...00 ...00 @zz_4x4 UCVTF 11000001 001 00010 111000 ....1 ....0 @zz_2x2 UCVTF 11000001 001 10010 111000 ...01 ...00 @zz_4x4 + +FRINTN 11000001 101 01000 111000 ....0 ....0 @zz_2x2 +FRINTN 11000001 101 11000 111000 ...00 ...00 @zz_4x4 +FRINTP 11000001 101 01001 111000 ....0 ....0 @zz_2x2 +FRINTP 11000001 101 11001 111000 ...00 ...00 @zz_4x4 +FRINTM 11000001 101 01010 111000 ....0 ....0 @zz_2x2 +FRINTM 11000001 101 11010 111000 ...00 ...00 @zz_4x4 +FRINTA 11000001 101 01100 111000 ....0 ....0 @zz_2x2 +FRINTA 11000001 101 11100 111000 ...00 ...00 @zz_4x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SCVTF, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_scvtf) TRANS_FEAT(UCVTF, aa64_sme2, do_zz_fpst, a, 0, FPST_A64, gen_helper_sme2_ucvtf) + +TRANS_FEAT(FRINTN, aa64_sme2, do_zz_fpst, a, float_round_nearest_even, + FPST_A64, gen_helper_gvec_vrint_rm_s) +TRANS_FEAT(FRINTP, aa64_sme2, do_zz_fpst, a, float_round_up, + FPST_A64, gen_helper_gvec_vrint_rm_s) +TRANS_FEAT(FRINTM, aa64_sme2, do_zz_fpst, a, float_round_down, + FPST_A64, gen_helper_gvec_vrint_rm_s) +TRANS_FEAT(FRINTA, aa64_sme2, do_zz_fpst, a, float_round_ties_away, + FPST_A64, gen_helper_gvec_vrint_rm_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Inputs are a wider type of indeterminate sign. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-58-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/vec_internal.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *); int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *); int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool); +#define do_ssat_b(val) MIN(MAX(val, INT8_MIN), INT8_MAX) +#define do_ssat_h(val) MIN(MAX(val, INT16_MIN), INT16_MAX) +#define do_ssat_s(val) MIN(MAX(val, INT32_MIN), INT32_MAX) +#define do_usat_b(val) MIN(MAX(val, 0), UINT8_MAX) +#define do_usat_h(val) MIN(MAX(val, 0), UINT16_MAX) +#define do_usat_s(val) MIN(MAX(val, 0), UINT32_MAX) + /** * bfdotadd: * @sum: addend -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Replace and remove do_sat_bhs. This avoids multiple repetitions of INT*_MIN/MAX. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-59-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 116 +++++++++++++++--------------------- 1 file changed, 48 insertions(+), 68 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_uhsub_zpzz_h, uint16_t, H1_2, DO_HSUB_BHS) DO_ZPZZ(sve2_uhsub_zpzz_s, uint32_t, H1_4, DO_HSUB_BHS) DO_ZPZZ_D(sve2_uhsub_zpzz_d, uint64_t, DO_HSUB_D) -static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max) -{ - return val >= max ? max : val <= min ? min : val; -} - -#define DO_SQADD_B(n, m) do_sat_bhs((int64_t)n + m, INT8_MIN, INT8_MAX) -#define DO_SQADD_H(n, m) do_sat_bhs((int64_t)n + m, INT16_MIN, INT16_MAX) -#define DO_SQADD_S(n, m) do_sat_bhs((int64_t)n + m, INT32_MIN, INT32_MAX) +#define DO_SQADD_B(n, m) do_ssat_b((int64_t)n + m) +#define DO_SQADD_H(n, m) do_ssat_h((int64_t)n + m) +#define DO_SQADD_S(n, m) do_ssat_s((int64_t)n + m) static inline int64_t do_sqadd_d(int64_t n, int64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_sqadd_zpzz_h, int16_t, H1_2, DO_SQADD_H) DO_ZPZZ(sve2_sqadd_zpzz_s, int32_t, H1_4, DO_SQADD_S) DO_ZPZZ_D(sve2_sqadd_zpzz_d, int64_t, do_sqadd_d) -#define DO_UQADD_B(n, m) do_sat_bhs((int64_t)n + m, 0, UINT8_MAX) -#define DO_UQADD_H(n, m) do_sat_bhs((int64_t)n + m, 0, UINT16_MAX) -#define DO_UQADD_S(n, m) do_sat_bhs((int64_t)n + m, 0, UINT32_MAX) +#define DO_UQADD_B(n, m) do_usat_b((int64_t)n + m) +#define DO_UQADD_H(n, m) do_usat_h((int64_t)n + m) +#define DO_UQADD_S(n, m) do_usat_s((int64_t)n + m) static inline uint64_t do_uqadd_d(uint64_t n, uint64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_uqadd_zpzz_h, uint16_t, H1_2, DO_UQADD_H) DO_ZPZZ(sve2_uqadd_zpzz_s, uint32_t, H1_4, DO_UQADD_S) DO_ZPZZ_D(sve2_uqadd_zpzz_d, uint64_t, do_uqadd_d) -#define DO_SQSUB_B(n, m) do_sat_bhs((int64_t)n - m, INT8_MIN, INT8_MAX) -#define DO_SQSUB_H(n, m) do_sat_bhs((int64_t)n - m, INT16_MIN, INT16_MAX) -#define DO_SQSUB_S(n, m) do_sat_bhs((int64_t)n - m, INT32_MIN, INT32_MAX) +#define DO_SQSUB_B(n, m) do_ssat_b((int64_t)n - m) +#define DO_SQSUB_H(n, m) do_ssat_h((int64_t)n - m) +#define DO_SQSUB_S(n, m) do_ssat_s((int64_t)n - m) static inline int64_t do_sqsub_d(int64_t n, int64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_sqsub_zpzz_h, int16_t, H1_2, DO_SQSUB_H) DO_ZPZZ(sve2_sqsub_zpzz_s, int32_t, H1_4, DO_SQSUB_S) DO_ZPZZ_D(sve2_sqsub_zpzz_d, int64_t, do_sqsub_d) -#define DO_UQSUB_B(n, m) do_sat_bhs((int64_t)n - m, 0, UINT8_MAX) -#define DO_UQSUB_H(n, m) do_sat_bhs((int64_t)n - m, 0, UINT16_MAX) -#define DO_UQSUB_S(n, m) do_sat_bhs((int64_t)n - m, 0, UINT32_MAX) +#define DO_UQSUB_B(n, m) do_usat_b((int64_t)n - m) +#define DO_UQSUB_H(n, m) do_usat_h((int64_t)n - m) +#define DO_UQSUB_S(n, m) do_usat_s((int64_t)n - m) static inline uint64_t do_uqsub_d(uint64_t n, uint64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_uqsub_zpzz_h, uint16_t, H1_2, DO_UQSUB_H) DO_ZPZZ(sve2_uqsub_zpzz_s, uint32_t, H1_4, DO_UQSUB_S) DO_ZPZZ_D(sve2_uqsub_zpzz_d, uint64_t, do_uqsub_d) -#define DO_SUQADD_B(n, m) \ - do_sat_bhs((int64_t)(int8_t)n + m, INT8_MIN, INT8_MAX) -#define DO_SUQADD_H(n, m) \ - do_sat_bhs((int64_t)(int16_t)n + m, INT16_MIN, INT16_MAX) -#define DO_SUQADD_S(n, m) \ - do_sat_bhs((int64_t)(int32_t)n + m, INT32_MIN, INT32_MAX) +#define DO_SUQADD_B(n, m) do_ssat_b((int64_t)(int8_t)n + m) +#define DO_SUQADD_H(n, m) do_ssat_h((int64_t)(int16_t)n + m) +#define DO_SUQADD_S(n, m) do_ssat_s((int64_t)(int32_t)n + m) static inline int64_t do_suqadd_d(int64_t n, uint64_t m) { @@ -XXX,XX +XXX,XX @@ DO_ZPZZ(sve2_suqadd_zpzz_h, uint16_t, H1_2, DO_SUQADD_H) DO_ZPZZ(sve2_suqadd_zpzz_s, uint32_t, H1_4, DO_SUQADD_S) DO_ZPZZ_D(sve2_suqadd_zpzz_d, uint64_t, do_suqadd_d) -#define DO_USQADD_B(n, m) \ - do_sat_bhs((int64_t)n + (int8_t)m, 0, UINT8_MAX) -#define DO_USQADD_H(n, m) \ - do_sat_bhs((int64_t)n + (int16_t)m, 0, UINT16_MAX) -#define DO_USQADD_S(n, m) \ - do_sat_bhs((int64_t)n + (int32_t)m, 0, UINT32_MAX) +#define DO_USQADD_B(n, m) do_usat_b((int64_t)n + (int8_t)m) +#define DO_USQADD_H(n, m) do_usat_h((int64_t)n + (int16_t)m) +#define DO_USQADD_S(n, m) do_usat_s((int64_t)n + (int32_t)m) static inline uint64_t do_usqadd_d(uint64_t n, int64_t m) { @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, uint32_t desc) \ } \ } -#define DO_SQXTN_H(n) do_sat_bhs(n, INT8_MIN, INT8_MAX) -#define DO_SQXTN_S(n) do_sat_bhs(n, INT16_MIN, INT16_MAX) -#define DO_SQXTN_D(n) do_sat_bhs(n, INT32_MIN, INT32_MAX) +DO_XTNB(sve2_sqxtnb_h, int16_t, do_ssat_b) +DO_XTNB(sve2_sqxtnb_s, int32_t, do_ssat_h) +DO_XTNB(sve2_sqxtnb_d, int64_t, do_ssat_s) -DO_XTNB(sve2_sqxtnb_h, int16_t, DO_SQXTN_H) -DO_XTNB(sve2_sqxtnb_s, int32_t, DO_SQXTN_S) -DO_XTNB(sve2_sqxtnb_d, int64_t, DO_SQXTN_D) +DO_XTNT(sve2_sqxtnt_h, int16_t, int8_t, H1, do_ssat_b) +DO_XTNT(sve2_sqxtnt_s, int32_t, int16_t, H1_2, do_ssat_h) +DO_XTNT(sve2_sqxtnt_d, int64_t, int32_t, H1_4, do_ssat_s) -DO_XTNT(sve2_sqxtnt_h, int16_t, int8_t, H1, DO_SQXTN_H) -DO_XTNT(sve2_sqxtnt_s, int32_t, int16_t, H1_2, DO_SQXTN_S) -DO_XTNT(sve2_sqxtnt_d, int64_t, int32_t, H1_4, DO_SQXTN_D) +DO_XTNB(sve2_uqxtnb_h, uint16_t, do_usat_b) +DO_XTNB(sve2_uqxtnb_s, uint32_t, do_usat_h) +DO_XTNB(sve2_uqxtnb_d, uint64_t, do_usat_s) -#define DO_UQXTN_H(n) do_sat_bhs(n, 0, UINT8_MAX) -#define DO_UQXTN_S(n) do_sat_bhs(n, 0, UINT16_MAX) -#define DO_UQXTN_D(n) do_sat_bhs(n, 0, UINT32_MAX) +DO_XTNT(sve2_uqxtnt_h, uint16_t, uint8_t, H1, do_usat_b) +DO_XTNT(sve2_uqxtnt_s, uint32_t, uint16_t, H1_2, do_usat_h) +DO_XTNT(sve2_uqxtnt_d, uint64_t, uint32_t, H1_4, do_usat_s) -DO_XTNB(sve2_uqxtnb_h, uint16_t, DO_UQXTN_H) -DO_XTNB(sve2_uqxtnb_s, uint32_t, DO_UQXTN_S) -DO_XTNB(sve2_uqxtnb_d, uint64_t, DO_UQXTN_D) +DO_XTNB(sve2_sqxtunb_h, int16_t, do_usat_b) +DO_XTNB(sve2_sqxtunb_s, int32_t, do_usat_h) +DO_XTNB(sve2_sqxtunb_d, int64_t, do_usat_s) -DO_XTNT(sve2_uqxtnt_h, uint16_t, uint8_t, H1, DO_UQXTN_H) -DO_XTNT(sve2_uqxtnt_s, uint32_t, uint16_t, H1_2, DO_UQXTN_S) -DO_XTNT(sve2_uqxtnt_d, uint64_t, uint32_t, H1_4, DO_UQXTN_D) - -DO_XTNB(sve2_sqxtunb_h, int16_t, DO_UQXTN_H) -DO_XTNB(sve2_sqxtunb_s, int32_t, DO_UQXTN_S) -DO_XTNB(sve2_sqxtunb_d, int64_t, DO_UQXTN_D) - -DO_XTNT(sve2_sqxtunt_h, int16_t, int8_t, H1, DO_UQXTN_H) -DO_XTNT(sve2_sqxtunt_s, int32_t, int16_t, H1_2, DO_UQXTN_S) -DO_XTNT(sve2_sqxtunt_d, int64_t, int32_t, H1_4, DO_UQXTN_D) +DO_XTNT(sve2_sqxtunt_h, int16_t, int8_t, H1, do_usat_b) +DO_XTNT(sve2_sqxtunt_s, int32_t, int16_t, H1_2, do_usat_h) +DO_XTNT(sve2_sqxtunt_d, int64_t, int32_t, H1_4, do_usat_s) #undef DO_XTNB #undef DO_XTNT @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_rshrnt_h, uint16_t, uint8_t, H1_2, H1, do_urshr) DO_SHRNT(sve2_rshrnt_s, uint32_t, uint16_t, H1_4, H1_2, do_urshr) DO_SHRNT(sve2_rshrnt_d, uint64_t, uint32_t, H1_8, H1_4, do_urshr) -#define DO_SQSHRUN_H(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT8_MAX) -#define DO_SQSHRUN_S(x, sh) do_sat_bhs((int64_t)(x) >> sh, 0, UINT16_MAX) -#define DO_SQSHRUN_D(x, sh) \ - do_sat_bhs((int64_t)(x) >> (sh < 64 ? sh : 63), 0, UINT32_MAX) +#define DO_SQSHRUN_H(x, sh) do_usat_b((int64_t)(x) >> sh) +#define DO_SQSHRUN_S(x, sh) do_usat_h((int64_t)(x) >> sh) +#define DO_SQSHRUN_D(x, sh) do_usat_s((int64_t)(x) >> (sh < 64 ? sh : 63)) DO_SHRNB(sve2_sqshrunb_h, int16_t, uint8_t, DO_SQSHRUN_H) DO_SHRNB(sve2_sqshrunb_s, int32_t, uint16_t, DO_SQSHRUN_S) @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_sqshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRUN_H) DO_SHRNT(sve2_sqshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRUN_S) DO_SHRNT(sve2_sqshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRUN_D) -#define DO_SQRSHRUN_H(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT8_MAX) -#define DO_SQRSHRUN_S(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT16_MAX) -#define DO_SQRSHRUN_D(x, sh) do_sat_bhs(do_srshr(x, sh), 0, UINT32_MAX) +#define DO_SQRSHRUN_H(x, sh) do_usat_b(do_srshr(x, sh)) +#define DO_SQRSHRUN_S(x, sh) do_usat_h(do_srshr(x, sh)) +#define DO_SQRSHRUN_D(x, sh) do_usat_s(do_srshr(x, sh)) DO_SHRNB(sve2_sqrshrunb_h, int16_t, uint8_t, DO_SQRSHRUN_H) DO_SHRNB(sve2_sqrshrunb_s, int32_t, uint16_t, DO_SQRSHRUN_S) @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_sqrshrunt_h, int16_t, uint8_t, H1_2, H1, DO_SQRSHRUN_H) DO_SHRNT(sve2_sqrshrunt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQRSHRUN_S) DO_SHRNT(sve2_sqrshrunt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQRSHRUN_D) -#define DO_SQSHRN_H(x, sh) do_sat_bhs(x >> sh, INT8_MIN, INT8_MAX) -#define DO_SQSHRN_S(x, sh) do_sat_bhs(x >> sh, INT16_MIN, INT16_MAX) -#define DO_SQSHRN_D(x, sh) do_sat_bhs(x >> sh, INT32_MIN, INT32_MAX) +#define DO_SQSHRN_H(x, sh) do_ssat_b(x >> sh) +#define DO_SQSHRN_S(x, sh) do_ssat_h(x >> sh) +#define DO_SQSHRN_D(x, sh) do_ssat_s(x >> sh) DO_SHRNB(sve2_sqshrnb_h, int16_t, uint8_t, DO_SQSHRN_H) DO_SHRNB(sve2_sqshrnb_s, int32_t, uint16_t, DO_SQSHRN_S) @@ -XXX,XX +XXX,XX @@ DO_SHRNT(sve2_sqshrnt_h, int16_t, uint8_t, H1_2, H1, DO_SQSHRN_H) DO_SHRNT(sve2_sqshrnt_s, int32_t, uint16_t, H1_4, H1_2, DO_SQSHRN_S) DO_SHRNT(sve2_sqshrnt_d, int64_t, uint32_t, H1_8, H1_4, DO_SQSHRN_D) -#define DO_SQRSHRN_H(x, sh) do_sat_bhs(do_srshr(x, sh), INT8_MIN, INT8_MAX) -#define DO_SQRSHRN_S(x, sh) do_sat_bhs(do_srshr(x, sh), INT16_MIN, INT16_MAX) -#define DO_SQRSHRN_D(x, sh) do_sat_bhs(do_srshr(x, sh), INT32_MIN, INT32_MAX) +#define DO_SQRSHRN_H(x, sh) do_ssat_b(do_srshr(x, sh)) +#define DO_SQRSHRN_S(x, sh) do_ssat_h(do_srshr(x, sh)) +#define DO_SQRSHRN_D(x, sh) do_ssat_s(do_srshr(x, sh)) DO_SHRNB(sve2_sqrshrnb_h, int16_t, uint8_t, DO_SQRSHRN_H) DO_SHRNB(sve2_sqrshrnb_s, int32_t, uint16_t, DO_SQRSHRN_S) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-60-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 20 ++++++ target/arm/tcg/sme.decode | 22 +++++++ target/arm/tcg/sme_helper.c | 116 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 35 ++++++++++ 4 files changed, 193 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme2_fcvt_w, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_fcvtl, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_scvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_4(sme2_ucvtf, TCG_CALL_NO_RWG, void, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_3(sme2_sqcvt_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvt_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtu_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtu_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvt_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvt_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtu_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sqcvtn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvtn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtun_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvtn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtun_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqcvtun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n zd zn n @zz_1x2 ........ ... ..... ...... ..... zd:5 \ &zz_n n=1 zn=%zn_ax2 +@zz_1x4 ........ ... ..... ...... ..... zd:5 \ + &zz_n n=1 zn=%zn_ax4 @zz_2x1 ........ ... ..... ...... zn:5 ..... \ &zz_n n=1 zd=%zd_ax2 @zz_2x2 ........ ... ..... ...... .... . ..... \ @@ -XXX,XX +XXX,XX @@ FRINTM 11000001 101 01010 111000 ....0 ....0 @zz_2x2 FRINTM 11000001 101 11010 111000 ...00 ...00 @zz_4x4 FRINTA 11000001 101 01100 111000 ....0 ....0 @zz_2x2 FRINTA 11000001 101 11100 111000 ...00 ...00 @zz_4x4 + +SQCVT_sh 11000001 001 00011 111000 ....0 ..... @zz_1x2 +UQCVT_sh 11000001 001 00011 111000 ....1 ..... @zz_1x2 +SQCVTU_sh 11000001 011 00011 111000 ....0 ..... @zz_1x2 + +SQCVT_sb 11000001 001 10011 111000 ...00 ..... @zz_1x4 +UQCVT_sb 11000001 001 10011 111000 ...01 ..... @zz_1x4 +SQCVTU_sb 11000001 011 10011 111000 ...00 ..... @zz_1x4 + +SQCVT_dh 11000001 101 10011 111000 ...00 ..... @zz_1x4 +UQCVT_dh 11000001 101 10011 111000 ...01 ..... @zz_1x4 +SQCVTU_dh 11000001 111 10011 111000 ...00 ..... @zz_1x4 + +SQCVTN_sb 11000001 001 10011 111000 ...10 ..... @zz_1x4 +UQCVTN_sb 11000001 001 10011 111000 ...11 ..... @zz_1x4 +SQCVTUN_sb 11000001 011 10011 111000 ...10 ..... @zz_1x4 + +SQCVTN_dh 11000001 101 10011 111000 ...10 ..... @zz_1x4 +UQCVTN_dh 11000001 101 10011 111000 ...11 ..... @zz_1x4 +SQCVTUN_dh 11000001 111 10011 111000 ...10 ..... @zz_1x4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvt_n)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define SQCVT2(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(s0[HW(i)]); \ + d[HN(i + n)] = SAT(s1[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVT2(sme2_sqcvt_sh, int32_t, int16_t, H4, H2, do_ssat_h) +SQCVT2(sme2_uqcvt_sh, uint32_t, uint16_t, H4, H2, do_usat_h) +SQCVT2(sme2_sqcvtu_sh, int32_t, uint16_t, H4, H2, do_usat_h) + +#undef SQCVT2 + +#define SQCVT4(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(s0[HW(i)]); \ + d[HN(i + n)] = SAT(s1[HW(i)]); \ + d[HN(i + 2 * n)] = SAT(s2[HW(i)]); \ + d[HN(i + 3 * n)] = SAT(s3[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVT4(sme2_sqcvt_sb, int32_t, int8_t, H4, H2, do_ssat_b) +SQCVT4(sme2_uqcvt_sb, uint32_t, uint8_t, H4, H2, do_usat_b) +SQCVT4(sme2_sqcvtu_sb, int32_t, uint8_t, H4, H2, do_usat_b) + +SQCVT4(sme2_sqcvt_dh, int64_t, int16_t, H8, H2, do_ssat_h) +SQCVT4(sme2_uqcvt_dh, uint64_t, uint16_t, H8, H2, do_usat_h) +SQCVT4(sme2_sqcvtu_dh, int64_t, uint16_t, H8, H2, do_usat_h) + +#undef SQCVT4 + /* Convert and interleave */ void HELPER(sme2_bfcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define SQCVTN2(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(2 * i + 0)] = SAT(s0[HW(i)]); \ + d[HN(2 * i + 1)] = SAT(s1[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVTN2(sme2_sqcvtn_sh, int32_t, int16_t, H4, H2, do_ssat_h) +SQCVTN2(sme2_uqcvtn_sh, uint32_t, uint16_t, H4, H2, do_usat_h) +SQCVTN2(sme2_sqcvtun_sh, int32_t, uint16_t, H4, H2, do_usat_h) + +#undef SQCVTN2 + +#define SQCVTN4(NAME, TW, TN, HW, HN, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(4 * i + 0)] = SAT(s0[HW(i)]); \ + d[HN(4 * i + 1)] = SAT(s1[HW(i)]); \ + d[HN(4 * i + 2)] = SAT(s2[HW(i)]); \ + d[HN(4 * i + 3)] = SAT(s3[HW(i)]); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQCVTN4(sme2_sqcvtn_sb, int32_t, int8_t, H4, H1, do_ssat_b) +SQCVTN4(sme2_uqcvtn_sb, uint32_t, uint8_t, H4, H1, do_usat_b) +SQCVTN4(sme2_sqcvtun_sb, int32_t, uint8_t, H4, H1, do_usat_b) + +SQCVTN4(sme2_sqcvtn_dh, int64_t, int16_t, H8, H2, do_ssat_h) +SQCVTN4(sme2_uqcvtn_dh, uint64_t, uint16_t, H8, H2, do_usat_h) +SQCVTN4(sme2_sqcvtun_dh, int64_t, uint16_t, H8, H2, do_usat_h) + +#undef SQCVTN4 + /* Expand and convert */ void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FRINTM, aa64_sme2, do_zz_fpst, a, float_round_down, FPST_A64, gen_helper_gvec_vrint_rm_s) TRANS_FEAT(FRINTA, aa64_sme2, do_zz_fpst, a, float_round_ties_away, FPST_A64, gen_helper_gvec_vrint_rm_s) + +static bool do_zz(DisasContext *s, arg_zz_n *a, int data, + gen_helper_gvec_2 *fn) +{ + if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + + for (int i = 0, n = a->n; i < n; ++i) { + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->zd + i), + vec_full_reg_offset(s, a->zn + i), + svl, svl, data, fn); + } + } + return true; +} + +TRANS_FEAT(SQCVT_sh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvt_sh) +TRANS_FEAT(UQCVT_sh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvt_sh) +TRANS_FEAT(SQCVTU_sh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtu_sh) + +TRANS_FEAT(SQCVT_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvt_sb) +TRANS_FEAT(UQCVT_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvt_sb) +TRANS_FEAT(SQCVTU_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtu_sb) + +TRANS_FEAT(SQCVT_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvt_dh) +TRANS_FEAT(UQCVT_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvt_dh) +TRANS_FEAT(SQCVTU_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtu_dh) + +TRANS_FEAT(SQCVTN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtn_sb) +TRANS_FEAT(UQCVTN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvtn_sb) +TRANS_FEAT(SQCVTUN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_sb) + +TRANS_FEAT(SQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtn_dh) +TRANS_FEAT(UQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvtn_dh) +TRANS_FEAT(SQCVTUN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_dh) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-61-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 20 ++++++++++++++++---- target/arm/tcg/translate-sve.c | 7 +++++++ 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ # as propagated via the MOVPRFX instruction. %reg_movprfx 0:5 +%rn_ax2 6:4 !function=times_2 + ########################################################################### # Named attribute sets. These are used to make nice(er) names # when creating helpers common to those for the individual @@ -XXX,XX +XXX,XX @@ # Two operand @pd_pn ........ esz:2 .. .... ....... rn:4 . rd:4 &rr_esz @rd_rn ........ esz:2 ...... ...... rn:5 rd:5 &rr_esz +@rd_rnx2 ........ ... ..... ...... ..... rd:5 &rr_esz rn=%rn_ax2 # Two operand with governing predicate, flags setting @pd_pg_pn_s ........ . s:1 ...... .. pg:4 . rn:4 . rd:4 &rpr_s @@ -XXX,XX +XXX,XX @@ UABA 01000101 .. 0 ..... 11111 1 ..... ..... @rd_rn_rm #### SVE2 Narrowing ## SVE2 saturating extract narrow - # Bits 23, 18-16 are zero, limited in the translator via esz < 3 & imm == 0. -SQXTNB 01000101 .. 1 ..... 010 000 ..... ..... @rd_rn_tszimm_shl + +{ + SQCVTN_sh 01000101 00 1 10001 010 000 ....0 ..... @rd_rnx2 esz=1 + SQXTNB 01000101 .. 1 ..... 010 000 ..... ..... @rd_rn_tszimm_shl +} SQXTNT 01000101 .. 1 ..... 010 001 ..... ..... @rd_rn_tszimm_shl -UQXTNB 01000101 .. 1 ..... 010 010 ..... ..... @rd_rn_tszimm_shl +{ + UQCVTN_sh 01000101 00 1 10001 010 010 ....0 ..... @rd_rnx2 esz=1 + UQXTNB 01000101 .. 1 ..... 010 010 ..... ..... @rd_rn_tszimm_shl +} UQXTNT 01000101 .. 1 ..... 010 011 ..... ..... @rd_rn_tszimm_shl -SQXTUNB 01000101 .. 1 ..... 010 100 ..... ..... @rd_rn_tszimm_shl +{ + SQCVTUN_sh 01000101 00 1 10001 010 100 ....0 ..... @rd_rnx2 esz=1 + SQXTUNB 01000101 .. 1 ..... 010 100 ..... ..... @rd_rn_tszimm_shl +} SQXTUNT 01000101 .. 1 ..... 010 101 ..... ..... @rd_rn_tszimm_shl ## SVE2 bitwise shift right narrow diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, } TRANS_FEAT(UCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_uclamp, a) + +TRANS_FEAT(SQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, + gen_helper_sme2_sqcvtn_sh, a->rd, a->rn, 0) +TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, + gen_helper_sme2_uqcvtn_sh, a->rd, a->rn, 0) +TRANS_FEAT(SQCVTUN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, + gen_helper_sme2_sqcvtun_sh, a->rd, a->rn, 0) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-62-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 13 ++++++++++++ target/arm/tcg/sme.decode | 18 ++++++++++++++++ target/arm/tcg/sme_helper.c | 38 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 16 ++++++++++++++ 4 files changed, 85 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_sqcvtun_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uqcvtn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqcvtun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sunpk2_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk2_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk2_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk2_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk2_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk2_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 &zz_n n=2 zd=%zd_ax2 zn=%zn_ax2 @zz_4x4 ........ ... ..... ...... .... . ..... \ &zz_n n=4 zd=%zd_ax4 zn=%zn_ax4 +@zz_4x2_n1 ........ ... ..... ...... .... . ..... \ + &zz_n n=1 zd=%zd_ax4 zn=%zn_ax2 BFCVT 11000001 011 00000 111000 ....0 ..... @zz_1x2 BFCVTN 11000001 011 00000 111000 ....1 ..... @zz_1x2 @@ -XXX,XX +XXX,XX @@ SQCVTUN_sb 11000001 011 10011 111000 ...10 ..... @zz_1x4 SQCVTN_dh 11000001 101 10011 111000 ...10 ..... @zz_1x4 UQCVTN_dh 11000001 101 10011 111000 ...11 ..... @zz_1x4 SQCVTUN_dh 11000001 111 10011 111000 ...10 ..... @zz_1x4 + +SUNPK_2bh 11000001 011 00101 111000 ..... ....0 @zz_2x1 +SUNPK_2hs 11000001 101 00101 111000 ..... ....0 @zz_2x1 +SUNPK_2sd 11000001 111 00101 111000 ..... ....0 @zz_2x1 + +UUNPK_2bh 11000001 011 00101 111000 ..... ....1 @zz_2x1 +UUNPK_2hs 11000001 101 00101 111000 ..... ....1 @zz_2x1 +UUNPK_2sd 11000001 111 00101 111000 ..... ....1 @zz_2x1 + +SUNPK_4bh 11000001 011 10101 111000 ....0 ...00 @zz_4x2_n1 +SUNPK_4hs 11000001 101 10101 111000 ....0 ...00 @zz_4x2_n1 +SUNPK_4sd 11000001 111 10101 111000 ....0 ...00 @zz_4x2_n1 + +UUNPK_4bh 11000001 011 10101 111000 ....0 ...01 @zz_4x2_n1 +UUNPK_4hs 11000001 101 10101 111000 ....0 ...01 @zz_4x2_n1 +UUNPK_4sd 11000001 111 10101 111000 ....0 ...01 @zz_4x2_n1 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define UNPK(NAME, SREG, TW, TN, HW, HN) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch[SREG]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t n = oprsz / sizeof(TW); \ + if (vectors_overlap(vd, 2 * SREG, vs, SREG)) { \ + vs = memcpy(scratch, vs, sizeof(scratch)); \ + } \ + for (size_t r = 0; r < SREG; ++r) { \ + TN *s = vs + r * sizeof(ARMVectorReg); \ + for (size_t i = 0; i < 2; ++i) { \ + TW *d = vd + (2 * r + i) * sizeof(ARMVectorReg); \ + for (size_t e = 0; e < n; ++e) { \ + d[HW(e)] = s[HN(i * n + e)]; \ + } \ + } \ + } \ +} + +UNPK(sme2_sunpk2_bh, 1, int16_t, int8_t, H2, H1) +UNPK(sme2_sunpk2_hs, 1, int32_t, int16_t, H4, H2) +UNPK(sme2_sunpk2_sd, 1, int64_t, int32_t, H8, H4) + +UNPK(sme2_sunpk4_bh, 2, int16_t, int8_t, H2, H1) +UNPK(sme2_sunpk4_hs, 2, int32_t, int16_t, H4, H2) +UNPK(sme2_sunpk4_sd, 2, int64_t, int32_t, H8, H4) + +UNPK(sme2_uunpk2_bh, 1, uint16_t, uint8_t, H2, H1) +UNPK(sme2_uunpk2_hs, 1, uint32_t, uint16_t, H4, H2) +UNPK(sme2_uunpk2_sd, 1, uint64_t, uint32_t, H8, H4) + +UNPK(sme2_uunpk4_bh, 2, uint16_t, uint8_t, H2, H1) +UNPK(sme2_uunpk4_hs, 2, uint32_t, uint16_t, H4, H2) +UNPK(sme2_uunpk4_sd, 2, uint64_t, uint32_t, H8, H4) + +#undef UNPK + /* Deinterleave and convert. */ void HELPER(sme2_fcvtl)(void *vd, void *vs, float_status *fpst, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQCVTUN_sb, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_sb) TRANS_FEAT(SQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtn_dh) TRANS_FEAT(UQCVTN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uqcvtn_dh) TRANS_FEAT(SQCVTUN_dh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sqcvtun_dh) + +TRANS_FEAT(SUNPK_2bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk2_bh) +TRANS_FEAT(SUNPK_2hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk2_hs) +TRANS_FEAT(SUNPK_2sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk2_sd) + +TRANS_FEAT(SUNPK_4bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk4_bh) +TRANS_FEAT(SUNPK_4hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk4_hs) +TRANS_FEAT(SUNPK_4sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_sunpk4_sd) + +TRANS_FEAT(UUNPK_2bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_bh) +TRANS_FEAT(UUNPK_2hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_hs) +TRANS_FEAT(UUNPK_2sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_sd) + +TRANS_FEAT(UUNPK_4bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_bh) +TRANS_FEAT(UUNPK_4hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_hs) +TRANS_FEAT(UUNPK_4sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_sd) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-63-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 12 ++++++ target/arm/tcg/sme.decode | 11 ++++++ target/arm/tcg/sme_helper.c | 68 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 39 +++++++++++++++++++ 4 files changed, 130 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_uunpk2_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_zip4_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_zip4_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_uzp4_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uzp4_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ SUB_aaz_d 11000001 111 000010 .. 111 ...00 11 ... @az_4x4_o3 ### SME2 Multi-vector SVE Constructive Unary +&zz_e zd zn esz &zz_n zd zn n @zz_1x2 ........ ... ..... ...... ..... zd:5 \ &zz_n n=1 zn=%zn_ax2 @@ -XXX,XX +XXX,XX @@ SUNPK_4sd 11000001 111 10101 111000 ....0 ...00 @zz_4x2_n1 UUNPK_4bh 11000001 011 10101 111000 ....0 ...01 @zz_4x2_n1 UUNPK_4hs 11000001 101 10101 111000 ....0 ...01 @zz_4x2_n1 UUNPK_4sd 11000001 111 10101 111000 ....0 ...01 @zz_4x2_n1 + +ZIP_4 11000001 esz:2 1 10110 111000 ...00 ... 00 \ + &zz_e zd=%zd_ax4 zn=%zn_ax4 +ZIP_4 11000001 001 10111 111000 ...00 ... 00 \ + &zz_e esz=4 zd=%zd_ax4 zn=%zn_ax4 + +UZP_4 11000001 esz:2 1 10110 111000 ...00 ... 10 \ + &zz_e zd=%zd_ax4 zn=%zn_ax4 +UZP_4 11000001 001 10111 111000 ...00 ... 10 \ + &zz_e esz=4 zd=%zd_ax4 zn=%zn_ax4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_ucvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) d[i] = uint32_to_float32(s[i], fpst); } } + +#define ZIP4(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch[4]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t quads = oprsz / (sizeof(TYPE) * 4); \ + TYPE *s0, *s1, *s2, *s3; \ + if (vs == vd) { \ + vs = memcpy(scratch, vs, sizeof(scratch)); \ + } \ + s0 = vs; \ + s1 = vs + sizeof(ARMVectorReg); \ + s2 = vs + 2 * sizeof(ARMVectorReg); \ + s3 = vs + 3 * sizeof(ARMVectorReg); \ + for (size_t r = 0; r < 4; ++r) { \ + TYPE *d = vd + r * sizeof(ARMVectorReg); \ + size_t base = r * quads; \ + for (size_t q = 0; q < quads; ++q) { \ + d[H(4 * q + 0)] = s0[base + H(q)]; \ + d[H(4 * q + 1)] = s1[base + H(q)]; \ + d[H(4 * q + 2)] = s2[base + H(q)]; \ + d[H(4 * q + 3)] = s3[base + H(q)]; \ + } \ + } \ +} + +ZIP4(sme2_zip4_b, uint8_t, H1) +ZIP4(sme2_zip4_h, uint16_t, H2) +ZIP4(sme2_zip4_s, uint32_t, H4) +ZIP4(sme2_zip4_d, uint64_t, ) +ZIP4(sme2_zip4_q, Int128, ) + +#undef ZIP4 + +#define UZP4(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch[4]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t quads = oprsz / (sizeof(TYPE) * 4); \ + TYPE *d0, *d1, *d2, *d3; \ + if (vs == vd) { \ + vs = memcpy(scratch, vs, sizeof(scratch)); \ + } \ + d0 = vd; \ + d1 = vd + sizeof(ARMVectorReg); \ + d2 = vd + 2 * sizeof(ARMVectorReg); \ + d3 = vd + 3 * sizeof(ARMVectorReg); \ + for (size_t r = 0; r < 4; ++r) { \ + TYPE *s = vs + r * sizeof(ARMVectorReg); \ + size_t base = r * quads; \ + for (size_t q = 0; q < quads; ++q) { \ + d0[base + H(q)] = s[H(4 * q + 0)]; \ + d1[base + H(q)] = s[H(4 * q + 1)]; \ + d2[base + H(q)] = s[H(4 * q + 2)]; \ + d3[base + H(q)] = s[H(4 * q + 3)]; \ + } \ + } \ +} + +UZP4(sme2_uzp4_b, uint8_t, H1) +UZP4(sme2_uzp4_h, uint16_t, H2) +UZP4(sme2_uzp4_s, uint32_t, H4) +UZP4(sme2_uzp4_d, uint64_t, ) +UZP4(sme2_uzp4_q, Int128, ) + +#undef UZP4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UUNPK_2sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk2_sd) TRANS_FEAT(UUNPK_4bh, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_bh) TRANS_FEAT(UUNPK_4hs, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_hs) TRANS_FEAT(UUNPK_4sd, aa64_sme2, do_zz, a, 0, gen_helper_sme2_uunpk4_sd) + +static bool do_zipuzp_4(DisasContext *s, arg_zz_e *a, + gen_helper_gvec_2 * const fn[5]) +{ + int bytes_per_op = 4 << a->esz; + + /* Both MO_64 and MO_128 can fail the size test. */ + if (s->max_svl < bytes_per_op) { + unallocated_encoding(s); + } else if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + if (svl < bytes_per_op) { + unallocated_encoding(s); + } else { + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + svl, svl, 0, fn[a->esz]); + } + } + return true; +} + +static gen_helper_gvec_2 * const zip4_fns[] = { + gen_helper_sme2_zip4_b, + gen_helper_sme2_zip4_h, + gen_helper_sme2_zip4_s, + gen_helper_sme2_zip4_d, + gen_helper_sme2_zip4_q, +}; +TRANS_FEAT(ZIP_4, aa64_sme2, do_zipuzp_4, a, zip4_fns) + +static gen_helper_gvec_2 * const uzp4_fns[] = { + gen_helper_sme2_uzp4_b, + gen_helper_sme2_uzp4_h, + gen_helper_sme2_uzp4_s, + gen_helper_sme2_uzp4_d, + gen_helper_sme2_uzp4_q, +}; +TRANS_FEAT(UZP_4, aa64_sme2, do_zipuzp_4, a, uzp4_fns) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Unify two copies of these inline functions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-64-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/vec_internal.h | 21 +++++++++++++++++++++ target/arm/tcg/mve_helper.c | 21 --------------------- target/arm/tcg/sve_helper.c | 21 --------------------- 3 files changed, 21 insertions(+), 42 deletions(-) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool); #define do_usat_h(val) MIN(MAX(val, 0), UINT16_MAX) #define do_usat_s(val) MIN(MAX(val, 0), UINT32_MAX) +static inline uint64_t do_urshr(uint64_t x, unsigned sh) +{ + if (likely(sh < 64)) { + return (x >> sh) + ((x >> (sh - 1)) & 1); + } else if (sh == 64) { + return x >> 63; + } else { + return 0; + } +} + +static inline int64_t do_srshr(int64_t x, unsigned sh) +{ + if (likely(sh < 64)) { + return (x >> sh) + ((x >> (sh - 1)) & 1); + } else { + /* Rounding the sign bit always produces 0. */ + return 0; + } +} + /** * bfdotadd: * @sum: addend diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/mve_helper.c +++ b/target/arm/tcg/mve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VSHLL_ALL(vshllt, true) DO_VSHRN(OP##tb, true, 1, uint8_t, 2, uint16_t, FN) \ DO_VSHRN(OP##th, true, 2, uint16_t, 4, uint32_t, FN) -static inline uint64_t do_urshr(uint64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else if (sh == 64) { - return x >> 63; - } else { - return 0; - } -} - -static inline int64_t do_srshr(int64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else { - /* Rounding the sign bit always produces 0. */ - return 0; - } -} - DO_VSHRN_ALL(vshrn, DO_SHR) DO_VSHRN_ALL(vrshrn, do_urshr) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc) \ when N is negative, add 2**M-1. */ #define DO_ASRD(N, M) ((N + (N < 0 ? ((__typeof(N))1 << M) - 1 : 0)) >> M) -static inline uint64_t do_urshr(uint64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else if (sh == 64) { - return x >> 63; - } else { - return 0; - } -} - -static inline int64_t do_srshr(int64_t x, unsigned sh) -{ - if (likely(sh < 64)) { - return (x >> sh) + ((x >> (sh - 1)) & 1); - } else { - /* Rounding the sign bit always produces 0. */ - return 0; - } -} - DO_ZPZI(sve_asr_zpzi_b, int8_t, H1, DO_SHR) DO_ZPZI(sve_asr_zpzi_h, int16_t, H1_2, DO_SHR) DO_ZPZI(sve_asr_zpzi_s, int32_t, H1_4, DO_SHR) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-65-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 20 ++++++ target/arm/tcg/sme.decode | 37 ++++++++++ target/arm/tcg/sme_helper.c | 120 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 33 +++++++++ 4 files changed, 210 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_uzp4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uzp4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uzp4_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uzp4_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sqrshr_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshr_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshru_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshr_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshr_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshru_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshr_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshr_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshru_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(sme2_sqrshrn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshrn_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrun_sh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshrn_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrun_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_uqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2_sqrshrun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UZP_4 11000001 esz:2 1 10110 111000 ...00 ... 10 \ &zz_e zd=%zd_ax4 zn=%zn_ax4 UZP_4 11000001 001 10111 111000 ...00 ... 10 \ &zz_e esz=4 zd=%zd_ax4 zn=%zn_ax4 + +### SME2 Multi-vector SVE Constructive Binary + +&rshr zd zn shift + +%rshr_sh_shift 16:4 !function=rsub_16 +%rshr_sb_shift 16:5 !function=rsub_32 +%rshr_dh_shift 22:1 16:5 !function=rsub_64 + +@rshr_sh ........ .... .... ...... ..... zd:5 \ + &rshr zn=%zn_ax2 shift=%rshr_sh_shift +@rshr_sb ........ ... ..... ...... ..... zd:5 \ + &rshr zn=%zn_ax4 shift=%rshr_sb_shift +@rshr_dh ........ ... ..... ...... ..... zd:5 \ + &rshr zn=%zn_ax4 shift=%rshr_dh_shift + +SQRSHR_sh 11000001 1110 .... 110101 ....0 ..... @rshr_sh +UQRSHR_sh 11000001 1110 .... 110101 ....1 ..... @rshr_sh +SQRSHRU_sh 11000001 1111 .... 110101 ....0 ..... @rshr_sh + +SQRSHR_sb 11000001 011 ..... 110110 ...00 ..... @rshr_sb +SQRSHR_dh 11000001 1.1 ..... 110110 ...00 ..... @rshr_dh +UQRSHR_sb 11000001 011 ..... 110110 ...01 ..... @rshr_sb +UQRSHR_dh 11000001 1.1 ..... 110110 ...01 ..... @rshr_dh +SQRSHRU_sb 11000001 011 ..... 110110 ...10 ..... @rshr_sb +SQRSHRU_dh 11000001 1.1 ..... 110110 ...10 ..... @rshr_dh + +SQRSHRN_sh 01000101 1011 .... 001010 ....0 ..... @rshr_sh +UQRSHRN_sh 01000101 1011 .... 001110 ....0 ..... @rshr_sh +SQRSHRUN_sh 01000101 1011 .... 000010 ....0 ..... @rshr_sh + +SQRSHRN_sb 11000001 011 ..... 110111 ...00 ..... @rshr_sb +SQRSHRN_dh 11000001 1.1 ..... 110111 ...00 ..... @rshr_dh +UQRSHRN_sb 11000001 011 ..... 110111 ...01 ..... @rshr_sb +UQRSHRN_dh 11000001 1.1 ..... 110111 ...01 ..... @rshr_dh +SQRSHRUN_sb 11000001 011 ..... 110111 ...10 ..... @rshr_sb +SQRSHRUN_dh 11000001 1.1 ..... 110111 ...10 ..... @rshr_dh diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ SQCVT4(sme2_sqcvtu_dh, int64_t, uint16_t, H8, H2, do_usat_h) #undef SQCVT4 +#define SQRSHR2(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(i + n)] = SAT(RSHR(s1[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHR2(sme2_sqrshr_sh, int32_t, int16_t, H4, H2, do_srshr, do_ssat_h) +SQRSHR2(sme2_uqrshr_sh, uint32_t, uint16_t, H4, H2, do_urshr, do_usat_h) +SQRSHR2(sme2_sqrshru_sh, int32_t, uint16_t, H4, H2, do_srshr, do_usat_h) + +#undef SQRSHR2 + +#define SQRSHR4(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(i)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(i + n)] = SAT(RSHR(s1[HW(i)], shift)); \ + d[HN(i + 2 * n)] = SAT(RSHR(s2[HW(i)], shift)); \ + d[HN(i + 3 * n)] = SAT(RSHR(s3[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHR4(sme2_sqrshr_sb, int32_t, int8_t, H4, H2, do_srshr, do_ssat_b) +SQRSHR4(sme2_uqrshr_sb, uint32_t, uint8_t, H4, H2, do_urshr, do_usat_b) +SQRSHR4(sme2_sqrshru_sb, int32_t, uint8_t, H4, H2, do_srshr, do_usat_b) + +SQRSHR4(sme2_sqrshr_dh, int64_t, int16_t, H8, H2, do_srshr, do_ssat_h) +SQRSHR4(sme2_uqrshr_dh, uint64_t, uint16_t, H8, H2, do_urshr, do_usat_h) +SQRSHR4(sme2_sqrshru_dh, int64_t, uint16_t, H8, H2, do_srshr, do_usat_h) + +#undef SQRSHR4 + /* Convert and interleave */ void HELPER(sme2_bfcvtn)(void *vd, void *vs, float_status *fpst, uint32_t desc) { @@ -XXX,XX +XXX,XX @@ SQCVTN4(sme2_sqcvtun_dh, int64_t, uint16_t, H8, H2, do_usat_h) #undef SQCVTN4 +#define SQRSHRN2(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 2)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(2 * i + 0)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(2 * i + 1)] = SAT(RSHR(s1[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHRN2(sme2_sqrshrn_sh, int32_t, int16_t, H4, H2, do_srshr, do_ssat_h) +SQRSHRN2(sme2_uqrshrn_sh, uint32_t, uint16_t, H4, H2, do_urshr, do_usat_h) +SQRSHRN2(sme2_sqrshrun_sh, int32_t, uint16_t, H4, H2, do_srshr, do_usat_h) + +#undef SQRSHRN2 + +#define SQRSHRN4(NAME, TW, TN, HW, HN, RSHR, SAT) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + ARMVectorReg scratch; \ + size_t oprsz = simd_oprsz(desc), n = oprsz / sizeof(TW); \ + int shift = simd_data(desc); \ + TW *s0 = vs, *s1 = vs + sizeof(ARMVectorReg); \ + TW *s2 = vs + 2 * sizeof(ARMVectorReg); \ + TW *s3 = vs + 3 * sizeof(ARMVectorReg); \ + TN *d = vd; \ + if (vectors_overlap(vd, 1, vs, 4)) { \ + d = (TN *)&scratch; \ + } \ + for (size_t i = 0; i < n; ++i) { \ + d[HN(4 * i + 0)] = SAT(RSHR(s0[HW(i)], shift)); \ + d[HN(4 * i + 1)] = SAT(RSHR(s1[HW(i)], shift)); \ + d[HN(4 * i + 2)] = SAT(RSHR(s2[HW(i)], shift)); \ + d[HN(4 * i + 3)] = SAT(RSHR(s3[HW(i)], shift)); \ + } \ + if (d != vd) { \ + memcpy(vd, d, oprsz); \ + } \ +} + +SQRSHRN4(sme2_sqrshrn_sb, int32_t, int8_t, H4, H1, do_srshr, do_ssat_b) +SQRSHRN4(sme2_uqrshrn_sb, uint32_t, uint8_t, H4, H1, do_urshr, do_usat_b) +SQRSHRN4(sme2_sqrshrun_sb, int32_t, uint8_t, H4, H1, do_srshr, do_usat_b) + +SQRSHRN4(sme2_sqrshrn_dh, int64_t, int16_t, H8, H2, do_srshr, do_ssat_h) +SQRSHRN4(sme2_uqrshrn_dh, uint64_t, uint16_t, H8, H2, do_urshr, do_usat_h) +SQRSHRN4(sme2_sqrshrun_dh, int64_t, uint16_t, H8, H2, do_srshr, do_usat_h) + +#undef SQRSHRN4 + /* Expand and convert */ void HELPER(sme2_fcvt_w)(void *vd, void *vs, float_status *fpst, uint32_t desc) { diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2 * const uzp4_fns[] = { gen_helper_sme2_uzp4_q, }; TRANS_FEAT(UZP_4, aa64_sme2, do_zipuzp_4, a, uzp4_fns) + +static bool do_zz_rshr(DisasContext *s, arg_rshr *a, gen_helper_gvec_2 *fn) +{ + if (sve_access_check(s)) { + int vl = vec_full_reg_size(s); + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vl, vl, a->shift, fn); + } + return true; +} + +TRANS_FEAT(SQRSHR_sh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshr_sh) +TRANS_FEAT(UQRSHR_sh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshr_sh) +TRANS_FEAT(SQRSHRU_sh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshru_sh) + +TRANS_FEAT(SQRSHR_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshr_sb) +TRANS_FEAT(SQRSHR_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshr_dh) +TRANS_FEAT(UQRSHR_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshr_sb) +TRANS_FEAT(UQRSHR_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshr_dh) +TRANS_FEAT(SQRSHRU_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshru_sb) +TRANS_FEAT(SQRSHRU_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshru_dh) + +TRANS_FEAT(SQRSHRN_sh, aa64_sme2_or_sve2p1, do_zz_rshr, a, gen_helper_sme2_sqrshrn_sh) +TRANS_FEAT(UQRSHRN_sh, aa64_sme2_or_sve2p1, do_zz_rshr, a, gen_helper_sme2_uqrshrn_sh) +TRANS_FEAT(SQRSHRUN_sh, aa64_sme2_or_sve2p1, do_zz_rshr, a, gen_helper_sme2_sqrshrun_sh) + +TRANS_FEAT(SQRSHRN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrn_sb) +TRANS_FEAT(SQRSHRN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrn_dh) +TRANS_FEAT(UQRSHRN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_sb) +TRANS_FEAT(UQRSHRN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_dh) +TRANS_FEAT(SQRSHRUN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_sb) +TRANS_FEAT(SQRSHRUN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_dh) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-66-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 12 +++++++ target/arm/tcg/sme.decode | 12 +++++++ target/arm/tcg/sme_helper.c | 62 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 40 ++++++++++++++++++++++ 4 files changed, 126 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_uunpk4_bh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_hs, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uunpk4_sd, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_zip2_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_uzp2_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uzp2_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(sme2_zip4_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_zip4_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_zip4_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UQRSHRN_sb 11000001 011 ..... 110111 ...01 ..... @rshr_sb UQRSHRN_dh 11000001 1.1 ..... 110111 ...01 ..... @rshr_dh SQRSHRUN_sb 11000001 011 ..... 110111 ...10 ..... @rshr_sb SQRSHRUN_dh 11000001 1.1 ..... 110111 ...10 ..... @rshr_dh + +&zzz_e zd zn zm esz + +ZIP_2 11000001 esz:2 1 zm:5 110100 zn:5 .... 0 \ + &zzz_e zd=%zd_ax2 +ZIP_2 11000001 00 1 zm:5 110101 zn:5 .... 0 \ + &zzz_e zd=%zd_ax2 esz=4 + +UZP_2 11000001 esz:2 1 zm:5 110100 zn:5 .... 1 \ + &zzz_e zd=%zd_ax2 +UZP_2 11000001 00 1 zm:5 110101 zn:5 .... 1 \ + &zzz_e zd=%zd_ax2 esz=4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_ucvtf)(void *vd, void *vs, float_status *fpst, uint32_t desc) } } +#define ZIP2(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + ARMVectorReg scratch[2]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t pairs = oprsz / (sizeof(TYPE) * 2); \ + TYPE *n = vn, *m = vm; \ + if (vectors_overlap(vd, 2, vn, 1)) { \ + n = memcpy(&scratch[0], vn, oprsz); \ + } \ + if (vectors_overlap(vd, 2, vm, 1)) { \ + m = memcpy(&scratch[1], vm, oprsz); \ + } \ + for (size_t r = 0; r < 2; ++r) { \ + TYPE *d = vd + r * sizeof(ARMVectorReg); \ + size_t base = r * pairs; \ + for (size_t p = 0; p < pairs; ++p) { \ + d[H(2 * p + 0)] = n[base + H(p)]; \ + d[H(2 * p + 1)] = m[base + H(p)]; \ + } \ + } \ +} + +ZIP2(sme2_zip2_b, uint8_t, H1) +ZIP2(sme2_zip2_h, uint16_t, H2) +ZIP2(sme2_zip2_s, uint32_t, H4) +ZIP2(sme2_zip2_d, uint64_t, ) +ZIP2(sme2_zip2_q, Int128, ) + +#undef ZIP2 + #define ZIP4(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ { \ @@ -XXX,XX +XXX,XX @@ ZIP4(sme2_zip4_q, Int128, ) #undef ZIP4 +#define UZP2(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + ARMVectorReg scratch[2]; \ + size_t oprsz = simd_oprsz(desc); \ + size_t pairs = oprsz / (sizeof(TYPE) * 2); \ + TYPE *d0 = vd, *d1 = vd + sizeof(ARMVectorReg); \ + if (vectors_overlap(vd, 2, vn, 1)) { \ + vn = memcpy(&scratch[0], vn, oprsz); \ + } \ + if (vectors_overlap(vd, 2, vm, 1)) { \ + vm = memcpy(&scratch[1], vm, oprsz); \ + } \ + for (size_t r = 0; r < 2; ++r) { \ + TYPE *s = r ? vm : vn; \ + size_t base = r * pairs; \ + for (size_t p = 0; p < pairs; ++p) { \ + d0[base + H(p)] = s[H(2 * p + 0)]; \ + d1[base + H(p)] = s[H(2 * p + 1)]; \ + } \ + } \ +} + +UZP2(sme2_uzp2_b, uint8_t, H1) +UZP2(sme2_uzp2_h, uint16_t, H2) +UZP2(sme2_uzp2_s, uint32_t, H4) +UZP2(sme2_uzp2_d, uint64_t, ) +UZP2(sme2_uzp2_q, Int128, ) + +#undef UZP2 + #define UZP4(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ { \ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UQRSHRN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_sb) TRANS_FEAT(UQRSHRN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_uqrshrn_dh) TRANS_FEAT(SQRSHRUN_sb, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_sb) TRANS_FEAT(SQRSHRUN_dh, aa64_sme2, do_zz_rshr, a, gen_helper_sme2_sqrshrun_dh) + +static bool do_zipuzp_2(DisasContext *s, arg_zzz_e *a, + gen_helper_gvec_3 * const fn[5]) +{ + int bytes_per_op = 2 << a->esz; + + /* MO_128 can fail the size test. */ + if (s->max_svl < bytes_per_op) { + unallocated_encoding(s); + } else if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + if (svl < bytes_per_op) { + unallocated_encoding(s); + } else { + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vec_full_reg_offset(s, a->zm), + svl, svl, 0, fn[a->esz]); + } + } + return true; +} + +static gen_helper_gvec_3 * const zip2_fns[] = { + gen_helper_sme2_zip2_b, + gen_helper_sme2_zip2_h, + gen_helper_sme2_zip2_s, + gen_helper_sme2_zip2_d, + gen_helper_sme2_zip2_q, +}; +TRANS_FEAT(ZIP_2, aa64_sme2, do_zipuzp_2, a, zip2_fns) + +static gen_helper_gvec_3 * const uzp2_fns[] = { + gen_helper_sme2_uzp2_b, + gen_helper_sme2_uzp2_h, + gen_helper_sme2_uzp2_s, + gen_helper_sme2_uzp2_d, + gen_helper_sme2_uzp2_q, +}; +TRANS_FEAT(UZP_2, aa64_sme2, do_zipuzp_2, a, uzp2_fns) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-67-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 15 +++++++ target/arm/tcg/sme.decode | 17 ++++++++ target/arm/tcg/sme_helper.c | 56 +++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 75 ++++++++++++++++++++++++++++++++++ 4 files changed, 163 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_sqrshrun_sb, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_uqrshrn_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_sqrshrun_dh, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_sclamp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_sclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_sclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_sclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_uclamp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sme2_uclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sme2_fclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sme2_fclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sme2_fclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sme2_bfclamp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UZP_2 11000001 esz:2 1 zm:5 110100 zn:5 .... 1 \ &zzz_e zd=%zd_ax2 UZP_2 11000001 00 1 zm:5 110101 zn:5 .... 1 \ &zzz_e zd=%zd_ax2 esz=4 + +&zzz_en zd zn zm esz n + +FCLAMP 11000001 esz:2 1 zm:5 110000 zn:5 .... 0 \ + &zzz_en zd=%zd_ax2 n=2 +FCLAMP 11000001 esz:2 1 zm:5 110010 zn:5 ...0 0 \ + &zzz_en zd=%zd_ax4 n=4 + +SCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 0 \ + &zzz_en zd=%zd_ax2 n=2 +SCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 0 \ + &zzz_en zd=%zd_ax4 n=4 + +UCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 1 \ + &zzz_en zd=%zd_ax2 n=2 +UCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 1 \ + &zzz_en zd=%zd_ax4 n=4 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ UZP4(sme2_uzp4_d, uint64_t, ) UZP4(sme2_uzp4_q, Int128, ) #undef UZP4 + +#define ICLAMP(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ +{ \ + size_t stride = sizeof(ARMVectorReg) / sizeof(TYPE); \ + size_t elements = simd_oprsz(desc) / sizeof(TYPE); \ + size_t nreg = simd_data(desc); \ + TYPE *d = vd, *n = vn, *m = vm; \ + for (size_t e = 0; e < elements; e++) { \ + TYPE nn = n[H(e)], mm = m[H(e)]; \ + for (size_t r = 0; r < nreg; r++) { \ + TYPE *dd = &d[r * stride + H(e)]; \ + *dd = MIN(MAX(*dd, nn), mm); \ + } \ + } \ +} + +ICLAMP(sme2_sclamp_b, int8_t, H1) +ICLAMP(sme2_sclamp_h, int16_t, H2) +ICLAMP(sme2_sclamp_s, int32_t, H4) +ICLAMP(sme2_sclamp_d, int64_t, H8) + +ICLAMP(sme2_uclamp_b, uint8_t, H1) +ICLAMP(sme2_uclamp_h, uint16_t, H2) +ICLAMP(sme2_uclamp_s, uint32_t, H4) +ICLAMP(sme2_uclamp_d, uint64_t, H8) + +#undef ICLAMP + +/* + * Note the argument ordering to minnum and maxnum must match + * the ARM pseudocode so that NaNs are propagated properly. + */ +#define FCLAMP(NAME, TYPE, H) \ +void HELPER(NAME)(void *vd, void *vn, void *vm, \ + float_status *fpst, uint32_t desc) \ +{ \ + size_t stride = sizeof(ARMVectorReg) / sizeof(TYPE); \ + size_t elements = simd_oprsz(desc) / sizeof(TYPE); \ + size_t nreg = simd_data(desc); \ + TYPE *d = vd, *n = vn, *m = vm; \ + for (size_t e = 0; e < elements; e++) { \ + TYPE nn = n[H(e)], mm = m[H(e)]; \ + for (size_t r = 0; r < nreg; r++) { \ + TYPE *dd = &d[r * stride + H(e)]; \ + *dd = TYPE##_minnum(TYPE##_maxnum(nn, *dd, fpst), mm, fpst); \ + } \ + } \ +} + +FCLAMP(sme2_fclamp_h, float16, H2) +FCLAMP(sme2_fclamp_s, float32, H4) +FCLAMP(sme2_fclamp_d, float64, H8) +FCLAMP(sme2_bfclamp, bfloat16, H2) + +#undef FCLAMP diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const uzp2_fns[] = { gen_helper_sme2_uzp2_q, }; TRANS_FEAT(UZP_2, aa64_sme2, do_zipuzp_2, a, uzp2_fns) + +static bool trans_FCLAMP(DisasContext *s, arg_zzz_en *a) +{ + static gen_helper_gvec_3_ptr * const fn[] = { + gen_helper_sme2_bfclamp, + gen_helper_sme2_fclamp_h, + gen_helper_sme2_fclamp_s, + gen_helper_sme2_fclamp_d, + }; + TCGv_ptr fpst; + int vl; + + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + /* This insn uses MO_8 to encode BFloat16. */ + if (a->esz == MO_8 && !dc_isar_feature(aa64_sme_b16b16, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + fpst = fpstatus_ptr(a->esz == MO_16 ? FPST_A64_F16 : FPST_A64); + vl = vec_full_reg_size(s); + + tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vec_full_reg_offset(s, a->zm), + fpst, vl, vl, a->n, fn[a->esz]); + return true; +} + +static bool do_clamp(DisasContext *s, arg_zzz_en *a, + gen_helper_gvec_3 * const fn[4]) +{ + int vl; + + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + if (!sme_sm_enabled_check(s)) { + return true; + } + + /* + * Clamp is just a min+max, easily supported by most host + * vector operations -- we already have such an expansion in + * translate-sve.c for a single output. + * TODO: Add support in gvec for multiple simultaneous output, + * and/or copy to temporary upon overlap. + */ + vl = vec_full_reg_size(s); + tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + vec_full_reg_offset(s, a->zm), + vl, vl, a->n, fn[a->esz]); + return true; +} + +static gen_helper_gvec_3 * const sclamp_fns[] = { + gen_helper_sme2_sclamp_b, + gen_helper_sme2_sclamp_h, + gen_helper_sme2_sclamp_s, + gen_helper_sme2_sclamp_d, +}; +TRANS(SCLAMP, do_clamp, a, sclamp_fns) + +static gen_helper_gvec_3 * const uclamp_fns[] = { + gen_helper_sme2_uclamp_b, + gen_helper_sme2_uclamp_h, + gen_helper_sme2_uclamp_s, + gen_helper_sme2_uclamp_d, +}; +TRANS(UCLAMP, do_clamp, a, uclamp_fns) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> These instructions are present in both SME(1) and SVE2.1 extensions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-68-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static void gen_sclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &ops[vece]); } -TRANS_FEAT(SCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_sclamp, a) +TRANS_FEAT(SCLAMP, aa64_sme_or_sve2p1, gen_gvec_fn_arg_zzzz, gen_sclamp, a) static void gen_uclamp_i32(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_i32 a) { @@ -XXX,XX +XXX,XX @@ static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &ops[vece]); } -TRANS_FEAT(UCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_uclamp, a) +TRANS_FEAT(UCLAMP, aa64_sme_or_sve2p1, gen_gvec_fn_arg_zzzz, gen_uclamp, a) TRANS_FEAT(SQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_sqcvtn_sh, a->rd, a->rn, 0) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> This is the single vector version within SVE decode space. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-69-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 2 ++ target/arm/tcg/translate-sve.c | 22 ++++++++++++++++++++++ 2 files changed, 24 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ PSEL 00100101 .1 1 000 .. 01 .... 0 .... 0 .... \ SCLAMP 01000100 .. 0 ..... 110000 ..... ..... @rda_rn_rm UCLAMP 01000100 .. 0 ..... 110001 ..... ..... @rda_rn_rm + +FCLAMP 01100100 .. 1 ..... 001001 ..... ..... @rda_rn_rm diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m, TRANS_FEAT(UCLAMP, aa64_sme_or_sve2p1, gen_gvec_fn_arg_zzzz, gen_uclamp, a) +static bool trans_FCLAMP(DisasContext *s, arg_FCLAMP *a) +{ + static gen_helper_gvec_3_ptr * const fn[] = { + gen_helper_sme2_bfclamp, + gen_helper_sme2_fclamp_h, + gen_helper_sme2_fclamp_s, + gen_helper_sme2_fclamp_d, + }; + + /* This insn uses MO_8 to encode BFloat16. */ + if (a->esz == MO_8 + ? !dc_isar_feature(aa64_sve_b16b16, s) + : !dc_isar_feature(aa64_sme2_or_sve2p1, s)) { + return false; + } + + /* So far we never optimize rda with MOVPRFX */ + assert(a->rd == a->ra); + return gen_gvec_fpst_zzz(s, fn[a->esz], a->rd, a->rn, a->rm, 1, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64); +} + TRANS_FEAT(SQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_sqcvtn_sh, a->rd, a->rn, 0) TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-70-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sme.decode | 23 +++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 20 ++++++++++++++++++++ 2 files changed, 43 insertions(+) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 1 \ &zzz_en zd=%zd_ax2 n=2 UCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 1 \ &zzz_en zd=%zd_ax4 n=4 + +### SME Multiple Zero + +&zero_za rv off ngrp nvec + +ZERO_za 11000000 000011 000 .. 0000000000 off:3 \ + &zero_za ngrp=2 nvec=1 rv=%mova_rv +ZERO_za 11000000 000011 100 .. 0000000000 off:3 \ + &zero_za ngrp=4 nvec=1 rv=%mova_rv + +ZERO_za 11000000 000011 001 .. 0000000000 ... \ + &zero_za ngrp=1 nvec=2 rv=%mova_rv off=%off3_x2 +ZERO_za 11000000 000011 010 .. 0000000000 0.. \ + &zero_za ngrp=2 nvec=2 rv=%mova_rv off=%off2_x2 +ZERO_za 11000000 000011 011 .. 0000000000 0.. \ + &zero_za ngrp=4 nvec=2 rv=%mova_rv off=%off2_x2 + +ZERO_za 11000000 000011 101 .. 0000000000 0.. \ + &zero_za ngrp=1 nvec=4 rv=%mova_rv off=%off2_x4 +ZERO_za 11000000 000011 110 .. 0000000000 00. \ + &zero_za ngrp=2 nvec=4 rv=%mova_rv off=%off1_x4 +ZERO_za 11000000 000011 111 .. 0000000000 00. \ + &zero_za ngrp=4 nvec=4 rv=%mova_rv off=%off1_x4 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_ZERO_zt0(DisasContext *s, arg_ZERO_zt0 *a) return true; } +static bool trans_ZERO_za(DisasContext *s, arg_ZERO_za *a) +{ + if (!dc_isar_feature(aa64_sme2p1, s)) { + return false; + } + if (sme_smza_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + int vstride = svl / a->ngrp; + TCGv_ptr t_za = get_zarray(s, a->rv, a->off, a->ngrp, a->nvec); + + for (int r = 0; r < a->ngrp; ++r) { + for (int i = 0; i < a->nvec; ++i) { + int o_za = (r * vstride + i) * sizeof(ARMVectorReg); + tcg_gen_gvec_dup_imm_var(MO_64, t_za, o_za, svl, svl, 0); + } + } + } + return true; +} + static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) { static gen_helper_gvec_4 * const h_fns[5] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> For WHILE, we have the count of enabled predicates, so we don't need to search to compute the PredTest result. Reuse the logic that will shortly be required for counted predicates. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-71-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 81 +++++++++++++++++++++---------------- 1 file changed, 46 insertions(+), 35 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } +/* C.f. Arm pseudocode PredCountTest */ +static uint32_t pred_count_test(uint32_t elements, uint32_t count, bool invert) +{ + uint32_t flags; + + if (count == 0) { + flags = 1; /* !N, Z, C */ + } else if (!invert) { + flags = (1u << 31) | 2; /* N, !Z */ + flags |= count != elements; /* C */ + } else { + flags = 2; /* !Z, !C */ + flags |= (count == elements) << 31; /* N */ + } + return flags; +} + uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) { intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; - uint32_t flags; - intptr_t i; + intptr_t i, oprbits = oprsz * 8; + + tcg_debug_assert(count <= oprbits); /* Begin with a zero predicate register. */ - flags = do_zero(d, oprsz); - if (count == 0) { - return flags; + do_zero(d, oprsz); + if (count) { + /* Set all of the requested bits. */ + for (i = 0; i < count / 64; ++i) { + d->p[i] = esz_mask; + } + if (count & 63) { + d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask; + } } - /* Set all of the requested bits. */ - for (i = 0; i < count / 64; ++i) { - d->p[i] = esz_mask; - } - if (count & 63) { - d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask; - } - - return predtest_ones(d, oprsz, esz_mask); + return pred_count_test(oprbits, count, false); } uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; - intptr_t i, invcount, oprbits; + intptr_t i, invcount, oprbits = oprsz * 8; uint64_t bits; - if (count == 0) { - return do_zero(d, oprsz); - } - - oprbits = oprsz * 8; tcg_debug_assert(count <= oprbits); - bits = esz_mask; - if (oprbits & 63) { - bits &= MAKE_64BIT_MASK(0, oprbits & 63); - } - - invcount = oprbits - count; - for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { - d->p[i] = bits; + /* Begin with a zero predicate register. */ + do_zero(d, oprsz); + if (count) { + /* Set all of the requested bits. */ bits = esz_mask; + if (oprbits & 63) { + bits &= MAKE_64BIT_MASK(0, oprbits & 63); + } + + invcount = oprbits - count; + for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { + d->p[i] = bits; + bits = esz_mask; + } + d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); } - d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); - - while (--i >= 0) { - d->p[i] = 0; - } - - return predtest_ones(d, oprsz, esz_mask); + return pred_count_test(oprbits, count, true); } /* Recursive reduction on a function; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Merge predtest_ones into its only caller. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-72-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 34 ++++++++++++++-------------------- 1 file changed, 14 insertions(+), 20 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkn)(void *vd, void *vn, void *vg, uint32_t pred_desc) } } -/* As if PredTest(Ones(PL), D, esz). */ -static uint32_t predtest_ones(ARMPredicateReg *d, intptr_t oprsz, - uint64_t esz_mask) -{ - uint32_t flags = PREDTEST_INIT; - intptr_t i; - - for (i = 0; i < oprsz / 8; i++) { - flags = iter_predtest_fwd(d->p[i], esz_mask, flags); - } - if (oprsz & 7) { - uint64_t mask = ~(-1ULL << (8 * (oprsz & 7))); - flags = iter_predtest_fwd(d->p[i], esz_mask & mask, flags); - } - return flags; -} - uint32_t HELPER(sve_brkns)(void *vd, void *vn, void *vg, uint32_t pred_desc) { intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); if (last_active_pred(vn, vg, oprsz)) { - return predtest_ones(vd, oprsz, -1); - } else { - return do_zero(vd, oprsz); + ARMPredicateReg *d = vd; + uint32_t flags = PREDTEST_INIT; + intptr_t i; + + /* As if PredTest(Ones(PL), D, MO_8). */ + for (i = 0; i < oprsz / 8; i++) { + flags = iter_predtest_fwd(d->p[i], -1, flags); + } + if (oprsz & 7) { + uint64_t mask = ~(-1ULL << (8 * (oprsz & 7))); + flags = iter_predtest_fwd(d->p[i], mask, flags); + } + return flags; } + return do_zero(vd, oprsz); } uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Expand to memset plus the return value, when used. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-73-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static uint32_t compute_brks_m(uint64_t *d, uint64_t *n, uint64_t *g, return flags; } -static uint32_t do_zero(ARMPredicateReg *d, intptr_t oprsz) -{ - /* It is quicker to zero the whole predicate than loop on OPRSZ. - * The compiler should turn this into 4 64-bit integer stores. - */ - memset(d, 0, sizeof(ARMPredicateReg)); - return PREDTEST_INIT; -} - void HELPER(sve_brkpa)(void *vd, void *vn, void *vm, void *vg, uint32_t pred_desc) { @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkpa)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { compute_brk_z(vd, vm, vg, oprsz, true); } else { - do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); } } @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_brkpas)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { return compute_brks_z(vd, vm, vg, oprsz, true); } else { - return do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); + return PREDTEST_INIT; } } @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkpb)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { compute_brk_z(vd, vm, vg, oprsz, false); } else { - do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); } } @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_brkpbs)(void *vd, void *vn, void *vm, void *vg, if (last_active_pred(vn, vg, oprsz)) { return compute_brks_z(vd, vm, vg, oprsz, false); } else { - return do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); + return PREDTEST_INIT; } } @@ -XXX,XX +XXX,XX @@ void HELPER(sve_brkn)(void *vd, void *vn, void *vg, uint32_t pred_desc) { intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); if (!last_active_pred(vn, vg, oprsz)) { - do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); } } @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_brkns)(void *vd, void *vn, void *vg, uint32_t pred_desc) } return flags; } - return do_zero(vd, oprsz); + memset(vd, 0, sizeof(ARMPredicateReg)); + return PREDTEST_INIT; } uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) tcg_debug_assert(count <= oprbits); /* Begin with a zero predicate register. */ - do_zero(d, oprsz); + memset(d, 0, sizeof(*d)); if (count) { /* Set all of the requested bits. */ for (i = 0; i < count / 64; ++i) { @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) tcg_debug_assert(count <= oprbits); /* Begin with a zero predicate register. */ - do_zero(d, oprsz); + memset(d, 0, sizeof(*d)); if (count) { /* Set all of the requested bits. */ bits = esz_mask; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-74-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static uint32_t pred_count_test(uint32_t elements, uint32_t count, bool invert) return flags; } -uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) +/* D must be cleared on entry. */ +static void do_whilel(ARMPredicateReg *d, uint64_t esz_mask, + uint32_t count, uint32_t oprbits) { - intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); - intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); - uint64_t esz_mask = pred_esz_masks[esz]; - ARMPredicateReg *d = vd; - intptr_t i, oprbits = oprsz * 8; - tcg_debug_assert(count <= oprbits); - - /* Begin with a zero predicate register. */ - memset(d, 0, sizeof(*d)); if (count) { + uint32_t i; + /* Set all of the requested bits. */ for (i = 0; i < count / 64; ++i) { d->p[i] = esz_mask; @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) d->p[i] = MAKE_64BIT_MASK(0, count & 63) & esz_mask; } } +} +uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + memset(d, 0, sizeof(*d)); + do_whilel(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, false); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20250704142112.1018902-75-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 39 +++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(oprbits, count, false); } -uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) +/* D must be cleared on entry. */ +static void do_whileg(ARMPredicateReg *d, uint64_t esz_mask, + uint32_t count, uint32_t oprbits) { - intptr_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); - intptr_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); - uint64_t esz_mask = pred_esz_masks[esz]; - ARMPredicateReg *d = vd; - intptr_t i, invcount, oprbits = oprsz * 8; - uint64_t bits; - tcg_debug_assert(count <= oprbits); - - /* Begin with a zero predicate register. */ - memset(d, 0, sizeof(*d)); if (count) { - /* Set all of the requested bits. */ - bits = esz_mask; - if (oprbits & 63) { - bits &= MAKE_64BIT_MASK(0, oprbits & 63); - } + uint32_t i, invcount = oprbits - count; + uint64_t bits = esz_mask & MAKE_64BIT_MASK(invcount & 63, 64); - invcount = oprbits - count; - for (i = (oprsz - 1) / 8; i > invcount / 64; --i) { + for (i = invcount / 64; i < oprbits / 64; ++i) { d->p[i] = bits; bits = esz_mask; } - d->p[i] = bits & MAKE_64BIT_MASK(invcount & 63, 64); + if (oprbits & 63) { + d->p[i] = bits & MAKE_64BIT_MASK(0, oprbits & 63); + } } +} +uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + memset(d, 0, sizeof(*d)); + do_whileg(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, true); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Change the API to pass element count rather than bit count. This will be helpful later for predicate as counter. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-76-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_helper.c | 2 ++ target/arm/tcg/translate-sve.c | 13 +++++-------- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; + count <<= esz; memset(d, 0, sizeof(*d)); do_whilel(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, false); @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) uint64_t esz_mask = pred_esz_masks[esz]; ARMPredicateReg *d = vd; + count <<= esz; memset(d, 0, sizeof(*d)); do_whileg(d, esz_mask, count, oprbits); return pred_count_test(oprbits, count, true); diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) t2 = tcg_temp_new_i32(); tcg_gen_extrl_i64_i32(t2, t0); - /* Scale elements to bits. */ - tcg_gen_shli_i32(t2, t2, a->esz); - desc = FIELD_DP32(desc, PREDDESC, OPRSZ, vsz / 8); desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) op0 = read_cpu_reg(s, a->rn, 1); op1 = read_cpu_reg(s, a->rm, 1); - tmax = tcg_constant_i64(vsz); + tmax = tcg_constant_i64(vsz >> a->esz); diff = tcg_temp_new_i64(); if (a->rw) { @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) tcg_gen_sub_i64(diff, op0, op1); tcg_gen_sub_i64(t1, op1, op0); tcg_gen_movcond_i64(TCG_COND_GEU, diff, op0, op1, diff, t1); - /* Round down to a multiple of ESIZE. */ - tcg_gen_andi_i64(diff, diff, -1 << a->esz); + /* Divide, rounding down, by ESIZE. */ + tcg_gen_shri_i64(diff, diff, a->esz); /* If op1 == op0, diff == 0, and the condition is always true. */ tcg_gen_movcond_i64(TCG_COND_EQ, diff, op0, op1, tmax, diff); } else { /* WHILEWR */ tcg_gen_sub_i64(diff, op1, op0); - /* Round down to a multiple of ESIZE. */ - tcg_gen_andi_i64(diff, diff, -1 << a->esz); + /* Divide, rounding down, by ESIZE. */ + tcg_gen_shri_i64(diff, diff, a->esz); /* If op0 >= op1, diff <= 0, the condition is always true. */ tcg_gen_movcond_i64(TCG_COND_GEU, diff, op0, op1, tmax, diff); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Use TRANS_FEAT to select the correct predicate. Pass the helper and a boolean to do_WHILE. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-77-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 4 +++- target/arm/tcg/translate-sve.c | 23 +++++++++-------------- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SINCDECP_z 00100101 .. 1010 d:1 u:1 10000 00 .... ..... @incdec2_pred CTERM 00100101 1 sf:1 1 rm:5 001000 rn:5 ne:1 0000 # SVE integer compare scalar count and limit -WHILE 00100101 esz:2 1 rm:5 000 sf:1 u:1 lt:1 rn:5 eq:1 rd:4 +&while esz rd rn rm sf u eq +WHILE_lt 00100101 esz:2 1 rm:5 000 sf:1 u:1 1 rn:5 eq:1 rd:4 &while +WHILE_gt 00100101 esz:2 1 rm:5 000 sf:1 u:1 0 rn:5 eq:1 rd:4 &while # SVE2 pointer conflict compare WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a) return true; } -static bool trans_WHILE(DisasContext *s, arg_WHILE *a) +typedef void gen_while_fn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32); +static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) { TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2; @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) TCGCond cond; uint64_t maxval; /* Note that GE/HS has a->eq == 0 and GT/HI has a->eq == 1. */ - bool eq = a->eq == a->lt; + bool eq = a->eq == lt; - /* The greater-than conditions are all SVE2. */ - if (a->lt - ? !dc_isar_feature(aa64_sve, s) - : !dc_isar_feature(aa64_sve2, s)) { - return false; - } if (!sve_access_check(s)) { return true; } @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) t0 = tcg_temp_new_i64(); t1 = tcg_temp_new_i64(); - if (a->lt) { + if (lt) { tcg_gen_sub_i64(t0, op1, op0); if (a->u) { maxval = a->sf ? UINT64_MAX : UINT32_MAX; @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE(DisasContext *s, arg_WHILE *a) ptr = tcg_temp_new_ptr(); tcg_gen_addi_ptr(ptr, tcg_env, pred_full_reg_offset(s, a->rd)); - if (a->lt) { - gen_helper_sve_whilel(t2, ptr, t2, tcg_constant_i32(desc)); - } else { - gen_helper_sve_whileg(t2, ptr, t2, tcg_constant_i32(desc)); - } + fn(t2, ptr, t2, tcg_constant_i32(desc)); + do_pred_flags(t2); return true; } +TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, gen_helper_sve_whilel) +TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, gen_helper_sve_whileg) + static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) { TCGv_i64 op0, op1, diff, t1, tmax; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-78-richard.henderson@linaro.org This instruction is present in both SME(1) and SVE2.1 extensions. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/translate-sve.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_PSEL(DisasContext *s, arg_psel *a) TCGv_i64 tmp, didx, dbit; TCGv_ptr ptr; - if (!dc_isar_feature(aa64_sme, s)) { + if (!dc_isar_feature(aa64_sme_or_sve2p1, s)) { return false; } if (!sve_access_check(s)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-79-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 3 +++ target/arm/tcg/sve.decode | 8 +++++++ target/arm/tcg/sve_helper.c | 40 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 14 ++++++++---- 4 files changed, 61 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_whilel, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_while2l, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_while2g, TCG_CALL_NO_RWG, i32, ptr, i32, i32) + DEF_HELPER_FLAGS_4(sve_subri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ WHILE_gt 00100101 esz:2 1 rm:5 000 sf:1 u:1 0 rn:5 eq:1 rd:4 &while # SVE2 pointer conflict compare WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 +# SVE2.1 predicate pair +%pd_pair 1:3 !function=times_2 +@while_pair ........ esz:2 . rm:5 .... u:1 . rn:5 . ... eq:1 \ + &while rd=%pd_pair sf=1 + +WHILE_lt_pair 00100101 .. 1 ..... 0101 . 1 ..... 1 ... . @while_pair +WHILE_gt_pair 00100101 .. 1 ..... 0101 . 0 ..... 1 ... . @while_pair + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilel)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(oprbits, count, false); } +uint32_t HELPER(sve_while2l)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + count <<= esz; + memset(d, 0, 2 * sizeof(*d)); + if (count <= oprbits) { + do_whilel(&d[0], esz_mask, count, oprbits); + } else { + do_whilel(&d[0], esz_mask, oprbits, oprbits); + do_whilel(&d[1], esz_mask, count - oprbits, oprbits); + } + + return pred_count_test(2 * oprbits, count, false); +} + /* D must be cleared on entry. */ static void do_whileg(ARMPredicateReg *d, uint64_t esz_mask, uint32_t count, uint32_t oprbits) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whileg)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(oprbits, count, true); } +uint32_t HELPER(sve_while2g)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t oprsz = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t oprbits = oprsz * 8; + uint64_t esz_mask = pred_esz_masks[esz]; + ARMPredicateReg *d = vd; + + count <<= esz; + memset(d, 0, 2 * sizeof(*d)); + if (count <= oprbits) { + do_whileg(&d[1], esz_mask, count, oprbits); + } else { + do_whilel(&d[1], esz_mask, oprbits, oprbits); + do_whileg(&d[0], esz_mask, count - oprbits, oprbits); + } + + return pred_count_test(2 * oprbits, count, true); +} + /* Recursive reduction on a function; * C.f. the ARM ARM function ReducePredicated. * diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a) } typedef void gen_while_fn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32); -static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) +static bool do_WHILE(DisasContext *s, arg_while *a, + bool lt, int scale, gen_while_fn *fn) { TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2; @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) } } - tmax = tcg_constant_i64(vsz >> a->esz); + tmax = tcg_constant_i64((vsz << scale) >> a->esz); if (eq) { /* Equality means one more iteration. */ tcg_gen_addi_i64(t0, t0, 1); @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, bool lt, gen_while_fn *fn) return true; } -TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, gen_helper_sve_whilel) -TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, gen_helper_sve_whileg) +TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, 0, gen_helper_sve_whilel) +TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, 0, gen_helper_sve_whileg) + +TRANS_FEAT(WHILE_lt_pair, aa64_sme2_or_sve2p1, do_WHILE, + a, true, 1, gen_helper_sve_while2l) +TRANS_FEAT(WHILE_gt_pair, aa64_sme2_or_sve2p1, do_WHILE, + a, false, 1, gen_helper_sve_while2g) static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-80-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 3 ++ target/arm/tcg/sve.decode | 11 +++++++ target/arm/tcg/sve_helper.c | 53 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 22 ++++++++++---- 4 files changed, 84 insertions(+), 5 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_while2l, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_while2g, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_whilecl, TCG_CALL_NO_RWG, i32, ptr, i32, i32) +DEF_HELPER_FLAGS_3(sve_whilecg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) + DEF_HELPER_FLAGS_4(sve_subri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) DEF_HELPER_FLAGS_4(sve_subri_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ %rn_ax2 6:4 !function=times_2 +%pnd 0:3 !function=plus_8 + ########################################################################### # Named attribute sets. These are used to make nice(er) names # when creating helpers common to those for the individual @@ -XXX,XX +XXX,XX @@ WHILE_ptr 00100101 esz:2 1 rm:5 001 100 rn:5 rw:1 rd:4 WHILE_lt_pair 00100101 .. 1 ..... 0101 . 1 ..... 1 ... . @while_pair WHILE_gt_pair 00100101 .. 1 ..... 0101 . 0 ..... 1 ... . @while_pair +# SVE2.1 predicate as count +@while_cnt ........ esz:2 . rm:5 .... u:1 . rn:5 . eq:1 ... \ + &while rd=%pnd sf=1 + +WHILE_lt_cnt2 00100101 .. 1 ..... 0100 . 1 ..... 1 . ... @while_cnt +WHILE_lt_cnt4 00100101 .. 1 ..... 0110 . 1 ..... 1 . ... @while_cnt +WHILE_gt_cnt2 00100101 .. 1 ..... 0100 . 0 ..... 1 . ... @while_cnt +WHILE_gt_cnt4 00100101 .. 1 ..... 0110 . 0 ..... 1 . ... @while_cnt + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } +/* C.f. Arm pseudocode EncodePredCount */ +static uint64_t encode_pred_count(uint32_t elements, uint32_t count, + uint32_t esz, bool invert) +{ + uint32_t pred; + + if (count == 0) { + return 0; + } + if (invert) { + count = elements - count; + } else if (count == elements) { + count = 0; + invert = true; + } + + pred = (count << 1) | 1; + pred <<= esz; + pred |= invert << 15; + + return pred; +} + /* C.f. Arm pseudocode PredCountTest */ static uint32_t pred_count_test(uint32_t elements, uint32_t count, bool invert) { @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_while2l)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(2 * oprbits, count, false); } +uint32_t HELPER(sve_whilecl)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t pl = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t scale = FIELD_EX32(pred_desc, PREDDESC, DATA); + uint32_t vl = pl * 8; + uint32_t elements = (vl >> esz) << scale; + ARMPredicateReg *d = vd; + + *d = (ARMPredicateReg) { + .p[0] = encode_pred_count(elements, count, esz, false) + }; + return pred_count_test(elements, count, false); +} + /* D must be cleared on entry. */ static void do_whileg(ARMPredicateReg *d, uint64_t esz_mask, uint32_t count, uint32_t oprbits) @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_while2g)(void *vd, uint32_t count, uint32_t pred_desc) return pred_count_test(2 * oprbits, count, true); } +uint32_t HELPER(sve_whilecg)(void *vd, uint32_t count, uint32_t pred_desc) +{ + uint32_t pl = FIELD_EX32(pred_desc, PREDDESC, OPRSZ); + uint32_t esz = FIELD_EX32(pred_desc, PREDDESC, ESZ); + uint32_t scale = FIELD_EX32(pred_desc, PREDDESC, DATA); + uint32_t vl = pl * 8; + uint32_t elements = (vl >> esz) << scale; + ARMPredicateReg *d = vd; + + *d = (ARMPredicateReg) { + .p[0] = encode_pred_count(elements, count, esz, true) + }; + return pred_count_test(elements, count, true); +} + /* Recursive reduction on a function; * C.f. the ARM ARM function ReducePredicated. * diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CTERM(DisasContext *s, arg_CTERM *a) typedef void gen_while_fn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32); static bool do_WHILE(DisasContext *s, arg_while *a, - bool lt, int scale, gen_while_fn *fn) + bool lt, int scale, int data, gen_while_fn *fn) { TCGv_i64 op0, op1, t0, t1, tmax; TCGv_i32 t2; @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, desc = FIELD_DP32(desc, PREDDESC, OPRSZ, vsz / 8); desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); + desc = FIELD_DP32(desc, PREDDESC, DATA, data); ptr = tcg_temp_new_ptr(); tcg_gen_addi_ptr(ptr, tcg_env, pred_full_reg_offset(s, a->rd)); @@ -XXX,XX +XXX,XX @@ static bool do_WHILE(DisasContext *s, arg_while *a, return true; } -TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, a, true, 0, gen_helper_sve_whilel) -TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, a, false, 0, gen_helper_sve_whileg) +TRANS_FEAT(WHILE_lt, aa64_sve, do_WHILE, + a, true, 0, 0, gen_helper_sve_whilel) +TRANS_FEAT(WHILE_gt, aa64_sve2, do_WHILE, + a, false, 0, 0, gen_helper_sve_whileg) TRANS_FEAT(WHILE_lt_pair, aa64_sme2_or_sve2p1, do_WHILE, - a, true, 1, gen_helper_sve_while2l) + a, true, 1, 0, gen_helper_sve_while2l) TRANS_FEAT(WHILE_gt_pair, aa64_sme2_or_sve2p1, do_WHILE, - a, false, 1, gen_helper_sve_while2g) + a, false, 1, 0, gen_helper_sve_while2g) + +TRANS_FEAT(WHILE_lt_cnt2, aa64_sme2_or_sve2p1, do_WHILE, + a, true, 1, 1, gen_helper_sve_whilecl) +TRANS_FEAT(WHILE_lt_cnt4, aa64_sme2_or_sve2p1, do_WHILE, + a, true, 2, 2, gen_helper_sve_whilecl) +TRANS_FEAT(WHILE_gt_cnt2, aa64_sme2_or_sve2p1, do_WHILE, + a, false, 1, 1, gen_helper_sve_whilecg) +TRANS_FEAT(WHILE_gt_cnt4, aa64_sme2_or_sve2p1, do_WHILE, + a, false, 2, 2, gen_helper_sve_whilecg) static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-81-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 1 + target/arm/tcg/translate-sve.c | 16 ++++++++++++++++ 2 files changed, 17 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ PTEST 00100101 01 010000 11 pg:4 0 rn:4 0 0000 # SVE predicate initialize PTRUE 00100101 esz:2 01100 s:1 111000 pat:5 0 rd:4 +PTRUE_cnt 00100101 esz:2 1000000111100000010 ... rd=%pnd # SVE initialize FFR SETFFR 00100101 0010 1100 1001 0000 0000 0000 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag) TRANS_FEAT(PTRUE, aa64_sve, do_predset, a->esz, a->rd, a->pat, a->s) +static bool trans_PTRUE_cnt(DisasContext *s, arg_PTRUE_cnt *a) +{ + if (!dc_isar_feature(aa64_sme2_or_sve2p1, s)) { + return false; + } + if (sve_access_check(s)) { + /* Canonical TRUE is 0 count, invert bit, plus element size. */ + int val = (1 << 15) | (1 << a->esz); + + /* Write val to the first uint64_t; clear all of the rest. */ + tcg_gen_gvec_dup_imm(MO_64, pred_full_reg_offset(s, a->rd), + 8, size_for_gvec(pred_full_reg_size(s)), val); + } + return true; +} + /* Note pat == 31 is #all, to set all elements. */ TRANS_FEAT_NONSTREAMING(SETFFR, aa64_sve, do_predset, 0, FFR_PRED_NUM, 31, false) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-82-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 25 ++++++++++++++++++ target/arm/tcg/sve.decode | 7 ++++++ target/arm/tcg/sve_helper.c | 46 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 35 ++++++++++++++++++++++++++ 4 files changed, 113 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2_sqshlu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqshlu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqshlu_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_sqshlu_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_addqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_addqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_addqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_addqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_smaxqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_sminqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_sminqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_sminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_sminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_umaxqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_uminqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uminqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ UMAXV 00000100 .. 001 001 001 ... ..... ..... @rd_pg_rn SMINV 00000100 .. 001 010 001 ... ..... ..... @rd_pg_rn UMINV 00000100 .. 001 011 001 ... ..... ..... @rd_pg_rn +# SVE2.1 segment reduction +ADDQV 00000100 .. 000 101 001 ... ..... ..... @rd_pg_rn +SMAXQV 00000100 .. 001 100 001 ... ..... ..... @rd_pg_rn +SMINQV 00000100 .. 001 110 001 ... ..... ..... @rd_pg_rn +UMAXQV 00000100 .. 001 101 001 ... ..... ..... @rd_pg_rn +UMINQV 00000100 .. 001 111 001 ... ..... ..... @rd_pg_rn + ### SVE Shift by Immediate - Predicated Group # SVE bitwise shift by immediate (predicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN) #undef DO_VPZ #undef DO_VPZ_D +#define DO_VPQ(NAME, TYPE, H, INIT, OP) \ +void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc) \ +{ \ + TYPE tmp[16 / sizeof(TYPE)] = { [0 ... 16 / sizeof(TYPE) - 1] = INIT }; \ + TYPE *n = vn; uint16_t *g = vg; \ + uintptr_t oprsz = simd_oprsz(desc); \ + uintptr_t nseg = oprsz / 16, nsegelt = 16 / sizeof(TYPE); \ + for (uintptr_t s = 0; s < nseg; s++) { \ + uint16_t pg = g[H2(s)]; \ + for (uintptr_t e = 0; e < nsegelt; e++, pg >>= sizeof(TYPE)) { \ + if (pg & 1) { \ + tmp[e] = OP(tmp[H(e)], n[s * nsegelt + H(e)]); \ + } \ + } \ + } \ + memcpy(vd, tmp, 16); \ + clear_tail(vd, 16, simd_maxsz(desc)); \ +} + +DO_VPQ(sve2p1_addqv_b, uint8_t, H1, 0, DO_ADD) +DO_VPQ(sve2p1_addqv_h, uint16_t, H2, 0, DO_ADD) +DO_VPQ(sve2p1_addqv_s, uint32_t, H4, 0, DO_ADD) +DO_VPQ(sve2p1_addqv_d, uint64_t, H8, 0, DO_ADD) + +DO_VPQ(sve2p1_smaxqv_b, int8_t, H1, INT8_MIN, DO_MAX) +DO_VPQ(sve2p1_smaxqv_h, int16_t, H2, INT16_MIN, DO_MAX) +DO_VPQ(sve2p1_smaxqv_s, int32_t, H4, INT32_MIN, DO_MAX) +DO_VPQ(sve2p1_smaxqv_d, int64_t, H8, INT64_MIN, DO_MAX) + +DO_VPQ(sve2p1_sminqv_b, int8_t, H1, INT8_MAX, DO_MIN) +DO_VPQ(sve2p1_sminqv_h, int16_t, H2, INT16_MAX, DO_MIN) +DO_VPQ(sve2p1_sminqv_s, int32_t, H4, INT32_MAX, DO_MIN) +DO_VPQ(sve2p1_sminqv_d, int64_t, H8, INT64_MAX, DO_MIN) + +DO_VPQ(sve2p1_umaxqv_b, uint8_t, H1, 0, DO_MAX) +DO_VPQ(sve2p1_umaxqv_h, uint16_t, H2, 0, DO_MAX) +DO_VPQ(sve2p1_umaxqv_s, uint32_t, H4, 0, DO_MAX) +DO_VPQ(sve2p1_umaxqv_d, uint64_t, H8, 0, DO_MAX) + +DO_VPQ(sve2p1_uminqv_b, uint8_t, H1, -1, DO_MIN) +DO_VPQ(sve2p1_uminqv_h, uint16_t, H2, -1, DO_MIN) +DO_VPQ(sve2p1_uminqv_s, uint32_t, H4, -1, DO_MIN) +DO_VPQ(sve2p1_uminqv_d, uint64_t, H8, -1, DO_MIN) + +#undef DO_VPQ + /* Two vector operand, one scalar operand, unpredicated. */ #define DO_ZZI(NAME, TYPE, OP) \ void HELPER(NAME)(void *vd, void *vn, uint64_t s64, uint32_t desc) \ diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SXTW, aa64_sve, gen_gvec_ool_arg_zpz, TRANS_FEAT(UXTW, aa64_sve, gen_gvec_ool_arg_zpz, a->esz == 3 ? gen_helper_sve_uxtw_d : NULL, a, 0) +static gen_helper_gvec_3 * const addqv_fns[4] = { + gen_helper_sve2p1_addqv_b, gen_helper_sve2p1_addqv_h, + gen_helper_sve2p1_addqv_s, gen_helper_sve2p1_addqv_d, +}; +TRANS_FEAT(ADDQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, addqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const smaxqv_fns[4] = { + gen_helper_sve2p1_smaxqv_b, gen_helper_sve2p1_smaxqv_h, + gen_helper_sve2p1_smaxqv_s, gen_helper_sve2p1_smaxqv_d, +}; +TRANS_FEAT(SMAXQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, smaxqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const sminqv_fns[4] = { + gen_helper_sve2p1_sminqv_b, gen_helper_sve2p1_sminqv_h, + gen_helper_sve2p1_sminqv_s, gen_helper_sve2p1_sminqv_d, +}; +TRANS_FEAT(SMINQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, sminqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const umaxqv_fns[4] = { + gen_helper_sve2p1_umaxqv_b, gen_helper_sve2p1_umaxqv_h, + gen_helper_sve2p1_umaxqv_s, gen_helper_sve2p1_umaxqv_d, +}; +TRANS_FEAT(UMAXQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, umaxqv_fns[a->esz], a, 0) + +static gen_helper_gvec_3 * const uminqv_fns[4] = { + gen_helper_sve2p1_uminqv_b, gen_helper_sve2p1_uminqv_h, + gen_helper_sve2p1_uminqv_s, gen_helper_sve2p1_uminqv_d, +}; +TRANS_FEAT(UMINQV, aa64_sme2p1_or_sve2p1, + gen_gvec_ool_arg_zpz, uminqv_fns[a->esz], a, 0) + /* *** SVE Integer Reduction Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-83-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 2 + target/arm/tcg/vec_internal.h | 74 ++++++++++++++++++++++++++++++++++ target/arm/tcg/sve.decode | 6 +++ target/arm/tcg/sve_helper.c | 28 +++++++++++++ target/arm/tcg/translate-sve.c | 36 +++++++++++++++++ 5 files changed, 146 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2p1_uminqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(pext, TCG_CALL_NO_RWG, void, ptr, i32, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ bfloat16 helper_sme2_ah_fmin_b16(bfloat16 a, bfloat16 b, float_status *fpst); float32 sve_f16_to_f32(float16 f, float_status *fpst); float16 sve_f32_to_f16(float32 f, float_status *fpst); +/* + * Decode helper functions for predicate as counter. + */ + +typedef struct { + unsigned count; + unsigned lg2_stride; + bool invert; +} DecodeCounter; + +static inline DecodeCounter +decode_counter(unsigned png, unsigned vl, unsigned v_esz) +{ + DecodeCounter ret = { }; + + /* C.f. Arm pseudocode CounterToPredicate. */ + if (likely(png & 0xf)) { + unsigned p_esz = ctz32(png); + + /* + * maxbit = log2(pl(bits) * 4) + * = log2(vl(bytes) * 4) + * = log2(vl) + 2 + * maxbit_mask = ones<maxbit:0> + * = (1 << (maxbit + 1)) - 1 + * = (1 << (log2(vl) + 2 + 1)) - 1 + * = (1 << (log2(vl) + 3)) - 1 + * = (pow2ceil(vl) << 3) - 1 + */ + ret.count = png & (((unsigned)pow2ceil(vl) << 3) - 1); + ret.count >>= p_esz + 1; + + ret.invert = (png >> 15) & 1; + + /* + * The Arm pseudocode for CounterToPredicate expands the count to + * a set of bits, and then the operation proceeds as for the original + * interpretation of predicates as a set of bits. + * + * We can avoid the expansion by adjusting the count and supplying + * an element stride. + */ + if (unlikely(p_esz != v_esz)) { + if (p_esz < v_esz) { + /* + * For predicate esz < vector esz, the expanded predicate + * will have more bits set than will be consumed. + * Adjust the count down, rounding up. + * Consider p_esz = MO_8, v_esz = MO_64, count 14: + * The expanded predicate would be + * 0011 1111 1111 1111 + * The significant bits are + * ...1 ...1 ...1 ...1 + */ + unsigned shift = v_esz - p_esz; + unsigned trunc = ret.count >> shift; + ret.count = trunc + (ret.count != (trunc << shift)); + } else { + /* + * For predicate esz > vector esz, the expanded predicate + * will have bits set only at power-of-two multiples of + * the vector esz. Bits at other multiples will all be + * false. Adjust the count up, and supply the caller + * with a stride of elements to skip. + */ + unsigned shift = p_esz - v_esz; + ret.count <<= shift; + ret.lg2_stride = shift; + } + } + } + return ret; +} + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ %rn_ax2 6:4 !function=times_2 %pnd 0:3 !function=plus_8 +%pnn 5:3 !function=plus_8 ########################################################################### # Named attribute sets. These are used to make nice(er) names @@ -XXX,XX +XXX,XX @@ WHILE_lt_cnt4 00100101 .. 1 ..... 0110 . 1 ..... 1 . ... @while_cnt WHILE_gt_cnt2 00100101 .. 1 ..... 0100 . 0 ..... 1 . ... @while_cnt WHILE_gt_cnt4 00100101 .. 1 ..... 0110 . 0 ..... 1 . ... @while_cnt +# SVE2.1 extract mask predicate from predicate-as-counter +&pext rd rn esz imm +PEXT_1 00100101 esz:2 1 00000 0111 00 imm:2 ... 1 rd:4 &pext rn=%pnn +PEXT_2 00100101 esz:2 1 00000 0111 010 imm:1 ... 1 rd:4 &pext rn=%pnn + ### SVE Integer Wide Immediate - Unpredicated Group # SVE broadcast floating-point immediate (unpredicated) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_FCVTLT(sve2_fcvtlt_sd, uint64_t, uint32_t, H1_8, H1_4, float32_to_float64) #undef DO_FCVTLT #undef DO_FCVTNT + +void HELPER(pext)(void *vd, uint32_t png, uint32_t desc) +{ + int pl = FIELD_EX32(desc, PREDDESC, OPRSZ); + int vl = pl * 8; + unsigned v_esz = FIELD_EX32(desc, PREDDESC, ESZ); + int part = FIELD_EX32(desc, PREDDESC, DATA); + DecodeCounter p = decode_counter(png, vl, v_esz); + uint64_t mask = pred_esz_masks[v_esz + p.lg2_stride]; + ARMPredicateReg *d = vd; + + /* + * Convert from element count to byte count and adjust + * for the portion of the 4*VL counter to be extracted. + */ + int b_count = (p.count << v_esz) - vl * part; + + memset(d, 0, sizeof(*d)); + if (p.invert) { + if (b_count <= 0) { + do_whilel(vd, mask, vl, vl); + } else if (b_count < vl) { + do_whileg(vd, mask, vl - b_count, vl); + } + } else if (b_count > 0) { + do_whilel(vd, mask, MIN(b_count, vl), vl); + } +} diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_WHILE_ptr(DisasContext *s, arg_WHILE_ptr *a) return true; } +static bool do_pext(DisasContext *s, arg_pext *a, int n) +{ + TCGv_i32 t_png; + TCGv_ptr t_pd; + int pl; + + if (!sve_access_check(s)) { + return true; + } + + t_png = tcg_temp_new_i32(); + tcg_gen_ld16u_i32(t_png, tcg_env, + pred_full_reg_offset(s, a->rn) ^ + (HOST_BIG_ENDIAN ? 6 : 0)); + + t_pd = tcg_temp_new_ptr(); + pl = pred_full_reg_size(s); + + for (int i = 0; i < n; ++i) { + int rd = (a->rd + i) % 16; + int part = a->imm * n + i; + unsigned desc = 0; + + desc = FIELD_DP32(desc, PREDDESC, OPRSZ, pl); + desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); + desc = FIELD_DP32(desc, PREDDESC, DATA, part); + + tcg_gen_addi_ptr(t_pd, tcg_env, pred_full_reg_offset(s, rd)); + gen_helper_pext(t_pd, t_png, tcg_constant_i32(desc)); + } + return true; +} + +TRANS_FEAT(PEXT_1, aa64_sme2_or_sve2p1, do_pext, a, 1) +TRANS_FEAT(PEXT_2, aa64_sme2_or_sve2p1, do_pext, a, 2) + /* *** SVE Integer Wide Immediate - Unpredicated Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-84-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 5 + target/arm/tcg/sme.decode | 9 + target/arm/tcg/sme_helper.c | 317 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 31 ++++ 4 files changed, 362 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme2_fclamp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i3 DEF_HELPER_FLAGS_5(sme2_fclamp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(sme2_fclamp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(sme2_bfclamp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sme2_sel_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) +DEF_HELPER_FLAGS_5(sme2_sel_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) +DEF_HELPER_FLAGS_5(sme2_sel_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) +DEF_HELPER_FLAGS_5(sme2_sel_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ UCLAMP 11000001 esz:2 1 zm:5 110001 zn:5 .... 1 \ UCLAMP 11000001 esz:2 1 zm:5 110011 zn:5 ...0 1 \ &zzz_en zd=%zd_ax4 n=4 +### SME2 Multi-vector SVE Select + +%sel_pg 10:3 !function=plus_8 + +SEL 11000001 esz:2 1 ....0 100 ... ....0 ....0 \ + n=2 zd=%zd_ax2 zn=%zn_ax2 zm=%zm_ax2 pg=%sel_pg +SEL 11000001 esz:2 1 ...01 100 ... ...00 ...00 \ + n=4 zd=%zd_ax4 zn=%zn_ax4 zm=%zm_ax4 pg=%sel_pg + ### SME Multiple Zero &zero_za rv off ngrp nvec diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ FCLAMP(sme2_fclamp_d, float64, H8) FCLAMP(sme2_bfclamp, bfloat16, H2) #undef FCLAMP + +void HELPER(sme2_sel_b)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint8_t); + DecodeCounter p = decode_counter(png, vl, MO_8); + + if (p.lg2_stride == 0) { + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + for (int e = 0; e < split; e++) { + d[H1(e)] = m[H1(e)]; + } + for (int e = split; e < elements; e++) { + d[H1(e)] = n[H1(e)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + for (int e = 0; e < split; e++) { + d[H1(e)] = n[H1(e)]; + } + for (int e = split; e < elements; e++) { + d[H1(e)] = m[H1(e)]; + } + } + } + } + } else { + int estride = 1 << p.lg2_stride; + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e++) { + d[H1(e)] = m[H1(e)]; + } + for (; e < elements; e += estride) { + d[H1(e)] = n[H1(e)]; + for (int i = 1; i < estride; i++) { + d[H1(e + i)] = m[H1(e + i)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint8_t *d = vd + r * sizeof(ARMVectorReg); + uint8_t *n = vn + r * sizeof(ARMVectorReg); + uint8_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e += estride) { + d[H1(e)] = n[H1(e)]; + for (int i = 1; i < estride; i++) { + d[H1(e + i)] = m[H1(e + i)]; + } + } + for (; e < elements; e++) { + d[H1(e)] = m[H1(e)]; + } + } + } + } +} + +void HELPER(sme2_sel_h)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint16_t); + DecodeCounter p = decode_counter(png, vl, MO_16); + + if (p.lg2_stride == 0) { + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + for (int e = 0; e < split; e++) { + d[H2(e)] = m[H2(e)]; + } + for (int e = split; e < elements; e++) { + d[H2(e)] = n[H2(e)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + for (int e = 0; e < split; e++) { + d[H2(e)] = n[H2(e)]; + } + for (int e = split; e < elements; e++) { + d[H2(e)] = m[H2(e)]; + } + } + } + } + } else { + int estride = 1 << p.lg2_stride; + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e++) { + d[H2(e)] = m[H2(e)]; + } + for (; e < elements; e += estride) { + d[H2(e)] = n[H2(e)]; + for (int i = 1; i < estride; i++) { + d[H2(e + i)] = m[H2(e + i)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint16_t *d = vd + r * sizeof(ARMVectorReg); + uint16_t *n = vn + r * sizeof(ARMVectorReg); + uint16_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e += estride) { + d[H2(e)] = n[H2(e)]; + for (int i = 1; i < estride; i++) { + d[H2(e + i)] = m[H2(e + i)]; + } + } + for (; e < elements; e++) { + d[H2(e)] = m[H2(e)]; + } + } + } + } +} + +void HELPER(sme2_sel_s)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint32_t); + DecodeCounter p = decode_counter(png, vl, MO_32); + + if (p.lg2_stride == 0) { + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + for (int e = 0; e < split; e++) { + d[H4(e)] = m[H4(e)]; + } + for (int e = split; e < elements; e++) { + d[H4(e)] = n[H4(e)]; + } + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + for (int e = 0; e < split; e++) { + d[H4(e)] = n[H4(e)]; + } + for (int e = split; e < elements; e++) { + d[H4(e)] = m[H4(e)]; + } + } + } + } + } else { + /* p.esz must be MO_64, so stride must be 2. */ + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e++) { + d[H4(e)] = m[H4(e)]; + } + for (; e < elements; e += 2) { + d[H4(e)] = n[H4(e)]; + d[H4(e + 1)] = m[H4(e + 1)]; + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint32_t *d = vd + r * sizeof(ARMVectorReg); + uint32_t *n = vn + r * sizeof(ARMVectorReg); + uint32_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + int e = 0; + + for (; e < MIN(split, elements); e += 2) { + d[H4(e)] = n[H4(e)]; + d[H4(e + 1)] = m[H4(e + 1)]; + } + for (; e < elements; e++) { + d[H4(e)] = m[H4(e)]; + } + } + } + } +} + +void HELPER(sme2_sel_d)(void *vd, void *vn, void *vm, + uint32_t png, uint32_t desc) +{ + int vl = simd_oprsz(desc); + int nreg = simd_data(desc); + int elements = vl / sizeof(uint64_t); + DecodeCounter p = decode_counter(png, vl, MO_64); + + if (p.invert) { + for (int r = 0; r < nreg; r++) { + uint64_t *d = vd + r * sizeof(ARMVectorReg); + uint64_t *n = vn + r * sizeof(ARMVectorReg); + uint64_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, n, vl); /* all true */ + } else if (elements <= split) { + memcpy(d, m, vl); /* all false */ + } else { + memcpy(d, m, split * sizeof(uint64_t)); + memcpy(d + split, n + split, + (elements - split) * sizeof(uint64_t)); + } + } + } else { + for (int r = 0; r < nreg; r++) { + uint64_t *d = vd + r * sizeof(ARMVectorReg); + uint64_t *n = vn + r * sizeof(ARMVectorReg); + uint64_t *m = vm + r * sizeof(ARMVectorReg); + int split = p.count - r * elements; + + if (split <= 0) { + memcpy(d, m, vl); /* all false */ + } else if (elements <= split) { + memcpy(d, n, vl); /* all true */ + } else { + memcpy(d, n, split * sizeof(uint64_t)); + memcpy(d + split, m + split, + (elements - split) * sizeof(uint64_t)); + } + } + } +} diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const uclamp_fns[] = { gen_helper_sme2_uclamp_d, }; TRANS(UCLAMP, do_clamp, a, uclamp_fns) + +static bool trans_SEL(DisasContext *s, arg_SEL *a) +{ + typedef void sme_sel_fn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32); + static sme_sel_fn * const fns[4] = { + gen_helper_sme2_sel_b, gen_helper_sme2_sel_h, + gen_helper_sme2_sel_s, gen_helper_sme2_sel_d + }; + + if (!dc_isar_feature(aa64_sme2, s)) { + return false; + } + if (sme_sm_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + uint32_t desc = simd_desc(svl, svl, a->n); + TCGv_ptr t_d = tcg_temp_new_ptr(); + TCGv_ptr t_n = tcg_temp_new_ptr(); + TCGv_ptr t_m = tcg_temp_new_ptr(); + TCGv_i32 png = tcg_temp_new_i32(); + + tcg_gen_addi_ptr(t_d, tcg_env, vec_full_reg_offset(s, a->zd)); + tcg_gen_addi_ptr(t_n, tcg_env, vec_full_reg_offset(s, a->zn)); + tcg_gen_addi_ptr(t_m, tcg_env, vec_full_reg_offset(s, a->zm)); + + tcg_gen_ld16u_i32(png, tcg_env, pred_full_reg_offset(s, a->pg) + ^ (HOST_BIG_ENDIAN ? 6 : 0)); + + fns[a->esz](t_d, t_n, t_m, png, tcg_constant_i32(desc)); + } + return true; +} -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-85-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 15 ++++++++++++ target/arm/tcg/sve.decode | 5 ++++ target/arm/tcg/sve_helper.c | 42 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 3 +++ 4 files changed, 65 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2p1_uminqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_uminqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pext, TCG_CALL_NO_RWG, void, ptr, i32, i32) + +DEF_HELPER_FLAGS_4(sve2p1_orqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_orqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_orqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_orqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_eorqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_eorqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_eorqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_eorqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sve2p1_andqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_andqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_andqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_andqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ ORV 00000100 .. 011 000 001 ... ..... ..... @rd_pg_rn EORV 00000100 .. 011 001 001 ... ..... ..... @rd_pg_rn ANDV 00000100 .. 011 010 001 ... ..... ..... @rd_pg_rn +# SVE2.1 bitwise logical reduction (quadwords) +ORQV 00000100 .. 011 100 001 ... ..... ..... @rd_pg_rn +EORQV 00000100 .. 011 101 001 ... ..... ..... @rd_pg_rn +ANDQV 00000100 .. 011 110 001 ... ..... ..... @rd_pg_rn + # SVE constructive prefix (predicated) MOVPRFX_z 00000100 .. 010 000 001 ... ..... ..... @rd_pg_rn MOVPRFX_m 00000100 .. 010 001 001 ... ..... ..... @rd_pg_rn diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ static inline uint64_t expand_pred_s(uint8_t byte) return word[byte & 0x11]; } +static inline uint64_t expand_pred_d(uint8_t byte) +{ + return -(uint64_t)(byte & 1); +} + #define LOGICAL_PPPP(NAME, FUNC) \ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \ { \ @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \ #define DO_EOR(N, M) (N ^ M) #define DO_ORR(N, M) (N | M) #define DO_BIC(N, M) (N & ~M) +#define DO_ORC(N, M) (N | ~M) #define DO_ADD(N, M) (N + M) #define DO_SUB(N, M) (N - M) #define DO_MAX(N, M) ((N) >= (M) ? (N) : (M)) @@ -XXX,XX +XXX,XX @@ DO_ZZI(sve_umini_d, uint64_t, DO_MIN) #undef DO_ZZI +#define DO_LOGIC_QV(NAME, SUFF, INIT, VOP, POP) \ +void HELPER(NAME ## _ ## SUFF)(void *vd, void *vn, void *vg, uint32_t desc) \ +{ \ + unsigned seg = simd_oprsz(desc) / 16; \ + uint64_t r0 = INIT, r1 = INIT; \ + for (unsigned s = 0; s < seg; s++) { \ + uint64_t p0 = expand_pred_##SUFF(*(uint8_t *)(vg + H1(s * 2))); \ + uint64_t p1 = expand_pred_##SUFF(*(uint8_t *)(vg + H1(s * 2 + 1))); \ + uint64_t v0 = *(uint64_t *)(vn + s * 16); \ + uint64_t v1 = *(uint64_t *)(vn + s * 16 + 8); \ + v0 = POP(v0, p0), v1 = POP(v1, p1); \ + r0 = VOP(r0, v0), r1 = VOP(r1, v1); \ + } \ + *(uint64_t *)(vd + 0) = r0; \ + *(uint64_t *)(vd + 8) = r1; \ + clear_tail(vd, 16, simd_maxsz(desc)); \ +} + +DO_LOGIC_QV(sve2p1_orqv, b, 0, DO_ORR, DO_AND) +DO_LOGIC_QV(sve2p1_orqv, h, 0, DO_ORR, DO_AND) +DO_LOGIC_QV(sve2p1_orqv, s, 0, DO_ORR, DO_AND) +DO_LOGIC_QV(sve2p1_orqv, d, 0, DO_ORR, DO_AND) + +DO_LOGIC_QV(sve2p1_eorqv, b, 0, DO_EOR, DO_AND) +DO_LOGIC_QV(sve2p1_eorqv, h, 0, DO_EOR, DO_AND) +DO_LOGIC_QV(sve2p1_eorqv, s, 0, DO_EOR, DO_AND) +DO_LOGIC_QV(sve2p1_eorqv, d, 0, DO_EOR, DO_AND) + +DO_LOGIC_QV(sve2p1_andqv, b, -1, DO_AND, DO_ORC) +DO_LOGIC_QV(sve2p1_andqv, h, -1, DO_AND, DO_ORC) +DO_LOGIC_QV(sve2p1_andqv, s, -1, DO_AND, DO_ORC) +DO_LOGIC_QV(sve2p1_andqv, d, -1, DO_AND, DO_ORC) + +#undef DO_LOGIC_QV + #undef DO_AND #undef DO_ORR #undef DO_EOR #undef DO_BIC +#undef DO_ORC #undef DO_ADD #undef DO_SUB #undef DO_MAX diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_ZPZ(NOT_zpz, aa64_sve, sve_not_zpz) DO_ZPZ(ABS, aa64_sve, sve_abs) DO_ZPZ(NEG, aa64_sve, sve_neg) DO_ZPZ(RBIT, aa64_sve, sve_rbit) +DO_ZPZ(ORQV, aa64_sme2p1_or_sve2p1, sve2p1_orqv) +DO_ZPZ(EORQV, aa64_sme2p1_or_sve2p1, sve2p1_eorqv) +DO_ZPZ(ANDQV, aa64_sme2p1_or_sve2p1, sve2p1_andqv) static gen_helper_gvec_3 * const fabs_fns[4] = { NULL, gen_helper_sve_fabs_h, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-86-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 49 ++++++++++++++++++++++++ target/arm/tcg/sve.decode | 8 ++++ target/arm/tcg/sve_helper.c | 70 +++++++++++++++++++++------------- target/arm/tcg/translate-sve.c | 48 +++++++++++++++++++++++ 4 files changed, 148 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ah_fminv_s, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_4(sve_ah_fminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_faddqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_faddqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_faddqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fmaxnmqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxnmqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxnmqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fminnmqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminnmqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminnmqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fmaxqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fmaxqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_fminqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_fminqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_ah_fmaxqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fmaxqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fmaxqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + +DEF_HELPER_FLAGS_5(sve2p1_ah_fminqv_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fminqv_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_5(sve2p1_ah_fminqv_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, fpst, i32) + DEF_HELPER_FLAGS_5(sve_fadda_h, TCG_CALL_NO_RWG, i64, i64, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_5(sve_fadda_s, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ FMINNMV 01100101 .. 000 101 001 ... ..... ..... @rd_pg_rn FMAXV 01100101 .. 000 110 001 ... ..... ..... @rd_pg_rn FMINV 01100101 .. 000 111 001 ... ..... ..... @rd_pg_rn +### SVE FP recursive reduction (quadwords) + +FADDQV 01100100 .. 010 000 101 ... ..... ..... @rd_pg_rn +FMAXNMQV 01100100 .. 010 100 101 ... ..... ..... @rd_pg_rn +FMINNMQV 01100100 .. 010 101 101 ... ..... ..... @rd_pg_rn +FMAXQV 01100100 .. 010 110 101 ... ..... ..... @rd_pg_rn +FMINQV 01100100 .. 010 111 101 ... ..... ..... @rd_pg_rn + ## SVE Floating Point Unary Operations - Unpredicated Group FRECPE 01100101 .. 001 110 001100 ..... ..... @rd_rn diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_whilecg)(void *vd, uint32_t count, uint32_t pred_desc) * The recursion is bounded to depth 7 (128 fp16 elements), so there's * little to gain with a more complex non-recursive form. */ -#define DO_REDUCE(NAME, TYPE, H, FUNC, IDENT) \ -static TYPE NAME##_reduce(TYPE *data, float_status *status, uintptr_t n) \ +#define DO_REDUCE(NAME, SUF, TYPE, H, FUNC, IDENT) \ +static TYPE FUNC##_reduce(TYPE *data, float_status *status, uintptr_t n) \ { \ if (n == 1) { \ return *data; \ } else { \ uintptr_t half = n / 2; \ - TYPE lo = NAME##_reduce(data, status, half); \ - TYPE hi = NAME##_reduce(data + half, status, half); \ + TYPE lo = FUNC##_reduce(data, status, half); \ + TYPE hi = FUNC##_reduce(data + half, status, half); \ return FUNC(lo, hi, status); \ } \ } \ -uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \ +uint64_t helper_sve_##NAME##v_##SUF(void *vn, void *vg, \ + float_status *s, uint32_t desc) \ { \ uintptr_t i, oprsz = simd_oprsz(desc), maxsz = simd_data(desc); \ TYPE data[sizeof(ARMVectorReg) / sizeof(TYPE)]; \ @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(NAME)(void *vn, void *vg, float_status *s, uint32_t desc) \ for (; i < maxsz; i += sizeof(TYPE)) { \ *(TYPE *)((void *)data + i) = IDENT; \ } \ - return NAME##_reduce(data, s, maxsz / sizeof(TYPE)); \ + return FUNC##_reduce(data, s, maxsz / sizeof(TYPE)); \ +} \ +void helper_sve2p1_##NAME##qv_##SUF(void *vd, void *vn, void *vg, \ + float_status *status, uint32_t desc) \ +{ \ + unsigned oprsz = simd_oprsz(desc), segments = oprsz / 16; \ + for (unsigned e = 0; e < 16; e += sizeof(TYPE)) { \ + TYPE data[ARM_MAX_VQ]; \ + for (unsigned s = 0; s < segments; s++) { \ + uint16_t pg = *(uint16_t *)(vg + H1_2(s * 2)); \ + TYPE nn = *(TYPE *)(vn + H(s * 16 + H(e))); \ + data[s] = (pg >> e) & 1 ? nn : IDENT; \ + } \ + *(TYPE *)(vd + H(e)) = FUNC##_reduce(data, status, segments); \ + } \ + clear_tail(vd, 16, simd_maxsz(desc)); \ } -DO_REDUCE(sve_faddv_h, float16, H1_2, float16_add, float16_zero) -DO_REDUCE(sve_faddv_s, float32, H1_4, float32_add, float32_zero) -DO_REDUCE(sve_faddv_d, float64, H1_8, float64_add, float64_zero) +DO_REDUCE(fadd,h, float16, H1_2, float16_add, float16_zero) +DO_REDUCE(fadd,s, float32, H1_4, float32_add, float32_zero) +DO_REDUCE(fadd,d, float64, H1_8, float64_add, float64_zero) /* Identity is floatN_default_nan, without the function call. */ -DO_REDUCE(sve_fminnmv_h, float16, H1_2, float16_minnum, 0x7E00) -DO_REDUCE(sve_fminnmv_s, float32, H1_4, float32_minnum, 0x7FC00000) -DO_REDUCE(sve_fminnmv_d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL) +DO_REDUCE(fminnm,h, float16, H1_2, float16_minnum, 0x7E00) +DO_REDUCE(fminnm,s, float32, H1_4, float32_minnum, 0x7FC00000) +DO_REDUCE(fminnm,d, float64, H1_8, float64_minnum, 0x7FF8000000000000ULL) -DO_REDUCE(sve_fmaxnmv_h, float16, H1_2, float16_maxnum, 0x7E00) -DO_REDUCE(sve_fmaxnmv_s, float32, H1_4, float32_maxnum, 0x7FC00000) -DO_REDUCE(sve_fmaxnmv_d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL) +DO_REDUCE(fmaxnm,h, float16, H1_2, float16_maxnum, 0x7E00) +DO_REDUCE(fmaxnm,s, float32, H1_4, float32_maxnum, 0x7FC00000) +DO_REDUCE(fmaxnm,d, float64, H1_8, float64_maxnum, 0x7FF8000000000000ULL) -DO_REDUCE(sve_fminv_h, float16, H1_2, float16_min, float16_infinity) -DO_REDUCE(sve_fminv_s, float32, H1_4, float32_min, float32_infinity) -DO_REDUCE(sve_fminv_d, float64, H1_8, float64_min, float64_infinity) +DO_REDUCE(fmin,h, float16, H1_2, float16_min, float16_infinity) +DO_REDUCE(fmin,s, float32, H1_4, float32_min, float32_infinity) +DO_REDUCE(fmin,d, float64, H1_8, float64_min, float64_infinity) -DO_REDUCE(sve_fmaxv_h, float16, H1_2, float16_max, float16_chs(float16_infinity)) -DO_REDUCE(sve_fmaxv_s, float32, H1_4, float32_max, float32_chs(float32_infinity)) -DO_REDUCE(sve_fmaxv_d, float64, H1_8, float64_max, float64_chs(float64_infinity)) +DO_REDUCE(fmax,h, float16, H1_2, float16_max, float16_chs(float16_infinity)) +DO_REDUCE(fmax,s, float32, H1_4, float32_max, float32_chs(float32_infinity)) +DO_REDUCE(fmax,d, float64, H1_8, float64_max, float64_chs(float64_infinity)) -DO_REDUCE(sve_ah_fminv_h, float16, H1_2, helper_vfp_ah_minh, float16_infinity) -DO_REDUCE(sve_ah_fminv_s, float32, H1_4, helper_vfp_ah_mins, float32_infinity) -DO_REDUCE(sve_ah_fminv_d, float64, H1_8, helper_vfp_ah_mind, float64_infinity) +DO_REDUCE(ah_fmin,h, float16, H1_2, helper_vfp_ah_minh, float16_infinity) +DO_REDUCE(ah_fmin,s, float32, H1_4, helper_vfp_ah_mins, float32_infinity) +DO_REDUCE(ah_fmin,d, float64, H1_8, helper_vfp_ah_mind, float64_infinity) -DO_REDUCE(sve_ah_fmaxv_h, float16, H1_2, helper_vfp_ah_maxh, +DO_REDUCE(ah_fmax,h, float16, H1_2, helper_vfp_ah_maxh, float16_chs(float16_infinity)) -DO_REDUCE(sve_ah_fmaxv_s, float32, H1_4, helper_vfp_ah_maxs, +DO_REDUCE(ah_fmax,s, float32, H1_4, helper_vfp_ah_maxs, float32_chs(float32_infinity)) -DO_REDUCE(sve_ah_fmaxv_d, float64, H1_8, helper_vfp_ah_maxd, +DO_REDUCE(ah_fmax,d, float64, H1_8, helper_vfp_ah_maxd, float64_chs(float64_infinity)) #undef DO_REDUCE diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ DO_VPZ_AH(FMAXV, fmaxv) #undef DO_VPZ +static gen_helper_gvec_3_ptr * const faddqv_fns[4] = { + NULL, gen_helper_sve2p1_faddqv_h, + gen_helper_sve2p1_faddqv_s, gen_helper_sve2p1_faddqv_d, +}; +TRANS_FEAT(FADDQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + faddqv_fns[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fmaxnmqv_fns[4] = { + NULL, gen_helper_sve2p1_fmaxnmqv_h, + gen_helper_sve2p1_fmaxnmqv_s, gen_helper_sve2p1_fmaxnmqv_d, +}; +TRANS_FEAT(FMAXNMQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + fmaxnmqv_fns[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fminnmqv_fns[4] = { + NULL, gen_helper_sve2p1_fminnmqv_h, + gen_helper_sve2p1_fminnmqv_s, gen_helper_sve2p1_fminnmqv_d, +}; +TRANS_FEAT(FMINNMQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + fminnmqv_fns[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fmaxqv_fns[4] = { + NULL, gen_helper_sve2p1_fmaxqv_h, + gen_helper_sve2p1_fmaxqv_s, gen_helper_sve2p1_fmaxqv_d, +}; +static gen_helper_gvec_3_ptr * const fmaxqv_ah_fns[4] = { + NULL, gen_helper_sve2p1_ah_fmaxqv_h, + gen_helper_sve2p1_ah_fmaxqv_s, gen_helper_sve2p1_ah_fmaxqv_d, +}; +TRANS_FEAT(FMAXQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + (s->fpcr_ah ? fmaxqv_fns : fmaxqv_ah_fns)[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + +static gen_helper_gvec_3_ptr * const fminqv_fns[4] = { + NULL, gen_helper_sve2p1_fminqv_h, + gen_helper_sve2p1_fminqv_s, gen_helper_sve2p1_fminqv_d, +}; +static gen_helper_gvec_3_ptr * const fminqv_ah_fns[4] = { + NULL, gen_helper_sve2p1_ah_fminqv_h, + gen_helper_sve2p1_ah_fminqv_s, gen_helper_sve2p1_ah_fminqv_d, +}; +TRANS_FEAT(FMINQV, aa64_sme2p1_or_sve2p1, gen_gvec_fpst_arg_zpz, + (s->fpcr_ah ? fminqv_fns : fminqv_ah_fns)[a->esz], a, 0, + a->esz == MO_16 ? FPST_A64_F16 : FPST_A64) + /* *** SVE Floating Point Unary Operations - Unpredicated Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-87-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 6 ++++++ target/arm/tcg/translate-sve.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_ex esz=2 +BFMLSLB_zzzw 01100100 11 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_ex esz=2 +BFMLSLT_zzzw 01100100 11 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point dot-product FDOT_zzzz 01100100 00 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_ex esz=2 ### SVE2 floating-point multiply-add long (indexed) + FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 FMLALT_zzxw 01100100 10 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 FMLSLB_zzxw 01100100 10 1 ..... 0110.0 ..... ..... @rrxr_3a esz=2 FMLSLT_zzxw 01100100 10 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 + BFMLALB_zzxw 01100100 11 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 BFMLALT_zzxw 01100100 11 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 +BFMLSLB_zzxw 01100100 11 1 ..... 0110.0 ..... ..... @rrxr_3a esz=2 +BFMLSLT_zzxw 01100100 11 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 ### SVE2 floating-point dot-product (indexed) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel) TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false) TRANS_FEAT(BFMLALT_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, true) +static bool do_BFMLSL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel) +{ + if (s->fpcr_ah) { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_ah_bfmlsl, + a->rd, a->rn, a->rm, a->ra, sel, FPST_AH); + } else { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlsl, + a->rd, a->rn, a->rm, a->ra, sel, FPST_A64); + } +} + +TRANS_FEAT(BFMLSLB_zzzw, aa64_sme2_or_sve2p1, do_BFMLSL_zzzw, a, false) +TRANS_FEAT(BFMLSLT_zzzw, aa64_sme2_or_sve2p1, do_BFMLSL_zzzw, a, true) + +static bool do_BFMLSL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel) +{ + if (s->fpcr_ah) { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_ah_bfmlsl_idx, + a->rd, a->rn, a->rm, a->ra, + (a->index << 1) | sel, FPST_AH); + } else { + return gen_gvec_fpst_zzzz(s, gen_helper_gvec_bfmlsl_idx, + a->rd, a->rn, a->rm, a->ra, + (a->index << 1) | sel, FPST_A64); + } +} + +TRANS_FEAT(BFMLSLB_zzxw, aa64_sme2_or_sve2p1, do_BFMLSL_zzxw, a, false) +TRANS_FEAT(BFMLSLT_zzxw, aa64_sme2_or_sve2p1, do_BFMLSL_zzxw, a, true) + static bool trans_PSEL(DisasContext *s, arg_psel *a) { int vl = vec_full_reg_size(s); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-88-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 1 + target/arm/tcg/sve.decode | 3 ++- target/arm/tcg/sve_helper.c | 21 +++++++++++++++++++++ target/arm/tcg/translate-sve.c | 30 ++++++++++++++++++++++++++++++ 4 files changed, 54 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_brkn, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_brkns, TCG_CALL_NO_RWG, i32, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_cntp, TCG_CALL_NO_RWG, i64, ptr, ptr, i32) +DEF_HELPER_FLAGS_2(sve2p1_cntp_c, TCG_CALL_NO_RWG_SE, i64, i32, i32) DEF_HELPER_FLAGS_3(sve_whilel, TCG_CALL_NO_RWG, i32, ptr, i32, i32) DEF_HELPER_FLAGS_3(sve_whileg, TCG_CALL_NO_RWG, i32, ptr, i32, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ BRKN 00100101 0. 01100001 .... 0 .... 0 .... @pd_pg_pn_s ### SVE Predicate Count Group # SVE predicate count -CNTP 00100101 .. 100 000 10 .... 0 .... ..... @rd_pg4_pn +CNTP 00100101 .. 100 000 10 .... 0 .... ..... @rd_pg4_pn +CNTP_c 00100101 esz:2 100 000 10 000 vl:1 1 rn:4 rd:5 # SVE inc/dec register by predicate count INCDECP_r 00100101 .. 10110 d:1 10001 00 .... ..... @incdec_pred u=1 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ uint64_t HELPER(sve_cntp)(void *vn, void *vg, uint32_t pred_desc) return sum; } +uint64_t HELPER(sve2p1_cntp_c)(uint32_t png, uint32_t desc) +{ + int pl = FIELD_EX32(desc, PREDDESC, OPRSZ); + int vl = pl * 8; + unsigned v_esz = FIELD_EX32(desc, PREDDESC, ESZ); + int lg2_width = FIELD_EX32(desc, PREDDESC, DATA) + 1; + DecodeCounter p = decode_counter(png, vl, v_esz); + unsigned maxelem = (vl << lg2_width) >> v_esz; + unsigned count = p.count; + + if (p.invert) { + if (count >= maxelem) { + return 0; + } + count = maxelem - count; + } else { + count = MIN(count, maxelem); + } + return count >> p.lg2_stride; +} + /* C.f. Arm pseudocode EncodePredCount */ static uint64_t encode_pred_count(uint32_t elements, uint32_t count, uint32_t esz, bool invert) diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_CNTP(DisasContext *s, arg_CNTP *a) return true; } +static bool trans_CNTP_c(DisasContext *s, arg_CNTP_c *a) +{ + TCGv_i32 t_png; + uint32_t desc = 0; + + if (dc_isar_feature(aa64_sve2p1, s)) { + if (!sve_access_check(s)) { + return true; + } + } else if (dc_isar_feature(aa64_sme2, s)) { + if (!sme_sm_enabled_check(s)) { + return true; + } + } else { + return false; + } + + t_png = tcg_temp_new_i32(); + tcg_gen_ld16u_i32(t_png, tcg_env, + pred_full_reg_offset(s, a->rn) ^ + (HOST_BIG_ENDIAN ? 6 : 0)); + + desc = FIELD_DP32(desc, PREDDESC, OPRSZ, pred_full_reg_size(s)); + desc = FIELD_DP32(desc, PREDDESC, ESZ, a->esz); + desc = FIELD_DP32(desc, PREDDESC, DATA, a->vl); + + gen_helper_sve2p1_cntp_c(cpu_reg(s, a->rd), t_png, tcg_constant_i32(desc)); + return true; +} + static bool trans_INCDECP_r(DisasContext *s, arg_incdec_pred *a) { if (!dc_isar_feature(aa64_sve, s)) { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-89-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 6 ++++++ target/arm/tcg/translate-sve.c | 21 +++++++++++++++++++++ 2 files changed, 27 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ DUP_s 00000101 .. 1 00000 001110 ..... ..... @rd_rn DUP_x 00000101 .. 1 ..... 001000 rn:5 rd:5 \ &rri imm=%imm7_22_16 +# SVE Permute Vector - one source quadwords +DUPQ 00000101 001 imm:4 1 001001 rn:5 rd:5 &rri_esz esz=0 +DUPQ 00000101 001 imm:3 10 001001 rn:5 rd:5 &rri_esz esz=1 +DUPQ 00000101 001 imm:2 100 001001 rn:5 rd:5 &rri_esz esz=2 +DUPQ 00000101 001 imm:1 1000 001001 rn:5 rd:5 &rri_esz esz=3 + # SVE insert SIMD&FP scalar register INSR_f 00000101 .. 1 10100 001110 ..... ..... @rdn_rm diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_DUP_x(DisasContext *s, arg_DUP_x *a) return true; } +static bool trans_DUPQ(DisasContext *s, arg_DUPQ *a) +{ + unsigned vl, dofs, nofs; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + vl = vec_full_reg_size(s); + dofs = vec_full_reg_offset(s, a->rd); + nofs = vec_reg_offset(s, a->rn, a->imm, a->esz); + + for (unsigned i = 0; i < vl; i += 16) { + tcg_gen_gvec_dup_mem(a->esz, dofs + i, nofs + i, 16, 16); + } + return true; +} + static void do_insr_i64(DisasContext *s, arg_rrr_esz *a, TCGv_i64 val) { typedef void gen_insr(TCGv_ptr, TCGv_ptr, TCGv_i64, TCGv_i32); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-90-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 2 ++ target/arm/tcg/translate-sve.c | 49 ++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ DUPQ 00000101 001 imm:3 10 001001 rn:5 rd:5 &rri_esz esz=1 DUPQ 00000101 001 imm:2 100 001001 rn:5 rd:5 &rri_esz esz=2 DUPQ 00000101 001 imm:1 1000 001001 rn:5 rd:5 &rri_esz esz=3 +EXTQ 00000101 0110 imm:4 001001 rn:5 rd:5 &rri + # SVE insert SIMD&FP scalar register INSR_f 00000101 .. 1 10100 001110 ..... ..... @rdn_rm diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool do_EXT(DisasContext *s, int rd, int rn, int rm, int imm) TRANS_FEAT(EXT, aa64_sve, do_EXT, a->rd, a->rn, a->rm, a->imm) TRANS_FEAT(EXT_sve2, aa64_sve2, do_EXT, a->rd, a->rn, (a->rn + 1) % 32, a->imm) +static bool trans_EXTQ(DisasContext *s, arg_EXTQ *a) +{ + unsigned vl, dofs, sofs0, sofs1, sofs2, imm; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + imm = a->imm; + if (imm == 0) { + /* So far we never optimize Zdn with MOVPRFX, so zd = zn is a nop. */ + return true; + } + + vl = vec_full_reg_size(s); + dofs = vec_full_reg_offset(s, a->rd); + sofs2 = vec_full_reg_offset(s, a->rn); + + if (imm & 8) { + sofs0 = dofs + 8; + sofs1 = sofs2; + sofs2 += 8; + } else { + sofs0 = dofs; + sofs1 = dofs + 8; + } + imm = (imm & 7) << 3; + + for (unsigned i = 0; i < vl; i += 16) { + TCGv_i64 s0 = tcg_temp_new_i64(); + TCGv_i64 s1 = tcg_temp_new_i64(); + TCGv_i64 s2 = tcg_temp_new_i64(); + + tcg_gen_ld_i64(s0, tcg_env, sofs0 + i); + tcg_gen_ld_i64(s1, tcg_env, sofs1 + i); + tcg_gen_ld_i64(s2, tcg_env, sofs2 + i); + + tcg_gen_extract2_i64(s0, s0, s1, imm); + tcg_gen_extract2_i64(s1, s1, s2, imm); + + tcg_gen_st_i64(s0, tcg_env, dofs + i); + tcg_gen_st_i64(s1, tcg_env, dofs + i + 8); + } + return true; +} + /* *** SVE Permute - Unpredicated Group */ -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-91-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 8 +++ target/arm/tcg/vec_internal.h | 34 ++++++++++++ target/arm/tcg/sve.decode | 17 ++++++ target/arm/tcg/sve_helper.c | 50 +++++++++++++++++ target/arm/tcg/translate-sve.c | 98 ++++++++++++++++++++++++++++++++++ 5 files changed, 207 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve2p1_andqv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_andqv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_andqv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2p1_andqv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(pmov_pv_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_pv_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_pv_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_3(pmov_vp_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_vp_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(pmov_vp_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ decode_counter(unsigned png, unsigned vl, unsigned v_esz) return ret; } +/* Extract @len bits from an array of uint64_t at offset @pos bits. */ +static inline uint64_t extractn(uint64_t *p, unsigned pos, unsigned len) +{ + uint64_t x; + + p += pos / 64; + pos = pos % 64; + + x = p[0]; + if (pos + len > 64) { + x = (x >> pos) | (p[1] << (-pos & 63)); + pos = 0; + } + return extract64(x, pos, len); +} + +/* Deposit @len bits into an array of uint64_t at offset @pos bits. */ +static inline void depositn(uint64_t *p, unsigned pos, + unsigned len, uint64_t val) +{ + p += pos / 64; + pos = pos % 64; + + if (pos + len <= 64) { + p[0] = deposit64(p[0], pos, len, val); + } else { + unsigned len0 = 64 - pos; + unsigned len1 = len - len0; + + p[0] = deposit64(p[0], pos, len0, val); + p[1] = deposit64(p[1], 0, len1, val >> len0); + } +} + #endif /* TARGET_ARM_VEC_INTERNAL_H */ diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ %size_23 23:2 %dtype_23_13 23:2 13:2 %index3_22_19 22:1 19:2 +%index3_22_17 22:1 17:2 %index3_19_11 19:2 11:1 %index2_20_11 20:1 11:1 @@ -XXX,XX +XXX,XX @@ INSR_r 00000101 .. 1 00100 001110 ..... ..... @rdn_rm # SVE reverse vector elements REV_v 00000101 .. 1 11000 001110 ..... ..... @rd_rn +# SVE move predicate to/from vector + +PMOV_pv 00000101 00 101 01 0001110 rn:5 0 rd:4 \ + &rri_esz esz=0 imm=0 +PMOV_pv 00000101 00 101 1 imm:1 0001110 rn:5 0 rd:4 &rri_esz esz=1 +PMOV_pv 00000101 01 101 imm:2 0001110 rn:5 0 rd:4 &rri_esz esz=2 +PMOV_pv 00000101 1. 101 .. 0001110 rn:5 0 rd:4 \ + &rri_esz esz=3 imm=%index3_22_17 + +PMOV_vp 00000101 00 101 01 1001110 0 rn:4 rd:5 \ + &rri_esz esz=0 imm=0 +PMOV_vp 00000101 00 101 1 imm:1 1001110 0 rn:4 rd:5 &rri_esz esz=1 +PMOV_vp 00000101 01 101 imm:2 1001110 0 rn:4 rd:5 &rri_esz esz=2 +PMOV_vp 00000101 1. 101 .. 1001110 0 rn:4 rd:5 \ + &rri_esz esz=3 imm=%index3_22_17 + # SVE vector table lookup TBL 00000101 .. 1 ..... 001100 ..... ..... @rd_rn_rm diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sve_rev_d)(void *vd, void *vn, uint32_t desc) } } +/* + * TODO: This could use half_shuffle64 and similar bit tricks to + * expand blocks of bits at once. + */ +#define DO_PMOV_PV(NAME, ESIZE) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + unsigned vl = simd_oprsz(desc); \ + unsigned idx = simd_data(desc); \ + unsigned elements = vl / ESIZE; \ + ARMPredicateReg *d = vd; \ + ARMVectorReg *s = vs; \ + memset(d, 0, sizeof(*d)); \ + for (unsigned e = 0; e < elements; ++e) { \ + depositn(d->p, e * ESIZE, 1, extractn(s->d, elements * idx + e, 1)); \ + } \ +} + +DO_PMOV_PV(pmov_pv_h, 2) +DO_PMOV_PV(pmov_pv_s, 4) +DO_PMOV_PV(pmov_pv_d, 8) + +#undef DO_PMOV_PV + +/* + * TODO: This could use half_unshuffle64 and similar bit tricks to + * compress blocks of bits at once. + */ +#define DO_PMOV_VP(NAME, ESIZE) \ +void HELPER(NAME)(void *vd, void *vs, uint32_t desc) \ +{ \ + unsigned vl = simd_oprsz(desc); \ + unsigned idx = simd_data(desc); \ + unsigned elements = vl / ESIZE; \ + ARMVectorReg *d = vd; \ + ARMPredicateReg *s = vs; \ + if (idx == 0) { \ + memset(d, 0, vl); \ + } \ + for (unsigned e = 0; e < elements; ++e) { \ + depositn(d->d, elements * idx + e, 1, extractn(s->p, e * ESIZE, 1)); \ + } \ +} + +DO_PMOV_VP(pmov_vp_h, 2) +DO_PMOV_VP(pmov_vp_s, 4) +DO_PMOV_VP(pmov_vp_d, 8) + +#undef DO_PMOV_VP + typedef void tb_impl_fn(void *, void *, void *, void *, uintptr_t, bool); static inline void do_tbl1(void *vd, void *vn, void *vm, uint32_t desc, diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const tbx_fns[4] = { }; TRANS_FEAT(TBX, aa64_sve2, gen_gvec_ool_arg_zzz, tbx_fns[a->esz], a, 0) +static bool trans_PMOV_pv(DisasContext *s, arg_PMOV_pv *a) +{ + static gen_helper_gvec_2 * const fns[4] = { + NULL, gen_helper_pmov_pv_h, + gen_helper_pmov_pv_s, gen_helper_pmov_pv_d + }; + unsigned vl, pl, vofs, pofs; + TCGv_i64 tmp; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + vl = vec_full_reg_size(s); + if (a->esz != MO_8) { + tcg_gen_gvec_2_ool(pred_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rn), + vl, vl, a->imm, fns[a->esz]); + return true; + } + + /* + * Copy the low PL bytes from vector Zn, zero-extending to a + * multiple of 8 bytes, so that Pd is properly cleared. + */ + + pl = vl / 8; + pofs = pred_full_reg_offset(s, a->rd); + vofs = vec_full_reg_offset(s, a->rn); + + QEMU_BUILD_BUG_ON(sizeof(ARMPredicateReg) != 32); + for (unsigned i = 32; i >= 8; i >>= 1) { + if (pl & i) { + tcg_gen_gvec_mov(MO_64, pofs, vofs, i, i); + pofs += i; + vofs += i; + } + } + switch (pl & 7) { + case 0: + return true; + case 2: + tmp = tcg_temp_new_i64(); + tcg_gen_ld16u_i64(tmp, tcg_env, vofs + (HOST_BIG_ENDIAN ? 6 : 0)); + break; + case 4: + tmp = tcg_temp_new_i64(); + tcg_gen_ld32u_i64(tmp, tcg_env, vofs + (HOST_BIG_ENDIAN ? 4 : 0)); + break; + case 6: + tmp = tcg_temp_new_i64(); + tcg_gen_ld_i64(tmp, tcg_env, vofs); + tcg_gen_extract_i64(tmp, tmp, 0, 48); + break; + default: + g_assert_not_reached(); + } + tcg_gen_st_i64(tmp, tcg_env, pofs); + return true; +} + +static bool trans_PMOV_vp(DisasContext *s, arg_PMOV_pv *a) +{ + static gen_helper_gvec_2 * const fns[4] = { + NULL, gen_helper_pmov_vp_h, + gen_helper_pmov_vp_s, gen_helper_pmov_vp_d + }; + unsigned vl; + + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + if (!sve_access_check(s)) { + return true; + } + + vl = vec_full_reg_size(s); + + if (a->esz == MO_8) { + /* + * The low PL bytes are copied from Pn to Zd unchanged. + * We know that the unused portion of Pn is zero, and + * that imm == 0, so the balance of Zd must be zeroed. + */ + tcg_gen_gvec_mov(MO_64, vec_full_reg_offset(s, a->rd), + pred_full_reg_offset(s, a->rn), + size_for_gvec(vl / 8), vl); + } else { + tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd), + pred_full_reg_offset(s, a->rn), + vl, vl, a->imm, fns[a->esz]); + } + return true; +} + static bool trans_UNPK(DisasContext *s, arg_UNPK *a) { static gen_helper_gvec_2 * const fns[4][2] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-92-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 10 ++++++++++ target/arm/tcg/sve.decode | 6 ++++++ target/arm/tcg/sve_helper.c | 29 +++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 19 ++++++++++++++++++- 4 files changed, 63 insertions(+), 1 deletion(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_zip_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_zip_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_zip_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_zipq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve_uzp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_uzp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_uzp_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_uzpq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve_trn_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve_trn_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ UZP2_q 00000101 10 1 ..... 000 011 ..... ..... @rd_rn_rm_e0 TRN1_q 00000101 10 1 ..... 000 110 ..... ..... @rd_rn_rm_e0 TRN2_q 00000101 10 1 ..... 000 111 ..... ..... @rd_rn_rm_e0 +# SVE2.1 permute vector elements (quadwords) +ZIPQ1 01000100 .. 0 ..... 111 000 ..... ..... @rd_rn_rm +ZIPQ2 01000100 .. 0 ..... 111 001 ..... ..... @rd_rn_rm +UZPQ1 01000100 .. 0 ..... 111 010 ..... ..... @rd_rn_rm +UZPQ2 01000100 .. 0 ..... 111 011 ..... ..... @rd_rn_rm + ### SVE Permute - Predicated Group # SVE compress active elements diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_UZP(sve_uzp_s, uint32_t, H1_4) DO_UZP(sve_uzp_d, uint64_t, H1_8) DO_UZP(sve2_uzp_q, Int128, ) +typedef void perseg_zzz_fn(void *vd, void *vn, void *vm, uint32_t desc); + +static void do_perseg_zzz(void *vd, void *vn, void *vm, + uint32_t desc, perseg_zzz_fn *fn) +{ + intptr_t oprsz = simd_oprsz(desc); + + desc = simd_desc(16, 16, simd_data(desc)); + for (intptr_t i = 0; i < oprsz; i += 16) { + fn(vd + i, vn + i, vm + i, desc); + } +} + +#define DO_PERSEG_ZZZ(NAME, FUNC) \ + void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ + { do_perseg_zzz(vd, vn, vm, desc, FUNC); } + +DO_PERSEG_ZZZ(sve2p1_uzpq_b, helper_sve_uzp_b) +DO_PERSEG_ZZZ(sve2p1_uzpq_h, helper_sve_uzp_h) +DO_PERSEG_ZZZ(sve2p1_uzpq_s, helper_sve_uzp_s) +DO_PERSEG_ZZZ(sve2p1_uzpq_d, helper_sve_uzp_d) + +DO_PERSEG_ZZZ(sve2p1_zipq_b, helper_sve_zip_b) +DO_PERSEG_ZZZ(sve2p1_zipq_h, helper_sve_zip_h) +DO_PERSEG_ZZZ(sve2p1_zipq_s, helper_sve_zip_s) +DO_PERSEG_ZZZ(sve2p1_zipq_d, helper_sve_zip_d) + +#undef DO_PERSEG_ZZZ + #define DO_TRN(NAME, TYPE, H) \ void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \ { \ diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(ZIP2_q, aa64_sve_f64mm, do_interleave_q, gen_helper_sve2_zip_q, a, QEMU_ALIGN_DOWN(vec_full_reg_size(s), 32) / 2) +static gen_helper_gvec_3 * const zipq_fns[4] = { + gen_helper_sve2p1_zipq_b, gen_helper_sve2p1_zipq_h, + gen_helper_sve2p1_zipq_s, gen_helper_sve2p1_zipq_d, +}; +TRANS_FEAT(ZIPQ1, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + zipq_fns[a->esz], a, 0) +TRANS_FEAT(ZIPQ2, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + zipq_fns[a->esz], a, 16 / 2) + static gen_helper_gvec_3 * const uzp_fns[4] = { gen_helper_sve_uzp_b, gen_helper_sve_uzp_h, gen_helper_sve_uzp_s, gen_helper_sve_uzp_d, }; - TRANS_FEAT(UZP1_z, aa64_sve, gen_gvec_ool_arg_zzz, uzp_fns[a->esz], a, 0) TRANS_FEAT(UZP2_z, aa64_sve, gen_gvec_ool_arg_zzz, @@ -XXX,XX +XXX,XX @@ TRANS_FEAT_NONSTREAMING(UZP1_q, aa64_sve_f64mm, do_interleave_q, TRANS_FEAT_NONSTREAMING(UZP2_q, aa64_sve_f64mm, do_interleave_q, gen_helper_sve2_uzp_q, a, 16) +static gen_helper_gvec_3 * const uzpq_fns[4] = { + gen_helper_sve2p1_uzpq_b, gen_helper_sve2p1_uzpq_h, + gen_helper_sve2p1_uzpq_s, gen_helper_sve2p1_uzpq_d, +}; +TRANS_FEAT(UZPQ1, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + uzpq_fns[a->esz], a, 0) +TRANS_FEAT(UZPQ2, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + uzpq_fns[a->esz], a, 1 << a->esz) + static gen_helper_gvec_3 * const trn_fns[4] = { gen_helper_sve_trn_b, gen_helper_sve_trn_h, gen_helper_sve_trn_s, gen_helper_sve_trn_d, -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-93-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 10 ++++++++++ target/arm/tcg/sve.decode | 3 +++ target/arm/tcg/sve_helper.c | 10 ++++++++++ target/arm/tcg/translate-sve.c | 14 ++++++++++++++ 4 files changed, 37 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tblq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) +DEF_HELPER_FLAGS_4(sve2p1_tbxq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) + DEF_HELPER_FLAGS_3(sve_sunpk_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_sunpk_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sve_sunpk_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ ZIPQ2 01000100 .. 0 ..... 111 001 ..... ..... @rd_rn_rm UZPQ1 01000100 .. 0 ..... 111 010 ..... ..... @rd_rn_rm UZPQ2 01000100 .. 0 ..... 111 011 ..... ..... @rd_rn_rm +TBLQ 01000100 .. 0 ..... 111 110 ..... ..... @rd_rn_rm +TBXQ 00000101 .. 1 ..... 001 101 ..... ..... @rd_rn_rm + ### SVE Permute - Predicated Group # SVE compress active elements diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_PERSEG_ZZZ(sve2p1_zipq_h, helper_sve_zip_h) DO_PERSEG_ZZZ(sve2p1_zipq_s, helper_sve_zip_s) DO_PERSEG_ZZZ(sve2p1_zipq_d, helper_sve_zip_d) +DO_PERSEG_ZZZ(sve2p1_tblq_b, helper_sve_tbl_b) +DO_PERSEG_ZZZ(sve2p1_tblq_h, helper_sve_tbl_h) +DO_PERSEG_ZZZ(sve2p1_tblq_s, helper_sve_tbl_s) +DO_PERSEG_ZZZ(sve2p1_tblq_d, helper_sve_tbl_d) + +DO_PERSEG_ZZZ(sve2p1_tbxq_b, helper_sve2_tbx_b) +DO_PERSEG_ZZZ(sve2p1_tbxq_h, helper_sve2_tbx_h) +DO_PERSEG_ZZZ(sve2p1_tbxq_s, helper_sve2_tbx_s) +DO_PERSEG_ZZZ(sve2p1_tbxq_d, helper_sve2_tbx_d) + #undef DO_PERSEG_ZZZ #define DO_TRN(NAME, TYPE, H) \ diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_4 * const sve2_tbl_fns[4] = { TRANS_FEAT(TBL_sve2, aa64_sve2, gen_gvec_ool_zzzz, sve2_tbl_fns[a->esz], a->rd, a->rn, (a->rn + 1) % 32, a->rm, 0) +static gen_helper_gvec_3 * const tblq_fns[4] = { + gen_helper_sve2p1_tblq_b, gen_helper_sve2p1_tblq_h, + gen_helper_sve2p1_tblq_s, gen_helper_sve2p1_tblq_d +}; +TRANS_FEAT(TBLQ, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + tblq_fns[a->esz], a, 0) + static gen_helper_gvec_3 * const tbx_fns[4] = { gen_helper_sve2_tbx_b, gen_helper_sve2_tbx_h, gen_helper_sve2_tbx_s, gen_helper_sve2_tbx_d }; TRANS_FEAT(TBX, aa64_sve2, gen_gvec_ool_arg_zzz, tbx_fns[a->esz], a, 0) +static gen_helper_gvec_3 * const tbxq_fns[4] = { + gen_helper_sve2p1_tbxq_b, gen_helper_sve2p1_tbxq_h, + gen_helper_sve2p1_tbxq_s, gen_helper_sve2p1_tbxq_d +}; +TRANS_FEAT(TBXQ, aa64_sme2p1_or_sve2p1, gen_gvec_ool_arg_zzz, + tbxq_fns[a->esz], a, 0) + static bool trans_PMOV_pv(DisasContext *s, arg_PMOV_pv *a) { static gen_helper_gvec_2 * const fns[4] = { -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Implement the SVE2p1 consecutive register LD1/ST1, and the SME2 strided register LD1/ST1. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-94-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 16 ++ target/arm/tcg/sve.decode | 50 ++++ target/arm/tcg/sve_helper.c | 493 +++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sve.c | 103 +++++++ 4 files changed, 662 insertions(+) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(pmov_pv_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pmov_vp_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pmov_vp_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(pmov_vp_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_5(sve2p1_ld1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_ld1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) + +DEF_HELPER_FLAGS_5(sve2p1_st1bb_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1hh_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1hh_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1ss_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1ss_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1dd_le_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) +DEF_HELPER_FLAGS_5(sve2p1_st1dd_be_c, TCG_CALL_NO_WG, void, env, ptr, tl, i32, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ SCLAMP 01000100 .. 0 ..... 110000 ..... ..... @rda_rn_rm UCLAMP 01000100 .. 0 ..... 110001 ..... ..... @rda_rn_rm FCLAMP 01100100 .. 1 ..... 001001 ..... ..... @rda_rn_rm + +### SVE2p1 multi-vec contiguous load + +&zcrr_ldst rd png rn rm esz nreg +&zcri_ldst rd png rn imm esz nreg +%png 10:3 !function=plus_8 +%zd_ax2 1:4 !function=times_2 +%zd_ax4 2:3 !function=times_4 + +LD1_zcrr 10100000000 rm:5 0 esz:2 ... rn:5 .... - \ + &zcrr_ldst %png rd=%zd_ax2 nreg=2 +LD1_zcrr 10100000000 rm:5 1 esz:2 ... rn:5 ... 0- \ + &zcrr_ldst %png rd=%zd_ax4 nreg=4 + +ST1_zcrr 10100000001 rm:5 0 esz:2 ... rn:5 .... - \ + &zcrr_ldst %png rd=%zd_ax2 nreg=2 +ST1_zcrr 10100000001 rm:5 1 esz:2 ... rn:5 ... 0- \ + &zcrr_ldst %png rd=%zd_ax4 nreg=4 + +LD1_zcri 101000000100 imm:s4 0 esz:2 ... rn:5 .... - \ + &zcri_ldst %png rd=%zd_ax2 nreg=2 +LD1_zcri 101000000100 imm:s4 1 esz:2 ... rn:5 ... 0- \ + &zcri_ldst %png rd=%zd_ax4 nreg=4 + +ST1_zcri 101000000110 imm:s4 0 esz:2 ... rn:5 .... - \ + &zcri_ldst %png rd=%zd_ax2 nreg=2 +ST1_zcri 101000000110 imm:s4 1 esz:2 ... rn:5 ... 0- \ + &zcri_ldst %png rd=%zd_ax4 nreg=4 + +# Note: N bit and 0 bit (for nreg4) still mashed in rd. +# This is handled within gen_ldst_c(). +LD1_zcrr_stride 10100001000 rm:5 0 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=2 +LD1_zcrr_stride 10100001000 rm:5 1 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=4 + +ST1_zcrr_stride 10100001001 rm:5 0 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=2 +ST1_zcrr_stride 10100001001 rm:5 1 esz:2 ... rn:5 rd:5 \ + &zcrr_ldst %png nreg=4 + +LD1_zcri_stride 101000010100 imm:s4 0 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=2 +LD1_zcri_stride 101000010100 imm:s4 1 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=4 + +ST1_zcri_stride 101000010110 imm:s4 0 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=2 +ST1_zcri_stride 101000010110 imm:s4 1 esz:2 ... rn:5 rd:5 \ + &zcri_ldst %png nreg=4 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_ST1_ZPZ_D(dd_be, zd, MO_64) #undef DO_ST1_ZPZ_S #undef DO_ST1_ZPZ_D +/* + * SVE2.1 consecutive register load/store + */ + +static unsigned sve2p1_cont_ldst_elements(SVEContLdSt *info, vaddr addr, + uint32_t png, intptr_t reg_max, + int N, int v_esz) +{ + const int esize = 1 << v_esz; + intptr_t reg_off_first = -1, reg_off_last = -1, reg_off_split; + DecodeCounter p = decode_counter(png, reg_max, v_esz); + unsigned b_count = p.count << v_esz; + unsigned b_stride = 1 << (v_esz + p.lg2_stride); + intptr_t page_split; + + /* Set all of the element indices to -1, and the TLB data to 0. */ + memset(info, -1, offsetof(SVEContLdSt, page)); + memset(info->page, 0, sizeof(info->page)); + + if (p.invert) { + if (b_count >= reg_max * N) { + return 0; + } + reg_off_first = b_count; + reg_off_last = reg_max * N - b_stride; + } else { + if (b_count == 0) { + return 0; + } + reg_off_first = 0; + reg_off_last = MIN(b_count - esize, reg_max * N - b_stride); + } + + info->reg_off_first[0] = reg_off_first; + info->mem_off_first[0] = reg_off_first; + + page_split = -(addr | TARGET_PAGE_MASK); + if (reg_off_last + esize <= page_split || reg_off_first >= page_split) { + /* The entire operation fits within a single page. */ + info->reg_off_last[0] = reg_off_last; + return b_stride; + } + + info->page_split = page_split; + reg_off_split = ROUND_DOWN(page_split, esize); + + /* + * This is the last full element on the first page, but it is not + * necessarily active. If there is no full element, i.e. the first + * active element is the one that's split, this value remains -1. + * It is useful as iteration bounds. + */ + if (reg_off_split != 0) { + info->reg_off_last[0] = ROUND_DOWN(reg_off_split - esize, b_stride); + } + + /* Determine if an unaligned element spans the pages. */ + if (page_split & (esize - 1)) { + /* It is helpful to know if the split element is active. */ + if ((reg_off_split & (b_stride - 1)) == 0) { + info->reg_off_split = reg_off_split; + info->mem_off_split = reg_off_split; + } + reg_off_split += esize; + } + + /* + * We do want the first active element on the second page, because + * this may affect the address reported in an exception. + */ + reg_off_split = ROUND_UP(reg_off_split, b_stride); + if (reg_off_split <= reg_off_last) { + info->reg_off_first[1] = reg_off_split; + info->mem_off_first[1] = reg_off_split; + info->reg_off_last[1] = reg_off_last; + } + return b_stride; +} + +static void sve2p1_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env, + target_ulong addr, unsigned estride, + int esize, int wp_access, uintptr_t ra) +{ +#ifndef CONFIG_USER_ONLY + intptr_t count_off, count_last; + int flags0 = info->page[0].flags; + int flags1 = info->page[1].flags; + + if (likely(!((flags0 | flags1) & TLB_WATCHPOINT))) { + return; + } + + /* Indicate that watchpoints are handled. */ + info->page[0].flags = flags0 & ~TLB_WATCHPOINT; + info->page[1].flags = flags1 & ~TLB_WATCHPOINT; + + if (flags0 & TLB_WATCHPOINT) { + count_off = info->reg_off_first[0]; + count_last = info->reg_off_split; + if (count_last < 0) { + count_last = info->reg_off_last[0]; + } + do { + cpu_check_watchpoint(env_cpu(env), addr + count_off, + esize, info->page[0].attrs, wp_access, ra); + count_off += estride; + } while (count_off <= count_last); + } + + count_off = info->reg_off_first[1]; + if ((flags1 & TLB_WATCHPOINT) && count_off >= 0) { + count_last = info->reg_off_last[1]; + do { + cpu_check_watchpoint(env_cpu(env), addr + count_off, + esize, info->page[1].attrs, + wp_access, ra); + count_off += estride; + } while (count_off <= count_last); + } +#endif +} + +static void sve2p1_cont_ldst_mte_check(SVEContLdSt *info, CPUARMState *env, + target_ulong addr, unsigned estride, + int esize, uint32_t mtedesc, + uintptr_t ra) +{ + intptr_t count_off, count_last; + + /* + * TODO: estride is always a small power of two, <= 8. + * Manipulate the stride within the loops such that + * - first iteration hits addr + off, as required, + * - second iteration hits ALIGN_UP(addr, 16), + * - other iterations advance addr by 16. + * This will minimize the probing to once per MTE granule. + */ + + /* Process the page only if MemAttr == Tagged. */ + if (info->page[0].tagged) { + count_off = info->reg_off_first[0]; + count_last = info->reg_off_split; + if (count_last < 0) { + count_last = info->reg_off_last[0]; + } + + do { + mte_check(env, mtedesc, addr + count_off, ra); + count_off += estride; + } while (count_off <= count_last); + } + + count_off = info->reg_off_first[1]; + if (count_off >= 0 && info->page[1].tagged) { + count_last = info->reg_off_last[1]; + do { + mte_check(env, mtedesc, addr + count_off, ra); + count_off += estride; + } while (count_off <= count_last); + } +} + +static inline QEMU_ALWAYS_INLINE +void sve2p1_ld1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr, + uint32_t png, uint32_t desc, + const uintptr_t ra, const MemOp esz, + sve_ldst1_host_fn *host_fn, + sve_ldst1_tlb_fn *tlb_fn) +{ + const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2; + const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4); + uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT); + const intptr_t reg_max = simd_oprsz(desc); + const unsigned esize = 1 << esz; + intptr_t count_off, count_last; + intptr_t reg_off, reg_last, reg_n; + SVEContLdSt info; + unsigned estride, flags; + void *host; + + estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz); + if (estride == 0) { + /* The entire predicate was false; no load occurs. */ + for (unsigned n = 0; n < N; n++) { + memset(zd + n * rstride, 0, reg_max); + } + return; + } + + /* Probe the page(s). Exit with exception for any invalid page. */ + sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_LOAD, ra); + + /* Handle watchpoints for all active elements. */ + sve2p1_cont_ldst_watchpoints(&info, env, addr, estride, + esize, BP_MEM_READ, ra); + + /* + * Handle mte checks for all active elements. + * Since TBI must be set for MTE, !mtedesc => !mte_active. + */ + if (mtedesc) { + sve2p1_cont_ldst_mte_check(&info, env, estride, addr, + esize, mtedesc, ra); + } + + flags = info.page[0].flags | info.page[1].flags; + if (unlikely(flags != 0)) { + /* + * At least one page includes MMIO. + * Any bus operation can fail with cpu_transaction_failed, + * which for ARM will raise SyncExternal. Perform the load + * into scratch memory to preserve register state until the end. + */ + ARMVectorReg scratch[4] = { }; + + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[1]; + if (count_last < 0) { + count_last = info.reg_off_split; + if (count_last < 0) { + count_last = info.reg_off_last[0]; + } + } + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + + do { + reg_last = MIN(count_last - count_off, reg_max - esize); + do { + tlb_fn(env, &scratch[reg_n], reg_off, addr + count_off, ra); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + for (unsigned n = 0; n < N; ++n) { + memcpy(&zd[n * rstride], &scratch[n], reg_max); + } + return; + } + + /* The entire operation is in RAM, on valid pages. */ + + for (unsigned n = 0; n < N; ++n) { + memset(&zd[n * rstride], 0, reg_max); + } + + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[0]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[0].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + + /* + * Use the slow path to manage the cross-page misalignment. + * But we know this is RAM and cannot trap. + */ + count_off = info.reg_off_split; + if (unlikely(count_off >= 0)) { + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra); + } + + count_off = info.reg_off_first[1]; + if (unlikely(count_off >= 0)) { + count_last = info.reg_off_last[1]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[1].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + } +} + +void HELPER(sve2p1_ld1bb_c)(CPUARMState *env, void *vd, target_ulong addr, + uint32_t png, uint32_t desc) +{ + sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), MO_8, + sve_ld1bb_host, sve_ld1bb_tlb); +} + +#define DO_LD1_2(NAME, ESZ) \ +void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_le_host, sve_##NAME##_le_tlb); \ +} \ +void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_ld1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_be_host, sve_##NAME##_be_tlb); \ +} + +DO_LD1_2(ld1hh, MO_16) +DO_LD1_2(ld1ss, MO_32) +DO_LD1_2(ld1dd, MO_64) + +#undef DO_LD1_2 + +static inline QEMU_ALWAYS_INLINE +void sve2p1_st1_c(CPUARMState *env, ARMVectorReg *zd, const vaddr addr, + uint32_t png, uint32_t desc, + const uintptr_t ra, const int esz, + sve_ldst1_host_fn *host_fn, + sve_ldst1_tlb_fn *tlb_fn) +{ + const unsigned N = (desc >> SIMD_DATA_SHIFT) & 1 ? 4 : 2; + const unsigned rstride = 1 << ((desc >> (SIMD_DATA_SHIFT + 1)) % 4); + uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT); + const intptr_t reg_max = simd_oprsz(desc); + const unsigned esize = 1 << esz; + intptr_t count_off, count_last; + intptr_t reg_off, reg_last, reg_n; + SVEContLdSt info; + unsigned estride, flags; + void *host; + + estride = sve2p1_cont_ldst_elements(&info, addr, png, reg_max, N, esz); + if (estride == 0) { + /* The entire predicate was false; no store occurs. */ + return; + } + + /* Probe the page(s). Exit with exception for any invalid page. */ + sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_STORE, ra); + + /* Handle watchpoints for all active elements. */ + sve2p1_cont_ldst_watchpoints(&info, env, addr, estride, + esize, BP_MEM_WRITE, ra); + + /* + * Handle mte checks for all active elements. + * Since TBI must be set for MTE, !mtedesc => !mte_active. + */ + if (mtedesc) { + sve2p1_cont_ldst_mte_check(&info, env, estride, addr, + esize, mtedesc, ra); + } + + flags = info.page[0].flags | info.page[1].flags; + if (unlikely(flags != 0)) { + /* + * At least one page includes MMIO. + * Any bus operation can fail with cpu_transaction_failed, + * which for ARM will raise SyncExternal. Perform the load + * into scratch memory to preserve register state until the end. + */ + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[1]; + if (count_last < 0) { + count_last = info.reg_off_split; + if (count_last < 0) { + count_last = info.reg_off_last[0]; + } + } + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + + do { + reg_last = MIN(count_last - count_off, reg_max - esize); + do { + tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + return; + } + + /* The entire operation is in RAM, on valid pages. */ + + count_off = info.reg_off_first[0]; + count_last = info.reg_off_last[0]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[0].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + + /* + * Use the slow path to manage the cross-page misalignment. + * But we know this is RAM and cannot trap. + */ + count_off = info.reg_off_split; + if (unlikely(count_off >= 0)) { + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + tlb_fn(env, &zd[reg_n * rstride], reg_off, addr + count_off, ra); + } + + count_off = info.reg_off_first[1]; + if (unlikely(count_off >= 0)) { + count_last = info.reg_off_last[1]; + reg_off = count_off % reg_max; + reg_n = count_off / reg_max; + host = info.page[1].host; + + set_helper_retaddr(ra); + + do { + reg_last = MIN(count_last - reg_n * reg_max, reg_max - esize); + do { + host_fn(&zd[reg_n * rstride], reg_off, host + count_off); + reg_off += estride; + count_off += estride; + } while (reg_off <= reg_last); + reg_off = 0; + reg_n++; + } while (count_off <= count_last); + + clear_helper_retaddr(); + } +} + +void HELPER(sve2p1_st1bb_c)(CPUARMState *env, void *vd, target_ulong addr, + uint32_t png, uint32_t desc) +{ + sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), MO_8, + sve_st1bb_host, sve_st1bb_tlb); +} + +#define DO_ST1_2(NAME, ESZ) \ +void HELPER(sve2p1_##NAME##_le_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_le_host, sve_##NAME##_le_tlb); \ +} \ +void HELPER(sve2p1_##NAME##_be_c)(CPUARMState *env, void *vd, \ + target_ulong addr, uint32_t png, \ + uint32_t desc) \ +{ \ + sve2p1_st1_c(env, vd, addr, png, desc, GETPC(), ESZ, \ + sve_##NAME##_be_host, sve_##NAME##_be_tlb); \ +} + +DO_ST1_2(st1hh, MO_16) +DO_ST1_2(st1ss, MO_32) +DO_ST1_2(st1dd, MO_64) + +#undef DO_ST1_2 + void HELPER(sve2_eor3)(void *vd, void *vn, void *vm, void *vk, uint32_t desc) { intptr_t i, opr_sz = simd_oprsz(desc) / 8; diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UQCVTN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_uqcvtn_sh, a->rd, a->rn, 0) TRANS_FEAT(SQCVTUN_sh, aa64_sme2_or_sve2p1, gen_gvec_ool_zz, gen_helper_sme2_sqcvtun_sh, a->rd, a->rn, 0) + +static bool gen_ldst_c(DisasContext *s, TCGv_i64 addr, int zd, int png, + MemOp esz, bool is_write, int n, bool strided) +{ + typedef void ldst_c_fn(TCGv_env, TCGv_ptr, TCGv_i64, + TCGv_i32, TCGv_i32); + static ldst_c_fn * const f_ldst[2][2][4] = { + { { gen_helper_sve2p1_ld1bb_c, + gen_helper_sve2p1_ld1hh_le_c, + gen_helper_sve2p1_ld1ss_le_c, + gen_helper_sve2p1_ld1dd_le_c, }, + { gen_helper_sve2p1_ld1bb_c, + gen_helper_sve2p1_ld1hh_be_c, + gen_helper_sve2p1_ld1ss_be_c, + gen_helper_sve2p1_ld1dd_be_c, } }, + + { { gen_helper_sve2p1_st1bb_c, + gen_helper_sve2p1_st1hh_le_c, + gen_helper_sve2p1_st1ss_le_c, + gen_helper_sve2p1_st1dd_le_c, }, + { gen_helper_sve2p1_st1bb_c, + gen_helper_sve2p1_st1hh_be_c, + gen_helper_sve2p1_st1ss_be_c, + gen_helper_sve2p1_st1dd_be_c, } } + }; + + TCGv_i32 t_png, t_desc; + TCGv_ptr t_zd; + uint32_t desc, lg2_rstride = 0; + bool be = s->be_data == MO_BE; + + assert(n == 2 || n == 4); + if (strided) { + lg2_rstride = 3; + if (n == 4) { + /* Validate ZD alignment. */ + if (zd & 4) { + return false; + } + lg2_rstride = 2; + } + /* Ignore non-temporal bit */ + zd &= ~8; + } + + if (strided || !dc_isar_feature(aa64_sve2p1, s) + ? !sme_sm_enabled_check(s) + : !sve_access_check(s)) { + return true; + } + + if (!s->mte_active[0]) { + addr = clean_data_tbi(s, addr); + } + + desc = n == 2 ? 0 : 1; + desc = desc | (lg2_rstride << 1); + desc = make_svemte_desc(s, vec_full_reg_size(s), 1, esz, is_write, desc); + t_desc = tcg_constant_i32(desc); + + t_png = tcg_temp_new_i32(); + tcg_gen_ld16u_i32(t_png, tcg_env, + pred_full_reg_offset(s, png) ^ + (HOST_BIG_ENDIAN ? 6 : 0)); + + t_zd = tcg_temp_new_ptr(); + tcg_gen_addi_ptr(t_zd, tcg_env, vec_full_reg_offset(s, zd)); + + f_ldst[is_write][be][esz](tcg_env, t_zd, addr, t_png, t_desc); + return true; +} + +static bool gen_ldst_zcrr_c(DisasContext *s, arg_zcrr_ldst *a, + bool is_write, bool strided) +{ + TCGv_i64 addr = tcg_temp_new_i64(); + + tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz); + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn)); + return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write, + a->nreg, strided); +} + +static bool gen_ldst_zcri_c(DisasContext *s, arg_zcri_ldst *a, + bool is_write, bool strided) +{ + TCGv_i64 addr = tcg_temp_new_i64(); + + tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), + a->imm * a->nreg * vec_full_reg_size(s)); + return gen_ldst_c(s, addr, a->rd, a->png, a->esz, is_write, + a->nreg, strided); +} + +TRANS_FEAT(LD1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, false, false) +TRANS_FEAT(LD1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, false, false) +TRANS_FEAT(ST1_zcrr, aa64_sme2_or_sve2p1, gen_ldst_zcrr_c, a, true, false) +TRANS_FEAT(ST1_zcri, aa64_sme2_or_sve2p1, gen_ldst_zcri_c, a, true, false) + +TRANS_FEAT(LD1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, false, true) +TRANS_FEAT(LD1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, false, true) +TRANS_FEAT(ST1_zcrr_stride, aa64_sme2, gen_ldst_zcrr_c, a, true, true) +TRANS_FEAT(ST1_zcri_stride, aa64_sme2, gen_ldst_zcri_c, a, true, true) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> The msz > esz encodings are reserved, and some of them are about to be reused. Split these patterns so that the new insns do not overlap. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-95-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve.decode | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ # Stores; user must fill in ESZ, MSZ, NREG as needed. @rprr_store ....... .. .. rm:5 ... pg:3 rn:5 rd:5 &rprr_store -@rpri_store_msz ....... msz:2 .. . imm:s4 ... pg:3 rn:5 rd:5 &rpri_store +@rpri_store ....... .. .. . imm:s4 ... pg:3 rn:5 rd:5 &rpri_store @rprr_store_esz_n0 ....... .. esz:2 rm:5 ... pg:3 rn:5 rd:5 \ &rprr_store nreg=0 @rprr_scatter_store ....... msz:2 .. rm:5 ... pg:3 rn:5 rd:5 \ @@ -XXX,XX +XXX,XX @@ STR_zri 1110010 11 0. ..... 010 ... ..... ..... @rd_rn_i9 # SVE contiguous store (scalar plus immediate) # ST1B, ST1H, ST1W, ST1D; require msz <= esz -ST_zpri 1110010 .. esz:2 0.... 111 ... ..... ..... \ - @rpri_store_msz nreg=0 +ST_zpri 1110010 00 esz:2 0.... 111 ... ..... ..... \ + @rpri_store msz=0 nreg=0 +ST_zpri 1110010 01 esz:2 0.... 111 ... ..... ..... \ + @rpri_store msz=1 nreg=0 +ST_zpri 1110010 10 10 0.... 111 ... ..... ..... \ + @rpri_store msz=2 esz=2 nreg=0 +ST_zpri 1110010 10 11 0.... 111 ... ..... ..... \ + @rpri_store msz=2 esz=3 nreg=0 +ST_zpri 1110010 11 11 0.... 111 ... ..... ..... \ + @rpri_store msz=3 esz=3 nreg=0 # SVE contiguous store (scalar plus scalar) # ST1B, ST1H, ST1W, ST1D; require msz <= esz @@ -XXX,XX +XXX,XX @@ ST_zprr 1110010 00 .. ..... 010 ... ..... ..... \ @rprr_store_esz_n0 msz=0 ST_zprr 1110010 01 .. ..... 010 ... ..... ..... \ @rprr_store_esz_n0 msz=1 -ST_zprr 1110010 10 .. ..... 010 ... ..... ..... \ - @rprr_store_esz_n0 msz=2 +ST_zprr 1110010 10 10 ..... 010 ... ..... ..... \ + @rprr_store msz=2 esz=2 nreg=0 +ST_zprr 1110010 10 11 ..... 010 ... ..... ..... \ + @rprr_store msz=2 esz=3 nreg=0 ST_zprr 1110010 11 11 ..... 010 ... ..... ..... \ @rprr_store msz=3 esz=3 nreg=0 # SVE contiguous non-temporal store (scalar plus immediate) (nreg == 0) # SVE store multiple structures (scalar plus immediate) (nreg != 0) ST_zpri 1110010 .. nreg:2 1.... 111 ... ..... ..... \ - @rpri_store_msz esz=%size_23 + @rpri_store msz=%size_23 esz=%size_23 # SVE contiguous non-temporal store (scalar plus scalar) (nreg == 0) # SVE store multiple structures (scalar plus scalar) (nreg != 0) -ST_zprr 1110010 msz:2 nreg:2 ..... 011 ... ..... ..... \ - @rprr_store esz=%size_23 +ST_zprr 1110010 .. nreg:2 ..... 011 ... ..... ..... \ + @rprr_store msz=%size_23 esz=%size_23 # SVE 32-bit scatter store (scalar plus 32-bit scaled offsets) # Require msz > 0 && msz <= esz. -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-96-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 22 +++++ target/arm/tcg/sve_ldst_internal.h | 26 ++++++ target/arm/tcg/sve.decode | 20 +++++ target/arm/tcg/sve_helper.c | 6 ++ target/arm/tcg/translate-sve.c | 136 +++++++++++++++++++++++------ 5 files changed, 183 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld2bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld1hds_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1sds_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1squ_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld1dqu_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ldff1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ldff1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ldff1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st1hd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_st1bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st2bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st3bb_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st1hd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1sd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1sq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st1dq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldhsu_le_zsu, TCG_CALL_NO_WG, diff --git a/target/arm/tcg/sve_ldst_internal.h b/target/arm/tcg/sve_ldst_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_ldst_internal.h +++ b/target/arm/tcg/sve_ldst_internal.h @@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_2(sd, H1_8, uint64_t, uint32_t, stl) DO_LD_PRIM_2(dd, H1_8, uint64_t, uint64_t, ldq) DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq) +#define DO_LD_PRIM_3(NAME, FUNC) \ + static inline void sve_##NAME##_host(void *vd, \ + intptr_t reg_off, void *host) \ + { sve_##FUNC##_host(vd, reg_off, host); \ + *(uint64_t *)(vd + reg_off + 8) = 0; } \ + static inline void sve_##NAME##_tlb(CPUARMState *env, void *vd, \ + intptr_t reg_off, target_ulong addr, uintptr_t ra) \ + { sve_##FUNC##_tlb(env, vd, reg_off, addr, ra); \ + *(uint64_t *)(vd + reg_off + 8) = 0; } + +DO_LD_PRIM_3(ld1squ_be, ld1sdu_be) +DO_LD_PRIM_3(ld1squ_le, ld1sdu_le) +DO_LD_PRIM_3(ld1dqu_be, ld1dd_be) +DO_LD_PRIM_3(ld1dqu_le, ld1dd_le) + +#define sve_st1sq_be_host sve_st1sd_be_host +#define sve_st1sq_le_host sve_st1sd_le_host +#define sve_st1sq_be_tlb sve_st1sd_be_tlb +#define sve_st1sq_le_tlb sve_st1sd_le_tlb + +#define sve_st1dq_be_host sve_st1dd_be_host +#define sve_st1dq_le_host sve_st1dd_le_host +#define sve_st1dq_be_tlb sve_st1dd_be_tlb +#define sve_st1dq_le_tlb sve_st1dd_le_tlb + #undef DO_LD_TLB #undef DO_ST_TLB #undef DO_LD_HOST @@ -XXX,XX +XXX,XX @@ DO_ST_PRIM_2(dd, H1_8, uint64_t, uint64_t, stq) #undef DO_ST_PRIM_1 #undef DO_LD_PRIM_2 #undef DO_ST_PRIM_2 +#undef DO_LD_PRIM_3 /* * Resolve the guest virtual address to info->host and info->flags. diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ LD1_zpiz 1000010 .. 01 ..... 1.. ... ..... ..... \ # SVE contiguous load (scalar plus scalar) LD_zprr 1010010 .... ..... 010 ... ..... ..... @rprr_load_dt nreg=0 +# LD1W (128-bit element) +LD_zprr 1010010 1000 rm:5 100 pg:3 rn:5 rd:5 \ + &rprr_load dtype=16 nreg=0 +# LD1D (128-bit element) +LD_zprr 1010010 1100 rm:5 100 pg:3 rn:5 rd:5 \ + &rprr_load dtype=17 nreg=0 # SVE contiguous first-fault load (scalar plus scalar) LDFF1_zprr 1010010 .... ..... 011 ... ..... ..... @rprr_load_dt nreg=0 # SVE contiguous load (scalar plus immediate) LD_zpri 1010010 .... 0.... 101 ... ..... ..... @rpri_load_dt nreg=0 +# LD1W (128-bit element) +LD_zpri 1010010 1000 1 imm:s4 001 pg:3 rn:5 rd:5 \ + &rpri_load dtype=16 nreg=0 +# LD1D (128-bit element) +LD_zpri 1010010 1100 1 imm:s4 001 pg:3 rn:5 rd:5 \ + &rpri_load dtype=17 nreg=0 # SVE contiguous non-fault load (scalar plus immediate) LDNF1_zpri 1010010 .... 1.... 101 ... ..... ..... @rpri_load_dt nreg=0 @@ -XXX,XX +XXX,XX @@ ST_zpri 1110010 10 11 0.... 111 ... ..... ..... \ @rpri_store msz=2 esz=3 nreg=0 ST_zpri 1110010 11 11 0.... 111 ... ..... ..... \ @rpri_store msz=3 esz=3 nreg=0 +ST_zpri 1110010 10 00 0.... 111 ... ..... ..... \ + @rpri_store msz=2 esz=4 nreg=0 +ST_zpri 1110010 11 10 0.... 111 ... ..... ..... \ + @rpri_store msz=3 esz=4 nreg=0 # SVE contiguous store (scalar plus scalar) # ST1B, ST1H, ST1W, ST1D; require msz <= esz @@ -XXX,XX +XXX,XX @@ ST_zprr 1110010 10 11 ..... 010 ... ..... ..... \ @rprr_store msz=2 esz=3 nreg=0 ST_zprr 1110010 11 11 ..... 010 ... ..... ..... \ @rprr_store msz=3 esz=3 nreg=0 +ST_zprr 1110010 10 00 ..... 010 ... ..... ..... \ + @rprr_store msz=2 esz=4 nreg=0 +ST_zprr 1110010 11 10 ..... 010 ... ..... ..... \ + @rprr_store msz=3 esz=4 nreg=0 # SVE contiguous non-temporal store (scalar plus immediate) (nreg == 0) # SVE store multiple structures (scalar plus immediate) (nreg != 0) diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_LD1_2(ld1sds, MO_64, MO_32) DO_LD1_2(ld1dd, MO_64, MO_64) +DO_LD1_2(ld1squ, MO_32, MO_128) +DO_LD1_2(ld1dqu, MO_64, MO_128) + #undef DO_LD1_1 #undef DO_LD1_2 @@ -XXX,XX +XXX,XX @@ DO_STN_2(2, dd, MO_64, MO_64) DO_STN_2(3, dd, MO_64, MO_64) DO_STN_2(4, dd, MO_64, MO_64) +DO_STN_2(1, sq, MO_128, MO_32) +DO_STN_2(1, dq, MO_128, MO_64) + #undef DO_STN_1 #undef DO_STN_2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static bool trans_STR_pri(DisasContext *s, arg_rri *a) */ /* The memory mode of the dtype. */ -static const MemOp dtype_mop[16] = { +static const MemOp dtype_mop[19] = { MO_UB, MO_UB, MO_UB, MO_UB, MO_SL, MO_UW, MO_UW, MO_UW, MO_SW, MO_SW, MO_UL, MO_UL, - MO_SB, MO_SB, MO_SB, MO_UQ + MO_SB, MO_SB, MO_SB, MO_UQ, + /* Artificial values used by decode */ + MO_UL, MO_UQ, MO_128, }; #define dtype_msz(x) (dtype_mop[x] & MO_SIZE) /* The vector element size of dtype. */ -static const uint8_t dtype_esz[16] = { +static const uint8_t dtype_esz[19] = { 0, 1, 2, 3, 3, 1, 2, 3, 3, 2, 2, 3, - 3, 2, 1, 3 + 3, 2, 1, 3, + /* Artificial values used by decode */ + 4, 4, 4, }; uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs, @@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, } /* Indexed by [mte][be][dtype][nreg] */ -static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { +static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { /* mte inactive, little-endian */ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r }, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1bss_r, NULL, NULL, NULL }, { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r, - gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } }, + gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r }, + + { gen_helper_sve_ld1squ_le_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_le_r, NULL, NULL, NULL }, + }, /* mte inactive, big-endian */ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1bss_r, NULL, NULL, NULL }, { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r, - gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } } }, + gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r }, + + { gen_helper_sve_ld1squ_be_r, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_be_r, NULL, NULL, NULL }, + }, + }, { /* mte active, little-endian */ { { gen_helper_sve_ld1bb_r_mte, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1dd_le_r_mte, gen_helper_sve_ld2dd_le_r_mte, gen_helper_sve_ld3dd_le_r_mte, - gen_helper_sve_ld4dd_le_r_mte } }, + gen_helper_sve_ld4dd_le_r_mte }, + + { gen_helper_sve_ld1squ_le_r_mte, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_le_r_mte, NULL, NULL, NULL }, + }, /* mte active, big-endian */ { { gen_helper_sve_ld1bb_r_mte, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][16][4] = { { gen_helper_sve_ld1dd_be_r_mte, gen_helper_sve_ld2dd_be_r_mte, gen_helper_sve_ld3dd_be_r_mte, - gen_helper_sve_ld4dd_be_r_mte } } }, + gen_helper_sve_ld4dd_be_r_mte }, + + { gen_helper_sve_ld1squ_be_r_mte, NULL, NULL, NULL }, + { gen_helper_sve_ld1dqu_be_r_mte, NULL, NULL, NULL }, + }, + }, }; static void do_ld_zpa(DisasContext *s, int zt, int pg, @@ -XXX,XX +XXX,XX @@ static void do_ld_zpa(DisasContext *s, int zt, int pg, static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) { - if (a->rm == 31 || !dc_isar_feature(aa64_sve, s)) { + if (a->rm == 31) { return false; } + + /* dtypes 16 and 17 are artificial, representing 128-bit element */ + if (a->dtype < 16) { + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + } else { + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + } + if (sve_access_check(s)) { TCGv_i64 addr = tcg_temp_new_i64(); tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype)); @@ -XXX,XX +XXX,XX @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a) { - if (!dc_isar_feature(aa64_sve, s)) { - return false; + /* dtypes 16 and 17 are artificial, representing 128-bit element */ + if (a->dtype < 16) { + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + } else { + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; } + if (sve_access_check(s)) { int vsz = vec_full_reg_size(s); int elements = vsz >> dtype_esz[a->dtype]; @@ -XXX,XX +XXX,XX @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a) static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz, int esz, int nreg) { - static gen_helper_gvec_mem * const fn_single[2][2][4][4] = { + static gen_helper_gvec_mem * const fn_single[2][2][4][5] = { { { { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r, gen_helper_sve_st1bs_r, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_le_r }, { NULL, NULL, gen_helper_sve_st1ss_le_r, - gen_helper_sve_st1sd_le_r }, + gen_helper_sve_st1sd_le_r, + gen_helper_sve_st1sq_le_r, }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_le_r } }, + gen_helper_sve_st1dd_le_r, + gen_helper_sve_st1dq_le_r, } }, { { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r, gen_helper_sve_st1bs_r, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_be_r }, { NULL, NULL, gen_helper_sve_st1ss_be_r, - gen_helper_sve_st1sd_be_r }, + gen_helper_sve_st1sd_be_r, + gen_helper_sve_st1sq_be_r }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_be_r } } }, + gen_helper_sve_st1dd_be_r, + gen_helper_sve_st1dq_be_r } } }, { { { gen_helper_sve_st1bb_r_mte, gen_helper_sve_st1bh_r_mte, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_le_r_mte }, { NULL, NULL, gen_helper_sve_st1ss_le_r_mte, - gen_helper_sve_st1sd_le_r_mte }, + gen_helper_sve_st1sd_le_r_mte, + gen_helper_sve_st1sq_le_r_mte }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_le_r_mte } }, + gen_helper_sve_st1dd_le_r_mte, + gen_helper_sve_st1dq_le_r_mte } }, { { gen_helper_sve_st1bb_r_mte, gen_helper_sve_st1bh_r_mte, gen_helper_sve_st1bs_r_mte, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1hd_be_r_mte }, { NULL, NULL, gen_helper_sve_st1ss_be_r_mte, - gen_helper_sve_st1sd_be_r_mte }, + gen_helper_sve_st1sd_be_r_mte, + gen_helper_sve_st1sq_be_r_mte }, { NULL, NULL, NULL, - gen_helper_sve_st1dd_be_r_mte } } }, + gen_helper_sve_st1dd_be_r_mte, + gen_helper_sve_st1dq_be_r_mte } } }, }; static gen_helper_gvec_mem * const fn_multiple[2][2][3][4] = { { { { gen_helper_sve_st2bb_r, @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a) { - if (!dc_isar_feature(aa64_sve, s)) { - return false; - } if (a->rm == 31 || a->msz > a->esz) { return false; } + switch (a->esz) { + case MO_8 ... MO_64: + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + break; + case MO_128: + assert(a->msz < a->esz); + assert(a->nreg == 0); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + break; + default: + g_assert_not_reached(); + } + if (sve_access_check(s)) { TCGv_i64 addr = tcg_temp_new_i64(); tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->msz); @@ -XXX,XX +XXX,XX @@ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a) static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a) { - if (!dc_isar_feature(aa64_sve, s)) { - return false; - } if (a->msz > a->esz) { return false; } + switch (a->esz) { + case MO_8 ... MO_64: + if (!dc_isar_feature(aa64_sve, s)) { + return false; + } + break; + case MO_128: + assert(a->msz < a->esz); + assert(a->nreg == 0); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + break; + default: + g_assert_not_reached(); + } + if (sve_access_check(s)) { int vsz = vec_full_reg_size(s); int elements = vsz >> a->esz; -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Move from sme_helper.c to the shared header. Add a comment noting the lack of atomicity. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-97-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/sve_ldst_internal.h | 63 ++++++++++++++++++++++++++++++ target/arm/tcg/sme_helper.c | 44 +++------------------ 2 files changed, 69 insertions(+), 38 deletions(-) diff --git a/target/arm/tcg/sve_ldst_internal.h b/target/arm/tcg/sve_ldst_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_ldst_internal.h +++ b/target/arm/tcg/sve_ldst_internal.h @@ -XXX,XX +XXX,XX @@ DO_LD_PRIM_3(ld1dqu_le, ld1dd_le) #define sve_st1dq_be_tlb sve_st1dd_be_tlb #define sve_st1dq_le_tlb sve_st1dd_le_tlb +/* + * The ARMVectorReg elements are stored in host-endian 64-bit units. + * For 128-bit quantities, the sequence defined by the Elem[] pseudocode + * corresponds to storing the two 64-bit pieces in little-endian order. + */ +/* FIXME: Nothing in this file makes any effort at atomicity. */ + +static inline void sve_ld1qq_be_host(void *vd, intptr_t reg_off, void *host) +{ + sve_ld1dd_be_host(vd, reg_off + 8, host); + sve_ld1dd_be_host(vd, reg_off, host + 8); +} + +static inline void sve_ld1qq_le_host(void *vd, intptr_t reg_off, void *host) +{ + sve_ld1dd_le_host(vd, reg_off, host); + sve_ld1dd_le_host(vd, reg_off + 8, host + 8); +} + +static inline void +sve_ld1qq_be_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_ld1dd_be_tlb(env, vd, reg_off + 8, addr, ra); + sve_ld1dd_be_tlb(env, vd, reg_off, addr + 8, ra); +} + +static inline void +sve_ld1qq_le_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_ld1dd_le_tlb(env, vd, reg_off, addr, ra); + sve_ld1dd_le_tlb(env, vd, reg_off + 8, addr + 8, ra); +} + +static inline void sve_st1qq_be_host(void *vd, intptr_t reg_off, void *host) +{ + sve_st1dd_be_host(vd, reg_off + 8, host); + sve_st1dd_be_host(vd, reg_off, host + 8); +} + +static inline void sve_st1qq_le_host(void *vd, intptr_t reg_off, void *host) +{ + sve_st1dd_le_host(vd, reg_off, host); + sve_st1dd_le_host(vd, reg_off + 8, host + 8); +} + +static inline void +sve_st1qq_be_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_st1dd_be_tlb(env, vd, reg_off + 8, addr, ra); + sve_st1dd_be_tlb(env, vd, reg_off, addr + 8, ra); +} + +static inline void +sve_st1qq_le_tlb(CPUARMState *env, void *vd, intptr_t reg_off, + target_ulong addr, uintptr_t ra) +{ + sve_st1dd_le_tlb(env, vd, reg_off, addr, ra); + sve_st1dd_le_tlb(env, vd, reg_off + 8, addr + 8, ra); +} + #undef DO_LD_TLB #undef DO_ST_TLB #undef DO_LD_HOST diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static inline void sme_##NAME##_v_tlb(CPUARMState *env, void *za, \ TLB(env, useronly_clean_ptr(addr), val, ra); \ } -/* - * The ARMVectorReg elements are stored in host-endian 64-bit units. - * For 128-bit quantities, the sequence defined by the Elem[] pseudocode - * corresponds to storing the two 64-bit pieces in little-endian order. - */ -#define DO_LDQ(HNAME, VNAME, BE, HOST, TLB) \ -static inline void HNAME##_host(void *za, intptr_t off, void *host) \ -{ \ - uint64_t val0 = HOST(host), val1 = HOST(host + 8); \ - uint64_t *ptr = za + off; \ - ptr[0] = BE ? val1 : val0, ptr[1] = BE ? val0 : val1; \ -} \ +#define DO_LDQ(HNAME, VNAME) \ static inline void VNAME##_v_host(void *za, intptr_t off, void *host) \ { \ HNAME##_host(za, tile_vslice_offset(off), host); \ } \ -static inline void HNAME##_tlb(CPUARMState *env, void *za, intptr_t off, \ - target_ulong addr, uintptr_t ra) \ -{ \ - uint64_t val0 = TLB(env, useronly_clean_ptr(addr), ra); \ - uint64_t val1 = TLB(env, useronly_clean_ptr(addr + 8), ra); \ - uint64_t *ptr = za + off; \ - ptr[0] = BE ? val1 : val0, ptr[1] = BE ? val0 : val1; \ -} \ static inline void VNAME##_v_tlb(CPUARMState *env, void *za, intptr_t off, \ target_ulong addr, uintptr_t ra) \ { \ HNAME##_tlb(env, za, tile_vslice_offset(off), addr, ra); \ } -#define DO_STQ(HNAME, VNAME, BE, HOST, TLB) \ -static inline void HNAME##_host(void *za, intptr_t off, void *host) \ -{ \ - uint64_t *ptr = za + off; \ - HOST(host, ptr[BE]); \ - HOST(host + 8, ptr[!BE]); \ -} \ +#define DO_STQ(HNAME, VNAME) \ static inline void VNAME##_v_host(void *za, intptr_t off, void *host) \ { \ HNAME##_host(za, tile_vslice_offset(off), host); \ } \ -static inline void HNAME##_tlb(CPUARMState *env, void *za, intptr_t off, \ - target_ulong addr, uintptr_t ra) \ -{ \ - uint64_t *ptr = za + off; \ - TLB(env, useronly_clean_ptr(addr), ptr[BE], ra); \ - TLB(env, useronly_clean_ptr(addr + 8), ptr[!BE], ra); \ -} \ static inline void VNAME##_v_tlb(CPUARMState *env, void *za, intptr_t off, \ target_ulong addr, uintptr_t ra) \ { \ @@ -XXX,XX +XXX,XX @@ DO_LD(ld1s_le, uint32_t, ldl_le_p, cpu_ldl_le_data_ra) DO_LD(ld1d_be, uint64_t, ldq_be_p, cpu_ldq_be_data_ra) DO_LD(ld1d_le, uint64_t, ldq_le_p, cpu_ldq_le_data_ra) -DO_LDQ(sve_ld1qq_be, sme_ld1q_be, 1, ldq_be_p, cpu_ldq_be_data_ra) -DO_LDQ(sve_ld1qq_le, sme_ld1q_le, 0, ldq_le_p, cpu_ldq_le_data_ra) +DO_LDQ(sve_ld1qq_be, sme_ld1q_be) +DO_LDQ(sve_ld1qq_le, sme_ld1q_le) DO_ST(st1b, uint8_t, stb_p, cpu_stb_data_ra) DO_ST(st1h_be, uint16_t, stw_be_p, cpu_stw_be_data_ra) @@ -XXX,XX +XXX,XX @@ DO_ST(st1s_le, uint32_t, stl_le_p, cpu_stl_le_data_ra) DO_ST(st1d_be, uint64_t, stq_be_p, cpu_stq_be_data_ra) DO_ST(st1d_le, uint64_t, stq_le_p, cpu_stq_le_data_ra) -DO_STQ(sve_st1qq_be, sme_st1q_be, 1, stq_be_p, cpu_stq_be_data_ra) -DO_STQ(sve_st1qq_le, sme_st1q_le, 0, stq_le_p, cpu_stq_le_data_ra) +DO_STQ(sve_st1qq_be, sme_st1q_be) +DO_STQ(sve_st1qq_le, sme_st1q_le) #undef DO_LD #undef DO_ST -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-98-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 32 +++++++++ target/arm/tcg/sve.decode | 31 +++++++++ target/arm/tcg/sve_helper.c | 8 +++ target/arm/tcg/translate-sve.c | 116 ++++++++++++++++++++++++--------- 4 files changed, 156 insertions(+), 31 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld2qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ld2dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld3dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld4dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld2qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_ld2qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld3qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_ld4qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_ld1bhu_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bsu_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_ld1bdu_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st2qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_st1bh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_st2dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st3dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st4dd_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st2qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_le_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + +DEF_HELPER_FLAGS_4(sve_st2qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st3qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) +DEF_HELPER_FLAGS_4(sve_st4qq_be_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) + DEF_HELPER_FLAGS_4(sve_st1bh_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bs_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve_st1bd_r_mte, TCG_CALL_NO_WG, void, env, ptr, tl, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ @rprr_load_dt ....... dtype:4 rm:5 ... pg:3 rn:5 rd:5 &rprr_load @rpri_load_dt ....... dtype:4 . imm:s4 ... pg:3 rn:5 rd:5 &rpri_load +@rprr_load ....... .... rm:5 ... pg:3 rn:5 rd:5 &rprr_load +@rpri_load ....... .... . imm:s4 ... pg:3 rn:5 rd:5 &rpri_load + @rprr_load_msz ....... .... rm:5 ... pg:3 rn:5 rd:5 \ &rprr_load dtype=%msz_dtype @rpri_load_msz ....... .... . imm:s4 ... pg:3 rn:5 rd:5 \ @@ -XXX,XX +XXX,XX @@ LDNF1_zpri 1010010 .... 1.... 101 ... ..... ..... @rpri_load_dt nreg=0 # SVE load multiple structures (scalar plus scalar) # LD2B, LD2H, LD2W, LD2D; etc. LD_zprr 1010010 .. nreg:2 ..... 110 ... ..... ..... @rprr_load_msz +# LD[234]Q +LD_zprr 1010010 01 01 ..... 100 ... ..... ..... \ + @rprr_load dtype=18 nreg=1 +LD_zprr 1010010 10 01 ..... 100 ... ..... ..... \ + @rprr_load dtype=18 nreg=2 +LD_zprr 1010010 11 01 ..... 100 ... ..... ..... \ + @rprr_load dtype=18 nreg=3 # SVE contiguous non-temporal load (scalar plus immediate) # LDNT1B, LDNT1H, LDNT1W, LDNT1D # SVE load multiple structures (scalar plus immediate) # LD2B, LD2H, LD2W, LD2D; etc. LD_zpri 1010010 .. nreg:2 0.... 111 ... ..... ..... @rpri_load_msz +# LD[234]Q +LD_zpri 1010010 01 001 .... 111 ... ..... ..... \ + @rpri_load dtype=18 nreg=1 +LD_zpri 1010010 10 001 .... 111 ... ..... ..... \ + @rpri_load dtype=18 nreg=2 +LD_zpri 1010010 11 001 .... 111 ... ..... ..... \ + @rpri_load dtype=18 nreg=3 # SVE load and broadcast quadword (scalar plus scalar) LD1RQ_zprr 1010010 .. 00 ..... 000 ... ..... ..... \ @@ -XXX,XX +XXX,XX @@ ST_zprr 1110010 11 10 ..... 010 ... ..... ..... \ # SVE store multiple structures (scalar plus immediate) (nreg != 0) ST_zpri 1110010 .. nreg:2 1.... 111 ... ..... ..... \ @rpri_store msz=%size_23 esz=%size_23 +# ST[234]Q +ST_zpri 11100100 01 00 .... 000 ... ..... ..... \ + @rpri_store msz=4 esz=4 nreg=1 +ST_zpri 11100100 10 00 .... 000 ... ..... ..... \ + @rpri_store msz=4 esz=4 nreg=2 +ST_zpri 11100100 11 00 .... 000 ... ..... ..... \ + @rpri_store msz=4 esz=4 nreg=3 # SVE contiguous non-temporal store (scalar plus scalar) (nreg == 0) # SVE store multiple structures (scalar plus scalar) (nreg != 0) ST_zprr 1110010 .. nreg:2 ..... 011 ... ..... ..... \ @rprr_store msz=%size_23 esz=%size_23 +# ST[234]Q +ST_zprr 11100100 01 1 ..... 000 ... ..... ..... \ + @rprr_store msz=4 esz=4 nreg=1 +ST_zprr 11100100 10 1 ..... 000 ... ..... ..... \ + @rprr_store msz=4 esz=4 nreg=2 +ST_zprr 11100100 11 1 ..... 000 ... ..... ..... \ + @rprr_store msz=4 esz=4 nreg=3 # SVE 32-bit scatter store (scalar plus 32-bit scaled offsets) # Require msz > 0 && msz <= esz. diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_LDN_2(2, dd, MO_64) DO_LDN_2(3, dd, MO_64) DO_LDN_2(4, dd, MO_64) +DO_LDN_2(2, qq, MO_128) +DO_LDN_2(3, qq, MO_128) +DO_LDN_2(4, qq, MO_128) + #undef DO_LDN_1 #undef DO_LDN_2 @@ -XXX,XX +XXX,XX @@ DO_STN_2(4, dd, MO_64, MO_64) DO_STN_2(1, sq, MO_128, MO_32) DO_STN_2(1, dq, MO_128, MO_64) +DO_STN_2(2, qq, MO_128, MO_128) +DO_STN_2(3, qq, MO_128, MO_128) +DO_STN_2(4, qq, MO_128, MO_128) + #undef DO_STN_1 #undef DO_STN_2 diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ static inline int expand_imm_sh8u(DisasContext *s, int x) */ static inline int msz_dtype(DisasContext *s, int msz) { - static const uint8_t dtype[4] = { 0, 5, 10, 15 }; + static const uint8_t dtype[5] = { 0, 5, 10, 15, 18 }; return dtype[msz]; } @@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, } /* Indexed by [mte][be][dtype][nreg] */ -static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { +static gen_helper_gvec_mem * const ldr_fns[2][2][19][4] = { { /* mte inactive, little-endian */ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r, gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r }, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_le_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_le_r, NULL, NULL, NULL }, + { NULL, gen_helper_sve_ld2qq_le_r, + gen_helper_sve_ld3qq_le_r, gen_helper_sve_ld4qq_le_r }, }, /* mte inactive, big-endian */ @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_be_r, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_be_r, NULL, NULL, NULL }, + { NULL, gen_helper_sve_ld2qq_be_r, + gen_helper_sve_ld3qq_be_r, gen_helper_sve_ld4qq_be_r }, }, }, @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_le_r_mte, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_le_r_mte, NULL, NULL, NULL }, + { NULL, + gen_helper_sve_ld2qq_le_r_mte, + gen_helper_sve_ld3qq_le_r_mte, + gen_helper_sve_ld4qq_le_r_mte }, }, /* mte active, big-endian */ @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem * const ldr_fns[2][2][18][4] = { { gen_helper_sve_ld1squ_be_r_mte, NULL, NULL, NULL }, { gen_helper_sve_ld1dqu_be_r_mte, NULL, NULL, NULL }, + { NULL, + gen_helper_sve_ld2qq_be_r_mte, + gen_helper_sve_ld3qq_be_r_mte, + gen_helper_sve_ld4qq_be_r_mte }, }, }, }; @@ -XXX,XX +XXX,XX @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) return false; } - /* dtypes 16 and 17 are artificial, representing 128-bit element */ - if (a->dtype < 16) { + /* dtypes 16-18 are artificial, representing 128-bit element */ + switch (a->dtype) { + case 0 ... 15: if (!dc_isar_feature(aa64_sve, s)) { return false; } - } else { + break; + case 16: case 17: if (!dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; + break; + case 18: + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + break; + default: + g_assert_not_reached(); } if (sve_access_check(s)) { @@ -XXX,XX +XXX,XX @@ static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a) static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a) { - /* dtypes 16 and 17 are artificial, representing 128-bit element */ - if (a->dtype < 16) { + /* dtypes 16-18 are artificial, representing 128-bit element */ + switch (a->dtype) { + case 0 ... 15: if (!dc_isar_feature(aa64_sve, s)) { return false; } - } else { + break; + case 16: case 17: if (!dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; + break; + case 18: + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } + break; + default: + g_assert_not_reached(); } if (sve_access_check(s)) { @@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr, gen_helper_sve_st1dd_be_r_mte, gen_helper_sve_st1dq_be_r_mte } } }, }; - static gen_helper_gvec_mem * const fn_multiple[2][2][3][4] = { + static gen_helper_gvec_mem * const fn_multiple[2][2][3][5] = { { { { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_le_r, gen_helper_sve_st2ss_le_r, - gen_helper_sve_st2dd_le_r }, + gen_helper_sve_st2dd_le_r, + gen_helper_sve_st2qq_le_r }, { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_le_r, gen_helper_sve_st3ss_le_r, - gen_helper_sve_st3dd_le_r }, + gen_helper_sve_st3dd_le_r, + gen_helper_sve_st3qq_le_r }, { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_le_r, gen_helper_sve_st4ss_le_r, - gen_helper_sve_st4dd_le_r } }, + gen_helper_sve_st4dd_le_r, + gen_helper_sve_st4qq_le_r } }, { { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_be_r, gen_helper_sve_st2ss_be_r, - gen_helper_sve_st2dd_be_r }, + gen_helper_sve_st2dd_be_r, + gen_helper_sve_st2qq_be_r }, { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_be_r, gen_helper_sve_st3ss_be_r, - gen_helper_sve_st3dd_be_r }, + gen_helper_sve_st3dd_be_r, + gen_helper_sve_st3qq_be_r }, { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_be_r, gen_helper_sve_st4ss_be_r, - gen_helper_sve_st4dd_be_r } } }, + gen_helper_sve_st4dd_be_r, + gen_helper_sve_st4qq_be_r } } }, { { { gen_helper_sve_st2bb_r_mte, gen_helper_sve_st2hh_le_r_mte, gen_helper_sve_st2ss_le_r_mte, - gen_helper_sve_st2dd_le_r_mte }, + gen_helper_sve_st2dd_le_r_mte, + gen_helper_sve_st2qq_le_r_mte }, { gen_helper_sve_st3bb_r_mte, gen_helper_sve_st3hh_le_r_mte, gen_helper_sve_st3ss_le_r_mte, - gen_helper_sve_st3dd_le_r_mte }, + gen_helper_sve_st3dd_le_r_mte, + gen_helper_sve_st3qq_le_r_mte }, { gen_helper_sve_st4bb_r_mte, gen_helper_sve_st4hh_le_r_mte, gen_helper_sve_st4ss_le_r_mte, - gen_helper_sve_st4dd_le_r_mte } }, + gen_helper_sve_st4dd_le_r_mte, + gen_helper_sve_st4qq_le_r_mte } }, { { gen_helper_sve_st2bb_r_mte, gen_helper_sve_st2hh_be_r_mte, gen_helper_sve_st2ss_be_r_mte, - gen_helper_sve_st2dd_be_r_mte }, + gen_helper_sve_st2dd_be_r_mte, + gen_helper_sve_st2qq_be_r_mte }, { gen_helper_sve_st3bb_r_mte, gen_helper_sve_st3hh_be_r_mte, gen_helper_sve_st3ss_be_r_mte, - gen_helper_sve_st3dd_be_r_mte }, + gen_helper_sve_st3dd_be_r_mte, + gen_helper_sve_st3qq_be_r_mte }, { gen_helper_sve_st4bb_r_mte, gen_helper_sve_st4hh_be_r_mte, gen_helper_sve_st4ss_be_r_mte, - gen_helper_sve_st4dd_be_r_mte } } }, + gen_helper_sve_st4dd_be_r_mte, + gen_helper_sve_st4qq_be_r_mte } } }, }; gen_helper_gvec_mem *fn; int be = s->be_data == MO_BE; @@ -XXX,XX +XXX,XX @@ static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a) } break; case MO_128: - assert(a->msz < a->esz); - assert(a->nreg == 0); - if (!dc_isar_feature(aa64_sve2p1, s)) { - return false; + if (a->nreg == 0) { + assert(a->msz < a->esz); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + } else { + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } } - s->is_nonstreaming = true; break; default: g_assert_not_reached(); @@ -XXX,XX +XXX,XX @@ static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a) } break; case MO_128: - assert(a->msz < a->esz); - assert(a->nreg == 0); - if (!dc_isar_feature(aa64_sve2p1, s)) { - return false; + if (a->nreg == 0) { + assert(a->msz < a->esz); + if (!dc_isar_feature(aa64_sve2p1, s)) { + return false; + } + s->is_nonstreaming = true; + } else { + if (!dc_isar_feature(aa64_sme2p1_or_sve2p1, s)) { + return false; + } } - s->is_nonstreaming = true; break; default: g_assert_not_reached(); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-99-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sve.h | 16 ++++++++++++++++ target/arm/tcg/sve.decode | 8 ++++++++ target/arm/tcg/sve_helper.c | 6 ++++++ target/arm/tcg/translate-sve.c | 34 ++++++++++++++++++++++++++++++++-- 4 files changed, 62 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/helper-sve.h b/target/arm/tcg/helper-sve.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sve.h +++ b/target/arm/tcg/helper-sve.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_ldsds_le_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldsds_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_le_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_be_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldbsu_zsu_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_ldsds_le_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldsds_be_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_le_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_ldqq_be_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_ldffbsu_zsu, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_le_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_be_zd, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stbs_zsu_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_6(sve_stdd_be_zd_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_le_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) +DEF_HELPER_FLAGS_6(sve_stqq_be_zd_mte, TCG_CALL_NO_WG, + void, env, ptr, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_4(sve2_sqdmull_zzz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve.decode +++ b/target/arm/tcg/sve.decode @@ -XXX,XX +XXX,XX @@ LD1_zprz 1100010 10 1. ..... 1.. ... ..... ..... \ LD1_zprz 1100010 11 1. ..... 11. ... ..... ..... \ @rprr_g_load_sc esz=3 msz=3 u=1 +# LD1Q +LD1_zprz 1100 0100 000 rm:5 101 pg:3 rn:5 rd:5 \ + &rprr_gather_load u=0 ff=0 xs=2 esz=4 msz=4 scale=0 + # SVE 64-bit gather load (vector plus immediate) LD1_zpiz 1100010 .. 01 ..... 1.. ... ..... ..... \ @rpri_g_load esz=3 @@ -XXX,XX +XXX,XX @@ ST1_zprz 1110010 .. 01 ..... 101 ... ..... ..... \ ST1_zprz 1110010 .. 00 ..... 101 ... ..... ..... \ @rprr_scatter_store xs=2 esz=3 scale=0 +# ST1Q +ST1_zprz 1110 0100 001 rm:5 001 pg:3 rn:5 rd:5 \ + &rprr_scatter_store xs=2 msz=4 esz=4 scale=0 + # SVE 64-bit scatter store (vector plus immediate) ST1_zpiz 1110010 .. 10 ..... 101 ... ..... ..... \ @rpri_scatter_store esz=3 diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -XXX,XX +XXX,XX @@ DO_LD1_ZPZ_D(dd_be, zsu, MO_64) DO_LD1_ZPZ_D(dd_be, zss, MO_64) DO_LD1_ZPZ_D(dd_be, zd, MO_64) +DO_LD1_ZPZ_D(qq_le, zd, MO_128) +DO_LD1_ZPZ_D(qq_be, zd, MO_128) + #undef DO_LD1_ZPZ_S #undef DO_LD1_ZPZ_D @@ -XXX,XX +XXX,XX @@ DO_ST1_ZPZ_D(sd_be, zd, MO_32) DO_ST1_ZPZ_D(dd_le, zd, MO_64) DO_ST1_ZPZ_D(dd_be, zd, MO_64) +DO_ST1_ZPZ_D(qq_le, zd, MO_128) +DO_ST1_ZPZ_D(qq_be, zd, MO_128) + #undef DO_ST1_ZPZ_S #undef DO_ST1_ZPZ_D diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sve.c +++ b/target/arm/tcg/translate-sve.c @@ -XXX,XX +XXX,XX @@ gather_load_fn64[2][2][2][3][2][4] = { gen_helper_sve_ldffdd_be_zd_mte, } } } } }, }; +static gen_helper_gvec_mem_scatter * const +gather_load_fn128[2][2] = { + { gen_helper_sve_ldqq_le_zd, + gen_helper_sve_ldqq_be_zd }, + { gen_helper_sve_ldqq_le_zd_mte, + gen_helper_sve_ldqq_be_zd_mte } +}; + static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a) { gen_helper_gvec_mem_scatter *fn = NULL; bool be = s->be_data == MO_BE; bool mte = s->mte_active[0]; - if (!dc_isar_feature(aa64_sve, s)) { + if (a->esz < MO_128 + ? !dc_isar_feature(aa64_sve, s) + : !dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; @@ -XXX,XX +XXX,XX @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a) case MO_64: fn = gather_load_fn64[mte][be][a->ff][a->xs][a->u][a->msz]; break; + case MO_128: + assert(!a->ff && a->u && a->xs == 2 && a->msz == MO_128); + fn = gather_load_fn128[mte][be]; + break; + default: + g_assert_not_reached(); } assert(fn != NULL); @@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_mem_scatter * const scatter_store_fn64[2][2][3][4] = { gen_helper_sve_stdd_be_zd_mte, } } }, }; +static gen_helper_gvec_mem_scatter * const +scatter_store_fn128[2][2] = { + { gen_helper_sve_stqq_le_zd, + gen_helper_sve_stqq_be_zd }, + { gen_helper_sve_stqq_le_zd_mte, + gen_helper_sve_stqq_be_zd_mte } +}; + static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a) { gen_helper_gvec_mem_scatter *fn; @@ -XXX,XX +XXX,XX @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a) if (a->esz < a->msz || (a->msz == 0 && a->scale)) { return false; } - if (!dc_isar_feature(aa64_sve, s)) { + if (a->esz < MO_128 + ? !dc_isar_feature(aa64_sve, s) + : !dc_isar_feature(aa64_sve2p1, s)) { return false; } s->is_nonstreaming = true; @@ -XXX,XX +XXX,XX @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a) case MO_64: fn = scatter_store_fn64[mte][be][a->xs][a->msz]; break; + case MO_128: + assert(a->xs == 2 && a->msz == MO_128); + fn = scatter_store_fn128[mte][be]; + break; default: g_assert_not_reached(); } -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-100-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 6 ++++ target/arm/tcg/sme.decode | 36 ++++++++++++++++++++ target/arm/tcg/sme_helper.c | 60 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 46 +++++++++++++++++++------- 4 files changed, 137 insertions(+), 11 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sme2_mova_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_mova_cz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(sme2_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32) +DEF_HELPER_FLAGS_3(sme2p1_movaz_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + DEF_HELPER_FLAGS_5(sme_ld1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) DEF_HELPER_FLAGS_5(sme_ld1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ MOVA_za2 11000000 00 00011 00 .. 010 00 off:3 zr:4 0 \ MOVA_za4 11000000 00 00011 00 .. 011 00 off:3 zr:3 00 \ &mova_a rv=%mova_rv +### SME Move and Zero + +MOVAZ_za2 11000000 00000110 0 .. 01010 off:3 zr:4 0 \ + &mova_a rv=%mova_rv +MOVAZ_za4 11000000 00000110 0 .. 01110 off:3 zr:3 00 \ + &mova_a rv=%mova_rv + +MOVAZ_zt 11000000 00 00001 0 v:1 .. 0001 off:4 zr:5 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVAZ_zt 11000000 01 00001 0 v:1 .. 0001 za:1 off:3 zr:5 \ + &mova_t rs=%mova_rs esz=1 +MOVAZ_zt 11000000 10 00001 0 v:1 .. 0001 za:2 off:2 zr:5 \ + &mova_t rs=%mova_rs esz=2 +MOVAZ_zt 11000000 11 00001 0 v:1 .. 0001 za:3 off:1 zr:5 \ + &mova_t rs=%mova_rs esz=3 +MOVAZ_zt 11000000 11 00001 1 v:1 .. 0001 za:4 zr:5 \ + &mova_t rs=%mova_rs esz=4 off=0 + +MOVAZ_zt2 11000000 00 00011 0 v:1 .. 00010 off:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVAZ_zt2 11000000 01 00011 0 v:1 .. 00010 za:1 off:2 zr:4 0 \ + &mova_t rs=%mova_rs esz=1 +MOVAZ_zt2 11000000 10 00011 0 v:1 .. 00010 za:2 off:1 zr:4 0 \ + &mova_t rs=%mova_rs esz=2 +MOVAZ_zt2 11000000 11 00011 0 v:1 .. 00010 za:3 zr:4 0 \ + &mova_t rs=%mova_rs esz=3 off=0 + +MOVAZ_zt4 11000000 00 00011 0 v:1 .. 001100 off:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=0 za=0 +MOVAZ_zt4 11000000 01 00011 0 v:1 .. 001100 za:1 off:1 zr:3 00 \ + &mova_t rs=%mova_rs esz=1 +MOVAZ_zt4 11000000 10 00011 0 v:1 .. 001100 za:2 zr:3 00 \ + &mova_t rs=%mova_rs esz=2 off=0 +MOVAZ_zt4 11000000 11 00011 0 v:1 .. 00110 za:3 zr:3 00 \ + &mova_t rs=%mova_rs esz=3 off=0 + ### SME Move into/from ZT0 MOVT_rzt 1100 0000 0100 1100 0 off:3 00 11111 rt:5 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_mova_zc_d)(void *vdst, void *vsrc, uint32_t desc) } } +void HELPER(sme2p1_movaz_zc_b)(void *vdst, void *vsrc, uint32_t desc) +{ + uint8_t *src = vsrc; + uint8_t *dst = vdst; + size_t i, n = simd_oprsz(desc); + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_h)(void *vdst, void *vsrc, uint32_t desc) +{ + uint16_t *src = vsrc; + uint16_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 2; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_s)(void *vdst, void *vsrc, uint32_t desc) +{ + uint32_t *src = vsrc; + uint32_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 4; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_d)(void *vdst, void *vsrc, uint32_t desc) +{ + uint64_t *src = vsrc; + uint64_t *dst = vdst; + size_t i, n = simd_oprsz(desc) / 8; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + src[tile_vslice_index(i)] = 0; + } +} + +void HELPER(sme2p1_movaz_zc_q)(void *vdst, void *vsrc, uint32_t desc) +{ + Int128 *src = vsrc; + Int128 *dst = vdst; + size_t i, n = simd_oprsz(desc) / 16; + + for (i = 0; i < n; ++i) { + dst[i] = src[tile_vslice_index(i)]; + memset(&src[tile_vslice_index(i)], 0, 16); + } +} + /* * Clear elements in a tile slice comprising len bytes. */ diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile(DisasContext *s, arg_mova_p *a, bool to_vec) TRANS_FEAT(MOVA_tz, aa64_sme, do_mova_tile, a, false) TRANS_FEAT(MOVA_zt, aa64_sme, do_mova_tile, a, true) -static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) +static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, + bool to_vec, bool zero) { static gen_helper_gvec_2 * const cz_fns[] = { gen_helper_sme2_mova_cz_b, gen_helper_sme2_mova_cz_h, @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) gen_helper_sme2_mova_zc_b, gen_helper_sme2_mova_zc_h, gen_helper_sme2_mova_zc_s, gen_helper_sme2_mova_zc_d, }; + static gen_helper_gvec_2 * const zc_z_fns[] = { + gen_helper_sme2p1_movaz_zc_b, gen_helper_sme2p1_movaz_zc_h, + gen_helper_sme2p1_movaz_zc_s, gen_helper_sme2p1_movaz_zc_d, + gen_helper_sme2p1_movaz_zc_q, + }; TCGv_ptr t_za; int svl, bytes_per_op = n << a->esz; @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) return true; } + assert(a->esz <= MO_64 + zero); + if (!sme_smza_enabled_check(s)) { return true; } @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) TCGv_ptr t_zr = vec_full_reg_ptr(s, a->zr * n + i); t_za = get_tile_rowcol(s, a->esz, a->rs, a->za, a->off * n + i, 1, n, a->v); - if (to_vec) { + if (zero) { + zc_z_fns[a->esz](t_zr, t_za, t_desc); + } else if (to_vec) { zc_fns[a->esz](t_zr, t_za, t_desc); } else { cz_fns[a->esz](t_za, t_zr, t_desc); @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) a->off * n + i, 1, n, a->v); if (to_vec) { tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, 0, svl, svl); + if (zero) { + tcg_gen_gvec_dup_imm_var(MO_8, t_za, 0, svl, svl, 0); + } } else { tcg_gen_gvec_mov_var(MO_8, t_za, 0, tcg_env, o_zr, svl, svl); } @@ -XXX,XX +XXX,XX @@ static bool do_mova_tile_n(DisasContext *s, arg_mova_t *a, int n, bool to_vec) return true; } -TRANS_FEAT(MOVA_tz2, aa64_sme2, do_mova_tile_n, a, 2, false) -TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false) -TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true) -TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true) +TRANS_FEAT(MOVA_tz2, aa64_sme2, do_mova_tile_n, a, 2, false, false) +TRANS_FEAT(MOVA_tz4, aa64_sme2, do_mova_tile_n, a, 4, false, false) +TRANS_FEAT(MOVA_zt2, aa64_sme2, do_mova_tile_n, a, 2, true, false) +TRANS_FEAT(MOVA_zt4, aa64_sme2, do_mova_tile_n, a, 4, true, false) -static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) +TRANS_FEAT(MOVAZ_zt, aa64_sme2p1, do_mova_tile_n, a, 1, true, true) +TRANS_FEAT(MOVAZ_zt2, aa64_sme2p1, do_mova_tile_n, a, 2, true, true) +TRANS_FEAT(MOVAZ_zt4, aa64_sme2p1, do_mova_tile_n, a, 4, true, true) + +static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, + bool to_vec, bool zero) { TCGv_ptr t_za; int svl; @@ -XXX,XX +XXX,XX @@ static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) if (to_vec) { tcg_gen_gvec_mov_var(MO_8, tcg_env, o_zr, t_za, o_za, svl, svl); + if (zero) { + tcg_gen_gvec_dup_imm_var(MO_8, t_za, o_za, svl, svl, 0); + } } else { tcg_gen_gvec_mov_var(MO_8, t_za, o_za, tcg_env, o_zr, svl, svl); } @@ -XXX,XX +XXX,XX @@ static bool do_mova_array_n(DisasContext *s, arg_mova_a *a, int n, bool to_vec) return true; } -TRANS_FEAT(MOVA_az2, aa64_sme2, do_mova_array_n, a, 2, false) -TRANS_FEAT(MOVA_az4, aa64_sme2, do_mova_array_n, a, 4, false) -TRANS_FEAT(MOVA_za2, aa64_sme2, do_mova_array_n, a, 2, true) -TRANS_FEAT(MOVA_za4, aa64_sme2, do_mova_array_n, a, 4, true) +TRANS_FEAT(MOVA_az2, aa64_sme2, do_mova_array_n, a, 2, false, false) +TRANS_FEAT(MOVA_az4, aa64_sme2, do_mova_array_n, a, 4, false, false) +TRANS_FEAT(MOVA_za2, aa64_sme2, do_mova_array_n, a, 2, true, false) +TRANS_FEAT(MOVA_za4, aa64_sme2, do_mova_array_n, a, 4, true, false) + +TRANS_FEAT(MOVAZ_za2, aa64_sme2p1, do_mova_array_n, a, 2, true, true) +TRANS_FEAT(MOVAZ_za4, aa64_sme2p1, do_mova_array_n, a, 4, true, true) static bool do_movt(DisasContext *s, arg_MOVT_rzt *a, void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long)) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-101-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper.h | 24 ++++++++++ target/arm/tcg/sme.decode | 42 ++++++++++++++++ target/arm/tcg/translate-sme.c | 56 ++++++++++++++++++++++ target/arm/tcg/vec_helper.c | 88 ++++++++++++++++++++++++++++++++++ 4 files changed, 210 insertions(+) diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper.h +++ b/target/arm/tcg/helper.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_uminp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_urecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) DEF_HELPER_FLAGS_3(gvec_ursqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32) + +DEF_HELPER_FLAGS_4(sme2_luti2_1b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_1h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_1s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti2_2b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_2h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_2s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti2_4b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_4h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti2_4s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti4_1b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_1h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_1s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti4_2b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_2h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_2s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_4(sme2_luti4_4b, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_4h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_4(sme2_luti4_4s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ZERO_za 11000000 000011 110 .. 0000000000 00. \ &zero_za ngrp=2 nvec=4 rv=%mova_rv off=%off1_x4 ZERO_za 11000000 000011 111 .. 0000000000 00. \ &zero_za ngrp=4 nvec=4 rv=%mova_rv off=%off1_x4 + +### SME Lookup Table Read + +&lut zd zn idx + +# LUTI2, consecutive +LUTI2_c_1b 1100 0000 1100 11 idx:4 00 00 zn:5 zd:5 &lut +LUTI2_c_1h 1100 0000 1100 11 idx:4 01 00 zn:5 zd:5 &lut +LUTI2_c_1s 1100 0000 1100 11 idx:4 10 00 zn:5 zd:5 &lut + +LUTI2_c_2b 1100 0000 1000 11 idx:3 1 00 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI2_c_2h 1100 0000 1000 11 idx:3 1 01 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI2_c_2s 1100 0000 1000 11 idx:3 1 10 00 zn:5 .... 0 &lut zd=%zd_ax2 + +LUTI2_c_4b 1100 0000 1000 11 idx:2 10 00 00 zn:5 ... 00 &lut zd=%zd_ax4 +LUTI2_c_4h 1100 0000 1000 11 idx:2 10 01 00 zn:5 ... 00 &lut zd=%zd_ax4 +LUTI2_c_4s 1100 0000 1000 11 idx:2 10 10 00 zn:5 ... 00 &lut zd=%zd_ax4 + +# LUTI2, strided (must check zd alignment) +LUTI2_s_2b 1100 0000 1001 11 idx:3 1 00 00 zn:5 zd:5 &lut +LUTI2_s_2h 1100 0000 1001 11 idx:3 1 01 00 zn:5 zd:5 &lut + +LUTI2_s_4b 1100 0000 1001 11 idx:2 10 00 00 zn:5 zd:5 &lut +LUTI2_s_4h 1100 0000 1001 11 idx:2 10 01 00 zn:5 zd:5 &lut + +# LUTI4, consecutive +LUTI4_c_1b 1100 0000 1100 101 idx:3 00 00 zn:5 zd:5 &lut +LUTI4_c_1h 1100 0000 1100 101 idx:3 01 00 zn:5 zd:5 &lut +LUTI4_c_1s 1100 0000 1100 101 idx:3 10 00 zn:5 zd:5 &lut + +LUTI4_c_2b 1100 0000 1000 101 idx:2 1 00 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI4_c_2h 1100 0000 1000 101 idx:2 1 01 00 zn:5 .... 0 &lut zd=%zd_ax2 +LUTI4_c_2s 1100 0000 1000 101 idx:2 1 10 00 zn:5 .... 0 &lut zd=%zd_ax2 + +LUTI4_c_4h 1100 0000 1000 101 idx:1 10 01 00 zn:5 ... 00 &lut zd=%zd_ax4 +LUTI4_c_4s 1100 0000 1000 101 idx:1 10 10 00 zn:5 ... 00 &lut zd=%zd_ax4 + +# LUTI4, strided (must check zd alignment) +LUTI4_s_2b 1100 0000 1001 101 idx:2 1 00 00 zn:5 zd:5 &lut +LUTI4_s_2h 1100 0000 1001 101 idx:2 1 01 00 zn:5 zd:5 &lut + +LUTI4_s_4h 1100 0000 1001 101 idx:1 10 01 00 zn:5 zd:5 &lut diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool trans_SEL(DisasContext *s, arg_SEL *a) } return true; } + +static bool do_lut(DisasContext *s, arg_lut *a, + gen_helper_gvec_2_ptr *fn, bool strided) +{ + if (sme_sm_enabled_check(s) && sme2_zt0_enabled_check(s)) { + int svl = streaming_vec_reg_size(s); + tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->zd), + vec_full_reg_offset(s, a->zn), + tcg_env, svl, svl, strided | (a->idx << 1), fn); + } + return true; +} + +TRANS_FEAT(LUTI2_c_1b, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_1b, false) +TRANS_FEAT(LUTI2_c_1h, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_1h, false) +TRANS_FEAT(LUTI2_c_1s, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_1s, false) + +TRANS_FEAT(LUTI2_c_2b, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_2b, false) +TRANS_FEAT(LUTI2_c_2h, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_2h, false) +TRANS_FEAT(LUTI2_c_2s, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_2s, false) + +TRANS_FEAT(LUTI2_c_4b, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_4b, false) +TRANS_FEAT(LUTI2_c_4h, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_4h, false) +TRANS_FEAT(LUTI2_c_4s, aa64_sme2, do_lut, a, gen_helper_sme2_luti2_4s, false) + +TRANS_FEAT(LUTI4_c_1b, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_1b, false) +TRANS_FEAT(LUTI4_c_1h, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_1h, false) +TRANS_FEAT(LUTI4_c_1s, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_1s, false) + +TRANS_FEAT(LUTI4_c_2b, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_2b, false) +TRANS_FEAT(LUTI4_c_2h, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_2h, false) +TRANS_FEAT(LUTI4_c_2s, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_2s, false) + +TRANS_FEAT(LUTI4_c_4h, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_4h, false) +TRANS_FEAT(LUTI4_c_4s, aa64_sme2, do_lut, a, gen_helper_sme2_luti4_4s, false) + +static bool do_lut_s4(DisasContext *s, arg_lut *a, gen_helper_gvec_2_ptr *fn) +{ + return !(a->zd & 0b01100) && do_lut(s, a, fn, true); +} + +static bool do_lut_s8(DisasContext *s, arg_lut *a, gen_helper_gvec_2_ptr *fn) +{ + return !(a->zd & 0b01000) && do_lut(s, a, fn, true); +} + +TRANS_FEAT(LUTI2_s_2b, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti2_2b) +TRANS_FEAT(LUTI2_s_2h, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti2_2h) + +TRANS_FEAT(LUTI2_s_4b, aa64_sme2p1, do_lut_s4, a, gen_helper_sme2_luti2_4b) +TRANS_FEAT(LUTI2_s_4h, aa64_sme2p1, do_lut_s4, a, gen_helper_sme2_luti2_4h) + +TRANS_FEAT(LUTI4_s_2b, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti4_2b) +TRANS_FEAT(LUTI4_s_2h, aa64_sme2p1, do_lut_s8, a, gen_helper_sme2_luti4_2h) + +TRANS_FEAT(LUTI4_s_4h, aa64_sme2p1, do_lut_s4, a, gen_helper_sme2_luti4_4h) diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_helper.c +++ b/target/arm/tcg/vec_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_ursqrte_s)(void *vd, void *vn, uint32_t desc) } clear_tail(d, opr_sz, simd_maxsz(desc)); } + +static inline void do_lut_b(void *zd, uint64_t *indexes, uint64_t *table, + unsigned elements, unsigned segbase, + unsigned dstride, unsigned isize, + unsigned tsize, unsigned nreg) +{ + for (unsigned r = 0; r < nreg; ++r) { + uint8_t *dst = zd + dstride * r; + unsigned base = segbase + r * elements; + + for (unsigned e = 0; e < elements; ++e) { + unsigned index = extractn(indexes, (base + e) * isize, isize); + dst[H1(e)] = extractn(table, index * tsize, 8); + } + } +} + +static inline void do_lut_h(void *zd, uint64_t *indexes, uint64_t *table, + unsigned elements, unsigned segbase, + unsigned dstride, unsigned isize, + unsigned tsize, unsigned nreg) +{ + for (unsigned r = 0; r < nreg; ++r) { + uint16_t *dst = zd + dstride * r; + unsigned base = segbase + r * elements; + + for (unsigned e = 0; e < elements; ++e) { + unsigned index = extractn(indexes, (base + e) * isize, isize); + dst[H2(e)] = extractn(table, index * tsize, 16); + } + } +} + +static inline void do_lut_s(void *zd, uint64_t *indexes, uint32_t *table, + unsigned elements, unsigned segbase, + unsigned dstride, unsigned isize, + unsigned tsize, unsigned nreg) +{ + for (unsigned r = 0; r < nreg; ++r) { + uint32_t *dst = zd + dstride * r; + unsigned base = segbase + r * elements; + + for (unsigned e = 0; e < elements; ++e) { + unsigned index = extractn(indexes, (base + e) * isize, isize); + dst[H4(e)] = table[H4(index)]; + } + } +} + +#define DO_SME2_LUT(ISIZE, NREG, SUFF, ESIZE) \ +void helper_sme2_luti##ISIZE##_##NREG##SUFF \ + (void *zd, void *zn, CPUARMState *env, uint32_t desc) \ +{ \ + unsigned vl = simd_oprsz(desc); \ + unsigned strided = extract32(desc, SIMD_DATA_SHIFT, 1); \ + unsigned idx = extract32(desc, SIMD_DATA_SHIFT + 1, 4); \ + unsigned elements = vl / ESIZE; \ + unsigned dstride = (!strided ? 1 : NREG == 4 ? 4 : 8); \ + unsigned segments = (ESIZE * 8) / (ISIZE * NREG); \ + unsigned segment = idx & (segments - 1); \ + ARMVectorReg indexes; \ + memcpy(&indexes, zn, vl); \ + do_lut_##SUFF(zd, indexes.d, (void *)env->za_state.zt0, elements, \ + segment * NREG * elements, \ + dstride * sizeof(ARMVectorReg), ISIZE, 32, NREG); \ +} + +DO_SME2_LUT(2,1,b, 1) +DO_SME2_LUT(2,1,h, 2) +DO_SME2_LUT(2,1,s, 4) +DO_SME2_LUT(2,2,b, 1) +DO_SME2_LUT(2,2,h, 2) +DO_SME2_LUT(2,2,s, 4) +DO_SME2_LUT(2,4,b, 1) +DO_SME2_LUT(2,4,h, 2) +DO_SME2_LUT(2,4,s, 4) + +DO_SME2_LUT(4,1,b, 1) +DO_SME2_LUT(4,1,h, 2) +DO_SME2_LUT(4,1,s, 4) +DO_SME2_LUT(4,2,b, 1) +DO_SME2_LUT(4,2,h, 2) +DO_SME2_LUT(4,2,s, 4) +DO_SME2_LUT(4,4,b, 1) +DO_SME2_LUT(4,4,h, 2) +DO_SME2_LUT(4,4,s, 4) + +#undef DO_SME2_LUT -- 2.43.0
The pattern we currently have as FMOPA_h is the "widening" insn that takes fp16 inputs and produces single-precision outputs. This is unlike FMOPA_s and FMOPA_d, which are non-widening produce outputs the same size as their inputs. SME2 introduces a non-widening fp16 FMOPA operation; rename FMOPA_h to FMOPA_w_h (for 'widening'), so we can use FMOPA_h for the non-widening version, giving it a name in line with the other non-widening ops FMOPA_s and FMOPA_d. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-102-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 +- target/arm/tcg/sme.decode | 2 +- target/arm/tcg/sme_helper.c | 4 ++-- target/arm/tcg/translate-sme.c | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) -DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_7(sme_fmopa_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 BFMOPA 10000001 100 ..... ... ... ..... . 00 .. @op_32 -FMOPA_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 +FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 SMOPA_s 1010000 0 10 0 ..... ... ... ..... . 00 .. @op_32 SUMOPA_s 1010000 0 10 1 ..... ... ... ..... . 00 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, return float32_add(sum, t32, s_std); } -void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, CPUARMState *env, uint32_t desc) +void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, return true; } -TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a, - MO_32, gen_helper_sme_fmopa_h) +TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, + MO_32, gen_helper_sme_fmopa_w_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_ZA, gen_helper_sme_fmopa_s) TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, -- 2.43.0
Our current BFMOPA opcode pattern is the widening version of the insn. Rename it to BFMOPA_w, to make way for the non-widening version added in SME2. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-103-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 2 +- target/arm/tcg/sme.decode | 2 +- target/arm/tcg/sme_helper.c | 4 ++-- target/arm/tcg/translate-sme.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) -DEF_HELPER_FLAGS_7(sme_bfmopa, TCG_CALL_NO_RWG, +DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ADDVA_d 11000000 11 01000 1 ... ... ..... 00 ... @adda_64 FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 -BFMOPA 10000001 100 ..... ... ... ..... . 00 .. @op_32 +BFMOPA_w 10000001 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 SMOPA_s 1010000 0 10 0 ..... ... ... ..... . 00 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fvdot_idx_h)(void *vd, void *vn, void *vm, void *va, } } -void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, - void *vpn, void *vpm, CPUARMState *env, uint32_t desc) +void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, + void *vpn, void *vpm, CPUARMState *env, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_ZA, gen_helper_sme_fmopa_d) -TRANS_FEAT(BFMOPA, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa) +TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa_w) TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s) TRANS_FEAT(UMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_umopa_s) -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> For non-widening, we can use float_muladd_negate_product, For widening, which uses dot-product, we need to handle the negation explicitly. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-104-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 19 +++++ target/arm/tcg/vec_internal.h | 5 ++ target/arm/tcg/sme_helper.c | 141 +++++++++++++++++++++++++++------ target/arm/tcg/translate-sme.c | 27 ++++--- 4 files changed, 160 insertions(+), 32 deletions(-) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_7(sme_fmops_w_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_fmops_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_fmops_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_bfmops_w, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) + +DEF_HELPER_FLAGS_7(sme_ah_fmops_w_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_ah_fmops_s, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_ah_fmops_d, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) +DEF_HELPER_FLAGS_7(sme_ah_bfmops_w, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, env, i32) + DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_6(sme_umopa_s, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/vec_internal.h +++ b/target/arm/tcg/vec_internal.h @@ -XXX,XX +XXX,XX @@ bool is_ebf(CPUARMState *env, float_status *statusp, float_status *oddstatusp); /* * Negate as for FPCR.AH=1 -- do not negate NaNs. */ +static inline float16 bfloat16_ah_chs(float16 a) +{ + return bfloat16_is_any_nan(a) ? a : bfloat16_chs(a); +} + static inline float16 float16_ah_chs(float16 a) { return float16_is_any_nan(a) ? a : float16_chs(a); diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn, } } -void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, float_status *fpst, uint32_t desc) +static void do_fmopa_s(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, float_status *fpst, uint32_t desc, + uint32_t negx, int negf) { intptr_t row, col, oprsz = simd_maxsz(desc); - uint32_t neg = simd_data(desc) << 31; - uint16_t *pn = vpn, *pm = vpm; for (row = 0; row < oprsz; ) { uint16_t pa = pn[H2(row >> 4)]; do { if (pa & 1) { void *vza_row = vza + tile_vslice_offset(row); - uint32_t n = *(uint32_t *)(vzn + H1_4(row)) ^ neg; + uint32_t n = *(uint32_t *)(vzn + H1_4(row)) ^ negx; for (col = 0; col < oprsz; ) { uint16_t pb = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, if (pb & 1) { uint32_t *a = vza_row + H1_4(col); uint32_t *m = vzm + H1_4(col); - *a = float32_muladd(n, *m, *a, 0, fpst); + *a = float32_muladd(n, *m, *a, negf, fpst); } col += 4; pb >>= 4; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, } } -void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, +void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_s(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_fmops_s)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_s(vza, vzn, vzm, vpn, vpm, fpst, desc, 1u << 31, 0); +} + +void HELPER(sme_ah_fmops_s)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_s(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + +static void do_fmopa_d(uint64_t *za, uint64_t *zn, uint64_t *zm, uint8_t *pn, + uint8_t *pm, float_status *fpst, uint32_t desc, + uint64_t negx, int negf) { intptr_t row, col, oprsz = simd_oprsz(desc) / 8; - uint64_t neg = (uint64_t)simd_data(desc) << 63; - uint64_t *za = vza, *zn = vzn, *zm = vzm; - uint8_t *pn = vpn, *pm = vpm; for (row = 0; row < oprsz; ++row) { if (pn[H1(row)] & 1) { uint64_t *za_row = &za[tile_vslice_index(row)]; - uint64_t n = zn[row] ^ neg; + uint64_t n = zn[row] ^ negx; for (col = 0; col < oprsz; ++col) { if (pm[H1(col)] & 1) { uint64_t *a = &za_row[col]; - *a = float64_muladd(n, zm[col], *a, 0, fpst); + *a = float64_muladd(n, zm[col], *a, negf, fpst); } } } } } +void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_d(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_fmops_d)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_d(vza, vzn, vzm, vpn, vpm, fpst, desc, 1ull << 63, 0); +} + +void HELPER(sme_ah_fmops_d)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_d(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + /* * Alter PAIR as needed for controlling predicates being false, * and for NEG on an enabled row element. @@ -XXX,XX +XXX,XX @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg) return pair; } +static inline uint32_t f16mop_ah_neg_adj_pair(uint32_t pair, uint32_t pg) +{ + uint32_t l = pg & 1 ? float16_ah_chs(pair) : 0; + uint32_t h = pg & 4 ? float16_ah_chs(pair >> 16) : 0; + return l | (h << 16); +} + +static inline uint32_t bf16mop_ah_neg_adj_pair(uint32_t pair, uint32_t pg) +{ + uint32_t l = pg & 1 ? bfloat16_ah_chs(pair) : 0; + uint32_t h = pg & 4 ? bfloat16_ah_chs(pair >> 16) : 0; + return l | (h << 16); +} + static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, float_status *s_f16, float_status *s_std, float_status *s_odd) @@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, return float32_add(sum, t32, s_std); } -void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, CPUARMState *env, uint32_t desc) +static void do_fmopa_w_h(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, CPUARMState *env, uint32_t desc, + uint32_t negx, bool ah_neg) { intptr_t row, col, oprsz = simd_maxsz(desc); - uint32_t neg = simd_data(desc) * 0x80008000u; - uint16_t *pn = vpn, *pm = vpm; float_status fpst_odd = env->vfp.fp_status[FPST_ZA]; set_float_rounding_mode(float_round_to_odd, &fpst_odd); @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, void *vza_row = vza + tile_vslice_offset(row); uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + if (ah_neg) { + n = f16mop_ah_neg_adj_pair(n, prow); + } else { + n = f16mop_adj_pair(n, prow, negx); + } for (col = 0; col < oprsz; ) { uint16_t pcol = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, } } +void HELPER(sme_fmopa_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_fmopa_w_h(vza, vzn, vzm, vpn, vpm, env, desc, 0, false); +} + +void HELPER(sme_fmops_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_fmopa_w_h(vza, vzn, vzm, vpn, vpm, env, desc, 0x80008000u, false); +} + +void HELPER(sme_ah_fmops_w_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_fmopa_w_h(vza, vzn, vzm, vpn, vpm, env, desc, 0, true); +} + void HELPER(sme2_fdot_h)(void *vd, void *vn, void *vm, void *va, CPUARMState *env, uint32_t desc) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme2_fvdot_idx_h)(void *vd, void *vn, void *vm, void *va, } } -void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, - void *vpn, void *vpm, CPUARMState *env, uint32_t desc) +static void do_bfmopa_w(void *vza, void *vzn, void *vzm, + uint16_t *pn, uint16_t *pm, CPUARMState *env, + uint32_t desc, uint32_t negx, bool ah_neg) { intptr_t row, col, oprsz = simd_maxsz(desc); - uint32_t neg = simd_data(desc) * 0x80008000u; - uint16_t *pn = vpn, *pm = vpm; float_status fpst, fpst_odd; if (is_ebf(env, &fpst, &fpst_odd)) { @@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, void *vza_row = vza + tile_vslice_offset(row); uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + if (ah_neg) { + n = bf16mop_ah_neg_adj_pair(n, prow); + } else { + n = f16mop_adj_pair(n, prow, negx); + } for (col = 0; col < oprsz; ) { uint16_t pcol = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, void *vza_row = vza + tile_vslice_offset(row); uint32_t n = *(uint32_t *)(vzn + H1_4(row)); - n = f16mop_adj_pair(n, prow, neg); + if (ah_neg) { + n = bf16mop_ah_neg_adj_pair(n, prow); + } else { + n = f16mop_adj_pair(n, prow, negx); + } for (col = 0; col < oprsz; ) { uint16_t pcol = pm[H2(col >> 4)]; @@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, } } +void HELPER(sme_bfmopa_w)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_bfmopa_w(vza, vzn, vzm, vpn, vpm, env, desc, 0, false); +} + +void HELPER(sme_bfmops_w)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_bfmopa_w(vza, vzn, vzm, vpn, vpm, env, desc, 0x80008000u, false); +} + +void HELPER(sme_ah_bfmops_w)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, CPUARMState *env, uint32_t desc) +{ + do_bfmopa_w(vza, vzn, vzm, vpn, vpm, env, desc, 0, true); +} + typedef uint32_t IMOPFn32(uint32_t, uint32_t, uint32_t, uint8_t, bool); static inline void do_imopa_s(uint32_t *za, uint32_t *zn, uint32_t *zm, uint8_t *pn, uint8_t *pm, diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, gen_helper_gvec_5_ptr *fn) { int svl = streaming_vec_reg_size(s); - uint32_t desc = simd_desc(svl, svl, a->sub); + uint32_t desc = simd_desc(svl, svl, 0); TCGv_ptr za, zn, zm, pn, pm, fpst; if (!sme_smza_enabled_check(s)) { @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, gen_helper_gvec_5_ptr *fn) { int svl = streaming_vec_reg_size(s); - uint32_t desc = simd_desc(svl, svl, a->sub); + uint32_t desc = simd_desc(svl, svl, 0); TCGv_ptr za, zn, zm, pn, pm; if (!sme_smza_enabled_check(s)) { @@ -XXX,XX +XXX,XX @@ static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, return true; } -TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, - MO_32, gen_helper_sme_fmopa_w_h) -TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, - MO_32, FPST_ZA, gen_helper_sme_fmopa_s) -TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, - MO_64, FPST_ZA, gen_helper_sme_fmopa_d) +TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, MO_32, + !a->sub ? gen_helper_sme_fmopa_w_h + : !s->fpcr_ah ? gen_helper_sme_fmops_w_h + : gen_helper_sme_ah_fmops_w_h) +TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_ZA, + !a->sub ? gen_helper_sme_fmopa_s + : !s->fpcr_ah ? gen_helper_sme_fmops_s + : gen_helper_sme_ah_fmops_s) +TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_ZA, + !a->sub ? gen_helper_sme_fmopa_d + : !s->fpcr_ah ? gen_helper_sme_fmops_d + : gen_helper_sme_ah_fmops_d) -TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, gen_helper_sme_bfmopa_w) +TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, + !a->sub ? gen_helper_sme_bfmopa_w + : !s->fpcr_ah ? gen_helper_sme_bfmops_w + : gen_helper_sme_ah_bfmops_w) TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s) TRANS_FEAT(UMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_umopa_s) -- 2.43.0
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-105-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 6 ++++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 51 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 4 +++ 4 files changed, 63 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_7(sme_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_fmops_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmops_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmops_d, TCG_CALL_NO_RWG, @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_bfmops_w, TCG_CALL_NO_RWG, DEF_HELPER_FLAGS_7(sme_ah_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_ah_fmops_h, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_fmops_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_fmops_d, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ ADDVA_d 11000000 11 01000 1 ... ... ..... 00 ... @adda_64 ### SME Outer Product &op zad zn zm pm pn sub:bool +@op_16 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 ... zad:1 &op @op_32 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 .. zad:2 &op @op_64 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 . zad:3 &op +FMOPA_h 10000001 100 ..... ... ... ..... . 100 . @op_16 FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn, } } +static void do_fmopa_h(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, float_status *fpst, uint32_t desc, + uint16_t negx, int negf) +{ + intptr_t row, col, oprsz = simd_maxsz(desc); + + for (row = 0; row < oprsz; ) { + uint16_t pa = pn[H2(row >> 4)]; + do { + if (pa & 1) { + void *vza_row = vza + tile_vslice_offset(row); + uint16_t n = *(uint32_t *)(vzn + H1_2(row)) ^ negx; + + for (col = 0; col < oprsz; ) { + uint16_t pb = pm[H2(col >> 4)]; + do { + if (pb & 1) { + uint16_t *a = vza_row + H1_2(col); + uint16_t *m = vzm + H1_2(col); + *a = float16_muladd(n, *m, *a, negf, fpst); + } + col += 2; + pb >>= 2; + } while (col & 15); + } + } + row += 2; + pa >>= 2; + } while (row & 15); + } +} + +void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_h(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_fmops_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_h(vza, vzn, vzm, vpn, vpm, fpst, desc, 1u << 15, 0); +} + +void HELPER(sme_ah_fmops_h)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_fmopa_h(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + static void do_fmopa_s(void *vza, void *vzn, void *vzm, uint16_t *pn, uint16_t *pm, float_status *fpst, uint32_t desc, uint32_t negx, int negf) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_w_h, aa64_sme, do_outprod_env, a, MO_32, !a->sub ? gen_helper_sme_fmopa_w_h : !s->fpcr_ah ? gen_helper_sme_fmops_w_h : gen_helper_sme_ah_fmops_w_h) +TRANS_FEAT(FMOPA_h, aa64_sme_f16f16, do_outprod_fpst, a, MO_16, FPST_ZA_F16, + !a->sub ? gen_helper_sme_fmopa_h + : !s->fpcr_ah ? gen_helper_sme_fmops_h + : gen_helper_sme_ah_fmops_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_ZA, !a->sub ? gen_helper_sme_fmopa_s : !s->fpcr_ah ? gen_helper_sme_fmops_s -- 2.43.0
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-106-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/tcg/helper-sme.h | 6 ++++ target/arm/tcg/sme.decode | 2 ++ target/arm/tcg/sme_helper.c | 51 ++++++++++++++++++++++++++++++++++ target/arm/tcg/translate-sme.c | 5 ++++ 4 files changed, 64 insertions(+) diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_bfmopa_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_bfmopa, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmops_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_bfmops_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_bfmops, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_fmops_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_ah_fmops_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_7(sme_ah_bfmops_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, env, i32) +DEF_HELPER_FLAGS_7(sme_ah_bfmops, TCG_CALL_NO_RWG, + void, ptr, ptr, ptr, ptr, ptr, fpst, i32) DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, i32) diff --git a/target/arm/tcg/sme.decode b/target/arm/tcg/sme.decode index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme.decode +++ b/target/arm/tcg/sme.decode @@ -XXX,XX +XXX,XX @@ FMOPA_h 10000001 100 ..... ... ... ..... . 100 . @op_16 FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64 +BFMOPA 10000001 101 ..... ... ... ..... . 100 . @op_16 + BFMOPA_w 10000001 100 ..... ... ... ..... . 00 .. @op_32 FMOPA_w_h 10000001 101 ..... ... ... ..... . 00 .. @op_32 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -XXX,XX +XXX,XX @@ void HELPER(sme_ah_fmops_d)(void *vza, void *vzn, void *vzm, void *vpn, float_muladd_negate_product); } +static void do_bfmopa(void *vza, void *vzn, void *vzm, uint16_t *pn, + uint16_t *pm, float_status *fpst, uint32_t desc, + uint16_t negx, int negf) +{ + intptr_t row, col, oprsz = simd_maxsz(desc); + + for (row = 0; row < oprsz; ) { + uint16_t pa = pn[H2(row >> 4)]; + do { + if (pa & 1) { + void *vza_row = vza + tile_vslice_offset(row); + uint16_t n = *(uint32_t *)(vzn + H1_2(row)) ^ negx; + + for (col = 0; col < oprsz; ) { + uint16_t pb = pm[H2(col >> 4)]; + do { + if (pb & 1) { + uint16_t *a = vza_row + H1_2(col); + uint16_t *m = vzm + H1_2(col); + *a = bfloat16_muladd(n, *m, *a, negf, fpst); + } + col += 2; + pb >>= 2; + } while (col & 15); + } + } + row += 2; + pa >>= 2; + } while (row & 15); + } +} + +void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_bfmopa(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, 0); +} + +void HELPER(sme_bfmops)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_bfmopa(vza, vzn, vzm, vpn, vpm, fpst, desc, 1u << 15, 0); +} + +void HELPER(sme_ah_bfmops)(void *vza, void *vzn, void *vzm, void *vpn, + void *vpm, float_status *fpst, uint32_t desc) +{ + do_bfmopa(vza, vzn, vzm, vpn, vpm, fpst, desc, 0, + float_muladd_negate_product); +} + /* * Alter PAIR as needed for controlling predicates being false, * and for NEG on an enabled row element. diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, FPST_ZA, : !s->fpcr_ah ? gen_helper_sme_fmops_d : gen_helper_sme_ah_fmops_d) +TRANS_FEAT(BFMOPA, aa64_sme_b16b16, do_outprod_fpst, a, MO_16, FPST_ZA, + !a->sub ? gen_helper_sme_bfmopa + : !s->fpcr_ah ? gen_helper_sme_bfmops + : gen_helper_sme_ah_bfmops) + TRANS_FEAT(BFMOPA_w, aa64_sme, do_outprod_env, a, MO_32, !a->sub ? gen_helper_sme_bfmopa_w : !s->fpcr_ah ? gen_helper_sme_bfmops_w -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-107-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- docs/system/arm/emulation.rst | 6 ++++++ target/arm/tcg/cpu64.c | 10 ++++++++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/system/arm/emulation.rst +++ b/docs/system/arm/emulation.rst @@ -XXX,XX +XXX,XX @@ the following architecture extensions: - FEAT_SM3 (Advanced SIMD SM3 instructions) - FEAT_SM4 (Advanced SIMD SM4 instructions) - FEAT_SME (Scalable Matrix Extension) +- FEAT_SME2 (Scalable Matrix Extension version 2) +- FEAT_SME2p1 (Scalable Matrix Extension version 2.1) +- FEAT_SME_B16B16 (Non-widening BFloat16 arithmetic for SME2) - FEAT_SME_FA64 (Full A64 instruction set in Streaming SVE mode) +- FEAT_SME_F16F16 (Non-widening half-precision FP16 arithmetic for SME2) - FEAT_SME_F64F64 (Double-precision floating-point outer product instructions) - FEAT_SME_I16I64 (16-bit to 64-bit integer widening outer product instructions) - FEAT_SVE (Scalable Vector Extension) - FEAT_SVE_AES (Scalable Vector AES instructions) +- FEAT_SVE_B16B16 (Non-widening BFloat16 arithmetic for SVE2) - FEAT_SVE_BitPerm (Scalable Vector Bit Permutes instructions) - FEAT_SVE_PMULL128 (Scalable Vector PMULL instructions) - FEAT_SVE_SHA3 (Scalable Vector SHA3 instructions) - FEAT_SVE_SM4 (Scalable Vector SM4 instructions) - FEAT_SVE2 (Scalable Vector Extension version 2) +- FEAT_SVE2p1 (Scalable Vector Extension version 2.1) - FEAT_SPECRES (Speculation restriction instructions) - FEAT_SSBS (Speculative Store Bypass Safe) - FEAT_SSBS2 (MRS and MSR instructions for SSBS version 2) diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c index XXXXXXX..XXXXXXX 100644 --- a/target/arm/tcg/cpu64.c +++ b/target/arm/tcg/cpu64.c @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) */ t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3); /* FEAT_MTE3 */ t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 0); /* FEAT_RASv1p1 + FEAT_DoubleFault */ - t = FIELD_DP64(t, ID_AA64PFR1, SME, 1); /* FEAT_SME */ + t = FIELD_DP64(t, ID_AA64PFR1, SME, 2); /* FEAT_SME2 */ t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_3 */ t = FIELD_DP64(t, ID_AA64PFR1, NMI, 1); /* FEAT_NMI */ SET_IDREG(isar, ID_AA64PFR1, t); @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) FIELD_DP64_IDREG(isar, ID_AA64MMFR3, SPEC_FPACC, 1); /* FEAT_FPACC_SPEC */ t = GET_IDREG(isar, ID_AA64ZFR0); - t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1); + t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 2); /* FEAT_SVE2p1 */ t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2); /* FEAT_SVE_PMULL128 */ t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1); /* FEAT_SVE_BitPerm */ t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 2); /* FEAT_BF16, FEAT_EBF16 */ + t = FIELD_DP64(t, ID_AA64ZFR0, B16B16, 1); /* FEAT_SVE_B16B16 */ t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1); /* FEAT_SVE_SHA3 */ t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1); /* FEAT_SVE_SM4 */ t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1); /* FEAT_I8MM */ @@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj) t = GET_IDREG(isar, ID_AA64SMFR0); t = FIELD_DP64(t, ID_AA64SMFR0, F32F32, 1); /* FEAT_SME */ + t = FIELD_DP64(t, ID_AA64SMFR0, BI32I32, 1); /* FEAT_SME2 */ t = FIELD_DP64(t, ID_AA64SMFR0, B16F32, 1); /* FEAT_SME */ t = FIELD_DP64(t, ID_AA64SMFR0, F16F32, 1); /* FEAT_SME */ t = FIELD_DP64(t, ID_AA64SMFR0, I8I32, 0xf); /* FEAT_SME */ + t = FIELD_DP64(t, ID_AA64SMFR0, F16F16, 1); /* FEAT_SME_F16F16 */ + t = FIELD_DP64(t, ID_AA64SMFR0, B16B16, 1); /* FEAT_SME_B16B16 */ + t = FIELD_DP64(t, ID_AA64SMFR0, I16I32, 5); /* FEAT_SME2 */ t = FIELD_DP64(t, ID_AA64SMFR0, F64F64, 1); /* FEAT_SME_F64F64 */ t = FIELD_DP64(t, ID_AA64SMFR0, I16I64, 0xf); /* FEAT_SME_I16I64 */ + t = FIELD_DP64(t, ID_AA64SMFR0, SMEVER, 2); /* FEAT_SME2p1 */ t = FIELD_DP64(t, ID_AA64SMFR0, FA64, 1); /* FEAT_SME_FA64 */ SET_IDREG(isar, ID_AA64SMFR0, t); -- 2.43.0
From: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-108-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- linux-user/elfload.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index XXXXXXX..XXXXXXX 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -XXX,XX +XXX,XX @@ uint64_t get_elf_hwcap2(void) GET_FEATURE_ID(aa64_sme_fa64, ARM_HWCAP2_A64_SME_FA64); GET_FEATURE_ID(aa64_hbc, ARM_HWCAP2_A64_HBC); GET_FEATURE_ID(aa64_mops, ARM_HWCAP2_A64_MOPS); + GET_FEATURE_ID(aa64_sve2p1, ARM_HWCAP2_A64_SVE2P1); + GET_FEATURE_ID(aa64_sme2, (ARM_HWCAP2_A64_SME2 | + ARM_HWCAP2_A64_SME_I16I32 | + ARM_HWCAP2_A64_SME_BI32I32)); + GET_FEATURE_ID(aa64_sme2p1, ARM_HWCAP2_A64_SME2P1); + GET_FEATURE_ID(aa64_sme_b16b16, ARM_HWCAP2_A64_SME_B16B16); + GET_FEATURE_ID(aa64_sme_f16f16, ARM_HWCAP2_A64_SME_F16F16); + GET_FEATURE_ID(aa64_sve_b16b16, ARM_HWCAP2_A64_SVE_B16B16); return hwcaps; } -- 2.43.0