1
First set of arm patches for 6.2. I have a lot more in my
1
The following changes since commit 5767815218efd3cbfd409505ed824d5f356044ae:
2
to-review queue still...
3
2
4
-- PMM
3
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (2024-02-14 15:45:52 +0000)
5
6
The following changes since commit d42685765653ec155fdf60910662f8830bdb2cef:
7
8
Open 6.2 development tree (2021-08-25 10:25:12 +0100)
9
4
10
are available in the Git repository at:
5
are available in the Git repository at:
11
6
12
https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210825
7
https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240215
13
8
14
for you to fetch changes up to 24b1a6aa43615be22c7ee66bd68ec5675f6a6a9a:
9
for you to fetch changes up to f780e63fe731b058fe52d43653600d8729a1b5f2:
15
10
16
docs: Document how to use gdb with unix sockets (2021-08-25 10:48:51 +0100)
11
docs: Add documentation for the mps3-an536 board (2024-02-15 14:32:39 +0000)
17
12
18
----------------------------------------------------------------
13
----------------------------------------------------------------
19
target-arm queue:
14
target-arm queue:
20
* More MVE emulation work
15
* hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
21
* Implement M-profile trapping on division by zero
16
* linux-user/aarch64: Choose SYNC as the preferred MTE mode
22
* kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
17
* Fix some errors in SVE/SME handling of MTE tags
23
* hw/char/pl011: add support for sending break
18
* hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
24
* fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
19
* hw/block/tc58128: Don't emit deprecation warning under qtest
25
* hw/dma/pl330: Add memory region to replace default
20
* tests/qtest: Fix handling of npcm7xx and GMAC tests
26
* sbsa-ref: Rename SBSA_GWDT enum value
21
* hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
27
* fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
22
* tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
28
* docs: Document how to use gdb with unix sockets
23
* Don't assert on vmload/vmsave of M-profile CPUs
24
* hw/arm/smmuv3: add support for stage 1 access fault
25
* hw/arm/stellaris: QOM cleanups
26
* Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
27
* Improve Cortex_R52 IMPDEF sysreg modelling
28
* Allow access to SPSR_hyp from hyp mode
29
* New board model mps3-an536 (Cortex-R52)
29
30
30
----------------------------------------------------------------
31
----------------------------------------------------------------
31
Eduardo Habkost (1):
32
Luc Michel (1):
32
sbsa-ref: Rename SBSA_GWDT enum value
33
hw/arm/smmuv3: add support for stage 1 access fault
33
34
34
Guenter Roeck (2):
35
Nabih Estefan (1):
35
fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
36
tests/qtest: Fix GMAC test to run on a machine in upstream QEMU
36
fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
37
37
38
Hamza Mahfooz (1):
38
Peter Maydell (22):
39
target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
39
hw/pci-host/raven.c: Mark raven_io_ops as implementing unaligned accesses
40
hw/block/tc58128: Don't emit deprecation warning under qtest
41
tests/qtest/meson.build: Don't include qtests_npcm7xx in qtests_aarch64
42
tests/qtest/bios-tables-test: Allow changes to virt GTDT
43
hw/arm/virt: Wire up non-secure EL2 virtual timer IRQ
44
tests/qtest/bios-tables-tests: Update virt golden reference
45
hw/arm/npcm7xx: Call qemu_configure_nic_device() for GMAC modules
46
tests/qtest/npcm7xx_emc-test: Connect all NICs to a backend
47
target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
48
target/arm: Use new CBAR encoding for all v8 CPUs, not all aarch64 CPUs
49
target/arm: The Cortex-R52 has a read-only CBAR
50
target/arm: Add Cortex-R52 IMPDEF sysregs
51
target/arm: Allow access to SPSR_hyp from hyp mode
52
hw/misc/mps2-scc: Fix condition for CFG3 register
53
hw/misc/mps2-scc: Factor out which-board conditionals
54
hw/misc/mps2-scc: Make changes needed for AN536 FPGA image
55
hw/arm/mps3r: Initial skeleton for mps3-an536 board
56
hw/arm/mps3r: Add CPUs, GIC, and per-CPU RAM
57
hw/arm/mps3r: Add UARTs
58
hw/arm/mps3r: Add GPIO, watchdog, dual-timer, I2C devices
59
hw/arm/mps3r: Add remaining devices
60
docs: Add documentation for the mps3-an536 board
40
61
41
Jan Luebbe (1):
62
Philippe Mathieu-Daudé (5):
42
hw/char/pl011: add support for sending break
63
hw/arm/xilinx_zynq: Wire FIQ between CPU <> GIC
64
hw/arm/stellaris: Convert ADC controller to Resettable interface
65
hw/arm/stellaris: Convert I2C controller to Resettable interface
66
hw/arm/stellaris: Add missing QOM 'machine' parent
67
hw/arm/stellaris: Add missing QOM 'SoC' parent
43
68
44
Peter Maydell (37):
69
Richard Henderson (6):
45
target/arm: Note that we handle VMOVL as a special case of VSHLL
70
linux-user/aarch64: Choose SYNC as the preferred MTE mode
46
target/arm: Print MVE VPR in CPU dumps
71
target/arm: Fix nregs computation in do_{ld,st}_zpa
47
target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
72
target/arm: Adjust and validate mtedesc sizem1
48
target/arm: Fix signed VADDV
73
target/arm: Split out make_svemte_desc
49
target/arm: Fix mask handling for MVE narrowing operations
74
target/arm: Handle mte in do_ldrq, do_ldro
50
target/arm: Fix 48-bit saturating shifts
75
target/arm: Fix SVE/SME gross MTE suppression checks
51
target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
52
target/arm: Fix calculation of LTP mask when LR is 0
53
target/arm: Factor out mve_eci_mask()
54
target/arm: Fix VPT advance when ECI is non-zero
55
target/arm: Fix VLDRB/H/W for predicated elements
56
target/arm: Implement MVE VMULL (polynomial)
57
target/arm: Implement MVE incrementing/decrementing dup insns
58
target/arm: Factor out gen_vpst()
59
target/arm: Implement MVE integer vector comparisons
60
target/arm: Implement MVE integer vector-vs-scalar comparisons
61
target/arm: Implement MVE VPSEL
62
target/arm: Implement MVE VMLAS
63
target/arm: Implement MVE shift-by-scalar
64
target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
65
target/arm: Implement MVE integer min/max across vector
66
target/arm: Implement MVE VABAV
67
target/arm: Implement MVE narrowing moves
68
target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
69
target/arm: Implement MVE VMLADAV and VMLSLDAV
70
target/arm: Implement MVE VMLA
71
target/arm: Implement MVE saturating doubling multiply accumulates
72
target/arm: Implement MVE VQABS, VQNEG
73
target/arm: Implement MVE VMAXA, VMINA
74
target/arm: Implement MVE VMOV to/from 2 general-purpose registers
75
target/arm: Implement MVE VPNOT
76
target/arm: Implement MVE VCTP
77
target/arm: Implement MVE scatter-gather insns
78
target/arm: Implement MVE scatter-gather immediate forms
79
target/arm: Implement MVE interleaving loads/stores
80
target/arm: Re-indent sdiv and udiv helpers
81
target/arm: Implement M-profile trapping on division by zero
82
76
83
Sebastian Meyer (1):
77
MAINTAINERS | 3 +-
84
docs: Document how to use gdb with unix sockets
78
docs/system/arm/mps2.rst | 37 +-
79
configs/devices/arm-softmmu/default.mak | 1 +
80
hw/arm/smmuv3-internal.h | 1 +
81
include/hw/arm/smmu-common.h | 1 +
82
include/hw/arm/virt.h | 2 +
83
include/hw/misc/mps2-scc.h | 1 +
84
linux-user/aarch64/target_prctl.h | 29 +-
85
target/arm/internals.h | 2 +-
86
target/arm/tcg/translate-a64.h | 2 +
87
hw/arm/mps3r.c | 640 ++++++++++++++++++++++++++++++++
88
hw/arm/npcm7xx.c | 1 +
89
hw/arm/smmu-common.c | 11 +
90
hw/arm/smmuv3.c | 1 +
91
hw/arm/stellaris.c | 47 ++-
92
hw/arm/virt-acpi-build.c | 20 +-
93
hw/arm/virt.c | 60 ++-
94
hw/arm/xilinx_zynq.c | 2 +
95
hw/block/tc58128.c | 4 +-
96
hw/misc/mps2-scc.c | 138 ++++++-
97
hw/pci-host/raven.c | 1 +
98
target/arm/helper.c | 14 +-
99
target/arm/tcg/cpu32.c | 109 ++++++
100
target/arm/tcg/op_helper.c | 43 ++-
101
target/arm/tcg/sme_helper.c | 8 +-
102
target/arm/tcg/sve_helper.c | 12 +-
103
target/arm/tcg/translate-sme.c | 15 +-
104
target/arm/tcg/translate-sve.c | 83 +++--
105
target/arm/tcg/translate.c | 19 +-
106
tests/qtest/npcm7xx_emc-test.c | 5 +-
107
tests/qtest/npcm_gmac-test.c | 84 +----
108
hw/arm/Kconfig | 5 +
109
hw/arm/meson.build | 1 +
110
tests/data/acpi/virt/FACP | Bin 276 -> 276 bytes
111
tests/data/acpi/virt/GTDT | Bin 96 -> 104 bytes
112
tests/qtest/meson.build | 4 +-
113
36 files changed, 1184 insertions(+), 222 deletions(-)
114
create mode 100644 hw/arm/mps3r.c
85
115
86
Wen, Jianxian (1):
87
hw/dma/pl330: Add memory region to replace default
88
89
docs/system/gdb.rst | 26 +-
90
include/hw/arm/fsl-imx7.h | 5 +
91
target/arm/cpu.h | 1 +
92
target/arm/helper-mve.h | 283 ++++++++++
93
target/arm/helper.h | 4 +-
94
target/arm/translate-a32.h | 2 +
95
target/arm/vec_internal.h | 11 +
96
target/arm/mve.decode | 226 +++++++-
97
target/arm/t32.decode | 1 +
98
hw/arm/exynos4210.c | 3 +
99
hw/arm/fsl-imx6ul.c | 12 +
100
hw/arm/fsl-imx7.c | 7 +
101
hw/arm/sbsa-ref.c | 6 +-
102
hw/arm/xilinx_zynq.c | 3 +
103
hw/char/pl011.c | 6 +
104
hw/dma/pl330.c | 26 +-
105
target/arm/cpu.c | 3 +
106
target/arm/helper.c | 34 +-
107
target/arm/kvm.c | 17 +-
108
target/arm/m_helper.c | 4 +
109
target/arm/mve_helper.c | 1254 ++++++++++++++++++++++++++++++++++++++++++--
110
target/arm/translate-mve.c | 877 ++++++++++++++++++++++++++++++-
111
target/arm/translate-vfp.c | 2 +-
112
target/arm/translate.c | 37 +-
113
target/arm/vec_helper.c | 14 +-
114
25 files changed, 2746 insertions(+), 118 deletions(-)
115
diff view generated by jsdifflib
1
From: "Wen, Jianxian" <Jianxian.Wen@verisilicon.com>
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
2
3
Add property memory region which can connect with IOMMU region to support SMMU translate.
3
Similarly to commits dadbb58f59..5ae79fe825 for other ARM boards,
4
connect FIQ output of the GIC CPU interfaces to the CPU.
4
5
5
Signed-off-by: Jianxian Wen <jianxian.wen@verisilicon.com>
6
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
6
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
7
Message-id: 20240130152548.17855-1-philmd@linaro.org
7
Message-id: 4C23C17B8E87E74E906A25A3254A03F4FA1FEC31@SHASXM03.verisilicon.com
8
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
---
10
---
10
hw/arm/exynos4210.c | 3 +++
11
hw/arm/xilinx_zynq.c | 2 ++
11
hw/arm/xilinx_zynq.c | 3 +++
12
1 file changed, 2 insertions(+)
12
hw/dma/pl330.c | 26 ++++++++++++++++++++++----
13
3 files changed, 28 insertions(+), 4 deletions(-)
14
13
15
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
16
index XXXXXXX..XXXXXXX 100644
17
--- a/hw/arm/exynos4210.c
18
+++ b/hw/arm/exynos4210.c
19
@@ -XXX,XX +XXX,XX @@ static DeviceState *pl330_create(uint32_t base, qemu_or_irq *orgate,
20
int i;
21
22
dev = qdev_new("pl330");
23
+ object_property_set_link(OBJECT(dev), "memory",
24
+ OBJECT(get_system_memory()),
25
+ &error_fatal);
26
qdev_prop_set_uint8(dev, "num_events", nevents);
27
qdev_prop_set_uint8(dev, "num_chnls", 8);
28
qdev_prop_set_uint8(dev, "num_periph_req", nreq);
29
diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
14
diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
30
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
31
--- a/hw/arm/xilinx_zynq.c
16
--- a/hw/arm/xilinx_zynq.c
32
+++ b/hw/arm/xilinx_zynq.c
17
+++ b/hw/arm/xilinx_zynq.c
33
@@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
18
@@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
34
sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[39-IRQ_OFFSET]);
19
sysbus_mmio_map(busdev, 0, MPCORE_PERIPHBASE);
35
20
sysbus_connect_irq(busdev, 0,
36
dev = qdev_new("pl330");
21
qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
37
+ object_property_set_link(OBJECT(dev), "memory",
22
+ sysbus_connect_irq(busdev, 1,
38
+ OBJECT(address_space_mem),
23
+ qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ));
39
+ &error_fatal);
24
40
qdev_prop_set_uint8(dev, "num_chnls", 8);
25
for (n = 0; n < 64; n++) {
41
qdev_prop_set_uint8(dev, "num_periph_req", 4);
26
pic[n] = qdev_get_gpio_in(dev, n);
42
qdev_prop_set_uint8(dev, "num_events", 16);
43
diff --git a/hw/dma/pl330.c b/hw/dma/pl330.c
44
index XXXXXXX..XXXXXXX 100644
45
--- a/hw/dma/pl330.c
46
+++ b/hw/dma/pl330.c
47
@@ -XXX,XX +XXX,XX @@ struct PL330State {
48
uint8_t num_faulting;
49
uint8_t periph_busy[PL330_PERIPH_NUM];
50
51
+ /* Memory region that DMA operation access */
52
+ MemoryRegion *mem_mr;
53
+ AddressSpace *mem_as;
54
};
55
56
#define TYPE_PL330 "pl330"
57
@@ -XXX,XX +XXX,XX @@ static inline const PL330InsnDesc *pl330_fetch_insn(PL330Chan *ch)
58
uint8_t opcode;
59
int i;
60
61
- dma_memory_read(&address_space_memory, ch->pc, &opcode, 1);
62
+ dma_memory_read(ch->parent->mem_as, ch->pc, &opcode, 1);
63
for (i = 0; insn_desc[i].size; i++) {
64
if ((opcode & insn_desc[i].opmask) == insn_desc[i].opcode) {
65
return &insn_desc[i];
66
@@ -XXX,XX +XXX,XX @@ static inline void pl330_exec_insn(PL330Chan *ch, const PL330InsnDesc *insn)
67
uint8_t buf[PL330_INSN_MAXSIZE];
68
69
assert(insn->size <= PL330_INSN_MAXSIZE);
70
- dma_memory_read(&address_space_memory, ch->pc, buf, insn->size);
71
+ dma_memory_read(ch->parent->mem_as, ch->pc, buf, insn->size);
72
insn->exec(ch, buf[0], &buf[1], insn->size - 1);
73
}
74
75
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
76
if (q != NULL && q->len <= pl330_fifo_num_free(&s->fifo)) {
77
int len = q->len - (q->addr & (q->len - 1));
78
79
- dma_memory_read(&address_space_memory, q->addr, buf, len);
80
+ dma_memory_read(s->mem_as, q->addr, buf, len);
81
trace_pl330_exec_cycle(q->addr, len);
82
if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
83
pl330_hexdump(buf, len);
84
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
85
fifo_res = pl330_fifo_get(&s->fifo, buf, len, q->tag);
86
}
87
if (fifo_res == PL330_FIFO_OK || q->z) {
88
- dma_memory_write(&address_space_memory, q->addr, buf, len);
89
+ dma_memory_write(s->mem_as, q->addr, buf, len);
90
trace_pl330_exec_cycle(q->addr, len);
91
if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
92
pl330_hexdump(buf, len);
93
@@ -XXX,XX +XXX,XX @@ static void pl330_realize(DeviceState *dev, Error **errp)
94
"dma", PL330_IOMEM_SIZE);
95
sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
96
97
+ if (!s->mem_mr) {
98
+ error_setg(errp, "'memory' link is not set");
99
+ return;
100
+ } else if (s->mem_mr == get_system_memory()) {
101
+ /* Avoid creating new AS for system memory. */
102
+ s->mem_as = &address_space_memory;
103
+ } else {
104
+ s->mem_as = g_new0(AddressSpace, 1);
105
+ address_space_init(s->mem_as, s->mem_mr,
106
+ memory_region_name(s->mem_mr));
107
+ }
108
+
109
s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, pl330_exec_cycle_timer, s);
110
111
s->cfg[0] = (s->mgr_ns_at_rst ? 0x4 : 0) |
112
@@ -XXX,XX +XXX,XX @@ static Property pl330_properties[] = {
113
DEFINE_PROP_UINT8("rd_q_dep", PL330State, rd_q_dep, 16),
114
DEFINE_PROP_UINT16("data_buffer_dep", PL330State, data_buffer_dep, 256),
115
116
+ DEFINE_PROP_LINK("memory", PL330State, mem_mr,
117
+ TYPE_MEMORY_REGION, MemoryRegion *),
118
+
119
DEFINE_PROP_END_OF_LIST(),
120
};
121
122
--
27
--
123
2.20.1
28
2.34.1
124
29
125
30
diff view generated by jsdifflib
1
Implement the MVE gather-loads and scatter-stores which
1
From: Richard Henderson <richard.henderson@linaro.org>
2
form the address by adding a base value from a scalar
3
register to an offset in each element of a vector.
4
2
3
The API does not generate an error for setting ASYNC | SYNC; that merely
4
constrains the selection vs the per-cpu default. For qemu linux-user,
5
choose SYNC as the default.
6
7
Cc: qemu-stable@nongnu.org
8
Reported-by: Gustavo Romero <gustavo.romero@linaro.org>
9
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
10
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
11
Message-id: 20240207025210.8837-2-richard.henderson@linaro.org
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
---
13
---
8
target/arm/helper-mve.h | 32 +++++++++
14
linux-user/aarch64/target_prctl.h | 29 +++++++++++++++++------------
9
target/arm/mve.decode | 12 ++++
15
1 file changed, 17 insertions(+), 12 deletions(-)
10
target/arm/mve_helper.c | 129 +++++++++++++++++++++++++++++++++++++
11
target/arm/translate-mve.c | 97 ++++++++++++++++++++++++++++
12
4 files changed, 270 insertions(+)
13
16
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
15
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
19
--- a/linux-user/aarch64/target_prctl.h
17
+++ b/target/arm/helper-mve.h
20
+++ b/linux-user/aarch64/target_prctl.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
21
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_set_tagged_addr_ctrl(CPUArchState *env, abi_long arg2)
19
DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
22
env->tagged_addr_enable = arg2 & PR_TAGGED_ADDR_ENABLE;
20
DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
23
21
24
if (cpu_isar_feature(aa64_mte, cpu)) {
22
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
25
- switch (arg2 & PR_MTE_TCF_MASK) {
23
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
26
- case PR_MTE_TCF_NONE:
24
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
- case PR_MTE_TCF_SYNC:
25
+
28
- case PR_MTE_TCF_ASYNC:
26
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
29
- break;
27
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
30
- default:
28
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
- return -EINVAL;
29
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
- }
30
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
33
-
31
+DEF_HELPER_FLAGS_4(mve_vldrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
34
/*
32
+DEF_HELPER_FLAGS_4(mve_vldrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
35
* Write PR_MTE_TCF to SCTLR_EL1[TCF0].
33
+
36
- * Note that the syscall values are consistent with hw.
34
+DEF_HELPER_FLAGS_4(mve_vstrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
37
+ *
35
+DEF_HELPER_FLAGS_4(mve_vstrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
38
+ * The kernel has a per-cpu configuration for the sysadmin,
36
+DEF_HELPER_FLAGS_4(mve_vstrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
39
+ * /sys/devices/system/cpu/cpu<N>/mte_tcf_preferred,
37
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
+ * which qemu does not implement.
38
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
41
+ *
39
+DEF_HELPER_FLAGS_4(mve_vstrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
42
+ * Because there is no performance difference between the modes, and
40
+DEF_HELPER_FLAGS_4(mve_vstrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
43
+ * because SYNC is most useful for debugging MTE errors, choose SYNC
41
+
44
+ * as the preferred mode. With this preference, and the way the API
42
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
45
+ * uses only two bits, there is no way for the program to select
43
+
46
+ * ASYMM mode.
44
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
47
*/
45
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
48
- env->cp15.sctlr_el[1] =
46
+DEF_HELPER_FLAGS_4(mve_vldrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
49
- deposit64(env->cp15.sctlr_el[1], 38, 2, arg2 >> PR_MTE_TCF_SHIFT);
47
+DEF_HELPER_FLAGS_4(mve_vldrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
50
+ unsigned tcf = 0;
48
+
51
+ if (arg2 & PR_MTE_TCF_SYNC) {
49
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
52
+ tcf = 1;
50
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
53
+ } else if (arg2 & PR_MTE_TCF_ASYNC) {
51
+DEF_HELPER_FLAGS_4(mve_vstrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
54
+ tcf = 2;
52
+DEF_HELPER_FLAGS_4(mve_vstrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
53
+
54
DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
55
56
DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
57
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
58
index XXXXXXX..XXXXXXX 100644
59
--- a/target/arm/mve.decode
60
+++ b/target/arm/mve.decode
61
@@ -XXX,XX +XXX,XX @@
62
&shl_scalar qda rm size
63
&vmaxv qm rda size
64
&vabav qn qm rda size
65
+&vldst_sg qd qm rn size msize os
66
+
67
+# scatter-gather memory size is in bits 6:4
68
+%sg_msize 6:1 4:1
69
70
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
71
# Note that both Rn and Qd are 3 bits only (no D bit)
72
@vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
73
74
+@vldst_sg .... .... .... rn:4 .... ... size:2 ... ... os:1 &vldst_sg \
75
+ qd=%qd qm=%qm msize=%sg_msize
76
+
77
@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
78
@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
79
@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
80
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111101 ....... @vldr_vstr \
81
VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111110 ....... @vldr_vstr \
82
size=2 p=1
83
84
+# gather loads/scatter stores
85
+VLDR_S_sg 111 0 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
86
+VLDR_U_sg 111 1 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
87
+VSTR_sg 111 0 1100 1 . 00 .... ... 0 111 . .... .... @vldst_sg
88
+
89
# Moves between 2 32-bit vector lanes and 2 general purpose registers
90
VMOV_to_2gp 1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
91
VMOV_from_2gp 1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
92
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
93
index XXXXXXX..XXXXXXX 100644
94
--- a/target/arm/mve_helper.c
95
+++ b/target/arm/mve_helper.c
96
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
97
#undef DO_VLDR
98
#undef DO_VSTR
99
100
+/*
101
+ * Gather loads/scatter stores. Here each element of Qm specifies
102
+ * an offset to use from the base register Rm. In the _os_ versions
103
+ * that offset is scaled by the element size.
104
+ * For loads, predicated lanes are zeroed instead of retaining
105
+ * their previous values.
106
+ */
107
+#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN) \
108
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
109
+ uint32_t base) \
110
+ { \
111
+ TYPE *d = vd; \
112
+ OFFTYPE *m = vm; \
113
+ uint16_t mask = mve_element_mask(env); \
114
+ uint16_t eci_mask = mve_eci_mask(env); \
115
+ unsigned e; \
116
+ uint32_t addr; \
117
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
118
+ if (!(eci_mask & 1)) { \
119
+ continue; \
120
+ } \
121
+ addr = ADDRFN(base, m[H##ESIZE(e)]); \
122
+ d[H##ESIZE(e)] = (mask & 1) ? \
123
+ cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \
124
+ } \
125
+ mve_advance_vpt(env); \
126
+ }
127
+
128
+/* We know here TYPE is unsigned so always the same as the offset type */
129
+#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN) \
130
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
131
+ uint32_t base) \
132
+ { \
133
+ TYPE *d = vd; \
134
+ TYPE *m = vm; \
135
+ uint16_t mask = mve_element_mask(env); \
136
+ unsigned e; \
137
+ uint32_t addr; \
138
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
139
+ addr = ADDRFN(base, m[H##ESIZE(e)]); \
140
+ if (mask & 1) { \
141
+ cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
142
+ } \
143
+ } \
144
+ mve_advance_vpt(env); \
145
+ }
146
+
147
+/*
148
+ * 64-bit accesses are slightly different: they are done as two 32-bit
149
+ * accesses, controlled by the predicate mask for the relevant beat,
150
+ * and with a single 32-bit offset in the first of the two Qm elements.
151
+ * Note that for QEMU our IMPDEF AIRCR.ENDIANNESS is always 0 (little).
152
+ */
153
+#define DO_VLDR64_SG(OP, ADDRFN) \
154
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
155
+ uint32_t base) \
156
+ { \
157
+ uint32_t *d = vd; \
158
+ uint32_t *m = vm; \
159
+ uint16_t mask = mve_element_mask(env); \
160
+ uint16_t eci_mask = mve_eci_mask(env); \
161
+ unsigned e; \
162
+ uint32_t addr; \
163
+ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \
164
+ if (!(eci_mask & 1)) { \
165
+ continue; \
166
+ } \
167
+ addr = ADDRFN(base, m[H4(e & ~1)]); \
168
+ addr += 4 * (e & 1); \
169
+ d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \
170
+ } \
171
+ mve_advance_vpt(env); \
172
+ }
173
+
174
+#define DO_VSTR64_SG(OP, ADDRFN) \
175
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
176
+ uint32_t base) \
177
+ { \
178
+ uint32_t *d = vd; \
179
+ uint32_t *m = vm; \
180
+ uint16_t mask = mve_element_mask(env); \
181
+ unsigned e; \
182
+ uint32_t addr; \
183
+ for (e = 0; e < 16 / 4; e++, mask >>= 4) { \
184
+ addr = ADDRFN(base, m[H4(e & ~1)]); \
185
+ addr += 4 * (e & 1); \
186
+ if (mask & 1) { \
187
+ cpu_stl_data_ra(env, addr, d[H4(e)], GETPC()); \
188
+ } \
189
+ } \
190
+ mve_advance_vpt(env); \
191
+ }
192
+
193
+#define ADDR_ADD(BASE, OFFSET) ((BASE) + (OFFSET))
194
+#define ADDR_ADD_OSH(BASE, OFFSET) ((BASE) + ((OFFSET) << 1))
195
+#define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
196
+#define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))
197
+
198
+DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD)
199
+DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD)
200
+DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD)
201
+
202
+DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD)
203
+DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD)
204
+DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD)
205
+DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD)
206
+DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD)
207
+DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD)
208
+DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD)
209
+
210
+DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH)
211
+DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH)
212
+DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH)
213
+DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW)
214
+DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD)
215
+
216
+DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD)
217
+DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD)
218
+DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD)
219
+DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD)
220
+DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD)
221
+DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD)
222
+DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD)
223
+
224
+DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH)
225
+DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH)
226
+DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW)
227
+DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD)
228
+
229
/*
230
* The mergemask(D, R, M) macro performs the operation "*D = R" but
231
* storing only the bytes which correspond to 1 bits in M,
232
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
233
index XXXXXXX..XXXXXXX 100644
234
--- a/target/arm/translate-mve.c
235
+++ b/target/arm/translate-mve.c
236
@@ -XXX,XX +XXX,XX @@ static inline int vidup_imm(DisasContext *s, int x)
237
#include "decode-mve.c.inc"
238
239
typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
240
+typedef void MVEGenLdStSGFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
241
typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
242
typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
243
typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
244
@@ -XXX,XX +XXX,XX @@ DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
245
DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
246
DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
247
248
+static bool do_ldst_sg(DisasContext *s, arg_vldst_sg *a, MVEGenLdStSGFn fn)
249
+{
250
+ TCGv_i32 addr;
251
+ TCGv_ptr qd, qm;
252
+
253
+ if (!dc_isar_feature(aa32_mve, s) ||
254
+ !mve_check_qreg_bank(s, a->qd | a->qm) ||
255
+ !fn || a->rn == 15) {
256
+ /* Rn case is UNPREDICTABLE */
257
+ return false;
258
+ }
259
+
260
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
261
+ return true;
262
+ }
263
+
264
+ addr = load_reg(s, a->rn);
265
+
266
+ qd = mve_qreg_ptr(a->qd);
267
+ qm = mve_qreg_ptr(a->qm);
268
+ fn(cpu_env, qd, qm, addr);
269
+ tcg_temp_free_ptr(qd);
270
+ tcg_temp_free_ptr(qm);
271
+ tcg_temp_free_i32(addr);
272
+ mve_update_eci(s);
273
+ return true;
274
+}
275
+
276
+/*
277
+ * The naming scheme here is "vldrb_sg_sh == in-memory byte loads
278
+ * signextended to halfword elements in register". _os_ indicates that
279
+ * the offsets in Qm should be scaled by the element size.
280
+ */
281
+/* This macro is just to make the arrays more compact in these functions */
282
+#define F(N) gen_helper_mve_##N
283
+
284
+/* VLDRB/VSTRB (ie msize 1) with OS=1 is UNPREDICTABLE; we UNDEF */
285
+static bool trans_VLDR_S_sg(DisasContext *s, arg_vldst_sg *a)
286
+{
287
+ static MVEGenLdStSGFn * const fns[2][4][4] = { {
288
+ { NULL, F(vldrb_sg_sh), F(vldrb_sg_sw), NULL },
289
+ { NULL, NULL, F(vldrh_sg_sw), NULL },
290
+ { NULL, NULL, NULL, NULL },
291
+ { NULL, NULL, NULL, NULL }
292
+ }, {
293
+ { NULL, NULL, NULL, NULL },
294
+ { NULL, NULL, F(vldrh_sg_os_sw), NULL },
295
+ { NULL, NULL, NULL, NULL },
296
+ { NULL, NULL, NULL, NULL }
297
+ }
55
+ }
298
+ };
56
+ env->cp15.sctlr_el[1] = deposit64(env->cp15.sctlr_el[1], 38, 2, tcf);
299
+ if (a->qd == a->qm) {
57
300
+ return false; /* UNPREDICTABLE */
58
/*
301
+ }
59
* Write PR_MTE_TAG to GCR_EL1[Exclude].
302
+ return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
303
+}
304
+
305
+static bool trans_VLDR_U_sg(DisasContext *s, arg_vldst_sg *a)
306
+{
307
+ static MVEGenLdStSGFn * const fns[2][4][4] = { {
308
+ { F(vldrb_sg_ub), F(vldrb_sg_uh), F(vldrb_sg_uw), NULL },
309
+ { NULL, F(vldrh_sg_uh), F(vldrh_sg_uw), NULL },
310
+ { NULL, NULL, F(vldrw_sg_uw), NULL },
311
+ { NULL, NULL, NULL, F(vldrd_sg_ud) }
312
+ }, {
313
+ { NULL, NULL, NULL, NULL },
314
+ { NULL, F(vldrh_sg_os_uh), F(vldrh_sg_os_uw), NULL },
315
+ { NULL, NULL, F(vldrw_sg_os_uw), NULL },
316
+ { NULL, NULL, NULL, F(vldrd_sg_os_ud) }
317
+ }
318
+ };
319
+ if (a->qd == a->qm) {
320
+ return false; /* UNPREDICTABLE */
321
+ }
322
+ return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
323
+}
324
+
325
+static bool trans_VSTR_sg(DisasContext *s, arg_vldst_sg *a)
326
+{
327
+ static MVEGenLdStSGFn * const fns[2][4][4] = { {
328
+ { F(vstrb_sg_ub), F(vstrb_sg_uh), F(vstrb_sg_uw), NULL },
329
+ { NULL, F(vstrh_sg_uh), F(vstrh_sg_uw), NULL },
330
+ { NULL, NULL, F(vstrw_sg_uw), NULL },
331
+ { NULL, NULL, NULL, F(vstrd_sg_ud) }
332
+ }, {
333
+ { NULL, NULL, NULL, NULL },
334
+ { NULL, F(vstrh_sg_os_uh), F(vstrh_sg_os_uw), NULL },
335
+ { NULL, NULL, F(vstrw_sg_os_uw), NULL },
336
+ { NULL, NULL, NULL, F(vstrd_sg_os_ud) }
337
+ }
338
+ };
339
+ return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
340
+}
341
+
342
+#undef F
343
+
344
static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
345
{
346
TCGv_ptr qd;
347
--
60
--
348
2.20.1
61
2.34.1
349
350
diff view generated by jsdifflib
1
From: Guenter Roeck <linux@roeck-us.net>
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
Instantiate SAI1/2/3 as unimplemented devices to avoid Linux kernel crashes
3
The field is encoded as [0-3], which is convenient for
4
such as the following.
4
indexing our array of function pointers, but the true
5
value is [1-4]. Adjust before calling do_mem_zpa.
5
6
6
Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
7
Add an assert, and move the comment re passing ZT to
7
pgd = (ptrval)
8
the helper back next to the relevant code.
8
[d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
9
Internal error: : 808 [#1] SMP ARM
10
Modules linked in:
11
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc5 #1
12
...
13
[<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
14
[<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
15
[<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
16
[<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
17
[<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
18
[<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
19
[<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
20
[<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
21
[<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
22
[<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
23
[<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
24
[<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
25
[<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
26
[<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
27
[<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
28
[<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
29
[<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
30
[<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
31
[<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
32
[<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
33
[<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
34
9
35
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
10
Cc: qemu-stable@nongnu.org
36
Message-id: 20210810175607.538090-1-linux@roeck-us.net
11
Fixes: 206adacfb8d ("target/arm: Add mte helpers for sve scalar + int loads")
12
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
13
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
14
Message-id: 20240207025210.8837-3-richard.henderson@linaro.org
37
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
15
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
38
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
39
---
17
---
40
include/hw/arm/fsl-imx7.h | 5 +++++
18
target/arm/tcg/translate-sve.c | 16 ++++++++--------
41
hw/arm/fsl-imx7.c | 7 +++++++
19
1 file changed, 8 insertions(+), 8 deletions(-)
42
2 files changed, 12 insertions(+)
43
20
44
diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
21
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
45
index XXXXXXX..XXXXXXX 100644
22
index XXXXXXX..XXXXXXX 100644
46
--- a/include/hw/arm/fsl-imx7.h
23
--- a/target/arm/tcg/translate-sve.c
47
+++ b/include/hw/arm/fsl-imx7.h
24
+++ b/target/arm/tcg/translate-sve.c
48
@@ -XXX,XX +XXX,XX @@ enum FslIMX7MemoryMap {
25
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
49
FSL_IMX7_UART6_ADDR = 0x30A80000,
26
TCGv_ptr t_pg;
50
FSL_IMX7_UART7_ADDR = 0x30A90000,
27
int desc = 0;
51
28
52
+ FSL_IMX7_SAI1_ADDR = 0x308A0000,
29
- /*
53
+ FSL_IMX7_SAI2_ADDR = 0x308B0000,
30
- * For e.g. LD4, there are not enough arguments to pass all 4
54
+ FSL_IMX7_SAI3_ADDR = 0x308C0000,
31
- * registers as pointers, so encode the regno into the data field.
55
+ FSL_IMX7_SAIn_SIZE = 0x10000,
32
- * For consistency, do this even for LD1.
56
+
33
- */
57
FSL_IMX7_ENET1_ADDR = 0x30BE0000,
34
+ assert(mte_n >= 1 && mte_n <= 4);
58
FSL_IMX7_ENET2_ADDR = 0x30BF0000,
35
if (s->mte_active[0]) {
59
36
int msz = dtype_msz(dtype);
60
diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
37
61
index XXXXXXX..XXXXXXX 100644
38
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
62
--- a/hw/arm/fsl-imx7.c
39
addr = clean_data_tbi(s, addr);
63
+++ b/hw/arm/fsl-imx7.c
40
}
64
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
65
create_unimplemented_device("can1", FSL_IMX7_CAN1_ADDR, FSL_IMX7_CANn_SIZE);
66
create_unimplemented_device("can2", FSL_IMX7_CAN2_ADDR, FSL_IMX7_CANn_SIZE);
67
41
68
+ /*
42
+ /*
69
+ * SAI (Audio SSI (Synchronous Serial Interface))
43
+ * For e.g. LD4, there are not enough arguments to pass all 4
44
+ * registers as pointers, so encode the regno into the data field.
45
+ * For consistency, do this even for LD1.
70
+ */
46
+ */
71
+ create_unimplemented_device("sai1", FSL_IMX7_SAI1_ADDR, FSL_IMX7_SAIn_SIZE);
47
desc = simd_desc(vsz, vsz, zt | desc);
72
+ create_unimplemented_device("sai2", FSL_IMX7_SAI2_ADDR, FSL_IMX7_SAIn_SIZE);
48
t_pg = tcg_temp_new_ptr();
73
+ create_unimplemented_device("sai2", FSL_IMX7_SAI3_ADDR, FSL_IMX7_SAIn_SIZE);
49
74
+
50
@@ -XXX,XX +XXX,XX @@ static void do_ld_zpa(DisasContext *s, int zt, int pg,
75
/*
51
* accessible via the instruction encoding.
76
* OCOTP
77
*/
52
*/
53
assert(fn != NULL);
54
- do_mem_zpa(s, zt, pg, addr, dtype, nreg, false, fn);
55
+ do_mem_zpa(s, zt, pg, addr, dtype, nreg + 1, false, fn);
56
}
57
58
static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a)
59
@@ -XXX,XX +XXX,XX @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
60
if (nreg == 0) {
61
/* ST1 */
62
fn = fn_single[s->mte_active[0]][be][msz][esz];
63
- nreg = 1;
64
} else {
65
/* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
66
assert(msz == esz);
67
fn = fn_multiple[s->mte_active[0]][be][nreg - 1][msz];
68
}
69
assert(fn != NULL);
70
- do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg, true, fn);
71
+ do_mem_zpa(s, zt, pg, addr, msz_dtype(s, msz), nreg + 1, true, fn);
72
}
73
74
static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a)
78
--
75
--
79
2.20.1
76
2.34.1
80
81
diff view generated by jsdifflib
1
From: Eduardo Habkost <ehabkost@redhat.com>
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
The SBSA_GWDT enum value conflicts with the SBSA_GWDT() QOM type
3
When we added SVE_MTEDESC_SHIFT, we effectively limited the
4
checking helper, preventing us from using a OBJECT_DEFINE* or
4
maximum size of MTEDESC. Adjust SIZEM1 to consume the remaining
5
DEFINE_INSTANCE_CHECKER macro for the SBSA_GWDT() wrapper.
5
bits (32 - 10 - 5 - 12 == 5). Assert that the data to be stored
6
fits within the field (expecting 8 * 4 - 1 == 31, exact fit).
6
7
7
If I understand the SBSA 6.0 specification correctly, the signal
8
Cc: qemu-stable@nongnu.org
8
being connected to IRQ 16 is the WS0 output signal from the
9
Generic Watchdog. Rename the enum value to SBSA_GWDT_WS0 to be
10
more explicit and avoid the name conflict.
11
12
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
13
Message-id: 20210806023119.431680-1-ehabkost@redhat.com
14
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
11
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
12
Message-id: 20240207025210.8837-4-richard.henderson@linaro.org
15
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
---
14
---
17
hw/arm/sbsa-ref.c | 6 +++---
15
target/arm/internals.h | 2 +-
18
1 file changed, 3 insertions(+), 3 deletions(-)
16
target/arm/tcg/translate-sve.c | 7 ++++---
17
2 files changed, 5 insertions(+), 4 deletions(-)
19
18
20
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
19
diff --git a/target/arm/internals.h b/target/arm/internals.h
21
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
22
--- a/hw/arm/sbsa-ref.c
21
--- a/target/arm/internals.h
23
+++ b/hw/arm/sbsa-ref.c
22
+++ b/target/arm/internals.h
24
@@ -XXX,XX +XXX,XX @@ enum {
23
@@ -XXX,XX +XXX,XX @@ FIELD(MTEDESC, TBI, 4, 2)
25
SBSA_GIC_DIST,
24
FIELD(MTEDESC, TCMA, 6, 2)
26
SBSA_GIC_REDIST,
25
FIELD(MTEDESC, WRITE, 8, 1)
27
SBSA_SECURE_EC,
26
FIELD(MTEDESC, ALIGN, 9, 3)
28
- SBSA_GWDT,
27
-FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - 12) /* size - 1 */
29
+ SBSA_GWDT_WS0,
28
+FIELD(MTEDESC, SIZEM1, 12, SIMD_DATA_BITS - SVE_MTEDESC_SHIFT - 12) /* size - 1 */
30
SBSA_GWDT_REFRESH,
29
31
SBSA_GWDT_CONTROL,
30
bool mte_probe(CPUARMState *env, uint32_t desc, uint64_t ptr);
32
SBSA_SMMU,
31
uint64_t mte_check(CPUARMState *env, uint32_t desc, uint64_t ptr, uintptr_t ra);
33
@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
32
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
34
[SBSA_AHCI] = 10,
33
index XXXXXXX..XXXXXXX 100644
35
[SBSA_EHCI] = 11,
34
--- a/target/arm/tcg/translate-sve.c
36
[SBSA_SMMU] = 12, /* ... to 15 */
35
+++ b/target/arm/tcg/translate-sve.c
37
- [SBSA_GWDT] = 16,
36
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
38
+ [SBSA_GWDT_WS0] = 16,
37
{
39
};
38
unsigned vsz = vec_full_reg_size(s);
40
39
TCGv_ptr t_pg;
41
static const char * const valid_cpus[] = {
40
+ uint32_t sizem1;
42
@@ -XXX,XX +XXX,XX @@ static void create_wdt(const SBSAMachineState *sms)
41
int desc = 0;
43
hwaddr cbase = sbsa_ref_memmap[SBSA_GWDT_CONTROL].base;
42
44
DeviceState *dev = qdev_new(TYPE_WDT_SBSA);
43
assert(mte_n >= 1 && mte_n <= 4);
45
SysBusDevice *s = SYS_BUS_DEVICE(dev);
44
+ sizem1 = (mte_n << dtype_msz(dtype)) - 1;
46
- int irq = sbsa_ref_irqmap[SBSA_GWDT];
45
+ assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
47
+ int irq = sbsa_ref_irqmap[SBSA_GWDT_WS0];
46
if (s->mte_active[0]) {
48
47
- int msz = dtype_msz(dtype);
49
sysbus_realize_and_unref(s, &error_fatal);
48
-
50
sysbus_mmio_map(s, 0, rbase);
49
desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
50
desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
51
desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
52
desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
53
- desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (mte_n << msz) - 1);
54
+ desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
55
desc <<= SVE_MTEDESC_SHIFT;
56
} else {
57
addr = clean_data_tbi(s, addr);
51
--
58
--
52
2.20.1
59
2.34.1
53
54
diff view generated by jsdifflib
1
Implement the MVE VPNOT insn, which inverts the bits in VPR.P0
1
From: Richard Henderson <richard.henderson@linaro.org>
2
(subject to both predication and to beatwise execution).
3
2
3
Share code that creates mtedesc and embeds within simd_desc.
4
5
Cc: qemu-stable@nongnu.org
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
9
Message-id: 20240207025210.8837-5-richard.henderson@linaro.org
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
---
11
---
7
target/arm/helper-mve.h | 1 +
12
target/arm/tcg/translate-a64.h | 2 ++
8
target/arm/mve.decode | 1 +
13
target/arm/tcg/translate-sme.c | 15 +++--------
9
target/arm/mve_helper.c | 17 +++++++++++++++++
14
target/arm/tcg/translate-sve.c | 47 ++++++++++++++++++----------------
10
target/arm/translate-mve.c | 19 +++++++++++++++++++
15
3 files changed, 31 insertions(+), 33 deletions(-)
11
4 files changed, 38 insertions(+)
12
16
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
14
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
19
--- a/target/arm/tcg/translate-a64.h
16
+++ b/target/arm/helper-mve.h
20
+++ b/target/arm/tcg/translate-a64.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
@@ -XXX,XX +XXX,XX @@ bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
18
DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
bool sve_access_check(DisasContext *s);
19
23
bool sme_enabled_check(DisasContext *s);
20
DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
bool sme_enabled_check_with_svcr(DisasContext *s, unsigned);
21
+DEF_HELPER_FLAGS_1(mve_vpnot, TCG_CALL_NO_WG, void, env)
25
+uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
22
26
+ uint32_t msz, bool is_write, uint32_t data);
23
DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
24
DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
/* This function corresponds to CheckStreamingSVEEnabled. */
25
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
29
static inline bool sme_sm_enabled_check(DisasContext *s)
30
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
26
index XXXXXXX..XXXXXXX 100644
31
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/mve.decode
32
--- a/target/arm/tcg/translate-sme.c
28
+++ b/target/arm/mve.decode
33
+++ b/target/arm/tcg/translate-sme.c
29
@@ -XXX,XX +XXX,XX @@ VCMPGT 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
34
@@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
30
VCMPLE 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
35
31
36
TCGv_ptr t_za, t_pg;
37
TCGv_i64 addr;
38
- int svl, desc = 0;
39
+ uint32_t desc;
40
bool be = s->be_data == MO_BE;
41
bool mte = s->mte_active[0];
42
43
@@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
44
tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
45
tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
46
47
- if (mte) {
48
- desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
49
- desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
50
- desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
51
- desc = FIELD_DP32(desc, MTEDESC, WRITE, a->st);
52
- desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << a->esz) - 1);
53
- desc <<= SVE_MTEDESC_SHIFT;
54
- } else {
55
+ if (!mte) {
56
addr = clean_data_tbi(s, addr);
57
}
58
- svl = streaming_vec_reg_size(s);
59
- desc = simd_desc(svl, svl, desc);
60
+
61
+ desc = make_svemte_desc(s, streaming_vec_reg_size(s), 1, a->esz, a->st, 0);
62
63
fns[a->esz][be][a->v][mte][a->st](tcg_env, t_za, t_pg, addr,
64
tcg_constant_i32(desc));
65
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
66
index XXXXXXX..XXXXXXX 100644
67
--- a/target/arm/tcg/translate-sve.c
68
+++ b/target/arm/tcg/translate-sve.c
69
@@ -XXX,XX +XXX,XX @@ static const uint8_t dtype_esz[16] = {
70
3, 2, 1, 3
71
};
72
73
-static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
74
- int dtype, uint32_t mte_n, bool is_write,
75
- gen_helper_gvec_mem *fn)
76
+uint32_t make_svemte_desc(DisasContext *s, unsigned vsz, uint32_t nregs,
77
+ uint32_t msz, bool is_write, uint32_t data)
32
{
78
{
33
+ VPNOT 1111 1110 0 0 11 000 1 000 0 1111 0100 1101
79
- unsigned vsz = vec_full_reg_size(s);
34
VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
80
- TCGv_ptr t_pg;
35
VCMPEQ_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 .... @vcmp_scalar
81
uint32_t sizem1;
36
}
82
- int desc = 0;
37
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
83
+ uint32_t desc = 0;
38
index XXXXXXX..XXXXXXX 100644
84
39
--- a/target/arm/mve_helper.c
85
- assert(mte_n >= 1 && mte_n <= 4);
40
+++ b/target/arm/mve_helper.c
86
- sizem1 = (mte_n << dtype_msz(dtype)) - 1;
41
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
87
+ /* Assert all of the data fits, with or without MTE enabled. */
42
mve_advance_vpt(env);
88
+ assert(nregs >= 1 && nregs <= 4);
43
}
89
+ sizem1 = (nregs << msz) - 1;
44
90
assert(sizem1 <= R_MTEDESC_SIZEM1_MASK >> R_MTEDESC_SIZEM1_SHIFT);
45
+void HELPER(mve_vpnot)(CPUARMState *env)
91
+ assert(data < 1u << SVE_MTEDESC_SHIFT);
46
+{
92
+
47
+ /*
93
if (s->mte_active[0]) {
48
+ * P0 bits for unexecuted beats (where eci_mask is 0) are unchanged.
94
desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
49
+ * P0 bits for predicated lanes in executed bits (where mask is 0) are 0.
95
desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
50
+ * P0 bits otherwise are inverted.
96
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
51
+ * (This is the same logic as VCMP.)
97
desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
52
+ * This insn is itself subject to predication and to beat-wise execution,
98
desc = FIELD_DP32(desc, MTEDESC, SIZEM1, sizem1);
53
+ * and after it executes VPT state advances in the usual way.
99
desc <<= SVE_MTEDESC_SHIFT;
54
+ */
100
- } else {
55
+ uint16_t mask = mve_element_mask(env);
101
+ }
56
+ uint16_t eci_mask = mve_eci_mask(env);
102
+ return simd_desc(vsz, vsz, desc | data);
57
+ uint16_t beatpred = ~env->v7m.vpr & mask;
58
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | (beatpred & eci_mask);
59
+ mve_advance_vpt(env);
60
+}
103
+}
61
+
104
+
62
#define DO_1OP_SAT(OP, ESIZE, TYPE, FN) \
105
+static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
63
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
106
+ int dtype, uint32_t nregs, bool is_write,
64
{ \
107
+ gen_helper_gvec_mem *fn)
65
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
108
+{
66
index XXXXXXX..XXXXXXX 100644
109
+ TCGv_ptr t_pg;
67
--- a/target/arm/translate-mve.c
110
+ uint32_t desc;
68
+++ b/target/arm/translate-mve.c
111
+
69
@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
112
+ if (!s->mte_active[0]) {
70
return true;
113
addr = clean_data_tbi(s, addr);
114
}
115
116
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
117
* registers as pointers, so encode the regno into the data field.
118
* For consistency, do this even for LD1.
119
*/
120
- desc = simd_desc(vsz, vsz, zt | desc);
121
+ desc = make_svemte_desc(s, vec_full_reg_size(s), nregs,
122
+ dtype_msz(dtype), is_write, zt);
123
t_pg = tcg_temp_new_ptr();
124
125
tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
126
@@ -XXX,XX +XXX,XX @@ static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm,
127
int scale, TCGv_i64 scalar, int msz, bool is_write,
128
gen_helper_gvec_mem_scatter *fn)
129
{
130
- unsigned vsz = vec_full_reg_size(s);
131
TCGv_ptr t_zm = tcg_temp_new_ptr();
132
TCGv_ptr t_pg = tcg_temp_new_ptr();
133
TCGv_ptr t_zt = tcg_temp_new_ptr();
134
- int desc = 0;
135
-
136
- if (s->mte_active[0]) {
137
- desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
138
- desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
139
- desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
140
- desc = FIELD_DP32(desc, MTEDESC, WRITE, is_write);
141
- desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << msz) - 1);
142
- desc <<= SVE_MTEDESC_SHIFT;
143
- }
144
- desc = simd_desc(vsz, vsz, desc | scale);
145
+ uint32_t desc;
146
147
tcg_gen_addi_ptr(t_pg, tcg_env, pred_full_reg_offset(s, pg));
148
tcg_gen_addi_ptr(t_zm, tcg_env, vec_full_reg_offset(s, zm));
149
tcg_gen_addi_ptr(t_zt, tcg_env, vec_full_reg_offset(s, zt));
150
+
151
+ desc = make_svemte_desc(s, vec_full_reg_size(s), 1, msz, is_write, scale);
152
fn(tcg_env, t_zt, t_pg, t_zm, scalar, tcg_constant_i32(desc));
71
}
153
}
72
154
73
+static bool trans_VPNOT(DisasContext *s, arg_VPNOT *a)
74
+{
75
+ /*
76
+ * Invert the predicate in VPR.P0. We have call out to
77
+ * a helper because this insn itself is beatwise and can
78
+ * be predicated.
79
+ */
80
+ if (!dc_isar_feature(aa32_mve, s)) {
81
+ return false;
82
+ }
83
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
84
+ return true;
85
+ }
86
+
87
+ gen_helper_mve_vpnot(cpu_env);
88
+ mve_update_eci(s);
89
+ return true;
90
+}
91
+
92
static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
93
{
94
/* VADDV: vector add across vector */
95
--
155
--
96
2.20.1
156
2.34.1
97
98
diff view generated by jsdifflib
1
Implement the MVE VMOV forms that move data between 2 general-purpose
1
From: Richard Henderson <richard.henderson@linaro.org>
2
registers and 2 32-bit lanes in a vector register.
3
2
3
These functions "use the standard load helpers", but
4
fail to clean_data_tbi or populate mtedesc.
5
6
Cc: qemu-stable@nongnu.org
7
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
9
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
10
Message-id: 20240207025210.8837-6-richard.henderson@linaro.org
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
---
12
---
7
target/arm/translate-a32.h | 1 +
13
target/arm/tcg/translate-sve.c | 15 +++++++++++++--
8
target/arm/mve.decode | 4 ++
14
1 file changed, 13 insertions(+), 2 deletions(-)
9
target/arm/translate-mve.c | 85 ++++++++++++++++++++++++++++++++++++++
10
target/arm/translate-vfp.c | 2 +-
11
4 files changed, 91 insertions(+), 1 deletion(-)
12
15
13
diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
16
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
14
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/translate-a32.h
18
--- a/target/arm/tcg/translate-sve.c
16
+++ b/target/arm/translate-a32.h
19
+++ b/target/arm/tcg/translate-sve.c
17
@@ -XXX,XX +XXX,XX @@ void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
20
@@ -XXX,XX +XXX,XX @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
18
void clear_eci_state(DisasContext *s);
21
unsigned vsz = vec_full_reg_size(s);
19
bool mve_eci_check(DisasContext *s);
22
TCGv_ptr t_pg;
20
void mve_update_and_store_eci(DisasContext *s);
23
int poff;
21
+bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
24
+ uint32_t desc;
22
25
23
static inline TCGv_i32 load_cpu_offset(int offset)
26
/* Load the first quadword using the normal predicated load helpers. */
24
{
27
+ if (!s->mte_active[0]) {
25
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
28
+ addr = clean_data_tbi(s, addr);
26
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/mve.decode
28
+++ b/target/arm/mve.decode
29
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111101 ....... @vldr_vstr \
30
VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111110 ....... @vldr_vstr \
31
size=2 p=1
32
33
+# Moves between 2 32-bit vector lanes and 2 general purpose registers
34
+VMOV_to_2gp 1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
35
+VMOV_from_2gp 1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
36
+
37
# Vector 2-op
38
VAND 1110 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
39
VBIC 1110 1111 0 . 01 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
40
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/translate-mve.c
43
+++ b/target/arm/translate-mve.c
44
@@ -XXX,XX +XXX,XX @@ static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
45
46
DO_VABAV(VABAV_S, vabavs)
47
DO_VABAV(VABAV_U, vabavu)
48
+
49
+static bool trans_VMOV_to_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
50
+{
51
+ /*
52
+ * VMOV two 32-bit vector lanes to two general-purpose registers.
53
+ * This insn is not predicated but it is subject to beat-wise
54
+ * execution if it is not in an IT block. For us this means
55
+ * only that if PSR.ECI says we should not be executing the beat
56
+ * corresponding to the lane of the vector register being accessed
57
+ * then we should skip perfoming the move, and that we need to do
58
+ * the usual check for bad ECI state and advance of ECI state.
59
+ * (If PSR.ECI is non-zero then we cannot be in an IT block.)
60
+ */
61
+ TCGv_i32 tmp;
62
+ int vd;
63
+
64
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
65
+ a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15 ||
66
+ a->rt == a->rt2) {
67
+ /* Rt/Rt2 cases are UNPREDICTABLE */
68
+ return false;
69
+ }
70
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
71
+ return true;
72
+ }
29
+ }
73
+
30
+
74
+ /* Convert Qreg index to Dreg for read_neon_element32() etc */
31
poff = pred_full_reg_offset(s, pg);
75
+ vd = a->qd * 2;
32
if (vsz > 16) {
76
+
33
/*
77
+ if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
34
@@ -XXX,XX +XXX,XX @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
78
+ tmp = tcg_temp_new_i32();
35
79
+ read_neon_element32(tmp, vd, a->idx, MO_32);
36
gen_helper_gvec_mem *fn
80
+ store_reg(s, a->rt, tmp);
37
= ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
38
- fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(16, 16, zt)));
39
+ desc = make_svemte_desc(s, 16, 1, dtype_msz(dtype), false, zt);
40
+ fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
41
42
/* Replicate that first quadword. */
43
if (vsz > 16) {
44
@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
45
unsigned vsz_r32;
46
TCGv_ptr t_pg;
47
int poff, doff;
48
+ uint32_t desc;
49
50
if (vsz < 32) {
51
/*
52
@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
53
}
54
55
/* Load the first octaword using the normal predicated load helpers. */
56
+ if (!s->mte_active[0]) {
57
+ addr = clean_data_tbi(s, addr);
81
+ }
58
+ }
82
+ if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
59
83
+ tmp = tcg_temp_new_i32();
60
poff = pred_full_reg_offset(s, pg);
84
+ read_neon_element32(tmp, vd + 1, a->idx, MO_32);
61
if (vsz > 32) {
85
+ store_reg(s, a->rt2, tmp);
62
@@ -XXX,XX +XXX,XX @@ static void do_ldro(DisasContext *s, int zt, int pg, TCGv_i64 addr, int dtype)
86
+ }
63
87
+
64
gen_helper_gvec_mem *fn
88
+ mve_update_and_store_eci(s);
65
= ldr_fns[s->mte_active[0]][s->be_data == MO_BE][dtype][0];
89
+ return true;
66
- fn(tcg_env, t_pg, addr, tcg_constant_i32(simd_desc(32, 32, zt)));
90
+}
67
+ desc = make_svemte_desc(s, 32, 1, dtype_msz(dtype), false, zt);
91
+
68
+ fn(tcg_env, t_pg, addr, tcg_constant_i32(desc));
92
+static bool trans_VMOV_from_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
69
93
+{
94
+ /*
95
+ * VMOV two general-purpose registers to two 32-bit vector lanes.
96
+ * This insn is not predicated but it is subject to beat-wise
97
+ * execution if it is not in an IT block. For us this means
98
+ * only that if PSR.ECI says we should not be executing the beat
99
+ * corresponding to the lane of the vector register being accessed
100
+ * then we should skip perfoming the move, and that we need to do
101
+ * the usual check for bad ECI state and advance of ECI state.
102
+ * (If PSR.ECI is non-zero then we cannot be in an IT block.)
103
+ */
104
+ TCGv_i32 tmp;
105
+ int vd;
106
+
107
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
108
+ a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15) {
109
+ /* Rt/Rt2 cases are UNPREDICTABLE */
110
+ return false;
111
+ }
112
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
113
+ return true;
114
+ }
115
+
116
+ /* Convert Qreg idx to Dreg for read_neon_element32() etc */
117
+ vd = a->qd * 2;
118
+
119
+ if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
120
+ tmp = load_reg(s, a->rt);
121
+ write_neon_element32(tmp, vd, a->idx, MO_32);
122
+ tcg_temp_free_i32(tmp);
123
+ }
124
+ if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
125
+ tmp = load_reg(s, a->rt2);
126
+ write_neon_element32(tmp, vd + 1, a->idx, MO_32);
127
+ tcg_temp_free_i32(tmp);
128
+ }
129
+
130
+ mve_update_and_store_eci(s);
131
+ return true;
132
+}
133
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
134
index XXXXXXX..XXXXXXX 100644
135
--- a/target/arm/translate-vfp.c
136
+++ b/target/arm/translate-vfp.c
137
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
138
return true;
139
}
140
141
-static bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
142
+bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
143
{
144
/*
70
/*
145
* In a CPU with MVE, the VMOV (vector lane to general-purpose register)
71
* Replicate that first octaword.
146
--
72
--
147
2.20.1
73
2.34.1
148
149
diff view generated by jsdifflib
1
Implement the MVE integer vector comparison instructions that compare
1
From: Richard Henderson <richard.henderson@linaro.org>
2
each element against a scalar from a general purpose register. These
3
are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)"
4
encodings T4, T5 and T6.
5
2
6
We have to move the decodetree pattern for VPST, because it
3
The TBI and TCMA bits are located within mtedesc, not desc.
7
overlaps with VCMP T4 with size = 0b11.
8
4
5
Cc: qemu-stable@nongnu.org
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
9
Message-id: 20240207025210.8837-7-richard.henderson@linaro.org
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
---
11
---
12
target/arm/helper-mve.h | 32 +++++++++++++++++++++++++++
12
target/arm/tcg/sme_helper.c | 8 ++++----
13
target/arm/mve.decode | 18 +++++++++++++---
13
target/arm/tcg/sve_helper.c | 12 ++++++------
14
target/arm/mve_helper.c | 44 +++++++++++++++++++++++++++++++-------
14
2 files changed, 10 insertions(+), 10 deletions(-)
15
target/arm/translate-mve.c | 43 +++++++++++++++++++++++++++++++++++++
16
4 files changed, 126 insertions(+), 11 deletions(-)
17
15
18
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
16
diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
19
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper-mve.h
18
--- a/target/arm/tcg/sme_helper.c
21
+++ b/target/arm/helper-mve.h
19
+++ b/target/arm/tcg/sme_helper.c
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, ptr, ptr)
20
@@ -XXX,XX +XXX,XX @@ void sme_ld1_mte(CPUARMState *env, void *za, uint64_t *vg,
23
DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
24
DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
25
DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
/* Perform gross MTE suppression early. */
26
+
24
- if (!tbi_check(desc, bit55) ||
27
+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
25
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
28
+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
26
+ if (!tbi_check(mtedesc, bit55) ||
29
+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
27
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
30
+
28
mtedesc = 0;
31
+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
29
}
32
+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
30
33
+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
31
@@ -XXX,XX +XXX,XX @@ void sme_st1_mte(CPUARMState *env, void *za, uint64_t *vg, target_ulong addr,
34
+
32
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
35
+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
33
36
+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
34
/* Perform gross MTE suppression early. */
37
+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
35
- if (!tbi_check(desc, bit55) ||
38
+
36
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
39
+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
37
+ if (!tbi_check(mtedesc, bit55) ||
40
+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
38
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
41
+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
39
mtedesc = 0;
42
+
40
}
43
+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
41
44
+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
42
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
45
+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
46
+
47
+DEF_HELPER_FLAGS_3(mve_vcmplt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
48
+DEF_HELPER_FLAGS_3(mve_vcmplt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
49
+DEF_HELPER_FLAGS_3(mve_vcmplt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
50
+
51
+DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
52
+DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
53
+DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
54
+
55
+DEF_HELPER_FLAGS_3(mve_vcmple_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
56
+DEF_HELPER_FLAGS_3(mve_vcmple_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
57
+DEF_HELPER_FLAGS_3(mve_vcmple_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
58
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
59
index XXXXXXX..XXXXXXX 100644
43
index XXXXXXX..XXXXXXX 100644
60
--- a/target/arm/mve.decode
44
--- a/target/arm/tcg/sve_helper.c
61
+++ b/target/arm/mve.decode
45
+++ b/target/arm/tcg/sve_helper.c
62
@@ -XXX,XX +XXX,XX @@
46
@@ -XXX,XX +XXX,XX @@ void sve_ldN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
63
&vidup qd rn size imm
47
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
64
&viwdup qd rn rm size imm
48
65
&vcmp qm qn size mask
49
/* Perform gross MTE suppression early. */
66
+&vcmp_scalar qn rm size mask
50
- if (!tbi_check(desc, bit55) ||
67
51
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
68
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
52
+ if (!tbi_check(mtedesc, bit55) ||
69
# Note that both Rn and Qd are 3 bits only (no D bit)
53
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
70
@@ -XXX,XX +XXX,XX @@
54
mtedesc = 0;
71
# Vector comparison; 4-bit Qm but 3-bit Qn
72
%mask_22_13 22:1 13:3
73
@vcmp .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
74
+@vcmp_scalar .... .... .. size:2 qn:3 . .... .... .... rm:4 &vcmp_scalar \
75
+ mask=%mask_22_13
76
77
# Vector loads and stores
78
79
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
80
rdahi=%rdahi rdalo=%rdalo
81
}
82
83
-# Predicate operations
84
-VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
85
-
86
# Logical immediate operations (1 reg and modified-immediate)
87
88
# The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
89
@@ -XXX,XX +XXX,XX @@ VCMPGE 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
90
VCMPLT 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
91
VCMPGT 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
92
VCMPLE 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
93
+
94
+{
95
+ VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
96
+ VCMPEQ_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 .... @vcmp_scalar
97
+}
98
+VCMPNE_scalar 1111 1110 0 . .. ... 1 ... 0 1111 1 1 0 0 .... @vcmp_scalar
99
+VCMPCS_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0 1 1 0 .... @vcmp_scalar
100
+VCMPHI_scalar 1111 1110 0 . .. ... 1 ... 0 1111 1 1 1 0 .... @vcmp_scalar
101
+VCMPGE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0 1 0 0 .... @vcmp_scalar
102
+VCMPLT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1 1 0 0 .... @vcmp_scalar
103
+VCMPGT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0 1 1 0 .... @vcmp_scalar
104
+VCMPLE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1 1 1 0 .... @vcmp_scalar
105
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
106
index XXXXXXX..XXXXXXX 100644
107
--- a/target/arm/mve_helper.c
108
+++ b/target/arm/mve_helper.c
109
@@ -XXX,XX +XXX,XX @@ DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
110
mve_advance_vpt(env); \
111
}
55
}
112
56
113
-#define DO_VCMP_S(OP, FN) \
57
@@ -XXX,XX +XXX,XX @@ void sve_ldnfff1_r_mte(CPUARMState *env, void *vg, target_ulong addr,
114
- DO_VCMP(OP##b, 1, int8_t, FN) \
58
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
115
- DO_VCMP(OP##h, 2, int16_t, FN) \
59
116
- DO_VCMP(OP##w, 4, int32_t, FN)
60
/* Perform gross MTE suppression early. */
117
+#define DO_VCMP_SCALAR(OP, ESIZE, TYPE, FN) \
61
- if (!tbi_check(desc, bit55) ||
118
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
62
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
119
+ uint32_t rm) \
63
+ if (!tbi_check(mtedesc, bit55) ||
120
+ { \
64
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
121
+ TYPE *n = vn; \
65
mtedesc = 0;
122
+ uint16_t mask = mve_element_mask(env); \
123
+ uint16_t eci_mask = mve_eci_mask(env); \
124
+ uint16_t beatpred = 0; \
125
+ uint16_t emask = MAKE_64BIT_MASK(0, ESIZE); \
126
+ unsigned e; \
127
+ for (e = 0; e < 16 / ESIZE; e++) { \
128
+ bool r = FN(n[H##ESIZE(e)], (TYPE)rm); \
129
+ /* Comparison sets 0/1 bits for each byte in the element */ \
130
+ beatpred |= r * emask; \
131
+ emask <<= ESIZE; \
132
+ } \
133
+ beatpred &= mask; \
134
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | \
135
+ (beatpred & eci_mask); \
136
+ mve_advance_vpt(env); \
137
+ }
138
139
-#define DO_VCMP_U(OP, FN) \
140
- DO_VCMP(OP##b, 1, uint8_t, FN) \
141
- DO_VCMP(OP##h, 2, uint16_t, FN) \
142
- DO_VCMP(OP##w, 4, uint32_t, FN)
143
+#define DO_VCMP_S(OP, FN) \
144
+ DO_VCMP(OP##b, 1, int8_t, FN) \
145
+ DO_VCMP(OP##h, 2, int16_t, FN) \
146
+ DO_VCMP(OP##w, 4, int32_t, FN) \
147
+ DO_VCMP_SCALAR(OP##_scalarb, 1, int8_t, FN) \
148
+ DO_VCMP_SCALAR(OP##_scalarh, 2, int16_t, FN) \
149
+ DO_VCMP_SCALAR(OP##_scalarw, 4, int32_t, FN)
150
+
151
+#define DO_VCMP_U(OP, FN) \
152
+ DO_VCMP(OP##b, 1, uint8_t, FN) \
153
+ DO_VCMP(OP##h, 2, uint16_t, FN) \
154
+ DO_VCMP(OP##w, 4, uint32_t, FN) \
155
+ DO_VCMP_SCALAR(OP##_scalarb, 1, uint8_t, FN) \
156
+ DO_VCMP_SCALAR(OP##_scalarh, 2, uint16_t, FN) \
157
+ DO_VCMP_SCALAR(OP##_scalarw, 4, uint32_t, FN)
158
159
#define DO_EQ(N, M) ((N) == (M))
160
#define DO_NE(N, M) ((N) != (M))
161
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
162
index XXXXXXX..XXXXXXX 100644
163
--- a/target/arm/translate-mve.c
164
+++ b/target/arm/translate-mve.c
165
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
166
typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
167
typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
168
typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
169
+typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
170
171
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
172
static inline long mve_qreg_offset(unsigned reg)
173
@@ -XXX,XX +XXX,XX @@ static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
174
return true;
175
}
176
177
+static bool do_vcmp_scalar(DisasContext *s, arg_vcmp_scalar *a,
178
+ MVEGenScalarCmpFn *fn)
179
+{
180
+ TCGv_ptr qn;
181
+ TCGv_i32 rm;
182
+
183
+ if (!dc_isar_feature(aa32_mve, s) || !fn || a->rm == 13) {
184
+ return false;
185
+ }
186
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
187
+ return true;
188
+ }
189
+
190
+ qn = mve_qreg_ptr(a->qn);
191
+ if (a->rm == 15) {
192
+ /* Encoding Rm=0b1111 means "constant zero" */
193
+ rm = tcg_constant_i32(0);
194
+ } else {
195
+ rm = load_reg(s, a->rm);
196
+ }
197
+ fn(cpu_env, qn, rm);
198
+ tcg_temp_free_ptr(qn);
199
+ tcg_temp_free_i32(rm);
200
+ if (a->mask) {
201
+ /* VPT */
202
+ gen_vpst(s, a->mask);
203
+ }
204
+ mve_update_eci(s);
205
+ return true;
206
+}
207
+
208
#define DO_VCMP(INSN, FN) \
209
static bool trans_##INSN(DisasContext *s, arg_vcmp *a) \
210
{ \
211
@@ -XXX,XX +XXX,XX @@ static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
212
NULL, \
213
}; \
214
return do_vcmp(s, a, fns[a->size]); \
215
+ } \
216
+ static bool trans_##INSN##_scalar(DisasContext *s, \
217
+ arg_vcmp_scalar *a) \
218
+ { \
219
+ static MVEGenScalarCmpFn * const fns[] = { \
220
+ gen_helper_mve_##FN##_scalarb, \
221
+ gen_helper_mve_##FN##_scalarh, \
222
+ gen_helper_mve_##FN##_scalarw, \
223
+ NULL, \
224
+ }; \
225
+ return do_vcmp_scalar(s, a, fns[a->size]); \
226
}
66
}
227
67
228
DO_VCMP(VCMPEQ, vcmpeq)
68
@@ -XXX,XX +XXX,XX @@ void sve_stN_r_mte(CPUARMState *env, uint64_t *vg, target_ulong addr,
69
desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
70
71
/* Perform gross MTE suppression early. */
72
- if (!tbi_check(desc, bit55) ||
73
- tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
74
+ if (!tbi_check(mtedesc, bit55) ||
75
+ tcma_check(mtedesc, bit55, allocation_tag_from_addr(addr))) {
76
mtedesc = 0;
77
}
78
229
--
79
--
230
2.20.1
80
2.34.1
231
232
diff view generated by jsdifflib
1
All the users of the vmlaldav formats have an 'x bit in bit 12 and an
1
The raven_io_ops MemoryRegionOps is the only one in the source tree
2
'a' bit in bit 5; move these to the format rather than specifying them
2
which sets .valid.unaligned to indicate that it should support
3
in each insn pattern.
3
unaligned accesses and which does not also set .impl.unaligned to
4
indicate that its read and write functions can do the unaligned
5
handling themselves. This is a problem, because at the moment the
6
core memory system does not implement the support for handling
7
unaligned accesses by doing a series of aligned accesses and
8
combining them (system/memory.c:access_with_adjusted_size() has a
9
TODO comment noting this).
4
10
11
Fortunately raven_io_read() and raven_io_write() will correctly deal
12
with the case of being passed an unaligned address, so we can fix the
13
missing unaligned access support by setting .impl.unaligned in the
14
MemoryRegionOps struct.
15
16
Fixes: 9a1839164c9c8f06 ("raven: Implement non-contiguous I/O region")
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
18
Tested-by: Cédric Le Goater <clg@redhat.com>
19
Reviewed-by: Cédric Le Goater <clg@redhat.com>
20
Message-id: 20240112134640.1775041-1-peter.maydell@linaro.org
7
---
21
---
8
target/arm/mve.decode | 16 ++++++++--------
22
hw/pci-host/raven.c | 1 +
9
1 file changed, 8 insertions(+), 8 deletions(-)
23
1 file changed, 1 insertion(+)
10
24
11
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
25
diff --git a/hw/pci-host/raven.c b/hw/pci-host/raven.c
12
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/mve.decode
27
--- a/hw/pci-host/raven.c
14
+++ b/target/arm/mve.decode
28
+++ b/hw/pci-host/raven.c
15
@@ -XXX,XX +XXX,XX @@ VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
29
@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps raven_io_ops = {
16
30
.write = raven_io_write,
17
&vmlaldav rdahi rdalo size qn qm x a
31
.endianness = DEVICE_LITTLE_ENDIAN,
18
32
.impl.max_access_size = 4,
19
-@vmlaldav .... .... . ... ... . ... . .... .... qm:3 . \
33
+ .impl.unaligned = true,
20
+@vmlaldav .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
34
.valid.unaligned = true,
21
qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
35
};
22
-@vmlaldav_nosz .... .... . ... ... . ... . .... .... qm:3 . \
23
+@vmlaldav_nosz .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
24
qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
25
-VMLALDAV_S 1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
26
-VMLALDAV_U 1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
27
+VMLALDAV_S 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
28
+VMLALDAV_U 1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
29
30
-VMLSLDAV 1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
31
+VMLSLDAV 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
32
33
-VRMLALDAVH_S 1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
34
-VRMLALDAVH_U 1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
35
+VRMLALDAVH_S 1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
36
+VRMLALDAVH_U 1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
37
38
-VRMLSLDAVH 1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
39
+VRMLSLDAVH 1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
40
41
# Scalar operations
42
36
43
--
37
--
44
2.20.1
38
2.34.1
45
39
46
40
diff view generated by jsdifflib
1
In the MVE helpers for the narrowing operations (DO_VSHRN and
1
Suppress the deprecation warning when we're running under qtest,
2
DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for
2
to avoid "make check" including warning messages in its output.
3
the 'top' versions of the insn. This is because the loop works over
4
the double-sized input elements and shifts the predicate mask by that
5
many bits each time, but when we write out the half-sized output we
6
must look at the mask bits for whichever half of the element we are
7
writing to.
8
9
Correct this by shifting the whole mask right by ESIZE bits for the
10
'top' insns. This allows us also to simplify the saturation bit
11
checking (where we had noticed that we needed to look at a different
12
mask bit for the 'top' insn.)
13
3
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
15
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
6
Message-id: 20240206154151.155620-1-peter.maydell@linaro.org
16
---
7
---
17
target/arm/mve_helper.c | 4 +++-
8
hw/block/tc58128.c | 4 +++-
18
1 file changed, 3 insertions(+), 1 deletion(-)
9
1 file changed, 3 insertions(+), 1 deletion(-)
19
10
20
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
11
diff --git a/hw/block/tc58128.c b/hw/block/tc58128.c
21
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
22
--- a/target/arm/mve_helper.c
13
--- a/hw/block/tc58128.c
23
+++ b/target/arm/mve_helper.c
14
+++ b/hw/block/tc58128.c
24
@@ -XXX,XX +XXX,XX @@ DO_VSHLL_ALL(vshllt, true)
15
@@ -XXX,XX +XXX,XX @@ static sh7750_io_device tc58128 = {
25
TYPE *d = vd; \
16
26
uint16_t mask = mve_element_mask(env); \
17
int tc58128_init(struct SH7750State *s, const char *zone1, const char *zone2)
27
unsigned le; \
18
{
28
+ mask >>= ESIZE * TOP; \
19
- warn_report_once("The TC58128 flash device is deprecated");
29
for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
20
+ if (!qtest_enabled()) {
30
TYPE r = FN(m[H##LESIZE(le)], shift); \
21
+ warn_report_once("The TC58128 flash device is deprecated");
31
mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask); \
22
+ }
32
@@ -XXX,XX +XXX,XX @@ static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
23
init_dev(&tc58128_devs[0], zone1);
33
uint16_t mask = mve_element_mask(env); \
24
init_dev(&tc58128_devs[1], zone2);
34
bool qc = false; \
25
return sh7750_register_io_device(s, &tc58128);
35
unsigned le; \
36
+ mask >>= ESIZE * TOP; \
37
for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
38
bool sat = false; \
39
TYPE r = FN(m[H##LESIZE(le)], shift, &sat); \
40
mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask); \
41
- qc |= sat && (mask & 1 << (TOP * ESIZE)); \
42
+ qc |= sat & mask & 1; \
43
} \
44
if (qc) { \
45
env->vfp.qc[0] = qc; \
46
--
26
--
47
2.20.1
27
2.34.1
48
28
49
29
diff view generated by jsdifflib
1
Implement the MVE VMAXA and VMINA insns, which take the absolute
1
We deliberately don't include qtests_npcm7xx in qtests_aarch64,
2
value of the signed elements in the input vector and then accumulate
2
because we already get the coverage of those tests via qtests_arm,
3
the unsigned max or min into the destination vector.
3
and we don't want to use extra CI minutes testing them twice.
4
4
5
In commit 327b680877b79c4b we added it to qtests_aarch64; revert
6
that change.
7
8
Fixes: 327b680877b79c4b ("tests/qtest: Creating qtest for GMAC Module")
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
11
Message-id: 20240206163043.315535-1-peter.maydell@linaro.org
7
---
12
---
8
target/arm/helper-mve.h | 8 ++++++++
13
tests/qtest/meson.build | 1 -
9
target/arm/mve.decode | 4 ++++
14
1 file changed, 1 deletion(-)
10
target/arm/mve_helper.c | 26 ++++++++++++++++++++++++++
11
target/arm/translate-mve.c | 2 ++
12
4 files changed, 40 insertions(+)
13
15
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
16
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
15
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
18
--- a/tests/qtest/meson.build
17
+++ b/target/arm/helper-mve.h
19
+++ b/tests/qtest/meson.build
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vqnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
20
@@ -XXX,XX +XXX,XX @@ qtests_aarch64 = \
19
DEF_HELPER_FLAGS_3(mve_vqnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
(config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) + \
20
DEF_HELPER_FLAGS_3(mve_vqnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
(config_all_accel.has_key('CONFIG_TCG') and \
21
23
config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : []) + \
22
+DEF_HELPER_FLAGS_3(mve_vmaxab, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
- (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
23
+DEF_HELPER_FLAGS_3(mve_vmaxah, TCG_CALL_NO_WG, void, env, ptr, ptr)
25
['arm-cpu-features',
24
+DEF_HELPER_FLAGS_3(mve_vmaxaw, TCG_CALL_NO_WG, void, env, ptr, ptr)
26
'numa-test',
25
+
27
'boot-serial-test',
26
+DEF_HELPER_FLAGS_3(mve_vminab, TCG_CALL_NO_WG, void, env, ptr, ptr)
27
+DEF_HELPER_FLAGS_3(mve_vminah, TCG_CALL_NO_WG, void, env, ptr, ptr)
28
+DEF_HELPER_FLAGS_3(mve_vminaw, TCG_CALL_NO_WG, void, env, ptr, ptr)
29
+
30
DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
31
DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
32
DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
33
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
34
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/mve.decode
36
+++ b/target/arm/mve.decode
37
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
38
VQMOVUNB 111 0 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
39
VQMOVN_BS 111 0 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
40
41
+ VMAXA 111 0 1110 0 . 11 .. 11 ... 0 1110 1 0 . 0 ... 1 @1op
42
+
43
VMULH_S 111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
44
}
45
46
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
47
VQMOVUNT 111 0 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
48
VQMOVN_TS 111 0 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
49
50
+ VMINA 111 0 1110 0 . 11 .. 11 ... 1 1110 1 0 . 0 ... 1 @1op
51
+
52
VRMULH_S 111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
53
}
54
55
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
56
index XXXXXXX..XXXXXXX 100644
57
--- a/target/arm/mve_helper.c
58
+++ b/target/arm/mve_helper.c
59
@@ -XXX,XX +XXX,XX @@ DO_1OP_SAT(vqabsw, 4, int32_t, DO_VQABS_W)
60
DO_1OP_SAT(vqnegb, 1, int8_t, DO_VQNEG_B)
61
DO_1OP_SAT(vqnegh, 2, int16_t, DO_VQNEG_H)
62
DO_1OP_SAT(vqnegw, 4, int32_t, DO_VQNEG_W)
63
+
64
+/*
65
+ * VMAXA, VMINA: vd is unsigned; vm is signed, and we take its
66
+ * absolute value; we then do an unsigned comparison.
67
+ */
68
+#define DO_VMAXMINA(OP, ESIZE, STYPE, UTYPE, FN) \
69
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
70
+ { \
71
+ UTYPE *d = vd; \
72
+ STYPE *m = vm; \
73
+ uint16_t mask = mve_element_mask(env); \
74
+ unsigned e; \
75
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
76
+ UTYPE r = DO_ABS(m[H##ESIZE(e)]); \
77
+ r = FN(d[H##ESIZE(e)], r); \
78
+ mergemask(&d[H##ESIZE(e)], r, mask); \
79
+ } \
80
+ mve_advance_vpt(env); \
81
+ }
82
+
83
+DO_VMAXMINA(vmaxab, 1, int8_t, uint8_t, DO_MAX)
84
+DO_VMAXMINA(vmaxah, 2, int16_t, uint16_t, DO_MAX)
85
+DO_VMAXMINA(vmaxaw, 4, int32_t, uint32_t, DO_MAX)
86
+DO_VMAXMINA(vminab, 1, int8_t, uint8_t, DO_MIN)
87
+DO_VMAXMINA(vminah, 2, int16_t, uint16_t, DO_MIN)
88
+DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
89
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
90
index XXXXXXX..XXXXXXX 100644
91
--- a/target/arm/translate-mve.c
92
+++ b/target/arm/translate-mve.c
93
@@ -XXX,XX +XXX,XX @@ DO_1OP(VABS, vabs)
94
DO_1OP(VNEG, vneg)
95
DO_1OP(VQABS, vqabs)
96
DO_1OP(VQNEG, vqneg)
97
+DO_1OP(VMAXA, vmaxa)
98
+DO_1OP(VMINA, vmina)
99
100
/* Narrowing moves: only size 0 and 1 are valid */
101
#define DO_VMOVN(INSN, FN) \
102
--
28
--
103
2.20.1
29
2.34.1
104
30
105
31
diff view generated by jsdifflib
1
Although the architecture doesn't define it as an alias, VMOVL
1
Allow changes to the virt GTDT -- we are going to add the IRQ
2
(vector move long) is encoded as a VSHLL with a zero shift.
2
entry for a new timer to it.
3
Add a comment in the decode file noting that we handle VMOVL
4
as part of VSHLL.
5
3
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
6
Message-id: 20240122143537.233498-2-peter.maydell@linaro.org
8
---
7
---
9
target/arm/mve.decode | 2 ++
8
tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
10
1 file changed, 2 insertions(+)
9
1 file changed, 2 insertions(+)
11
10
12
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
11
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
13
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/mve.decode
13
--- a/tests/qtest/bios-tables-test-allowed-diff.h
15
+++ b/target/arm/mve.decode
14
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
16
@@ -XXX,XX +XXX,XX @@ VRSHRI_U 111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
15
@@ -1 +1,3 @@
17
VRSHRI_U 111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
16
/* List of comma-separated changed AML files to ignore */
18
17
+"tests/data/acpi/virt/FACP",
19
# VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
18
+"tests/data/acpi/virt/GTDT",
20
+# Note that VMOVL is encoded as "VSHLL with a zero shift count"; we
21
+# implement it that way rather than special-casing it in the decode.
22
VSHLL_BS 111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
23
VSHLL_BS 111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
24
25
--
19
--
26
2.20.1
20
2.34.1
27
28
diff view generated by jsdifflib
1
Implement the MVE integer min/max across vector insns
1
Armv8.1+ CPUs have the Virtual Host Extension (VHE) which adds a
2
VMAXV, VMINV, VMAXAV and VMINAV, which find the maximum
2
non-secure EL2 virtual timer. We implemented the timer itself in the
3
from the vector elements and a general purpose register,
3
CPU model, but never wired up its IRQ line to the GIC.
4
and store the maximum back into the general purpose
4
5
register.
5
Wire up the IRQ line (this is always safe whether the CPU has the
6
6
interrupt or not, since it always creates the outbound IRQ line).
7
These insns overlap with VRMLALDAVH (they use what would
7
Report it to the guest via dtb and ACPI if the CPU has the feature.
8
be RdaHi=0b110).
8
9
The DTB binding is documented in the kernel's
10
Documentation/devicetree/bindings/timer/arm\,arch_timer.yaml
11
and the ACPI table entries are documented in the ACPI specification
12
version 6.3 or later.
13
14
Because the IRQ line ACPI binding is new in 6.3, we need to bump the
15
FADT table rev to show that we might be using 6.3 features.
16
17
Note that exposing this IRQ in the DTB will trigger a bug in EDK2
18
versions prior to edk2-stable202311, for users who use the virt board
19
with 'virtualization=on' to enable EL2 emulation and are booting an
20
EDK2 guest BIOS, if that EDK2 has assertions enabled. The effect is
21
that EDK2 will assert on bootup:
22
23
ASSERT [ArmTimerDxe] /home/kraxel/projects/qemu/roms/edk2/ArmVirtPkg/Library/ArmVirtTimerFdtClientLib/ArmVirtTimerFdtClientLib.c(72): PropSize == 36 || PropSize == 48
24
25
If you see that assertion you should do one of:
26
* update your EDK2 binaries to edk2-stable202311 or newer
27
* use the 'virt-8.2' versioned machine type
28
* not use 'virtualization=on'
29
30
(The versions shipped with QEMU itself have the fix.)
9
31
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
32
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
33
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
34
Message-id: 20240122143537.233498-3-peter.maydell@linaro.org
12
---
35
---
13
target/arm/helper-mve.h | 20 ++++++++++++
36
include/hw/arm/virt.h | 2 ++
14
target/arm/mve.decode | 18 +++++++++--
37
hw/arm/virt-acpi-build.c | 20 ++++++++++----
15
target/arm/mve_helper.c | 66 ++++++++++++++++++++++++++++++++++++++
38
hw/arm/virt.c | 60 ++++++++++++++++++++++++++++++++++------
16
target/arm/translate-mve.c | 48 +++++++++++++++++++++++++++
39
3 files changed, 67 insertions(+), 15 deletions(-)
17
4 files changed, 150 insertions(+), 2 deletions(-)
40
18
41
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
19
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
20
index XXXXXXX..XXXXXXX 100644
42
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper-mve.h
43
--- a/include/hw/arm/virt.h
22
+++ b/target/arm/helper-mve.h
44
+++ b/include/hw/arm/virt.h
23
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
45
@@ -XXX,XX +XXX,XX @@ struct VirtMachineClass {
24
DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
46
/* Machines < 6.2 have no support for describing cpu topology to guest */
25
DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
47
bool no_cpu_topology;
26
48
bool no_tcg_lpa2;
27
+DEF_HELPER_FLAGS_3(mve_vmaxvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
49
+ bool no_ns_el2_virt_timer_irq;
28
+DEF_HELPER_FLAGS_3(mve_vmaxvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
50
};
29
+DEF_HELPER_FLAGS_3(mve_vmaxvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
51
30
+DEF_HELPER_FLAGS_3(mve_vmaxvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
52
struct VirtMachineState {
31
+DEF_HELPER_FLAGS_3(mve_vmaxvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
53
@@ -XXX,XX +XXX,XX @@ struct VirtMachineState {
32
+DEF_HELPER_FLAGS_3(mve_vmaxvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
54
PCIBus *bus;
33
+DEF_HELPER_FLAGS_3(mve_vmaxavb, TCG_CALL_NO_WG, i32, env, ptr, i32)
55
char *oem_id;
34
+DEF_HELPER_FLAGS_3(mve_vmaxavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
56
char *oem_table_id;
35
+DEF_HELPER_FLAGS_3(mve_vmaxavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
57
+ bool ns_el2_virt_timer_irq;
36
+
58
};
37
+DEF_HELPER_FLAGS_3(mve_vminvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
59
38
+DEF_HELPER_FLAGS_3(mve_vminvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
60
#define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
39
+DEF_HELPER_FLAGS_3(mve_vminvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
61
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
40
+DEF_HELPER_FLAGS_3(mve_vminvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
41
+DEF_HELPER_FLAGS_3(mve_vminvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
42
+DEF_HELPER_FLAGS_3(mve_vminvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
43
+DEF_HELPER_FLAGS_3(mve_vminavb, TCG_CALL_NO_WG, i32, env, ptr, i32)
44
+DEF_HELPER_FLAGS_3(mve_vminavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
45
+DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
46
+
47
DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
48
DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
49
50
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
51
index XXXXXXX..XXXXXXX 100644
62
index XXXXXXX..XXXXXXX 100644
52
--- a/target/arm/mve.decode
63
--- a/hw/arm/virt-acpi-build.c
53
+++ b/target/arm/mve.decode
64
+++ b/hw/arm/virt-acpi-build.c
54
@@ -XXX,XX +XXX,XX @@
65
@@ -XXX,XX +XXX,XX @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
55
&vcmp qm qn size mask
66
}
56
&vcmp_scalar qn rm size mask
67
57
&shl_scalar qda rm size
68
/*
58
+&vmaxv qm rda size
69
- * ACPI spec, Revision 5.1
59
70
- * 5.2.24 Generic Timer Description Table (GTDT)
60
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
71
+ * ACPI spec, Revision 6.5
61
# Note that both Rn and Qd are 3 bits only (no D bit)
72
+ * 5.2.25 Generic Timer Description Table (GTDT)
62
@@ -XXX,XX +XXX,XX @@
73
*/
63
@vcmp_scalar .... .... .. size:2 qn:3 . .... .... .... rm:4 &vcmp_scalar \
74
static void
64
mask=%mask_22_13
75
build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
65
76
@@ -XXX,XX +XXX,XX @@ build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
66
+@vmaxv .... .... .... size:2 .. rda:4 .... .... .... &vmaxv qm=%qm
77
uint32_t irqflags = vmc->claim_edge_triggered_timers ?
67
+
78
1 : /* Interrupt is Edge triggered */
68
# Vector loads and stores
79
0; /* Interrupt is Level triggered */
69
80
- AcpiTable table = { .sig = "GTDT", .rev = 2, .oem_id = vms->oem_id,
70
# Widening loads and narrowing stores:
81
+ AcpiTable table = { .sig = "GTDT", .rev = 3, .oem_id = vms->oem_id,
71
@@ -XXX,XX +XXX,XX @@ VMLALDAV_U 1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
82
.oem_table_id = vms->oem_table_id };
72
83
73
VMLSLDAV 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
84
acpi_table_begin(&table, table_data);
74
85
@@ -XXX,XX +XXX,XX @@ build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
75
-VRMLALDAVH_S 1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
86
build_append_int_noprefix(table_data, 0, 4);
76
-VRMLALDAVH_U 1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
87
/* Platform Timer Offset */
88
build_append_int_noprefix(table_data, 0, 4);
89
-
90
+ if (vms->ns_el2_virt_timer_irq) {
91
+ /* Virtual EL2 Timer GSIV */
92
+ build_append_int_noprefix(table_data, ARCH_TIMER_NS_EL2_VIRT_IRQ, 4);
93
+ /* Virtual EL2 Timer Flags */
94
+ build_append_int_noprefix(table_data, irqflags, 4);
95
+ } else {
96
+ build_append_int_noprefix(table_data, 0, 4);
97
+ build_append_int_noprefix(table_data, 0, 4);
98
+ }
99
acpi_table_end(linker, &table);
100
}
101
102
@@ -XXX,XX +XXX,XX @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
103
static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
104
VirtMachineState *vms, unsigned dsdt_tbl_offset)
105
{
106
- /* ACPI v6.0 */
107
+ /* ACPI v6.3 */
108
AcpiFadtData fadt = {
109
.rev = 6,
110
- .minor_ver = 0,
111
+ .minor_ver = 3,
112
.flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI,
113
.xdsdt_tbl_offset = &dsdt_tbl_offset,
114
};
115
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
116
index XXXXXXX..XXXXXXX 100644
117
--- a/hw/arm/virt.c
118
+++ b/hw/arm/virt.c
119
@@ -XXX,XX +XXX,XX @@ static void create_randomness(MachineState *ms, const char *node)
120
qemu_fdt_setprop(ms->fdt, node, "rng-seed", seed.rng, sizeof(seed.rng));
121
}
122
123
+/*
124
+ * The CPU object always exposes the NS EL2 virt timer IRQ line,
125
+ * but we don't want to advertise it to the guest in the dtb or ACPI
126
+ * table unless it's really going to do something.
127
+ */
128
+static bool ns_el2_virt_timer_present(void)
77
+{
129
+{
78
+ VMAXV_S 1110 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
130
+ ARMCPU *cpu = ARM_CPU(qemu_get_cpu(0));
79
+ VMINV_S 1110 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
131
+ CPUARMState *env = &cpu->env;
80
+ VMAXAV 1110 1110 1110 .. 00 .... 1111 0 0 . 0 ... 0 @vmaxv
132
+
81
+ VMINAV 1110 1110 1110 .. 00 .... 1111 1 0 . 0 ... 0 @vmaxv
133
+ return arm_feature(env, ARM_FEATURE_AARCH64) &&
82
+ VRMLALDAVH_S 1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
134
+ arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu);
83
+}
135
+}
84
+
136
+
85
+{
137
static void create_fdt(VirtMachineState *vms)
86
+ VMAXV_U 1111 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
138
{
87
+ VMINV_U 1111 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
139
MachineState *ms = MACHINE(vms);
88
+ VRMLALDAVH_U 1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
140
@@ -XXX,XX +XXX,XX @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
89
+}
141
"arm,armv7-timer");
90
142
}
91
VRMLSLDAVH 1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
143
qemu_fdt_setprop(ms->fdt, "/timer", "always-on", NULL, 0);
92
144
- qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
93
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
145
- GIC_FDT_IRQ_TYPE_PPI,
94
index XXXXXXX..XXXXXXX 100644
146
- INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
95
--- a/target/arm/mve_helper.c
147
- GIC_FDT_IRQ_TYPE_PPI,
96
+++ b/target/arm/mve_helper.c
148
- INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
97
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvub, 1, uint8_t)
149
- GIC_FDT_IRQ_TYPE_PPI,
98
DO_VADDV(vaddvuh, 2, uint16_t)
150
- INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
99
DO_VADDV(vaddvuw, 4, uint32_t)
151
- GIC_FDT_IRQ_TYPE_PPI,
100
152
- INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags);
101
+/*
153
+ if (vms->ns_el2_virt_timer_irq) {
102
+ * Vector max/min across vector. Unlike VADDV, we must
154
+ qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
103
+ * read ra as the element size, not its full width.
155
+ GIC_FDT_IRQ_TYPE_PPI,
104
+ * We work with int64_t internally for simplicity.
156
+ INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
105
+ */
157
+ GIC_FDT_IRQ_TYPE_PPI,
106
+#define DO_VMAXMINV(OP, ESIZE, TYPE, RATYPE, FN) \
158
+ INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
107
+ uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
159
+ GIC_FDT_IRQ_TYPE_PPI,
108
+ uint32_t ra_in) \
160
+ INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
109
+ { \
161
+ GIC_FDT_IRQ_TYPE_PPI,
110
+ uint16_t mask = mve_element_mask(env); \
162
+ INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags,
111
+ unsigned e; \
163
+ GIC_FDT_IRQ_TYPE_PPI,
112
+ TYPE *m = vm; \
164
+ INTID_TO_PPI(ARCH_TIMER_NS_EL2_VIRT_IRQ), irqflags);
113
+ int64_t ra = (RATYPE)ra_in; \
165
+ } else {
114
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
166
+ qemu_fdt_setprop_cells(ms->fdt, "/timer", "interrupts",
115
+ if (mask & 1) { \
167
+ GIC_FDT_IRQ_TYPE_PPI,
116
+ ra = FN(ra, m[H##ESIZE(e)]); \
168
+ INTID_TO_PPI(ARCH_TIMER_S_EL1_IRQ), irqflags,
117
+ } \
169
+ GIC_FDT_IRQ_TYPE_PPI,
118
+ } \
170
+ INTID_TO_PPI(ARCH_TIMER_NS_EL1_IRQ), irqflags,
119
+ mve_advance_vpt(env); \
171
+ GIC_FDT_IRQ_TYPE_PPI,
120
+ return ra; \
172
+ INTID_TO_PPI(ARCH_TIMER_VIRT_IRQ), irqflags,
121
+ } \
173
+ GIC_FDT_IRQ_TYPE_PPI,
122
+
174
+ INTID_TO_PPI(ARCH_TIMER_NS_EL2_IRQ), irqflags);
123
+#define DO_VMAXMINV_U(INSN, FN) \
124
+ DO_VMAXMINV(INSN##b, 1, uint8_t, uint8_t, FN) \
125
+ DO_VMAXMINV(INSN##h, 2, uint16_t, uint16_t, FN) \
126
+ DO_VMAXMINV(INSN##w, 4, uint32_t, uint32_t, FN)
127
+#define DO_VMAXMINV_S(INSN, FN) \
128
+ DO_VMAXMINV(INSN##b, 1, int8_t, int8_t, FN) \
129
+ DO_VMAXMINV(INSN##h, 2, int16_t, int16_t, FN) \
130
+ DO_VMAXMINV(INSN##w, 4, int32_t, int32_t, FN)
131
+
132
+/*
133
+ * Helpers for max and min of absolute values across vector:
134
+ * note that we only take the absolute value of 'm', not 'n'
135
+ */
136
+static int64_t do_maxa(int64_t n, int64_t m)
137
+{
138
+ if (m < 0) {
139
+ m = -m;
140
+ }
175
+ }
141
+ return MAX(n, m);
176
}
142
+}
177
143
+
178
static void fdt_add_cpu_nodes(const VirtMachineState *vms)
144
+static int64_t do_mina(int64_t n, int64_t m)
179
@@ -XXX,XX +XXX,XX @@ static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
145
+{
180
[GTIMER_VIRT] = ARCH_TIMER_VIRT_IRQ,
146
+ if (m < 0) {
181
[GTIMER_HYP] = ARCH_TIMER_NS_EL2_IRQ,
147
+ m = -m;
182
[GTIMER_SEC] = ARCH_TIMER_S_EL1_IRQ,
148
+ }
183
+ [GTIMER_HYPVIRT] = ARCH_TIMER_NS_EL2_VIRT_IRQ,
149
+ return MIN(n, m);
184
};
150
+}
185
151
+
186
for (unsigned irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
152
+DO_VMAXMINV_S(vmaxvs, DO_MAX)
187
@@ -XXX,XX +XXX,XX @@ static void machvirt_init(MachineState *machine)
153
+DO_VMAXMINV_U(vmaxvu, DO_MAX)
188
qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
154
+DO_VMAXMINV_S(vminvs, DO_MIN)
189
object_unref(cpuobj);
155
+DO_VMAXMINV_U(vminvu, DO_MIN)
190
}
156
+/*
191
+
157
+ * VMAXAV, VMINAV treat the general purpose input as unsigned
192
+ /* Now we've created the CPUs we can see if they have the hypvirt timer */
158
+ * and the vector elements as signed.
193
+ vms->ns_el2_virt_timer_irq = ns_el2_virt_timer_present() &&
159
+ */
194
+ !vmc->no_ns_el2_virt_timer_irq;
160
+DO_VMAXMINV(vmaxavb, 1, int8_t, uint8_t, do_maxa)
195
+
161
+DO_VMAXMINV(vmaxavh, 2, int16_t, uint16_t, do_maxa)
196
fdt_add_timer_nodes(vms);
162
+DO_VMAXMINV(vmaxavw, 4, int32_t, uint32_t, do_maxa)
197
fdt_add_cpu_nodes(vms);
163
+DO_VMAXMINV(vminavb, 1, int8_t, uint8_t, do_mina)
198
164
+DO_VMAXMINV(vminavh, 2, int16_t, uint16_t, do_mina)
199
@@ -XXX,XX +XXX,XX @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 0)
165
+DO_VMAXMINV(vminavw, 4, int32_t, uint32_t, do_mina)
200
166
+
201
static void virt_machine_8_2_options(MachineClass *mc)
167
#define DO_VADDLV(OP, TYPE, LTYPE) \
202
{
168
uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
203
+ VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
169
uint64_t ra) \
204
+
170
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
205
virt_machine_9_0_options(mc);
171
index XXXXXXX..XXXXXXX 100644
206
compat_props_add(mc->compat_props, hw_compat_8_2, hw_compat_8_2_len);
172
--- a/target/arm/translate-mve.c
173
+++ b/target/arm/translate-mve.c
174
@@ -XXX,XX +XXX,XX @@ DO_VCMP(VCMPGE, vcmpge)
175
DO_VCMP(VCMPLT, vcmplt)
176
DO_VCMP(VCMPGT, vcmpgt)
177
DO_VCMP(VCMPLE, vcmple)
178
+
179
+static bool do_vmaxv(DisasContext *s, arg_vmaxv *a, MVEGenVADDVFn fn)
180
+{
181
+ /*
207
+ /*
182
+ * MIN/MAX operations across a vector: compute the min or
208
+ * Don't expose NS_EL2_VIRT timer IRQ in DTB on ACPI on 8.2 and
183
+ * max of the initial value in a general purpose register
209
+ * earlier machines. (Exposing it tickles a bug in older EDK2
184
+ * and all the elements in the vector, and store it back
210
+ * guest BIOS binaries.)
185
+ * into the general purpose register.
186
+ */
211
+ */
187
+ TCGv_ptr qm;
212
+ vmc->no_ns_el2_virt_timer_irq = true;
188
+ TCGv_i32 rda;
213
}
189
+
214
DEFINE_VIRT_MACHINE(8, 2)
190
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qm) ||
215
191
+ !fn || a->rda == 13 || a->rda == 15) {
192
+ /* Rda cases are UNPREDICTABLE */
193
+ return false;
194
+ }
195
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
196
+ return true;
197
+ }
198
+
199
+ qm = mve_qreg_ptr(a->qm);
200
+ rda = load_reg(s, a->rda);
201
+ fn(rda, cpu_env, qm, rda);
202
+ store_reg(s, a->rda, rda);
203
+ tcg_temp_free_ptr(qm);
204
+ mve_update_eci(s);
205
+ return true;
206
+}
207
+
208
+#define DO_VMAXV(INSN, FN) \
209
+ static bool trans_##INSN(DisasContext *s, arg_vmaxv *a) \
210
+ { \
211
+ static MVEGenVADDVFn * const fns[] = { \
212
+ gen_helper_mve_##FN##b, \
213
+ gen_helper_mve_##FN##h, \
214
+ gen_helper_mve_##FN##w, \
215
+ NULL, \
216
+ }; \
217
+ return do_vmaxv(s, a, fns[a->size]); \
218
+ }
219
+
220
+DO_VMAXV(VMAXV_S, vmaxvs)
221
+DO_VMAXV(VMAXV_U, vmaxvu)
222
+DO_VMAXV(VMAXAV, vmaxav)
223
+DO_VMAXV(VMINV_S, vminvs)
224
+DO_VMAXV(VMINV_U, vminvu)
225
+DO_VMAXV(VMINAV, vminav)
226
--
216
--
227
2.20.1
217
2.34.1
228
229
diff view generated by jsdifflib
1
Implement the MVE 1-operand saturating operations VQABS and VQNEG.
1
Update the virt golden reference files to say that the FACP is ACPI
2
v6.3, and the GTDT table is a revision 3 table with space for the
3
virtual EL2 timer.
4
5
Diffs from iasl:
6
7
@@ -XXX,XX +XXX,XX @@
8
/*
9
* Intel ACPI Component Architecture
10
* AML/ASL+ Disassembler version 20200925 (64-bit version)
11
* Copyright (c) 2000 - 2020 Intel Corporation
12
*
13
- * Disassembly of tests/data/acpi/virt/FACP, Mon Jan 22 13:48:40 2024
14
+ * Disassembly of /tmp/aml-W8RZH2, Mon Jan 22 13:48:40 2024
15
*
16
* ACPI Data Table [FACP]
17
*
18
* Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue
19
*/
20
21
[000h 0000 4] Signature : "FACP" [Fixed ACPI Description Table (FADT)]
22
[004h 0004 4] Table Length : 00000114
23
[008h 0008 1] Revision : 06
24
-[009h 0009 1] Checksum : 15
25
+[009h 0009 1] Checksum : 12
26
[00Ah 0010 6] Oem ID : "BOCHS "
27
[010h 0016 8] Oem Table ID : "BXPC "
28
[018h 0024 4] Oem Revision : 00000001
29
[01Ch 0028 4] Asl Compiler ID : "BXPC"
30
[020h 0032 4] Asl Compiler Revision : 00000001
31
32
[024h 0036 4] FACS Address : 00000000
33
[028h 0040 4] DSDT Address : 00000000
34
[02Ch 0044 1] Model : 00
35
[02Dh 0045 1] PM Profile : 00 [Unspecified]
36
[02Eh 0046 2] SCI Interrupt : 0000
37
[030h 0048 4] SMI Command Port : 00000000
38
[034h 0052 1] ACPI Enable Value : 00
39
[035h 0053 1] ACPI Disable Value : 00
40
[036h 0054 1] S4BIOS Command : 00
41
[037h 0055 1] P-State Control : 00
42
@@ -XXX,XX +XXX,XX @@
43
Use APIC Physical Destination Mode (V4) : 0
44
Hardware Reduced (V5) : 1
45
Low Power S0 Idle (V5) : 0
46
47
[074h 0116 12] Reset Register : [Generic Address Structure]
48
[074h 0116 1] Space ID : 00 [SystemMemory]
49
[075h 0117 1] Bit Width : 00
50
[076h 0118 1] Bit Offset : 00
51
[077h 0119 1] Encoded Access Width : 00 [Undefined/Legacy]
52
[078h 0120 8] Address : 0000000000000000
53
54
[080h 0128 1] Value to cause reset : 00
55
[081h 0129 2] ARM Flags (decoded below) : 0003
56
PSCI Compliant : 1
57
Must use HVC for PSCI : 1
58
59
-[083h 0131 1] FADT Minor Revision : 00
60
+[083h 0131 1] FADT Minor Revision : 03
61
[084h 0132 8] FACS Address : 0000000000000000
62
[08Ch 0140 8] DSDT Address : 0000000000000000
63
[094h 0148 12] PM1A Event Block : [Generic Address Structure]
64
[094h 0148 1] Space ID : 00 [SystemMemory]
65
[095h 0149 1] Bit Width : 00
66
[096h 0150 1] Bit Offset : 00
67
[097h 0151 1] Encoded Access Width : 00 [Undefined/Legacy]
68
[098h 0152 8] Address : 0000000000000000
69
70
[0A0h 0160 12] PM1B Event Block : [Generic Address Structure]
71
[0A0h 0160 1] Space ID : 00 [SystemMemory]
72
[0A1h 0161 1] Bit Width : 00
73
[0A2h 0162 1] Bit Offset : 00
74
[0A3h 0163 1] Encoded Access Width : 00 [Undefined/Legacy]
75
[0A4h 0164 8] Address : 0000000000000000
76
77
@@ -XXX,XX +XXX,XX @@
78
[0F5h 0245 1] Bit Width : 00
79
[0F6h 0246 1] Bit Offset : 00
80
[0F7h 0247 1] Encoded Access Width : 00 [Undefined/Legacy]
81
[0F8h 0248 8] Address : 0000000000000000
82
83
[100h 0256 12] Sleep Status Register : [Generic Address Structure]
84
[100h 0256 1] Space ID : 00 [SystemMemory]
85
[101h 0257 1] Bit Width : 00
86
[102h 0258 1] Bit Offset : 00
87
[103h 0259 1] Encoded Access Width : 00 [Undefined/Legacy]
88
[104h 0260 8] Address : 0000000000000000
89
90
[10Ch 0268 8] Hypervisor ID : 00000000554D4551
91
92
Raw Table Data: Length 276 (0x114)
93
94
- 0000: 46 41 43 50 14 01 00 00 06 15 42 4F 43 48 53 20 // FACP......BOCHS
95
+ 0000: 46 41 43 50 14 01 00 00 06 12 42 4F 43 48 53 20 // FACP......BOCHS
96
0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43 // BXPC ....BXPC
97
0020: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
98
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
99
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
100
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
101
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
102
0070: 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
103
- 0080: 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
104
+ 0080: 00 03 00 03 00 00 00 00 00 00 00 00 00 00 00 00 // ................
105
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
106
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
107
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
108
00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
109
00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
110
00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
111
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // ................
112
0100: 00 00 00 00 00 00 00 00 00 00 00 00 51 45 4D 55 // ............QEMU
113
0110: 00 00 00 00 // ....
114
115
@@ -XXX,XX +XXX,XX @@
116
/*
117
* Intel ACPI Component Architecture
118
* AML/ASL+ Disassembler version 20200925 (64-bit version)
119
* Copyright (c) 2000 - 2020 Intel Corporation
120
*
121
- * Disassembly of tests/data/acpi/virt/GTDT, Mon Jan 22 13:48:40 2024
122
+ * Disassembly of /tmp/aml-XDSZH2, Mon Jan 22 13:48:40 2024
123
*
124
* ACPI Data Table [GTDT]
125
*
126
* Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue
127
*/
128
129
[000h 0000 4] Signature : "GTDT" [Generic Timer Description Table]
130
-[004h 0004 4] Table Length : 00000060
131
-[008h 0008 1] Revision : 02
132
-[009h 0009 1] Checksum : 9C
133
+[004h 0004 4] Table Length : 00000068
134
+[008h 0008 1] Revision : 03
135
+[009h 0009 1] Checksum : 93
136
[00Ah 0010 6] Oem ID : "BOCHS "
137
[010h 0016 8] Oem Table ID : "BXPC "
138
[018h 0024 4] Oem Revision : 00000001
139
[01Ch 0028 4] Asl Compiler ID : "BXPC"
140
[020h 0032 4] Asl Compiler Revision : 00000001
141
142
[024h 0036 8] Counter Block Address : FFFFFFFFFFFFFFFF
143
[02Ch 0044 4] Reserved : 00000000
144
145
[030h 0048 4] Secure EL1 Interrupt : 0000001D
146
[034h 0052 4] EL1 Flags (decoded below) : 00000000
147
Trigger Mode : 0
148
Polarity : 0
149
Always On : 0
150
151
[038h 0056 4] Non-Secure EL1 Interrupt : 0000001E
152
@@ -XXX,XX +XXX,XX @@
153
154
[040h 0064 4] Virtual Timer Interrupt : 0000001B
155
[044h 0068 4] VT Flags (decoded below) : 00000000
156
Trigger Mode : 0
157
Polarity : 0
158
Always On : 0
159
160
[048h 0072 4] Non-Secure EL2 Interrupt : 0000001A
161
[04Ch 0076 4] NEL2 Flags (decoded below) : 00000000
162
Trigger Mode : 0
163
Polarity : 0
164
Always On : 0
165
[050h 0080 8] Counter Read Block Address : FFFFFFFFFFFFFFFF
166
167
[058h 0088 4] Platform Timer Count : 00000000
168
[05Ch 0092 4] Platform Timer Offset : 00000000
169
+[060h 0096 4] Virtual EL2 Timer GSIV : 00000000
170
+[064h 0100 4] Virtual EL2 Timer Flags : 00000000
171
172
-Raw Table Data: Length 96 (0x60)
173
+Raw Table Data: Length 104 (0x68)
174
175
- 0000: 47 54 44 54 60 00 00 00 02 9C 42 4F 43 48 53 20 // GTDT`.....BOCHS
176
+ 0000: 47 54 44 54 68 00 00 00 03 93 42 4F 43 48 53 20 // GTDTh.....BOCHS
177
0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43 // BXPC ....BXPC
178
0020: 01 00 00 00 FF FF FF FF FF FF FF FF 00 00 00 00 // ................
179
0030: 1D 00 00 00 00 00 00 00 1E 00 00 00 04 00 00 00 // ................
180
0040: 1B 00 00 00 00 00 00 00 1A 00 00 00 00 00 00 00 // ................
181
0050: FF FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 // ................
182
+ 0060: 00 00 00 00 00 00 00 00 // ........
2
183
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
184
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
185
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
186
Message-id: 20240122143537.233498-4-peter.maydell@linaro.org
5
---
187
---
6
target/arm/helper-mve.h | 8 ++++++++
188
tests/qtest/bios-tables-test-allowed-diff.h | 2 --
7
target/arm/mve.decode | 3 +++
189
tests/data/acpi/virt/FACP | Bin 276 -> 276 bytes
8
target/arm/mve_helper.c | 37 +++++++++++++++++++++++++++++++++++++
190
tests/data/acpi/virt/GTDT | Bin 96 -> 104 bytes
9
target/arm/translate-mve.c | 2 ++
191
3 files changed, 2 deletions(-)
10
4 files changed, 50 insertions(+)
192
11
193
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
12
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
13
index XXXXXXX..XXXXXXX 100644
194
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/helper-mve.h
195
--- a/tests/qtest/bios-tables-test-allowed-diff.h
15
+++ b/target/arm/helper-mve.h
196
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
16
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
197
@@ -1,3 +1 @@
17
DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
198
/* List of comma-separated changed AML files to ignore */
18
DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
199
-"tests/data/acpi/virt/FACP",
19
200
-"tests/data/acpi/virt/GTDT",
20
+DEF_HELPER_FLAGS_3(mve_vqabsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
201
diff --git a/tests/data/acpi/virt/FACP b/tests/data/acpi/virt/FACP
21
+DEF_HELPER_FLAGS_3(mve_vqabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
+DEF_HELPER_FLAGS_3(mve_vqabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
+
24
+DEF_HELPER_FLAGS_3(mve_vqnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
25
+DEF_HELPER_FLAGS_3(mve_vqnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
26
+DEF_HELPER_FLAGS_3(mve_vqnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
27
+
28
DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
29
DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
30
DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
31
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
32
index XXXXXXX..XXXXXXX 100644
202
index XXXXXXX..XXXXXXX 100644
33
--- a/target/arm/mve.decode
203
GIT binary patch
34
+++ b/target/arm/mve.decode
204
delta 25
35
@@ -XXX,XX +XXX,XX @@ VABS_fp 1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
205
gcmbQjG=+)F&CxkPgpq-PO=u!l<;2F$$vli407<0<)c^nh
36
VNEG 1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
206
37
VNEG_fp 1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
207
delta 28
38
208
kcmbQjG=+)F&CxkPgpq-PO>`nx<-|!<6Akz$^DuG%0AAS!ssI20
39
+VQABS 1111 1111 1 . 11 .. 00 ... 0 0111 01 . 0 ... 0 @1op
209
40
+VQNEG 1111 1111 1 . 11 .. 00 ... 0 0111 11 . 0 ... 0 @1op
210
diff --git a/tests/data/acpi/virt/GTDT b/tests/data/acpi/virt/GTDT
41
+
42
&vdup qd rt size
43
# Qd is in the fields usually named Qn
44
@vdup .... .... . . .. ... . rt:4 .... . . . . .... qd=%qn &vdup
45
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
46
index XXXXXXX..XXXXXXX 100644
211
index XXXXXXX..XXXXXXX 100644
47
--- a/target/arm/mve_helper.c
212
GIT binary patch
48
+++ b/target/arm/mve_helper.c
213
delta 25
49
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
214
bcmYeu;BpUf3CUn!U|^m+kt>V?$N&QXMtB4L
50
}
215
51
mve_advance_vpt(env);
216
delta 16
52
}
217
Xcmc~u;BpUf2}xjJU|^avkt+-UB60)u
53
+
218
54
+#define DO_1OP_SAT(OP, ESIZE, TYPE, FN) \
55
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
56
+ { \
57
+ TYPE *d = vd, *m = vm; \
58
+ uint16_t mask = mve_element_mask(env); \
59
+ unsigned e; \
60
+ bool qc = false; \
61
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
62
+ bool sat = false; \
63
+ mergemask(&d[H##ESIZE(e)], FN(m[H##ESIZE(e)], &sat), mask); \
64
+ qc |= sat & mask & 1; \
65
+ } \
66
+ if (qc) { \
67
+ env->vfp.qc[0] = qc; \
68
+ } \
69
+ mve_advance_vpt(env); \
70
+ }
71
+
72
+#define DO_VQABS_B(N, SATP) \
73
+ do_sat_bhs(DO_ABS((int64_t)N), INT8_MIN, INT8_MAX, SATP)
74
+#define DO_VQABS_H(N, SATP) \
75
+ do_sat_bhs(DO_ABS((int64_t)N), INT16_MIN, INT16_MAX, SATP)
76
+#define DO_VQABS_W(N, SATP) \
77
+ do_sat_bhs(DO_ABS((int64_t)N), INT32_MIN, INT32_MAX, SATP)
78
+
79
+#define DO_VQNEG_B(N, SATP) do_sat_bhs(-(int64_t)N, INT8_MIN, INT8_MAX, SATP)
80
+#define DO_VQNEG_H(N, SATP) do_sat_bhs(-(int64_t)N, INT16_MIN, INT16_MAX, SATP)
81
+#define DO_VQNEG_W(N, SATP) do_sat_bhs(-(int64_t)N, INT32_MIN, INT32_MAX, SATP)
82
+
83
+DO_1OP_SAT(vqabsb, 1, int8_t, DO_VQABS_B)
84
+DO_1OP_SAT(vqabsh, 2, int16_t, DO_VQABS_H)
85
+DO_1OP_SAT(vqabsw, 4, int32_t, DO_VQABS_W)
86
+
87
+DO_1OP_SAT(vqnegb, 1, int8_t, DO_VQNEG_B)
88
+DO_1OP_SAT(vqnegh, 2, int16_t, DO_VQNEG_H)
89
+DO_1OP_SAT(vqnegw, 4, int32_t, DO_VQNEG_W)
90
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
91
index XXXXXXX..XXXXXXX 100644
92
--- a/target/arm/translate-mve.c
93
+++ b/target/arm/translate-mve.c
94
@@ -XXX,XX +XXX,XX @@ DO_1OP(VCLZ, vclz)
95
DO_1OP(VCLS, vcls)
96
DO_1OP(VABS, vabs)
97
DO_1OP(VNEG, vneg)
98
+DO_1OP(VQABS, vqabs)
99
+DO_1OP(VQNEG, vqneg)
100
101
/* Narrowing moves: only size 0 and 1 are valid */
102
#define DO_VMOVN(INSN, FN) \
103
--
219
--
104
2.20.1
220
2.34.1
105
106
diff view generated by jsdifflib
1
Implement the MVE saturating doubling multiply accumulate insns
1
The patchset adding the GMAC ethernet to this SoC crossed in the
2
VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH. These perform a multiply,
2
mail with the patchset cleaning up the NIC handling. When we
3
double, add the accumulator shifted by the element size, possibly
3
create the GMAC modules we must call qemu_configure_nic_device()
4
round, saturate to twice the element size, then take the high half of
4
so that the user has the opportunity to use the -nic commandline
5
the result. The *MLAH insns do vector * scalar + vector, and the
5
option to create a network backend and connect it to the GMACs.
6
*MLASH insns do vector * vector + scalar.
7
6
7
Add the missing call.
8
9
Fixes: 21e5326a7c ("hw/arm: Add GMAC devices to NPCM7XX SoC")
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
12
Message-id: 20240206171231.396392-2-peter.maydell@linaro.org
10
---
13
---
11
target/arm/helper-mve.h | 16 +++++++
14
hw/arm/npcm7xx.c | 1 +
12
target/arm/mve.decode | 5 ++
15
1 file changed, 1 insertion(+)
13
target/arm/mve_helper.c | 95 ++++++++++++++++++++++++++++++++++++++
14
target/arm/translate-mve.c | 4 ++
15
4 files changed, 120 insertions(+)
16
16
17
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
18
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/helper-mve.h
19
--- a/hw/arm/npcm7xx.c
20
+++ b/target/arm/helper-mve.h
20
+++ b/hw/arm/npcm7xx.c
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
@@ -XXX,XX +XXX,XX @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
22
DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
23
DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
SysBusDevice *sbd = SYS_BUS_DEVICE(&s->gmac[i]);
24
24
25
+DEF_HELPER_FLAGS_4(mve_vqdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
25
+ qemu_configure_nic_device(DEVICE(sbd), false, NULL);
26
+DEF_HELPER_FLAGS_4(mve_vqdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
26
/*
27
+DEF_HELPER_FLAGS_4(mve_vqdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
* The device exists regardless of whether it's connected to a QEMU
28
+
28
* netdev backend. So always instantiate it even if there is no
29
+DEF_HELPER_FLAGS_4(mve_vqrdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
30
+DEF_HELPER_FLAGS_4(mve_vqrdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
+DEF_HELPER_FLAGS_4(mve_vqrdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
+
33
+DEF_HELPER_FLAGS_4(mve_vqdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
34
+DEF_HELPER_FLAGS_4(mve_vqdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
35
+DEF_HELPER_FLAGS_4(mve_vqdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
36
+
37
+DEF_HELPER_FLAGS_4(mve_vqrdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
38
+DEF_HELPER_FLAGS_4(mve_vqrdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
39
+DEF_HELPER_FLAGS_4(mve_vqrdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
+
41
DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
42
DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
43
DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
44
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
45
index XXXXXXX..XXXXXXX 100644
46
--- a/target/arm/mve.decode
47
+++ b/target/arm/mve.decode
48
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
49
VMLA 111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
50
VMLAS 111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
51
52
+VQRDMLAH 1110 1110 0 . .. ... 0 ... 0 1110 . 100 .... @2scalar
53
+VQRDMLASH 1110 1110 0 . .. ... 0 ... 1 1110 . 100 .... @2scalar
54
+VQDMLAH 1110 1110 0 . .. ... 0 ... 0 1110 . 110 .... @2scalar
55
+VQDMLASH 1110 1110 0 . .. ... 0 ... 1 1110 . 110 .... @2scalar
56
+
57
# Vector add across vector
58
{
59
VADDV 111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
60
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
61
index XXXXXXX..XXXXXXX 100644
62
--- a/target/arm/mve_helper.c
63
+++ b/target/arm/mve_helper.c
64
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
65
mve_advance_vpt(env); \
66
}
67
68
+#define DO_2OP_SAT_ACC_SCALAR(OP, ESIZE, TYPE, FN) \
69
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
70
+ uint32_t rm) \
71
+ { \
72
+ TYPE *d = vd, *n = vn; \
73
+ TYPE m = rm; \
74
+ uint16_t mask = mve_element_mask(env); \
75
+ unsigned e; \
76
+ bool qc = false; \
77
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
78
+ bool sat = false; \
79
+ mergemask(&d[H##ESIZE(e)], \
80
+ FN(d[H##ESIZE(e)], n[H##ESIZE(e)], m, &sat), \
81
+ mask); \
82
+ qc |= sat & mask & 1; \
83
+ } \
84
+ if (qc) { \
85
+ env->vfp.qc[0] = qc; \
86
+ } \
87
+ mve_advance_vpt(env); \
88
+ }
89
+
90
/* provide unsigned 2-op scalar helpers for all sizes */
91
#define DO_2OP_SCALAR_U(OP, FN) \
92
DO_2OP_SCALAR(OP##b, 1, uint8_t, FN) \
93
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
94
DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
95
DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
96
97
+static int8_t do_vqdmlah_b(int8_t a, int8_t b, int8_t c, int round, bool *sat)
98
+{
99
+ int64_t r = (int64_t)a * b * 2 + ((int64_t)c << 8) + (round << 7);
100
+ return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
101
+}
102
+
103
+static int16_t do_vqdmlah_h(int16_t a, int16_t b, int16_t c,
104
+ int round, bool *sat)
105
+{
106
+ int64_t r = (int64_t)a * b * 2 + ((int64_t)c << 16) + (round << 15);
107
+ return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
108
+}
109
+
110
+static int32_t do_vqdmlah_w(int32_t a, int32_t b, int32_t c,
111
+ int round, bool *sat)
112
+{
113
+ /*
114
+ * Architecturally we should do the entire add, double, round
115
+ * and then check for saturation. We do three saturating adds,
116
+ * but we need to be careful about the order. If the first
117
+ * m1 + m2 saturates then it's impossible for the *2+rc to
118
+ * bring it back into the non-saturated range. However, if
119
+ * m1 + m2 is negative then it's possible that doing the doubling
120
+ * would take the intermediate result below INT64_MAX and the
121
+ * addition of the rounding constant then brings it back in range.
122
+ * So we add half the rounding constant and half the "c << esize"
123
+ * before doubling rather than adding the rounding constant after
124
+ * the doubling.
125
+ */
126
+ int64_t m1 = (int64_t)a * b;
127
+ int64_t m2 = (int64_t)c << 31;
128
+ int64_t r;
129
+ if (sadd64_overflow(m1, m2, &r) ||
130
+ sadd64_overflow(r, (round << 30), &r) ||
131
+ sadd64_overflow(r, r, &r)) {
132
+ *sat = true;
133
+ return r < 0 ? INT32_MAX : INT32_MIN;
134
+ }
135
+ return r >> 32;
136
+}
137
+
138
+/*
139
+ * The *MLAH insns are vector * scalar + vector;
140
+ * the *MLASH insns are vector * vector + scalar
141
+ */
142
+#define DO_VQDMLAH_B(D, N, M, S) do_vqdmlah_b(N, M, D, 0, S)
143
+#define DO_VQDMLAH_H(D, N, M, S) do_vqdmlah_h(N, M, D, 0, S)
144
+#define DO_VQDMLAH_W(D, N, M, S) do_vqdmlah_w(N, M, D, 0, S)
145
+#define DO_VQRDMLAH_B(D, N, M, S) do_vqdmlah_b(N, M, D, 1, S)
146
+#define DO_VQRDMLAH_H(D, N, M, S) do_vqdmlah_h(N, M, D, 1, S)
147
+#define DO_VQRDMLAH_W(D, N, M, S) do_vqdmlah_w(N, M, D, 1, S)
148
+
149
+#define DO_VQDMLASH_B(D, N, M, S) do_vqdmlah_b(N, D, M, 0, S)
150
+#define DO_VQDMLASH_H(D, N, M, S) do_vqdmlah_h(N, D, M, 0, S)
151
+#define DO_VQDMLASH_W(D, N, M, S) do_vqdmlah_w(N, D, M, 0, S)
152
+#define DO_VQRDMLASH_B(D, N, M, S) do_vqdmlah_b(N, D, M, 1, S)
153
+#define DO_VQRDMLASH_H(D, N, M, S) do_vqdmlah_h(N, D, M, 1, S)
154
+#define DO_VQRDMLASH_W(D, N, M, S) do_vqdmlah_w(N, D, M, 1, S)
155
+
156
+DO_2OP_SAT_ACC_SCALAR(vqdmlahb, 1, int8_t, DO_VQDMLAH_B)
157
+DO_2OP_SAT_ACC_SCALAR(vqdmlahh, 2, int16_t, DO_VQDMLAH_H)
158
+DO_2OP_SAT_ACC_SCALAR(vqdmlahw, 4, int32_t, DO_VQDMLAH_W)
159
+DO_2OP_SAT_ACC_SCALAR(vqrdmlahb, 1, int8_t, DO_VQRDMLAH_B)
160
+DO_2OP_SAT_ACC_SCALAR(vqrdmlahh, 2, int16_t, DO_VQRDMLAH_H)
161
+DO_2OP_SAT_ACC_SCALAR(vqrdmlahw, 4, int32_t, DO_VQRDMLAH_W)
162
+
163
+DO_2OP_SAT_ACC_SCALAR(vqdmlashb, 1, int8_t, DO_VQDMLASH_B)
164
+DO_2OP_SAT_ACC_SCALAR(vqdmlashh, 2, int16_t, DO_VQDMLASH_H)
165
+DO_2OP_SAT_ACC_SCALAR(vqdmlashw, 4, int32_t, DO_VQDMLASH_W)
166
+DO_2OP_SAT_ACC_SCALAR(vqrdmlashb, 1, int8_t, DO_VQRDMLASH_B)
167
+DO_2OP_SAT_ACC_SCALAR(vqrdmlashh, 2, int16_t, DO_VQRDMLASH_H)
168
+DO_2OP_SAT_ACC_SCALAR(vqrdmlashw, 4, int32_t, DO_VQRDMLASH_W)
169
+
170
/* Vector by scalar plus vector */
171
#define DO_VMLA(D, N, M) ((N) * (M) + (D))
172
173
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
174
index XXXXXXX..XXXXXXX 100644
175
--- a/target/arm/translate-mve.c
176
+++ b/target/arm/translate-mve.c
177
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
178
DO_2OP_SCALAR(VBRSR, vbrsr)
179
DO_2OP_SCALAR(VMLA, vmla)
180
DO_2OP_SCALAR(VMLAS, vmlas)
181
+DO_2OP_SCALAR(VQDMLAH, vqdmlah)
182
+DO_2OP_SCALAR(VQRDMLAH, vqrdmlah)
183
+DO_2OP_SCALAR(VQDMLASH, vqdmlash)
184
+DO_2OP_SCALAR(VQRDMLASH, vqrdmlash)
185
186
static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
187
{
188
--
29
--
189
2.20.1
30
2.34.1
190
191
diff view generated by jsdifflib
1
Implement the MVE VMLA insn, which multiplies a vector by a scalar
1
Currently QEMU will warn if there is a NIC on the board that
2
and accumulates into another vector.
2
is not connected to a backend. By default the '-nic user' will
3
get used for all NICs, but if you manually connect a specific
4
NIC to a specific backend, then the other NICs on the board
5
have no backend and will be warned about:
6
7
qemu-system-arm: warning: nic npcm7xx-emc.1 has no peer
8
qemu-system-arm: warning: nic npcm-gmac.0 has no peer
9
qemu-system-arm: warning: nic npcm-gmac.1 has no peer
10
11
So suppress those warnings by manually connecting every NIC
12
on the board to some backend.
3
13
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
15
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
16
Reviewed-by: Thomas Huth <thuth@redhat.com>
17
Message-id: 20240206171231.396392-3-peter.maydell@linaro.org
6
---
18
---
7
target/arm/helper-mve.h | 4 ++++
19
tests/qtest/npcm7xx_emc-test.c | 5 ++++-
8
target/arm/mve.decode | 1 +
20
1 file changed, 4 insertions(+), 1 deletion(-)
9
target/arm/mve_helper.c | 5 +++++
10
target/arm/translate-mve.c | 1 +
11
4 files changed, 11 insertions(+)
12
21
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
22
diff --git a/tests/qtest/npcm7xx_emc-test.c b/tests/qtest/npcm7xx_emc-test.c
14
index XXXXXXX..XXXXXXX 100644
23
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
24
--- a/tests/qtest/npcm7xx_emc-test.c
16
+++ b/target/arm/helper-mve.h
25
+++ b/tests/qtest/npcm7xx_emc-test.c
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i3
26
@@ -XXX,XX +XXX,XX @@ static int *packet_test_init(int module_num, GString *cmd_line)
18
DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
* KISS and use -nic. The driver accepts 'emc0' and 'emc1' as aliases
19
DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
28
* in the 'model' field to specify the device to match.
20
29
*/
21
+DEF_HELPER_FLAGS_4(mve_vmlab, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
30
- g_string_append_printf(cmd_line, " -nic socket,fd=%d,model=emc%d ",
22
+DEF_HELPER_FLAGS_4(mve_vmlah, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
+ g_string_append_printf(cmd_line, " -nic socket,fd=%d,model=emc%d "
23
+DEF_HELPER_FLAGS_4(mve_vmlaw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
+ "-nic user,model=npcm7xx-emc "
24
+
33
+ "-nic user,model=npcm-gmac "
25
DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
34
+ "-nic user,model=npcm-gmac",
26
DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
35
test_sockets[1], module_num);
27
DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
36
28
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
37
g_test_queue_destroy(packet_test_clear, test_sockets);
29
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/mve.decode
31
+++ b/target/arm/mve.decode
32
@@ -XXX,XX +XXX,XX @@ VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
33
VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
34
35
# The U bit (28) is don't-care because it does not affect the result
36
+VMLA 111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
37
VMLAS 111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
38
39
# Vector add across vector
40
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/mve_helper.c
43
+++ b/target/arm/mve_helper.c
44
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
45
DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
46
DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
47
48
+/* Vector by scalar plus vector */
49
+#define DO_VMLA(D, N, M) ((N) * (M) + (D))
50
+
51
+DO_2OP_ACC_SCALAR_U(vmla, DO_VMLA)
52
+
53
/* Vector by vector plus scalar */
54
#define DO_VMLAS(D, N, M) ((N) * (D) + (M))
55
56
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
57
index XXXXXXX..XXXXXXX 100644
58
--- a/target/arm/translate-mve.c
59
+++ b/target/arm/translate-mve.c
60
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
61
DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
62
DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
63
DO_2OP_SCALAR(VBRSR, vbrsr)
64
+DO_2OP_SCALAR(VMLA, vmla)
65
DO_2OP_SCALAR(VMLAS, vmlas)
66
67
static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
68
--
38
--
69
2.20.1
39
2.34.1
70
71
diff view generated by jsdifflib
1
Unlike A-profile, for M-profile the UDIV and SDIV insns can be
1
It doesn't make sense to read the value of MDCR_EL2 on a non-A-profile
2
configured to raise an exception on division by zero, using the CCR
2
CPU, and in fact if you try to do it we will assert:
3
DIV_0_TRP bit.
4
3
5
Implement support for setting this bit by making the helper functions
4
#6 0x00007ffff4b95e96 in __GI___assert_fail
6
raise the appropriate exception.
5
(assertion=0x5555565a8c70 "!arm_feature(env, ARM_FEATURE_M)", file=0x5555565a6e5c "../../target/arm/helper.c", line=12600, function=0x5555565a9560 <__PRETTY_FUNCTION__.0> "arm_security_space_below_el3") at ./assert/assert.c:101
6
#7 0x0000555555ebf412 in arm_security_space_below_el3 (env=0x555557bc8190) at ../../target/arm/helper.c:12600
7
#8 0x0000555555ea6f89 in arm_is_el2_enabled (env=0x555557bc8190) at ../../target/arm/cpu.h:2595
8
#9 0x0000555555ea942f in arm_mdcr_el2_eff (env=0x555557bc8190) at ../../target/arm/internals.h:1512
7
9
10
We might call pmu_counter_enabled() on an M-profile CPU (for example
11
from the migration pre/post hooks in machine.c); this should always
12
return false because these CPUs don't set ARM_FEATURE_PMU.
13
14
Avoid the assertion by not calling arm_mdcr_el2_eff() before we
15
have done the early return for "PMU not present".
16
17
This fixes an assertion failure if you try to do a loadvm or
18
savevm for an M-profile board.
19
20
Cc: qemu-stable@nongnu.org
21
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2155
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
22
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
23
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
24
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20210730151636.17254-3-peter.maydell@linaro.org
25
Message-id: 20240208153346.970021-1-peter.maydell@linaro.org
11
---
26
---
12
target/arm/cpu.h | 1 +
27
target/arm/helper.c | 12 ++++++++++--
13
target/arm/helper.h | 4 ++--
28
1 file changed, 10 insertions(+), 2 deletions(-)
14
target/arm/helper.c | 19 +++++++++++++++++--
15
target/arm/m_helper.c | 4 ++++
16
target/arm/translate.c | 4 ++--
17
5 files changed, 26 insertions(+), 6 deletions(-)
18
29
19
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
20
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/cpu.h
22
+++ b/target/arm/cpu.h
23
@@ -XXX,XX +XXX,XX @@
24
#define EXCP_LAZYFP 20 /* v7M fault during lazy FP stacking */
25
#define EXCP_LSERR 21 /* v8M LSERR SecureFault */
26
#define EXCP_UNALIGNED 22 /* v7M UNALIGNED UsageFault */
27
+#define EXCP_DIVBYZERO 23 /* v7M DIVBYZERO UsageFault */
28
/* NB: add new EXCP_ defines to the array in arm_log_exception() too */
29
30
#define ARMV7M_EXCP_RESET 1
31
diff --git a/target/arm/helper.h b/target/arm/helper.h
32
index XXXXXXX..XXXXXXX 100644
33
--- a/target/arm/helper.h
34
+++ b/target/arm/helper.h
35
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(add_saturate, i32, env, i32, i32)
36
DEF_HELPER_3(sub_saturate, i32, env, i32, i32)
37
DEF_HELPER_3(add_usaturate, i32, env, i32, i32)
38
DEF_HELPER_3(sub_usaturate, i32, env, i32, i32)
39
-DEF_HELPER_FLAGS_2(sdiv, TCG_CALL_NO_RWG_SE, s32, s32, s32)
40
-DEF_HELPER_FLAGS_2(udiv, TCG_CALL_NO_RWG_SE, i32, i32, i32)
41
+DEF_HELPER_FLAGS_3(sdiv, TCG_CALL_NO_RWG, s32, env, s32, s32)
42
+DEF_HELPER_FLAGS_3(udiv, TCG_CALL_NO_RWG, i32, env, i32, i32)
43
DEF_HELPER_FLAGS_1(rbit, TCG_CALL_NO_RWG_SE, i32, i32)
44
45
#define PAS_OP(pfx) \
46
diff --git a/target/arm/helper.c b/target/arm/helper.c
30
diff --git a/target/arm/helper.c b/target/arm/helper.c
47
index XXXXXXX..XXXXXXX 100644
31
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/helper.c
32
--- a/target/arm/helper.c
49
+++ b/target/arm/helper.c
33
+++ b/target/arm/helper.c
50
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sxtb16)(uint32_t x)
34
@@ -XXX,XX +XXX,XX @@ static bool pmu_counter_enabled(CPUARMState *env, uint8_t counter)
51
return res;
35
bool enabled, prohibited = false, filtered;
52
}
36
bool secure = arm_is_secure(env);
53
37
int el = arm_current_el(env);
54
+static void handle_possible_div0_trap(CPUARMState *env, uintptr_t ra)
38
- uint64_t mdcr_el2 = arm_mdcr_el2_eff(env);
55
+{
39
- uint8_t hpmn = mdcr_el2 & MDCR_HPMN;
40
+ uint64_t mdcr_el2;
41
+ uint8_t hpmn;
42
56
+ /*
43
+ /*
57
+ * Take a division-by-zero exception if necessary; otherwise return
44
+ * We might be called for M-profile cores where MDCR_EL2 doesn't
58
+ * to get the usual non-trapping division behaviour (result of 0)
45
+ * exist and arm_mdcr_el2_eff() will assert, so this early-exit check
46
+ * must be before we read that value.
59
+ */
47
+ */
60
+ if (arm_feature(env, ARM_FEATURE_M)
48
if (!arm_feature(env, ARM_FEATURE_PMU)) {
61
+ && (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_DIV_0_TRP_MASK)) {
49
return false;
62
+ raise_exception_ra(env, EXCP_DIVBYZERO, 0, 1, ra);
50
}
63
+ }
51
64
+}
52
+ mdcr_el2 = arm_mdcr_el2_eff(env);
53
+ hpmn = mdcr_el2 & MDCR_HPMN;
65
+
54
+
66
uint32_t HELPER(uxtb16)(uint32_t x)
55
if (!arm_feature(env, ARM_FEATURE_EL2) ||
67
{
56
(counter < hpmn || counter == 31)) {
68
uint32_t res;
57
e = env->cp15.c9_pmcr & PMCRE;
69
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
70
return res;
71
}
72
73
-int32_t HELPER(sdiv)(int32_t num, int32_t den)
74
+int32_t HELPER(sdiv)(CPUARMState *env, int32_t num, int32_t den)
75
{
76
if (den == 0) {
77
+ handle_possible_div0_trap(env, GETPC());
78
return 0;
79
}
80
if (num == INT_MIN && den == -1) {
81
@@ -XXX,XX +XXX,XX @@ int32_t HELPER(sdiv)(int32_t num, int32_t den)
82
return num / den;
83
}
84
85
-uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
86
+uint32_t HELPER(udiv)(CPUARMState *env, uint32_t num, uint32_t den)
87
{
88
if (den == 0) {
89
+ handle_possible_div0_trap(env, GETPC());
90
return 0;
91
}
92
return num / den;
93
@@ -XXX,XX +XXX,XX @@ void arm_log_exception(int idx)
94
[EXCP_LAZYFP] = "v7M exception during lazy FP stacking",
95
[EXCP_LSERR] = "v8M LSERR UsageFault",
96
[EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
97
+ [EXCP_DIVBYZERO] = "v7M DIVBYZERO UsageFault",
98
};
99
100
if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
101
diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
102
index XXXXXXX..XXXXXXX 100644
103
--- a/target/arm/m_helper.c
104
+++ b/target/arm/m_helper.c
105
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
106
armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
107
env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_UNALIGNED_MASK;
108
break;
109
+ case EXCP_DIVBYZERO:
110
+ armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
111
+ env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_DIVBYZERO_MASK;
112
+ break;
113
case EXCP_SWI:
114
/* The PC already points to the next instruction. */
115
armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SVC, env->v7m.secure);
116
diff --git a/target/arm/translate.c b/target/arm/translate.c
117
index XXXXXXX..XXXXXXX 100644
118
--- a/target/arm/translate.c
119
+++ b/target/arm/translate.c
120
@@ -XXX,XX +XXX,XX @@ static bool op_div(DisasContext *s, arg_rrr *a, bool u)
121
t1 = load_reg(s, a->rn);
122
t2 = load_reg(s, a->rm);
123
if (u) {
124
- gen_helper_udiv(t1, t1, t2);
125
+ gen_helper_udiv(t1, cpu_env, t1, t2);
126
} else {
127
- gen_helper_sdiv(t1, t1, t2);
128
+ gen_helper_sdiv(t1, cpu_env, t1, t2);
129
}
130
tcg_temp_free_i32(t2);
131
store_reg(s, a->rd, t1);
132
--
58
--
133
2.20.1
59
2.34.1
134
60
135
61
diff view generated by jsdifflib
1
From: Hamza Mahfooz <someguy@effective-light.com>
1
From: Nabih Estefan <nabihestefan@google.com>
2
2
3
As per commit 5626f8c6d468 ("rcu: Add automatically released rcu_read_lock
3
Fix the nocm_gmac-test.c file to run on a nuvoton 7xx machine instead
4
variants"), RCU_READ_LOCK_GUARD() should be used instead of
4
of 8xx. Also fix comments referencing this and values expecting 8xx.
5
rcu_read_{un}lock().
6
5
7
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
6
Change-Id: Iabd0fba14910c3f1e883c4a9521350f3db9ffab8
8
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
7
Signed-Off-By: Nabih Estefan <nabihestefan@google.com>
9
Message-id: 20210727235201.11491-1-someguy@effective-light.com
8
Reviewed-by: Tyrone Ting <kfting@nuvoton.com>
9
Message-id: 20240208194759.2858582-2-nabihestefan@google.com
10
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
11
[PMM: commit message tweaks]
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
13
---
12
target/arm/kvm.c | 17 ++++++++---------
14
tests/qtest/npcm_gmac-test.c | 84 +-----------------------------------
13
1 file changed, 8 insertions(+), 9 deletions(-)
15
tests/qtest/meson.build | 3 +-
16
2 files changed, 4 insertions(+), 83 deletions(-)
14
17
15
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
18
diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
16
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/kvm.c
20
--- a/tests/qtest/npcm_gmac-test.c
18
+++ b/target/arm/kvm.c
21
+++ b/tests/qtest/npcm_gmac-test.c
19
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
22
@@ -XXX,XX +XXX,XX @@ typedef struct TestData {
20
hwaddr xlat, len, doorbell_gpa;
23
const GMACModule *module;
21
MemoryRegionSection mrs;
24
} TestData;
22
MemoryRegion *mr;
25
23
- int ret = 1;
26
-/* Values extracted from hw/arm/npcm8xx.c */
24
27
+/* Values extracted from hw/arm/npcm7xx.c */
25
if (as == &address_space_memory) {
28
static const GMACModule gmac_module_list[] = {
26
return 0;
29
{
27
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
30
.irq = 14,
28
31
@@ -XXX,XX +XXX,XX @@ static const GMACModule gmac_module_list[] = {
29
/* MSI doorbell address is translated by an IOMMU */
32
.irq = 15,
30
33
.base_addr = 0xf0804000
31
- rcu_read_lock();
34
},
32
+ RCU_READ_LOCK_GUARD();
35
- {
33
+
36
- .irq = 16,
34
mr = address_space_translate(as, address, &xlat, &len, true,
37
- .base_addr = 0xf0806000
35
MEMTXATTRS_UNSPECIFIED);
38
- },
36
+
39
- {
37
if (!mr) {
40
- .irq = 17,
38
- goto unlock;
41
- .base_addr = 0xf0808000
39
+ return 1;
42
- }
40
}
43
};
41
+
44
42
mrs = memory_region_find(mr, xlat, 1);
45
/* Returns the index of the GMAC module. */
43
+
46
@@ -XXX,XX +XXX,XX @@ static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
44
if (!mrs.mr) {
47
return qtest_readl(qts, mod->base_addr + regno);
45
- goto unlock;
48
}
46
+ return 1;
49
47
}
50
-static uint16_t pcs_read(QTestState *qts, const GMACModule *mod,
48
51
- NPCMRegister regno)
49
doorbell_gpa = mrs.offset_within_address_space;
52
-{
50
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
53
- uint32_t write_value = (regno & 0x3ffe00) >> 9;
51
54
- qtest_writel(qts, PCS_BASE_ADDRESS + NPCM_PCS_IND_AC_BA, write_value);
52
trace_kvm_arm_fixup_msi_route(address, doorbell_gpa);
55
- uint32_t read_offset = regno & 0x1ff;
53
56
- return qtest_readl(qts, PCS_BASE_ADDRESS + read_offset);
54
- ret = 0;
57
-}
55
-
58
-
56
-unlock:
59
/* Check that GMAC registers are reset to default value */
57
- rcu_read_unlock();
60
static void test_init(gconstpointer test_data)
58
- return ret;
61
{
59
+ return 0;
62
const TestData *td = test_data;
63
const GMACModule *mod = td->module;
64
- QTestState *qts = qtest_init("-machine npcm845-evb");
65
+ QTestState *qts = qtest_init("-machine npcm750-evb");
66
67
#define CHECK_REG32(regno, value) \
68
do { \
69
g_assert_cmphex(gmac_read(qts, mod, (regno)), ==, (value)); \
70
} while (0)
71
72
-#define CHECK_REG_PCS(regno, value) \
73
- do { \
74
- g_assert_cmphex(pcs_read(qts, mod, (regno)), ==, (value)); \
75
- } while (0)
76
-
77
CHECK_REG32(NPCM_DMA_BUS_MODE, 0x00020100);
78
CHECK_REG32(NPCM_DMA_XMT_POLL_DEMAND, 0);
79
CHECK_REG32(NPCM_DMA_RCV_POLL_DEMAND, 0);
80
@@ -XXX,XX +XXX,XX @@ static void test_init(gconstpointer test_data)
81
CHECK_REG32(NPCM_GMAC_PTP_TAR, 0);
82
CHECK_REG32(NPCM_GMAC_PTP_TTSR, 0);
83
84
- /* TODO Add registers PCS */
85
- if (mod->base_addr == 0xf0802000) {
86
- CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID1, 0x699e);
87
- CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID2, 0);
88
- CHECK_REG_PCS(NPCM_PCS_SR_CTL_STS, 0x8000);
89
-
90
- CHECK_REG_PCS(NPCM_PCS_SR_MII_CTRL, 0x1140);
91
- CHECK_REG_PCS(NPCM_PCS_SR_MII_STS, 0x0109);
92
- CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID1, 0x699e);
93
- CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID2, 0x0ced0);
94
- CHECK_REG_PCS(NPCM_PCS_SR_MII_AN_ADV, 0x0020);
95
- CHECK_REG_PCS(NPCM_PCS_SR_MII_LP_BABL, 0);
96
- CHECK_REG_PCS(NPCM_PCS_SR_MII_AN_EXPN, 0);
97
- CHECK_REG_PCS(NPCM_PCS_SR_MII_EXT_STS, 0xc000);
98
-
99
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_ABL, 0x0003);
100
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_LWR, 0x0038);
101
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_UPR, 0);
102
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_LWR, 0x0038);
103
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_UPR, 0);
104
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_LWR, 0x0058);
105
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_UPR, 0);
106
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_LWR, 0x0048);
107
- CHECK_REG_PCS(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_UPR, 0);
108
-
109
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MMD_DIG_CTRL1, 0x2400);
110
- CHECK_REG_PCS(NPCM_PCS_VR_MII_AN_CTRL, 0);
111
- CHECK_REG_PCS(NPCM_PCS_VR_MII_AN_INTR_STS, 0x000a);
112
- CHECK_REG_PCS(NPCM_PCS_VR_MII_TC, 0);
113
- CHECK_REG_PCS(NPCM_PCS_VR_MII_DBG_CTRL, 0);
114
- CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_MCTRL0, 0x899c);
115
- CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_TXTIMER, 0);
116
- CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_RXTIMER, 0);
117
- CHECK_REG_PCS(NPCM_PCS_VR_MII_LINK_TIMER_CTRL, 0);
118
- CHECK_REG_PCS(NPCM_PCS_VR_MII_EEE_MCTRL1, 0);
119
- CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_STS, 0x0010);
120
- CHECK_REG_PCS(NPCM_PCS_VR_MII_ICG_ERRCNT1, 0);
121
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MISC_STS, 0);
122
- CHECK_REG_PCS(NPCM_PCS_VR_MII_RX_LSTS, 0);
123
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_BSTCTRL0, 0x00a);
124
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_LVLCTRL0, 0x007f);
125
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_GENCTRL0, 0x0001);
126
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_GENCTRL1, 0);
127
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_TX_STS, 0);
128
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_GENCTRL0, 0x0100);
129
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_GENCTRL1, 0x1100);
130
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_RX_LOS_CTRL0, 0x000e);
131
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_CTRL0, 0x0100);
132
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_CTRL1, 0x0032);
133
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MPLL_STS, 0x0001);
134
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL2, 0);
135
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_LVL_CTRL, 0x0019);
136
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL0, 0);
137
- CHECK_REG_PCS(NPCM_PCS_VR_MII_MP_MISC_CTRL1, 0);
138
- CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_CTRL2, 0);
139
- CHECK_REG_PCS(NPCM_PCS_VR_MII_DIG_ERRCNT_SEL, 0);
140
- }
141
-
142
qtest_quit(qts);
60
}
143
}
61
144
62
int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
145
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
146
index XXXXXXX..XXXXXXX 100644
147
--- a/tests/qtest/meson.build
148
+++ b/tests/qtest/meson.build
149
@@ -XXX,XX +XXX,XX @@ qtests_npcm7xx = \
150
'npcm7xx_sdhci-test',
151
'npcm7xx_smbus-test',
152
'npcm7xx_timer-test',
153
- 'npcm7xx_watchdog_timer-test'] + \
154
+ 'npcm7xx_watchdog_timer-test',
155
+ 'npcm_gmac-test'] + \
156
(slirp.found() ? ['npcm7xx_emc-test'] : [])
157
qtests_aspeed = \
158
['aspeed_hace-test',
63
--
159
--
64
2.20.1
160
2.34.1
65
66
diff view generated by jsdifflib
1
From: Sebastian Meyer <meyer@absint.com>
1
From: Luc Michel <luc.michel@amd.com>
2
2
3
With gdb 9.0 and better it is possible to connect to a gdbstub
3
An access fault is raised when the Access Flag is not set in the
4
over unix sockets, which is better than a TCP socket connection
4
looked-up PTE and the AFFD field is not set in the corresponding context
5
in some situations. The QEMU command line to set this up is
5
descriptor. This was already implemented for stage 2. Implement it for
6
non-obvious; document it.
6
stage 1 as well.
7
7
8
Signed-off-by: Sebastian Meyer <meyer@absint.com>
8
Signed-off-by: Luc Michel <luc.michel@amd.com>
9
Message-id: 162867284829.27377.4784930719350564918-0@git.sr.ht
9
Reviewed-by: Mostafa Saleh <smostafa@google.com>
10
[PMM: Tweaked commit message; adjusted wording in a couple of
10
Reviewed-by: Eric Auger <eric.auger@redhat.com>
11
places; fixed rST formatting issue; moved section up out of
11
Tested-by: Mostafa Saleh <smostafa@google.com>
12
the 'advanced debugging options' subsection]
12
Message-id: 20240213082211.3330400-1-luc.michel@amd.com
13
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
13
[PMM: tweaked comment text]
14
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
15
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
---
15
---
17
docs/system/gdb.rst | 26 +++++++++++++++++++++++++-
16
hw/arm/smmuv3-internal.h | 1 +
18
1 file changed, 25 insertions(+), 1 deletion(-)
17
include/hw/arm/smmu-common.h | 1 +
18
hw/arm/smmu-common.c | 11 +++++++++++
19
hw/arm/smmuv3.c | 1 +
20
4 files changed, 14 insertions(+)
19
21
20
diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst
22
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
21
index XXXXXXX..XXXXXXX 100644
23
index XXXXXXX..XXXXXXX 100644
22
--- a/docs/system/gdb.rst
24
--- a/hw/arm/smmuv3-internal.h
23
+++ b/docs/system/gdb.rst
25
+++ b/hw/arm/smmuv3-internal.h
24
@@ -XXX,XX +XXX,XX @@ The ``-s`` option will make QEMU listen for an incoming connection
26
@@ -XXX,XX +XXX,XX @@ static inline int pa_range(STE *ste)
25
from gdb on TCP port 1234, and ``-S`` will make QEMU not start the
27
#define CD_EPD(x, sel) extract32((x)->word[0], (16 * (sel)) + 14, 1)
26
guest until you tell it to from gdb. (If you want to specify which
28
#define CD_ENDI(x) extract32((x)->word[0], 15, 1)
27
TCP port to use or to use something other than TCP for the gdbstub
29
#define CD_IPS(x) extract32((x)->word[1], 0 , 3)
28
-connection, use the ``-gdb dev`` option instead of ``-s``.)
30
+#define CD_AFFD(x) extract32((x)->word[1], 3 , 1)
29
+connection, use the ``-gdb dev`` option instead of ``-s``. See
31
#define CD_TBI(x) extract32((x)->word[1], 6 , 2)
30
+`Using unix sockets`_ for an example.)
32
#define CD_HD(x) extract32((x)->word[1], 10 , 1)
31
33
#define CD_HA(x) extract32((x)->word[1], 11 , 1)
32
.. parsed-literal::
34
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
33
35
index XXXXXXX..XXXXXXX 100644
34
@@ -XXX,XX +XXX,XX @@ not just those in the cluster you are currently working on::
36
--- a/include/hw/arm/smmu-common.h
35
37
+++ b/include/hw/arm/smmu-common.h
36
(gdb) set schedule-multiple on
38
@@ -XXX,XX +XXX,XX @@ typedef struct SMMUTransCfg {
37
39
bool disabled; /* smmu is disabled */
38
+Using unix sockets
40
bool bypassed; /* translation is bypassed */
39
+==================
41
bool aborted; /* translation is aborted */
42
+ bool affd; /* AF fault disable */
43
uint32_t iotlb_hits; /* counts IOTLB hits */
44
uint32_t iotlb_misses; /* counts IOTLB misses*/
45
/* Used by stage-1 only. */
46
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
47
index XXXXXXX..XXXXXXX 100644
48
--- a/hw/arm/smmu-common.c
49
+++ b/hw/arm/smmu-common.c
50
@@ -XXX,XX +XXX,XX @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
51
pte_addr, pte, iova, gpa,
52
block_size >> 20);
53
}
40
+
54
+
41
+An alternate method for connecting gdb to the QEMU gdbstub is to use
55
+ /*
42
+a unix socket (if supported by your operating system). This is useful when
56
+ * QEMU does not currently implement HTTU, so if AFFD and PTE.AF
43
+running several tests in parallel, or if you do not have a known free TCP
57
+ * are 0 we take an Access flag fault. (5.4. Context Descriptor)
44
+port (e.g. when running automated tests).
58
+ * An Access flag fault takes priority over a Permission fault.
59
+ */
60
+ if (!PTE_AF(pte) && !cfg->affd) {
61
+ info->type = SMMU_PTW_ERR_ACCESS;
62
+ goto error;
63
+ }
45
+
64
+
46
+First create a chardev with the appropriate options, then
65
ap = PTE_AP(pte);
47
+instruct the gdbserver to use that device:
66
if (is_permission_fault(ap, perm)) {
48
+
67
info->type = SMMU_PTW_ERR_PERMISSION;
49
+.. parsed-literal::
68
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
50
+
69
index XXXXXXX..XXXXXXX 100644
51
+ |qemu_system| -chardev socket,path=/tmp/gdb-socket,server=on,wait=off,id=gdb0 -gdb chardev:gdb0 -S ...
70
--- a/hw/arm/smmuv3.c
52
+
71
+++ b/hw/arm/smmuv3.c
53
+Start gdb as before, but this time connect using the path to
72
@@ -XXX,XX +XXX,XX @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, SMMUEventInfo *event)
54
+the socket::
73
cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
55
+
74
cfg->tbi = CD_TBI(cd);
56
+ (gdb) target remote /tmp/gdb-socket
75
cfg->asid = CD_ASID(cd);
57
+
76
+ cfg->affd = CD_AFFD(cd);
58
+Note that to use a unix socket for the connection you will need
77
59
+gdb version 9.0 or newer.
78
trace_smmuv3_decode_cd(cfg->oas);
60
+
61
Advanced debugging options
62
==========================
63
79
64
--
80
--
65
2.20.1
81
2.34.1
66
67
diff view generated by jsdifflib
1
Implement the MVE narrowing move insns VMOVN, VQMOVN and VQMOVUN.
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
These take a double-width input, narrow it (possibly saturating) and
3
store the result to either the top or bottom half of the output
4
element.
5
2
3
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
4
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
5
Message-id: 20240213155214.13619-2-philmd@linaro.org
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
---
7
---
9
target/arm/helper-mve.h | 20 ++++++++++
8
hw/arm/stellaris.c | 6 ++++--
10
target/arm/mve.decode | 12 ++++++
9
1 file changed, 4 insertions(+), 2 deletions(-)
11
target/arm/mve_helper.c | 78 ++++++++++++++++++++++++++++++++++++++
12
target/arm/translate-mve.c | 22 +++++++++++
13
4 files changed, 132 insertions(+)
14
10
15
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
11
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
16
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/helper-mve.h
13
--- a/hw/arm/stellaris.c
18
+++ b/target/arm/helper-mve.h
14
+++ b/hw/arm/stellaris.c
19
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
15
@@ -XXX,XX +XXX,XX @@ static void stellaris_adc_trigger(void *opaque, int irq, int level)
20
DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
16
}
21
DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
23
+DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
+DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
25
+DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
26
+DEF_HELPER_FLAGS_3(mve_vmovnth, TCG_CALL_NO_WG, void, env, ptr, ptr)
27
+
28
+DEF_HELPER_FLAGS_3(mve_vqmovunbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
29
+DEF_HELPER_FLAGS_3(mve_vqmovunbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
30
+DEF_HELPER_FLAGS_3(mve_vqmovuntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
31
+DEF_HELPER_FLAGS_3(mve_vqmovunth, TCG_CALL_NO_WG, void, env, ptr, ptr)
32
+
33
+DEF_HELPER_FLAGS_3(mve_vqmovnbsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
34
+DEF_HELPER_FLAGS_3(mve_vqmovnbsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
35
+DEF_HELPER_FLAGS_3(mve_vqmovntsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
36
+DEF_HELPER_FLAGS_3(mve_vqmovntsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
37
+
38
+DEF_HELPER_FLAGS_3(mve_vqmovnbub, TCG_CALL_NO_WG, void, env, ptr, ptr)
39
+DEF_HELPER_FLAGS_3(mve_vqmovnbuh, TCG_CALL_NO_WG, void, env, ptr, ptr)
40
+DEF_HELPER_FLAGS_3(mve_vqmovntub, TCG_CALL_NO_WG, void, env, ptr, ptr)
41
+DEF_HELPER_FLAGS_3(mve_vqmovntuh, TCG_CALL_NO_WG, void, env, ptr, ptr)
42
+
43
DEF_HELPER_FLAGS_4(mve_vand, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
44
DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
45
DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
46
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
47
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/mve.decode
49
+++ b/target/arm/mve.decode
50
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
51
VSHLL_BS 111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
52
VSHLL_BS 111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
53
54
+ VQMOVUNB 111 0 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
55
+ VQMOVN_BS 111 0 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
56
+
57
VMULH_S 111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
58
}
17
}
59
18
60
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
19
-static void stellaris_adc_reset(StellarisADCState *s)
61
VSHLL_BU 111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
20
+static void stellaris_adc_reset_hold(Object *obj)
62
VSHLL_BU 111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
21
{
63
22
+ StellarisADCState *s = STELLARIS_ADC(obj);
64
+ VMOVNB 111 1 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
23
int n;
65
+ VQMOVN_BU 111 1 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
24
66
+
25
for (n = 0; n < 4; n++) {
67
VMULH_U 111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
26
@@ -XXX,XX +XXX,XX @@ static void stellaris_adc_init(Object *obj)
27
memory_region_init_io(&s->iomem, obj, &stellaris_adc_ops, s,
28
"adc", 0x1000);
29
sysbus_init_mmio(sbd, &s->iomem);
30
- stellaris_adc_reset(s);
31
qdev_init_gpio_in(dev, stellaris_adc_trigger, 1);
68
}
32
}
69
33
70
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
34
@@ -XXX,XX +XXX,XX @@ static const TypeInfo stellaris_i2c_info = {
71
VSHLL_TS 111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
35
static void stellaris_adc_class_init(ObjectClass *klass, void *data)
72
VSHLL_TS 111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
36
{
73
37
DeviceClass *dc = DEVICE_CLASS(klass);
74
+ VQMOVUNT 111 0 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
38
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
75
+ VQMOVN_TS 111 0 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
39
76
+
40
+ rc->phases.hold = stellaris_adc_reset_hold;
77
VRMULH_S 111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
41
dc->vmsd = &vmstate_stellaris_adc;
78
}
42
}
79
43
80
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
81
VSHLL_TU 111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
82
VSHLL_TU 111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
83
84
+ VMOVNT 111 1 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
85
+ VQMOVN_TU 111 1 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
86
+
87
VRMULH_U 111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
88
}
89
90
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
91
index XXXXXXX..XXXXXXX 100644
92
--- a/target/arm/mve_helper.c
93
+++ b/target/arm/mve_helper.c
94
@@ -XXX,XX +XXX,XX @@ DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
95
DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
96
DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
97
98
+#define DO_VMOVN(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE) \
99
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
100
+ { \
101
+ LTYPE *m = vm; \
102
+ TYPE *d = vd; \
103
+ uint16_t mask = mve_element_mask(env); \
104
+ unsigned le; \
105
+ mask >>= ESIZE * TOP; \
106
+ for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
107
+ mergemask(&d[H##ESIZE(le * 2 + TOP)], \
108
+ m[H##LESIZE(le)], mask); \
109
+ } \
110
+ mve_advance_vpt(env); \
111
+ }
112
+
113
+DO_VMOVN(vmovnbb, false, 1, uint8_t, 2, uint16_t)
114
+DO_VMOVN(vmovnbh, false, 2, uint16_t, 4, uint32_t)
115
+DO_VMOVN(vmovntb, true, 1, uint8_t, 2, uint16_t)
116
+DO_VMOVN(vmovnth, true, 2, uint16_t, 4, uint32_t)
117
+
118
+#define DO_VMOVN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN) \
119
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
120
+ { \
121
+ LTYPE *m = vm; \
122
+ TYPE *d = vd; \
123
+ uint16_t mask = mve_element_mask(env); \
124
+ bool qc = false; \
125
+ unsigned le; \
126
+ mask >>= ESIZE * TOP; \
127
+ for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
128
+ bool sat = false; \
129
+ TYPE r = FN(m[H##LESIZE(le)], &sat); \
130
+ mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask); \
131
+ qc |= sat & mask & 1; \
132
+ } \
133
+ if (qc) { \
134
+ env->vfp.qc[0] = qc; \
135
+ } \
136
+ mve_advance_vpt(env); \
137
+ }
138
+
139
+#define DO_VMOVN_SAT_UB(BOP, TOP, FN) \
140
+ DO_VMOVN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN) \
141
+ DO_VMOVN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
142
+
143
+#define DO_VMOVN_SAT_UH(BOP, TOP, FN) \
144
+ DO_VMOVN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN) \
145
+ DO_VMOVN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
146
+
147
+#define DO_VMOVN_SAT_SB(BOP, TOP, FN) \
148
+ DO_VMOVN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN) \
149
+ DO_VMOVN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
150
+
151
+#define DO_VMOVN_SAT_SH(BOP, TOP, FN) \
152
+ DO_VMOVN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN) \
153
+ DO_VMOVN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
154
+
155
+#define DO_VQMOVN_SB(N, SATP) \
156
+ do_sat_bhs((int64_t)(N), INT8_MIN, INT8_MAX, SATP)
157
+#define DO_VQMOVN_UB(N, SATP) \
158
+ do_sat_bhs((uint64_t)(N), 0, UINT8_MAX, SATP)
159
+#define DO_VQMOVUN_B(N, SATP) \
160
+ do_sat_bhs((int64_t)(N), 0, UINT8_MAX, SATP)
161
+
162
+#define DO_VQMOVN_SH(N, SATP) \
163
+ do_sat_bhs((int64_t)(N), INT16_MIN, INT16_MAX, SATP)
164
+#define DO_VQMOVN_UH(N, SATP) \
165
+ do_sat_bhs((uint64_t)(N), 0, UINT16_MAX, SATP)
166
+#define DO_VQMOVUN_H(N, SATP) \
167
+ do_sat_bhs((int64_t)(N), 0, UINT16_MAX, SATP)
168
+
169
+DO_VMOVN_SAT_SB(vqmovnbsb, vqmovntsb, DO_VQMOVN_SB)
170
+DO_VMOVN_SAT_SH(vqmovnbsh, vqmovntsh, DO_VQMOVN_SH)
171
+DO_VMOVN_SAT_UB(vqmovnbub, vqmovntub, DO_VQMOVN_UB)
172
+DO_VMOVN_SAT_UH(vqmovnbuh, vqmovntuh, DO_VQMOVN_UH)
173
+DO_VMOVN_SAT_SB(vqmovunbb, vqmovuntb, DO_VQMOVUN_B)
174
+DO_VMOVN_SAT_SH(vqmovunbh, vqmovunth, DO_VQMOVUN_H)
175
+
176
uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
177
uint32_t shift)
178
{
179
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
180
index XXXXXXX..XXXXXXX 100644
181
--- a/target/arm/translate-mve.c
182
+++ b/target/arm/translate-mve.c
183
@@ -XXX,XX +XXX,XX @@ DO_1OP(VCLS, vcls)
184
DO_1OP(VABS, vabs)
185
DO_1OP(VNEG, vneg)
186
187
+/* Narrowing moves: only size 0 and 1 are valid */
188
+#define DO_VMOVN(INSN, FN) \
189
+ static bool trans_##INSN(DisasContext *s, arg_1op *a) \
190
+ { \
191
+ static MVEGenOneOpFn * const fns[] = { \
192
+ gen_helper_mve_##FN##b, \
193
+ gen_helper_mve_##FN##h, \
194
+ NULL, \
195
+ NULL, \
196
+ }; \
197
+ return do_1op(s, a, fns[a->size]); \
198
+ }
199
+
200
+DO_VMOVN(VMOVNB, vmovnb)
201
+DO_VMOVN(VMOVNT, vmovnt)
202
+DO_VMOVN(VQMOVUNB, vqmovunb)
203
+DO_VMOVN(VQMOVUNT, vqmovunt)
204
+DO_VMOVN(VQMOVN_BS, vqmovnbs)
205
+DO_VMOVN(VQMOVN_TS, vqmovnts)
206
+DO_VMOVN(VQMOVN_BU, vqmovnbu)
207
+DO_VMOVN(VQMOVN_TU, vqmovntu)
208
+
209
static bool trans_VREV16(DisasContext *s, arg_1op *a)
210
{
211
static MVEGenOneOpFn * const fns[] = {
212
--
44
--
213
2.20.1
45
2.34.1
214
46
215
47
diff view generated by jsdifflib
1
Implement the MVE VMLADAV and VMLSLDAV insns. Like the VMLALDAV and
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
VMLSLDAV insns already implemented, these accumulate multiplied
3
vector elements; but they accumulate a 32-bit result rather than a
4
64-bit one.
5
2
6
Note that these encodings overlap with what would be RdaHi=0b111 for
3
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
7
VMLALDAV, VMLSLDAV, VRMLALDAVH and VRMLSLDAVH.
4
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
5
Message-id: 20240213155214.13619-3-philmd@linaro.org
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
---
9
hw/arm/stellaris.c | 26 ++++++++++++++++++++++----
10
1 file changed, 22 insertions(+), 4 deletions(-)
8
11
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
---
12
target/arm/helper-mve.h | 17 ++++++++++
13
target/arm/mve.decode | 33 +++++++++++++++++---
14
target/arm/mve_helper.c | 41 ++++++++++++++++++++++++
15
target/arm/translate-mve.c | 64 ++++++++++++++++++++++++++++++++++++++
16
4 files changed, 150 insertions(+), 5 deletions(-)
17
18
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
19
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper-mve.h
14
--- a/hw/arm/stellaris.c
21
+++ b/target/arm/helper-mve.h
15
+++ b/hw/arm/stellaris.c
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrmlaldavhuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
16
@@ -XXX,XX +XXX,XX @@ static void stellaris_sys_instance_init(Object *obj)
23
DEF_HELPER_FLAGS_4(mve_vrmlsldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
17
s->sysclk = qdev_init_clock_out(DEVICE(s), "SYSCLK");
24
DEF_HELPER_FLAGS_4(mve_vrmlsldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
18
}
25
19
26
+DEF_HELPER_FLAGS_4(mve_vmladavsb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
20
-/* I2C controller. */
27
+DEF_HELPER_FLAGS_4(mve_vmladavsh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
21
+/*
28
+DEF_HELPER_FLAGS_4(mve_vmladavsw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
22
+ * I2C controller.
29
+DEF_HELPER_FLAGS_4(mve_vmladavub, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
23
+ * ??? For now we only implement the master interface.
30
+DEF_HELPER_FLAGS_4(mve_vmladavuh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
24
+ */
31
+DEF_HELPER_FLAGS_4(mve_vmladavuw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
25
32
+DEF_HELPER_FLAGS_4(mve_vmlsdavb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
26
#define TYPE_STELLARIS_I2C "stellaris-i2c"
33
+DEF_HELPER_FLAGS_4(mve_vmlsdavh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
27
OBJECT_DECLARE_SIMPLE_TYPE(stellaris_i2c_state, STELLARIS_I2C)
34
+DEF_HELPER_FLAGS_4(mve_vmlsdavw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
28
@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_write(void *opaque, hwaddr offset,
29
stellaris_i2c_update(s);
30
}
31
32
-static void stellaris_i2c_reset(stellaris_i2c_state *s)
33
+static void stellaris_i2c_reset_enter(Object *obj, ResetType type)
34
{
35
+ stellaris_i2c_state *s = STELLARIS_I2C(obj);
35
+
36
+
36
+DEF_HELPER_FLAGS_4(mve_vmladavsxb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
37
if (s->mcs & STELLARIS_I2C_MCS_BUSBSY)
37
+DEF_HELPER_FLAGS_4(mve_vmladavsxh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
38
i2c_end_transfer(s->bus);
38
+DEF_HELPER_FLAGS_4(mve_vmladavsxw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
39
+DEF_HELPER_FLAGS_4(mve_vmlsdavxb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
40
+DEF_HELPER_FLAGS_4(mve_vmlsdavxh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
41
+DEF_HELPER_FLAGS_4(mve_vmlsdavxw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
42
+
43
DEF_HELPER_FLAGS_3(mve_vaddvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
44
DEF_HELPER_FLAGS_3(mve_vaddvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
45
DEF_HELPER_FLAGS_3(mve_vaddvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
46
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
47
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/mve.decode
49
+++ b/target/arm/mve.decode
50
@@ -XXX,XX +XXX,XX @@ VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
51
%size_16 16:1 !function=plus_1
52
53
&vmlaldav rdahi rdalo size qn qm x a
54
+&vmladav rda size qn qm x a
55
56
@vmlaldav .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
57
qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
58
@vmlaldav_nosz .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
59
qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
60
-VMLALDAV_S 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
61
-VMLALDAV_U 1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
62
+@vmladav .... .... .... ... . ... x:1 .... . . a:1 . qm:3 . \
63
+ qn=%qn rda=%rdalo size=%size_16 &vmladav
64
+@vmladav_nosz .... .... .... ... . ... x:1 .... . . a:1 . qm:3 . \
65
+ qn=%qn rda=%rdalo size=0 &vmladav
66
67
-VMLSLDAV 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
68
+{
69
+ VMLADAV_S 1110 1110 1111 ... . ... . 1110 . 0 . 0 ... 0 @vmladav
70
+ VMLALDAV_S 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
71
+}
72
+{
73
+ VMLADAV_U 1111 1110 1111 ... . ... . 1110 . 0 . 0 ... 0 @vmladav
74
+ VMLALDAV_U 1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
75
+}
39
+}
76
+
40
+
41
+static void stellaris_i2c_reset_hold(Object *obj)
77
+{
42
+{
78
+ VMLSDAV 1110 1110 1111 ... . ... . 1110 . 0 . 0 ... 1 @vmladav
43
+ stellaris_i2c_state *s = STELLARIS_I2C(obj);
79
+ VMLSLDAV 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
44
45
s->msa = 0;
46
s->mcs = 0;
47
@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_reset(stellaris_i2c_state *s)
48
s->mimr = 0;
49
s->mris = 0;
50
s->mcr = 0;
80
+}
51
+}
81
+
52
+
53
+static void stellaris_i2c_reset_exit(Object *obj)
82
+{
54
+{
83
+ VMLSDAV 1111 1110 1111 ... 0 ... . 1110 . 0 . 0 ... 1 @vmladav_nosz
55
+ stellaris_i2c_state *s = STELLARIS_I2C(obj);
84
+ VRMLSLDAVH 1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
85
+}
86
+
56
+
87
+VMLADAV_S 1110 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
57
stellaris_i2c_update(s);
88
+VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
58
}
89
59
60
@@ -XXX,XX +XXX,XX @@ static void stellaris_i2c_init(Object *obj)
61
memory_region_init_io(&s->iomem, obj, &stellaris_i2c_ops, s,
62
"i2c", 0x1000);
63
sysbus_init_mmio(sbd, &s->iomem);
64
- /* ??? For now we only implement the master interface. */
65
- stellaris_i2c_reset(s);
66
}
67
68
/* Analogue to Digital Converter. This is only partially implemented,
69
@@ -XXX,XX +XXX,XX @@ type_init(stellaris_machine_init)
70
static void stellaris_i2c_class_init(ObjectClass *klass, void *data)
90
{
71
{
91
VMAXV_S 1110 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
72
DeviceClass *dc = DEVICE_CLASS(klass);
92
VMINV_S 1110 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
73
+ ResettableClass *rc = RESETTABLE_CLASS(klass);
93
VMAXAV 1110 1110 1110 .. 00 .... 1111 0 0 . 0 ... 0 @vmaxv
74
94
VMINAV 1110 1110 1110 .. 00 .... 1111 1 0 . 0 ... 0 @vmaxv
75
+ rc->phases.enter = stellaris_i2c_reset_enter;
95
+ VMLADAV_S 1110 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
76
+ rc->phases.hold = stellaris_i2c_reset_hold;
96
VRMLALDAVH_S 1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
77
+ rc->phases.exit = stellaris_i2c_reset_exit;
78
dc->vmsd = &vmstate_stellaris_i2c;
97
}
79
}
98
80
99
{
100
VMAXV_U 1111 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
101
VMINV_U 1111 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
102
+ VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
103
VRMLALDAVH_U 1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
104
}
105
106
-VRMLSLDAVH 1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
107
-
108
# Scalar operations
109
110
VADD_scalar 1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
111
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
112
index XXXXXXX..XXXXXXX 100644
113
--- a/target/arm/mve_helper.c
114
+++ b/target/arm/mve_helper.c
115
@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavxsh, 2, int16_t, true, +=, -=)
116
DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
117
DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
118
119
+/*
120
+ * Multiply add dual accumulate ops
121
+ */
122
+#define DO_DAV(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC) \
123
+ uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
124
+ void *vm, uint32_t a) \
125
+ { \
126
+ uint16_t mask = mve_element_mask(env); \
127
+ unsigned e; \
128
+ TYPE *n = vn, *m = vm; \
129
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
130
+ if (mask & 1) { \
131
+ if (e & 1) { \
132
+ a ODDACC \
133
+ n[H##ESIZE(e - 1 * XCHG)] * m[H##ESIZE(e)]; \
134
+ } else { \
135
+ a EVENACC \
136
+ n[H##ESIZE(e + 1 * XCHG)] * m[H##ESIZE(e)]; \
137
+ } \
138
+ } \
139
+ } \
140
+ mve_advance_vpt(env); \
141
+ return a; \
142
+ }
143
+
144
+#define DO_DAV_S(INSN, XCHG, EVENACC, ODDACC) \
145
+ DO_DAV(INSN##b, 1, int8_t, XCHG, EVENACC, ODDACC) \
146
+ DO_DAV(INSN##h, 2, int16_t, XCHG, EVENACC, ODDACC) \
147
+ DO_DAV(INSN##w, 4, int32_t, XCHG, EVENACC, ODDACC)
148
+
149
+#define DO_DAV_U(INSN, XCHG, EVENACC, ODDACC) \
150
+ DO_DAV(INSN##b, 1, uint8_t, XCHG, EVENACC, ODDACC) \
151
+ DO_DAV(INSN##h, 2, uint16_t, XCHG, EVENACC, ODDACC) \
152
+ DO_DAV(INSN##w, 4, uint32_t, XCHG, EVENACC, ODDACC)
153
+
154
+DO_DAV_S(vmladavs, false, +=, +=)
155
+DO_DAV_U(vmladavu, false, +=, +=)
156
+DO_DAV_S(vmlsdav, false, +=, -=)
157
+DO_DAV_S(vmladavsx, true, +=, +=)
158
+DO_DAV_S(vmlsdavx, true, +=, -=)
159
+
160
/*
161
* Rounding multiply add long dual accumulate high. In the pseudocode
162
* this is implemented with a 72-bit internal accumulator value of which
163
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
164
index XXXXXXX..XXXXXXX 100644
165
--- a/target/arm/translate-mve.c
166
+++ b/target/arm/translate-mve.c
167
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TC
168
typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
169
typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
170
typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
171
+typedef void MVEGenDualAccOpFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
172
173
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
174
static inline long mve_qreg_offset(unsigned reg)
175
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
176
return do_long_dual_acc(s, a, fns[a->x]);
177
}
178
179
+static bool do_dual_acc(DisasContext *s, arg_vmladav *a, MVEGenDualAccOpFn *fn)
180
+{
181
+ TCGv_ptr qn, qm;
182
+ TCGv_i32 rda;
183
+
184
+ if (!dc_isar_feature(aa32_mve, s) ||
185
+ !mve_check_qreg_bank(s, a->qn) ||
186
+ !fn) {
187
+ return false;
188
+ }
189
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
190
+ return true;
191
+ }
192
+
193
+ qn = mve_qreg_ptr(a->qn);
194
+ qm = mve_qreg_ptr(a->qm);
195
+
196
+ /*
197
+ * This insn is subject to beat-wise execution. Partial execution
198
+ * of an A=0 (no-accumulate) insn which does not execute the first
199
+ * beat must start with the current rda value, not 0.
200
+ */
201
+ if (a->a || mve_skip_first_beat(s)) {
202
+ rda = load_reg(s, a->rda);
203
+ } else {
204
+ rda = tcg_const_i32(0);
205
+ }
206
+
207
+ fn(rda, cpu_env, qn, qm, rda);
208
+ store_reg(s, a->rda, rda);
209
+ tcg_temp_free_ptr(qn);
210
+ tcg_temp_free_ptr(qm);
211
+
212
+ mve_update_eci(s);
213
+ return true;
214
+}
215
+
216
+#define DO_DUAL_ACC(INSN, FN) \
217
+ static bool trans_##INSN(DisasContext *s, arg_vmladav *a) \
218
+ { \
219
+ static MVEGenDualAccOpFn * const fns[4][2] = { \
220
+ { gen_helper_mve_##FN##b, gen_helper_mve_##FN##xb }, \
221
+ { gen_helper_mve_##FN##h, gen_helper_mve_##FN##xh }, \
222
+ { gen_helper_mve_##FN##w, gen_helper_mve_##FN##xw }, \
223
+ { NULL, NULL }, \
224
+ }; \
225
+ return do_dual_acc(s, a, fns[a->size][a->x]); \
226
+ }
227
+
228
+DO_DUAL_ACC(VMLADAV_S, vmladavs)
229
+DO_DUAL_ACC(VMLSDAV, vmlsdav)
230
+
231
+static bool trans_VMLADAV_U(DisasContext *s, arg_vmladav *a)
232
+{
233
+ static MVEGenDualAccOpFn * const fns[4][2] = {
234
+ { gen_helper_mve_vmladavub, NULL },
235
+ { gen_helper_mve_vmladavuh, NULL },
236
+ { gen_helper_mve_vmladavuw, NULL },
237
+ { NULL, NULL },
238
+ };
239
+ return do_dual_acc(s, a, fns[a->size][a->x]);
240
+}
241
+
242
static void gen_vpst(DisasContext *s, uint32_t mask)
243
{
244
/*
245
--
81
--
246
2.20.1
82
2.34.1
247
83
248
84
diff view generated by jsdifflib
1
From: Guenter Roeck <linux@roeck-us.net>
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
2
3
Instantiate SAI1/2/3 and ASRC as unimplemented devices to avoid random
3
QDev objects created with qdev_new() need to manually add
4
Linux kernel crashes, such as
4
their parent relationship with object_property_add_child().
5
5
6
Unhandled fault: external abort on non-linefetch (0x808) at 0xd1580010
6
This commit plug the devices which aren't part of the SoC;
7
pgd = (ptrval)
7
they will be plugged into a SoC container in the next one.
8
[d1580010] *pgd=8231b811, *pte=02034653, *ppte=02034453
9
Internal error: : 808 [#1] SMP ARM
10
...
11
[<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
12
[<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
13
[<c09580f4>] (_regmap_write) from [<c095837c>] (_regmap_update_bits+0xe4/0xec)
14
[<c095837c>] (_regmap_update_bits) from [<c09599b4>] (regmap_update_bits_base+0x50/0x74)
15
[<c09599b4>] (regmap_update_bits_base) from [<c0d3e9e4>] (fsl_asrc_runtime_resume+0x1e4/0x21c)
16
[<c0d3e9e4>] (fsl_asrc_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
17
[<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
18
[<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
19
[<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
20
[<c0942dfc>] (__pm_runtime_resume) from [<c0d3ecc4>] (fsl_asrc_probe+0x2a8/0x708)
21
[<c0d3ecc4>] (fsl_asrc_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
22
[<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
23
[<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
24
[<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
25
[<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
26
[<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
27
[<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
28
[<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
29
[<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
30
[<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
31
[<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
32
[<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
33
8
34
or
9
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
35
10
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
36
Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
11
Message-id: 20240213155214.13619-4-philmd@linaro.org
37
pgd = (ptrval)
38
[d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
39
Internal error: : 808 [#1] SMP ARM
40
...
41
[<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
42
[<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
43
[<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
44
[<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
45
[<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
46
[<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
47
[<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
48
[<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
49
[<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
50
[<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
51
[<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
52
[<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
53
[<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
54
[<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
55
[<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
56
[<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
57
[<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
58
[<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
59
[<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
60
[<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
61
[<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
62
63
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
64
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
65
Message-id: 20210810160318.87376-1-linux@roeck-us.net
66
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
67
---
13
---
68
hw/arm/fsl-imx6ul.c | 12 ++++++++++++
14
hw/arm/stellaris.c | 4 ++++
69
1 file changed, 12 insertions(+)
15
1 file changed, 4 insertions(+)
70
16
71
diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
17
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
72
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
73
--- a/hw/arm/fsl-imx6ul.c
19
--- a/hw/arm/stellaris.c
74
+++ b/hw/arm/fsl-imx6ul.c
20
+++ b/hw/arm/stellaris.c
75
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
21
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
76
*/
22
&error_fatal);
77
create_unimplemented_device("sdma", FSL_IMX6UL_SDMA_ADDR, 0x4000);
23
78
24
ssddev = qdev_new("ssd0323");
79
+ /*
25
+ object_property_add_child(OBJECT(ms), "oled", OBJECT(ssddev));
80
+ * SAI (Audio SSI (Synchronous Serial Interface))
26
qdev_prop_set_uint8(ssddev, "cs", 1);
81
+ */
27
qdev_realize_and_unref(ssddev, bus, &error_fatal);
82
+ create_unimplemented_device("sai1", FSL_IMX6UL_SAI1_ADDR, 0x4000);
28
83
+ create_unimplemented_device("sai2", FSL_IMX6UL_SAI2_ADDR, 0x4000);
29
gpio_d_splitter = qdev_new(TYPE_SPLIT_IRQ);
84
+ create_unimplemented_device("sai3", FSL_IMX6UL_SAI3_ADDR, 0x4000);
30
+ object_property_add_child(OBJECT(ms), "splitter",
85
+
31
+ OBJECT(gpio_d_splitter));
86
/*
32
qdev_prop_set_uint32(gpio_d_splitter, "num-lines", 2);
87
* PWM
33
qdev_realize_and_unref(gpio_d_splitter, NULL, &error_fatal);
88
*/
34
qdev_connect_gpio_out(
89
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
35
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
90
create_unimplemented_device("pwm3", FSL_IMX6UL_PWM3_ADDR, 0x4000);
36
DeviceState *gpad;
91
create_unimplemented_device("pwm4", FSL_IMX6UL_PWM4_ADDR, 0x4000);
37
92
38
gpad = qdev_new(TYPE_STELLARIS_GAMEPAD);
93
+ /*
39
+ object_property_add_child(OBJECT(ms), "gamepad", OBJECT(gpad));
94
+ * Audio ASRC (asynchronous sample rate converter)
40
for (i = 0; i < ARRAY_SIZE(gpad_keycode); i++) {
95
+ */
41
qlist_append_int(gpad_keycode_list, gpad_keycode[i]);
96
+ create_unimplemented_device("asrc", FSL_IMX6UL_ASRC_ADDR, 0x4000);
42
}
97
+
98
/*
99
* CAN
100
*/
101
--
43
--
102
2.20.1
44
2.34.1
103
45
104
46
diff view generated by jsdifflib
1
From: Jan Luebbe <jlu@pengutronix.de>
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
2
3
Break events are currently only handled by chardev/char-serial.c, so we
3
QDev objects created with qdev_new() need to manually add
4
just ignore errors, which results in no behaviour change for other
4
their parent relationship with object_property_add_child().
5
chardevs.
6
5
7
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
6
Since we don't model the SoC, just use a QOM container.
8
Message-id: 20210806144700.3751979-1-jlu@pengutronix.de
7
8
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
10
Message-id: 20240213155214.13619-5-philmd@linaro.org
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
12
---
12
hw/char/pl011.c | 6 ++++++
13
hw/arm/stellaris.c | 11 ++++++++++-
13
1 file changed, 6 insertions(+)
14
1 file changed, 10 insertions(+), 1 deletion(-)
14
15
15
diff --git a/hw/char/pl011.c b/hw/char/pl011.c
16
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
16
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
17
--- a/hw/char/pl011.c
18
--- a/hw/arm/stellaris.c
18
+++ b/hw/char/pl011.c
19
+++ b/hw/arm/stellaris.c
19
@@ -XXX,XX +XXX,XX @@
20
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
20
#include "hw/qdev-properties-system.h"
21
* 400fe000 system control
21
#include "migration/vmstate.h"
22
*/
22
#include "chardev/char-fe.h"
23
23
+#include "chardev/char-serial.h"
24
+ Object *soc_container;
24
#include "qemu/log.h"
25
DeviceState *gpio_dev[7], *nvic;
25
#include "qemu/module.h"
26
qemu_irq gpio_in[7][8];
26
#include "trace.h"
27
qemu_irq gpio_out[7][8];
27
@@ -XXX,XX +XXX,XX @@ static void pl011_write(void *opaque, hwaddr offset,
28
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
28
s->read_count = 0;
29
flash_size = (((board->dc0 & 0xffff) + 1) << 1) * 1024;
29
s->read_pos = 0;
30
sram_size = ((board->dc0 >> 18) + 1) * 1024;
30
}
31
31
+ if ((s->lcr ^ value) & 0x1) {
32
+ soc_container = object_new("container");
32
+ int break_enable = value & 0x1;
33
+ object_property_add_child(OBJECT(ms), "soc", soc_container);
33
+ qemu_chr_fe_ioctl(&s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
34
+
34
+ &break_enable);
35
/* Flash programming is done via the SCU, so pretend it is ROM. */
35
+ }
36
memory_region_init_rom(flash, NULL, "stellaris.flash", flash_size,
36
s->lcr = value;
37
&error_fatal);
37
pl011_set_read_trigger(s);
38
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
38
break;
39
* need its sysclk output.
40
*/
41
ssys_dev = qdev_new(TYPE_STELLARIS_SYS);
42
+ object_property_add_child(soc_container, "sys", OBJECT(ssys_dev));
43
44
/*
45
* Most devices come preprogrammed with a MAC address in the user data.
46
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
47
sysbus_realize_and_unref(SYS_BUS_DEVICE(ssys_dev), &error_fatal);
48
49
nvic = qdev_new(TYPE_ARMV7M);
50
+ object_property_add_child(soc_container, "v7m", OBJECT(nvic));
51
qdev_prop_set_uint32(nvic, "num-irq", NUM_IRQ_LINES);
52
qdev_prop_set_uint8(nvic, "num-prio-bits", NUM_PRIO_BITS);
53
qdev_prop_set_string(nvic, "cpu-type", ms->cpu_type);
54
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
55
56
dev = qdev_new(TYPE_STELLARIS_GPTM);
57
sbd = SYS_BUS_DEVICE(dev);
58
+ object_property_add_child(soc_container, "gptm[*]", OBJECT(dev));
59
qdev_connect_clock_in(dev, "clk",
60
qdev_get_clock_out(ssys_dev, "SYSCLK"));
61
sysbus_realize_and_unref(sbd, &error_fatal);
62
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
63
64
if (board->dc1 & (1 << 3)) { /* watchdog present */
65
dev = qdev_new(TYPE_LUMINARY_WATCHDOG);
66
-
67
+ object_property_add_child(soc_container, "wdg", OBJECT(dev));
68
qdev_connect_clock_in(dev, "WDOGCLK",
69
qdev_get_clock_out(ssys_dev, "SYSCLK"));
70
71
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
72
SysBusDevice *sbd;
73
74
dev = qdev_new("pl011_luminary");
75
+ object_property_add_child(soc_container, "uart[*]", OBJECT(dev));
76
sbd = SYS_BUS_DEVICE(dev);
77
qdev_prop_set_chr(dev, "chardev", serial_hd(i));
78
sysbus_realize_and_unref(sbd, &error_fatal);
79
@@ -XXX,XX +XXX,XX @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
80
DeviceState *enet;
81
82
enet = qdev_new("stellaris_enet");
83
+ object_property_add_child(soc_container, "enet", OBJECT(enet));
84
if (nd) {
85
qdev_set_nic_properties(enet, nd);
86
} else {
39
--
87
--
40
2.20.1
88
2.34.1
41
89
42
90
diff view generated by jsdifflib
1
We're about to make a code change to the sdiv and udiv helper
1
We support two different encodings for the AArch32 IMPDEF
2
functions, so first fix their indentation and coding style.
2
CBAR register -- older cores like the Cortex A9, A7, A15
3
have this at 4, c15, c0, 0; newer cores like the
4
Cortex A35, A53, A57 and A72 have it at 1 c15 c0 0.
5
6
When we implemented this we picked which encoding to
7
use based on whether the CPU set ARM_FEATURE_AARCH64.
8
However this isn't right for three cases:
9
* the qemu-system-arm 'max' CPU, which is supposed to be
10
a variant on a Cortex-A57; it ought to use the same
11
encoding the A57 does and which the AArch64 'max'
12
exposes to AArch32 guest code
13
* the Cortex-R52, which is AArch32-only but has the CBAR
14
at the newer encoding (and where we incorrectly are
15
not yet setting ARM_FEATURE_CBAR_RO anyway)
16
* any possible future support for other v8 AArch32
17
only CPUs, or for supporting "boot the CPU into
18
AArch32 mode" on our existing cores like the A57 etc
19
20
Make the decision of the encoding be based on whether
21
the CPU implements the ARM_FEATURE_V8 flag instead.
22
23
This changes the behaviour only for the qemu-system-arm
24
'-cpu max'. We don't expect anybody to be relying on the
25
old behaviour because:
26
* it's not what the real hardware Cortex-A57 does
27
(and that's what our ID register claims we are)
28
* we don't implement the memory-mapped GICv3 support
29
which is the only thing that exists at the peripheral
30
base address pointed to by the register
3
31
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
32
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
33
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20210730151636.17254-2-peter.maydell@linaro.org
34
Message-id: 20240206132931.38376-2-peter.maydell@linaro.org
7
---
35
---
8
target/arm/helper.c | 15 +++++++++------
36
target/arm/helper.c | 2 +-
9
1 file changed, 9 insertions(+), 6 deletions(-)
37
1 file changed, 1 insertion(+), 1 deletion(-)
10
38
11
diff --git a/target/arm/helper.c b/target/arm/helper.c
39
diff --git a/target/arm/helper.c b/target/arm/helper.c
12
index XXXXXXX..XXXXXXX 100644
40
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/helper.c
41
--- a/target/arm/helper.c
14
+++ b/target/arm/helper.c
42
+++ b/target/arm/helper.c
15
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
43
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
16
44
* AArch64 cores we might need to add a specific feature flag
17
int32_t HELPER(sdiv)(int32_t num, int32_t den)
45
* to indicate cores with "flavour 2" CBAR.
18
{
46
*/
19
- if (den == 0)
47
- if (arm_feature(env, ARM_FEATURE_AARCH64)) {
20
- return 0;
48
+ if (arm_feature(env, ARM_FEATURE_V8)) {
21
- if (num == INT_MIN && den == -1)
49
/* 32 bit view is [31:18] 0...0 [43:32]. */
22
- return INT_MIN;
50
uint32_t cbar32 = (extract64(cpu->reset_cbar, 18, 14) << 18)
23
+ if (den == 0) {
51
| extract64(cpu->reset_cbar, 32, 12);
24
+ return 0;
25
+ }
26
+ if (num == INT_MIN && den == -1) {
27
+ return INT_MIN;
28
+ }
29
return num / den;
30
}
31
32
uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
33
{
34
- if (den == 0)
35
- return 0;
36
+ if (den == 0) {
37
+ return 0;
38
+ }
39
return num / den;
40
}
41
42
--
52
--
43
2.20.1
53
2.34.1
44
45
diff view generated by jsdifflib
1
The MVEGenDualAccOpFn is a bit misnamed, since it is used for
1
The Cortex-R52 implements the Configuration Base Address Register
2
the "long dual accumulate" operations that use a 64-bit
2
(CBAR), as a read-only register. Add ARM_FEATURE_CBAR_RO to this CPU
3
accumulator. Rename it to MVEGenLongDualAccOpFn so we can
3
type, so that our implementation provides the register and the
4
use the former name for the 32-bit accumulator insns.
4
associated qdev property.
5
5
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20240206132931.38376-3-peter.maydell@linaro.org
8
---
9
---
9
target/arm/translate-mve.c | 16 ++++++++--------
10
target/arm/tcg/cpu32.c | 1 +
10
1 file changed, 8 insertions(+), 8 deletions(-)
11
1 file changed, 1 insertion(+)
11
12
12
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
13
diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
13
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/translate-mve.c
15
--- a/target/arm/tcg/cpu32.c
15
+++ b/target/arm/translate-mve.c
16
+++ b/target/arm/tcg/cpu32.c
16
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
17
@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
17
typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
18
set_feature(&cpu->env, ARM_FEATURE_PMSA);
18
typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
19
set_feature(&cpu->env, ARM_FEATURE_NEON);
19
typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
20
set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
20
-typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
21
+ set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
21
+typedef void MVEGenLongDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
22
cpu->midr = 0x411fd133; /* r1p3 */
22
typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
23
cpu->revidr = 0x00000000;
23
typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
24
cpu->reset_fpsid = 0x41034023;
24
typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
25
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
26
}
27
28
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
29
- MVEGenDualAccOpFn *fn)
30
+ MVEGenLongDualAccOpFn *fn)
31
{
32
TCGv_ptr qn, qm;
33
TCGv_i64 rda;
34
@@ -XXX,XX +XXX,XX @@ static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
35
36
static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
37
{
38
- static MVEGenDualAccOpFn * const fns[4][2] = {
39
+ static MVEGenLongDualAccOpFn * const fns[4][2] = {
40
{ NULL, NULL },
41
{ gen_helper_mve_vmlaldavsh, gen_helper_mve_vmlaldavxsh },
42
{ gen_helper_mve_vmlaldavsw, gen_helper_mve_vmlaldavxsw },
43
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
44
45
static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
46
{
47
- static MVEGenDualAccOpFn * const fns[4][2] = {
48
+ static MVEGenLongDualAccOpFn * const fns[4][2] = {
49
{ NULL, NULL },
50
{ gen_helper_mve_vmlaldavuh, NULL },
51
{ gen_helper_mve_vmlaldavuw, NULL },
52
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
53
54
static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
55
{
56
- static MVEGenDualAccOpFn * const fns[4][2] = {
57
+ static MVEGenLongDualAccOpFn * const fns[4][2] = {
58
{ NULL, NULL },
59
{ gen_helper_mve_vmlsldavsh, gen_helper_mve_vmlsldavxsh },
60
{ gen_helper_mve_vmlsldavsw, gen_helper_mve_vmlsldavxsw },
61
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
62
63
static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
64
{
65
- static MVEGenDualAccOpFn * const fns[] = {
66
+ static MVEGenLongDualAccOpFn * const fns[] = {
67
gen_helper_mve_vrmlaldavhsw, gen_helper_mve_vrmlaldavhxsw,
68
};
69
return do_long_dual_acc(s, a, fns[a->x]);
70
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
71
72
static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
73
{
74
- static MVEGenDualAccOpFn * const fns[] = {
75
+ static MVEGenLongDualAccOpFn * const fns[] = {
76
gen_helper_mve_vrmlaldavhuw, NULL,
77
};
78
return do_long_dual_acc(s, a, fns[a->x]);
79
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
80
81
static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
82
{
83
- static MVEGenDualAccOpFn * const fns[] = {
84
+ static MVEGenLongDualAccOpFn * const fns[] = {
85
gen_helper_mve_vrmlsldavhsw, gen_helper_mve_vrmlsldavhxsw,
86
};
87
return do_long_dual_acc(s, a, fns[a->x]);
88
--
25
--
89
2.20.1
26
2.34.1
90
91
diff view generated by jsdifflib
1
Implement the MVE VABAV insn, which computes absolute differences
1
Add the Cortex-R52 IMPDEF sysregs, by defining them here and
2
between elements of two vectors and accumulates the result into
2
also by enabling the AUXCR feature which defines the ACTLR
3
a general purpose register.
3
and HACTLR registers. As is our usual practice, we make these
4
simple reads-as-zero stubs for now.
4
5
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20240206132931.38376-4-peter.maydell@linaro.org
7
---
9
---
8
target/arm/helper-mve.h | 7 +++++++
10
target/arm/tcg/cpu32.c | 108 +++++++++++++++++++++++++++++++++++++++++
9
target/arm/mve.decode | 6 ++++++
11
1 file changed, 108 insertions(+)
10
target/arm/mve_helper.c | 26 +++++++++++++++++++++++
11
target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++++++
12
4 files changed, 82 insertions(+)
13
12
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
13
diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
15
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
15
--- a/target/arm/tcg/cpu32.c
17
+++ b/target/arm/helper-mve.h
16
+++ b/target/arm/tcg/cpu32.c
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
17
@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
19
DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
18
define_arm_cp_regs(cpu, cortexr5_cp_reginfo);
20
DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
19
}
21
20
22
+DEF_HELPER_FLAGS_4(mve_vabavsb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
21
+static const ARMCPRegInfo cortex_r52_cp_reginfo[] = {
23
+DEF_HELPER_FLAGS_4(mve_vabavsh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
22
+ { .name = "CPUACTLR", .cp = 15, .opc1 = 0, .crm = 15,
24
+DEF_HELPER_FLAGS_4(mve_vabavsw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
23
+ .access = PL1_RW, .type = ARM_CP_CONST | ARM_CP_64BIT, .resetvalue = 0 },
25
+DEF_HELPER_FLAGS_4(mve_vabavub, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
24
+ { .name = "IMP_ATCMREGIONR",
26
+DEF_HELPER_FLAGS_4(mve_vabavuh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
25
+ .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 0,
27
+DEF_HELPER_FLAGS_4(mve_vabavuw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
26
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
27
+ { .name = "IMP_BTCMREGIONR",
28
+ .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 1,
29
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
30
+ { .name = "IMP_CTCMREGIONR",
31
+ .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 2,
32
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
33
+ { .name = "IMP_CSCTLR",
34
+ .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 0,
35
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
36
+ { .name = "IMP_BPCTLR",
37
+ .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 1,
38
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
39
+ { .name = "IMP_MEMPROTCLR",
40
+ .cp = 15, .opc1 = 1, .crn = 9, .crm = 1, .opc2 = 2,
41
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
42
+ { .name = "IMP_SLAVEPCTLR",
43
+ .cp = 15, .opc1 = 0, .crn = 11, .crm = 0, .opc2 = 0,
44
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
45
+ { .name = "IMP_PERIPHREGIONR",
46
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 0, .opc2 = 0,
47
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
48
+ { .name = "IMP_FLASHIFREGIONR",
49
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 0, .opc2 = 1,
50
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
51
+ { .name = "IMP_BUILDOPTR",
52
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 0,
53
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
54
+ { .name = "IMP_PINOPTR",
55
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 7,
56
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
57
+ { .name = "IMP_QOSR",
58
+ .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 1,
59
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
60
+ { .name = "IMP_BUSTIMEOUTR",
61
+ .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 2,
62
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
63
+ { .name = "IMP_INTMONR",
64
+ .cp = 15, .opc1 = 1, .crn = 15, .crm = 3, .opc2 = 4,
65
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
66
+ { .name = "IMP_ICERR0",
67
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 0, .opc2 = 0,
68
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
69
+ { .name = "IMP_ICERR1",
70
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 0, .opc2 = 1,
71
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
72
+ { .name = "IMP_DCERR0",
73
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 1, .opc2 = 0,
74
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
75
+ { .name = "IMP_DCERR1",
76
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 1, .opc2 = 1,
77
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
78
+ { .name = "IMP_TCMERR0",
79
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 0,
80
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
81
+ { .name = "IMP_TCMERR1",
82
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 1,
83
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
84
+ { .name = "IMP_TCMSYNDR0",
85
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 2,
86
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
87
+ { .name = "IMP_TCMSYNDR1",
88
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 2, .opc2 = 3,
89
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
90
+ { .name = "IMP_FLASHERR0",
91
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 3, .opc2 = 0,
92
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
93
+ { .name = "IMP_FLASHERR1",
94
+ .cp = 15, .opc1 = 2, .crn = 15, .crm = 3, .opc2 = 1,
95
+ .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
96
+ { .name = "IMP_CDBGDR0",
97
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 0, .opc2 = 0,
98
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
99
+ { .name = "IMP_CBDGBR1",
100
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 0, .opc2 = 1,
101
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
102
+ { .name = "IMP_TESTR0",
103
+ .cp = 15, .opc1 = 4, .crn = 15, .crm = 0, .opc2 = 0,
104
+ .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
105
+ { .name = "IMP_TESTR1",
106
+ .cp = 15, .opc1 = 4, .crn = 15, .crm = 0, .opc2 = 1,
107
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
108
+ { .name = "IMP_CDBGDCI",
109
+ .cp = 15, .opc1 = 0, .crn = 15, .crm = 15, .opc2 = 0,
110
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
111
+ { .name = "IMP_CDBGDCT",
112
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 2, .opc2 = 0,
113
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
114
+ { .name = "IMP_CDBGICT",
115
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 2, .opc2 = 1,
116
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
117
+ { .name = "IMP_CDBGDCD",
118
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 4, .opc2 = 0,
119
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
120
+ { .name = "IMP_CDBGICD",
121
+ .cp = 15, .opc1 = 3, .crn = 15, .crm = 4, .opc2 = 1,
122
+ .access = PL1_W, .type = ARM_CP_NOP, .resetvalue = 0 },
123
+};
28
+
124
+
29
DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
125
+
30
DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
126
static void cortex_r52_initfn(Object *obj)
31
DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
127
{
32
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
128
ARMCPU *cpu = ARM_CPU(obj);
33
index XXXXXXX..XXXXXXX 100644
129
@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
34
--- a/target/arm/mve.decode
130
set_feature(&cpu->env, ARM_FEATURE_NEON);
35
+++ b/target/arm/mve.decode
131
set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
36
@@ -XXX,XX +XXX,XX @@
132
set_feature(&cpu->env, ARM_FEATURE_CBAR_RO);
37
&vcmp_scalar qn rm size mask
133
+ set_feature(&cpu->env, ARM_FEATURE_AUXCR);
38
&shl_scalar qda rm size
134
cpu->midr = 0x411fd133; /* r1p3 */
39
&vmaxv qm rda size
135
cpu->revidr = 0x00000000;
40
+&vabav qn qm rda size
136
cpu->reset_fpsid = 0x41034023;
41
137
@@ -XXX,XX +XXX,XX @@ static void cortex_r52_initfn(Object *obj)
42
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
138
43
# Note that both Rn and Qd are 3 bits only (no D bit)
139
cpu->pmsav7_dregion = 16;
44
@@ -XXX,XX +XXX,XX @@ VMLAS 111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
140
cpu->pmsav8r_hdregion = 16;
45
rdahi=%rdahi rdalo=%rdalo
141
+
142
+ define_arm_cp_regs(cpu, cortex_r52_cp_reginfo);
46
}
143
}
47
144
48
+@vabav .... .... .. size:2 .... rda:4 .... .... .... &vabav qn=%qn qm=%qm
145
static void cortex_r5f_initfn(Object *obj)
49
+
50
+VABAV_S 111 0 1110 10 .. ... 0 .... 1111 . 0 . 0 ... 1 @vabav
51
+VABAV_U 111 1 1110 10 .. ... 0 .... 1111 . 0 . 0 ... 1 @vabav
52
+
53
# Logical immediate operations (1 reg and modified-immediate)
54
55
# The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
56
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
57
index XXXXXXX..XXXXXXX 100644
58
--- a/target/arm/mve_helper.c
59
+++ b/target/arm/mve_helper.c
60
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINV(vminavb, 1, int8_t, uint8_t, do_mina)
61
DO_VMAXMINV(vminavh, 2, int16_t, uint16_t, do_mina)
62
DO_VMAXMINV(vminavw, 4, int32_t, uint32_t, do_mina)
63
64
+#define DO_VABAV(OP, ESIZE, TYPE) \
65
+ uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
66
+ void *vm, uint32_t ra) \
67
+ { \
68
+ uint16_t mask = mve_element_mask(env); \
69
+ unsigned e; \
70
+ TYPE *m = vm, *n = vn; \
71
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
72
+ if (mask & 1) { \
73
+ int64_t n0 = n[H##ESIZE(e)]; \
74
+ int64_t m0 = m[H##ESIZE(e)]; \
75
+ uint32_t r = n0 >= m0 ? (n0 - m0) : (m0 - n0); \
76
+ ra += r; \
77
+ } \
78
+ } \
79
+ mve_advance_vpt(env); \
80
+ return ra; \
81
+ }
82
+
83
+DO_VABAV(vabavsb, 1, int8_t)
84
+DO_VABAV(vabavsh, 2, int16_t)
85
+DO_VABAV(vabavsw, 4, int32_t)
86
+DO_VABAV(vabavub, 1, uint8_t)
87
+DO_VABAV(vabavuh, 2, uint16_t)
88
+DO_VABAV(vabavuw, 4, uint32_t)
89
+
90
#define DO_VADDLV(OP, TYPE, LTYPE) \
91
uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
92
uint64_t ra) \
93
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/target/arm/translate-mve.c
96
+++ b/target/arm/translate-mve.c
97
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
98
typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
99
typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
100
typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
101
+typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
102
103
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
104
static inline long mve_qreg_offset(unsigned reg)
105
@@ -XXX,XX +XXX,XX @@ DO_VMAXV(VMAXAV, vmaxav)
106
DO_VMAXV(VMINV_S, vminvs)
107
DO_VMAXV(VMINV_U, vminvu)
108
DO_VMAXV(VMINAV, vminav)
109
+
110
+static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
111
+{
112
+ /* Absolute difference accumulated across vector */
113
+ TCGv_ptr qn, qm;
114
+ TCGv_i32 rda;
115
+
116
+ if (!dc_isar_feature(aa32_mve, s) ||
117
+ !mve_check_qreg_bank(s, a->qm | a->qn) ||
118
+ !fn || a->rda == 13 || a->rda == 15) {
119
+ /* Rda cases are UNPREDICTABLE */
120
+ return false;
121
+ }
122
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
123
+ return true;
124
+ }
125
+
126
+ qm = mve_qreg_ptr(a->qm);
127
+ qn = mve_qreg_ptr(a->qn);
128
+ rda = load_reg(s, a->rda);
129
+ fn(rda, cpu_env, qn, qm, rda);
130
+ store_reg(s, a->rda, rda);
131
+ tcg_temp_free_ptr(qm);
132
+ tcg_temp_free_ptr(qn);
133
+ mve_update_eci(s);
134
+ return true;
135
+}
136
+
137
+#define DO_VABAV(INSN, FN) \
138
+ static bool trans_##INSN(DisasContext *s, arg_vabav *a) \
139
+ { \
140
+ static MVEGenVABAVFn * const fns[] = { \
141
+ gen_helper_mve_##FN##b, \
142
+ gen_helper_mve_##FN##h, \
143
+ gen_helper_mve_##FN##w, \
144
+ NULL, \
145
+ }; \
146
+ return do_vabav(s, a, fns[a->size]); \
147
+ }
148
+
149
+DO_VABAV(VABAV_S, vabavs)
150
+DO_VABAV(VABAV_U, vabavu)
151
--
146
--
152
2.20.1
147
2.34.1
153
154
diff view generated by jsdifflib
1
In some situations we need a mask telling us which parts of the
1
Architecturally, the AArch32 MSR/MRS to/from banked register
2
vector correspond to beats that are not being executed because of
2
instructions are UNPREDICTABLE for attempts to access a banked
3
ECI, separately from the combined "which bytes are predicated away"
3
register that the guest could access in a more direct way (e.g.
4
mask. Factor this mask calculation out of mve_element_mask() into
4
using this insn to access r8_fiq when already in FIQ mode). QEMU has
5
its own function.
5
chosen to UNDEF on all of these.
6
7
However, for the case of accessing SPSR_hyp from hyp mode, it turns
8
out that real hardware permits this, with the same effect as if the
9
guest had directly written to SPSR. Further, there is some
10
guest code out there that assumes it can do this, because it
11
happens to work on hardware: an example Cortex-R52 startup code
12
fragment uses this, and it got copied into various other places,
13
including Zephyr. Zephyr was fixed to not use this:
14
https://github.com/zephyrproject-rtos/zephyr/issues/47330
15
but other examples are still out there, like the selftest
16
binary for the MPS3-AN536.
17
18
For convenience of being able to run guest code, permit
19
this UNPREDICTABLE access instead of UNDEFing it.
6
20
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
21
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
22
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
23
Message-id: 20240206132931.38376-5-peter.maydell@linaro.org
9
---
24
---
10
target/arm/mve_helper.c | 58 ++++++++++++++++++++++++-----------------
25
target/arm/tcg/op_helper.c | 43 ++++++++++++++++++++++++++------------
11
1 file changed, 34 insertions(+), 24 deletions(-)
26
target/arm/tcg/translate.c | 19 +++++++++++------
27
2 files changed, 43 insertions(+), 19 deletions(-)
12
28
13
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
29
diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
14
index XXXXXXX..XXXXXXX 100644
30
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/mve_helper.c
31
--- a/target/arm/tcg/op_helper.c
16
+++ b/target/arm/mve_helper.c
32
+++ b/target/arm/tcg/op_helper.c
17
@@ -XXX,XX +XXX,XX @@
33
@@ -XXX,XX +XXX,XX @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, uint32_t tgtmode,
18
#include "exec/exec-all.h"
34
*/
19
#include "tcg/tcg.h"
35
int curmode = env->uncached_cpsr & CPSR_M;
20
36
21
+static uint16_t mve_eci_mask(CPUARMState *env)
37
- if (regno == 17) {
22
+{
38
- /* ELR_Hyp: a special case because access from tgtmode is OK */
23
+ /*
39
- if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
24
+ * Return the mask of which elements in the MVE vector correspond
40
- goto undef;
25
+ * to beats being executed. The mask has 1 bits for executed lanes
41
+ if (tgtmode == ARM_CPU_MODE_HYP) {
26
+ * and 0 bits where ECI says this beat was already executed.
42
+ /*
27
+ */
43
+ * Handle Hyp target regs first because some are special cases
28
+ int eci;
44
+ * which don't want the usual "not accessible from tgtmode" check.
29
+
45
+ */
30
+ if ((env->condexec_bits & 0xf) != 0) {
46
+ switch (regno) {
31
+ return 0xffff;
47
+ case 16 ... 17: /* ELR_Hyp, SPSR_Hyp */
32
+ }
48
+ if (curmode != ARM_CPU_MODE_HYP && curmode != ARM_CPU_MODE_MON) {
33
+
49
+ goto undef;
34
+ eci = env->condexec_bits >> 4;
50
+ }
35
+ switch (eci) {
51
+ break;
36
+ case ECI_NONE:
52
+ case 13:
37
+ return 0xffff;
53
+ if (curmode != ARM_CPU_MODE_MON) {
38
+ case ECI_A0:
54
+ goto undef;
39
+ return 0xfff0;
55
+ }
40
+ case ECI_A0A1:
56
+ break;
41
+ return 0xff00;
57
+ default:
42
+ case ECI_A0A1A2:
58
+ g_assert_not_reached();
43
+ case ECI_A0A1A2B0:
59
}
44
+ return 0xf000;
60
return;
45
+ default:
46
+ g_assert_not_reached();
47
+ }
48
+}
49
+
50
static uint16_t mve_element_mask(CPUARMState *env)
51
{
52
/*
53
@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
54
mask &= ltpmask;
55
}
61
}
56
62
@@ -XXX,XX +XXX,XX @@ static void msr_mrs_banked_exc_checks(CPUARMState *env, uint32_t tgtmode,
57
- if ((env->condexec_bits & 0xf) == 0) {
63
}
58
- /*
64
}
59
- * ECI bits indicate which beats are already executed;
65
60
- * we handle this by effectively predicating them out.
66
- if (tgtmode == ARM_CPU_MODE_HYP) {
61
- */
67
- /* SPSR_Hyp, r13_hyp: accessible from Monitor mode only */
62
- int eci = env->condexec_bits >> 4;
68
- if (curmode != ARM_CPU_MODE_MON) {
63
- switch (eci) {
69
- goto undef;
64
- case ECI_NONE:
65
- break;
66
- case ECI_A0:
67
- mask &= 0xfff0;
68
- break;
69
- case ECI_A0A1:
70
- mask &= 0xff00;
71
- break;
72
- case ECI_A0A1A2:
73
- case ECI_A0A1A2B0:
74
- mask &= 0xf000;
75
- break;
76
- default:
77
- g_assert_not_reached();
78
- }
70
- }
79
- }
71
- }
80
-
72
-
81
+ /*
73
return;
82
+ * ECI bits indicate which beats are already executed;
74
83
+ * we handle this by effectively predicating them out.
75
undef:
84
+ */
76
@@ -XXX,XX +XXX,XX @@ void HELPER(msr_banked)(CPUARMState *env, uint32_t value, uint32_t tgtmode,
85
+ mask &= mve_eci_mask(env);
77
86
return mask;
78
switch (regno) {
87
}
79
case 16: /* SPSRs */
88
80
- env->banked_spsr[bank_number(tgtmode)] = value;
81
+ if (tgtmode == (env->uncached_cpsr & CPSR_M)) {
82
+ /* Only happens for SPSR_Hyp access in Hyp mode */
83
+ env->spsr = value;
84
+ } else {
85
+ env->banked_spsr[bank_number(tgtmode)] = value;
86
+ }
87
break;
88
case 17: /* ELR_Hyp */
89
env->elr_el[2] = value;
90
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mrs_banked)(CPUARMState *env, uint32_t tgtmode, uint32_t regno)
91
92
switch (regno) {
93
case 16: /* SPSRs */
94
- return env->banked_spsr[bank_number(tgtmode)];
95
+ if (tgtmode == (env->uncached_cpsr & CPSR_M)) {
96
+ /* Only happens for SPSR_Hyp access in Hyp mode */
97
+ return env->spsr;
98
+ } else {
99
+ return env->banked_spsr[bank_number(tgtmode)];
100
+ }
101
case 17: /* ELR_Hyp */
102
return env->elr_el[2];
103
case 13:
104
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
105
index XXXXXXX..XXXXXXX 100644
106
--- a/target/arm/tcg/translate.c
107
+++ b/target/arm/tcg/translate.c
108
@@ -XXX,XX +XXX,XX @@ static bool msr_banked_access_decode(DisasContext *s, int r, int sysm, int rn,
109
break;
110
case ARM_CPU_MODE_HYP:
111
/*
112
- * SPSR_hyp and r13_hyp can only be accessed from Monitor mode
113
- * (and so we can forbid accesses from EL2 or below). elr_hyp
114
- * can be accessed also from Hyp mode, so forbid accesses from
115
- * EL0 or EL1.
116
+ * r13_hyp can only be accessed from Monitor mode, and so we
117
+ * can forbid accesses from EL2 or below.
118
+ * elr_hyp can be accessed also from Hyp mode, so forbid
119
+ * accesses from EL0 or EL1.
120
+ * SPSR_hyp is supposed to be in the same category as r13_hyp
121
+ * and UNPREDICTABLE if accessed from anything except Monitor
122
+ * mode. However there is some real-world code that will do
123
+ * it because at least some hardware happens to permit the
124
+ * access. (Notably a standard Cortex-R52 startup code fragment
125
+ * does this.) So we permit SPSR_hyp from Hyp mode also, to allow
126
+ * this (incorrect) guest code to run.
127
*/
128
- if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 2 ||
129
- (s->current_el < 3 && *regno != 17)) {
130
+ if (!arm_dc_feature(s, ARM_FEATURE_EL2) || s->current_el < 2
131
+ || (s->current_el < 3 && *regno != 16 && *regno != 17)) {
132
goto undef;
133
}
134
break;
89
--
135
--
90
2.20.1
136
2.34.1
91
92
diff view generated by jsdifflib
1
Implement the MVE instructions which perform shifts by a scalar.
1
We currently guard the CFG3 register read with
2
These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2. They take the
2
(scc_partno(s) == 0x524 && scc_partno(s) == 0x547)
3
shift amount in a general purpose register and shift every element in
3
which is clearly wrong as it is never true.
4
the vector by that amount.
5
4
6
Mostly we can reuse the helper functions for shift-by-immediate; we
5
This register is present on all board types except AN524
7
do need two new helpers for VQRSHL.
6
and AN527; correct the condition.
8
7
8
Fixes: 6ac80818941829c0 ("hw/misc/mps2-scc: Implement changes for AN547")
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
Message-id: 20240206132931.38376-6-peter.maydell@linaro.org
11
---
13
---
12
target/arm/helper-mve.h | 8 +++++++
14
hw/misc/mps2-scc.c | 2 +-
13
target/arm/mve.decode | 23 ++++++++++++++++---
15
1 file changed, 1 insertion(+), 1 deletion(-)
14
target/arm/mve_helper.c | 2 ++
15
target/arm/translate-mve.c | 46 ++++++++++++++++++++++++++++++++++++++
16
4 files changed, 76 insertions(+), 3 deletions(-)
17
16
18
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
19
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper-mve.h
19
--- a/hw/misc/mps2-scc.c
21
+++ b/target/arm/helper-mve.h
20
+++ b/hw/misc/mps2-scc.c
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
23
DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
r = s->cfg2;
24
DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
break;
25
24
case A_CFG3:
26
+DEF_HELPER_FLAGS_4(mve_vqrshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
25
- if (scc_partno(s) == 0x524 && scc_partno(s) == 0x547) {
27
+DEF_HELPER_FLAGS_4(mve_vqrshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
26
+ if (scc_partno(s) == 0x524 || scc_partno(s) == 0x547) {
28
+DEF_HELPER_FLAGS_4(mve_vqrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
/* CFG3 reserved on AN524 */
29
+
28
goto bad_offset;
30
+DEF_HELPER_FLAGS_4(mve_vqrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
29
}
31
+DEF_HELPER_FLAGS_4(mve_vqrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
+DEF_HELPER_FLAGS_4(mve_vqrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
33
+
34
DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
35
DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
36
DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
37
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
38
index XXXXXXX..XXXXXXX 100644
39
--- a/target/arm/mve.decode
40
+++ b/target/arm/mve.decode
41
@@ -XXX,XX +XXX,XX @@
42
&viwdup qd rn rm size imm
43
&vcmp qm qn size mask
44
&vcmp_scalar qn rm size mask
45
+&shl_scalar qda rm size
46
47
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
48
# Note that both Rn and Qd are 3 bits only (no D bit)
49
@@ -XXX,XX +XXX,XX @@
50
@2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
51
size=2 shift=%rshift_i5
52
53
+@shl_scalar .... .... .... size:2 .. .... .... .... rm:4 &shl_scalar qda=%qd
54
+
55
# Vector comparison; 4-bit Qm but 3-bit Qn
56
%mask_22_13 22:1 13:3
57
@vcmp .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
58
@@ -XXX,XX +XXX,XX @@ VRMLSLDAVH 1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
59
60
VADD_scalar 1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
61
VSUB_scalar 1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
62
-VMUL_scalar 1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
63
+
64
+{
65
+ VSHL_S_scalar 1110 1110 0 . 11 .. 01 ... 1 1110 0110 .... @shl_scalar
66
+ VRSHL_S_scalar 1110 1110 0 . 11 .. 11 ... 1 1110 0110 .... @shl_scalar
67
+ VQSHL_S_scalar 1110 1110 0 . 11 .. 01 ... 1 1110 1110 .... @shl_scalar
68
+ VQRSHL_S_scalar 1110 1110 0 . 11 .. 11 ... 1 1110 1110 .... @shl_scalar
69
+ VMUL_scalar 1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
70
+}
71
+
72
+{
73
+ VSHL_U_scalar 1111 1110 0 . 11 .. 01 ... 1 1110 0110 .... @shl_scalar
74
+ VRSHL_U_scalar 1111 1110 0 . 11 .. 11 ... 1 1110 0110 .... @shl_scalar
75
+ VQSHL_U_scalar 1111 1110 0 . 11 .. 01 ... 1 1110 1110 .... @shl_scalar
76
+ VQRSHL_U_scalar 1111 1110 0 . 11 .. 11 ... 1 1110 1110 .... @shl_scalar
77
+ VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
78
+}
79
+
80
VHADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
81
VHADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
82
VHSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
83
@@ -XXX,XX +XXX,XX @@ VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
84
size=%size_28
85
}
86
87
-VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
88
-
89
VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
90
VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
91
92
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
93
index XXXXXXX..XXXXXXX 100644
94
--- a/target/arm/mve_helper.c
95
+++ b/target/arm/mve_helper.c
96
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
97
DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
98
DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
99
DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
100
+DO_2SHIFT_SAT_U(vqrshli_u, DO_UQRSHL_OP)
101
+DO_2SHIFT_SAT_S(vqrshli_s, DO_SQRSHL_OP)
102
103
/* Shift-and-insert; we always work with 64 bits at a time */
104
#define DO_2SHIFT_INSERT(OP, ESIZE, SHIFTFN, MASKFN) \
105
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
106
index XXXXXXX..XXXXXXX 100644
107
--- a/target/arm/translate-mve.c
108
+++ b/target/arm/translate-mve.c
109
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VRSHRI_U, vrshli_u, true)
110
DO_2SHIFT(VSRI, vsri, false)
111
DO_2SHIFT(VSLI, vsli, false)
112
113
+static bool do_2shift_scalar(DisasContext *s, arg_shl_scalar *a,
114
+ MVEGenTwoOpShiftFn *fn)
115
+{
116
+ TCGv_ptr qda;
117
+ TCGv_i32 rm;
118
+
119
+ if (!dc_isar_feature(aa32_mve, s) ||
120
+ !mve_check_qreg_bank(s, a->qda) ||
121
+ a->rm == 13 || a->rm == 15 || !fn) {
122
+ /* Rm cases are UNPREDICTABLE */
123
+ return false;
124
+ }
125
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
126
+ return true;
127
+ }
128
+
129
+ qda = mve_qreg_ptr(a->qda);
130
+ rm = load_reg(s, a->rm);
131
+ fn(cpu_env, qda, qda, rm);
132
+ tcg_temp_free_ptr(qda);
133
+ tcg_temp_free_i32(rm);
134
+ mve_update_eci(s);
135
+ return true;
136
+}
137
+
138
+#define DO_2SHIFT_SCALAR(INSN, FN) \
139
+ static bool trans_##INSN(DisasContext *s, arg_shl_scalar *a) \
140
+ { \
141
+ static MVEGenTwoOpShiftFn * const fns[] = { \
142
+ gen_helper_mve_##FN##b, \
143
+ gen_helper_mve_##FN##h, \
144
+ gen_helper_mve_##FN##w, \
145
+ NULL, \
146
+ }; \
147
+ return do_2shift_scalar(s, a, fns[a->size]); \
148
+ }
149
+
150
+DO_2SHIFT_SCALAR(VSHL_S_scalar, vshli_s)
151
+DO_2SHIFT_SCALAR(VSHL_U_scalar, vshli_u)
152
+DO_2SHIFT_SCALAR(VRSHL_S_scalar, vrshli_s)
153
+DO_2SHIFT_SCALAR(VRSHL_U_scalar, vrshli_u)
154
+DO_2SHIFT_SCALAR(VQSHL_S_scalar, vqshli_s)
155
+DO_2SHIFT_SCALAR(VQSHL_U_scalar, vqshli_u)
156
+DO_2SHIFT_SCALAR(VQRSHL_S_scalar, vqrshli_s)
157
+DO_2SHIFT_SCALAR(VQRSHL_U_scalar, vqrshli_u)
158
+
159
#define DO_VSHLL(INSN, FN) \
160
static bool trans_##INSN(DisasContext *s, arg_2shift *a) \
161
{ \
162
--
30
--
163
2.20.1
31
2.34.1
164
32
165
33
diff view generated by jsdifflib
1
Implement the MVE interleaving load/store functions VLD2, VLD4, VST2
1
The MPS SCC device has a lot of different flavours for the various
2
and VST4. VLD2 loads 16 bytes of data from memory and writes to 2
2
different MPS FPGA images, which look mostly similar but have
3
consecutive Qregs; VLD4 loads 16 bytes of data from memory and writes
3
differences in how particular registers are handled. Currently we
4
to 4 consecutive Qregs. The 'pattern' field in the encoding
4
deal with this with a lot of open-coded checks on scc_partno(), but
5
determines the offset into memory which is accessed and also which
5
as we add more board types this is getting a bit hard to read.
6
elements in the Qregs are written to. (The intention is that a
6
7
sequence of four consecutive VLD4 with different pattern values
7
Factor out the conditions into some functions which we can
8
performs a complete de-interleaving load of 64 bytes into all
8
give more descriptive names to.
9
elements of the 4 Qregs.) VST2 and VST4 do the same, but for stores.
10
9
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
12
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
13
Message-id: 20240206132931.38376-7-peter.maydell@linaro.org
13
---
14
---
14
target/arm/helper-mve.h | 48 ++++++
15
hw/misc/mps2-scc.c | 45 +++++++++++++++++++++++++++++++--------------
15
target/arm/mve.decode | 11 ++
16
1 file changed, 31 insertions(+), 14 deletions(-)
16
target/arm/mve_helper.c | 342 +++++++++++++++++++++++++++++++++++++
17
target/arm/translate-mve.c | 94 ++++++++++
18
4 files changed, 495 insertions(+)
19
17
20
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
18
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
21
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
22
--- a/target/arm/helper-mve.h
20
--- a/hw/misc/mps2-scc.c
23
+++ b/target/arm/helper-mve.h
21
+++ b/hw/misc/mps2-scc.c
24
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vldrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
@@ -XXX,XX +XXX,XX @@ static int scc_partno(MPS2SCC *s)
25
DEF_HELPER_FLAGS_4(mve_vstrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
return extract32(s->id, 4, 8);
26
DEF_HELPER_FLAGS_4(mve_vstrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
28
+DEF_HELPER_FLAGS_3(mve_vld20b, TCG_CALL_NO_WG, void, env, i32, i32)
29
+DEF_HELPER_FLAGS_3(mve_vld20h, TCG_CALL_NO_WG, void, env, i32, i32)
30
+DEF_HELPER_FLAGS_3(mve_vld20w, TCG_CALL_NO_WG, void, env, i32, i32)
31
+
32
+DEF_HELPER_FLAGS_3(mve_vld21b, TCG_CALL_NO_WG, void, env, i32, i32)
33
+DEF_HELPER_FLAGS_3(mve_vld21h, TCG_CALL_NO_WG, void, env, i32, i32)
34
+DEF_HELPER_FLAGS_3(mve_vld21w, TCG_CALL_NO_WG, void, env, i32, i32)
35
+
36
+DEF_HELPER_FLAGS_3(mve_vld40b, TCG_CALL_NO_WG, void, env, i32, i32)
37
+DEF_HELPER_FLAGS_3(mve_vld40h, TCG_CALL_NO_WG, void, env, i32, i32)
38
+DEF_HELPER_FLAGS_3(mve_vld40w, TCG_CALL_NO_WG, void, env, i32, i32)
39
+
40
+DEF_HELPER_FLAGS_3(mve_vld41b, TCG_CALL_NO_WG, void, env, i32, i32)
41
+DEF_HELPER_FLAGS_3(mve_vld41h, TCG_CALL_NO_WG, void, env, i32, i32)
42
+DEF_HELPER_FLAGS_3(mve_vld41w, TCG_CALL_NO_WG, void, env, i32, i32)
43
+
44
+DEF_HELPER_FLAGS_3(mve_vld42b, TCG_CALL_NO_WG, void, env, i32, i32)
45
+DEF_HELPER_FLAGS_3(mve_vld42h, TCG_CALL_NO_WG, void, env, i32, i32)
46
+DEF_HELPER_FLAGS_3(mve_vld42w, TCG_CALL_NO_WG, void, env, i32, i32)
47
+
48
+DEF_HELPER_FLAGS_3(mve_vld43b, TCG_CALL_NO_WG, void, env, i32, i32)
49
+DEF_HELPER_FLAGS_3(mve_vld43h, TCG_CALL_NO_WG, void, env, i32, i32)
50
+DEF_HELPER_FLAGS_3(mve_vld43w, TCG_CALL_NO_WG, void, env, i32, i32)
51
+
52
+DEF_HELPER_FLAGS_3(mve_vst20b, TCG_CALL_NO_WG, void, env, i32, i32)
53
+DEF_HELPER_FLAGS_3(mve_vst20h, TCG_CALL_NO_WG, void, env, i32, i32)
54
+DEF_HELPER_FLAGS_3(mve_vst20w, TCG_CALL_NO_WG, void, env, i32, i32)
55
+
56
+DEF_HELPER_FLAGS_3(mve_vst21b, TCG_CALL_NO_WG, void, env, i32, i32)
57
+DEF_HELPER_FLAGS_3(mve_vst21h, TCG_CALL_NO_WG, void, env, i32, i32)
58
+DEF_HELPER_FLAGS_3(mve_vst21w, TCG_CALL_NO_WG, void, env, i32, i32)
59
+
60
+DEF_HELPER_FLAGS_3(mve_vst40b, TCG_CALL_NO_WG, void, env, i32, i32)
61
+DEF_HELPER_FLAGS_3(mve_vst40h, TCG_CALL_NO_WG, void, env, i32, i32)
62
+DEF_HELPER_FLAGS_3(mve_vst40w, TCG_CALL_NO_WG, void, env, i32, i32)
63
+
64
+DEF_HELPER_FLAGS_3(mve_vst41b, TCG_CALL_NO_WG, void, env, i32, i32)
65
+DEF_HELPER_FLAGS_3(mve_vst41h, TCG_CALL_NO_WG, void, env, i32, i32)
66
+DEF_HELPER_FLAGS_3(mve_vst41w, TCG_CALL_NO_WG, void, env, i32, i32)
67
+
68
+DEF_HELPER_FLAGS_3(mve_vst42b, TCG_CALL_NO_WG, void, env, i32, i32)
69
+DEF_HELPER_FLAGS_3(mve_vst42h, TCG_CALL_NO_WG, void, env, i32, i32)
70
+DEF_HELPER_FLAGS_3(mve_vst42w, TCG_CALL_NO_WG, void, env, i32, i32)
71
+
72
+DEF_HELPER_FLAGS_3(mve_vst43b, TCG_CALL_NO_WG, void, env, i32, i32)
73
+DEF_HELPER_FLAGS_3(mve_vst43h, TCG_CALL_NO_WG, void, env, i32, i32)
74
+DEF_HELPER_FLAGS_3(mve_vst43w, TCG_CALL_NO_WG, void, env, i32, i32)
75
+
76
DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
77
78
DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
79
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
80
index XXXXXXX..XXXXXXX 100644
81
--- a/target/arm/mve.decode
82
+++ b/target/arm/mve.decode
83
@@ -XXX,XX +XXX,XX @@
84
&vabav qn qm rda size
85
&vldst_sg qd qm rn size msize os
86
&vldst_sg_imm qd qm a w imm
87
+&vldst_il qd rn size pat w
88
89
# scatter-gather memory size is in bits 6:4
90
%sg_msize 6:1 4:1
91
@@ -XXX,XX +XXX,XX @@
92
@vldst_sg_imm .... .... a:1 . w:1 . .... .... .... . imm:7 &vldst_sg_imm \
93
qd=%qd qm=%qn
94
95
+# Deinterleaving load/interleaving store
96
+@vldst_il .... .... .. w:1 . rn:4 .... ... size:2 pat:2 ..... &vldst_il \
97
+ qd=%qd
98
+
99
@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
100
@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
101
@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
102
@@ -XXX,XX +XXX,XX @@ VLDRD_sg_imm 111 1 1101 ... 1 ... 0 ... 1 1111 .... .... @vldst_sg_imm
103
VSTRW_sg_imm 111 1 1101 ... 0 ... 0 ... 1 1110 .... .... @vldst_sg_imm
104
VSTRD_sg_imm 111 1 1101 ... 0 ... 0 ... 1 1111 .... .... @vldst_sg_imm
105
106
+# deinterleaving loads/interleaving stores
107
+VLD2 1111 1100 1 .. 1 .... ... 1 111 .. .. 00000 @vldst_il
108
+VLD4 1111 1100 1 .. 1 .... ... 1 111 .. .. 00001 @vldst_il
109
+VST2 1111 1100 1 .. 0 .... ... 1 111 .. .. 00000 @vldst_il
110
+VST4 1111 1100 1 .. 0 .... ... 1 111 .. .. 00001 @vldst_il
111
+
112
# Moves between 2 32-bit vector lanes and 2 general purpose registers
113
VMOV_to_2gp 1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
114
VMOV_from_2gp 1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
115
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
116
index XXXXXXX..XXXXXXX 100644
117
--- a/target/arm/mve_helper.c
118
+++ b/target/arm/mve_helper.c
119
@@ -XXX,XX +XXX,XX @@ DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
120
DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
121
DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
122
123
+/*
124
+ * Deinterleaving loads/interleaving stores.
125
+ *
126
+ * For these helpers we are passed the index of the first Qreg
127
+ * (VLD2/VST2 will also access Qn+1, VLD4/VST4 access Qn .. Qn+3)
128
+ * and the value of the base address register Rn.
129
+ * The helpers are specialized for pattern and element size, so
130
+ * for instance vld42h is VLD4 with pattern 2, element size MO_16.
131
+ *
132
+ * These insns are beatwise but not predicated, so we must honour ECI,
133
+ * but need not look at mve_element_mask().
134
+ *
135
+ * The pseudocode implements these insns with multiple memory accesses
136
+ * of the element size, but rules R_VVVG and R_FXDM permit us to make
137
+ * one 32-bit memory access per beat.
138
+ */
139
+#define DO_VLD4B(OP, O1, O2, O3, O4) \
140
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
141
+ uint32_t base) \
142
+ { \
143
+ int beat, e; \
144
+ uint16_t mask = mve_eci_mask(env); \
145
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
146
+ uint32_t addr, data; \
147
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
148
+ if ((mask & 1) == 0) { \
149
+ /* ECI says skip this beat */ \
150
+ continue; \
151
+ } \
152
+ addr = base + off[beat] * 4; \
153
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
154
+ for (e = 0; e < 4; e++, data >>= 8) { \
155
+ uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
156
+ qd[H1(off[beat])] = data; \
157
+ } \
158
+ } \
159
+ }
160
+
161
+#define DO_VLD4H(OP, O1, O2) \
162
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
163
+ uint32_t base) \
164
+ { \
165
+ int beat; \
166
+ uint16_t mask = mve_eci_mask(env); \
167
+ static const uint8_t off[4] = { O1, O1, O2, O2 }; \
168
+ uint32_t addr, data; \
169
+ int y; /* y counts 0 2 0 2 */ \
170
+ uint16_t *qd; \
171
+ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \
172
+ if ((mask & 1) == 0) { \
173
+ /* ECI says skip this beat */ \
174
+ continue; \
175
+ } \
176
+ addr = base + off[beat] * 8 + (beat & 1) * 4; \
177
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
178
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y); \
179
+ qd[H2(off[beat])] = data; \
180
+ data >>= 16; \
181
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1); \
182
+ qd[H2(off[beat])] = data; \
183
+ } \
184
+ }
185
+
186
+#define DO_VLD4W(OP, O1, O2, O3, O4) \
187
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
188
+ uint32_t base) \
189
+ { \
190
+ int beat; \
191
+ uint16_t mask = mve_eci_mask(env); \
192
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
193
+ uint32_t addr, data; \
194
+ uint32_t *qd; \
195
+ int y; \
196
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
197
+ if ((mask & 1) == 0) { \
198
+ /* ECI says skip this beat */ \
199
+ continue; \
200
+ } \
201
+ addr = base + off[beat] * 4; \
202
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
203
+ y = (beat + (O1 & 2)) & 3; \
204
+ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \
205
+ qd[H4(off[beat] >> 2)] = data; \
206
+ } \
207
+ }
208
+
209
+DO_VLD4B(vld40b, 0, 1, 10, 11)
210
+DO_VLD4B(vld41b, 2, 3, 12, 13)
211
+DO_VLD4B(vld42b, 4, 5, 14, 15)
212
+DO_VLD4B(vld43b, 6, 7, 8, 9)
213
+
214
+DO_VLD4H(vld40h, 0, 5)
215
+DO_VLD4H(vld41h, 1, 6)
216
+DO_VLD4H(vld42h, 2, 7)
217
+DO_VLD4H(vld43h, 3, 4)
218
+
219
+DO_VLD4W(vld40w, 0, 1, 10, 11)
220
+DO_VLD4W(vld41w, 2, 3, 12, 13)
221
+DO_VLD4W(vld42w, 4, 5, 14, 15)
222
+DO_VLD4W(vld43w, 6, 7, 8, 9)
223
+
224
+#define DO_VLD2B(OP, O1, O2, O3, O4) \
225
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
226
+ uint32_t base) \
227
+ { \
228
+ int beat, e; \
229
+ uint16_t mask = mve_eci_mask(env); \
230
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
231
+ uint32_t addr, data; \
232
+ uint8_t *qd; \
233
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
234
+ if ((mask & 1) == 0) { \
235
+ /* ECI says skip this beat */ \
236
+ continue; \
237
+ } \
238
+ addr = base + off[beat] * 2; \
239
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
240
+ for (e = 0; e < 4; e++, data >>= 8) { \
241
+ qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \
242
+ qd[H1(off[beat] + (e >> 1))] = data; \
243
+ } \
244
+ } \
245
+ }
246
+
247
+#define DO_VLD2H(OP, O1, O2, O3, O4) \
248
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
249
+ uint32_t base) \
250
+ { \
251
+ int beat; \
252
+ uint16_t mask = mve_eci_mask(env); \
253
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
254
+ uint32_t addr, data; \
255
+ int e; \
256
+ uint16_t *qd; \
257
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
258
+ if ((mask & 1) == 0) { \
259
+ /* ECI says skip this beat */ \
260
+ continue; \
261
+ } \
262
+ addr = base + off[beat] * 4; \
263
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
264
+ for (e = 0; e < 2; e++, data >>= 16) { \
265
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \
266
+ qd[H2(off[beat])] = data; \
267
+ } \
268
+ } \
269
+ }
270
+
271
+#define DO_VLD2W(OP, O1, O2, O3, O4) \
272
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
273
+ uint32_t base) \
274
+ { \
275
+ int beat; \
276
+ uint16_t mask = mve_eci_mask(env); \
277
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
278
+ uint32_t addr, data; \
279
+ uint32_t *qd; \
280
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
281
+ if ((mask & 1) == 0) { \
282
+ /* ECI says skip this beat */ \
283
+ continue; \
284
+ } \
285
+ addr = base + off[beat]; \
286
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
287
+ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \
288
+ qd[H4(off[beat] >> 3)] = data; \
289
+ } \
290
+ }
291
+
292
+DO_VLD2B(vld20b, 0, 2, 12, 14)
293
+DO_VLD2B(vld21b, 4, 6, 8, 10)
294
+
295
+DO_VLD2H(vld20h, 0, 1, 6, 7)
296
+DO_VLD2H(vld21h, 2, 3, 4, 5)
297
+
298
+DO_VLD2W(vld20w, 0, 4, 24, 28)
299
+DO_VLD2W(vld21w, 8, 12, 16, 20)
300
+
301
+#define DO_VST4B(OP, O1, O2, O3, O4) \
302
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
303
+ uint32_t base) \
304
+ { \
305
+ int beat, e; \
306
+ uint16_t mask = mve_eci_mask(env); \
307
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
308
+ uint32_t addr, data; \
309
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
310
+ if ((mask & 1) == 0) { \
311
+ /* ECI says skip this beat */ \
312
+ continue; \
313
+ } \
314
+ addr = base + off[beat] * 4; \
315
+ data = 0; \
316
+ for (e = 3; e >= 0; e--) { \
317
+ uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
318
+ data = (data << 8) | qd[H1(off[beat])]; \
319
+ } \
320
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
321
+ } \
322
+ }
323
+
324
+#define DO_VST4H(OP, O1, O2) \
325
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
326
+ uint32_t base) \
327
+ { \
328
+ int beat; \
329
+ uint16_t mask = mve_eci_mask(env); \
330
+ static const uint8_t off[4] = { O1, O1, O2, O2 }; \
331
+ uint32_t addr, data; \
332
+ int y; /* y counts 0 2 0 2 */ \
333
+ uint16_t *qd; \
334
+ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \
335
+ if ((mask & 1) == 0) { \
336
+ /* ECI says skip this beat */ \
337
+ continue; \
338
+ } \
339
+ addr = base + off[beat] * 8 + (beat & 1) * 4; \
340
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y); \
341
+ data = qd[H2(off[beat])]; \
342
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1); \
343
+ data |= qd[H2(off[beat])] << 16; \
344
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
345
+ } \
346
+ }
347
+
348
+#define DO_VST4W(OP, O1, O2, O3, O4) \
349
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
350
+ uint32_t base) \
351
+ { \
352
+ int beat; \
353
+ uint16_t mask = mve_eci_mask(env); \
354
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
355
+ uint32_t addr, data; \
356
+ uint32_t *qd; \
357
+ int y; \
358
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
359
+ if ((mask & 1) == 0) { \
360
+ /* ECI says skip this beat */ \
361
+ continue; \
362
+ } \
363
+ addr = base + off[beat] * 4; \
364
+ y = (beat + (O1 & 2)) & 3; \
365
+ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \
366
+ data = qd[H4(off[beat] >> 2)]; \
367
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
368
+ } \
369
+ }
370
+
371
+DO_VST4B(vst40b, 0, 1, 10, 11)
372
+DO_VST4B(vst41b, 2, 3, 12, 13)
373
+DO_VST4B(vst42b, 4, 5, 14, 15)
374
+DO_VST4B(vst43b, 6, 7, 8, 9)
375
+
376
+DO_VST4H(vst40h, 0, 5)
377
+DO_VST4H(vst41h, 1, 6)
378
+DO_VST4H(vst42h, 2, 7)
379
+DO_VST4H(vst43h, 3, 4)
380
+
381
+DO_VST4W(vst40w, 0, 1, 10, 11)
382
+DO_VST4W(vst41w, 2, 3, 12, 13)
383
+DO_VST4W(vst42w, 4, 5, 14, 15)
384
+DO_VST4W(vst43w, 6, 7, 8, 9)
385
+
386
+#define DO_VST2B(OP, O1, O2, O3, O4) \
387
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
388
+ uint32_t base) \
389
+ { \
390
+ int beat, e; \
391
+ uint16_t mask = mve_eci_mask(env); \
392
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
393
+ uint32_t addr, data; \
394
+ uint8_t *qd; \
395
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
396
+ if ((mask & 1) == 0) { \
397
+ /* ECI says skip this beat */ \
398
+ continue; \
399
+ } \
400
+ addr = base + off[beat] * 2; \
401
+ data = 0; \
402
+ for (e = 3; e >= 0; e--) { \
403
+ qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \
404
+ data = (data << 8) | qd[H1(off[beat] + (e >> 1))]; \
405
+ } \
406
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
407
+ } \
408
+ }
409
+
410
+#define DO_VST2H(OP, O1, O2, O3, O4) \
411
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
412
+ uint32_t base) \
413
+ { \
414
+ int beat; \
415
+ uint16_t mask = mve_eci_mask(env); \
416
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
417
+ uint32_t addr, data; \
418
+ int e; \
419
+ uint16_t *qd; \
420
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
421
+ if ((mask & 1) == 0) { \
422
+ /* ECI says skip this beat */ \
423
+ continue; \
424
+ } \
425
+ addr = base + off[beat] * 4; \
426
+ data = 0; \
427
+ for (e = 1; e >= 0; e--) { \
428
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \
429
+ data = (data << 16) | qd[H2(off[beat])]; \
430
+ } \
431
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
432
+ } \
433
+ }
434
+
435
+#define DO_VST2W(OP, O1, O2, O3, O4) \
436
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
437
+ uint32_t base) \
438
+ { \
439
+ int beat; \
440
+ uint16_t mask = mve_eci_mask(env); \
441
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
442
+ uint32_t addr, data; \
443
+ uint32_t *qd; \
444
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
445
+ if ((mask & 1) == 0) { \
446
+ /* ECI says skip this beat */ \
447
+ continue; \
448
+ } \
449
+ addr = base + off[beat]; \
450
+ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \
451
+ data = qd[H4(off[beat] >> 3)]; \
452
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
453
+ } \
454
+ }
455
+
456
+DO_VST2B(vst20b, 0, 2, 12, 14)
457
+DO_VST2B(vst21b, 4, 6, 8, 10)
458
+
459
+DO_VST2H(vst20h, 0, 1, 6, 7)
460
+DO_VST2H(vst21h, 2, 3, 4, 5)
461
+
462
+DO_VST2W(vst20w, 0, 4, 24, 28)
463
+DO_VST2W(vst21w, 8, 12, 16, 20)
464
+
465
/*
466
* The mergemask(D, R, M) macro performs the operation "*D = R" but
467
* storing only the bytes which correspond to 1 bits in M,
468
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
469
index XXXXXXX..XXXXXXX 100644
470
--- a/target/arm/translate-mve.c
471
+++ b/target/arm/translate-mve.c
472
@@ -XXX,XX +XXX,XX @@ static inline int vidup_imm(DisasContext *s, int x)
473
474
typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
475
typedef void MVEGenLdStSGFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
476
+typedef void MVEGenLdStIlFn(TCGv_ptr, TCGv_i32, TCGv_i32);
477
typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
478
typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
479
typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
480
@@ -XXX,XX +XXX,XX @@ static bool trans_VSTRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
481
return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
482
}
24
}
483
25
484
+static bool do_vldst_il(DisasContext *s, arg_vldst_il *a, MVEGenLdStIlFn *fn,
26
+/* Is CFG_REG2 present? */
485
+ int addrinc)
27
+static bool have_cfg2(MPS2SCC *s)
486
+{
28
+{
487
+ TCGv_i32 rn;
29
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
488
+
489
+ if (!dc_isar_feature(aa32_mve, s) ||
490
+ !mve_check_qreg_bank(s, a->qd) ||
491
+ !fn || (a->rn == 13 && a->w) || a->rn == 15) {
492
+ /* Variously UNPREDICTABLE or UNDEF or related-encoding */
493
+ return false;
494
+ }
495
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
496
+ return true;
497
+ }
498
+
499
+ rn = load_reg(s, a->rn);
500
+ /*
501
+ * We pass the index of Qd, not a pointer, because the helper must
502
+ * access multiple Q registers starting at Qd and working up.
503
+ */
504
+ fn(cpu_env, tcg_constant_i32(a->qd), rn);
505
+
506
+ if (a->w) {
507
+ tcg_gen_addi_i32(rn, rn, addrinc);
508
+ store_reg(s, a->rn, rn);
509
+ } else {
510
+ tcg_temp_free_i32(rn);
511
+ }
512
+ mve_update_and_store_eci(s);
513
+ return true;
514
+}
30
+}
515
+
31
+
516
+/* This macro is just to make the arrays more compact in these functions */
32
+/* Is CFG_REG3 present? */
517
+#define F(N) gen_helper_mve_##N
33
+static bool have_cfg3(MPS2SCC *s)
518
+
519
+static bool trans_VLD2(DisasContext *s, arg_vldst_il *a)
520
+{
34
+{
521
+ static MVEGenLdStIlFn * const fns[4][4] = {
35
+ return scc_partno(s) != 0x524 && scc_partno(s) != 0x547;
522
+ { F(vld20b), F(vld20h), F(vld20w), NULL, },
523
+ { F(vld21b), F(vld21h), F(vld21w), NULL, },
524
+ { NULL, NULL, NULL, NULL },
525
+ { NULL, NULL, NULL, NULL },
526
+ };
527
+ if (a->qd > 6) {
528
+ return false;
529
+ }
530
+ return do_vldst_il(s, a, fns[a->pat][a->size], 32);
531
+}
36
+}
532
+
37
+
533
+static bool trans_VLD4(DisasContext *s, arg_vldst_il *a)
38
+/* Is CFG_REG5 present? */
39
+static bool have_cfg5(MPS2SCC *s)
534
+{
40
+{
535
+ static MVEGenLdStIlFn * const fns[4][4] = {
41
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
536
+ { F(vld40b), F(vld40h), F(vld40w), NULL, },
537
+ { F(vld41b), F(vld41h), F(vld41w), NULL, },
538
+ { F(vld42b), F(vld42h), F(vld42w), NULL, },
539
+ { F(vld43b), F(vld43h), F(vld43w), NULL, },
540
+ };
541
+ if (a->qd > 4) {
542
+ return false;
543
+ }
544
+ return do_vldst_il(s, a, fns[a->pat][a->size], 64);
545
+}
42
+}
546
+
43
+
547
+static bool trans_VST2(DisasContext *s, arg_vldst_il *a)
44
+/* Is CFG_REG6 present? */
45
+static bool have_cfg6(MPS2SCC *s)
548
+{
46
+{
549
+ static MVEGenLdStIlFn * const fns[4][4] = {
47
+ return scc_partno(s) == 0x524;
550
+ { F(vst20b), F(vst20h), F(vst20w), NULL, },
551
+ { F(vst21b), F(vst21h), F(vst21w), NULL, },
552
+ { NULL, NULL, NULL, NULL },
553
+ { NULL, NULL, NULL, NULL },
554
+ };
555
+ if (a->qd > 6) {
556
+ return false;
557
+ }
558
+ return do_vldst_il(s, a, fns[a->pat][a->size], 32);
559
+}
48
+}
560
+
49
+
561
+static bool trans_VST4(DisasContext *s, arg_vldst_il *a)
50
/* Handle a write via the SYS_CFG channel to the specified function/device.
562
+{
51
* Return false on error (reported to guest via SYS_CFGCTRL ERROR bit).
563
+ static MVEGenLdStIlFn * const fns[4][4] = {
52
*/
564
+ { F(vst40b), F(vst40h), F(vst40w), NULL, },
53
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
565
+ { F(vst41b), F(vst41h), F(vst41w), NULL, },
54
r = s->cfg1;
566
+ { F(vst42b), F(vst42h), F(vst42w), NULL, },
55
break;
567
+ { F(vst43b), F(vst43h), F(vst43w), NULL, },
56
case A_CFG2:
568
+ };
57
- if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
569
+ if (a->qd > 4) {
58
- /* CFG2 reserved on other boards */
570
+ return false;
59
+ if (!have_cfg2(s)) {
571
+ }
60
goto bad_offset;
572
+ return do_vldst_il(s, a, fns[a->pat][a->size], 64);
61
}
573
+}
62
r = s->cfg2;
574
+
63
break;
575
+#undef F
64
case A_CFG3:
576
+
65
- if (scc_partno(s) == 0x524 || scc_partno(s) == 0x547) {
577
static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
66
- /* CFG3 reserved on AN524 */
578
{
67
+ if (!have_cfg3(s)) {
579
TCGv_ptr qd;
68
goto bad_offset;
69
}
70
/* These are user-settable DIP switches on the board. We don't
71
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
72
r = s->cfg4;
73
break;
74
case A_CFG5:
75
- if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
76
- /* CFG5 reserved on other boards */
77
+ if (!have_cfg5(s)) {
78
goto bad_offset;
79
}
80
r = s->cfg5;
81
break;
82
case A_CFG6:
83
- if (scc_partno(s) != 0x524) {
84
- /* CFG6 reserved on other boards */
85
+ if (!have_cfg6(s)) {
86
goto bad_offset;
87
}
88
r = s->cfg6;
89
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
90
}
91
break;
92
case A_CFG2:
93
- if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
94
- /* CFG2 reserved on other boards */
95
+ if (!have_cfg2(s)) {
96
goto bad_offset;
97
}
98
/* AN524: QSPI Select signal */
99
s->cfg2 = value;
100
break;
101
case A_CFG5:
102
- if (scc_partno(s) != 0x524 && scc_partno(s) != 0x547) {
103
- /* CFG5 reserved on other boards */
104
+ if (!have_cfg5(s)) {
105
goto bad_offset;
106
}
107
/* AN524: ACLK frequency in Hz */
108
s->cfg5 = value;
109
break;
110
case A_CFG6:
111
- if (scc_partno(s) != 0x524) {
112
- /* CFG6 reserved on other boards */
113
+ if (!have_cfg6(s)) {
114
goto bad_offset;
115
}
116
/* AN524: Clock divider for BRAM */
580
--
117
--
581
2.20.1
118
2.34.1
582
119
583
120
diff view generated by jsdifflib
1
Include the MVE VPR register value in the CPU dumps produced by
1
The MPS2 SCC device is broadly the same for all FPGA images, but has
2
arm_cpu_dump_state() if we are printing FPU information. This
2
minor differences in the behaviour of the CFG registers depending on
3
makes it easier to interpret debug logs when predication is
3
the image. In many cases we don't really care about the functionality
4
active.
4
controlled by these registers and a reads-as-written or similar
5
behaviour is sufficient for the moment.
6
7
For the AN536 the required behaviour is:
8
9
* A_CFG0 has CPU reset and halt bits
10
- implement as reads-as-written for the moment
11
* A_CFG1 has flash or ATCM address 0 remap handling
12
- QEMU doesn't model this; implement as reads-as-written
13
* A_CFG2 has QSPI select (like AN524)
14
- implemented (no behaviour, as with AN524)
15
* A_CFG3 is MCC_MSB_ADDR "additional MCC addressing bits"
16
- QEMU doesn't care about these, so use the existing
17
RAZ behaviour for convenience
18
* A_CFG4 is board rev (like all other images)
19
- no change needed
20
* A_CFG5 is ACLK frq in hz (like AN524)
21
- implemented as reads-as-written, as for other boards
22
* A_CFG6 is core 0 vector table base address
23
- implemented as reads-as-written for the moment
24
* A_CFG7 is core 1 vector table base address
25
- implemented as reads-as-written for the moment
26
27
Make the changes necessary for this; leave TODO comments where
28
appropriate to indicate where we might want to come back and
29
implement things like CPU reset.
30
31
The other aspects of the device specific to this FPGA image (like the
32
values of the board ID and similar registers) will be set via the
33
device's qdev properties.
5
34
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
35
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
36
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
37
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
38
Message-id: 20240206132931.38376-8-peter.maydell@linaro.org
8
---
39
---
9
target/arm/cpu.c | 3 +++
40
include/hw/misc/mps2-scc.h | 1 +
10
1 file changed, 3 insertions(+)
41
hw/misc/mps2-scc.c | 101 +++++++++++++++++++++++++++++++++----
11
42
2 files changed, 92 insertions(+), 10 deletions(-)
12
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
43
44
diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
13
index XXXXXXX..XXXXXXX 100644
45
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/cpu.c
46
--- a/include/hw/misc/mps2-scc.h
15
+++ b/target/arm/cpu.c
47
+++ b/include/hw/misc/mps2-scc.h
16
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_dump_state(CPUState *cs, FILE *f, int flags)
48
@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
17
i, v);
49
uint32_t cfg4;
18
}
50
uint32_t cfg5;
19
qemu_fprintf(f, "FPSCR: %08x\n", vfp_get_fpscr(env));
51
uint32_t cfg6;
20
+ if (cpu_isar_feature(aa32_mve, cpu)) {
52
+ uint32_t cfg7;
21
+ qemu_fprintf(f, "VPR: %08x\n", env->v7m.vpr);
53
uint32_t cfgdata_rtn;
54
uint32_t cfgdata_out;
55
uint32_t cfgctrl;
56
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
57
index XXXXXXX..XXXXXXX 100644
58
--- a/hw/misc/mps2-scc.c
59
+++ b/hw/misc/mps2-scc.c
60
@@ -XXX,XX +XXX,XX @@ REG32(CFG3, 0xc)
61
REG32(CFG4, 0x10)
62
REG32(CFG5, 0x14)
63
REG32(CFG6, 0x18)
64
+REG32(CFG7, 0x1c)
65
REG32(CFGDATA_RTN, 0xa0)
66
REG32(CFGDATA_OUT, 0xa4)
67
REG32(CFGCTRL, 0xa8)
68
@@ -XXX,XX +XXX,XX @@ static int scc_partno(MPS2SCC *s)
69
/* Is CFG_REG2 present? */
70
static bool have_cfg2(MPS2SCC *s)
71
{
72
- return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
73
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x547 ||
74
+ scc_partno(s) == 0x536;
75
}
76
77
/* Is CFG_REG3 present? */
78
static bool have_cfg3(MPS2SCC *s)
79
{
80
- return scc_partno(s) != 0x524 && scc_partno(s) != 0x547;
81
+ return scc_partno(s) != 0x524 && scc_partno(s) != 0x547 &&
82
+ scc_partno(s) != 0x536;
83
}
84
85
/* Is CFG_REG5 present? */
86
static bool have_cfg5(MPS2SCC *s)
87
{
88
- return scc_partno(s) == 0x524 || scc_partno(s) == 0x547;
89
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x547 ||
90
+ scc_partno(s) == 0x536;
91
}
92
93
/* Is CFG_REG6 present? */
94
static bool have_cfg6(MPS2SCC *s)
95
{
96
- return scc_partno(s) == 0x524;
97
+ return scc_partno(s) == 0x524 || scc_partno(s) == 0x536;
98
+}
99
+
100
+/* Is CFG_REG7 present? */
101
+static bool have_cfg7(MPS2SCC *s)
102
+{
103
+ return scc_partno(s) == 0x536;
104
+}
105
+
106
+/* Does CFG_REG0 drive the 'remap' GPIO output? */
107
+static bool cfg0_is_remap(MPS2SCC *s)
108
+{
109
+ return scc_partno(s) != 0x536;
110
+}
111
+
112
+/* Is CFG_REG1 driving a set of LEDs? */
113
+static bool cfg1_is_leds(MPS2SCC *s)
114
+{
115
+ return scc_partno(s) != 0x536;
116
}
117
118
/* Handle a write via the SYS_CFG channel to the specified function/device.
119
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
120
if (!have_cfg3(s)) {
121
goto bad_offset;
122
}
123
- /* These are user-settable DIP switches on the board. We don't
124
+ /*
125
+ * These are user-settable DIP switches on the board. We don't
126
* model that, so just return zeroes.
127
+ *
128
+ * TODO: for AN536 this is MCC_MSB_ADDR "additional MCC addressing
129
+ * bits". These change which part of the DDR4 the motherboard
130
+ * configuration controller can see in its memory map (see the
131
+ * appnote section 2.4). QEMU doesn't model the MCC at all, so these
132
+ * bits are not interesting to us; read-as-zero is as good as anything
133
+ * else.
134
*/
135
r = 0;
136
break;
137
@@ -XXX,XX +XXX,XX @@ static uint64_t mps2_scc_read(void *opaque, hwaddr offset, unsigned size)
138
}
139
r = s->cfg6;
140
break;
141
+ case A_CFG7:
142
+ if (!have_cfg7(s)) {
143
+ goto bad_offset;
22
+ }
144
+ }
145
+ r = s->cfg7;
146
+ break;
147
case A_CFGDATA_RTN:
148
r = s->cfgdata_rtn;
149
break;
150
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
151
* we always reflect bit 0 in the 'remap' GPIO output line,
152
* and let the board wire it up or not as it chooses.
153
* TODO on some boards bit 1 is CPU_WAIT.
154
+ *
155
+ * TODO: on the AN536 this register controls reset and halt
156
+ * for both CPUs. For the moment we don't implement this, so the
157
+ * register just reads as written.
158
*/
159
s->cfg0 = value;
160
- qemu_set_irq(s->remap, s->cfg0 & 1);
161
+ if (cfg0_is_remap(s)) {
162
+ qemu_set_irq(s->remap, s->cfg0 & 1);
163
+ }
164
break;
165
case A_CFG1:
166
s->cfg1 = value;
167
- for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
168
- led_set_state(s->led[i], extract32(value, i, 1));
169
+ /*
170
+ * On most boards this register drives LEDs.
171
+ *
172
+ * TODO: for AN536 this controls whether flash and ATCM are
173
+ * enabled or disabled on reset. QEMU doesn't model this, and
174
+ * always wires up RAM in the ATCM area and ROM in the flash area.
175
+ */
176
+ if (cfg1_is_leds(s)) {
177
+ for (size_t i = 0; i < ARRAY_SIZE(s->led); i++) {
178
+ led_set_state(s->led[i], extract32(value, i, 1));
179
+ }
180
}
181
break;
182
case A_CFG2:
183
if (!have_cfg2(s)) {
184
goto bad_offset;
185
}
186
- /* AN524: QSPI Select signal */
187
+ /* AN524, AN536: QSPI Select signal */
188
s->cfg2 = value;
189
break;
190
case A_CFG5:
191
if (!have_cfg5(s)) {
192
goto bad_offset;
193
}
194
- /* AN524: ACLK frequency in Hz */
195
+ /* AN524, AN536: ACLK frequency in Hz */
196
s->cfg5 = value;
197
break;
198
case A_CFG6:
199
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
200
goto bad_offset;
201
}
202
/* AN524: Clock divider for BRAM */
203
+ /* AN536: Core 0 vector table base address */
204
+ s->cfg6 = value;
205
+ break;
206
+ case A_CFG7:
207
+ if (!have_cfg7(s)) {
208
+ goto bad_offset;
209
+ }
210
+ /* AN536: Core 1 vector table base address */
211
s->cfg6 = value;
212
break;
213
case A_CFGDATA_OUT:
214
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_finalize(Object *obj)
215
g_free(s->oscclk_reset);
216
}
217
218
+static bool cfg7_needed(void *opaque)
219
+{
220
+ MPS2SCC *s = opaque;
221
+
222
+ return have_cfg7(s);
223
+}
224
+
225
+static const VMStateDescription vmstate_cfg7 = {
226
+ .name = "mps2-scc/cfg7",
227
+ .version_id = 1,
228
+ .minimum_version_id = 1,
229
+ .needed = cfg7_needed,
230
+ .fields = (const VMStateField[]) {
231
+ VMSTATE_UINT32(cfg7, MPS2SCC),
232
+ VMSTATE_END_OF_LIST()
233
+ }
234
+};
235
+
236
static const VMStateDescription mps2_scc_vmstate = {
237
.name = "mps2-scc",
238
.version_id = 3,
239
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription mps2_scc_vmstate = {
240
VMSTATE_VARRAY_UINT32(oscclk, MPS2SCC, num_oscclk,
241
0, vmstate_info_uint32, uint32_t),
242
VMSTATE_END_OF_LIST()
243
+ },
244
+ .subsections = (const VMStateDescription * const []) {
245
+ &vmstate_cfg7,
246
+ NULL
23
}
247
}
24
}
248
};
25
249
26
--
250
--
27
2.20.1
251
2.34.1
28
252
29
253
diff view generated by jsdifflib
Deleted patch
1
In the MVE shift-and-insert insns, we special case VSLI by 0
2
and VSRI by <dt>. VSRI by <dt> means "don't update the destination",
3
which is what we've implemented. However VSLI by 0 is "set
4
destination to the input", so we don't want to use the same
5
special-casing that we do for VSRI by <dt>.
6
1
7
Since the generic logic gives the right answer for a shift
8
by 0, just use that.
9
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
---
13
target/arm/mve_helper.c | 9 +++++----
14
1 file changed, 5 insertions(+), 4 deletions(-)
15
16
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/mve_helper.c
19
+++ b/target/arm/mve_helper.c
20
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
21
uint16_t mask; \
22
uint64_t shiftmask; \
23
unsigned e; \
24
- if (shift == 0 || shift == ESIZE * 8) { \
25
+ if (shift == ESIZE * 8) { \
26
/* \
27
- * Only VSLI can shift by 0; only VSRI can shift by <dt>. \
28
- * The generic logic would give the right answer for 0 but \
29
- * fails for <dt>. \
30
+ * Only VSRI can shift by <dt>; it should mean "don't \
31
+ * update the destination". The generic logic can't handle \
32
+ * this because it would try to shift by an out-of-range \
33
+ * amount, so special case it here. \
34
*/ \
35
goto done; \
36
} \
37
--
38
2.20.1
39
40
diff view generated by jsdifflib
Deleted patch
1
A cut-and-paste error meant we handled signed VADDV like
2
unsigned VADDV; fix the type used.
3
1
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
---
7
target/arm/mve_helper.c | 6 +++---
8
1 file changed, 3 insertions(+), 3 deletions(-)
9
10
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
11
index XXXXXXX..XXXXXXX 100644
12
--- a/target/arm/mve_helper.c
13
+++ b/target/arm/mve_helper.c
14
@@ -XXX,XX +XXX,XX @@ DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
15
return ra; \
16
} \
17
18
-DO_VADDV(vaddvsb, 1, uint8_t)
19
-DO_VADDV(vaddvsh, 2, uint16_t)
20
-DO_VADDV(vaddvsw, 4, uint32_t)
21
+DO_VADDV(vaddvsb, 1, int8_t)
22
+DO_VADDV(vaddvsh, 2, int16_t)
23
+DO_VADDV(vaddvsw, 4, int32_t)
24
DO_VADDV(vaddvub, 1, uint8_t)
25
DO_VADDV(vaddvuh, 2, uint16_t)
26
DO_VADDV(vaddvuw, 4, uint32_t)
27
--
28
2.20.1
29
30
diff view generated by jsdifflib
Deleted patch
1
In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge
2
cases wrong and failed to saturate correctly:
3
1
4
(1) In do_sqrshl48_d() we used the same code that do_shrshl_bhs()
5
does to obtain the saturated most-negative and most-positive 48-bit
6
signed values for the large-shift-left case. This gives (1 << 47)
7
for saturate-to-most-negative, but we weren't sign-extending this
8
value to the 64-bit output as the pseudocode requires.
9
10
(2) For left shifts by less than 48, we copied the "8/16 bit" code
11
from do_sqrshl_bhs() and do_uqrshl_bhs(). This doesn't do the right
12
thing because it assumes the C type we're working with is at least
13
twice the number of bits we're saturating to (so that a shift left by
14
bits-1 can't shift anything off the top of the value). This isn't
15
true for bits == 48, so we would incorrectly return 0 rather than the
16
most-positive value for situations like "shift (1 << 44) right by
17
20". Instead check for saturation by doing the shift and signextend
18
and then testing whether shifting back left again gives the original
19
value.
20
21
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
22
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
23
---
24
target/arm/mve_helper.c | 12 +++++-------
25
1 file changed, 5 insertions(+), 7 deletions(-)
26
27
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
28
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/mve_helper.c
30
+++ b/target/arm/mve_helper.c
31
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
32
}
33
return src >> -shift;
34
} else if (shift < 48) {
35
- int64_t val = src << shift;
36
- int64_t extval = sextract64(val, 0, 48);
37
- if (!sat || val == extval) {
38
+ int64_t extval = sextract64(src << shift, 0, 48);
39
+ if (!sat || src == (extval >> shift)) {
40
return extval;
41
}
42
} else if (!sat || src == 0) {
43
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
44
}
45
46
*sat = 1;
47
- return (1ULL << 47) - (src >= 0);
48
+ return src >= 0 ? MAKE_64BIT_MASK(0, 47) : MAKE_64BIT_MASK(47, 17);
49
}
50
51
/* Operate on 64-bit values, but saturate at 48 bits */
52
@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
53
return extval;
54
}
55
} else if (shift < 48) {
56
- uint64_t val = src << shift;
57
- uint64_t extval = extract64(val, 0, 48);
58
- if (!sat || val == extval) {
59
+ uint64_t extval = extract64(src << shift, 0, 48);
60
+ if (!sat || src == (extval >> shift)) {
61
return extval;
62
}
63
} else if (!sat || src == 0) {
64
--
65
2.20.1
66
67
diff view generated by jsdifflib
Deleted patch
1
We got an edge case wrong in the 48-bit SQRSHRL implementation: if
2
the shift is to the right, although it always makes the result
3
smaller than the input value it might not be within the 48-bit range
4
the result is supposed to be if the input had some bits in [63..48]
5
set and the shift didn't bring all of those within the [47..0] range.
6
1
7
Handle this similarly to the way we already do for this case in
8
do_uqrshl48_d(): extend the calculated result from 48 bits,
9
and return that if not saturating or if it doesn't change the
10
result; otherwise fall through to return a saturated value.
11
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
14
---
15
target/arm/mve_helper.c | 11 +++++++++--
16
1 file changed, 9 insertions(+), 2 deletions(-)
17
18
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
19
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/mve_helper.c
21
+++ b/target/arm/mve_helper.c
22
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
23
static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
24
bool round, uint32_t *sat)
25
{
26
+ int64_t val, extval;
27
+
28
if (shift <= -48) {
29
/* Rounding the sign bit always produces 0. */
30
if (round) {
31
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
32
} else if (shift < 0) {
33
if (round) {
34
src >>= -shift - 1;
35
- return (src >> 1) + (src & 1);
36
+ val = (src >> 1) + (src & 1);
37
+ } else {
38
+ val = src >> -shift;
39
+ }
40
+ extval = sextract64(val, 0, 48);
41
+ if (!sat || val == extval) {
42
+ return extval;
43
}
44
- return src >> -shift;
45
} else if (shift < 48) {
46
int64_t extval = sextract64(src << shift, 0, 48);
47
if (!sat || src == (extval >> shift)) {
48
--
49
2.20.1
50
51
diff view generated by jsdifflib
Deleted patch
1
In mve_element_mask(), we calculate a mask for tail predication which
2
should have a number of 1 bits based on the value of LR. However,
3
our MAKE_64BIT_MASK() macro has undefined behaviour when passed a
4
zero length. Special case this to give the all-zeroes mask we
5
require.
6
1
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
---
10
target/arm/mve_helper.c | 3 ++-
11
1 file changed, 2 insertions(+), 1 deletion(-)
12
13
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/mve_helper.c
16
+++ b/target/arm/mve_helper.c
17
@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
18
*/
19
int masklen = env->regs[14] << env->v7m.ltpsize;
20
assert(masklen <= 16);
21
- mask &= MAKE_64BIT_MASK(0, masklen);
22
+ uint16_t ltpmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
23
+ mask &= ltpmask;
24
}
25
26
if ((env->condexec_bits & 0xf) == 0) {
27
--
28
2.20.1
29
30
diff view generated by jsdifflib
Deleted patch
1
We were not paying attention to the ECI state when advancing the VPT
2
state. Architecturally, VPT state advance happens for every beat
3
(see the pseudocode VPTAdvance()), so on every beat the 4 bits of
4
VPR.P0 corresponding to the current beat are inverted if required,
5
and at the end of beats 1 and 3 the VPR MASK fields are updated.
6
This means that if the ECI state says we should not be executing all
7
4 beats then we need to skip some of the updating of the VPR that we
8
currently do in mve_advance_vpt().
9
1
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
---
13
target/arm/mve_helper.c | 24 +++++++++++++++++-------
14
1 file changed, 17 insertions(+), 7 deletions(-)
15
16
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
17
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/mve_helper.c
19
+++ b/target/arm/mve_helper.c
20
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
21
/* Advance the VPT and ECI state if necessary */
22
uint32_t vpr = env->v7m.vpr;
23
unsigned mask01, mask23;
24
+ uint16_t inv_mask;
25
+ uint16_t eci_mask = mve_eci_mask(env);
26
27
if ((env->condexec_bits & 0xf) == 0) {
28
env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
29
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
30
return;
31
}
32
33
+ /* Invert P0 bits if needed, but only for beats we actually executed */
34
mask01 = FIELD_EX32(vpr, V7M_VPR, MASK01);
35
mask23 = FIELD_EX32(vpr, V7M_VPR, MASK23);
36
- if (mask01 > 8) {
37
- /* high bit set, but not 0b1000: invert the relevant half of P0 */
38
- vpr ^= 0xff;
39
+ /* Start by assuming we invert all bits corresponding to executed beats */
40
+ inv_mask = eci_mask;
41
+ if (mask01 <= 8) {
42
+ /* MASK01 says don't invert low half of P0 */
43
+ inv_mask &= ~0xff;
44
}
45
- if (mask23 > 8) {
46
- /* high bit set, but not 0b1000: invert the relevant half of P0 */
47
- vpr ^= 0xff00;
48
+ if (mask23 <= 8) {
49
+ /* MASK23 says don't invert high half of P0 */
50
+ inv_mask &= ~0xff00;
51
}
52
- vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
53
+ vpr ^= inv_mask;
54
+ /* Only update MASK01 if beat 1 executed */
55
+ if (eci_mask & 0xf0) {
56
+ vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
57
+ }
58
+ /* Beat 3 always executes, so update MASK23 */
59
vpr = FIELD_DP32(vpr, V7M_VPR, MASK23, mask23 << 1);
60
env->v7m.vpr = vpr;
61
}
62
--
63
2.20.1
64
65
diff view generated by jsdifflib
Deleted patch
1
For vector loads, predicated elements are zeroed, instead of
2
retaining their previous values (as happens for most data
3
processing operations). This means we need to distinguish
4
"beat not executed due to ECI" (don't touch destination
5
element) from "beat executed but predicated out" (zero
6
destination element).
7
1
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
---
11
target/arm/mve_helper.c | 8 +++++---
12
1 file changed, 5 insertions(+), 3 deletions(-)
13
14
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/mve_helper.c
17
+++ b/target/arm/mve_helper.c
18
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
19
env->v7m.vpr = vpr;
20
}
21
22
-
23
+/* For loads, predicated lanes are zeroed instead of keeping their old values */
24
#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE) \
25
void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \
26
{ \
27
TYPE *d = vd; \
28
uint16_t mask = mve_element_mask(env); \
29
+ uint16_t eci_mask = mve_eci_mask(env); \
30
unsigned b, e; \
31
/* \
32
* R_SXTM allows the dest reg to become UNKNOWN for abandoned \
33
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
34
* then take an exception. \
35
*/ \
36
for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \
37
- if (mask & (1 << b)) { \
38
- d[H##ESIZE(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC()); \
39
+ if (eci_mask & (1 << b)) { \
40
+ d[H##ESIZE(e)] = (mask & (1 << b)) ? \
41
+ cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \
42
} \
43
addr += MSIZE; \
44
} \
45
--
46
2.20.1
47
48
diff view generated by jsdifflib
1
Implement the MVE VMLAS insn, which multiplies a vector by a vector
1
The AN536 is another FPGA image for the MPS3 development board. Unlike
2
and adds a scalar.
2
the existing FPGA images we already model, this board uses a Cortex-R
3
family CPU, and it does not use any equivalent to the M-profile
4
"Subsystem for Embedded" SoC-equivalent that we model in hw/arm/armsse.c.
5
It's therefore more convenient for us to model it as a completely
6
separate C file.
7
8
This commit adds the basic skeleton of the board model, and the
9
code to create all the RAM and ROM. We assume that we're probably
10
going to want to add more images in future, so use the same
11
base class/subclass setup that mps2-tz.c uses, even though at
12
the moment there's only a single subclass.
13
14
Following commits will add the CPUs and the peripherals.
3
15
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
17
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
18
Message-id: 20240206132931.38376-9-peter.maydell@linaro.org
6
---
19
---
7
target/arm/helper-mve.h | 4 ++++
20
MAINTAINERS | 3 +-
8
target/arm/mve.decode | 3 +++
21
configs/devices/arm-softmmu/default.mak | 1 +
9
target/arm/mve_helper.c | 26 ++++++++++++++++++++++++++
22
hw/arm/mps3r.c | 239 ++++++++++++++++++++++++
10
target/arm/translate-mve.c | 1 +
23
hw/arm/Kconfig | 5 +
11
4 files changed, 34 insertions(+)
24
hw/arm/meson.build | 1 +
12
25
5 files changed, 248 insertions(+), 1 deletion(-)
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
26
create mode 100644 hw/arm/mps3r.c
27
28
diff --git a/MAINTAINERS b/MAINTAINERS
14
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
30
--- a/MAINTAINERS
16
+++ b/target/arm/helper-mve.h
31
+++ b/MAINTAINERS
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i3
32
@@ -XXX,XX +XXX,XX @@ F: include/hw/misc/imx7_*.h
18
DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
33
F: hw/pci-host/designware.c
19
DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
34
F: include/hw/pci-host/designware.h
20
35
21
+DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
36
-MPS2
22
+DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
37
+MPS2 / MPS3
23
+DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
38
M: Peter Maydell <peter.maydell@linaro.org>
24
+
39
L: qemu-arm@nongnu.org
25
DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
40
S: Maintained
26
DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
41
F: hw/arm/mps2.c
27
DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
42
F: hw/arm/mps2-tz.c
28
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
43
+F: hw/arm/mps3r.c
44
F: hw/misc/mps2-*.c
45
F: include/hw/misc/mps2-*.h
46
F: hw/arm/armsse.c
47
diff --git a/configs/devices/arm-softmmu/default.mak b/configs/devices/arm-softmmu/default.mak
29
index XXXXXXX..XXXXXXX 100644
48
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/mve.decode
49
--- a/configs/devices/arm-softmmu/default.mak
31
+++ b/target/arm/mve.decode
50
+++ b/configs/devices/arm-softmmu/default.mak
32
@@ -XXX,XX +XXX,XX @@ VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
51
@@ -XXX,XX +XXX,XX @@ CONFIG_ARM_VIRT=y
33
VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
52
# CONFIG_INTEGRATOR=n
34
VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
53
# CONFIG_FSL_IMX31=n
35
54
# CONFIG_MUSICPAL=n
36
+# The U bit (28) is don't-care because it does not affect the result
55
+# CONFIG_MPS3R=n
37
+VMLAS 111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
56
# CONFIG_MUSCA=n
38
+
57
# CONFIG_CHEETAH=n
39
# Vector add across vector
58
# CONFIG_SX1=n
40
{
59
diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
41
VADDV 111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
60
new file mode 100644
42
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
61
index XXXXXXX..XXXXXXX
62
--- /dev/null
63
+++ b/hw/arm/mps3r.c
64
@@ -XXX,XX +XXX,XX @@
65
+/*
66
+ * Arm MPS3 board emulation for Cortex-R-based FPGA images.
67
+ * (For M-profile images see mps2.c and mps2tz.c.)
68
+ *
69
+ * Copyright (c) 2017 Linaro Limited
70
+ * Written by Peter Maydell
71
+ *
72
+ * This program is free software; you can redistribute it and/or modify
73
+ * it under the terms of the GNU General Public License version 2 or
74
+ * (at your option) any later version.
75
+ */
76
+
77
+/*
78
+ * The MPS3 is an FPGA based dev board. This file handles FPGA images
79
+ * which use the Cortex-R CPUs. We model these separately from the
80
+ * M-profile images, because on M-profile the FPGA image is based on
81
+ * a "Subsystem for Embedded" which is similar to an SoC, whereas
82
+ * the R-profile FPGA images don't have that abstraction layer.
83
+ *
84
+ * We model the following FPGA images here:
85
+ * "mps3-an536" -- dual Cortex-R52 as documented in Arm Application Note AN536
86
+ *
87
+ * Application Note AN536:
88
+ * https://developer.arm.com/documentation/dai0536/latest/
89
+ */
90
+
91
+#include "qemu/osdep.h"
92
+#include "qemu/units.h"
93
+#include "qapi/error.h"
94
+#include "exec/address-spaces.h"
95
+#include "cpu.h"
96
+#include "hw/boards.h"
97
+#include "hw/arm/boot.h"
98
+
99
+/* Define the layout of RAM and ROM in a board */
100
+typedef struct RAMInfo {
101
+ const char *name;
102
+ hwaddr base;
103
+ hwaddr size;
104
+ int mrindex; /* index into rams[]; -1 for the system RAM block */
105
+ int flags;
106
+} RAMInfo;
107
+
108
+/*
109
+ * The MPS3 DDR is 3GiB, but on a 32-bit host QEMU doesn't permit
110
+ * emulation of that much guest RAM, so artificially make it smaller.
111
+ */
112
+#if HOST_LONG_BITS == 32
113
+#define MPS3_DDR_SIZE (1 * GiB)
114
+#else
115
+#define MPS3_DDR_SIZE (3 * GiB)
116
+#endif
117
+
118
+/*
119
+ * Flag values:
120
+ * IS_MAIN: this is the main machine RAM
121
+ * IS_ROM: this area is read-only
122
+ */
123
+#define IS_MAIN 1
124
+#define IS_ROM 2
125
+
126
+#define MPS3R_RAM_MAX 9
127
+
128
+typedef enum MPS3RFPGAType {
129
+ FPGA_AN536,
130
+} MPS3RFPGAType;
131
+
132
+struct MPS3RMachineClass {
133
+ MachineClass parent;
134
+ MPS3RFPGAType fpga_type;
135
+ const RAMInfo *raminfo;
136
+};
137
+
138
+struct MPS3RMachineState {
139
+ MachineState parent;
140
+ MemoryRegion ram[MPS3R_RAM_MAX];
141
+};
142
+
143
+#define TYPE_MPS3R_MACHINE "mps3r"
144
+#define TYPE_MPS3R_AN536_MACHINE MACHINE_TYPE_NAME("mps3-an536")
145
+
146
+OBJECT_DECLARE_TYPE(MPS3RMachineState, MPS3RMachineClass, MPS3R_MACHINE)
147
+
148
+static const RAMInfo an536_raminfo[] = {
149
+ {
150
+ .name = "ATCM",
151
+ .base = 0x00000000,
152
+ .size = 0x00008000,
153
+ .mrindex = 0,
154
+ }, {
155
+ /* We model the QSPI flash as simple ROM for now */
156
+ .name = "QSPI",
157
+ .base = 0x08000000,
158
+ .size = 0x00800000,
159
+ .flags = IS_ROM,
160
+ .mrindex = 1,
161
+ }, {
162
+ .name = "BRAM",
163
+ .base = 0x10000000,
164
+ .size = 0x00080000,
165
+ .mrindex = 2,
166
+ }, {
167
+ .name = "DDR",
168
+ .base = 0x20000000,
169
+ .size = MPS3_DDR_SIZE,
170
+ .mrindex = -1,
171
+ }, {
172
+ .name = "ATCM0",
173
+ .base = 0xee000000,
174
+ .size = 0x00008000,
175
+ .mrindex = 3,
176
+ }, {
177
+ .name = "BTCM0",
178
+ .base = 0xee100000,
179
+ .size = 0x00008000,
180
+ .mrindex = 4,
181
+ }, {
182
+ .name = "CTCM0",
183
+ .base = 0xee200000,
184
+ .size = 0x00008000,
185
+ .mrindex = 5,
186
+ }, {
187
+ .name = "ATCM1",
188
+ .base = 0xee400000,
189
+ .size = 0x00008000,
190
+ .mrindex = 6,
191
+ }, {
192
+ .name = "BTCM1",
193
+ .base = 0xee500000,
194
+ .size = 0x00008000,
195
+ .mrindex = 7,
196
+ }, {
197
+ .name = "CTCM1",
198
+ .base = 0xee600000,
199
+ .size = 0x00008000,
200
+ .mrindex = 8,
201
+ }, {
202
+ .name = NULL,
203
+ }
204
+};
205
+
206
+static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
207
+ const RAMInfo *raminfo)
208
+{
209
+ /* Return an initialized MemoryRegion for the RAMInfo. */
210
+ MemoryRegion *ram;
211
+
212
+ if (raminfo->mrindex < 0) {
213
+ /* Means this RAMInfo is for QEMU's "system memory" */
214
+ MachineState *machine = MACHINE(mms);
215
+ assert(!(raminfo->flags & IS_ROM));
216
+ return machine->ram;
217
+ }
218
+
219
+ assert(raminfo->mrindex < MPS3R_RAM_MAX);
220
+ ram = &mms->ram[raminfo->mrindex];
221
+
222
+ memory_region_init_ram(ram, NULL, raminfo->name,
223
+ raminfo->size, &error_fatal);
224
+ if (raminfo->flags & IS_ROM) {
225
+ memory_region_set_readonly(ram, true);
226
+ }
227
+ return ram;
228
+}
229
+
230
+static void mps3r_common_init(MachineState *machine)
231
+{
232
+ MPS3RMachineState *mms = MPS3R_MACHINE(machine);
233
+ MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
234
+ MemoryRegion *sysmem = get_system_memory();
235
+
236
+ for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
237
+ MemoryRegion *mr = mr_for_raminfo(mms, ri);
238
+ memory_region_add_subregion(sysmem, ri->base, mr);
239
+ }
240
+}
241
+
242
+static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
243
+{
244
+ /*
245
+ * Set mc->default_ram_size and default_ram_id from the
246
+ * information in mmc->raminfo.
247
+ */
248
+ MachineClass *mc = MACHINE_CLASS(mmc);
249
+ const RAMInfo *p;
250
+
251
+ for (p = mmc->raminfo; p->name; p++) {
252
+ if (p->mrindex < 0) {
253
+ /* Found the entry for "system memory" */
254
+ mc->default_ram_size = p->size;
255
+ mc->default_ram_id = p->name;
256
+ return;
257
+ }
258
+ }
259
+ g_assert_not_reached();
260
+}
261
+
262
+static void mps3r_class_init(ObjectClass *oc, void *data)
263
+{
264
+ MachineClass *mc = MACHINE_CLASS(oc);
265
+
266
+ mc->init = mps3r_common_init;
267
+}
268
+
269
+static void mps3r_an536_class_init(ObjectClass *oc, void *data)
270
+{
271
+ MachineClass *mc = MACHINE_CLASS(oc);
272
+ MPS3RMachineClass *mmc = MPS3R_MACHINE_CLASS(oc);
273
+ static const char * const valid_cpu_types[] = {
274
+ ARM_CPU_TYPE_NAME("cortex-r52"),
275
+ NULL
276
+ };
277
+
278
+ mc->desc = "ARM MPS3 with AN536 FPGA image for Cortex-R52";
279
+ mc->default_cpus = 2;
280
+ mc->min_cpus = mc->default_cpus;
281
+ mc->max_cpus = mc->default_cpus;
282
+ mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-r52");
283
+ mc->valid_cpu_types = valid_cpu_types;
284
+ mmc->raminfo = an536_raminfo;
285
+ mps3r_set_default_ram_info(mmc);
286
+}
287
+
288
+static const TypeInfo mps3r_machine_types[] = {
289
+ {
290
+ .name = TYPE_MPS3R_MACHINE,
291
+ .parent = TYPE_MACHINE,
292
+ .abstract = true,
293
+ .instance_size = sizeof(MPS3RMachineState),
294
+ .class_size = sizeof(MPS3RMachineClass),
295
+ .class_init = mps3r_class_init,
296
+ }, {
297
+ .name = TYPE_MPS3R_AN536_MACHINE,
298
+ .parent = TYPE_MPS3R_MACHINE,
299
+ .class_init = mps3r_an536_class_init,
300
+ },
301
+};
302
+
303
+DEFINE_TYPES(mps3r_machine_types);
304
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
43
index XXXXXXX..XXXXXXX 100644
305
index XXXXXXX..XXXXXXX 100644
44
--- a/target/arm/mve_helper.c
306
--- a/hw/arm/Kconfig
45
+++ b/target/arm/mve_helper.c
307
+++ b/hw/arm/Kconfig
46
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
308
@@ -XXX,XX +XXX,XX @@ config MAINSTONE
47
mve_advance_vpt(env); \
309
select PFLASH_CFI01
48
}
310
select SMC91C111
49
311
50
+/* "accumulating" version where FN takes d as well as n and m */
312
+config MPS3R
51
+#define DO_2OP_ACC_SCALAR(OP, ESIZE, TYPE, FN) \
313
+ bool
52
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
314
+ default y
53
+ uint32_t rm) \
315
+ depends on TCG && ARM
54
+ { \
316
+
55
+ TYPE *d = vd, *n = vn; \
317
config MUSCA
56
+ TYPE m = rm; \
318
bool
57
+ uint16_t mask = mve_element_mask(env); \
319
default y
58
+ unsigned e; \
320
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
59
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
60
+ mergemask(&d[H##ESIZE(e)], \
61
+ FN(d[H##ESIZE(e)], n[H##ESIZE(e)], m), mask); \
62
+ } \
63
+ mve_advance_vpt(env); \
64
+ }
65
+
66
/* provide unsigned 2-op scalar helpers for all sizes */
67
#define DO_2OP_SCALAR_U(OP, FN) \
68
DO_2OP_SCALAR(OP##b, 1, uint8_t, FN) \
69
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
70
DO_2OP_SCALAR(OP##h, 2, int16_t, FN) \
71
DO_2OP_SCALAR(OP##w, 4, int32_t, FN)
72
73
+#define DO_2OP_ACC_SCALAR_U(OP, FN) \
74
+ DO_2OP_ACC_SCALAR(OP##b, 1, uint8_t, FN) \
75
+ DO_2OP_ACC_SCALAR(OP##h, 2, uint16_t, FN) \
76
+ DO_2OP_ACC_SCALAR(OP##w, 4, uint32_t, FN)
77
+
78
DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
79
DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
80
DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
81
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
82
DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
83
DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
84
85
+/* Vector by vector plus scalar */
86
+#define DO_VMLAS(D, N, M) ((N) * (D) + (M))
87
+
88
+DO_2OP_ACC_SCALAR_U(vmlas, DO_VMLAS)
89
+
90
/*
91
* Long saturating scalar ops. As with DO_2OP_L, TYPE and H are for the
92
* input (smaller) type and LESIZE, LTYPE, LH for the output (long) type.
93
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
94
index XXXXXXX..XXXXXXX 100644
321
index XXXXXXX..XXXXXXX 100644
95
--- a/target/arm/translate-mve.c
322
--- a/hw/arm/meson.build
96
+++ b/target/arm/translate-mve.c
323
+++ b/hw/arm/meson.build
97
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
324
@@ -XXX,XX +XXX,XX @@ arm_ss.add(when: 'CONFIG_HIGHBANK', if_true: files('highbank.c'))
98
DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
325
arm_ss.add(when: 'CONFIG_INTEGRATOR', if_true: files('integratorcp.c'))
99
DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
326
arm_ss.add(when: 'CONFIG_MAINSTONE', if_true: files('mainstone.c'))
100
DO_2OP_SCALAR(VBRSR, vbrsr)
327
arm_ss.add(when: 'CONFIG_MICROBIT', if_true: files('microbit.c'))
101
+DO_2OP_SCALAR(VMLAS, vmlas)
328
+arm_ss.add(when: 'CONFIG_MPS3R', if_true: files('mps3r.c'))
102
329
arm_ss.add(when: 'CONFIG_MUSICPAL', if_true: files('musicpal.c'))
103
static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
330
arm_ss.add(when: 'CONFIG_NETDUINOPLUS2', if_true: files('netduinoplus2.c'))
104
{
331
arm_ss.add(when: 'CONFIG_OLIMEX_STM32_H405', if_true: files('olimex-stm32-h405.c'))
105
--
332
--
106
2.20.1
333
2.34.1
107
334
108
335
diff view generated by jsdifflib
1
Implement the MVE VMULL (polynomial) insn. Unlike Neon, this comes
1
Create the CPUs, the GIC, and the per-CPU RAM block for
2
in two flavours: 8x8->16 and a 16x16->32. Also unlike Neon, the
2
the mps3-an536 board.
3
inputs are in either the low or the high half of each double-width
4
element.
5
6
The assembler for this insn indicates the size with "P8" or "P16",
7
encoded into bit 28 as size = 0 or 1. We choose to follow the
8
same encoding as VQDMULL and decode this into a->size as MO_16
9
or MO_32 indicating the size of the result elements. This then
10
carries through to the helper function names where it then
11
matches up with the existing pmull_h() which does an 8x8->16
12
operation and a new pmull_w() which does the 16x16->32.
13
3
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
15
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20240206132931.38376-10-peter.maydell@linaro.org
16
---
6
---
17
target/arm/helper-mve.h | 5 +++++
7
hw/arm/mps3r.c | 180 ++++++++++++++++++++++++++++++++++++++++++++++++-
18
target/arm/vec_internal.h | 11 +++++++++++
8
1 file changed, 177 insertions(+), 3 deletions(-)
19
target/arm/mve.decode | 14 ++++++++++----
20
target/arm/mve_helper.c | 16 ++++++++++++++++
21
target/arm/translate-mve.c | 28 ++++++++++++++++++++++++++++
22
target/arm/vec_helper.c | 14 +++++++++++++-
23
6 files changed, 83 insertions(+), 5 deletions(-)
24
9
25
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
10
diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
26
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/helper-mve.h
12
--- a/hw/arm/mps3r.c
28
+++ b/target/arm/helper-mve.h
13
+++ b/hw/arm/mps3r.c
29
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
14
@@ -XXX,XX +XXX,XX @@
30
DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
15
#include "qemu/osdep.h"
31
DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
16
#include "qemu/units.h"
32
17
#include "qapi/error.h"
33
+DEF_HELPER_FLAGS_4(mve_vmullpbh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
18
+#include "qapi/qmp/qlist.h"
34
+DEF_HELPER_FLAGS_4(mve_vmullpth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
19
#include "exec/address-spaces.h"
35
+DEF_HELPER_FLAGS_4(mve_vmullpbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
20
#include "cpu.h"
36
+DEF_HELPER_FLAGS_4(mve_vmullptw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
#include "hw/boards.h"
37
+
22
+#include "hw/qdev-properties.h"
38
DEF_HELPER_FLAGS_4(mve_vqdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
#include "hw/arm/boot.h"
39
DEF_HELPER_FLAGS_4(mve_vqdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+#include "hw/arm/bsa.h"
40
DEF_HELPER_FLAGS_4(mve_vqdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+#include "hw/intc/arm_gicv3.h"
41
diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
26
42
index XXXXXXX..XXXXXXX 100644
27
/* Define the layout of RAM and ROM in a board */
43
--- a/target/arm/vec_internal.h
28
typedef struct RAMInfo {
44
+++ b/target/arm/vec_internal.h
29
@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
45
@@ -XXX,XX +XXX,XX @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *);
30
#define IS_ROM 2
46
int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
31
47
int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
32
#define MPS3R_RAM_MAX 9
33
+#define MPS3R_CPU_MAX 2
34
+
35
+#define PERIPHBASE 0xf0000000
36
+#define NUM_SPIS 96
37
38
typedef enum MPS3RFPGAType {
39
FPGA_AN536,
40
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineClass {
41
MachineClass parent;
42
MPS3RFPGAType fpga_type;
43
const RAMInfo *raminfo;
44
+ hwaddr loader_start;
45
};
46
47
struct MPS3RMachineState {
48
MachineState parent;
49
+ struct arm_boot_info bootinfo;
50
MemoryRegion ram[MPS3R_RAM_MAX];
51
+ Object *cpu[MPS3R_CPU_MAX];
52
+ MemoryRegion cpu_sysmem[MPS3R_CPU_MAX];
53
+ MemoryRegion sysmem_alias[MPS3R_CPU_MAX];
54
+ MemoryRegion cpu_ram[MPS3R_CPU_MAX];
55
+ GICv3State gic;
56
};
57
58
#define TYPE_MPS3R_MACHINE "mps3r"
59
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
60
return ram;
61
}
48
62
49
+/*
63
+/*
50
+ * 8 x 8 -> 16 vector polynomial multiply where the inputs are
64
+ * There is no defined secondary boot protocol for Linux for the AN536,
51
+ * in the low 8 bits of each 16-bit element
65
+ * because real hardware has a restriction that atomic operations between
52
+*/
66
+ * the two CPUs do not function correctly, and so true SMP is not
53
+uint64_t pmull_h(uint64_t op1, uint64_t op2);
67
+ * possible. Therefore for cases where the user is directly booting
54
+/*
68
+ * a kernel, we treat the system as essentially uniprocessor, and
55
+ * 16 x 16 -> 32 vector polynomial multiply where the inputs are
69
+ * put the secondary CPU into power-off state (as if the user on the
56
+ * in the low 16 bits of each 32-bit element
70
+ * real hardware had configured the secondary to be halted via the
71
+ * SCC config registers).
72
+ *
73
+ * Note that the default secondary boot code would not work here anyway
74
+ * as it assumes a GICv2, and we have a GICv3.
57
+ */
75
+ */
58
+uint64_t pmull_w(uint64_t op1, uint64_t op2);
76
+static void mps3r_write_secondary_boot(ARMCPU *cpu,
59
+
77
+ const struct arm_boot_info *info)
60
#endif /* TARGET_ARM_VEC_INTERNALS_H */
61
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
62
index XXXXXXX..XXXXXXX 100644
63
--- a/target/arm/mve.decode
64
+++ b/target/arm/mve.decode
65
@@ -XXX,XX +XXX,XX @@ VHADD_U 111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
66
VHSUB_S 111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
67
VHSUB_U 111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
68
69
-VMULL_BS 111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
70
-VMULL_BU 111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
71
-VMULL_TS 111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
72
-VMULL_TU 111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
73
+{
74
+ VMULLP_B 111 . 1110 0 . 11 ... 1 ... 0 1110 . 0 . 0 ... 0 @2op_sz28
75
+ VMULL_BS 111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
76
+ VMULL_BU 111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
77
+}
78
+{
79
+ VMULLP_T 111 . 1110 0 . 11 ... 1 ... 1 1110 . 0 . 0 ... 0 @2op_sz28
80
+ VMULL_TS 111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
81
+ VMULL_TU 111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
82
+}
83
84
VQDMULH 1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
85
VQRDMULH 1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
86
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
87
index XXXXXXX..XXXXXXX 100644
88
--- a/target/arm/mve_helper.c
89
+++ b/target/arm/mve_helper.c
90
@@ -XXX,XX +XXX,XX @@ DO_2OP_L(vmulltub, 1, 1, uint8_t, 2, uint16_t, DO_MUL)
91
DO_2OP_L(vmulltuh, 1, 2, uint16_t, 4, uint32_t, DO_MUL)
92
DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
93
94
+/*
95
+ * Polynomial multiply. We can always do this generating 64 bits
96
+ * of the result at a time, so we don't need to use DO_2OP_L.
97
+ */
98
+#define VMULLPH_MASK 0x00ff00ff00ff00ffULL
99
+#define VMULLPW_MASK 0x0000ffff0000ffffULL
100
+#define DO_VMULLPBH(N, M) pmull_h((N) & VMULLPH_MASK, (M) & VMULLPH_MASK)
101
+#define DO_VMULLPTH(N, M) DO_VMULLPBH((N) >> 8, (M) >> 8)
102
+#define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
103
+#define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
104
+
105
+DO_2OP(vmullpbh, 8, uint64_t, DO_VMULLPBH)
106
+DO_2OP(vmullpth, 8, uint64_t, DO_VMULLPTH)
107
+DO_2OP(vmullpbw, 8, uint64_t, DO_VMULLPBW)
108
+DO_2OP(vmullptw, 8, uint64_t, DO_VMULLPTW)
109
+
110
/*
111
* Because the computation type is at least twice as large as required,
112
* these work for both signed and unsigned source types.
113
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
114
index XXXXXXX..XXXXXXX 100644
115
--- a/target/arm/translate-mve.c
116
+++ b/target/arm/translate-mve.c
117
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
118
return do_2op(s, a, fns[a->size]);
119
}
120
121
+static bool trans_VMULLP_B(DisasContext *s, arg_2op *a)
122
+{
78
+{
123
+ /*
79
+ /*
124
+ * Note that a->size indicates the output size, ie VMULL.P8
80
+ * Power the secondary CPU off. This means we don't need to write any
125
+ * is the 8x8->16 operation and a->size is MO_16; VMULL.P16
81
+ * boot code into guest memory. Note that the 'cpu' argument to this
126
+ * is the 16x16->32 operation and a->size is MO_32.
82
+ * function is the primary CPU we passed to arm_load_kernel(), not
83
+ * the secondary. Loop around all the other CPUs, as the boot.c
84
+ * code does for the "disable secondaries if PSCI is enabled" case.
127
+ */
85
+ */
128
+ static MVEGenTwoOpFn * const fns[] = {
86
+ for (CPUState *cs = first_cpu; cs; cs = CPU_NEXT(cs)) {
129
+ NULL,
87
+ if (cs != first_cpu) {
130
+ gen_helper_mve_vmullpbh,
88
+ object_property_set_bool(OBJECT(cs), "start-powered-off", true,
131
+ gen_helper_mve_vmullpbw,
89
+ &error_abort);
132
+ NULL,
90
+ }
133
+ };
91
+ }
134
+ return do_2op(s, a, fns[a->size]);
135
+}
92
+}
136
+
93
+
137
+static bool trans_VMULLP_T(DisasContext *s, arg_2op *a)
94
+static void mps3r_secondary_cpu_reset(ARMCPU *cpu,
95
+ const struct arm_boot_info *info)
138
+{
96
+{
139
+ /* a->size is as for trans_VMULLP_B */
97
+ /* We don't need to do anything here because the CPU will be off */
140
+ static MVEGenTwoOpFn * const fns[] = {
141
+ NULL,
142
+ gen_helper_mve_vmullpth,
143
+ gen_helper_mve_vmullptw,
144
+ NULL,
145
+ };
146
+ return do_2op(s, a, fns[a->size]);
147
+}
98
+}
148
+
99
+
149
/*
100
+static void create_gic(MPS3RMachineState *mms, MemoryRegion *sysmem)
150
* VADC and VSBC: these perform an add-with-carry or subtract-with-carry
101
+{
151
* of the 32-bit elements in each lane of the input vectors, where the
102
+ MachineState *machine = MACHINE(mms);
152
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
103
+ DeviceState *gicdev;
153
index XXXXXXX..XXXXXXX 100644
104
+ QList *redist_region_count;
154
--- a/target/arm/vec_helper.c
105
+
155
+++ b/target/arm/vec_helper.c
106
+ object_initialize_child(OBJECT(mms), "gic", &mms->gic, TYPE_ARM_GICV3);
156
@@ -XXX,XX +XXX,XX @@ static uint64_t expand_byte_to_half(uint64_t x)
107
+ gicdev = DEVICE(&mms->gic);
157
| ((x & 0xff000000) << 24);
108
+ qdev_prop_set_uint32(gicdev, "num-cpu", machine->smp.cpus);
109
+ qdev_prop_set_uint32(gicdev, "num-irq", NUM_SPIS + GIC_INTERNAL);
110
+ redist_region_count = qlist_new();
111
+ qlist_append_int(redist_region_count, machine->smp.cpus);
112
+ qdev_prop_set_array(gicdev, "redist-region-count", redist_region_count);
113
+ object_property_set_link(OBJECT(&mms->gic), "sysmem",
114
+ OBJECT(sysmem), &error_fatal);
115
+ sysbus_realize(SYS_BUS_DEVICE(&mms->gic), &error_fatal);
116
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->gic), 0, PERIPHBASE);
117
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->gic), 1, PERIPHBASE + 0x100000);
118
+ /*
119
+ * Wire the outputs from each CPU's generic timer and the GICv3
120
+ * maintenance interrupt signal to the appropriate GIC PPI inputs,
121
+ * and the GIC's IRQ/FIQ/VIRQ/VFIQ interrupt outputs to the CPU's inputs.
122
+ */
123
+ for (int i = 0; i < machine->smp.cpus; i++) {
124
+ DeviceState *cpudev = DEVICE(mms->cpu[i]);
125
+ SysBusDevice *gicsbd = SYS_BUS_DEVICE(&mms->gic);
126
+ int intidbase = NUM_SPIS + i * GIC_INTERNAL;
127
+ int irq;
128
+ /*
129
+ * Mapping from the output timer irq lines from the CPU to the
130
+ * GIC PPI inputs used for this board. This isn't a BSA board,
131
+ * but it uses the standard convention for the PPI numbers.
132
+ */
133
+ const int timer_irq[] = {
134
+ [GTIMER_PHYS] = ARCH_TIMER_NS_EL1_IRQ,
135
+ [GTIMER_VIRT] = ARCH_TIMER_VIRT_IRQ,
136
+ [GTIMER_HYP] = ARCH_TIMER_NS_EL2_IRQ,
137
+ };
138
+
139
+ for (irq = 0; irq < ARRAY_SIZE(timer_irq); irq++) {
140
+ qdev_connect_gpio_out(cpudev, irq,
141
+ qdev_get_gpio_in(gicdev,
142
+ intidbase + timer_irq[irq]));
143
+ }
144
+
145
+ qdev_connect_gpio_out_named(cpudev, "gicv3-maintenance-interrupt", 0,
146
+ qdev_get_gpio_in(gicdev,
147
+ intidbase + ARCH_GIC_MAINT_IRQ));
148
+
149
+ qdev_connect_gpio_out_named(cpudev, "pmu-interrupt", 0,
150
+ qdev_get_gpio_in(gicdev,
151
+ intidbase + VIRTUAL_PMU_IRQ));
152
+
153
+ sysbus_connect_irq(gicsbd, i,
154
+ qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
155
+ sysbus_connect_irq(gicsbd, i + machine->smp.cpus,
156
+ qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
157
+ sysbus_connect_irq(gicsbd, i + 2 * machine->smp.cpus,
158
+ qdev_get_gpio_in(cpudev, ARM_CPU_VIRQ));
159
+ sysbus_connect_irq(gicsbd, i + 3 * machine->smp.cpus,
160
+ qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
161
+ }
162
+}
163
+
164
static void mps3r_common_init(MachineState *machine)
165
{
166
MPS3RMachineState *mms = MPS3R_MACHINE(machine);
167
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
168
MemoryRegion *mr = mr_for_raminfo(mms, ri);
169
memory_region_add_subregion(sysmem, ri->base, mr);
170
}
171
+
172
+ assert(machine->smp.cpus <= MPS3R_CPU_MAX);
173
+ for (int i = 0; i < machine->smp.cpus; i++) {
174
+ g_autofree char *sysmem_name = g_strdup_printf("cpu-%d-memory", i);
175
+ g_autofree char *ramname = g_strdup_printf("cpu-%d-memory", i);
176
+ g_autofree char *alias_name = g_strdup_printf("sysmem-alias-%d", i);
177
+
178
+ /*
179
+ * Each CPU has some private RAM/peripherals, so create the container
180
+ * which will house those, with the whole-machine system memory being
181
+ * used where there's no CPU-specific device. Note that we need the
182
+ * sysmem_alias aliases because we can't put one MR (the original
183
+ * 'sysmem') into more than one other MR.
184
+ */
185
+ memory_region_init(&mms->cpu_sysmem[i], OBJECT(machine),
186
+ sysmem_name, UINT64_MAX);
187
+ memory_region_init_alias(&mms->sysmem_alias[i], OBJECT(machine),
188
+ alias_name, sysmem, 0, UINT64_MAX);
189
+ memory_region_add_subregion_overlap(&mms->cpu_sysmem[i], 0,
190
+ &mms->sysmem_alias[i], -1);
191
+
192
+ mms->cpu[i] = object_new(machine->cpu_type);
193
+ object_property_set_link(mms->cpu[i], "memory",
194
+ OBJECT(&mms->cpu_sysmem[i]), &error_abort);
195
+ object_property_set_int(mms->cpu[i], "reset-cbar",
196
+ PERIPHBASE, &error_abort);
197
+ qdev_realize(DEVICE(mms->cpu[i]), NULL, &error_fatal);
198
+ object_unref(mms->cpu[i]);
199
+
200
+ /* Per-CPU RAM */
201
+ memory_region_init_ram(&mms->cpu_ram[i], NULL, ramname,
202
+ 0x1000, &error_fatal);
203
+ memory_region_add_subregion(&mms->cpu_sysmem[i], 0xe7c01000,
204
+ &mms->cpu_ram[i]);
205
+ }
206
+
207
+ create_gic(mms, sysmem);
208
+
209
+ mms->bootinfo.ram_size = machine->ram_size;
210
+ mms->bootinfo.board_id = -1;
211
+ mms->bootinfo.loader_start = mmc->loader_start;
212
+ mms->bootinfo.write_secondary_boot = mps3r_write_secondary_boot;
213
+ mms->bootinfo.secondary_cpu_reset_hook = mps3r_secondary_cpu_reset;
214
+ arm_load_kernel(ARM_CPU(mms->cpu[0]), machine, &mms->bootinfo);
158
}
215
}
159
216
160
-static uint64_t pmull_h(uint64_t op1, uint64_t op2)
217
static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
161
+uint64_t pmull_w(uint64_t op1, uint64_t op2)
218
@@ -XXX,XX +XXX,XX @@ static void mps3r_set_default_ram_info(MPS3RMachineClass *mmc)
162
{
219
/* Found the entry for "system memory" */
163
uint64_t result = 0;
220
mc->default_ram_size = p->size;
164
int i;
221
mc->default_ram_id = p->name;
165
+ for (i = 0; i < 16; ++i) {
222
+ mmc->loader_start = p->base;
166
+ uint64_t mask = (op1 & 0x0000000100000001ull) * 0xffffffff;
223
return;
167
+ result ^= op2 & mask;
224
}
168
+ op1 >>= 1;
225
}
169
+ op2 <<= 1;
226
@@ -XXX,XX +XXX,XX @@ static void mps3r_an536_class_init(ObjectClass *oc, void *data)
170
+ }
227
};
171
+ return result;
228
172
+}
229
mc->desc = "ARM MPS3 with AN536 FPGA image for Cortex-R52";
173
230
- mc->default_cpus = 2;
174
+uint64_t pmull_h(uint64_t op1, uint64_t op2)
231
- mc->min_cpus = mc->default_cpus;
175
+{
232
- mc->max_cpus = mc->default_cpus;
176
+ uint64_t result = 0;
233
+ /*
177
+ int i;
234
+ * In the real FPGA image there are always two cores, but the standard
178
for (i = 0; i < 8; ++i) {
235
+ * initial setting for the SCC SYSCON 0x000 register is 0x21, meaning
179
uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
236
+ * that the second core is held in reset and halted. Many images built for
180
result ^= op2 & mask;
237
+ * the board do not expect the second core to run at startup (especially
238
+ * since on the real FPGA image it is not possible to use LDREX/STREX
239
+ * in RAM between the two cores, so a true SMP setup isn't supported).
240
+ *
241
+ * As QEMU's equivalent of this, we support both -smp 1 and -smp 2,
242
+ * with the default being -smp 1. This seems a more intuitive UI for
243
+ * QEMU users than, for instance, having a machine property to allow
244
+ * the user to set the initial value of the SYSCON 0x000 register.
245
+ */
246
+ mc->default_cpus = 1;
247
+ mc->min_cpus = 1;
248
+ mc->max_cpus = 2;
249
mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-r52");
250
mc->valid_cpu_types = valid_cpu_types;
251
mmc->raminfo = an536_raminfo;
181
--
252
--
182
2.20.1
253
2.34.1
183
184
diff view generated by jsdifflib
1
Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so
1
This board has a lot of UARTs: there is one UART per CPU in the
2
as to predicate any element at index Rn or greater is predicated. As
2
per-CPU peripheral part of the address map, whose interrupts are
3
with VPNOT, this insn itself is predicable and subject to beatwise
3
connected as per-CPU interrupt lines. Then there are 4 UARTs in the
4
execution.
4
normal part of the peripheral space, whose interrupts are shared
5
peripheral interrupts.
5
6
6
The calculation of the mask is the same as is used to determine
7
Connect and wire them all up; this involves some OR gates where
7
ltpmask in mve_element_mask(), but we precalculate masklen in
8
multiple overflow interrupts are wired into one GIC input.
8
generated code to avoid having to have 4 helpers specialized by size.
9
10
We put the decode line in with the low-overhead-loop insns in
11
t32.decode because it's logically part of that collection of insn
12
patterns, even though it is an MVE only insn.
13
9
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
15
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
12
Message-id: 20240206132931.38376-11-peter.maydell@linaro.org
16
---
13
---
17
target/arm/helper-mve.h | 2 ++
14
hw/arm/mps3r.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++
18
target/arm/translate-a32.h | 1 +
15
1 file changed, 94 insertions(+)
19
target/arm/t32.decode | 1 +
20
target/arm/mve_helper.c | 20 ++++++++++++++++++++
21
target/arm/translate-mve.c | 2 +-
22
target/arm/translate.c | 33 +++++++++++++++++++++++++++++++++
23
6 files changed, 58 insertions(+), 1 deletion(-)
24
16
25
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
26
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/helper-mve.h
19
--- a/hw/arm/mps3r.c
28
+++ b/target/arm/helper-mve.h
20
+++ b/hw/arm/mps3r.c
29
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
@@ -XXX,XX +XXX,XX @@
30
DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
#include "qapi/qmp/qlist.h"
31
DEF_HELPER_FLAGS_1(mve_vpnot, TCG_CALL_NO_WG, void, env)
23
#include "exec/address-spaces.h"
32
24
#include "cpu.h"
33
+DEF_HELPER_FLAGS_2(mve_vctp, TCG_CALL_NO_WG, void, env, i32)
25
+#include "sysemu/sysemu.h"
26
#include "hw/boards.h"
27
+#include "hw/or-irq.h"
28
#include "hw/qdev-properties.h"
29
#include "hw/arm/boot.h"
30
#include "hw/arm/bsa.h"
31
+#include "hw/char/cmsdk-apb-uart.h"
32
#include "hw/intc/arm_gicv3.h"
33
34
/* Define the layout of RAM and ROM in a board */
35
@@ -XXX,XX +XXX,XX @@ typedef struct RAMInfo {
36
37
#define MPS3R_RAM_MAX 9
38
#define MPS3R_CPU_MAX 2
39
+#define MPS3R_UART_MAX 4 /* shared UART count */
40
41
#define PERIPHBASE 0xf0000000
42
#define NUM_SPIS 96
43
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
44
MemoryRegion sysmem_alias[MPS3R_CPU_MAX];
45
MemoryRegion cpu_ram[MPS3R_CPU_MAX];
46
GICv3State gic;
47
+ /* per-CPU UARTs followed by the shared UARTs */
48
+ CMSDKAPBUART uart[MPS3R_CPU_MAX + MPS3R_UART_MAX];
49
+ OrIRQState cpu_uart_oflow[MPS3R_CPU_MAX];
50
+ OrIRQState uart_oflow;
51
};
52
53
#define TYPE_MPS3R_MACHINE "mps3r"
54
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
55
56
OBJECT_DECLARE_TYPE(MPS3RMachineState, MPS3RMachineClass, MPS3R_MACHINE)
57
58
+/*
59
+ * Main clock frequency CLK in Hz (50MHz). In the image there are also
60
+ * ACLK, MCLK, GPUCLK and PERIPHCLK at the same frequency; for our
61
+ * model we just roll them all into one.
62
+ */
63
+#define CLK_FRQ 50000000
34
+
64
+
35
DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
65
static const RAMInfo an536_raminfo[] = {
36
DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
66
{
37
DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
67
.name = "ATCM",
38
diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
68
@@ -XXX,XX +XXX,XX @@ static void create_gic(MPS3RMachineState *mms, MemoryRegion *sysmem)
39
index XXXXXXX..XXXXXXX 100644
40
--- a/target/arm/translate-a32.h
41
+++ b/target/arm/translate-a32.h
42
@@ -XXX,XX +XXX,XX @@ long neon_element_offset(int reg, int element, MemOp memop);
43
void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
44
void clear_eci_state(DisasContext *s);
45
bool mve_eci_check(DisasContext *s);
46
+void mve_update_eci(DisasContext *s);
47
void mve_update_and_store_eci(DisasContext *s);
48
bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
49
50
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
51
index XXXXXXX..XXXXXXX 100644
52
--- a/target/arm/t32.decode
53
+++ b/target/arm/t32.decode
54
@@ -XXX,XX +XXX,XX @@ BL 1111 0. .......... 11.1 ............ @branch24
55
# This is DLSTP
56
DLS 1111 0 0000 0 size:2 rn:4 1110 0000 0000 0001
57
}
69
}
58
+ VCTP 1111 0 0000 0 size:2 rn:4 1110 1000 0000 0001
59
]
60
}
70
}
61
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
62
index XXXXXXX..XXXXXXX 100644
63
--- a/target/arm/mve_helper.c
64
+++ b/target/arm/mve_helper.c
65
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpnot)(CPUARMState *env)
66
mve_advance_vpt(env);
67
}
68
71
69
+/*
72
+/*
70
+ * VCTP: P0 unexecuted bits unchanged, predicated bits zeroed,
73
+ * Create UART uartno, and map it into the MemoryRegion mem at address baseaddr.
71
+ * otherwise set according to value of Rn. The calculation of
74
+ * The qemu_irq arguments are where we connect the various IRQs from the UART.
72
+ * newmask here works in the same way as the calculation of the
73
+ * ltpmask in mve_element_mask(), but we have pre-calculated
74
+ * the masklen in the generated code.
75
+ */
75
+ */
76
+void HELPER(mve_vctp)(CPUARMState *env, uint32_t masklen)
76
+static void create_uart(MPS3RMachineState *mms, int uartno, MemoryRegion *mem,
77
+ hwaddr baseaddr, qemu_irq txirq, qemu_irq rxirq,
78
+ qemu_irq txoverirq, qemu_irq rxoverirq,
79
+ qemu_irq combirq)
77
+{
80
+{
78
+ uint16_t mask = mve_element_mask(env);
81
+ g_autofree char *s = g_strdup_printf("uart%d", uartno);
79
+ uint16_t eci_mask = mve_eci_mask(env);
82
+ SysBusDevice *sbd;
80
+ uint16_t newmask;
81
+
83
+
82
+ assert(masklen <= 16);
84
+ assert(uartno < ARRAY_SIZE(mms->uart));
83
+ newmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
85
+ object_initialize_child(OBJECT(mms), s, &mms->uart[uartno],
84
+ newmask &= mask;
86
+ TYPE_CMSDK_APB_UART);
85
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | (newmask & eci_mask);
87
+ qdev_prop_set_uint32(DEVICE(&mms->uart[uartno]), "pclk-frq", CLK_FRQ);
86
+ mve_advance_vpt(env);
88
+ qdev_prop_set_chr(DEVICE(&mms->uart[uartno]), "chardev", serial_hd(uartno));
89
+ sbd = SYS_BUS_DEVICE(&mms->uart[uartno]);
90
+ sysbus_realize(sbd, &error_fatal);
91
+ memory_region_add_subregion(mem, baseaddr,
92
+ sysbus_mmio_get_region(sbd, 0));
93
+ sysbus_connect_irq(sbd, 0, txirq);
94
+ sysbus_connect_irq(sbd, 1, rxirq);
95
+ sysbus_connect_irq(sbd, 2, txoverirq);
96
+ sysbus_connect_irq(sbd, 3, rxoverirq);
97
+ sysbus_connect_irq(sbd, 4, combirq);
87
+}
98
+}
88
+
99
+
89
#define DO_1OP_SAT(OP, ESIZE, TYPE, FN) \
100
static void mps3r_common_init(MachineState *machine)
90
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
101
{
91
{ \
102
MPS3RMachineState *mms = MPS3R_MACHINE(machine);
92
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
103
MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
93
index XXXXXXX..XXXXXXX 100644
104
MemoryRegion *sysmem = get_system_memory();
94
--- a/target/arm/translate-mve.c
105
+ DeviceState *gicdev;
95
+++ b/target/arm/translate-mve.c
106
96
@@ -XXX,XX +XXX,XX @@ bool mve_eci_check(DisasContext *s)
107
for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
108
MemoryRegion *mr = mr_for_raminfo(mms, ri);
109
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
97
}
110
}
98
}
111
99
112
create_gic(mms, sysmem);
100
-static void mve_update_eci(DisasContext *s)
113
+ gicdev = DEVICE(&mms->gic);
101
+void mve_update_eci(DisasContext *s)
102
{
103
/*
104
* The helper function will always update the CPUState field,
105
diff --git a/target/arm/translate.c b/target/arm/translate.c
106
index XXXXXXX..XXXXXXX 100644
107
--- a/target/arm/translate.c
108
+++ b/target/arm/translate.c
109
@@ -XXX,XX +XXX,XX @@ static bool trans_LCTP(DisasContext *s, arg_LCTP *a)
110
return true;
111
}
112
113
+static bool trans_VCTP(DisasContext *s, arg_VCTP *a)
114
+{
115
+ /*
116
+ * M-profile Create Vector Tail Predicate. This insn is itself
117
+ * predicated and is subject to beatwise execution.
118
+ */
119
+ TCGv_i32 rn_shifted, masklen;
120
+
121
+ if (!dc_isar_feature(aa32_mve, s) || a->rn == 13 || a->rn == 15) {
122
+ return false;
123
+ }
124
+
125
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
126
+ return true;
127
+ }
128
+
114
+
129
+ /*
115
+ /*
130
+ * We pre-calculate the mask length here to avoid having
116
+ * UARTs 0 and 1 are per-CPU; their interrupts are wired to
131
+ * to have multiple helpers specialized for size.
117
+ * the relevant CPU's PPI 0..3, aka INTID 16..19
132
+ * We pass the helper "rn <= (1 << (4 - size)) ? (rn << size) : 16".
133
+ */
118
+ */
134
+ rn_shifted = tcg_temp_new_i32();
119
+ for (int i = 0; i < machine->smp.cpus; i++) {
135
+ masklen = load_reg(s, a->rn);
120
+ int intidbase = NUM_SPIS + i * GIC_INTERNAL;
136
+ tcg_gen_shli_i32(rn_shifted, masklen, a->size);
121
+ g_autofree char *s = g_strdup_printf("cpu-uart-oflow-orgate%d", i);
137
+ tcg_gen_movcond_i32(TCG_COND_LEU, masklen,
122
+ DeviceState *orgate;
138
+ masklen, tcg_constant_i32(1 << (4 - a->size)),
123
+
139
+ rn_shifted, tcg_constant_i32(16));
124
+ /* The two overflow IRQs from the UART are ORed together into PPI 3 */
140
+ gen_helper_mve_vctp(cpu_env, masklen);
125
+ object_initialize_child(OBJECT(mms), s, &mms->cpu_uart_oflow[i],
141
+ tcg_temp_free_i32(masklen);
126
+ TYPE_OR_IRQ);
142
+ tcg_temp_free_i32(rn_shifted);
127
+ orgate = DEVICE(&mms->cpu_uart_oflow[i]);
143
+ mve_update_eci(s);
128
+ qdev_prop_set_uint32(orgate, "num-lines", 2);
144
+ return true;
129
+ qdev_realize(orgate, NULL, &error_fatal);
145
+}
130
+ qdev_connect_gpio_out(orgate, 0,
146
131
+ qdev_get_gpio_in(gicdev, intidbase + 19));
147
static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
132
+
148
{
133
+ create_uart(mms, i, &mms->cpu_sysmem[i], 0xe7c00000,
134
+ qdev_get_gpio_in(gicdev, intidbase + 17), /* tx */
135
+ qdev_get_gpio_in(gicdev, intidbase + 16), /* rx */
136
+ qdev_get_gpio_in(orgate, 0), /* txover */
137
+ qdev_get_gpio_in(orgate, 1), /* rxover */
138
+ qdev_get_gpio_in(gicdev, intidbase + 18) /* combined */);
139
+ }
140
+ /*
141
+ * UARTs 2 to 5 are whole-system; all overflow IRQs are ORed
142
+ * together into IRQ 17
143
+ */
144
+ object_initialize_child(OBJECT(mms), "uart-oflow-orgate",
145
+ &mms->uart_oflow, TYPE_OR_IRQ);
146
+ qdev_prop_set_uint32(DEVICE(&mms->uart_oflow), "num-lines",
147
+ MPS3R_UART_MAX * 2);
148
+ qdev_realize(DEVICE(&mms->uart_oflow), NULL, &error_fatal);
149
+ qdev_connect_gpio_out(DEVICE(&mms->uart_oflow), 0,
150
+ qdev_get_gpio_in(gicdev, 17));
151
+
152
+ for (int i = 0; i < MPS3R_UART_MAX; i++) {
153
+ hwaddr baseaddr = 0xe0205000 + i * 0x1000;
154
+ int rxirq = 5 + i * 2, txirq = 6 + i * 2, combirq = 13 + i;
155
+
156
+ create_uart(mms, i + MPS3R_CPU_MAX, sysmem, baseaddr,
157
+ qdev_get_gpio_in(gicdev, txirq),
158
+ qdev_get_gpio_in(gicdev, rxirq),
159
+ qdev_get_gpio_in(DEVICE(&mms->uart_oflow), i * 2),
160
+ qdev_get_gpio_in(DEVICE(&mms->uart_oflow), i * 2 + 1),
161
+ qdev_get_gpio_in(gicdev, combirq));
162
+ }
163
164
mms->bootinfo.ram_size = machine->ram_size;
165
mms->bootinfo.board_id = -1;
149
--
166
--
150
2.20.1
167
2.34.1
151
168
152
169
diff view generated by jsdifflib
1
Implement the MVE VLDR/VSTR insns which do scatter-gather using base
1
Add the GPIO, watchdog, dual-timer and I2C devices to the mps3-an536
2
addresses from Qm plus or minus an immediate offset (possibly with
2
board. These are all simple devices that just need to be created and
3
writeback). Note that writeback is not predicated but it does have
3
wired up.
4
to honour ECI state, so we have to add an eci_mask check to the
5
VSTR_SG macros (the VLDR_SG macros already needed this to be able
6
to distinguish "skip beat" from "set predicated element to 0").
7
4
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
7
Message-id: 20240206132931.38376-12-peter.maydell@linaro.org
10
---
8
---
11
target/arm/helper-mve.h | 5 +++
9
hw/arm/mps3r.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
12
target/arm/mve.decode | 10 +++++
10
1 file changed, 59 insertions(+)
13
target/arm/mve_helper.c | 91 ++++++++++++++++++++++++--------------
14
target/arm/translate-mve.c | 72 ++++++++++++++++++++++++++++++
15
4 files changed, 146 insertions(+), 32 deletions(-)
16
11
17
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
12
diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
18
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/helper-mve.h
14
--- a/hw/arm/mps3r.c
20
+++ b/target/arm/helper-mve.h
15
+++ b/hw/arm/mps3r.c
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
16
@@ -XXX,XX +XXX,XX @@
22
DEF_HELPER_FLAGS_4(mve_vstrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
17
#include "sysemu/sysemu.h"
23
DEF_HELPER_FLAGS_4(mve_vstrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
18
#include "hw/boards.h"
24
19
#include "hw/or-irq.h"
25
+DEF_HELPER_FLAGS_4(mve_vldrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
+#include "hw/qdev-clock.h"
26
+DEF_HELPER_FLAGS_4(mve_vldrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
#include "hw/qdev-properties.h"
27
+DEF_HELPER_FLAGS_4(mve_vstrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
#include "hw/arm/boot.h"
28
+DEF_HELPER_FLAGS_4(mve_vstrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
#include "hw/arm/bsa.h"
24
#include "hw/char/cmsdk-apb-uart.h"
25
+#include "hw/i2c/arm_sbcon_i2c.h"
26
#include "hw/intc/arm_gicv3.h"
27
+#include "hw/misc/unimp.h"
28
+#include "hw/timer/cmsdk-apb-dualtimer.h"
29
+#include "hw/watchdog/cmsdk-apb-watchdog.h"
30
31
/* Define the layout of RAM and ROM in a board */
32
typedef struct RAMInfo {
33
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
34
CMSDKAPBUART uart[MPS3R_CPU_MAX + MPS3R_UART_MAX];
35
OrIRQState cpu_uart_oflow[MPS3R_CPU_MAX];
36
OrIRQState uart_oflow;
37
+ CMSDKAPBWatchdog watchdog;
38
+ CMSDKAPBDualTimer dualtimer;
39
+ ArmSbconI2CState i2c[5];
40
+ Clock *clk;
41
};
42
43
#define TYPE_MPS3R_MACHINE "mps3r"
44
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
45
MemoryRegion *sysmem = get_system_memory();
46
DeviceState *gicdev;
47
48
+ mms->clk = clock_new(OBJECT(machine), "CLK");
49
+ clock_set_hz(mms->clk, CLK_FRQ);
29
+
50
+
30
DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
51
for (const RAMInfo *ri = mmc->raminfo; ri->name; ri++) {
31
52
MemoryRegion *mr = mr_for_raminfo(mms, ri);
32
DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
53
memory_region_add_subregion(sysmem, ri->base, mr);
33
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
54
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
34
index XXXXXXX..XXXXXXX 100644
55
qdev_get_gpio_in(gicdev, combirq));
35
--- a/target/arm/mve.decode
36
+++ b/target/arm/mve.decode
37
@@ -XXX,XX +XXX,XX @@
38
&vmaxv qm rda size
39
&vabav qn qm rda size
40
&vldst_sg qd qm rn size msize os
41
+&vldst_sg_imm qd qm a w imm
42
43
# scatter-gather memory size is in bits 6:4
44
%sg_msize 6:1 4:1
45
@@ -XXX,XX +XXX,XX @@
46
@vldst_sg .... .... .... rn:4 .... ... size:2 ... ... os:1 &vldst_sg \
47
qd=%qd qm=%qm msize=%sg_msize
48
49
+# Qm is in the fields usually labeled Qn
50
+@vldst_sg_imm .... .... a:1 . w:1 . .... .... .... . imm:7 &vldst_sg_imm \
51
+ qd=%qd qm=%qn
52
+
53
@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
54
@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
55
@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
56
@@ -XXX,XX +XXX,XX @@ VLDR_S_sg 111 0 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
57
VLDR_U_sg 111 1 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
58
VSTR_sg 111 0 1100 1 . 00 .... ... 0 111 . .... .... @vldst_sg
59
60
+VLDRW_sg_imm 111 1 1101 ... 1 ... 0 ... 1 1110 .... .... @vldst_sg_imm
61
+VLDRD_sg_imm 111 1 1101 ... 1 ... 0 ... 1 1111 .... .... @vldst_sg_imm
62
+VSTRW_sg_imm 111 1 1101 ... 0 ... 0 ... 1 1110 .... .... @vldst_sg_imm
63
+VSTRD_sg_imm 111 1 1101 ... 0 ... 0 ... 1 1111 .... .... @vldst_sg_imm
64
+
65
# Moves between 2 32-bit vector lanes and 2 general purpose registers
66
VMOV_to_2gp 1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
67
VMOV_from_2gp 1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
68
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
69
index XXXXXXX..XXXXXXX 100644
70
--- a/target/arm/mve_helper.c
71
+++ b/target/arm/mve_helper.c
72
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
73
* For loads, predicated lanes are zeroed instead of retaining
74
* their previous values.
75
*/
76
-#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN) \
77
+#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB) \
78
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
79
uint32_t base) \
80
{ \
81
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
82
addr = ADDRFN(base, m[H##ESIZE(e)]); \
83
d[H##ESIZE(e)] = (mask & 1) ? \
84
cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \
85
+ if (WB) { \
86
+ m[H##ESIZE(e)] = addr; \
87
+ } \
88
} \
89
mve_advance_vpt(env); \
90
}
56
}
91
57
92
/* We know here TYPE is unsigned so always the same as the offset type */
58
+ for (int i = 0; i < 4; i++) {
93
-#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN) \
59
+ /* CMSDK GPIO controllers */
94
+#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN, WB) \
60
+ g_autofree char *s = g_strdup_printf("gpio%d", i);
95
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
61
+ create_unimplemented_device(s, 0xe0000000 + i * 0x1000, 0x1000);
96
uint32_t base) \
97
{ \
98
TYPE *d = vd; \
99
TYPE *m = vm; \
100
uint16_t mask = mve_element_mask(env); \
101
+ uint16_t eci_mask = mve_eci_mask(env); \
102
unsigned e; \
103
uint32_t addr; \
104
- for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
105
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
106
+ if (!(eci_mask & 1)) { \
107
+ continue; \
108
+ } \
109
addr = ADDRFN(base, m[H##ESIZE(e)]); \
110
if (mask & 1) { \
111
cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
112
} \
113
+ if (WB) { \
114
+ m[H##ESIZE(e)] = addr; \
115
+ } \
116
} \
117
mve_advance_vpt(env); \
118
}
119
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
120
* accesses, controlled by the predicate mask for the relevant beat,
121
* and with a single 32-bit offset in the first of the two Qm elements.
122
* Note that for QEMU our IMPDEF AIRCR.ENDIANNESS is always 0 (little).
123
+ * Address writeback happens on the odd beats and updates the address
124
+ * stored in the even-beat element.
125
*/
126
-#define DO_VLDR64_SG(OP, ADDRFN) \
127
+#define DO_VLDR64_SG(OP, ADDRFN, WB) \
128
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
129
uint32_t base) \
130
{ \
131
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
132
addr = ADDRFN(base, m[H4(e & ~1)]); \
133
addr += 4 * (e & 1); \
134
d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \
135
+ if (WB && (e & 1)) { \
136
+ m[H4(e & ~1)] = addr - 4; \
137
+ } \
138
} \
139
mve_advance_vpt(env); \
140
}
141
142
-#define DO_VSTR64_SG(OP, ADDRFN) \
143
+#define DO_VSTR64_SG(OP, ADDRFN, WB) \
144
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
145
uint32_t base) \
146
{ \
147
uint32_t *d = vd; \
148
uint32_t *m = vm; \
149
uint16_t mask = mve_element_mask(env); \
150
+ uint16_t eci_mask = mve_eci_mask(env); \
151
unsigned e; \
152
uint32_t addr; \
153
- for (e = 0; e < 16 / 4; e++, mask >>= 4) { \
154
+ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \
155
+ if (!(eci_mask & 1)) { \
156
+ continue; \
157
+ } \
158
addr = ADDRFN(base, m[H4(e & ~1)]); \
159
addr += 4 * (e & 1); \
160
if (mask & 1) { \
161
cpu_stl_data_ra(env, addr, d[H4(e)], GETPC()); \
162
} \
163
+ if (WB && (e & 1)) { \
164
+ m[H4(e & ~1)] = addr - 4; \
165
+ } \
166
} \
167
mve_advance_vpt(env); \
168
}
169
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
170
#define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
171
#define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))
172
173
-DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD)
174
-DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD)
175
-DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD)
176
+DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD, false)
177
+DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD, false)
178
+DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD, false)
179
180
-DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD)
181
-DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD)
182
-DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD)
183
-DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD)
184
-DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD)
185
-DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD)
186
-DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD)
187
+DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD, false)
188
+DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD, false)
189
+DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD, false)
190
+DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD, false)
191
+DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD, false)
192
+DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false)
193
+DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD, false)
194
195
-DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH)
196
-DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH)
197
-DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH)
198
-DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW)
199
-DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD)
200
+DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH, false)
201
+DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH, false)
202
+DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH, false)
203
+DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false)
204
+DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false)
205
206
-DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD)
207
-DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD)
208
-DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD)
209
-DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD)
210
-DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD)
211
-DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD)
212
-DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD)
213
+DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false)
214
+DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD, false)
215
+DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD, false)
216
+DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD, false)
217
+DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD, false)
218
+DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD, false)
219
+DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD, false)
220
221
-DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH)
222
-DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH)
223
-DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW)
224
-DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD)
225
+DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH, false)
226
+DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false)
227
+DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false)
228
+DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false)
229
+
230
+DO_VLDR_SG(vldrw_sg_wb_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true)
231
+DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
232
+DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
233
+DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
234
235
/*
236
* The mergemask(D, R, M) macro performs the operation "*D = R" but
237
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
238
index XXXXXXX..XXXXXXX 100644
239
--- a/target/arm/translate-mve.c
240
+++ b/target/arm/translate-mve.c
241
@@ -XXX,XX +XXX,XX @@ static bool trans_VSTR_sg(DisasContext *s, arg_vldst_sg *a)
242
243
#undef F
244
245
+static bool do_ldst_sg_imm(DisasContext *s, arg_vldst_sg_imm *a,
246
+ MVEGenLdStSGFn *fn, unsigned msize)
247
+{
248
+ uint32_t offset;
249
+ TCGv_ptr qd, qm;
250
+
251
+ if (!dc_isar_feature(aa32_mve, s) ||
252
+ !mve_check_qreg_bank(s, a->qd | a->qm) ||
253
+ !fn) {
254
+ return false;
255
+ }
62
+ }
256
+
63
+
257
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
64
+ object_initialize_child(OBJECT(mms), "watchdog", &mms->watchdog,
258
+ return true;
65
+ TYPE_CMSDK_APB_WATCHDOG);
66
+ qdev_connect_clock_in(DEVICE(&mms->watchdog), "WDOGCLK", mms->clk);
67
+ sysbus_realize(SYS_BUS_DEVICE(&mms->watchdog), &error_fatal);
68
+ sysbus_connect_irq(SYS_BUS_DEVICE(&mms->watchdog), 0,
69
+ qdev_get_gpio_in(gicdev, 0));
70
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->watchdog), 0, 0xe0100000);
71
+
72
+ object_initialize_child(OBJECT(mms), "dualtimer", &mms->dualtimer,
73
+ TYPE_CMSDK_APB_DUALTIMER);
74
+ qdev_connect_clock_in(DEVICE(&mms->dualtimer), "TIMCLK", mms->clk);
75
+ sysbus_realize(SYS_BUS_DEVICE(&mms->dualtimer), &error_fatal);
76
+ sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 0,
77
+ qdev_get_gpio_in(gicdev, 3));
78
+ sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 1,
79
+ qdev_get_gpio_in(gicdev, 1));
80
+ sysbus_connect_irq(SYS_BUS_DEVICE(&mms->dualtimer), 2,
81
+ qdev_get_gpio_in(gicdev, 2));
82
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->dualtimer), 0, 0xe0101000);
83
+
84
+ for (int i = 0; i < ARRAY_SIZE(mms->i2c); i++) {
85
+ static const hwaddr i2cbase[] = {0xe0102000, /* Touch */
86
+ 0xe0103000, /* Audio */
87
+ 0xe0107000, /* Shield0 */
88
+ 0xe0108000, /* Shield1 */
89
+ 0xe0109000}; /* DDR4 EEPROM */
90
+ g_autofree char *s = g_strdup_printf("i2c%d", i);
91
+
92
+ object_initialize_child(OBJECT(mms), s, &mms->i2c[i],
93
+ TYPE_ARM_SBCON_I2C);
94
+ sysbus_realize(SYS_BUS_DEVICE(&mms->i2c[i]), &error_fatal);
95
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->i2c[i]), 0, i2cbase[i]);
96
+ if (i != 2 && i != 3) {
97
+ /*
98
+ * internal-only bus: mark it full to avoid user-created
99
+ * i2c devices being plugged into it.
100
+ */
101
+ qbus_mark_full(qdev_get_child_bus(DEVICE(&mms->i2c[i]), "i2c"));
102
+ }
259
+ }
103
+ }
260
+
104
+
261
+ offset = a->imm << msize;
105
mms->bootinfo.ram_size = machine->ram_size;
262
+ if (!a->a) {
106
mms->bootinfo.board_id = -1;
263
+ offset = -offset;
107
mms->bootinfo.loader_start = mmc->loader_start;
264
+ }
265
+
266
+ qd = mve_qreg_ptr(a->qd);
267
+ qm = mve_qreg_ptr(a->qm);
268
+ fn(cpu_env, qd, qm, tcg_constant_i32(offset));
269
+ tcg_temp_free_ptr(qd);
270
+ tcg_temp_free_ptr(qm);
271
+ mve_update_eci(s);
272
+ return true;
273
+}
274
+
275
+static bool trans_VLDRW_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
276
+{
277
+ static MVEGenLdStSGFn * const fns[] = {
278
+ gen_helper_mve_vldrw_sg_uw,
279
+ gen_helper_mve_vldrw_sg_wb_uw,
280
+ };
281
+ if (a->qd == a->qm) {
282
+ return false; /* UNPREDICTABLE */
283
+ }
284
+ return do_ldst_sg_imm(s, a, fns[a->w], MO_32);
285
+}
286
+
287
+static bool trans_VLDRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
288
+{
289
+ static MVEGenLdStSGFn * const fns[] = {
290
+ gen_helper_mve_vldrd_sg_ud,
291
+ gen_helper_mve_vldrd_sg_wb_ud,
292
+ };
293
+ if (a->qd == a->qm) {
294
+ return false; /* UNPREDICTABLE */
295
+ }
296
+ return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
297
+}
298
+
299
+static bool trans_VSTRW_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
300
+{
301
+ static MVEGenLdStSGFn * const fns[] = {
302
+ gen_helper_mve_vstrw_sg_uw,
303
+ gen_helper_mve_vstrw_sg_wb_uw,
304
+ };
305
+ return do_ldst_sg_imm(s, a, fns[a->w], MO_32);
306
+}
307
+
308
+static bool trans_VSTRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
309
+{
310
+ static MVEGenLdStSGFn * const fns[] = {
311
+ gen_helper_mve_vstrd_sg_ud,
312
+ gen_helper_mve_vstrd_sg_wb_ud,
313
+ };
314
+ return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
315
+}
316
+
317
static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
318
{
319
TCGv_ptr qd;
320
--
108
--
321
2.20.1
109
2.34.1
322
110
323
111
diff view generated by jsdifflib
1
Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP,
1
Add the remaining devices (or unimplemented-device stubs) for
2
VIWDUP and VDWDUP. These fill the elements of a vector with
2
this board: SPI controllers, SCC, FPGAIO, I2S, RTC, the
3
successively incrementing values, starting at the offset specified in
3
QSPI write-config block, and ethernet.
4
a general purpose register. The final value of the offset is written
5
back to this register. The wrapping variants take a second general
6
purpose register which specifies the point where the count should
7
wrap back to 0.
8
4
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
7
Message-id: 20240206132931.38376-13-peter.maydell@linaro.org
11
---
8
---
12
target/arm/helper-mve.h | 12 ++++
9
hw/arm/mps3r.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
13
target/arm/mve.decode | 25 ++++++++
10
1 file changed, 74 insertions(+)
14
target/arm/mve_helper.c | 63 +++++++++++++++++++
15
target/arm/translate-mve.c | 120 +++++++++++++++++++++++++++++++++++++
16
4 files changed, 220 insertions(+)
17
11
18
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
12
diff --git a/hw/arm/mps3r.c b/hw/arm/mps3r.c
19
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper-mve.h
14
--- a/hw/arm/mps3r.c
21
+++ b/target/arm/helper-mve.h
15
+++ b/hw/arm/mps3r.c
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
16
@@ -XXX,XX +XXX,XX @@
23
17
#include "hw/char/cmsdk-apb-uart.h"
24
DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
18
#include "hw/i2c/arm_sbcon_i2c.h"
25
19
#include "hw/intc/arm_gicv3.h"
26
+DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
20
+#include "hw/misc/mps2-scc.h"
27
+DEF_HELPER_FLAGS_4(mve_viduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
21
+#include "hw/misc/mps2-fpgaio.h"
28
+DEF_HELPER_FLAGS_4(mve_vidupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
22
#include "hw/misc/unimp.h"
23
+#include "hw/net/lan9118.h"
24
+#include "hw/rtc/pl031.h"
25
+#include "hw/ssi/pl022.h"
26
#include "hw/timer/cmsdk-apb-dualtimer.h"
27
#include "hw/watchdog/cmsdk-apb-watchdog.h"
28
29
@@ -XXX,XX +XXX,XX @@ struct MPS3RMachineState {
30
CMSDKAPBWatchdog watchdog;
31
CMSDKAPBDualTimer dualtimer;
32
ArmSbconI2CState i2c[5];
33
+ PL022State spi[3];
34
+ MPS2SCC scc;
35
+ MPS2FPGAIO fpgaio;
36
+ UnimplementedDeviceState i2s_audio;
37
+ PL031State rtc;
38
Clock *clk;
39
};
40
41
@@ -XXX,XX +XXX,XX @@ static const RAMInfo an536_raminfo[] = {
42
}
43
};
44
45
+static const int an536_oscclk[] = {
46
+ 24000000, /* 24MHz reference for RTC and timers */
47
+ 50000000, /* 50MHz ACLK */
48
+ 50000000, /* 50MHz MCLK */
49
+ 50000000, /* 50MHz GPUCLK */
50
+ 24576000, /* 24.576MHz AUDCLK */
51
+ 23750000, /* 23.75MHz HDLCDCLK */
52
+ 100000000, /* 100MHz DDR4_REF_CLK */
53
+};
29
+
54
+
30
+DEF_HELPER_FLAGS_5(mve_viwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
55
static MemoryRegion *mr_for_raminfo(MPS3RMachineState *mms,
31
+DEF_HELPER_FLAGS_5(mve_viwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
56
const RAMInfo *raminfo)
32
+DEF_HELPER_FLAGS_5(mve_viwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
57
{
58
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
59
MPS3RMachineClass *mmc = MPS3R_MACHINE_GET_CLASS(mms);
60
MemoryRegion *sysmem = get_system_memory();
61
DeviceState *gicdev;
62
+ QList *oscclk;
63
64
mms->clk = clock_new(OBJECT(machine), "CLK");
65
clock_set_hz(mms->clk, CLK_FRQ);
66
@@ -XXX,XX +XXX,XX @@ static void mps3r_common_init(MachineState *machine)
67
}
68
}
69
70
+ for (int i = 0; i < ARRAY_SIZE(mms->spi); i++) {
71
+ g_autofree char *s = g_strdup_printf("spi%d", i);
72
+ hwaddr baseaddr = 0xe0104000 + i * 0x1000;
33
+
73
+
34
+DEF_HELPER_FLAGS_5(mve_vdwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
74
+ object_initialize_child(OBJECT(mms), s, &mms->spi[i], TYPE_PL022);
35
+DEF_HELPER_FLAGS_5(mve_vdwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
75
+ sysbus_realize(SYS_BUS_DEVICE(&mms->spi[i]), &error_fatal);
36
+DEF_HELPER_FLAGS_5(mve_vdwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
76
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->spi[i]), 0, baseaddr);
37
+
77
+ sysbus_connect_irq(SYS_BUS_DEVICE(&mms->spi[i]), 0,
38
DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
78
+ qdev_get_gpio_in(gicdev, 22 + i));
39
DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
40
DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
41
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
42
index XXXXXXX..XXXXXXX 100644
43
--- a/target/arm/mve.decode
44
+++ b/target/arm/mve.decode
45
@@ -XXX,XX +XXX,XX @@
46
&2scalar qd qn rm size
47
&1imm qd imm cmode op
48
&2shift qd qm shift size
49
+&vidup qd rn size imm
50
+&viwdup qd rn rm size imm
51
52
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
53
# Note that both Rn and Qd are 3 bits only (no D bit)
54
@@ -XXX,XX +XXX,XX @@ VDUP 1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
55
VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
56
VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
57
58
+# Incrementing and decrementing dup
59
+
60
+# VIDUP, VDDUP format immediate: 1 << (immh:imml)
61
+%imm_vidup 7:1 0:1 !function=vidup_imm
62
+
63
+# VIDUP, VDDUP registers: Rm bits [3:1] from insn, bit 0 is 1;
64
+# Rn bits [3:1] from insn, bit 0 is 0
65
+%vidup_rm 1:3 !function=times_2_plus_1
66
+%vidup_rn 17:3 !function=times_2
67
+
68
+@vidup .... .... . . size:2 .... .... .... .... .... \
69
+ qd=%qd imm=%imm_vidup rn=%vidup_rn &vidup
70
+@viwdup .... .... . . size:2 .... .... .... .... .... \
71
+ qd=%qd imm=%imm_vidup rm=%vidup_rm rn=%vidup_rn &viwdup
72
+{
73
+ VIDUP 1110 1110 0 . .. ... 1 ... 0 1111 . 110 111 . @vidup
74
+ VIWDUP 1110 1110 0 . .. ... 1 ... 0 1111 . 110 ... . @viwdup
75
+}
76
+{
77
+ VDDUP 1110 1110 0 . .. ... 1 ... 1 1111 . 110 111 . @vidup
78
+ VDWDUP 1110 1110 0 . .. ... 1 ... 1 1111 . 110 ... . @viwdup
79
+}
80
+
81
# multiply-add long dual accumulate
82
# rdahi: bits [3:1] from insn, bit 0 is 1
83
# rdalo: bits [3:1] from insn, bit 0 is 0
84
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
85
index XXXXXXX..XXXXXXX 100644
86
--- a/target/arm/mve_helper.c
87
+++ b/target/arm/mve_helper.c
88
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_sqrshr)(CPUARMState *env, uint32_t n, uint32_t shift)
89
{
90
return do_sqrshl_bhs(n, -(int8_t)shift, 32, true, &env->QF);
91
}
92
+
93
+#define DO_VIDUP(OP, ESIZE, TYPE, FN) \
94
+ uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd, \
95
+ uint32_t offset, uint32_t imm) \
96
+ { \
97
+ TYPE *d = vd; \
98
+ uint16_t mask = mve_element_mask(env); \
99
+ unsigned e; \
100
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
101
+ mergemask(&d[H##ESIZE(e)], offset, mask); \
102
+ offset = FN(offset, imm); \
103
+ } \
104
+ mve_advance_vpt(env); \
105
+ return offset; \
106
+ }
79
+ }
107
+
80
+
108
+#define DO_VIWDUP(OP, ESIZE, TYPE, FN) \
81
+ object_initialize_child(OBJECT(mms), "scc", &mms->scc, TYPE_MPS2_SCC);
109
+ uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd, \
82
+ qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-cfg0", 0);
110
+ uint32_t offset, uint32_t wrap, \
83
+ qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-cfg4", 0x2);
111
+ uint32_t imm) \
84
+ qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-aid", 0x00200008);
112
+ { \
85
+ qdev_prop_set_uint32(DEVICE(&mms->scc), "scc-id", 0x41055360);
113
+ TYPE *d = vd; \
86
+ oscclk = qlist_new();
114
+ uint16_t mask = mve_element_mask(env); \
87
+ for (int i = 0; i < ARRAY_SIZE(an536_oscclk); i++) {
115
+ unsigned e; \
88
+ qlist_append_int(oscclk, an536_oscclk[i]);
116
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
117
+ mergemask(&d[H##ESIZE(e)], offset, mask); \
118
+ offset = FN(offset, wrap, imm); \
119
+ } \
120
+ mve_advance_vpt(env); \
121
+ return offset; \
122
+ }
89
+ }
90
+ qdev_prop_set_array(DEVICE(&mms->scc), "oscclk", oscclk);
91
+ sysbus_realize(SYS_BUS_DEVICE(&mms->scc), &error_fatal);
92
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->scc), 0, 0xe0200000);
123
+
93
+
124
+#define DO_VIDUP_ALL(OP, FN) \
94
+ create_unimplemented_device("i2s-audio", 0xe0201000, 0x1000);
125
+ DO_VIDUP(OP##b, 1, int8_t, FN) \
126
+ DO_VIDUP(OP##h, 2, int16_t, FN) \
127
+ DO_VIDUP(OP##w, 4, int32_t, FN)
128
+
95
+
129
+#define DO_VIWDUP_ALL(OP, FN) \
96
+ object_initialize_child(OBJECT(mms), "fpgaio", &mms->fpgaio,
130
+ DO_VIWDUP(OP##b, 1, int8_t, FN) \
97
+ TYPE_MPS2_FPGAIO);
131
+ DO_VIWDUP(OP##h, 2, int16_t, FN) \
98
+ qdev_prop_set_uint32(DEVICE(&mms->fpgaio), "prescale-clk", an536_oscclk[1]);
132
+ DO_VIWDUP(OP##w, 4, int32_t, FN)
99
+ qdev_prop_set_uint32(DEVICE(&mms->fpgaio), "num-leds", 10);
100
+ qdev_prop_set_bit(DEVICE(&mms->fpgaio), "has-switches", true);
101
+ qdev_prop_set_bit(DEVICE(&mms->fpgaio), "has-dbgctrl", false);
102
+ sysbus_realize(SYS_BUS_DEVICE(&mms->fpgaio), &error_fatal);
103
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->fpgaio), 0, 0xe0202000);
133
+
104
+
134
+static uint32_t do_add_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
105
+ create_unimplemented_device("clcd", 0xe0209000, 0x1000);
135
+{
136
+ offset += imm;
137
+ if (offset == wrap) {
138
+ offset = 0;
139
+ }
140
+ return offset;
141
+}
142
+
106
+
143
+static uint32_t do_sub_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
107
+ object_initialize_child(OBJECT(mms), "rtc", &mms->rtc, TYPE_PL031);
144
+{
108
+ sysbus_realize(SYS_BUS_DEVICE(&mms->rtc), &error_fatal);
145
+ if (offset == 0) {
109
+ sysbus_mmio_map(SYS_BUS_DEVICE(&mms->rtc), 0, 0xe020a000);
146
+ offset = wrap;
110
+ sysbus_connect_irq(SYS_BUS_DEVICE(&mms->rtc), 0,
147
+ }
111
+ qdev_get_gpio_in(gicdev, 4));
148
+ offset -= imm;
149
+ return offset;
150
+}
151
+
152
+DO_VIDUP_ALL(vidup, DO_ADD)
153
+DO_VIWDUP_ALL(viwdup, do_add_wrap)
154
+DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
155
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
156
index XXXXXXX..XXXXXXX 100644
157
--- a/target/arm/translate-mve.c
158
+++ b/target/arm/translate-mve.c
159
@@ -XXX,XX +XXX,XX @@
160
#include "translate.h"
161
#include "translate-a32.h"
162
163
+static inline int vidup_imm(DisasContext *s, int x)
164
+{
165
+ return 1 << x;
166
+}
167
+
168
/* Include the generated decoder */
169
#include "decode-mve.c.inc"
170
171
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
172
typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
173
typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
174
typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
175
+typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
176
+typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
177
178
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
179
static inline long mve_qreg_offset(unsigned reg)
180
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLC(DisasContext *s, arg_VSHLC *a)
181
mve_update_eci(s);
182
return true;
183
}
184
+
185
+static bool do_vidup(DisasContext *s, arg_vidup *a, MVEGenVIDUPFn *fn)
186
+{
187
+ TCGv_ptr qd;
188
+ TCGv_i32 rn;
189
+
112
+
190
+ /*
113
+ /*
191
+ * Vector increment/decrement with wrap and duplicate (VIDUP, VDDUP).
114
+ * In hardware this is a LAN9220; the LAN9118 is software compatible
192
+ * This fills the vector with elements of successively increasing
115
+ * except that it doesn't support the checksum-offload feature.
193
+ * or decreasing values, starting from Rn.
194
+ */
116
+ */
195
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
117
+ lan9118_init(0xe0300000,
196
+ return false;
118
+ qdev_get_gpio_in(gicdev, 18));
197
+ }
198
+ if (a->size == MO_64) {
199
+ /* size 0b11 is another encoding */
200
+ return false;
201
+ }
202
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
203
+ return true;
204
+ }
205
+
119
+
206
+ qd = mve_qreg_ptr(a->qd);
120
+ create_unimplemented_device("usb", 0xe0301000, 0x1000);
207
+ rn = load_reg(s, a->rn);
121
+ create_unimplemented_device("qspi-write-config", 0xe0600000, 0x1000);
208
+ fn(rn, cpu_env, qd, rn, tcg_constant_i32(a->imm));
209
+ store_reg(s, a->rn, rn);
210
+ tcg_temp_free_ptr(qd);
211
+ mve_update_eci(s);
212
+ return true;
213
+}
214
+
122
+
215
+static bool do_viwdup(DisasContext *s, arg_viwdup *a, MVEGenVIWDUPFn *fn)
123
mms->bootinfo.ram_size = machine->ram_size;
216
+{
124
mms->bootinfo.board_id = -1;
217
+ TCGv_ptr qd;
125
mms->bootinfo.loader_start = mmc->loader_start;
218
+ TCGv_i32 rn, rm;
219
+
220
+ /*
221
+ * Vector increment/decrement with wrap and duplicate (VIWDUp, VDWDUP)
222
+ * This fills the vector with elements of successively increasing
223
+ * or decreasing values, starting from Rn. Rm specifies a point where
224
+ * the count wraps back around to 0. The updated offset is written back
225
+ * to Rn.
226
+ */
227
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
228
+ return false;
229
+ }
230
+ if (!fn || a->rm == 13 || a->rm == 15) {
231
+ /*
232
+ * size 0b11 is another encoding; Rm == 13 is UNPREDICTABLE;
233
+ * Rm == 13 is VIWDUP, VDWDUP.
234
+ */
235
+ return false;
236
+ }
237
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
238
+ return true;
239
+ }
240
+
241
+ qd = mve_qreg_ptr(a->qd);
242
+ rn = load_reg(s, a->rn);
243
+ rm = load_reg(s, a->rm);
244
+ fn(rn, cpu_env, qd, rn, rm, tcg_constant_i32(a->imm));
245
+ store_reg(s, a->rn, rn);
246
+ tcg_temp_free_ptr(qd);
247
+ tcg_temp_free_i32(rm);
248
+ mve_update_eci(s);
249
+ return true;
250
+}
251
+
252
+static bool trans_VIDUP(DisasContext *s, arg_vidup *a)
253
+{
254
+ static MVEGenVIDUPFn * const fns[] = {
255
+ gen_helper_mve_vidupb,
256
+ gen_helper_mve_viduph,
257
+ gen_helper_mve_vidupw,
258
+ NULL,
259
+ };
260
+ return do_vidup(s, a, fns[a->size]);
261
+}
262
+
263
+static bool trans_VDDUP(DisasContext *s, arg_vidup *a)
264
+{
265
+ static MVEGenVIDUPFn * const fns[] = {
266
+ gen_helper_mve_vidupb,
267
+ gen_helper_mve_viduph,
268
+ gen_helper_mve_vidupw,
269
+ NULL,
270
+ };
271
+ /* VDDUP is just like VIDUP but with a negative immediate */
272
+ a->imm = -a->imm;
273
+ return do_vidup(s, a, fns[a->size]);
274
+}
275
+
276
+static bool trans_VIWDUP(DisasContext *s, arg_viwdup *a)
277
+{
278
+ static MVEGenVIWDUPFn * const fns[] = {
279
+ gen_helper_mve_viwdupb,
280
+ gen_helper_mve_viwduph,
281
+ gen_helper_mve_viwdupw,
282
+ NULL,
283
+ };
284
+ return do_viwdup(s, a, fns[a->size]);
285
+}
286
+
287
+static bool trans_VDWDUP(DisasContext *s, arg_viwdup *a)
288
+{
289
+ static MVEGenVIWDUPFn * const fns[] = {
290
+ gen_helper_mve_vdwdupb,
291
+ gen_helper_mve_vdwduph,
292
+ gen_helper_mve_vdwdupw,
293
+ NULL,
294
+ };
295
+ return do_viwdup(s, a, fns[a->size]);
296
+}
297
--
126
--
298
2.20.1
127
2.34.1
299
128
300
129
diff view generated by jsdifflib
Deleted patch
1
Factor out the "generate code to update VPR.MASK01/MASK23" part of
2
trans_VPST(); we are going to want to reuse it for the VPT insns.
3
1
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
---
7
target/arm/translate-mve.c | 31 +++++++++++++++++--------------
8
1 file changed, 17 insertions(+), 14 deletions(-)
9
10
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
11
index XXXXXXX..XXXXXXX 100644
12
--- a/target/arm/translate-mve.c
13
+++ b/target/arm/translate-mve.c
14
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
15
return do_long_dual_acc(s, a, fns[a->x]);
16
}
17
18
-static bool trans_VPST(DisasContext *s, arg_VPST *a)
19
+static void gen_vpst(DisasContext *s, uint32_t mask)
20
{
21
- TCGv_i32 vpr;
22
-
23
- /* mask == 0 is a "related encoding" */
24
- if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
25
- return false;
26
- }
27
- if (!mve_eci_check(s) || !vfp_access_check(s)) {
28
- return true;
29
- }
30
/*
31
* Set the VPR mask fields. We take advantage of MASK01 and MASK23
32
* being adjacent fields in the register.
33
*
34
- * This insn is not predicated, but it is subject to beat-wise
35
+ * Updating the masks is not predicated, but it is subject to beat-wise
36
* execution, and the mask is updated on the odd-numbered beats.
37
* So if PSR.ECI says we should skip beat 1, we mustn't update the
38
* 01 mask field.
39
*/
40
- vpr = load_cpu_field(v7m.vpr);
41
+ TCGv_i32 vpr = load_cpu_field(v7m.vpr);
42
switch (s->eci) {
43
case ECI_NONE:
44
case ECI_A0:
45
/* Update both 01 and 23 fields */
46
tcg_gen_deposit_i32(vpr, vpr,
47
- tcg_constant_i32(a->mask | (a->mask << 4)),
48
+ tcg_constant_i32(mask | (mask << 4)),
49
R_V7M_VPR_MASK01_SHIFT,
50
R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
51
break;
52
@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
53
case ECI_A0A1A2B0:
54
/* Update only the 23 mask field */
55
tcg_gen_deposit_i32(vpr, vpr,
56
- tcg_constant_i32(a->mask),
57
+ tcg_constant_i32(mask),
58
R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
59
break;
60
default:
61
g_assert_not_reached();
62
}
63
store_cpu_field(vpr, v7m.vpr);
64
+}
65
+
66
+static bool trans_VPST(DisasContext *s, arg_VPST *a)
67
+{
68
+ /* mask == 0 is a "related encoding" */
69
+ if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
70
+ return false;
71
+ }
72
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
73
+ return true;
74
+ }
75
+ gen_vpst(s, a->mask);
76
mve_update_and_store_eci(s);
77
return true;
78
}
79
--
80
2.20.1
81
82
diff view generated by jsdifflib
Deleted patch
1
Implement the MVE integer vector comparison instructions. These are
2
"VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings
3
T1, T2 and T3.
4
1
5
These insns compare corresponding elements in each vector, and update
6
the VPR.P0 predicate bits with the results of the comparison. VPT
7
also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively
8
"VCMP then VPST".
9
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
---
13
target/arm/helper-mve.h | 32 ++++++++++++++++++++++
14
target/arm/mve.decode | 18 +++++++++++-
15
target/arm/mve_helper.c | 56 ++++++++++++++++++++++++++++++++++++++
16
target/arm/translate-mve.c | 47 ++++++++++++++++++++++++++++++++
17
4 files changed, 152 insertions(+), 1 deletion(-)
18
19
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
20
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper-mve.h
22
+++ b/target/arm/helper-mve.h
23
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
24
DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
25
DEF_HELPER_FLAGS_3(mve_uqrshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
26
DEF_HELPER_FLAGS_3(mve_sqrshr, TCG_CALL_NO_RWG, i32, env, i32, i32)
27
+
28
+DEF_HELPER_FLAGS_3(mve_vcmpeqb, TCG_CALL_NO_WG, void, env, ptr, ptr)
29
+DEF_HELPER_FLAGS_3(mve_vcmpeqh, TCG_CALL_NO_WG, void, env, ptr, ptr)
30
+DEF_HELPER_FLAGS_3(mve_vcmpeqw, TCG_CALL_NO_WG, void, env, ptr, ptr)
31
+
32
+DEF_HELPER_FLAGS_3(mve_vcmpneb, TCG_CALL_NO_WG, void, env, ptr, ptr)
33
+DEF_HELPER_FLAGS_3(mve_vcmpneh, TCG_CALL_NO_WG, void, env, ptr, ptr)
34
+DEF_HELPER_FLAGS_3(mve_vcmpnew, TCG_CALL_NO_WG, void, env, ptr, ptr)
35
+
36
+DEF_HELPER_FLAGS_3(mve_vcmpcsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
37
+DEF_HELPER_FLAGS_3(mve_vcmpcsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
38
+DEF_HELPER_FLAGS_3(mve_vcmpcsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
39
+
40
+DEF_HELPER_FLAGS_3(mve_vcmphib, TCG_CALL_NO_WG, void, env, ptr, ptr)
41
+DEF_HELPER_FLAGS_3(mve_vcmphih, TCG_CALL_NO_WG, void, env, ptr, ptr)
42
+DEF_HELPER_FLAGS_3(mve_vcmphiw, TCG_CALL_NO_WG, void, env, ptr, ptr)
43
+
44
+DEF_HELPER_FLAGS_3(mve_vcmpgeb, TCG_CALL_NO_WG, void, env, ptr, ptr)
45
+DEF_HELPER_FLAGS_3(mve_vcmpgeh, TCG_CALL_NO_WG, void, env, ptr, ptr)
46
+DEF_HELPER_FLAGS_3(mve_vcmpgew, TCG_CALL_NO_WG, void, env, ptr, ptr)
47
+
48
+DEF_HELPER_FLAGS_3(mve_vcmpltb, TCG_CALL_NO_WG, void, env, ptr, ptr)
49
+DEF_HELPER_FLAGS_3(mve_vcmplth, TCG_CALL_NO_WG, void, env, ptr, ptr)
50
+DEF_HELPER_FLAGS_3(mve_vcmpltw, TCG_CALL_NO_WG, void, env, ptr, ptr)
51
+
52
+DEF_HELPER_FLAGS_3(mve_vcmpgtb, TCG_CALL_NO_WG, void, env, ptr, ptr)
53
+DEF_HELPER_FLAGS_3(mve_vcmpgth, TCG_CALL_NO_WG, void, env, ptr, ptr)
54
+DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, ptr, ptr)
55
+
56
+DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
57
+DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
58
+DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
59
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
60
index XXXXXXX..XXXXXXX 100644
61
--- a/target/arm/mve.decode
62
+++ b/target/arm/mve.decode
63
@@ -XXX,XX +XXX,XX @@
64
&2shift qd qm shift size
65
&vidup qd rn size imm
66
&viwdup qd rn rm size imm
67
+&vcmp qm qn size mask
68
69
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
70
# Note that both Rn and Qd are 3 bits only (no D bit)
71
@@ -XXX,XX +XXX,XX @@
72
@2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
73
size=2 shift=%rshift_i5
74
75
+# Vector comparison; 4-bit Qm but 3-bit Qn
76
+%mask_22_13 22:1 13:3
77
+@vcmp .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
78
+
79
# Vector loads and stores
80
81
# Widening loads and narrowing stores:
82
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
83
}
84
85
# Predicate operations
86
-%mask_22_13 22:1 13:3
87
VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
88
89
# Logical immediate operations (1 reg and modified-immediate)
90
@@ -XXX,XX +XXX,XX @@ VQRSHRUNT 111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
91
VQRSHRUNT 111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
92
93
VSHLC 111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
94
+
95
+# Comparisons. We expand out the conditions which are split across
96
+# encodings T1, T2, T3 and the fc bits. These include VPT, which is
97
+# effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
98
+VCMPEQ 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
99
+VCMPNE 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
100
+VCMPCS 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
101
+VCMPHI 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
102
+VCMPGE 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
103
+VCMPLT 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
104
+VCMPGT 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
105
+VCMPLE 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
106
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
107
index XXXXXXX..XXXXXXX 100644
108
--- a/target/arm/mve_helper.c
109
+++ b/target/arm/mve_helper.c
110
@@ -XXX,XX +XXX,XX @@ static uint32_t do_sub_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
111
DO_VIDUP_ALL(vidup, DO_ADD)
112
DO_VIWDUP_ALL(viwdup, do_add_wrap)
113
DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
114
+
115
+/*
116
+ * Vector comparison.
117
+ * P0 bits for non-executed beats (where eci_mask is 0) are unchanged.
118
+ * P0 bits for predicated lanes in executed beats (where mask is 0) are 0.
119
+ * P0 bits otherwise are updated with the results of the comparisons.
120
+ * We must also keep unchanged the MASK fields at the top of v7m.vpr.
121
+ */
122
+#define DO_VCMP(OP, ESIZE, TYPE, FN) \
123
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, void *vm) \
124
+ { \
125
+ TYPE *n = vn, *m = vm; \
126
+ uint16_t mask = mve_element_mask(env); \
127
+ uint16_t eci_mask = mve_eci_mask(env); \
128
+ uint16_t beatpred = 0; \
129
+ uint16_t emask = MAKE_64BIT_MASK(0, ESIZE); \
130
+ unsigned e; \
131
+ for (e = 0; e < 16 / ESIZE; e++) { \
132
+ bool r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)]); \
133
+ /* Comparison sets 0/1 bits for each byte in the element */ \
134
+ beatpred |= r * emask; \
135
+ emask <<= ESIZE; \
136
+ } \
137
+ beatpred &= mask; \
138
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | \
139
+ (beatpred & eci_mask); \
140
+ mve_advance_vpt(env); \
141
+ }
142
+
143
+#define DO_VCMP_S(OP, FN) \
144
+ DO_VCMP(OP##b, 1, int8_t, FN) \
145
+ DO_VCMP(OP##h, 2, int16_t, FN) \
146
+ DO_VCMP(OP##w, 4, int32_t, FN)
147
+
148
+#define DO_VCMP_U(OP, FN) \
149
+ DO_VCMP(OP##b, 1, uint8_t, FN) \
150
+ DO_VCMP(OP##h, 2, uint16_t, FN) \
151
+ DO_VCMP(OP##w, 4, uint32_t, FN)
152
+
153
+#define DO_EQ(N, M) ((N) == (M))
154
+#define DO_NE(N, M) ((N) != (M))
155
+#define DO_EQ(N, M) ((N) == (M))
156
+#define DO_EQ(N, M) ((N) == (M))
157
+#define DO_GE(N, M) ((N) >= (M))
158
+#define DO_LT(N, M) ((N) < (M))
159
+#define DO_GT(N, M) ((N) > (M))
160
+#define DO_LE(N, M) ((N) <= (M))
161
+
162
+DO_VCMP_U(vcmpeq, DO_EQ)
163
+DO_VCMP_U(vcmpne, DO_NE)
164
+DO_VCMP_U(vcmpcs, DO_GE)
165
+DO_VCMP_U(vcmphi, DO_GT)
166
+DO_VCMP_S(vcmpge, DO_GE)
167
+DO_VCMP_S(vcmplt, DO_LT)
168
+DO_VCMP_S(vcmpgt, DO_GT)
169
+DO_VCMP_S(vcmple, DO_LE)
170
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
171
index XXXXXXX..XXXXXXX 100644
172
--- a/target/arm/translate-mve.c
173
+++ b/target/arm/translate-mve.c
174
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
175
typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
176
typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
177
typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
178
+typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
179
180
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
181
static inline long mve_qreg_offset(unsigned reg)
182
@@ -XXX,XX +XXX,XX @@ static bool trans_VDWDUP(DisasContext *s, arg_viwdup *a)
183
};
184
return do_viwdup(s, a, fns[a->size]);
185
}
186
+
187
+static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
188
+{
189
+ TCGv_ptr qn, qm;
190
+
191
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qm) ||
192
+ !fn) {
193
+ return false;
194
+ }
195
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
196
+ return true;
197
+ }
198
+
199
+ qn = mve_qreg_ptr(a->qn);
200
+ qm = mve_qreg_ptr(a->qm);
201
+ fn(cpu_env, qn, qm);
202
+ tcg_temp_free_ptr(qn);
203
+ tcg_temp_free_ptr(qm);
204
+ if (a->mask) {
205
+ /* VPT */
206
+ gen_vpst(s, a->mask);
207
+ }
208
+ mve_update_eci(s);
209
+ return true;
210
+}
211
+
212
+#define DO_VCMP(INSN, FN) \
213
+ static bool trans_##INSN(DisasContext *s, arg_vcmp *a) \
214
+ { \
215
+ static MVEGenCmpFn * const fns[] = { \
216
+ gen_helper_mve_##FN##b, \
217
+ gen_helper_mve_##FN##h, \
218
+ gen_helper_mve_##FN##w, \
219
+ NULL, \
220
+ }; \
221
+ return do_vcmp(s, a, fns[a->size]); \
222
+ }
223
+
224
+DO_VCMP(VCMPEQ, vcmpeq)
225
+DO_VCMP(VCMPNE, vcmpne)
226
+DO_VCMP(VCMPCS, vcmpcs)
227
+DO_VCMP(VCMPHI, vcmphi)
228
+DO_VCMP(VCMPGE, vcmpge)
229
+DO_VCMP(VCMPLT, vcmplt)
230
+DO_VCMP(VCMPGT, vcmpgt)
231
+DO_VCMP(VCMPLE, vcmple)
232
--
233
2.20.1
234
235
diff view generated by jsdifflib
1
Implement the MVE VPSEL insn, which sets each byte of the destination
1
Add documentation for the mps3-an536 board type.
2
vector Qd to the byte from either Qn or Qm depending on the value of
3
the corresponding bit in VPR.P0.
4
2
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
4
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
5
Message-id: 20240206132931.38376-14-peter.maydell@linaro.org
7
---
6
---
8
target/arm/helper-mve.h | 2 ++
7
docs/system/arm/mps2.rst | 37 ++++++++++++++++++++++++++++++++++---
9
target/arm/mve.decode | 7 +++++--
8
1 file changed, 34 insertions(+), 3 deletions(-)
10
target/arm/mve_helper.c | 19 +++++++++++++++++++
11
target/arm/translate-mve.c | 2 ++
12
4 files changed, 28 insertions(+), 2 deletions(-)
13
9
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
10
diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst
15
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
12
--- a/docs/system/arm/mps2.rst
17
+++ b/target/arm/helper-mve.h
13
+++ b/docs/system/arm/mps2.rst
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
14
@@ -XXX,XX +XXX,XX @@
19
DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
15
-Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an547``)
20
DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
16
-=========================================================================================================================================================
21
17
+Arm MPS2 and MPS3 boards (``mps2-an385``, ``mps2-an386``, ``mps2-an500``, ``mps2-an505``, ``mps2-an511``, ``mps2-an521``, ``mps3-an524``, ``mps3-an536``, ``mps3-an547``)
22
+DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
18
+=========================================================================================================================================================================
19
20
-These board models all use Arm M-profile CPUs.
21
+These board models use Arm M-profile or R-profile CPUs.
22
23
The Arm MPS2, MPS2+ and MPS3 dev boards are FPGA based (the 2+ has a
24
bigger FPGA but is otherwise the same as the 2; the 3 has a bigger
25
@@ -XXX,XX +XXX,XX @@ FPGA image.
26
27
QEMU models the following FPGA images:
28
29
+FPGA images using M-profile CPUs:
23
+
30
+
24
DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
31
``mps2-an385``
25
DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
32
Cortex-M3 as documented in Arm Application Note AN385
26
DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
33
``mps2-an386``
27
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
34
@@ -XXX,XX +XXX,XX @@ QEMU models the following FPGA images:
28
index XXXXXXX..XXXXXXX 100644
35
``mps3-an547``
29
--- a/target/arm/mve.decode
36
Cortex-M55 on an MPS3, as documented in Arm Application Note AN547
30
+++ b/target/arm/mve.decode
37
31
@@ -XXX,XX +XXX,XX @@ VSHLC 111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
38
+FPGA images using R-profile CPUs:
32
# effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
33
VCMPEQ 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
34
VCMPNE 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
35
-VCMPCS 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
36
-VCMPHI 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
37
+{
38
+ VPSEL 1111 1110 0 . 11 ... 1 ... 0 1111 . 0 . 0 ... 1 @2op_nosz
39
+ VCMPCS 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
40
+ VCMPHI 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
41
+}
42
VCMPGE 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
43
VCMPLT 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
44
VCMPGT 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
45
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
46
index XXXXXXX..XXXXXXX 100644
47
--- a/target/arm/mve_helper.c
48
+++ b/target/arm/mve_helper.c
49
@@ -XXX,XX +XXX,XX @@ DO_VCMP_S(vcmpge, DO_GE)
50
DO_VCMP_S(vcmplt, DO_LT)
51
DO_VCMP_S(vcmpgt, DO_GT)
52
DO_VCMP_S(vcmple, DO_LE)
53
+
39
+
54
+void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
40
+``mps3-an536``
55
+{
41
+ Dual Cortex-R52 on an MPS3, as documented in Arm Application Note AN536
56
+ /*
57
+ * Qd[n] = VPR.P0[n] ? Qn[n] : Qm[n]
58
+ * but note that whether bytes are written to Qd is still subject
59
+ * to (all forms of) predication in the usual way.
60
+ */
61
+ uint64_t *d = vd, *n = vn, *m = vm;
62
+ uint16_t mask = mve_element_mask(env);
63
+ uint16_t p0 = FIELD_EX32(env->v7m.vpr, V7M_VPR, P0);
64
+ unsigned e;
65
+ for (e = 0; e < 16 / 8; e++, mask >>= 8, p0 >>= 8) {
66
+ uint64_t r = m[H8(e)];
67
+ mergemask(&r, n[H8(e)], p0);
68
+ mergemask(&d[H8(e)], r, mask);
69
+ }
70
+ mve_advance_vpt(env);
71
+}
72
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
73
index XXXXXXX..XXXXXXX 100644
74
--- a/target/arm/translate-mve.c
75
+++ b/target/arm/translate-mve.c
76
@@ -XXX,XX +XXX,XX @@ DO_LOGIC(VORR, gen_helper_mve_vorr)
77
DO_LOGIC(VORN, gen_helper_mve_vorn)
78
DO_LOGIC(VEOR, gen_helper_mve_veor)
79
80
+DO_LOGIC(VPSEL, gen_helper_mve_vpsel)
81
+
42
+
82
#define DO_2OP(INSN, FN) \
43
Differences between QEMU and real hardware:
83
static bool trans_##INSN(DisasContext *s, arg_2op *a) \
44
84
{ \
45
- AN385/AN386 remapping of low 16K of memory to either ZBT SSRAM1 or to
46
@@ -XXX,XX +XXX,XX @@ Differences between QEMU and real hardware:
47
flash, but only as simple ROM, so attempting to rewrite the flash
48
from the guest will fail
49
- QEMU does not model the USB controller in MPS3 boards
50
+- AN536 does not support runtime control of CPU reset and halt via
51
+ the SCC CFG_REG0 register.
52
+- AN536 does not support enabling or disabling the flash and ATCM
53
+ interfaces via the SCC CFG_REG1 register.
54
+- AN536 does not support setting of the initial vector table
55
+ base address via the SCC CFG_REG6 and CFG_REG7 register config,
56
+ and does not provide a mechanism for specifying these values at
57
+ startup, so all guest images must be built to start from TCM
58
+ (i.e. to expect the interrupt vector base at 0 from reset).
59
+- AN536 defaults to only creating a single CPU; this is the equivalent
60
+ of the way the real FPGA image usually runs with the second Cortex-R52
61
+ held in halt via the initial SCC CFG_REG0 register setting. You can
62
+ create the second CPU with ``-smp 2``; both CPUs will then start
63
+ execution immediately on startup.
64
+
65
+Note that for the AN536 the first UART is accessible only by
66
+CPU0, and the second UART is accessible only by CPU1. The
67
+first UART accessible shared between both CPUs is the third
68
+UART. Guest software might therefore be built to use either
69
+the first UART or the third UART; if you don't see any output
70
+from the UART you are looking at, try one of the others.
71
+(Even if the AN536 machine is started with a single CPU and so
72
+no "CPU1-only UART", the UART numbering remains the same,
73
+with the third UART being the first of the shared ones.)
74
75
Machine-specific options
76
""""""""""""""""""""""""
85
--
77
--
86
2.20.1
78
2.34.1
87
79
88
80
diff view generated by jsdifflib