First set of arm patches for 6.2. I have a lot more in my
to-review queue still...

I don't have anything else queued up at the moment, so this is just
Richard's SME patches.

-- PMM

The following changes since commit d42685765653ec155fdf60910662f8830bdb2cef:
6
The following changes since commit 63b38f6c85acd312c2cab68554abf33adf4ee2b3:
7
7
8
Open 6.2 development tree (2021-08-25 10:25:12 +0100)
8
Merge tag 'pull-target-arm-20220707' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-07-08 06:17:11 +0530)
9
9
10
are available in the Git repository at:
10
are available in the Git repository at:
11
11
12
https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210825
12
https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20220711
13
13
14
for you to fetch changes up to 24b1a6aa43615be22c7ee66bd68ec5675f6a6a9a:
14
for you to fetch changes up to f9982ceaf26df27d15547a3a7990a95019e9e3a8:
15
15
16
docs: Document how to use gdb with unix sockets (2021-08-25 10:48:51 +0100)
16
linux-user/aarch64: Add SME related hwcap entries (2022-07-11 13:43:52 +0100)
17
17
18
----------------------------------------------------------------
18
----------------------------------------------------------------
19
target-arm queue:
19
target-arm:
20
* More MVE emulation work
20
* Implement SME emulation, for both system and linux-user
21
* Implement M-profile trapping on division by zero
22
* kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
23
* hw/char/pl011: add support for sending break
24
* fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
25
* hw/dma/pl330: Add memory region to replace default
26
* sbsa-ref: Rename SBSA_GWDT enum value
27
* fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
28
* docs: Document how to use gdb with unix sockets
29
21
30
----------------------------------------------------------------
22
----------------------------------------------------------------
31
Eduardo Habkost (1):
23
Richard Henderson (45):
32
sbsa-ref: Rename SBSA_GWDT enum value
24
target/arm: Handle SME in aarch64_cpu_dump_state
25
target/arm: Add infrastructure for disas_sme
26
target/arm: Trap non-streaming usage when Streaming SVE is active
27
target/arm: Mark ADR as non-streaming
28
target/arm: Mark RDFFR, WRFFR, SETFFR as non-streaming
29
target/arm: Mark BDEP, BEXT, BGRP, COMPACT, FEXPA, FTSSEL as non-streaming
30
target/arm: Mark PMULL, FMMLA as non-streaming
31
target/arm: Mark FTSMUL, FTMAD, FADDA as non-streaming
32
target/arm: Mark SMMLA, UMMLA, USMMLA as non-streaming
33
target/arm: Mark string/histo/crypto as non-streaming
34
target/arm: Mark gather/scatter load/store as non-streaming
35
target/arm: Mark gather prefetch as non-streaming
36
target/arm: Mark LDFF1 and LDNF1 as non-streaming
37
target/arm: Mark LD1RO as non-streaming
38
target/arm: Add SME enablement checks
39
target/arm: Handle SME in sve_access_check
40
target/arm: Implement SME RDSVL, ADDSVL, ADDSPL
41
target/arm: Implement SME ZERO
42
target/arm: Implement SME MOVA
43
target/arm: Implement SME LD1, ST1
44
target/arm: Export unpredicated ld/st from translate-sve.c
45
target/arm: Implement SME LDR, STR
46
target/arm: Implement SME ADDHA, ADDVA
47
target/arm: Implement FMOPA, FMOPS (non-widening)
48
target/arm: Implement BFMOPA, BFMOPS
49
target/arm: Implement FMOPA, FMOPS (widening)
50
target/arm: Implement SME integer outer product
51
target/arm: Implement PSEL
52
target/arm: Implement REVD
53
target/arm: Implement SCLAMP, UCLAMP
54
target/arm: Reset streaming sve state on exception boundaries
55
target/arm: Enable SME for -cpu max
56
linux-user/aarch64: Clear tpidr2_el0 if CLONE_SETTLS
57
linux-user/aarch64: Reset PSTATE.SM on syscalls
58
linux-user/aarch64: Add SM bit to SVE signal context
59
linux-user/aarch64: Tidy target_restore_sigframe error return
60
linux-user/aarch64: Do not allow duplicate or short sve records
61
linux-user/aarch64: Verify extra record lock succeeded
62
linux-user/aarch64: Move sve record checks into restore
63
linux-user/aarch64: Implement SME signal handling
64
linux-user: Rename sve prctls
65
linux-user/aarch64: Implement PR_SME_GET_VL, PR_SME_SET_VL
66
target/arm: Only set ZEN in reset if SVE present
67
target/arm: Enable SME for user-only
68
linux-user/aarch64: Add SME related hwcap entries
33
69
34
Guenter Roeck (2):
70
docs/system/arm/emulation.rst | 4 +
35
fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
71
linux-user/aarch64/target_cpu.h | 5 +-
36
fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
72
linux-user/aarch64/target_prctl.h | 62 +-
37
73
target/arm/cpu.h | 7 +
38
Hamza Mahfooz (1):
74
target/arm/helper-sme.h | 126 ++++
39
target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
75
target/arm/helper-sve.h | 4 +
40
76
target/arm/helper.h | 18 +
41
Jan Luebbe (1):
77
target/arm/translate-a64.h | 45 ++
42
hw/char/pl011: add support for sending break
78
target/arm/translate.h | 16 +
43
79
target/arm/sme-fa64.decode | 60 ++
44
Peter Maydell (37):
80
target/arm/sme.decode | 88 +++
45
target/arm: Note that we handle VMOVL as a special case of VSHLL
81
target/arm/sve.decode | 41 +-
46
target/arm: Print MVE VPR in CPU dumps
82
linux-user/aarch64/cpu_loop.c | 9 +
47
target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
83
linux-user/aarch64/signal.c | 243 ++++++--
48
target/arm: Fix signed VADDV
84
linux-user/elfload.c | 20 +
49
target/arm: Fix mask handling for MVE narrowing operations
85
linux-user/syscall.c | 28 +-
50
target/arm: Fix 48-bit saturating shifts
86
target/arm/cpu.c | 35 +-
51
target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
87
target/arm/cpu64.c | 11 +
52
target/arm: Fix calculation of LTP mask when LR is 0
88
target/arm/helper.c | 56 +-
53
target/arm: Factor out mve_eci_mask()
89
target/arm/sme_helper.c | 1140 +++++++++++++++++++++++++++++++++++++
54
target/arm: Fix VPT advance when ECI is non-zero
90
target/arm/sve_helper.c | 28 +
55
target/arm: Fix VLDRB/H/W for predicated elements
91
target/arm/translate-a64.c | 103 +++-
56
target/arm: Implement MVE VMULL (polynomial)
92
target/arm/translate-sme.c | 373 ++++++++++++
57
target/arm: Implement MVE incrementing/decrementing dup insns
93
target/arm/translate-sve.c | 393 ++++++++++---
58
target/arm: Factor out gen_vpst()
94
target/arm/translate-vfp.c | 12 +
59
target/arm: Implement MVE integer vector comparisons
95
target/arm/translate.c | 2 +
60
target/arm: Implement MVE integer vector-vs-scalar comparisons
96
target/arm/vec_helper.c | 24 +
61
target/arm: Implement MVE VPSEL
97
target/arm/meson.build | 3 +
62
target/arm: Implement MVE VMLAS
98
28 files changed, 2821 insertions(+), 135 deletions(-)
63
target/arm: Implement MVE shift-by-scalar
99
create mode 100644 target/arm/sme-fa64.decode
64
target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
100
create mode 100644 target/arm/sme.decode
65
target/arm: Implement MVE integer min/max across vector
101
create mode 100644 target/arm/translate-sme.c
66
target/arm: Implement MVE VABAV
67
target/arm: Implement MVE narrowing moves
68
target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
69
target/arm: Implement MVE VMLADAV and VMLSLDAV
70
target/arm: Implement MVE VMLA
71
target/arm: Implement MVE saturating doubling multiply accumulates
72
target/arm: Implement MVE VQABS, VQNEG
73
target/arm: Implement MVE VMAXA, VMINA
74
target/arm: Implement MVE VMOV to/from 2 general-purpose registers
75
target/arm: Implement MVE VPNOT
76
target/arm: Implement MVE VCTP
77
target/arm: Implement MVE scatter-gather insns
78
target/arm: Implement MVE scatter-gather immediate forms
79
target/arm: Implement MVE interleaving loads/stores
80
target/arm: Re-indent sdiv and udiv helpers
81
target/arm: Implement M-profile trapping on division by zero
82
83
Sebastian Meyer (1):
84
docs: Document how to use gdb with unix sockets
85
86
Wen, Jianxian (1):
87
hw/dma/pl330: Add memory region to replace default
88
89
docs/system/gdb.rst | 26 +-
90
include/hw/arm/fsl-imx7.h | 5 +
91
target/arm/cpu.h | 1 +
92
target/arm/helper-mve.h | 283 ++++++++++
93
target/arm/helper.h | 4 +-
94
target/arm/translate-a32.h | 2 +
95
target/arm/vec_internal.h | 11 +
96
target/arm/mve.decode | 226 +++++++-
97
target/arm/t32.decode | 1 +
98
hw/arm/exynos4210.c | 3 +
99
hw/arm/fsl-imx6ul.c | 12 +
100
hw/arm/fsl-imx7.c | 7 +
101
hw/arm/sbsa-ref.c | 6 +-
102
hw/arm/xilinx_zynq.c | 3 +
103
hw/char/pl011.c | 6 +
104
hw/dma/pl330.c | 26 +-
105
target/arm/cpu.c | 3 +
106
target/arm/helper.c | 34 +-
107
target/arm/kvm.c | 17 +-
108
target/arm/m_helper.c | 4 +
109
target/arm/mve_helper.c | 1254 ++++++++++++++++++++++++++++++++++++++++++--
110
target/arm/translate-mve.c | 877 ++++++++++++++++++++++++++++++-
111
target/arm/translate-vfp.c | 2 +-
112
target/arm/translate.c | 37 +-
113
target/arm/vec_helper.c | 14 +-
114
25 files changed, 2746 insertions(+), 118 deletions(-)
115
1
From: "Wen, Jianxian" <Jianxian.Wen@verisilicon.com>
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
Add a 'memory' property (a memory region link) which can be connected to an IOMMU region to support SMMU translation.
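A minimal sketch of how a board could use the new link property; the IOMMU-backed region ("smmu_mr") and the surrounding wiring are illustrative assumptions, not part of this patch:

    DeviceState *dma = qdev_new("pl330");
    /* Route the controller's instruction fetches and data transfers
     * through an IOMMU-backed region instead of flat system memory.
     * "smmu_mr" is a hypothetical MemoryRegion provided by the board.
     */
    object_property_set_link(OBJECT(dma), "memory", OBJECT(smmu_mr),
                             &error_fatal);
    qdev_prop_set_uint8(dma, "num_chnls", 8);
    sysbus_realize_and_unref(SYS_BUS_DEVICE(dma), &error_fatal);

When no IOMMU is involved, boards simply link the system memory region, as the exynos4210 and xilinx_zynq hunks below do.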
3
Dump SVCR, plus use the correct access check for Streaming Mode.
4
4
5
Signed-off-by: Jianxian Wen <jianxian.wen@verisilicon.com>
5
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
6
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 4C23C17B8E87E74E906A25A3254A03F4FA1FEC31@SHASXM03.verisilicon.com
7
Message-id: 20220708151540.18136-2-richard.henderson@linaro.org
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
---
9
---
10
hw/arm/exynos4210.c | 3 +++
10
target/arm/cpu.c | 17 ++++++++++++++++-
11
hw/arm/xilinx_zynq.c | 3 +++
11
1 file changed, 16 insertions(+), 1 deletion(-)
12
hw/dma/pl330.c | 26 ++++++++++++++++++++++----
13
3 files changed, 28 insertions(+), 4 deletions(-)
14
12
15
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
13
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
16
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
17
--- a/hw/arm/exynos4210.c
15
--- a/target/arm/cpu.c
18
+++ b/hw/arm/exynos4210.c
16
+++ b/target/arm/cpu.c
19
@@ -XXX,XX +XXX,XX @@ static DeviceState *pl330_create(uint32_t base, qemu_or_irq *orgate,
17
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
20
int i;
18
int i;
21
19
int el = arm_current_el(env);
22
dev = qdev_new("pl330");
20
const char *ns_status;
23
+ object_property_set_link(OBJECT(dev), "memory",
21
+ bool sve;
24
+ OBJECT(get_system_memory()),
22
25
+ &error_fatal);
23
qemu_fprintf(f, " PC=%016" PRIx64 " ", env->pc);
26
qdev_prop_set_uint8(dev, "num_events", nevents);
24
for (i = 0; i < 32; i++) {
27
qdev_prop_set_uint8(dev, "num_chnls", 8);
25
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
28
qdev_prop_set_uint8(dev, "num_periph_req", nreq);
26
el,
29
diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
27
psr & PSTATE_SP ? 'h' : 't');
30
index XXXXXXX..XXXXXXX 100644
28
31
--- a/hw/arm/xilinx_zynq.c
29
+ if (cpu_isar_feature(aa64_sme, cpu)) {
32
+++ b/hw/arm/xilinx_zynq.c
30
+ qemu_fprintf(f, " SVCR=%08" PRIx64 " %c%c",
33
@@ -XXX,XX +XXX,XX @@ static void zynq_init(MachineState *machine)
31
+ env->svcr,
34
sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[39-IRQ_OFFSET]);
32
+ (FIELD_EX64(env->svcr, SVCR, ZA) ? 'Z' : '-'),
35
33
+ (FIELD_EX64(env->svcr, SVCR, SM) ? 'S' : '-'));
36
dev = qdev_new("pl330");
34
+ }
37
+ object_property_set_link(OBJECT(dev), "memory",
35
if (cpu_isar_feature(aa64_bti, cpu)) {
38
+ OBJECT(address_space_mem),
36
qemu_fprintf(f, " BTYPE=%d", (psr & PSTATE_BTYPE) >> 10);
39
+ &error_fatal);
37
}
40
qdev_prop_set_uint8(dev, "num_chnls", 8);
38
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
41
qdev_prop_set_uint8(dev, "num_periph_req", 4);
39
qemu_fprintf(f, " FPCR=%08x FPSR=%08x\n",
42
qdev_prop_set_uint8(dev, "num_events", 16);
40
vfp_get_fpcr(env), vfp_get_fpsr(env));
43
diff --git a/hw/dma/pl330.c b/hw/dma/pl330.c
41
44
index XXXXXXX..XXXXXXX 100644
42
- if (cpu_isar_feature(aa64_sve, cpu) && sve_exception_el(env, el) == 0) {
45
--- a/hw/dma/pl330.c
43
+ if (cpu_isar_feature(aa64_sme, cpu) && FIELD_EX64(env->svcr, SVCR, SM)) {
46
+++ b/hw/dma/pl330.c
44
+ sve = sme_exception_el(env, el) == 0;
47
@@ -XXX,XX +XXX,XX @@ struct PL330State {
45
+ } else if (cpu_isar_feature(aa64_sve, cpu)) {
48
uint8_t num_faulting;
46
+ sve = sve_exception_el(env, el) == 0;
49
uint8_t periph_busy[PL330_PERIPH_NUM];
50
51
+ /* Memory region that DMA operation access */
52
+ MemoryRegion *mem_mr;
53
+ AddressSpace *mem_as;
54
};
55
56
#define TYPE_PL330 "pl330"
57
@@ -XXX,XX +XXX,XX @@ static inline const PL330InsnDesc *pl330_fetch_insn(PL330Chan *ch)
58
uint8_t opcode;
59
int i;
60
61
- dma_memory_read(&address_space_memory, ch->pc, &opcode, 1);
62
+ dma_memory_read(ch->parent->mem_as, ch->pc, &opcode, 1);
63
for (i = 0; insn_desc[i].size; i++) {
64
if ((opcode & insn_desc[i].opmask) == insn_desc[i].opcode) {
65
return &insn_desc[i];
66
@@ -XXX,XX +XXX,XX @@ static inline void pl330_exec_insn(PL330Chan *ch, const PL330InsnDesc *insn)
67
uint8_t buf[PL330_INSN_MAXSIZE];
68
69
assert(insn->size <= PL330_INSN_MAXSIZE);
70
- dma_memory_read(&address_space_memory, ch->pc, buf, insn->size);
71
+ dma_memory_read(ch->parent->mem_as, ch->pc, buf, insn->size);
72
insn->exec(ch, buf[0], &buf[1], insn->size - 1);
73
}
74
75
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
76
if (q != NULL && q->len <= pl330_fifo_num_free(&s->fifo)) {
77
int len = q->len - (q->addr & (q->len - 1));
78
79
- dma_memory_read(&address_space_memory, q->addr, buf, len);
80
+ dma_memory_read(s->mem_as, q->addr, buf, len);
81
trace_pl330_exec_cycle(q->addr, len);
82
if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
83
pl330_hexdump(buf, len);
84
@@ -XXX,XX +XXX,XX @@ static int pl330_exec_cycle(PL330Chan *channel)
85
fifo_res = pl330_fifo_get(&s->fifo, buf, len, q->tag);
86
}
87
if (fifo_res == PL330_FIFO_OK || q->z) {
88
- dma_memory_write(&address_space_memory, q->addr, buf, len);
89
+ dma_memory_write(s->mem_as, q->addr, buf, len);
90
trace_pl330_exec_cycle(q->addr, len);
91
if (trace_event_get_state_backends(TRACE_PL330_HEXDUMP)) {
92
pl330_hexdump(buf, len);
93
@@ -XXX,XX +XXX,XX @@ static void pl330_realize(DeviceState *dev, Error **errp)
94
"dma", PL330_IOMEM_SIZE);
95
sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
96
97
+ if (!s->mem_mr) {
98
+ error_setg(errp, "'memory' link is not set");
99
+ return;
100
+ } else if (s->mem_mr == get_system_memory()) {
101
+ /* Avoid creating new AS for system memory. */
102
+ s->mem_as = &address_space_memory;
103
+ } else {
47
+ } else {
104
+ s->mem_as = g_new0(AddressSpace, 1);
48
+ sve = false;
105
+ address_space_init(s->mem_as, s->mem_mr,
106
+ memory_region_name(s->mem_mr));
107
+ }
49
+ }
108
+
50
+
109
s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, pl330_exec_cycle_timer, s);
51
+ if (sve) {
110
52
int j, zcr_len = sve_vqm1_for_el(env, el);
111
s->cfg[0] = (s->mgr_ns_at_rst ? 0x4 : 0) |
53
112
@@ -XXX,XX +XXX,XX @@ static Property pl330_properties[] = {
54
for (i = 0; i <= FFR_PRED_NUM; i++) {
113
DEFINE_PROP_UINT8("rd_q_dep", PL330State, rd_q_dep, 16),
114
DEFINE_PROP_UINT16("data_buffer_dep", PL330State, data_buffer_dep, 256),
115
116
+ DEFINE_PROP_LINK("memory", PL330State, mem_mr,
117
+ TYPE_MEMORY_REGION, MemoryRegion *),
118
+
119
DEFINE_PROP_END_OF_LIST(),
120
};
121
122
--
55
--
123
2.20.1
56
2.25.1
124
125
1
The MVEGenDualAccOpFn is a bit misnamed, since it is used for
1
From: Richard Henderson <richard.henderson@linaro.org>
2
the "long dual accumulate" operations that use a 64-bit
3
accumulator. Rename it to MVEGenLongDualAccOpFn so we can
4
use the former name for the 32-bit accumulator insns.
5
2
3
This includes the build rules for the decoder, and the
4
new file for translation, but excludes any instructions.
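As a rough sketch of how the decoder and the new translation file fit together once instructions are added (the pattern below is invented for illustration, not a real SME encoding):

    # sme.decode: a 32-bit pattern plus extracted fields
    FOO     11000000 00000000 000000 rd:5 rn:5

    /* translate-sme.c: decodetree generates disas_sme(), which calls a
     * trans_FOO() handler for every instruction word matching the pattern.
     */
    static bool trans_FOO(DisasContext *s, arg_FOO *a)
    {
        /* emit TCG ops for the instruction here */
        return true;
    }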
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-3-richard.henderson@linaro.org
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
---
10
---
9
target/arm/translate-mve.c | 16 ++++++++--------
11
target/arm/translate-a64.h | 1 +
10
1 file changed, 8 insertions(+), 8 deletions(-)
12
target/arm/sme.decode | 20 ++++++++++++++++++++
13
target/arm/translate-a64.c | 7 ++++++-
14
target/arm/translate-sme.c | 35 +++++++++++++++++++++++++++++++++++
15
target/arm/meson.build | 2 ++
16
5 files changed, 64 insertions(+), 1 deletion(-)
17
create mode 100644 target/arm/sme.decode
18
create mode 100644 target/arm/translate-sme.c
11
19
12
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
20
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
13
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/translate-mve.c
22
--- a/target/arm/translate-a64.h
15
+++ b/target/arm/translate-mve.c
23
+++ b/target/arm/translate-a64.h
16
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
24
@@ -XXX,XX +XXX,XX @@ static inline int pred_gvec_reg_size(DisasContext *s)
17
typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
18
typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
19
typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
20
-typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
21
+typedef void MVEGenLongDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
22
typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
23
typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
24
typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
25
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT_scalar(DisasContext *s, arg_2scalar *a)
26
}
25
}
27
26
28
static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
27
bool disas_sve(DisasContext *, uint32_t);
29
- MVEGenDualAccOpFn *fn)
28
+bool disas_sme(DisasContext *, uint32_t);
30
+ MVEGenLongDualAccOpFn *fn)
29
31
{
30
void gen_gvec_rax1(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
32
TCGv_ptr qn, qm;
31
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
33
TCGv_i64 rda;
32
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
34
@@ -XXX,XX +XXX,XX @@ static bool do_long_dual_acc(DisasContext *s, arg_vmlaldav *a,
33
new file mode 100644
35
34
index XXXXXXX..XXXXXXX
36
static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
35
--- /dev/null
37
{
36
+++ b/target/arm/sme.decode
38
- static MVEGenDualAccOpFn * const fns[4][2] = {
37
@@ -XXX,XX +XXX,XX @@
39
+ static MVEGenLongDualAccOpFn * const fns[4][2] = {
38
+# AArch64 SME instruction descriptions
40
{ NULL, NULL },
39
+#
41
{ gen_helper_mve_vmlaldavsh, gen_helper_mve_vmlaldavxsh },
40
+# Copyright (c) 2022 Linaro, Ltd
42
{ gen_helper_mve_vmlaldavsw, gen_helper_mve_vmlaldavxsw },
41
+#
43
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_S(DisasContext *s, arg_vmlaldav *a)
42
+# This library is free software; you can redistribute it and/or
44
43
+# modify it under the terms of the GNU Lesser General Public
45
static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
44
+# License as published by the Free Software Foundation; either
46
{
45
+# version 2.1 of the License, or (at your option) any later version.
47
- static MVEGenDualAccOpFn * const fns[4][2] = {
46
+#
48
+ static MVEGenLongDualAccOpFn * const fns[4][2] = {
47
+# This library is distributed in the hope that it will be useful,
49
{ NULL, NULL },
48
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
50
{ gen_helper_mve_vmlaldavuh, NULL },
49
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
51
{ gen_helper_mve_vmlaldavuw, NULL },
50
+# Lesser General Public License for more details.
52
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLALDAV_U(DisasContext *s, arg_vmlaldav *a)
51
+#
53
52
+# You should have received a copy of the GNU Lesser General Public
54
static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
53
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
55
{
54
+
56
- static MVEGenDualAccOpFn * const fns[4][2] = {
55
+#
57
+ static MVEGenLongDualAccOpFn * const fns[4][2] = {
56
+# This file is processed by scripts/decodetree.py
58
{ NULL, NULL },
57
+#
59
{ gen_helper_mve_vmlsldavsh, gen_helper_mve_vmlsldavxsh },
58
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
60
{ gen_helper_mve_vmlsldavsw, gen_helper_mve_vmlsldavxsw },
59
index XXXXXXX..XXXXXXX 100644
61
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLSLDAV(DisasContext *s, arg_vmlaldav *a)
60
--- a/target/arm/translate-a64.c
62
61
+++ b/target/arm/translate-a64.c
63
static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
62
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
64
{
63
}
65
- static MVEGenDualAccOpFn * const fns[] = {
64
66
+ static MVEGenLongDualAccOpFn * const fns[] = {
65
switch (extract32(insn, 25, 4)) {
67
gen_helper_mve_vrmlaldavhsw, gen_helper_mve_vrmlaldavhxsw,
66
- case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
68
};
67
+ case 0x0:
69
return do_long_dual_acc(s, a, fns[a->x]);
68
+ if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
70
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_S(DisasContext *s, arg_vmlaldav *a)
69
+ unallocated_encoding(s);
71
70
+ }
72
static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
71
+ break;
73
{
72
+ case 0x1: case 0x3: /* UNALLOCATED */
74
- static MVEGenDualAccOpFn * const fns[] = {
73
unallocated_encoding(s);
75
+ static MVEGenLongDualAccOpFn * const fns[] = {
74
break;
76
gen_helper_mve_vrmlaldavhuw, NULL,
75
case 0x2:
77
};
76
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
78
return do_long_dual_acc(s, a, fns[a->x]);
77
new file mode 100644
79
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLALDAVH_U(DisasContext *s, arg_vmlaldav *a)
78
index XXXXXXX..XXXXXXX
80
79
--- /dev/null
81
static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
80
+++ b/target/arm/translate-sme.c
82
{
81
@@ -XXX,XX +XXX,XX @@
83
- static MVEGenDualAccOpFn * const fns[] = {
82
+/*
84
+ static MVEGenLongDualAccOpFn * const fns[] = {
83
+ * AArch64 SME translation
85
gen_helper_mve_vrmlsldavhsw, gen_helper_mve_vrmlsldavhxsw,
84
+ *
86
};
85
+ * Copyright (c) 2022 Linaro, Ltd
87
return do_long_dual_acc(s, a, fns[a->x]);
86
+ *
87
+ * This library is free software; you can redistribute it and/or
88
+ * modify it under the terms of the GNU Lesser General Public
89
+ * License as published by the Free Software Foundation; either
90
+ * version 2.1 of the License, or (at your option) any later version.
91
+ *
92
+ * This library is distributed in the hope that it will be useful,
93
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
94
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
95
+ * Lesser General Public License for more details.
96
+ *
97
+ * You should have received a copy of the GNU Lesser General Public
98
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
99
+ */
100
+
101
+#include "qemu/osdep.h"
102
+#include "cpu.h"
103
+#include "tcg/tcg-op.h"
104
+#include "tcg/tcg-op-gvec.h"
105
+#include "tcg/tcg-gvec-desc.h"
106
+#include "translate.h"
107
+#include "exec/helper-gen.h"
108
+#include "translate-a64.h"
109
+#include "fpu/softfloat.h"
110
+
111
+
112
+/*
113
+ * Include the generated decoder.
114
+ */
115
+
116
+#include "decode-sme.c.inc"
117
diff --git a/target/arm/meson.build b/target/arm/meson.build
118
index XXXXXXX..XXXXXXX 100644
119
--- a/target/arm/meson.build
120
+++ b/target/arm/meson.build
121
@@ -XXX,XX +XXX,XX @@
122
gen = [
123
decodetree.process('sve.decode', extra_args: '--decode=disas_sve'),
124
+ decodetree.process('sme.decode', extra_args: '--decode=disas_sme'),
125
decodetree.process('neon-shared.decode', extra_args: '--decode=disas_neon_shared'),
126
decodetree.process('neon-dp.decode', extra_args: '--decode=disas_neon_dp'),
127
decodetree.process('neon-ls.decode', extra_args: '--decode=disas_neon_ls'),
128
@@ -XXX,XX +XXX,XX @@ arm_ss.add(when: 'TARGET_AARCH64', if_true: files(
129
'sme_helper.c',
130
'translate-a64.c',
131
'translate-sve.c',
132
+ 'translate-sme.c',
133
))
134
135
arm_softmmu_ss = ss.source_set()
88
--
136
--
89
2.20.1
137
2.25.1
90
91
1
Unlike A-profile, for M-profile the UDIV and SDIV insns can be
1
From: Richard Henderson <richard.henderson@linaro.org>
2
configured to raise an exception on division by zero, using the CCR
2
3
DIV_0_TRP bit.
3
This new behaviour is in the ARM pseudocode function
4
4
AArch64.CheckFPAdvSIMDEnabled, which applies to AArch32
5
Implement support for setting this bit by making the helper functions
5
via AArch32.CheckAdvSIMDOrFPEnabled when the EL to which
6
raise the appropriate exception.
6
the trap would be delivered is in AArch64 mode.
7
7
8
Given that ARMv9 drops support for AArch32 outside EL0, the trap EL
9
detection ought to be trivially true, but the pseudocode still contains
10
a number of conditions, and QEMU has not yet committed to dropping A32
11
support for EL[12] when v9 features are present.
12
13
Since the computation of SME_TRAP_NONSTREAMING is necessarily different
14
for the two modes, we might as well preserve bits within TBFLAG_ANY and
15
allocate separate bits within TBFLAG_A32 and TBFLAG_A64 instead.
16
17
Note that DDI0616A.a has typos for bits [22:21] of LD1RO in the table
18
of instructions illegal in streaming mode.
19
20
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
21
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
22
Message-id: 20220708151540.18136-4-richard.henderson@linaro.org
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
23
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20210730151636.17254-3-peter.maydell@linaro.org
11
---
24
---
12
target/arm/cpu.h | 1 +
25
target/arm/cpu.h | 7 +++
13
target/arm/helper.h | 4 ++--
26
target/arm/translate.h | 4 ++
14
target/arm/helper.c | 19 +++++++++++++++++--
27
target/arm/sme-fa64.decode | 90 ++++++++++++++++++++++++++++++++++++++
15
target/arm/m_helper.c | 4 ++++
28
target/arm/helper.c | 41 +++++++++++++++++
16
target/arm/translate.c | 4 ++--
29
target/arm/translate-a64.c | 40 ++++++++++++++++-
17
5 files changed, 26 insertions(+), 6 deletions(-)
30
target/arm/translate-vfp.c | 12 +++++
31
target/arm/translate.c | 2 +
32
target/arm/meson.build | 1 +
33
8 files changed, 195 insertions(+), 2 deletions(-)
34
create mode 100644 target/arm/sme-fa64.decode
18
35
19
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
36
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
20
index XXXXXXX..XXXXXXX 100644
37
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/cpu.h
38
--- a/target/arm/cpu.h
22
+++ b/target/arm/cpu.h
39
+++ b/target/arm/cpu.h
40
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, HSTR_ACTIVE, 9, 1)
41
* the same thing as the current security state of the processor!
42
*/
43
FIELD(TBFLAG_A32, NS, 10, 1)
44
+/*
45
+ * Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not.
46
+ * This requires an SME trap from AArch32 mode when using NEON.
47
+ */
48
+FIELD(TBFLAG_A32, SME_TRAP_NONSTREAMING, 11, 1)
49
50
/*
51
* Bit usage when in AArch32 state, for M-profile only.
52
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, SMEEXC_EL, 20, 2)
53
FIELD(TBFLAG_A64, PSTATE_SM, 22, 1)
54
FIELD(TBFLAG_A64, PSTATE_ZA, 23, 1)
55
FIELD(TBFLAG_A64, SVL, 24, 4)
56
+/* Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not. */
57
+FIELD(TBFLAG_A64, SME_TRAP_NONSTREAMING, 28, 1)
58
59
/*
60
* Helpers for using the above.
61
diff --git a/target/arm/translate.h b/target/arm/translate.h
62
index XXXXXXX..XXXXXXX 100644
63
--- a/target/arm/translate.h
64
+++ b/target/arm/translate.h
65
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
66
bool pstate_sm;
67
/* True if PSTATE.ZA is set. */
68
bool pstate_za;
69
+ /* True if non-streaming insns should raise an SME Streaming exception. */
70
+ bool sme_trap_nonstreaming;
71
+ /* True if the current instruction is non-streaming. */
72
+ bool is_nonstreaming;
73
/* True if MVE insns are definitely not predicated by VPR or LTPSIZE */
74
bool mve_no_pred;
75
/*
76
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
77
new file mode 100644
78
index XXXXXXX..XXXXXXX
79
--- /dev/null
80
+++ b/target/arm/sme-fa64.decode
23
@@ -XXX,XX +XXX,XX @@
81
@@ -XXX,XX +XXX,XX @@
24
#define EXCP_LAZYFP 20 /* v7M fault during lazy FP stacking */
82
+# AArch64 SME allowed instruction decoding
25
#define EXCP_LSERR 21 /* v8M LSERR SecureFault */
83
+#
26
#define EXCP_UNALIGNED 22 /* v7M UNALIGNED UsageFault */
84
+# Copyright (c) 2022 Linaro, Ltd
27
+#define EXCP_DIVBYZERO 23 /* v7M DIVBYZERO UsageFault */
85
+#
28
/* NB: add new EXCP_ defines to the array in arm_log_exception() too */
86
+# This library is free software; you can redistribute it and/or
29
87
+# modify it under the terms of the GNU Lesser General Public
30
#define ARMV7M_EXCP_RESET 1
88
+# License as published by the Free Software Foundation; either
31
diff --git a/target/arm/helper.h b/target/arm/helper.h
89
+# version 2.1 of the License, or (at your option) any later version.
32
index XXXXXXX..XXXXXXX 100644
90
+#
33
--- a/target/arm/helper.h
91
+# This library is distributed in the hope that it will be useful,
34
+++ b/target/arm/helper.h
92
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
35
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(add_saturate, i32, env, i32, i32)
93
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
36
DEF_HELPER_3(sub_saturate, i32, env, i32, i32)
94
+# Lesser General Public License for more details.
37
DEF_HELPER_3(add_usaturate, i32, env, i32, i32)
95
+#
38
DEF_HELPER_3(sub_usaturate, i32, env, i32, i32)
96
+# You should have received a copy of the GNU Lesser General Public
39
-DEF_HELPER_FLAGS_2(sdiv, TCG_CALL_NO_RWG_SE, s32, s32, s32)
97
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
40
-DEF_HELPER_FLAGS_2(udiv, TCG_CALL_NO_RWG_SE, i32, i32, i32)
98
+
41
+DEF_HELPER_FLAGS_3(sdiv, TCG_CALL_NO_RWG, s32, env, s32, s32)
99
+#
42
+DEF_HELPER_FLAGS_3(udiv, TCG_CALL_NO_RWG, i32, env, i32, i32)
100
+# This file is processed by scripts/decodetree.py
43
DEF_HELPER_FLAGS_1(rbit, TCG_CALL_NO_RWG_SE, i32, i32)
101
+#
44
102
+
45
#define PAS_OP(pfx) \
103
+# These patterns are taken from Appendix E1.1 of DDI0616 A.a,
104
+# Arm Architecture Reference Manual Supplement,
105
+# The Scalable Matrix Extension (SME), for Armv9-A
106
+
107
+{
108
+ [
109
+ OK 0-00 1110 0000 0001 0010 11-- ---- ---- # SMOV W|Xd,Vn.B[0]
110
+ OK 0-00 1110 0000 0010 0010 11-- ---- ---- # SMOV W|Xd,Vn.H[0]
111
+ OK 0100 1110 0000 0100 0010 11-- ---- ---- # SMOV Xd,Vn.S[0]
112
+ OK 0000 1110 0000 0001 0011 11-- ---- ---- # UMOV Wd,Vn.B[0]
113
+ OK 0000 1110 0000 0010 0011 11-- ---- ---- # UMOV Wd,Vn.H[0]
114
+ OK 0000 1110 0000 0100 0011 11-- ---- ---- # UMOV Wd,Vn.S[0]
115
+ OK 0100 1110 0000 1000 0011 11-- ---- ---- # UMOV Xd,Vn.D[0]
116
+ ]
117
+ FAIL 0--0 111- ---- ---- ---- ---- ---- ---- # Advanced SIMD vector operations
118
+}
119
+
120
+{
121
+ [
122
+ OK 0101 1110 --1- ---- 11-1 11-- ---- ---- # FMULX/FRECPS/FRSQRTS (scalar)
123
+ OK 0101 1110 -10- ---- 00-1 11-- ---- ---- # FMULX/FRECPS/FRSQRTS (scalar, FP16)
124
+ OK 01-1 1110 1-10 0001 11-1 10-- ---- ---- # FRECPE/FRSQRTE/FRECPX (scalar)
125
+ OK 01-1 1110 1111 1001 11-1 10-- ---- ---- # FRECPE/FRSQRTE/FRECPX (scalar, FP16)
126
+ ]
127
+ FAIL 01-1 111- ---- ---- ---- ---- ---- ---- # Advanced SIMD single-element operations
128
+}
129
+
130
+FAIL 0-00 110- ---- ---- ---- ---- ---- ---- # Advanced SIMD structure load/store
131
+FAIL 1100 1110 ---- ---- ---- ---- ---- ---- # Advanced SIMD cryptography extensions
132
+FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
133
+
134
+# These are the "avoidance of doubt" final table of Illegal Advanced SIMD instructions
135
+# We don't actually need to include these, as the default is OK.
136
+# -001 111- ---- ---- ---- ---- ---- ---- # Scalar floating-point operations
137
+# --10 110- ---- ---- ---- ---- ---- ---- # Load/store pair of FP registers
138
+# --01 1100 ---- ---- ---- ---- ---- ---- # Load FP register (PC-relative literal)
139
+# --11 1100 --0- ---- ---- ---- ---- ---- # Load/store FP register (unscaled imm)
140
+# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
141
+# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
142
+
143
+FAIL 0000 0100 --1- ---- 1010 ---- ---- ---- # ADR
144
+FAIL 0000 0100 --1- ---- 1011 -0-- ---- ---- # FTSSEL, FEXPA
145
+FAIL 0000 0101 --10 0001 100- ---- ---- ---- # COMPACT
146
+FAIL 0010 0101 --01 100- 1111 000- ---0 ---- # RDFFR, RDFFRS
147
+FAIL 0010 0101 --10 1--- 1001 ---- ---- ---- # WRFFR, SETFFR
148
+FAIL 0100 0101 --0- ---- 1011 ---- ---- ---- # BDEP, BEXT, BGRP
149
+FAIL 0100 0101 000- ---- 0110 1--- ---- ---- # PMULLB, PMULLT (128b result)
150
+FAIL 0110 0100 --1- ---- 1110 01-- ---- ---- # FMMLA, BFMMLA
151
+FAIL 0110 0101 --0- ---- 0000 11-- ---- ---- # FTSMUL
152
+FAIL 0110 0101 --01 0--- 100- ---- ---- ---- # FTMAD
153
+FAIL 0110 0101 --01 1--- 001- ---- ---- ---- # FADDA
154
+FAIL 0100 0101 --0- ---- 1001 10-- ---- ---- # SMMLA, UMMLA, USMMLA
155
+FAIL 0100 0101 --1- ---- 1--- ---- ---- ---- # SVE2 string/histo/crypto instructions
156
+FAIL 1000 010- -00- ---- 10-- ---- ---- ---- # SVE2 32-bit gather NT load (vector+scalar)
157
+FAIL 1000 010- -00- ---- 111- ---- ---- ---- # SVE 32-bit gather prefetch (vector+imm)
158
+FAIL 1000 0100 0-1- ---- 0--- ---- ---- ---- # SVE 32-bit gather prefetch (scalar+vector)
159
+FAIL 1000 010- -01- ---- 1--- ---- ---- ---- # SVE 32-bit gather load (vector+imm)
160
+FAIL 1000 0100 0-0- ---- 0--- ---- ---- ---- # SVE 32-bit gather load byte (scalar+vector)
161
+FAIL 1000 0100 1--- ---- 0--- ---- ---- ---- # SVE 32-bit gather load half (scalar+vector)
162
+FAIL 1000 0101 0--- ---- 0--- ---- ---- ---- # SVE 32-bit gather load word (scalar+vector)
163
+FAIL 1010 010- ---- ---- 011- ---- ---- ---- # SVE contiguous FF load (scalar+scalar)
164
+FAIL 1010 010- ---1 ---- 101- ---- ---- ---- # SVE contiguous NF load (scalar+imm)
165
+FAIL 1010 010- -01- ---- 000- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+scalar)
166
+FAIL 1010 010- -010 ---- 001- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+imm)
167
+FAIL 1100 010- ---- ---- ---- ---- ---- ---- # SVE 64-bit gather load/prefetch
168
+FAIL 1110 010- -00- ---- 001- ---- ---- ---- # SVE2 64-bit scatter NT store (vector+scalar)
169
+FAIL 1110 010- -10- ---- 001- ---- ---- ---- # SVE2 32-bit scatter NT store (vector+scalar)
170
+FAIL 1110 010- ---- ---- 1-0- ---- ---- ---- # SVE scatter store (scalar+32-bit vector)
171
+FAIL 1110 010- ---- ---- 101- ---- ---- ---- # SVE scatter store (misc)
46
diff --git a/target/arm/helper.c b/target/arm/helper.c
172
diff --git a/target/arm/helper.c b/target/arm/helper.c
47
index XXXXXXX..XXXXXXX 100644
173
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/helper.c
174
--- a/target/arm/helper.c
49
+++ b/target/arm/helper.c
175
+++ b/target/arm/helper.c
50
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sxtb16)(uint32_t x)
176
@@ -XXX,XX +XXX,XX @@ int sme_exception_el(CPUARMState *env, int el)
51
return res;
177
return 0;
52
}
178
}
53
179
54
+static void handle_possible_div0_trap(CPUARMState *env, uintptr_t ra)
180
+/* This corresponds to the ARM pseudocode function IsFullA64Enabled(). */
55
+{
181
+static bool sme_fa64(CPUARMState *env, int el)
182
+{
183
+ if (!cpu_isar_feature(aa64_sme_fa64, env_archcpu(env))) {
184
+ return false;
185
+ }
186
+
187
+ if (el <= 1 && !el_is_in_host(env, el)) {
188
+ if (!FIELD_EX64(env->vfp.smcr_el[1], SMCR, FA64)) {
189
+ return false;
190
+ }
191
+ }
192
+ if (el <= 2 && arm_is_el2_enabled(env)) {
193
+ if (!FIELD_EX64(env->vfp.smcr_el[2], SMCR, FA64)) {
194
+ return false;
195
+ }
196
+ }
197
+ if (arm_feature(env, ARM_FEATURE_EL3)) {
198
+ if (!FIELD_EX64(env->vfp.smcr_el[3], SMCR, FA64)) {
199
+ return false;
200
+ }
201
+ }
202
+
203
+ return true;
204
+}
205
+
206
/*
207
* Given that SVE is enabled, return the vector length for EL.
208
*/
209
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
210
DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
211
}
212
56
+ /*
213
+ /*
57
+ * Take a division-by-zero exception if necessary; otherwise return
214
+ * The SME exception we are testing for is raised via
58
+ * to get the usual non-trapping division behaviour (result of 0)
215
+ * AArch64.CheckFPAdvSIMDEnabled(), as called from
216
+ * AArch32.CheckAdvSIMDOrFPEnabled().
59
+ */
217
+ */
60
+ if (arm_feature(env, ARM_FEATURE_M)
218
+ if (el == 0
61
+ && (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_DIV_0_TRP_MASK)) {
219
+ && FIELD_EX64(env->svcr, SVCR, SM)
62
+ raise_exception_ra(env, EXCP_DIVBYZERO, 0, 1, ra);
220
+ && (!arm_is_el2_enabled(env)
63
+ }
221
+ || (arm_el_is_aa64(env, 2) && !(env->cp15.hcr_el2 & HCR_TGE)))
64
+}
222
+ && arm_el_is_aa64(env, 1)
65
+
223
+ && !sme_fa64(env, el)) {
66
uint32_t HELPER(uxtb16)(uint32_t x)
224
+ DP_TBFLAG_A32(flags, SME_TRAP_NONSTREAMING, 1);
225
+ }
226
+
227
return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
228
}
229
230
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
231
}
232
if (FIELD_EX64(env->svcr, SVCR, SM)) {
233
DP_TBFLAG_A64(flags, PSTATE_SM, 1);
234
+ DP_TBFLAG_A64(flags, SME_TRAP_NONSTREAMING, !sme_fa64(env, el));
235
}
236
DP_TBFLAG_A64(flags, PSTATE_ZA, FIELD_EX64(env->svcr, SVCR, ZA));
237
}
238
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
239
index XXXXXXX..XXXXXXX 100644
240
--- a/target/arm/translate-a64.c
241
+++ b/target/arm/translate-a64.c
242
@@ -XXX,XX +XXX,XX @@ static void do_vec_ld(DisasContext *s, int destidx, int element,
243
* unallocated-encoding checks (otherwise the syndrome information
244
* for the resulting exception will be incorrect).
245
*/
246
-static bool fp_access_check(DisasContext *s)
247
+static bool fp_access_check_only(DisasContext *s)
67
{
248
{
68
uint32_t res;
249
if (s->fp_excp_el) {
69
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
250
assert(!s->fp_access_checked);
70
return res;
251
@@ -XXX,XX +XXX,XX @@ static bool fp_access_check(DisasContext *s)
252
return true;
71
}
253
}
72
254
73
-int32_t HELPER(sdiv)(int32_t num, int32_t den)
255
+static bool fp_access_check(DisasContext *s)
74
+int32_t HELPER(sdiv)(CPUARMState *env, int32_t num, int32_t den)
256
+{
75
{
257
+ if (!fp_access_check_only(s)) {
76
if (den == 0) {
258
+ return false;
77
+ handle_possible_div0_trap(env, GETPC());
259
+ }
78
return 0;
260
+ if (s->sme_trap_nonstreaming && s->is_nonstreaming) {
79
}
261
+ gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
80
if (num == INT_MIN && den == -1) {
262
+ syn_smetrap(SME_ET_Streaming, false));
81
@@ -XXX,XX +XXX,XX @@ int32_t HELPER(sdiv)(int32_t num, int32_t den)
263
+ return false;
82
return num / den;
264
+ }
265
+ return true;
266
+}
267
+
268
/* Check that SVE access is enabled. If it is, return true.
269
* If not, emit code to generate an appropriate exception and return false.
270
*/
271
@@ -XXX,XX +XXX,XX @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
272
default:
273
g_assert_not_reached();
274
}
275
- if ((ri->type & ARM_CP_FPU) && !fp_access_check(s)) {
276
+ if ((ri->type & ARM_CP_FPU) && !fp_access_check_only(s)) {
277
return;
278
} else if ((ri->type & ARM_CP_SVE) && !sve_access_check(s)) {
279
return;
280
@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn)
281
}
83
}
282
}
84
283
85
-uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
284
+/*
86
+uint32_t HELPER(udiv)(CPUARMState *env, uint32_t num, uint32_t den)
285
+ * Include the generated SME FA64 decoder.
87
{
286
+ */
88
if (den == 0) {
287
+
89
+ handle_possible_div0_trap(env, GETPC());
288
+#include "decode-sme-fa64.c.inc"
90
return 0;
289
+
91
}
290
+static bool trans_OK(DisasContext *s, arg_OK *a)
92
return num / den;
291
+{
93
@@ -XXX,XX +XXX,XX @@ void arm_log_exception(int idx)
292
+ return true;
94
[EXCP_LAZYFP] = "v7M exception during lazy FP stacking",
293
+}
95
[EXCP_LSERR] = "v8M LSERR UsageFault",
294
+
96
[EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
295
+static bool trans_FAIL(DisasContext *s, arg_OK *a)
97
+ [EXCP_DIVBYZERO] = "v7M DIVBYZERO UsageFault",
296
+{
98
};
297
+ s->is_nonstreaming = true;
99
298
+ return true;
100
if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
299
+}
101
diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
300
+
102
index XXXXXXX..XXXXXXX 100644
301
/**
103
--- a/target/arm/m_helper.c
302
* is_guarded_page:
104
+++ b/target/arm/m_helper.c
303
* @env: The cpu environment
105
@@ -XXX,XX +XXX,XX @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
304
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
106
armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
305
dc->mte_active[1] = EX_TBFLAG_A64(tb_flags, MTE0_ACTIVE);
107
env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_UNALIGNED_MASK;
306
dc->pstate_sm = EX_TBFLAG_A64(tb_flags, PSTATE_SM);
108
break;
307
dc->pstate_za = EX_TBFLAG_A64(tb_flags, PSTATE_ZA);
109
+ case EXCP_DIVBYZERO:
308
+ dc->sme_trap_nonstreaming = EX_TBFLAG_A64(tb_flags, SME_TRAP_NONSTREAMING);
110
+ armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE, env->v7m.secure);
309
dc->vec_len = 0;
111
+ env->v7m.cfsr[env->v7m.secure] |= R_V7M_CFSR_DIVBYZERO_MASK;
310
dc->vec_stride = 0;
112
+ break;
311
dc->cp_regs = arm_cpu->cp_regs;
113
case EXCP_SWI:
312
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
114
/* The PC already points to the next instruction. */
313
}
115
armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_SVC, env->v7m.secure);
314
}
315
316
+ s->is_nonstreaming = false;
317
+ if (s->sme_trap_nonstreaming) {
318
+ disas_sme_fa64(s, insn);
319
+ }
320
+
321
switch (extract32(insn, 25, 4)) {
322
case 0x0:
323
if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
324
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
325
index XXXXXXX..XXXXXXX 100644
326
--- a/target/arm/translate-vfp.c
327
+++ b/target/arm/translate-vfp.c
328
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check_a(DisasContext *s, bool ignore_vfp_enabled)
329
return false;
330
}
331
332
+ /*
333
+ * Note that rebuild_hflags_a32 has already accounted for being in EL0
334
+ * and the higher EL in A64 mode, etc. Unlike A64 mode, there do not
335
+ * appear to be any insns which touch VFP which are allowed.
336
+ */
337
+ if (s->sme_trap_nonstreaming) {
338
+ gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
339
+ syn_smetrap(SME_ET_Streaming,
340
+ s->base.pc_next - s->pc_curr == 2));
341
+ return false;
342
+ }
343
+
344
if (!s->vfp_enabled && !ignore_vfp_enabled) {
345
assert(!arm_dc_feature(s, ARM_FEATURE_M));
346
unallocated_encoding(s);
116
diff --git a/target/arm/translate.c b/target/arm/translate.c
347
diff --git a/target/arm/translate.c b/target/arm/translate.c
117
index XXXXXXX..XXXXXXX 100644
348
index XXXXXXX..XXXXXXX 100644
118
--- a/target/arm/translate.c
349
--- a/target/arm/translate.c
119
+++ b/target/arm/translate.c
350
+++ b/target/arm/translate.c
120
@@ -XXX,XX +XXX,XX @@ static bool op_div(DisasContext *s, arg_rrr *a, bool u)
351
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
121
t1 = load_reg(s, a->rn);
352
dc->vec_len = EX_TBFLAG_A32(tb_flags, VECLEN);
122
t2 = load_reg(s, a->rm);
353
dc->vec_stride = EX_TBFLAG_A32(tb_flags, VECSTRIDE);
123
if (u) {
354
}
124
- gen_helper_udiv(t1, t1, t2);
355
+ dc->sme_trap_nonstreaming =
125
+ gen_helper_udiv(t1, cpu_env, t1, t2);
356
+ EX_TBFLAG_A32(tb_flags, SME_TRAP_NONSTREAMING);
126
} else {
357
}
127
- gen_helper_sdiv(t1, t1, t2);
358
dc->cp_regs = cpu->cp_regs;
128
+ gen_helper_sdiv(t1, cpu_env, t1, t2);
359
dc->features = env->features;
129
}
360
diff --git a/target/arm/meson.build b/target/arm/meson.build
130
tcg_temp_free_i32(t2);
361
index XXXXXXX..XXXXXXX 100644
131
store_reg(s, a->rd, t1);
362
--- a/target/arm/meson.build
363
+++ b/target/arm/meson.build
364
@@ -XXX,XX +XXX,XX @@
365
gen = [
366
decodetree.process('sve.decode', extra_args: '--decode=disas_sve'),
367
decodetree.process('sme.decode', extra_args: '--decode=disas_sme'),
368
+ decodetree.process('sme-fa64.decode', extra_args: '--static-decode=disas_sme_fa64'),
369
decodetree.process('neon-shared.decode', extra_args: '--decode=disas_neon_shared'),
370
decodetree.process('neon-dp.decode', extra_args: '--decode=disas_neon_dp'),
371
decodetree.process('neon-ls.decode', extra_args: '--decode=disas_neon_ls'),
132
--
372
--
133
2.20.1
373
2.25.1
134
135
1
From: Hamza Mahfooz <someguy@effective-light.com>
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
As per commit 5626f8c6d468 ("rcu: Add automatically released rcu_read_lock
3
Mark ADR as a non-streaming instruction, which should trap
4
variants"), RCU_READ_LOCK_GUARD() should be used instead of
4
if full a64 support is not enabled in streaming mode.
5
rcu_read_{un}lock().
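A minimal sketch of the pattern being applied (hypothetical helper name, condensing what kvm_arch_fixup_msi_route() ends up doing): the guard releases the RCU read lock when the scope is left, so error paths can return directly instead of jumping to an unlock label.

    static int translate_doorbell(AddressSpace *as, hwaddr address)
    {
        hwaddr xlat, len;
        MemoryRegion *mr;

        RCU_READ_LOCK_GUARD();
        mr = address_space_translate(as, address, &xlat, &len, true,
                                     MEMTXATTRS_UNSPECIFIED);
        if (!mr) {
            return 1;    /* guard drops the lock automatically */
        }
        /* ... use the translation while still under the guard ... */
        return 0;
    }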
6
5
7
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
6
Removing entries from sme-fa64.decode is an easy way to see
8
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
7
what remains to be done.
9
Message-id: 20210727235201.11491-1-someguy@effective-light.com
8
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
11
Message-id: 20220708151540.18136-5-richard.henderson@linaro.org
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
13
---
12
target/arm/kvm.c | 17 ++++++++---------
14
target/arm/translate.h | 7 +++++++
13
1 file changed, 8 insertions(+), 9 deletions(-)
15
target/arm/sme-fa64.decode | 1 -
16
target/arm/translate-sve.c | 8 ++++----
17
3 files changed, 11 insertions(+), 5 deletions(-)
14
18
15
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
19
diff --git a/target/arm/translate.h b/target/arm/translate.h
16
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/kvm.c
21
--- a/target/arm/translate.h
18
+++ b/target/arm/kvm.c
22
+++ b/target/arm/translate.h
19
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
23
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
20
hwaddr xlat, len, doorbell_gpa;
24
static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \
21
MemoryRegionSection mrs;
25
{ return dc_isar_feature(FEAT, s) && FUNC(s, __VA_ARGS__); }
22
MemoryRegion *mr;
26
23
- int ret = 1;
27
+#define TRANS_FEAT_NONSTREAMING(NAME, FEAT, FUNC, ...) \
24
28
+ static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \
25
if (as == &address_space_memory) {
29
+ { \
26
return 0;
30
+ s->is_nonstreaming = true; \
27
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
31
+ return dc_isar_feature(FEAT, s) && FUNC(s, __VA_ARGS__); \
28
32
+ }
29
/* MSI doorbell address is translated by an IOMMU */
30
31
- rcu_read_lock();
32
+ RCU_READ_LOCK_GUARD();
33
+
33
+
34
mr = address_space_translate(as, address, &xlat, &len, true,
34
#endif /* TARGET_ARM_TRANSLATE_H */
35
MEMTXATTRS_UNSPECIFIED);
35
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
36
+
36
index XXXXXXX..XXXXXXX 100644
37
if (!mr) {
37
--- a/target/arm/sme-fa64.decode
38
- goto unlock;
38
+++ b/target/arm/sme-fa64.decode
39
+ return 1;
39
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
40
}
40
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
41
+
41
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
42
mrs = memory_region_find(mr, xlat, 1);
42
43
+
43
-FAIL 0000 0100 --1- ---- 1010 ---- ---- ---- # ADR
44
if (!mrs.mr) {
44
FAIL 0000 0100 --1- ---- 1011 -0-- ---- ---- # FTSSEL, FEXPA
45
- goto unlock;
45
FAIL 0000 0101 --10 0001 100- ---- ---- ---- # COMPACT
46
+ return 1;
46
FAIL 0010 0101 --01 100- 1111 000- ---0 ---- # RDFFR, RDFFRS
47
}
47
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
48
48
index XXXXXXX..XXXXXXX 100644
49
doorbell_gpa = mrs.offset_within_address_space;
49
--- a/target/arm/translate-sve.c
50
@@ -XXX,XX +XXX,XX @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
50
+++ b/target/arm/translate-sve.c
51
51
@@ -XXX,XX +XXX,XX @@ static bool do_adr(DisasContext *s, arg_rrri *a, gen_helper_gvec_3 *fn)
52
trace_kvm_arm_fixup_msi_route(address, doorbell_gpa);
52
return gen_gvec_ool_zzz(s, fn, a->rd, a->rn, a->rm, a->imm);
53
54
- ret = 0;
55
-
56
-unlock:
57
- rcu_read_unlock();
58
- return ret;
59
+ return 0;
60
}
53
}
61
54
62
int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
55
-TRANS_FEAT(ADR_p32, aa64_sve, do_adr, a, gen_helper_sve_adr_p32)
56
-TRANS_FEAT(ADR_p64, aa64_sve, do_adr, a, gen_helper_sve_adr_p64)
57
-TRANS_FEAT(ADR_s32, aa64_sve, do_adr, a, gen_helper_sve_adr_s32)
58
-TRANS_FEAT(ADR_u32, aa64_sve, do_adr, a, gen_helper_sve_adr_u32)
59
+TRANS_FEAT_NONSTREAMING(ADR_p32, aa64_sve, do_adr, a, gen_helper_sve_adr_p32)
60
+TRANS_FEAT_NONSTREAMING(ADR_p64, aa64_sve, do_adr, a, gen_helper_sve_adr_p64)
61
+TRANS_FEAT_NONSTREAMING(ADR_s32, aa64_sve, do_adr, a, gen_helper_sve_adr_s32)
62
+TRANS_FEAT_NONSTREAMING(ADR_u32, aa64_sve, do_adr, a, gen_helper_sve_adr_u32)
63
64
/*
65
*** SVE Integer Misc - Unpredicated Group
63
--
66
--
64
2.20.1
67
2.25.1
65
66
1
Implement the MVE VMAXA and VMINA insns, which take the absolute
1
From: Richard Henderson <richard.henderson@linaro.org>
2
value of the signed elements in the input vector and then accumulate
3
the unsigned max or min into the destination vector.
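For one unsigned byte element, the operation amounts to the following scalar model (illustration only; the real helpers below also apply the MVE predication mask and advance the VPT state):

    static inline uint8_t vmaxa_b(uint8_t d, int8_t m)
    {
        uint8_t am = m < 0 ? -m : m;   /* absolute value of the signed input */
        return d > am ? d : am;        /* unsigned max accumulated into vd */
    }

VMINA is the same with an unsigned min in place of the max.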
4
2
3
Mark these as non-streaming instructions, which should trap
4
if full a64 support is not enabled in streaming mode.
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-6-richard.henderson@linaro.org
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
---
10
---
8
target/arm/helper-mve.h | 8 ++++++++
11
target/arm/sme-fa64.decode | 2 --
9
target/arm/mve.decode | 4 ++++
12
target/arm/translate-sve.c | 9 ++++++---
10
target/arm/mve_helper.c | 26 ++++++++++++++++++++++++++
13
2 files changed, 6 insertions(+), 5 deletions(-)
11
target/arm/translate-mve.c | 2 ++
12
4 files changed, 40 insertions(+)
13
14
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
15
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
17
--- a/target/arm/sme-fa64.decode
17
+++ b/target/arm/helper-mve.h
18
+++ b/target/arm/sme-fa64.decode
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vqnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
19
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
19
DEF_HELPER_FLAGS_3(mve_vqnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
20
20
DEF_HELPER_FLAGS_3(mve_vqnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
FAIL 0000 0100 --1- ---- 1011 -0-- ---- ---- # FTSSEL, FEXPA
21
22
FAIL 0000 0101 --10 0001 100- ---- ---- ---- # COMPACT
22
+DEF_HELPER_FLAGS_3(mve_vmaxab, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
-FAIL 0010 0101 --01 100- 1111 000- ---0 ---- # RDFFR, RDFFRS
23
+DEF_HELPER_FLAGS_3(mve_vmaxah, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
-FAIL 0010 0101 --10 1--- 1001 ---- ---- ---- # WRFFR, SETFFR
24
+DEF_HELPER_FLAGS_3(mve_vmaxaw, TCG_CALL_NO_WG, void, env, ptr, ptr)
25
FAIL 0100 0101 --0- ---- 1011 ---- ---- ---- # BDEP, BEXT, BGRP
26
FAIL 0100 0101 000- ---- 0110 1--- ---- ---- # PMULLB, PMULLT (128b result)
27
FAIL 0110 0100 --1- ---- 1110 01-- ---- ---- # FMMLA, BFMMLA
28
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
29
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/translate-sve.c
31
+++ b/target/arm/translate-sve.c
32
@@ -XXX,XX +XXX,XX @@ static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag)
33
TRANS_FEAT(PTRUE, aa64_sve, do_predset, a->esz, a->rd, a->pat, a->s)
34
35
/* Note pat == 31 is #all, to set all elements. */
36
-TRANS_FEAT(SETFFR, aa64_sve, do_predset, 0, FFR_PRED_NUM, 31, false)
37
+TRANS_FEAT_NONSTREAMING(SETFFR, aa64_sve,
38
+ do_predset, 0, FFR_PRED_NUM, 31, false)
39
40
/* Note pat == 32 is #unimp, to set no elements. */
41
TRANS_FEAT(PFALSE, aa64_sve, do_predset, 0, a->rd, 32, false)
42
@@ -XXX,XX +XXX,XX @@ static bool trans_RDFFR_p(DisasContext *s, arg_RDFFR_p *a)
43
.rd = a->rd, .pg = a->pg, .s = a->s,
44
.rn = FFR_PRED_NUM, .rm = FFR_PRED_NUM,
45
};
25
+
46
+
26
+DEF_HELPER_FLAGS_3(mve_vminab, TCG_CALL_NO_WG, void, env, ptr, ptr)
47
+ s->is_nonstreaming = true;
27
+DEF_HELPER_FLAGS_3(mve_vminah, TCG_CALL_NO_WG, void, env, ptr, ptr)
48
return trans_AND_pppp(s, &alt_a);
28
+DEF_HELPER_FLAGS_3(mve_vminaw, TCG_CALL_NO_WG, void, env, ptr, ptr)
29
+
30
DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
31
DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
32
DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
33
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
34
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/mve.decode
36
+++ b/target/arm/mve.decode
37
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
38
VQMOVUNB 111 0 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
39
VQMOVN_BS 111 0 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
40
41
+ VMAXA 111 0 1110 0 . 11 .. 11 ... 0 1110 1 0 . 0 ... 1 @1op
42
+
43
VMULH_S 111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
44
}
49
}
45
50
46
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
51
-TRANS_FEAT(RDFFR, aa64_sve, do_mov_p, a->rd, FFR_PRED_NUM)
47
VQMOVUNT 111 0 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
52
-TRANS_FEAT(WRFFR, aa64_sve, do_mov_p, FFR_PRED_NUM, a->rn)
48
VQMOVN_TS 111 0 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
53
+TRANS_FEAT_NONSTREAMING(RDFFR, aa64_sve, do_mov_p, a->rd, FFR_PRED_NUM)
49
54
+TRANS_FEAT_NONSTREAMING(WRFFR, aa64_sve, do_mov_p, FFR_PRED_NUM, a->rn)
50
+ VMINA 111 0 1110 0 . 11 .. 11 ... 1 1110 1 0 . 0 ... 1 @1op
55
51
+
56
static bool do_pfirst_pnext(DisasContext *s, arg_rr_esz *a,
52
VRMULH_S 111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
57
void (*gen_fn)(TCGv_i32, TCGv_ptr,
53
}
54
55
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
56
index XXXXXXX..XXXXXXX 100644
57
--- a/target/arm/mve_helper.c
58
+++ b/target/arm/mve_helper.c
59
@@ -XXX,XX +XXX,XX @@ DO_1OP_SAT(vqabsw, 4, int32_t, DO_VQABS_W)
60
DO_1OP_SAT(vqnegb, 1, int8_t, DO_VQNEG_B)
61
DO_1OP_SAT(vqnegh, 2, int16_t, DO_VQNEG_H)
62
DO_1OP_SAT(vqnegw, 4, int32_t, DO_VQNEG_W)
63
+
64
+/*
65
+ * VMAXA, VMINA: vd is unsigned; vm is signed, and we take its
66
+ * absolute value; we then do an unsigned comparison.
67
+ */
68
+#define DO_VMAXMINA(OP, ESIZE, STYPE, UTYPE, FN) \
69
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
70
+ { \
71
+ UTYPE *d = vd; \
72
+ STYPE *m = vm; \
73
+ uint16_t mask = mve_element_mask(env); \
74
+ unsigned e; \
75
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
76
+ UTYPE r = DO_ABS(m[H##ESIZE(e)]); \
77
+ r = FN(d[H##ESIZE(e)], r); \
78
+ mergemask(&d[H##ESIZE(e)], r, mask); \
79
+ } \
80
+ mve_advance_vpt(env); \
81
+ }
82
+
83
+DO_VMAXMINA(vmaxab, 1, int8_t, uint8_t, DO_MAX)
84
+DO_VMAXMINA(vmaxah, 2, int16_t, uint16_t, DO_MAX)
85
+DO_VMAXMINA(vmaxaw, 4, int32_t, uint32_t, DO_MAX)
86
+DO_VMAXMINA(vminab, 1, int8_t, uint8_t, DO_MIN)
87
+DO_VMAXMINA(vminah, 2, int16_t, uint16_t, DO_MIN)
88
+DO_VMAXMINA(vminaw, 4, int32_t, uint32_t, DO_MIN)
89
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
90
index XXXXXXX..XXXXXXX 100644
91
--- a/target/arm/translate-mve.c
92
+++ b/target/arm/translate-mve.c
93
@@ -XXX,XX +XXX,XX @@ DO_1OP(VABS, vabs)
94
DO_1OP(VNEG, vneg)
95
DO_1OP(VQABS, vqabs)
96
DO_1OP(VQNEG, vqneg)
97
+DO_1OP(VMAXA, vmaxa)
98
+DO_1OP(VMINA, vmina)
99
100
/* Narrowing moves: only size 0 and 1 are valid */
101
#define DO_VMOVN(INSN, FN) \
102
--
58
--
103
2.20.1
59
2.25.1
104
105
New patch
1
From: Richard Henderson <richard.henderson@linaro.org>
1
2
3
Mark these as non-streaming instructions, which should trap
4
if full a64 support is not enabled in streaming mode.
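
For readers new to the SME terms used in these commit messages, the condition
being enforced can be shown with a minimal standalone C sketch (illustrative
only, not QEMU code; the struct and names below are invented): an instruction
marked non-streaming traps when Streaming SVE mode is active and the
FEAT_SME_FA64 extension is not enabled.

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative model only: a "non-streaming" instruction traps when
     * Streaming SVE mode (PSTATE.SM) is active, unless FEAT_SME_FA64
     * makes the full A64 instruction set available in that mode. */
    struct cpu_mode {
        bool streaming_sve;   /* PSTATE.SM == 1 */
        bool fa64;            /* full A64 usable while streaming */
    };

    static bool nonstreaming_insn_traps(const struct cpu_mode *m)
    {
        return m->streaming_sve && !m->fa64;
    }

    int main(void)
    {
        struct cpu_mode m = { .streaming_sve = true, .fa64 = false };
        printf("traps: %s\n", nonstreaming_insn_traps(&m) ? "yes" : "no");
        return 0;
    }
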
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-7-richard.henderson@linaro.org
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
---
11
target/arm/sme-fa64.decode | 3 ---
12
target/arm/translate-sve.c | 22 ++++++++++++----------
13
2 files changed, 12 insertions(+), 13 deletions(-)
14
15
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
16
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/sme-fa64.decode
18
+++ b/target/arm/sme-fa64.decode
19
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
20
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
21
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
22
23
-FAIL 0000 0100 --1- ---- 1011 -0-- ---- ---- # FTSSEL, FEXPA
24
-FAIL 0000 0101 --10 0001 100- ---- ---- ---- # COMPACT
25
-FAIL 0100 0101 --0- ---- 1011 ---- ---- ---- # BDEP, BEXT, BGRP
26
FAIL 0100 0101 000- ---- 0110 1--- ---- ---- # PMULLB, PMULLT (128b result)
27
FAIL 0110 0100 --1- ---- 1110 01-- ---- ---- # FMMLA, BFMMLA
28
FAIL 0110 0101 --0- ---- 0000 11-- ---- ---- # FTSMUL
29
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
30
index XXXXXXX..XXXXXXX 100644
31
--- a/target/arm/translate-sve.c
32
+++ b/target/arm/translate-sve.c
33
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_2 * const fexpa_fns[4] = {
34
NULL, gen_helper_sve_fexpa_h,
35
gen_helper_sve_fexpa_s, gen_helper_sve_fexpa_d,
36
};
37
-TRANS_FEAT(FEXPA, aa64_sve, gen_gvec_ool_zz,
38
- fexpa_fns[a->esz], a->rd, a->rn, 0)
39
+TRANS_FEAT_NONSTREAMING(FEXPA, aa64_sve, gen_gvec_ool_zz,
40
+ fexpa_fns[a->esz], a->rd, a->rn, 0)
41
42
static gen_helper_gvec_3 * const ftssel_fns[4] = {
43
NULL, gen_helper_sve_ftssel_h,
44
gen_helper_sve_ftssel_s, gen_helper_sve_ftssel_d,
45
};
46
-TRANS_FEAT(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz, ftssel_fns[a->esz], a, 0)
47
+TRANS_FEAT_NONSTREAMING(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz,
48
+ ftssel_fns[a->esz], a, 0)
49
50
/*
51
*** SVE Predicate Logical Operations Group
52
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(TRN2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
53
static gen_helper_gvec_3 * const compact_fns[4] = {
54
NULL, NULL, gen_helper_sve_compact_s, gen_helper_sve_compact_d
55
};
56
-TRANS_FEAT(COMPACT, aa64_sve, gen_gvec_ool_arg_zpz, compact_fns[a->esz], a, 0)
57
+TRANS_FEAT_NONSTREAMING(COMPACT, aa64_sve, gen_gvec_ool_arg_zpz,
58
+ compact_fns[a->esz], a, 0)
59
60
/* Call the helper that computes the ARM LastActiveElement pseudocode
61
* function, scaled by the element size. This includes the not found
62
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3 * const bext_fns[4] = {
63
gen_helper_sve2_bext_b, gen_helper_sve2_bext_h,
64
gen_helper_sve2_bext_s, gen_helper_sve2_bext_d,
65
};
66
-TRANS_FEAT(BEXT, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
67
- bext_fns[a->esz], a, 0)
68
+TRANS_FEAT_NONSTREAMING(BEXT, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
69
+ bext_fns[a->esz], a, 0)
70
71
static gen_helper_gvec_3 * const bdep_fns[4] = {
72
gen_helper_sve2_bdep_b, gen_helper_sve2_bdep_h,
73
gen_helper_sve2_bdep_s, gen_helper_sve2_bdep_d,
74
};
75
-TRANS_FEAT(BDEP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
76
- bdep_fns[a->esz], a, 0)
77
+TRANS_FEAT_NONSTREAMING(BDEP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
78
+ bdep_fns[a->esz], a, 0)
79
80
static gen_helper_gvec_3 * const bgrp_fns[4] = {
81
gen_helper_sve2_bgrp_b, gen_helper_sve2_bgrp_h,
82
gen_helper_sve2_bgrp_s, gen_helper_sve2_bgrp_d,
83
};
84
-TRANS_FEAT(BGRP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
85
- bgrp_fns[a->esz], a, 0)
86
+TRANS_FEAT_NONSTREAMING(BGRP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
87
+ bgrp_fns[a->esz], a, 0)
88
89
static gen_helper_gvec_3 * const cadd_fns[4] = {
90
gen_helper_sve2_cadd_b, gen_helper_sve2_cadd_h,
91
--
92
2.25.1
diff view generated by jsdifflib
1
From: Sebastian Meyer <meyer@absint.com>
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
With gdb 9.0 and later it is possible to connect to a gdbstub
3
Mark these as non-streaming instructions, which should trap
4
over unix sockets, which is better than a TCP socket connection
4
if full a64 support is not enabled in streaming mode.
5
in some situations. The QEMU command line to set this up is
6
non-obvious; document it.
7
5
8
Signed-off-by: Sebastian Meyer <meyer@absint.com>
9
Message-id: 162867284829.27377.4784930719350564918-0@git.sr.ht
10
[PMM: Tweaked commit message; adjusted wording in a couple of
11
places; fixed rST formatting issue; moved section up out of
12
the 'advanced debugging options' subsection]
13
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
14
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-8-richard.henderson@linaro.org
15
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
---
10
---
17
docs/system/gdb.rst | 26 +++++++++++++++++++++++++-
11
target/arm/sme-fa64.decode | 2 --
18
1 file changed, 25 insertions(+), 1 deletion(-)
12
target/arm/translate-sve.c | 24 +++++++++++++++---------
13
2 files changed, 15 insertions(+), 11 deletions(-)
19
14
20
diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst
15
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
21
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
22
--- a/docs/system/gdb.rst
17
--- a/target/arm/sme-fa64.decode
23
+++ b/docs/system/gdb.rst
18
+++ b/target/arm/sme-fa64.decode
24
@@ -XXX,XX +XXX,XX @@ The ``-s`` option will make QEMU listen for an incoming connection
19
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
25
from gdb on TCP port 1234, and ``-S`` will make QEMU not start the
20
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
26
guest until you tell it to from gdb. (If you want to specify which
21
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
27
TCP port to use or to use something other than TCP for the gdbstub
22
28
-connection, use the ``-gdb dev`` option instead of ``-s``.)
23
-FAIL 0100 0101 000- ---- 0110 1--- ---- ---- # PMULLB, PMULLT (128b result)
29
+connection, use the ``-gdb dev`` option instead of ``-s``. See
24
-FAIL 0110 0100 --1- ---- 1110 01-- ---- ---- # FMMLA, BFMMLA
30
+`Using unix sockets`_ for an example.)
25
FAIL 0110 0101 --0- ---- 0000 11-- ---- ---- # FTSMUL
31
26
FAIL 0110 0101 --01 0--- 100- ---- ---- ---- # FTMAD
32
.. parsed-literal::
27
FAIL 0110 0101 --01 1--- 001- ---- ---- ---- # FADDA
33
28
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
34
@@ -XXX,XX +XXX,XX @@ not just those in the cluster you are currently working on::
29
index XXXXXXX..XXXXXXX 100644
35
30
--- a/target/arm/translate-sve.c
36
(gdb) set schedule-multiple on
31
+++ b/target/arm/translate-sve.c
37
32
@@ -XXX,XX +XXX,XX @@ static bool do_trans_pmull(DisasContext *s, arg_rrr_esz *a, bool sel)
38
+Using unix sockets
33
gen_helper_gvec_pmull_q, gen_helper_sve2_pmull_h,
39
+==================
34
NULL, gen_helper_sve2_pmull_d,
35
};
36
- if (a->esz == 0
37
- ? !dc_isar_feature(aa64_sve2_pmull128, s)
38
- : !dc_isar_feature(aa64_sve, s)) {
40
+
39
+
41
+An alternate method for connecting gdb to the QEMU gdbstub is to use
40
+ if (a->esz == 0) {
42
+a unix socket (if supported by your operating system). This is useful when
41
+ if (!dc_isar_feature(aa64_sve2_pmull128, s)) {
43
+running several tests in parallel, or if you do not have a known free TCP
42
+ return false;
44
+port (e.g. when running automated tests).
43
+ }
45
+
44
+ s->is_nonstreaming = true;
46
+First create a chardev with the appropriate options, then
45
+ } else if (!dc_isar_feature(aa64_sve, s)) {
47
+instruct the gdbserver to use that device:
46
return false;
48
+
47
}
49
+.. parsed-literal::
48
return gen_gvec_ool_arg_zzz(s, fns[a->esz], a, sel);
50
+
49
@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_FP(FMINP, aa64_sve2, sve2_fminp_zpzz)
51
+ |qemu_system| -chardev socket,path=/tmp/gdb-socket,server=on,wait=off,id=gdb0 -gdb chardev:gdb0 -S ...
50
* SVE Integer Multiply-Add (unpredicated)
52
+
51
*/
53
+Start gdb as before, but this time connect using the path to
52
54
+the socket::
53
-TRANS_FEAT(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz, gen_helper_fmmla_s,
55
+
54
- a->rd, a->rn, a->rm, a->ra, 0, FPST_FPCR)
56
+ (gdb) target remote /tmp/gdb-socket
55
-TRANS_FEAT(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz, gen_helper_fmmla_d,
57
+
56
- a->rd, a->rn, a->rm, a->ra, 0, FPST_FPCR)
58
+Note that to use a unix socket for the connection you will need
57
+TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz,
59
+gdb version 9.0 or newer.
58
+ gen_helper_fmmla_s, a->rd, a->rn, a->rm, a->ra,
60
+
59
+ 0, FPST_FPCR)
61
Advanced debugging options
60
+TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz,
62
==========================
61
+ gen_helper_fmmla_d, a->rd, a->rn, a->rm, a->ra,
63
62
+ 0, FPST_FPCR)
63
64
static gen_helper_gvec_4 * const sqdmlal_zzzw_fns[] = {
65
NULL, gen_helper_sve2_sqdmlal_zzzw_h,
66
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_ool_arg_zzzz,
67
TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_ool_arg_zzxz,
68
gen_helper_gvec_bfdot_idx, a)
69
70
-TRANS_FEAT(BFMMLA, aa64_sve_bf16, gen_gvec_ool_arg_zzzz,
71
- gen_helper_gvec_bfmmla, a, 0)
72
+TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_ool_arg_zzzz,
73
+ gen_helper_gvec_bfmmla, a, 0)
74
75
static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
76
{
64
--
77
--
65
2.20.1
78
2.25.1
66
67
1
Implement the MVE VMLAS insn, which multiplies a vector by a vector
1
From: Richard Henderson <richard.henderson@linaro.org>
2
and adds a scalar.
3
2
3
Mark these as non-streaming instructions, which should trap
4
if full a64 support is not enabled in streaming mode.
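
As a rough standalone illustration of the VMLAS operation described above
(not the QEMU helper itself; predication and the beat mask are ignored),
each destination element is multiplied by the corresponding source element
and the general-purpose scalar is then added:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustration only: per element, d[i] = n[i] * d[i] + scalar. */
    static void vmlas_u32(uint32_t *d, const uint32_t *n,
                          uint32_t scalar, int elems)
    {
        for (int i = 0; i < elems; i++) {
            d[i] = n[i] * d[i] + scalar;
        }
    }

    int main(void)
    {
        uint32_t d[4] = { 1, 2, 3, 4 };
        const uint32_t n[4] = { 10, 10, 10, 10 };

        vmlas_u32(d, n, 5, 4);
        printf("%u %u %u %u\n", (unsigned)d[0], (unsigned)d[1],
               (unsigned)d[2], (unsigned)d[3]);   /* 15 25 35 45 */
        return 0;
    }
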
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-9-richard.henderson@linaro.org
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
---
10
---
7
target/arm/helper-mve.h | 4 ++++
11
target/arm/sme-fa64.decode | 3 ---
8
target/arm/mve.decode | 3 +++
12
target/arm/translate-sve.c | 15 +++++++++++----
9
target/arm/mve_helper.c | 26 ++++++++++++++++++++++++++
13
2 files changed, 11 insertions(+), 7 deletions(-)
10
target/arm/translate-mve.c | 1 +
11
4 files changed, 34 insertions(+)
12
14
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
14
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
17
--- a/target/arm/sme-fa64.decode
16
+++ b/target/arm/helper-mve.h
18
+++ b/target/arm/sme-fa64.decode
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i3
19
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
18
DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
19
DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
20
22
21
+DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
-FAIL 0110 0101 --0- ---- 0000 11-- ---- ---- # FTSMUL
22
+DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
24
-FAIL 0110 0101 --01 0--- 100- ---- ---- ---- # FTMAD
23
+DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
25
-FAIL 0110 0101 --01 1--- 001- ---- ---- ---- # FADDA
24
+
26
FAIL 0100 0101 --0- ---- 1001 10-- ---- ---- # SMMLA, UMMLA, USMMLA
25
DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
27
FAIL 0100 0101 --1- ---- 1--- ---- ---- ---- # SVE2 string/histo/crypto instructions
26
DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
28
FAIL 1000 010- -00- ---- 10-- ---- ---- ---- # SVE2 32-bit gather NT load (vector+scalar)
27
DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
29
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
28
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
29
index XXXXXXX..XXXXXXX 100644
30
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/mve.decode
31
--- a/target/arm/translate-sve.c
31
+++ b/target/arm/mve.decode
32
+++ b/target/arm/translate-sve.c
32
@@ -XXX,XX +XXX,XX @@ VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
33
@@ -XXX,XX +XXX,XX @@ static gen_helper_gvec_3_ptr * const ftmad_fns[4] = {
33
VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
34
NULL, gen_helper_sve_ftmad_h,
34
VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
35
gen_helper_sve_ftmad_s, gen_helper_sve_ftmad_d,
35
36
};
36
+# The U bit (28) is don't-care because it does not affect the result
37
-TRANS_FEAT(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
37
+VMLAS 111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
38
- ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
38
+
39
- a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
39
# Vector add across vector
40
+TRANS_FEAT_NONSTREAMING(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
40
{
41
+ ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
41
VADDV 111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
42
+ a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
42
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
43
43
index XXXXXXX..XXXXXXX 100644
44
/*
44
--- a/target/arm/mve_helper.c
45
*** SVE Floating Point Accumulating Reduction Group
45
+++ b/target/arm/mve_helper.c
46
@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
46
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
47
if (a->esz == 0 || !dc_isar_feature(aa64_sve, s)) {
47
mve_advance_vpt(env); \
48
return false;
48
}
49
}
49
50
+ s->is_nonstreaming = true;
50
+/* "accumulating" version where FN takes d as well as n and m */
51
if (!sve_access_check(s)) {
51
+#define DO_2OP_ACC_SCALAR(OP, ESIZE, TYPE, FN) \
52
return true;
52
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
53
}
53
+ uint32_t rm) \
54
@@ -XXX,XX +XXX,XX @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
54
+ { \
55
DO_FP3(FADD_zzz, fadd)
55
+ TYPE *d = vd, *n = vn; \
56
DO_FP3(FSUB_zzz, fsub)
56
+ TYPE m = rm; \
57
DO_FP3(FMUL_zzz, fmul)
57
+ uint16_t mask = mve_element_mask(env); \
58
-DO_FP3(FTSMUL, ftsmul)
58
+ unsigned e; \
59
DO_FP3(FRECPS, recps)
59
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
60
DO_FP3(FRSQRTS, rsqrts)
60
+ mergemask(&d[H##ESIZE(e)], \
61
61
+ FN(d[H##ESIZE(e)], n[H##ESIZE(e)], m), mask); \
62
#undef DO_FP3
62
+ } \
63
63
+ mve_advance_vpt(env); \
64
+static gen_helper_gvec_3_ptr * const ftsmul_fns[4] = {
64
+ }
65
+ NULL, gen_helper_gvec_ftsmul_h,
65
+
66
+ gen_helper_gvec_ftsmul_s, gen_helper_gvec_ftsmul_d
66
/* provide unsigned 2-op scalar helpers for all sizes */
67
+};
67
#define DO_2OP_SCALAR_U(OP, FN) \
68
+TRANS_FEAT_NONSTREAMING(FTSMUL, aa64_sve, gen_gvec_fpst_arg_zzz,
68
DO_2OP_SCALAR(OP##b, 1, uint8_t, FN) \
69
+ ftsmul_fns[a->esz], a, 0)
69
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
70
DO_2OP_SCALAR(OP##h, 2, int16_t, FN) \
71
DO_2OP_SCALAR(OP##w, 4, int32_t, FN)
72
73
+#define DO_2OP_ACC_SCALAR_U(OP, FN) \
74
+ DO_2OP_ACC_SCALAR(OP##b, 1, uint8_t, FN) \
75
+ DO_2OP_ACC_SCALAR(OP##h, 2, uint16_t, FN) \
76
+ DO_2OP_ACC_SCALAR(OP##w, 4, uint32_t, FN)
77
+
78
DO_2OP_SCALAR_U(vadd_scalar, DO_ADD)
79
DO_2OP_SCALAR_U(vsub_scalar, DO_SUB)
80
DO_2OP_SCALAR_U(vmul_scalar, DO_MUL)
81
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
82
DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
83
DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
84
85
+/* Vector by vector plus scalar */
86
+#define DO_VMLAS(D, N, M) ((N) * (D) + (M))
87
+
88
+DO_2OP_ACC_SCALAR_U(vmlas, DO_VMLAS)
89
+
70
+
90
/*
71
/*
91
* Long saturating scalar ops. As with DO_2OP_L, TYPE and H are for the
72
*** SVE Floating Point Arithmetic - Predicated Group
92
* input (smaller) type and LESIZE, LTYPE, LH for the output (long) type.
73
*/
93
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/target/arm/translate-mve.c
96
+++ b/target/arm/translate-mve.c
97
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
98
DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
99
DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
100
DO_2OP_SCALAR(VBRSR, vbrsr)
101
+DO_2OP_SCALAR(VMLAS, vmlas)
102
103
static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
104
{
105
--
74
--
106
2.20.1
75
2.25.1
107
108
1
From: Guenter Roeck <linux@roeck-us.net>
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
Instantiate SAI1/2/3 as unimplemented devices to avoid Linux kernel crashes
3
Mark these as non-streaming instructions, which should trap
4
such as the following.
4
if full a64 support is not enabled in streaming mode.
5
5
6
Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
7
pgd = (ptrval)
8
[d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
9
Internal error: : 808 [#1] SMP ARM
10
Modules linked in:
11
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc5 #1
12
...
13
[<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
14
[<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
15
[<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
16
[<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
17
[<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
18
[<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
19
[<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
20
[<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
21
[<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
22
[<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
23
[<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
24
[<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
25
[<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
26
[<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
27
[<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
28
[<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
29
[<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
30
[<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
31
[<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
32
[<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
33
[<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
34
35
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
36
Message-id: 20210810175607.538090-1-linux@roeck-us.net
37
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-10-richard.henderson@linaro.org
38
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
39
---
10
---
40
include/hw/arm/fsl-imx7.h | 5 +++++
11
target/arm/sme-fa64.decode | 1 -
41
hw/arm/fsl-imx7.c | 7 +++++++
12
target/arm/translate-sve.c | 12 ++++++------
42
2 files changed, 12 insertions(+)
13
2 files changed, 6 insertions(+), 7 deletions(-)
43
14
44
diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
15
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
45
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
46
--- a/include/hw/arm/fsl-imx7.h
17
--- a/target/arm/sme-fa64.decode
47
+++ b/include/hw/arm/fsl-imx7.h
18
+++ b/target/arm/sme-fa64.decode
48
@@ -XXX,XX +XXX,XX @@ enum FslIMX7MemoryMap {
19
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
49
FSL_IMX7_UART6_ADDR = 0x30A80000,
20
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
50
FSL_IMX7_UART7_ADDR = 0x30A90000,
21
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
51
22
52
+ FSL_IMX7_SAI1_ADDR = 0x308A0000,
23
-FAIL 0100 0101 --0- ---- 1001 10-- ---- ---- # SMMLA, UMMLA, USMMLA
53
+ FSL_IMX7_SAI2_ADDR = 0x308B0000,
24
FAIL 0100 0101 --1- ---- 1--- ---- ---- ---- # SVE2 string/histo/crypto instructions
54
+ FSL_IMX7_SAI3_ADDR = 0x308C0000,
25
FAIL 1000 010- -00- ---- 10-- ---- ---- ---- # SVE2 32-bit gather NT load (vector+scalar)
55
+ FSL_IMX7_SAIn_SIZE = 0x10000,
26
FAIL 1000 010- -00- ---- 111- ---- ---- ---- # SVE 32-bit gather prefetch (vector+imm)
56
+
27
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
57
FSL_IMX7_ENET1_ADDR = 0x30BE0000,
58
FSL_IMX7_ENET2_ADDR = 0x30BF0000,
59
60
diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
61
index XXXXXXX..XXXXXXX 100644
28
index XXXXXXX..XXXXXXX 100644
62
--- a/hw/arm/fsl-imx7.c
29
--- a/target/arm/translate-sve.c
63
+++ b/hw/arm/fsl-imx7.c
30
+++ b/target/arm/translate-sve.c
64
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
31
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMLALT_zzxw, aa64_sve2, do_FMLAL_zzxw, a, false, true)
65
create_unimplemented_device("can1", FSL_IMX7_CAN1_ADDR, FSL_IMX7_CANn_SIZE);
32
TRANS_FEAT(FMLSLB_zzxw, aa64_sve2, do_FMLAL_zzxw, a, true, false)
66
create_unimplemented_device("can2", FSL_IMX7_CAN2_ADDR, FSL_IMX7_CANn_SIZE);
33
TRANS_FEAT(FMLSLT_zzxw, aa64_sve2, do_FMLAL_zzxw, a, true, true)
67
34
68
+ /*
35
-TRANS_FEAT(SMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
69
+ * SAI (Audio SSI (Synchronous Serial Interface))
36
- gen_helper_gvec_smmla_b, a, 0)
70
+ */
37
-TRANS_FEAT(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
71
+ create_unimplemented_device("sai1", FSL_IMX7_SAI1_ADDR, FSL_IMX7_SAIn_SIZE);
38
- gen_helper_gvec_usmmla_b, a, 0)
72
+ create_unimplemented_device("sai2", FSL_IMX7_SAI2_ADDR, FSL_IMX7_SAIn_SIZE);
39
-TRANS_FEAT(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
73
+ create_unimplemented_device("sai2", FSL_IMX7_SAI3_ADDR, FSL_IMX7_SAIn_SIZE);
40
- gen_helper_gvec_ummla_b, a, 0)
74
+
41
+TRANS_FEAT_NONSTREAMING(SMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
75
/*
42
+ gen_helper_gvec_smmla_b, a, 0)
76
* OCOTP
43
+TRANS_FEAT_NONSTREAMING(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
77
*/
44
+ gen_helper_gvec_usmmla_b, a, 0)
45
+TRANS_FEAT_NONSTREAMING(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
46
+ gen_helper_gvec_ummla_b, a, 0)
47
48
TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_ool_arg_zzzz,
49
gen_helper_gvec_bfdot, a, 0)
78
--
50
--
79
2.20.1
51
2.25.1
80
81
diff view generated by jsdifflib
1
From: Eduardo Habkost <ehabkost@redhat.com>
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
The SBSA_GWDT enum value conflicts with the SBSA_GWDT() QOM type
3
Mark these as non-streaming instructions, which should trap
4
checking helper, preventing us from using an OBJECT_DEFINE* or
4
if full a64 support is not enabled in streaming mode.
5
DEFINE_INSTANCE_CHECKER macro for the SBSA_GWDT() wrapper.
6
5
7
If I understand the SBSA 6.0 specification correctly, the signal
8
being connected to IRQ 16 is the WS0 output signal from the
9
Generic Watchdog. Rename the enum value to SBSA_GWDT_WS0 to be
10
more explicit and avoid the name conflict.
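
A hypothetical, self-contained C sketch of the identifier clash being avoided
(the QOM checker is replaced by a plain function here): an enum constant and a
function share the ordinary identifier namespace, so renaming the constant to
SBSA_GWDT_WS0 lets a helper keep the SBSA_GWDT() name.

    #include <stdio.h>

    enum { SBSA_GWDT_WS0 = 16 };          /* IRQ index, as in the patch */

    /* Stand-in for the SBSA_GWDT() type-checking helper; with the old
     * enum name this declaration would collide with the constant. */
    static int SBSA_GWDT(int irq)
    {
        return irq == SBSA_GWDT_WS0;
    }

    int main(void)
    {
        printf("%d\n", SBSA_GWDT(16));    /* prints 1 */
        return 0;
    }
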
11
12
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
13
Message-id: 20210806023119.431680-1-ehabkost@redhat.com
14
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-11-richard.henderson@linaro.org
15
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
---
10
---
17
hw/arm/sbsa-ref.c | 6 +++---
11
target/arm/sme-fa64.decode | 1 -
18
1 file changed, 3 insertions(+), 3 deletions(-)
12
target/arm/translate-sve.c | 35 ++++++++++++++++++-----------------
13
2 files changed, 18 insertions(+), 18 deletions(-)
19
14
20
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
15
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
21
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
22
--- a/hw/arm/sbsa-ref.c
17
--- a/target/arm/sme-fa64.decode
23
+++ b/hw/arm/sbsa-ref.c
18
+++ b/target/arm/sme-fa64.decode
24
@@ -XXX,XX +XXX,XX @@ enum {
19
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
25
SBSA_GIC_DIST,
20
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
26
SBSA_GIC_REDIST,
21
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
27
SBSA_SECURE_EC,
22
28
- SBSA_GWDT,
23
-FAIL 0100 0101 --1- ---- 1--- ---- ---- ---- # SVE2 string/histo/crypto instructions
29
+ SBSA_GWDT_WS0,
24
FAIL 1000 010- -00- ---- 10-- ---- ---- ---- # SVE2 32-bit gather NT load (vector+scalar)
30
SBSA_GWDT_REFRESH,
25
FAIL 1000 010- -00- ---- 111- ---- ---- ---- # SVE 32-bit gather prefetch (vector+imm)
31
SBSA_GWDT_CONTROL,
26
FAIL 1000 0100 0-1- ---- 0--- ---- ---- ---- # SVE 32-bit gather prefetch (scalar+vector)
32
SBSA_SMMU,
27
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
33
@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
28
index XXXXXXX..XXXXXXX 100644
34
[SBSA_AHCI] = 10,
29
--- a/target/arm/translate-sve.c
35
[SBSA_EHCI] = 11,
30
+++ b/target/arm/translate-sve.c
36
[SBSA_SMMU] = 12, /* ... to 15 */
31
@@ -XXX,XX +XXX,XX @@ DO_SVE2_ZZZ_NARROW(RSUBHNT, rsubhnt)
37
- [SBSA_GWDT] = 16,
32
static gen_helper_gvec_flags_4 * const match_fns[4] = {
38
+ [SBSA_GWDT_WS0] = 16,
33
gen_helper_sve2_match_ppzz_b, gen_helper_sve2_match_ppzz_h, NULL, NULL
39
};
34
};
40
35
-TRANS_FEAT(MATCH, aa64_sve2, do_ppzz_flags, a, match_fns[a->esz])
41
static const char * const valid_cpus[] = {
36
+TRANS_FEAT_NONSTREAMING(MATCH, aa64_sve2, do_ppzz_flags, a, match_fns[a->esz])
42
@@ -XXX,XX +XXX,XX @@ static void create_wdt(const SBSAMachineState *sms)
37
43
hwaddr cbase = sbsa_ref_memmap[SBSA_GWDT_CONTROL].base;
38
static gen_helper_gvec_flags_4 * const nmatch_fns[4] = {
44
DeviceState *dev = qdev_new(TYPE_WDT_SBSA);
39
gen_helper_sve2_nmatch_ppzz_b, gen_helper_sve2_nmatch_ppzz_h, NULL, NULL
45
SysBusDevice *s = SYS_BUS_DEVICE(dev);
40
};
46
- int irq = sbsa_ref_irqmap[SBSA_GWDT];
41
-TRANS_FEAT(NMATCH, aa64_sve2, do_ppzz_flags, a, nmatch_fns[a->esz])
47
+ int irq = sbsa_ref_irqmap[SBSA_GWDT_WS0];
42
+TRANS_FEAT_NONSTREAMING(NMATCH, aa64_sve2, do_ppzz_flags, a, nmatch_fns[a->esz])
48
43
49
sysbus_realize_and_unref(s, &error_fatal);
44
static gen_helper_gvec_4 * const histcnt_fns[4] = {
50
sysbus_mmio_map(s, 0, rbase);
45
NULL, NULL, gen_helper_sve2_histcnt_s, gen_helper_sve2_histcnt_d
46
};
47
-TRANS_FEAT(HISTCNT, aa64_sve2, gen_gvec_ool_arg_zpzz,
48
- histcnt_fns[a->esz], a, 0)
49
+TRANS_FEAT_NONSTREAMING(HISTCNT, aa64_sve2, gen_gvec_ool_arg_zpzz,
50
+ histcnt_fns[a->esz], a, 0)
51
52
-TRANS_FEAT(HISTSEG, aa64_sve2, gen_gvec_ool_arg_zzz,
53
- a->esz == 0 ? gen_helper_sve2_histseg : NULL, a, 0)
54
+TRANS_FEAT_NONSTREAMING(HISTSEG, aa64_sve2, gen_gvec_ool_arg_zzz,
55
+ a->esz == 0 ? gen_helper_sve2_histseg : NULL, a, 0)
56
57
DO_ZPZZ_FP(FADDP, aa64_sve2, sve2_faddp_zpzz)
58
DO_ZPZZ_FP(FMAXNMP, aa64_sve2, sve2_fmaxnmp_zpzz)
59
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz,
60
TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
61
a->esz == 2 ? gen_helper_gvec_usdot_b : NULL, a, 0)
62
63
-TRANS_FEAT(AESMC, aa64_sve2_aes, gen_gvec_ool_zz,
64
- gen_helper_crypto_aesmc, a->rd, a->rd, a->decrypt)
65
+TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz,
66
+ gen_helper_crypto_aesmc, a->rd, a->rd, a->decrypt)
67
68
-TRANS_FEAT(AESE, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
69
- gen_helper_crypto_aese, a, false)
70
-TRANS_FEAT(AESD, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
71
- gen_helper_crypto_aese, a, true)
72
+TRANS_FEAT_NONSTREAMING(AESE, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
73
+ gen_helper_crypto_aese, a, false)
74
+TRANS_FEAT_NONSTREAMING(AESD, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
75
+ gen_helper_crypto_aese, a, true)
76
77
-TRANS_FEAT(SM4E, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
78
- gen_helper_crypto_sm4e, a, 0)
79
-TRANS_FEAT(SM4EKEY, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
80
- gen_helper_crypto_sm4ekey, a, 0)
81
+TRANS_FEAT_NONSTREAMING(SM4E, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
82
+ gen_helper_crypto_sm4e, a, 0)
83
+TRANS_FEAT_NONSTREAMING(SM4EKEY, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
84
+ gen_helper_crypto_sm4ekey, a, 0)
85
86
-TRANS_FEAT(RAX1, aa64_sve2_sha3, gen_gvec_fn_arg_zzz, gen_gvec_rax1, a)
87
+TRANS_FEAT_NONSTREAMING(RAX1, aa64_sve2_sha3, gen_gvec_fn_arg_zzz,
88
+ gen_gvec_rax1, a)
89
90
TRANS_FEAT(FCVTNT_sh, aa64_sve2, gen_gvec_fpst_arg_zpz,
91
gen_helper_sve2_fcvtnt_sh, a, 0, FPST_FPCR)
51
--
92
--
52
2.20.1
93
2.25.1
53
54
1
From: Guenter Roeck <linux@roeck-us.net>
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
Instantiate SAI1/2/3 and ASRC as unimplemented devices to avoid random
3
Mark these as non-streaming instructions, which should trap
4
Linux kernel crashes, such as
4
if full a64 support is not enabled in streaming mode.
5
5
6
Unhandled fault: external abort on non-linefetch (0x808) at 0xd1580010
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
pgd = (ptrval)
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
[d1580010] *pgd=8231b811, *pte=02034653, *ppte=02034453
8
Message-id: 20220708151540.18136-12-richard.henderson@linaro.org
9
Internal error: : 808 [#1] SMP ARM
10
...
11
[<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
12
[<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
13
[<c09580f4>] (_regmap_write) from [<c095837c>] (_regmap_update_bits+0xe4/0xec)
14
[<c095837c>] (_regmap_update_bits) from [<c09599b4>] (regmap_update_bits_base+0x50/0x74)
15
[<c09599b4>] (regmap_update_bits_base) from [<c0d3e9e4>] (fsl_asrc_runtime_resume+0x1e4/0x21c)
16
[<c0d3e9e4>] (fsl_asrc_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
17
[<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
18
[<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
19
[<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
20
[<c0942dfc>] (__pm_runtime_resume) from [<c0d3ecc4>] (fsl_asrc_probe+0x2a8/0x708)
21
[<c0d3ecc4>] (fsl_asrc_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
22
[<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
23
[<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
24
[<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
25
[<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
26
[<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
27
[<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
28
[<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
29
[<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
30
[<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
31
[<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
32
[<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
33
34
or
35
36
Unhandled fault: external abort on non-linefetch (0x808) at 0xd19b0000
37
pgd = (ptrval)
38
[d19b0000] *pgd=82711811, *pte=308a0653, *ppte=308a0453
39
Internal error: : 808 [#1] SMP ARM
40
...
41
[<c095e974>] (regmap_mmio_write32le) from [<c095eb48>] (regmap_mmio_write+0x3c/0x54)
42
[<c095eb48>] (regmap_mmio_write) from [<c09580f4>] (_regmap_write+0x4c/0x1f0)
43
[<c09580f4>] (_regmap_write) from [<c0959b28>] (regmap_write+0x3c/0x60)
44
[<c0959b28>] (regmap_write) from [<c0d41130>] (fsl_sai_runtime_resume+0x9c/0x1ec)
45
[<c0d41130>] (fsl_sai_runtime_resume) from [<c0942464>] (__rpm_callback+0x3c/0x108)
46
[<c0942464>] (__rpm_callback) from [<c0942590>] (rpm_callback+0x60/0x64)
47
[<c0942590>] (rpm_callback) from [<c0942b60>] (rpm_resume+0x5cc/0x808)
48
[<c0942b60>] (rpm_resume) from [<c0942dfc>] (__pm_runtime_resume+0x60/0xa0)
49
[<c0942dfc>] (__pm_runtime_resume) from [<c0d4231c>] (fsl_sai_probe+0x2b8/0x65c)
50
[<c0d4231c>] (fsl_sai_probe) from [<c0935b08>] (platform_probe+0x58/0xb8)
51
[<c0935b08>] (platform_probe) from [<c0933264>] (really_probe.part.0+0x9c/0x334)
52
[<c0933264>] (really_probe.part.0) from [<c093359c>] (__driver_probe_device+0xa0/0x138)
53
[<c093359c>] (__driver_probe_device) from [<c0933664>] (driver_probe_device+0x30/0xc8)
54
[<c0933664>] (driver_probe_device) from [<c0933c88>] (__driver_attach+0x90/0x130)
55
[<c0933c88>] (__driver_attach) from [<c0931060>] (bus_for_each_dev+0x78/0xb8)
56
[<c0931060>] (bus_for_each_dev) from [<c093254c>] (bus_add_driver+0xf0/0x1d8)
57
[<c093254c>] (bus_add_driver) from [<c0934a30>] (driver_register+0x88/0x118)
58
[<c0934a30>] (driver_register) from [<c01022c0>] (do_one_initcall+0x7c/0x3a4)
59
[<c01022c0>] (do_one_initcall) from [<c1601204>] (kernel_init_freeable+0x198/0x22c)
60
[<c1601204>] (kernel_init_freeable) from [<c0f5ff2c>] (kernel_init+0x10/0x128)
61
[<c0f5ff2c>] (kernel_init) from [<c010013c>] (ret_from_fork+0x14/0x38)
62
63
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
64
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
65
Message-id: 20210810160318.87376-1-linux@roeck-us.net
66
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
67
---
10
---
68
hw/arm/fsl-imx6ul.c | 12 ++++++++++++
11
target/arm/sme-fa64.decode | 9 ---------
69
1 file changed, 12 insertions(+)
12
target/arm/translate-sve.c | 6 ++++++
13
2 files changed, 6 insertions(+), 9 deletions(-)
70
14
71
diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
15
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
72
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
73
--- a/hw/arm/fsl-imx6ul.c
17
--- a/target/arm/sme-fa64.decode
74
+++ b/hw/arm/fsl-imx6ul.c
18
+++ b/target/arm/sme-fa64.decode
75
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
19
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
76
*/
20
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
77
create_unimplemented_device("sdma", FSL_IMX6UL_SDMA_ADDR, 0x4000);
21
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
78
22
79
+ /*
23
-FAIL 1000 010- -00- ---- 10-- ---- ---- ---- # SVE2 32-bit gather NT load (vector+scalar)
80
+ * SAI (Audio SSI (Synchronous Serial Interface))
24
FAIL 1000 010- -00- ---- 111- ---- ---- ---- # SVE 32-bit gather prefetch (vector+imm)
81
+ */
25
FAIL 1000 0100 0-1- ---- 0--- ---- ---- ---- # SVE 32-bit gather prefetch (scalar+vector)
82
+ create_unimplemented_device("sai1", FSL_IMX6UL_SAI1_ADDR, 0x4000);
26
-FAIL 1000 010- -01- ---- 1--- ---- ---- ---- # SVE 32-bit gather load (vector+imm)
83
+ create_unimplemented_device("sai2", FSL_IMX6UL_SAI2_ADDR, 0x4000);
27
-FAIL 1000 0100 0-0- ---- 0--- ---- ---- ---- # SVE 32-bit gather load byte (scalar+vector)
84
+ create_unimplemented_device("sai3", FSL_IMX6UL_SAI3_ADDR, 0x4000);
28
-FAIL 1000 0100 1--- ---- 0--- ---- ---- ---- # SVE 32-bit gather load half (scalar+vector)
85
+
29
-FAIL 1000 0101 0--- ---- 0--- ---- ---- ---- # SVE 32-bit gather load word (scalar+vector)
86
/*
30
FAIL 1010 010- ---- ---- 011- ---- ---- ---- # SVE contiguous FF load (scalar+scalar)
87
* PWM
31
FAIL 1010 010- ---1 ---- 101- ---- ---- ---- # SVE contiguous NF load (scalar+imm)
88
*/
32
FAIL 1010 010- -01- ---- 000- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+scalar)
89
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
33
FAIL 1010 010- -010 ---- 001- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+imm)
90
create_unimplemented_device("pwm3", FSL_IMX6UL_PWM3_ADDR, 0x4000);
34
FAIL 1100 010- ---- ---- ---- ---- ---- ---- # SVE 64-bit gather load/prefetch
91
create_unimplemented_device("pwm4", FSL_IMX6UL_PWM4_ADDR, 0x4000);
35
-FAIL 1110 010- -00- ---- 001- ---- ---- ---- # SVE2 64-bit scatter NT store (vector+scalar)
92
36
-FAIL 1110 010- -10- ---- 001- ---- ---- ---- # SVE2 32-bit scatter NT store (vector+scalar)
93
+ /*
37
-FAIL 1110 010- ---- ---- 1-0- ---- ---- ---- # SVE scatter store (scalar+32-bit vector)
94
+ * Audio ASRC (asynchronous sample rate converter)
38
-FAIL 1110 010- ---- ---- 101- ---- ---- ---- # SVE scatter store (misc)
95
+ */
39
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
96
+ create_unimplemented_device("asrc", FSL_IMX6UL_ASRC_ADDR, 0x4000);
40
index XXXXXXX..XXXXXXX 100644
97
+
41
--- a/target/arm/translate-sve.c
98
/*
42
+++ b/target/arm/translate-sve.c
99
* CAN
43
@@ -XXX,XX +XXX,XX @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a)
100
*/
44
if (!dc_isar_feature(aa64_sve, s)) {
45
return false;
46
}
47
+ s->is_nonstreaming = true;
48
if (!sve_access_check(s)) {
49
return true;
50
}
51
@@ -XXX,XX +XXX,XX @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a)
52
if (!dc_isar_feature(aa64_sve, s)) {
53
return false;
54
}
55
+ s->is_nonstreaming = true;
56
if (!sve_access_check(s)) {
57
return true;
58
}
59
@@ -XXX,XX +XXX,XX @@ static bool trans_LDNT1_zprz(DisasContext *s, arg_LD1_zprz *a)
60
if (!dc_isar_feature(aa64_sve2, s)) {
61
return false;
62
}
63
+ s->is_nonstreaming = true;
64
if (!sve_access_check(s)) {
65
return true;
66
}
67
@@ -XXX,XX +XXX,XX @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a)
68
if (!dc_isar_feature(aa64_sve, s)) {
69
return false;
70
}
71
+ s->is_nonstreaming = true;
72
if (!sve_access_check(s)) {
73
return true;
74
}
75
@@ -XXX,XX +XXX,XX @@ static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a)
76
if (!dc_isar_feature(aa64_sve, s)) {
77
return false;
78
}
79
+ s->is_nonstreaming = true;
80
if (!sve_access_check(s)) {
81
return true;
82
}
83
@@ -XXX,XX +XXX,XX @@ static bool trans_STNT1_zprz(DisasContext *s, arg_ST1_zprz *a)
84
if (!dc_isar_feature(aa64_sve2, s)) {
85
return false;
86
}
87
+ s->is_nonstreaming = true;
88
if (!sve_access_check(s)) {
89
return true;
90
}
101
--
91
--
102
2.20.1
92
2.25.1
103
104
1
Implement the MVE integer vector comparison instructions that compare
1
From: Richard Henderson <richard.henderson@linaro.org>
2
each element against a scalar from a general purpose register. These
3
are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)"
4
encodings T4, T5 and T6.
5
2
6
We have to move the decodetree pattern for VPST, because it
3
Mark these as non-streaming instructions, which should trap if full
7
overlaps with VCMP T4 with size = 0b11.
4
a64 support is not enabled in streaming mode. In this case, introduce
5
PRF_ns (prefetch non-streaming) to handle the checks.
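
A rough standalone sketch of the element-against-scalar comparison described
above (illustrative only; the real helpers also fold in the VPT and ECI
masks): each element is compared with the scalar and the result is replicated
into one flag bit per byte of that element, forming a 16-bit predicate.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustration only: VCMPEQ of four 32-bit elements against a scalar,
     * setting one flag bit per byte lane of each matching element. */
    static uint16_t vcmpeq_scalar_u32(const uint32_t *n, uint32_t rm)
    {
        uint16_t pred = 0;
        uint16_t emask = 0xf;     /* four byte-flags per 32-bit element */

        for (int e = 0; e < 4; e++) {
            if (n[e] == rm) {
                pred |= emask;
            }
            emask <<= 4;
        }
        return pred;
    }

    int main(void)
    {
        const uint32_t n[4] = { 7, 3, 7, 9 };
        printf("0x%04x\n", vcmpeq_scalar_u32(n, 7));   /* 0x0f0f */
        return 0;
    }
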
8
6
7
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20220708151540.18136-13-richard.henderson@linaro.org
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
---
11
---
12
target/arm/helper-mve.h | 32 +++++++++++++++++++++++++++
12
target/arm/sme-fa64.decode | 3 ---
13
target/arm/mve.decode | 18 +++++++++++++---
13
target/arm/sve.decode | 10 +++++-----
14
target/arm/mve_helper.c | 44 +++++++++++++++++++++++++++++++-------
14
target/arm/translate-sve.c | 11 +++++++++++
15
target/arm/translate-mve.c | 43 +++++++++++++++++++++++++++++++++++++
15
3 files changed, 16 insertions(+), 8 deletions(-)
16
4 files changed, 126 insertions(+), 11 deletions(-)
17
16
18
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
19
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper-mve.h
19
--- a/target/arm/sme-fa64.decode
21
+++ b/target/arm/helper-mve.h
20
+++ b/target/arm/sme-fa64.decode
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
23
DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
24
DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
25
DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
26
+
25
-FAIL 1000 010- -00- ---- 111- ---- ---- ---- # SVE 32-bit gather prefetch (vector+imm)
27
+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
26
-FAIL 1000 0100 0-1- ---- 0--- ---- ---- ---- # SVE 32-bit gather prefetch (scalar+vector)
28
+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
27
FAIL 1010 010- ---- ---- 011- ---- ---- ---- # SVE contiguous FF load (scalar+scalar)
29
+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
28
FAIL 1010 010- ---1 ---- 101- ---- ---- ---- # SVE contiguous NF load (scalar+imm)
30
+
29
FAIL 1010 010- -01- ---- 000- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+scalar)
31
+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
30
FAIL 1010 010- -010 ---- 001- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+imm)
32
+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
31
-FAIL 1100 010- ---- ---- ---- ---- ---- ---- # SVE 64-bit gather load/prefetch
33
+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
32
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
34
+
35
+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
36
+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
37
+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
38
+
39
+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
40
+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
41
+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
42
+
43
+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
44
+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
45
+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
46
+
47
+DEF_HELPER_FLAGS_3(mve_vcmplt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
48
+DEF_HELPER_FLAGS_3(mve_vcmplt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
49
+DEF_HELPER_FLAGS_3(mve_vcmplt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
50
+
51
+DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
52
+DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
53
+DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
54
+
55
+DEF_HELPER_FLAGS_3(mve_vcmple_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
56
+DEF_HELPER_FLAGS_3(mve_vcmple_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
57
+DEF_HELPER_FLAGS_3(mve_vcmple_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
58
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
59
index XXXXXXX..XXXXXXX 100644
33
index XXXXXXX..XXXXXXX 100644
60
--- a/target/arm/mve.decode
34
--- a/target/arm/sve.decode
61
+++ b/target/arm/mve.decode
35
+++ b/target/arm/sve.decode
62
@@ -XXX,XX +XXX,XX @@
36
@@ -XXX,XX +XXX,XX @@ LD1RO_zpri 1010010 .. 01 0.... 001 ... ..... ..... \
63
&vidup qd rn size imm
37
@rpri_load_msz nreg=0
64
&viwdup qd rn rm size imm
38
65
&vcmp qm qn size mask
39
# SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets)
66
+&vcmp_scalar qn rm size mask
40
-PRF 1000010 00 -1 ----- 0-- --- ----- 0 ----
67
41
+PRF_ns 1000010 00 -1 ----- 0-- --- ----- 0 ----
68
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
42
69
# Note that both Rn and Qd are 3 bits only (no D bit)
43
# SVE 32-bit gather prefetch (vector plus immediate)
70
@@ -XXX,XX +XXX,XX @@
44
-PRF 1000010 -- 00 ----- 111 --- ----- 0 ----
71
# Vector comparison; 4-bit Qm but 3-bit Qn
45
+PRF_ns 1000010 -- 00 ----- 111 --- ----- 0 ----
72
%mask_22_13 22:1 13:3
46
73
@vcmp .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
47
# SVE contiguous prefetch (scalar plus immediate)
74
+@vcmp_scalar .... .... .. size:2 qn:3 . .... .... .... rm:4 &vcmp_scalar \
48
PRF 1000010 11 1- ----- 0-- --- ----- 0 ----
75
+ mask=%mask_22_13
49
@@ -XXX,XX +XXX,XX @@ LD1_zpiz 1100010 .. 01 ..... 1.. ... ..... ..... \
76
50
@rpri_g_load esz=3
77
# Vector loads and stores
51
78
52
# SVE 64-bit gather prefetch (scalar plus 64-bit scaled offsets)
79
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
53
-PRF 1100010 00 11 ----- 1-- --- ----- 0 ----
80
rdahi=%rdahi rdalo=%rdalo
54
+PRF_ns 1100010 00 11 ----- 1-- --- ----- 0 ----
81
}
55
82
56
# SVE 64-bit gather prefetch (scalar plus unpacked 32-bit scaled offsets)
83
-# Predicate operations
57
-PRF 1100010 00 -1 ----- 0-- --- ----- 0 ----
84
-VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
58
+PRF_ns 1100010 00 -1 ----- 0-- --- ----- 0 ----
85
-
59
86
# Logical immediate operations (1 reg and modified-immediate)
60
# SVE 64-bit gather prefetch (vector plus immediate)
87
61
-PRF 1100010 -- 00 ----- 111 --- ----- 0 ----
88
# The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
62
+PRF_ns 1100010 -- 00 ----- 111 --- ----- 0 ----
89
@@ -XXX,XX +XXX,XX @@ VCMPGE 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
63
90
VCMPLT 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
64
### SVE Memory Store Group
91
VCMPGT 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
65
92
VCMPLE 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
66
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
93
+
94
+{
95
+ VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
96
+ VCMPEQ_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 .... @vcmp_scalar
97
+}
98
+VCMPNE_scalar 1111 1110 0 . .. ... 1 ... 0 1111 1 1 0 0 .... @vcmp_scalar
99
+VCMPCS_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0 1 1 0 .... @vcmp_scalar
100
+VCMPHI_scalar 1111 1110 0 . .. ... 1 ... 0 1111 1 1 1 0 .... @vcmp_scalar
101
+VCMPGE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0 1 0 0 .... @vcmp_scalar
102
+VCMPLT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1 1 0 0 .... @vcmp_scalar
103
+VCMPGT_scalar 1111 1110 0 . .. ... 1 ... 1 1111 0 1 1 0 .... @vcmp_scalar
104
+VCMPLE_scalar 1111 1110 0 . .. ... 1 ... 1 1111 1 1 1 0 .... @vcmp_scalar
105
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
106
index XXXXXXX..XXXXXXX 100644
67
index XXXXXXX..XXXXXXX 100644
107
--- a/target/arm/mve_helper.c
68
--- a/target/arm/translate-sve.c
108
+++ b/target/arm/mve_helper.c
69
+++ b/target/arm/translate-sve.c
109
@@ -XXX,XX +XXX,XX @@ DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
70
@@ -XXX,XX +XXX,XX @@ static bool trans_PRF_rr(DisasContext *s, arg_PRF_rr *a)
110
mve_advance_vpt(env); \
111
}
112
113
-#define DO_VCMP_S(OP, FN) \
114
- DO_VCMP(OP##b, 1, int8_t, FN) \
115
- DO_VCMP(OP##h, 2, int16_t, FN) \
116
- DO_VCMP(OP##w, 4, int32_t, FN)
117
+#define DO_VCMP_SCALAR(OP, ESIZE, TYPE, FN) \
118
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
119
+ uint32_t rm) \
120
+ { \
121
+ TYPE *n = vn; \
122
+ uint16_t mask = mve_element_mask(env); \
123
+ uint16_t eci_mask = mve_eci_mask(env); \
124
+ uint16_t beatpred = 0; \
125
+ uint16_t emask = MAKE_64BIT_MASK(0, ESIZE); \
126
+ unsigned e; \
127
+ for (e = 0; e < 16 / ESIZE; e++) { \
128
+ bool r = FN(n[H##ESIZE(e)], (TYPE)rm); \
129
+ /* Comparison sets 0/1 bits for each byte in the element */ \
130
+ beatpred |= r * emask; \
131
+ emask <<= ESIZE; \
132
+ } \
133
+ beatpred &= mask; \
134
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | \
135
+ (beatpred & eci_mask); \
136
+ mve_advance_vpt(env); \
137
+ }
138
139
-#define DO_VCMP_U(OP, FN) \
140
- DO_VCMP(OP##b, 1, uint8_t, FN) \
141
- DO_VCMP(OP##h, 2, uint16_t, FN) \
142
- DO_VCMP(OP##w, 4, uint32_t, FN)
143
+#define DO_VCMP_S(OP, FN) \
144
+ DO_VCMP(OP##b, 1, int8_t, FN) \
145
+ DO_VCMP(OP##h, 2, int16_t, FN) \
146
+ DO_VCMP(OP##w, 4, int32_t, FN) \
147
+ DO_VCMP_SCALAR(OP##_scalarb, 1, int8_t, FN) \
148
+ DO_VCMP_SCALAR(OP##_scalarh, 2, int16_t, FN) \
149
+ DO_VCMP_SCALAR(OP##_scalarw, 4, int32_t, FN)
150
+
151
+#define DO_VCMP_U(OP, FN) \
152
+ DO_VCMP(OP##b, 1, uint8_t, FN) \
153
+ DO_VCMP(OP##h, 2, uint16_t, FN) \
154
+ DO_VCMP(OP##w, 4, uint32_t, FN) \
155
+ DO_VCMP_SCALAR(OP##_scalarb, 1, uint8_t, FN) \
156
+ DO_VCMP_SCALAR(OP##_scalarh, 2, uint16_t, FN) \
157
+ DO_VCMP_SCALAR(OP##_scalarw, 4, uint32_t, FN)
158
159
#define DO_EQ(N, M) ((N) == (M))
160
#define DO_NE(N, M) ((N) != (M))
161
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
162
index XXXXXXX..XXXXXXX 100644
163
--- a/target/arm/translate-mve.c
164
+++ b/target/arm/translate-mve.c
165
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
166
typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
167
typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
168
typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
169
+typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
170
171
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
172
static inline long mve_qreg_offset(unsigned reg)
173
@@ -XXX,XX +XXX,XX @@ static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
174
return true;
71
return true;
175
}
72
}
176
73
177
+static bool do_vcmp_scalar(DisasContext *s, arg_vcmp_scalar *a,
74
+static bool trans_PRF_ns(DisasContext *s, arg_PRF_ns *a)
178
+ MVEGenScalarCmpFn *fn)
179
+{
75
+{
180
+ TCGv_ptr qn;
76
+ if (!dc_isar_feature(aa64_sve, s)) {
181
+ TCGv_i32 rm;
182
+
183
+ if (!dc_isar_feature(aa32_mve, s) || !fn || a->rm == 13) {
184
+ return false;
77
+ return false;
185
+ }
78
+ }
186
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
79
+ /* Prefetch is a nop within QEMU. */
187
+ return true;
80
+ s->is_nonstreaming = true;
188
+ }
81
+ (void)sve_access_check(s);
189
+
190
+ qn = mve_qreg_ptr(a->qn);
191
+ if (a->rm == 15) {
192
+ /* Encoding Rm=0b1111 means "constant zero" */
193
+ rm = tcg_constant_i32(0);
194
+ } else {
195
+ rm = load_reg(s, a->rm);
196
+ }
197
+ fn(cpu_env, qn, rm);
198
+ tcg_temp_free_ptr(qn);
199
+ tcg_temp_free_i32(rm);
200
+ if (a->mask) {
201
+ /* VPT */
202
+ gen_vpst(s, a->mask);
203
+ }
204
+ mve_update_eci(s);
205
+ return true;
82
+ return true;
206
+}
83
+}
207
+
84
+
208
#define DO_VCMP(INSN, FN) \
85
/*
209
static bool trans_##INSN(DisasContext *s, arg_vcmp *a) \
86
* Move Prefix
210
{ \
87
*
211
@@ -XXX,XX +XXX,XX @@ static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
212
NULL, \
213
}; \
214
return do_vcmp(s, a, fns[a->size]); \
215
+ } \
216
+ static bool trans_##INSN##_scalar(DisasContext *s, \
217
+ arg_vcmp_scalar *a) \
218
+ { \
219
+ static MVEGenScalarCmpFn * const fns[] = { \
220
+ gen_helper_mve_##FN##_scalarb, \
221
+ gen_helper_mve_##FN##_scalarh, \
222
+ gen_helper_mve_##FN##_scalarw, \
223
+ NULL, \
224
+ }; \
225
+ return do_vcmp_scalar(s, a, fns[a->size]); \
226
}
227
228
DO_VCMP(VCMPEQ, vcmpeq)
229
--
88
--
230
2.20.1
89
2.25.1
231
232
1
From: Jan Luebbe <jlu@pengutronix.de>
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
Break events are currently only handled by chardev/char-serial.c, so we
3
Mark these as non-streaming instructions, which should trap
4
just ignore errors, which results in no behaviour change for other
4
if full a64 support is not enabled in streaming mode.
5
chardevs.
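
The new pl011 code notices a change of the break bit by XOR-ing the previous
and newly written LCR values. A tiny standalone illustration of that idiom
(register layout simplified; not the pl011 code itself):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define LCR_BRK 0x1u   /* simplified: break-enable kept in bit 0 */

    /* Illustration only: report a break state change on an LCR write. */
    static void lcr_write(uint32_t *lcr, uint32_t value)
    {
        if ((*lcr ^ value) & LCR_BRK) {
            bool break_enable = value & LCR_BRK;
            printf("break %s\n", break_enable ? "asserted" : "released");
        }
        *lcr = value;
    }

    int main(void)
    {
        uint32_t lcr = 0;
        lcr_write(&lcr, LCR_BRK);   /* break asserted */
        lcr_write(&lcr, LCR_BRK);   /* no change, nothing printed */
        lcr_write(&lcr, 0);         /* break released */
        return 0;
    }
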
6
5
7
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
8
Message-id: 20210806144700.3751979-1-jlu@pengutronix.de
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-14-richard.henderson@linaro.org
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
10
---
12
hw/char/pl011.c | 6 ++++++
11
target/arm/sme-fa64.decode | 2 --
13
1 file changed, 6 insertions(+)
12
target/arm/translate-sve.c | 2 ++
13
2 files changed, 2 insertions(+), 2 deletions(-)
14
14
15
diff --git a/hw/char/pl011.c b/hw/char/pl011.c
15
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
16
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
17
--- a/hw/char/pl011.c
17
--- a/target/arm/sme-fa64.decode
18
+++ b/hw/char/pl011.c
18
+++ b/target/arm/sme-fa64.decode
19
@@ -XXX,XX +XXX,XX @@
19
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
20
#include "hw/qdev-properties-system.h"
20
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
21
#include "migration/vmstate.h"
21
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
22
#include "chardev/char-fe.h"
22
23
+#include "chardev/char-serial.h"
23
-FAIL 1010 010- ---- ---- 011- ---- ---- ---- # SVE contiguous FF load (scalar+scalar)
24
#include "qemu/log.h"
24
-FAIL 1010 010- ---1 ---- 101- ---- ---- ---- # SVE contiguous NF load (scalar+imm)
25
#include "qemu/module.h"
25
FAIL 1010 010- -01- ---- 000- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+scalar)
26
#include "trace.h"
26
FAIL 1010 010- -010 ---- 001- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+imm)
27
@@ -XXX,XX +XXX,XX @@ static void pl011_write(void *opaque, hwaddr offset,
27
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
28
s->read_count = 0;
28
index XXXXXXX..XXXXXXX 100644
29
s->read_pos = 0;
29
--- a/target/arm/translate-sve.c
30
}
30
+++ b/target/arm/translate-sve.c
31
+ if ((s->lcr ^ value) & 0x1) {
31
@@ -XXX,XX +XXX,XX @@ static bool trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a)
32
+ int break_enable = value & 0x1;
32
if (!dc_isar_feature(aa64_sve, s)) {
33
+ qemu_chr_fe_ioctl(&s->chr, CHR_IOCTL_SERIAL_SET_BREAK,
33
return false;
34
+ &break_enable);
34
}
35
+ }
35
+ s->is_nonstreaming = true;
36
s->lcr = value;
36
if (sve_access_check(s)) {
37
pl011_set_read_trigger(s);
37
TCGv_i64 addr = new_tmp_a64(s);
38
break;
38
tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
39
@@ -XXX,XX +XXX,XX @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a)
40
if (!dc_isar_feature(aa64_sve, s)) {
41
return false;
42
}
43
+ s->is_nonstreaming = true;
44
if (sve_access_check(s)) {
45
int vsz = vec_full_reg_size(s);
46
int elements = vsz >> dtype_esz[a->dtype];
39
--
47
--
40
2.20.1
48
2.25.1
41
42
1
Implement the MVE VMLA insn, which multiplies a vector by a scalar
1
From: Richard Henderson <richard.henderson@linaro.org>
2
and accumulates into another vector.
3
2
3
Mark these as non-streaming instructions, which should trap
4
if full a64 support is not enabled in streaming mode.
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-15-richard.henderson@linaro.org
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
---
10
---
7
target/arm/helper-mve.h | 4 ++++
11
target/arm/sme-fa64.decode | 3 ---
8
target/arm/mve.decode | 1 +
12
target/arm/translate-sve.c | 2 ++
9
target/arm/mve_helper.c | 5 +++++
13
2 files changed, 2 insertions(+), 3 deletions(-)
10
target/arm/translate-mve.c | 1 +
11
4 files changed, 11 insertions(+)
12
14
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
14
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
17
--- a/target/arm/sme-fa64.decode
16
+++ b/target/arm/helper-mve.h
18
+++ b/target/arm/sme-fa64.decode
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqdmullb_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i3
19
@@ -XXX,XX +XXX,XX @@ FAIL 0001 1110 0111 1110 0000 00-- ---- ---- # FJCVTZS
18
DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
# --11 1100 --0- ---- ---- ---- ---- ---- # Load/store FP register (unscaled imm)
19
DEF_HELPER_FLAGS_4(mve_vqdmullt_scalarw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
20
22
# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
21
+DEF_HELPER_FLAGS_4(mve_vmlab, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
-
22
+DEF_HELPER_FLAGS_4(mve_vmlah, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
24
-FAIL 1010 010- -01- ---- 000- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+scalar)
23
+DEF_HELPER_FLAGS_4(mve_vmlaw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
25
-FAIL 1010 010- -010 ---- 001- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+imm)
24
+
26
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
25
DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
26
DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
28
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
29
index XXXXXXX..XXXXXXX 100644
27
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/mve.decode
28
--- a/target/arm/translate-sve.c
31
+++ b/target/arm/mve.decode
29
+++ b/target/arm/translate-sve.c
32
@@ -XXX,XX +XXX,XX @@ VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
30
@@ -XXX,XX +XXX,XX @@ static bool trans_LD1RO_zprr(DisasContext *s, arg_rprr_load *a)
33
VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
31
if (a->rm == 31) {
34
32
return false;
35
# The U bit (28) is don't-care because it does not affect the result
33
}
36
+VMLA 111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
34
+ s->is_nonstreaming = true;
37
VMLAS 111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
35
if (sve_access_check(s)) {
38
36
TCGv_i64 addr = new_tmp_a64(s);
39
# Vector add across vector
37
tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
40
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
38
@@ -XXX,XX +XXX,XX @@ static bool trans_LD1RO_zpri(DisasContext *s, arg_rpri_load *a)
41
index XXXXXXX..XXXXXXX 100644
39
if (!dc_isar_feature(aa64_sve_f64mm, s)) {
42
--- a/target/arm/mve_helper.c
40
return false;
43
+++ b/target/arm/mve_helper.c
41
}
44
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
42
+ s->is_nonstreaming = true;
45
DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
43
if (sve_access_check(s)) {
46
DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
44
TCGv_i64 addr = new_tmp_a64(s);
47
45
tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), a->imm * 32);
48
+/* Vector by scalar plus vector */
49
+#define DO_VMLA(D, N, M) ((N) * (M) + (D))
50
+
51
+DO_2OP_ACC_SCALAR_U(vmla, DO_VMLA)
52
+
53
/* Vector by vector plus scalar */
54
#define DO_VMLAS(D, N, M) ((N) * (D) + (M))
55
56
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
57
index XXXXXXX..XXXXXXX 100644
58
--- a/target/arm/translate-mve.c
59
+++ b/target/arm/translate-mve.c
60
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQSUB_U_scalar, vqsubu_scalar)
61
DO_2OP_SCALAR(VQDMULH_scalar, vqdmulh_scalar)
62
DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
63
DO_2OP_SCALAR(VBRSR, vbrsr)
64
+DO_2OP_SCALAR(VMLA, vmla)
65
DO_2OP_SCALAR(VMLAS, vmlas)
66
67
static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
68
--
46
--
69
2.20.1
47
2.25.1
70
71
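As a reference for the VMLA patch above, here is a standalone sketch (not QEMU code) of the per-lane arithmetic that DO_VMLA expands to, ignoring MVE predication and beat-wise execution; the 32-bit lane width shown is just one of the three element sizes.

#include <stdint.h>

/* Qda[i] = Qn[i] * Rm + Qda[i] for each 32-bit lane of a 16-byte Qreg. */
static void vmla_u32_ref(uint32_t qda[4], const uint32_t qn[4], uint32_t rm)
{
    for (int i = 0; i < 4; i++) {
        qda[i] = qn[i] * rm + qda[i];
    }
}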
1
Implement the MVE VMLADAV and VMLSLDAV insns. Like the VMLALDAV and
1
From: Richard Henderson <richard.henderson@linaro.org>
2
VMLSLDAV insns already implemented, these accumulate multiplied
3
vector elements; but they accumulate a 32-bit result rather than a
4
64-bit one.
5
2
6
Note that these encodings overlap with what would be RdaHi=0b111 for
3
These functions will be used to verify that the cpu
7
VMLALDAV, VMLSLDAV, VRMLALDAVH and VRMLSLDAVH.
4
is in the correct state for a given instruction.
8
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-16-richard.henderson@linaro.org
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
---
10
---
12
target/arm/helper-mve.h | 17 ++++++++++
11
target/arm/translate-a64.h | 21 +++++++++++++++++++++
13
target/arm/mve.decode | 33 +++++++++++++++++---
12
target/arm/translate-a64.c | 34 ++++++++++++++++++++++++++++++++++
14
target/arm/mve_helper.c | 41 ++++++++++++++++++++++++
13
2 files changed, 55 insertions(+)
15
target/arm/translate-mve.c | 64 ++++++++++++++++++++++++++++++++++++++
16
4 files changed, 150 insertions(+), 5 deletions(-)
17
14
18
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
19
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper-mve.h
17
--- a/target/arm/translate-a64.h
21
+++ b/target/arm/helper-mve.h
18
+++ b/target/arm/translate-a64.h
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrmlaldavhuw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
19
@@ -XXX,XX +XXX,XX @@ void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v);
23
DEF_HELPER_FLAGS_4(mve_vrmlsldavhsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
20
bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
24
DEF_HELPER_FLAGS_4(mve_vrmlsldavhxsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
21
unsigned int imms, unsigned int immr);
25
22
bool sve_access_check(DisasContext *s);
26
+DEF_HELPER_FLAGS_4(mve_vmladavsb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
23
+bool sme_enabled_check(DisasContext *s);
27
+DEF_HELPER_FLAGS_4(mve_vmladavsh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
24
+bool sme_enabled_check_with_svcr(DisasContext *s, unsigned);
28
+DEF_HELPER_FLAGS_4(mve_vmladavsw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
29
+DEF_HELPER_FLAGS_4(mve_vmladavub, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
30
+DEF_HELPER_FLAGS_4(mve_vmladavuh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
31
+DEF_HELPER_FLAGS_4(mve_vmladavuw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
32
+DEF_HELPER_FLAGS_4(mve_vmlsdavb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
33
+DEF_HELPER_FLAGS_4(mve_vmlsdavh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
34
+DEF_HELPER_FLAGS_4(mve_vmlsdavw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
35
+
25
+
36
+DEF_HELPER_FLAGS_4(mve_vmladavsxb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
26
+/* This function corresponds to CheckStreamingSVEEnabled. */
37
+DEF_HELPER_FLAGS_4(mve_vmladavsxh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
27
+static inline bool sme_sm_enabled_check(DisasContext *s)
38
+DEF_HELPER_FLAGS_4(mve_vmladavsxw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
39
+DEF_HELPER_FLAGS_4(mve_vmlsdavxb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
40
+DEF_HELPER_FLAGS_4(mve_vmlsdavxh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
41
+DEF_HELPER_FLAGS_4(mve_vmlsdavxw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
42
+
43
DEF_HELPER_FLAGS_3(mve_vaddvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
44
DEF_HELPER_FLAGS_3(mve_vaddvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
45
DEF_HELPER_FLAGS_3(mve_vaddvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
46
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
47
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/mve.decode
49
+++ b/target/arm/mve.decode
50
@@ -XXX,XX +XXX,XX @@ VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
51
%size_16 16:1 !function=plus_1
52
53
&vmlaldav rdahi rdalo size qn qm x a
54
+&vmladav rda size qn qm x a
55
56
@vmlaldav .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
57
qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
58
@vmlaldav_nosz .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
59
qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
60
-VMLALDAV_S 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
61
-VMLALDAV_U 1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
62
+@vmladav .... .... .... ... . ... x:1 .... . . a:1 . qm:3 . \
63
+ qn=%qn rda=%rdalo size=%size_16 &vmladav
64
+@vmladav_nosz .... .... .... ... . ... x:1 .... . . a:1 . qm:3 . \
65
+ qn=%qn rda=%rdalo size=0 &vmladav
66
67
-VMLSLDAV 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
68
+{
28
+{
69
+ VMLADAV_S 1110 1110 1111 ... . ... . 1110 . 0 . 0 ... 0 @vmladav
29
+ return sme_enabled_check_with_svcr(s, R_SVCR_SM_MASK);
70
+ VMLALDAV_S 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
71
+}
72
+{
73
+ VMLADAV_U 1111 1110 1111 ... . ... . 1110 . 0 . 0 ... 0 @vmladav
74
+ VMLALDAV_U 1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
75
+}
30
+}
76
+
31
+
32
+/* This function corresponds to CheckSMEAndZAEnabled. */
33
+static inline bool sme_za_enabled_check(DisasContext *s)
77
+{
34
+{
78
+ VMLSDAV 1110 1110 1111 ... . ... . 1110 . 0 . 0 ... 1 @vmladav
35
+ return sme_enabled_check_with_svcr(s, R_SVCR_ZA_MASK);
79
+ VMLSLDAV 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
80
+}
36
+}
81
+
37
+
38
+/* Note that this function corresponds to CheckStreamingSVEAndZAEnabled. */
39
+static inline bool sme_smza_enabled_check(DisasContext *s)
82
+{
40
+{
83
+ VMLSDAV 1111 1110 1111 ... 0 ... . 1110 . 0 . 0 ... 1 @vmladav_nosz
41
+ return sme_enabled_check_with_svcr(s, R_SVCR_SM_MASK | R_SVCR_ZA_MASK);
84
+ VRMLSLDAVH 1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
85
+}
42
+}
86
+
43
+
87
+VMLADAV_S 1110 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
44
TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr);
88
+VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 1 @vmladav_nosz
45
TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write,
89
46
bool tag_checked, int log2_size);
90
{
47
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
91
VMAXV_S 1110 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
48
index XXXXXXX..XXXXXXX 100644
92
VMINV_S 1110 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
49
--- a/target/arm/translate-a64.c
93
VMAXAV 1110 1110 1110 .. 00 .... 1111 0 0 . 0 ... 0 @vmaxv
50
+++ b/target/arm/translate-a64.c
94
VMINAV 1110 1110 1110 .. 00 .... 1111 1 0 . 0 ... 0 @vmaxv
51
@@ -XXX,XX +XXX,XX @@ static bool sme_access_check(DisasContext *s)
95
+ VMLADAV_S 1110 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
52
return true;
96
VRMLALDAVH_S 1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
97
}
53
}
98
54
99
{
55
+/* This function corresponds to CheckSMEEnabled. */
100
VMAXV_U 1111 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
56
+bool sme_enabled_check(DisasContext *s)
101
VMINV_U 1111 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
57
+{
102
+ VMLADAV_U 1111 1110 1111 ... 0 ... . 1111 . 0 . 0 ... 0 @vmladav_nosz
58
+ /*
103
VRMLALDAVH_U 1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
59
+ * Note that unlike sve_excp_el, we have not constrained sme_excp_el
104
}
60
+ * to be zero when fp_excp_el has priority. This is because we need
105
61
+ * sme_excp_el by itself for cpregs access checks.
106
-VRMLSLDAVH 1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
62
+ */
107
-
63
+ if (!s->fp_excp_el || s->sme_excp_el < s->fp_excp_el) {
108
# Scalar operations
64
+ s->fp_access_checked = true;
109
65
+ return sme_access_check(s);
110
VADD_scalar 1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
111
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
112
index XXXXXXX..XXXXXXX 100644
113
--- a/target/arm/mve_helper.c
114
+++ b/target/arm/mve_helper.c
115
@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavxsh, 2, int16_t, true, +=, -=)
116
DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
117
DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
118
119
+/*
120
+ * Multiply add dual accumulate ops
121
+ */
122
+#define DO_DAV(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC) \
123
+ uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
124
+ void *vm, uint32_t a) \
125
+ { \
126
+ uint16_t mask = mve_element_mask(env); \
127
+ unsigned e; \
128
+ TYPE *n = vn, *m = vm; \
129
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
130
+ if (mask & 1) { \
131
+ if (e & 1) { \
132
+ a ODDACC \
133
+ n[H##ESIZE(e - 1 * XCHG)] * m[H##ESIZE(e)]; \
134
+ } else { \
135
+ a EVENACC \
136
+ n[H##ESIZE(e + 1 * XCHG)] * m[H##ESIZE(e)]; \
137
+ } \
138
+ } \
139
+ } \
140
+ mve_advance_vpt(env); \
141
+ return a; \
142
+ }
66
+ }
67
+ return fp_access_check_only(s);
68
+}
143
+
69
+
144
+#define DO_DAV_S(INSN, XCHG, EVENACC, ODDACC) \
70
+/* Common subroutine for CheckSMEAnd*Enabled. */
145
+ DO_DAV(INSN##b, 1, int8_t, XCHG, EVENACC, ODDACC) \
71
+bool sme_enabled_check_with_svcr(DisasContext *s, unsigned req)
146
+ DO_DAV(INSN##h, 2, int16_t, XCHG, EVENACC, ODDACC) \
147
+ DO_DAV(INSN##w, 4, int32_t, XCHG, EVENACC, ODDACC)
148
+
149
+#define DO_DAV_U(INSN, XCHG, EVENACC, ODDACC) \
150
+ DO_DAV(INSN##b, 1, uint8_t, XCHG, EVENACC, ODDACC) \
151
+ DO_DAV(INSN##h, 2, uint16_t, XCHG, EVENACC, ODDACC) \
152
+ DO_DAV(INSN##w, 4, uint32_t, XCHG, EVENACC, ODDACC)
153
+
154
+DO_DAV_S(vmladavs, false, +=, +=)
155
+DO_DAV_U(vmladavu, false, +=, +=)
156
+DO_DAV_S(vmlsdav, false, +=, -=)
157
+DO_DAV_S(vmladavsx, true, +=, +=)
158
+DO_DAV_S(vmlsdavx, true, +=, -=)
159
+
160
/*
161
* Rounding multiply add long dual accumulate high. In the pseudocode
162
* this is implemented with a 72-bit internal accumulator value of which
163
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
164
index XXXXXXX..XXXXXXX 100644
165
--- a/target/arm/translate-mve.c
166
+++ b/target/arm/translate-mve.c
167
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TC
168
typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
169
typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
170
typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
171
+typedef void MVEGenDualAccOpFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
172
173
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
174
static inline long mve_qreg_offset(unsigned reg)
175
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
176
return do_long_dual_acc(s, a, fns[a->x]);
177
}
178
179
+static bool do_dual_acc(DisasContext *s, arg_vmladav *a, MVEGenDualAccOpFn *fn)
180
+{
72
+{
181
+ TCGv_ptr qn, qm;
73
+ if (!sme_enabled_check(s)) {
182
+ TCGv_i32 rda;
183
+
184
+ if (!dc_isar_feature(aa32_mve, s) ||
185
+ !mve_check_qreg_bank(s, a->qn) ||
186
+ !fn) {
187
+ return false;
74
+ return false;
188
+ }
75
+ }
189
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
76
+ if (FIELD_EX64(req, SVCR, SM) && !s->pstate_sm) {
190
+ return true;
77
+ gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
78
+ syn_smetrap(SME_ET_NotStreaming, false));
79
+ return false;
191
+ }
80
+ }
192
+
81
+ if (FIELD_EX64(req, SVCR, ZA) && !s->pstate_za) {
193
+ qn = mve_qreg_ptr(a->qn);
82
+ gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
194
+ qm = mve_qreg_ptr(a->qm);
83
+ syn_smetrap(SME_ET_InactiveZA, false));
195
+
84
+ return false;
196
+ /*
197
+ * This insn is subject to beat-wise execution. Partial execution
198
+ * of an A=0 (no-accumulate) insn which does not execute the first
199
+ * beat must start with the current rda value, not 0.
200
+ */
201
+ if (a->a || mve_skip_first_beat(s)) {
202
+ rda = load_reg(s, a->rda);
203
+ } else {
204
+ rda = tcg_const_i32(0);
205
+ }
85
+ }
206
+
207
+ fn(rda, cpu_env, qn, qm, rda);
208
+ store_reg(s, a->rda, rda);
209
+ tcg_temp_free_ptr(qn);
210
+ tcg_temp_free_ptr(qm);
211
+
212
+ mve_update_eci(s);
213
+ return true;
86
+ return true;
214
+}
87
+}
215
+
88
+
216
+#define DO_DUAL_ACC(INSN, FN) \
89
/*
217
+ static bool trans_##INSN(DisasContext *s, arg_vmladav *a) \
90
* This utility function is for doing register extension with an
218
+ { \
91
* optional shift. You will likely want to pass a temporary for the
219
+ static MVEGenDualAccOpFn * const fns[4][2] = { \
220
+ { gen_helper_mve_##FN##b, gen_helper_mve_##FN##xb }, \
221
+ { gen_helper_mve_##FN##h, gen_helper_mve_##FN##xh }, \
222
+ { gen_helper_mve_##FN##w, gen_helper_mve_##FN##xw }, \
223
+ { NULL, NULL }, \
224
+ }; \
225
+ return do_dual_acc(s, a, fns[a->size][a->x]); \
226
+ }
227
+
228
+DO_DUAL_ACC(VMLADAV_S, vmladavs)
229
+DO_DUAL_ACC(VMLSDAV, vmlsdav)
230
+
231
+static bool trans_VMLADAV_U(DisasContext *s, arg_vmladav *a)
232
+{
233
+ static MVEGenDualAccOpFn * const fns[4][2] = {
234
+ { gen_helper_mve_vmladavub, NULL },
235
+ { gen_helper_mve_vmladavuh, NULL },
236
+ { gen_helper_mve_vmladavuw, NULL },
237
+ { NULL, NULL },
238
+ };
239
+ return do_dual_acc(s, a, fns[a->size][a->x]);
240
+}
241
+
242
static void gen_vpst(DisasContext *s, uint32_t mask)
243
{
244
/*
245
--
92
--
246
2.20.1
93
2.25.1
247
248
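As a reference for the VMLADAV/VMLSDAV patch above, here is a standalone sketch (not QEMU code) of the arithmetic the DO_DAV macro performs for 16-bit elements: a 32-bit accumulate of lane products, with odd lanes optionally subtracting (VMLSDAV) and the first operand's lanes optionally exchanged within each pair (the X bit). Predication and beat-wise execution are ignored here.

#include <stdint.h>
#include <stdbool.h>

static uint32_t vmladav_s16_ref(const int16_t n[8], const int16_t m[8],
                                uint32_t acc, bool exchange, bool sub_odd)
{
    for (int e = 0; e < 8; e++) {
        int idx = exchange ? (e ^ 1) : e;        /* swap lanes within each pair */
        int32_t prod = (int32_t)n[idx] * m[e];
        if ((e & 1) && sub_odd) {
            acc -= prod;                         /* ODDACC for VMLSDAV */
        } else {
            acc += prod;                         /* EVENACC, and ODDACC for VMLADAV */
        }
    }
    return acc;
}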
1
Implement the MVE narrowing move insns VMOVN, VQMOVN and VQMOVUN.
1
From: Richard Henderson <richard.henderson@linaro.org>
2
These take a double-width input, narrow it (possibly saturating) and
3
store the result to either the top or bottom half of the output
4
element.
5
2
3
The pseudocode for CheckSVEEnabled gains a check for Streaming
4
SVE mode, and for SME present but SVE absent.
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-17-richard.henderson@linaro.org
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
---
10
---
9
target/arm/helper-mve.h | 20 ++++++++++
11
target/arm/translate-a64.c | 22 ++++++++++++++++------
10
target/arm/mve.decode | 12 ++++++
12
1 file changed, 16 insertions(+), 6 deletions(-)
11
target/arm/mve_helper.c | 78 ++++++++++++++++++++++++++++++++++++++
12
target/arm/translate-mve.c | 22 +++++++++++
13
4 files changed, 132 insertions(+)
14
13
15
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
16
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/helper-mve.h
16
--- a/target/arm/translate-a64.c
18
+++ b/target/arm/helper-mve.h
17
+++ b/target/arm/translate-a64.c
19
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
18
@@ -XXX,XX +XXX,XX @@ static bool fp_access_check(DisasContext *s)
20
DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
19
return true;
21
DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
20
}
22
21
23
+DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
-/* Check that SVE access is enabled. If it is, return true.
24
+DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
+/*
25
+DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
+ * Check that SVE access is enabled. If it is, return true.
26
+DEF_HELPER_FLAGS_3(mve_vmovnth, TCG_CALL_NO_WG, void, env, ptr, ptr)
25
* If not, emit code to generate an appropriate exception and return false.
26
+ * This function corresponds to CheckSVEEnabled().
27
*/
28
bool sve_access_check(DisasContext *s)
29
{
30
- if (s->sve_excp_el) {
31
- assert(!s->sve_access_checked);
32
- s->sve_access_checked = true;
33
-
34
+ if (s->pstate_sm || !dc_isar_feature(aa64_sve, s)) {
35
+ assert(dc_isar_feature(aa64_sme, s));
36
+ if (!sme_sm_enabled_check(s)) {
37
+ goto fail_exit;
38
+ }
39
+ } else if (s->sve_excp_el) {
40
gen_exception_insn_el(s, s->pc_curr, EXCP_UDEF,
41
syn_sve_access_trap(), s->sve_excp_el);
42
- return false;
43
+ goto fail_exit;
44
}
45
s->sve_access_checked = true;
46
return fp_access_check(s);
27
+
47
+
28
+DEF_HELPER_FLAGS_3(mve_vqmovunbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
48
+ fail_exit:
29
+DEF_HELPER_FLAGS_3(mve_vqmovunbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
49
+ /* Assert that we only raise one exception per instruction. */
30
+DEF_HELPER_FLAGS_3(mve_vqmovuntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
50
+ assert(!s->sve_access_checked);
31
+DEF_HELPER_FLAGS_3(mve_vqmovunth, TCG_CALL_NO_WG, void, env, ptr, ptr)
51
+ s->sve_access_checked = true;
32
+
52
+ return false;
33
+DEF_HELPER_FLAGS_3(mve_vqmovnbsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
34
+DEF_HELPER_FLAGS_3(mve_vqmovnbsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
35
+DEF_HELPER_FLAGS_3(mve_vqmovntsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
36
+DEF_HELPER_FLAGS_3(mve_vqmovntsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
37
+
38
+DEF_HELPER_FLAGS_3(mve_vqmovnbub, TCG_CALL_NO_WG, void, env, ptr, ptr)
39
+DEF_HELPER_FLAGS_3(mve_vqmovnbuh, TCG_CALL_NO_WG, void, env, ptr, ptr)
40
+DEF_HELPER_FLAGS_3(mve_vqmovntub, TCG_CALL_NO_WG, void, env, ptr, ptr)
41
+DEF_HELPER_FLAGS_3(mve_vqmovntuh, TCG_CALL_NO_WG, void, env, ptr, ptr)
42
+
43
DEF_HELPER_FLAGS_4(mve_vand, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
44
DEF_HELPER_FLAGS_4(mve_vbic, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
45
DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
46
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
47
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/mve.decode
49
+++ b/target/arm/mve.decode
50
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
51
VSHLL_BS 111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
52
VSHLL_BS 111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
53
54
+ VQMOVUNB 111 0 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
55
+ VQMOVN_BS 111 0 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
56
+
57
VMULH_S 111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
58
}
53
}
59
54
60
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
55
/*
61
VSHLL_BU 111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
62
VSHLL_BU 111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
63
64
+ VMOVNB 111 1 1110 0 . 11 .. 01 ... 0 1110 1 0 . 0 ... 1 @1op
65
+ VQMOVN_BU 111 1 1110 0 . 11 .. 11 ... 0 1110 0 0 . 0 ... 1 @1op
66
+
67
VMULH_U 111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
68
}
69
70
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
71
VSHLL_TS 111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
72
VSHLL_TS 111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
73
74
+ VQMOVUNT 111 0 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
75
+ VQMOVN_TS 111 0 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
76
+
77
VRMULH_S 111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
78
}
79
80
@@ -XXX,XX +XXX,XX @@ VMUL 1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
81
VSHLL_TU 111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
82
VSHLL_TU 111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
83
84
+ VMOVNT 111 1 1110 0 . 11 .. 01 ... 1 1110 1 0 . 0 ... 1 @1op
85
+ VQMOVN_TU 111 1 1110 0 . 11 .. 11 ... 1 1110 0 0 . 0 ... 1 @1op
86
+
87
VRMULH_U 111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
88
}
89
90
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
91
index XXXXXXX..XXXXXXX 100644
92
--- a/target/arm/mve_helper.c
93
+++ b/target/arm/mve_helper.c
94
@@ -XXX,XX +XXX,XX @@ DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
95
DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
96
DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
97
98
+#define DO_VMOVN(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE) \
99
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
100
+ { \
101
+ LTYPE *m = vm; \
102
+ TYPE *d = vd; \
103
+ uint16_t mask = mve_element_mask(env); \
104
+ unsigned le; \
105
+ mask >>= ESIZE * TOP; \
106
+ for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
107
+ mergemask(&d[H##ESIZE(le * 2 + TOP)], \
108
+ m[H##LESIZE(le)], mask); \
109
+ } \
110
+ mve_advance_vpt(env); \
111
+ }
112
+
113
+DO_VMOVN(vmovnbb, false, 1, uint8_t, 2, uint16_t)
114
+DO_VMOVN(vmovnbh, false, 2, uint16_t, 4, uint32_t)
115
+DO_VMOVN(vmovntb, true, 1, uint8_t, 2, uint16_t)
116
+DO_VMOVN(vmovnth, true, 2, uint16_t, 4, uint32_t)
117
+
118
+#define DO_VMOVN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN) \
119
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
120
+ { \
121
+ LTYPE *m = vm; \
122
+ TYPE *d = vd; \
123
+ uint16_t mask = mve_element_mask(env); \
124
+ bool qc = false; \
125
+ unsigned le; \
126
+ mask >>= ESIZE * TOP; \
127
+ for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
128
+ bool sat = false; \
129
+ TYPE r = FN(m[H##LESIZE(le)], &sat); \
130
+ mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask); \
131
+ qc |= sat & mask & 1; \
132
+ } \
133
+ if (qc) { \
134
+ env->vfp.qc[0] = qc; \
135
+ } \
136
+ mve_advance_vpt(env); \
137
+ }
138
+
139
+#define DO_VMOVN_SAT_UB(BOP, TOP, FN) \
140
+ DO_VMOVN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN) \
141
+ DO_VMOVN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
142
+
143
+#define DO_VMOVN_SAT_UH(BOP, TOP, FN) \
144
+ DO_VMOVN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN) \
145
+ DO_VMOVN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
146
+
147
+#define DO_VMOVN_SAT_SB(BOP, TOP, FN) \
148
+ DO_VMOVN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN) \
149
+ DO_VMOVN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
150
+
151
+#define DO_VMOVN_SAT_SH(BOP, TOP, FN) \
152
+ DO_VMOVN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN) \
153
+ DO_VMOVN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
154
+
155
+#define DO_VQMOVN_SB(N, SATP) \
156
+ do_sat_bhs((int64_t)(N), INT8_MIN, INT8_MAX, SATP)
157
+#define DO_VQMOVN_UB(N, SATP) \
158
+ do_sat_bhs((uint64_t)(N), 0, UINT8_MAX, SATP)
159
+#define DO_VQMOVUN_B(N, SATP) \
160
+ do_sat_bhs((int64_t)(N), 0, UINT8_MAX, SATP)
161
+
162
+#define DO_VQMOVN_SH(N, SATP) \
163
+ do_sat_bhs((int64_t)(N), INT16_MIN, INT16_MAX, SATP)
164
+#define DO_VQMOVN_UH(N, SATP) \
165
+ do_sat_bhs((uint64_t)(N), 0, UINT16_MAX, SATP)
166
+#define DO_VQMOVUN_H(N, SATP) \
167
+ do_sat_bhs((int64_t)(N), 0, UINT16_MAX, SATP)
168
+
169
+DO_VMOVN_SAT_SB(vqmovnbsb, vqmovntsb, DO_VQMOVN_SB)
170
+DO_VMOVN_SAT_SH(vqmovnbsh, vqmovntsh, DO_VQMOVN_SH)
171
+DO_VMOVN_SAT_UB(vqmovnbub, vqmovntub, DO_VQMOVN_UB)
172
+DO_VMOVN_SAT_UH(vqmovnbuh, vqmovntuh, DO_VQMOVN_UH)
173
+DO_VMOVN_SAT_SB(vqmovunbb, vqmovuntb, DO_VQMOVUN_B)
174
+DO_VMOVN_SAT_SH(vqmovunbh, vqmovunth, DO_VQMOVUN_H)
175
+
176
uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
177
uint32_t shift)
178
{
179
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
180
index XXXXXXX..XXXXXXX 100644
181
--- a/target/arm/translate-mve.c
182
+++ b/target/arm/translate-mve.c
183
@@ -XXX,XX +XXX,XX @@ DO_1OP(VCLS, vcls)
184
DO_1OP(VABS, vabs)
185
DO_1OP(VNEG, vneg)
186
187
+/* Narrowing moves: only size 0 and 1 are valid */
188
+#define DO_VMOVN(INSN, FN) \
189
+ static bool trans_##INSN(DisasContext *s, arg_1op *a) \
190
+ { \
191
+ static MVEGenOneOpFn * const fns[] = { \
192
+ gen_helper_mve_##FN##b, \
193
+ gen_helper_mve_##FN##h, \
194
+ NULL, \
195
+ NULL, \
196
+ }; \
197
+ return do_1op(s, a, fns[a->size]); \
198
+ }
199
+
200
+DO_VMOVN(VMOVNB, vmovnb)
201
+DO_VMOVN(VMOVNT, vmovnt)
202
+DO_VMOVN(VQMOVUNB, vqmovunb)
203
+DO_VMOVN(VQMOVUNT, vqmovunt)
204
+DO_VMOVN(VQMOVN_BS, vqmovnbs)
205
+DO_VMOVN(VQMOVN_TS, vqmovnts)
206
+DO_VMOVN(VQMOVN_BU, vqmovnbu)
207
+DO_VMOVN(VQMOVN_TU, vqmovntu)
208
+
209
static bool trans_VREV16(DisasContext *s, arg_1op *a)
210
{
211
static MVEGenOneOpFn * const fns[] = {
212
--
56
--
213
2.20.1
57
2.25.1
214
215
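As a reference for the narrowing-move patch above, here is a standalone sketch (not QEMU code) of one saturating narrow, corresponding to DO_VQMOVN_SH: a 32-bit source element is clamped to the signed 16-bit range and the saturation flag recorded. The real helpers then write the result to the top or bottom half of each double-width destination element and fold the flag into the QC bit.

#include <stdint.h>
#include <stdbool.h>

static int16_t vqmovn_s32_ref(int32_t x, bool *satp)
{
    if (x > INT16_MAX) {
        *satp = true;
        return INT16_MAX;
    }
    if (x < INT16_MIN) {
        *satp = true;
        return INT16_MIN;
    }
    return (int16_t)x;
}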
1
Implement the MVE interleaving load/store insns VLD2, VLD4, VST2
1
From: Richard Henderson <richard.henderson@linaro.org>
2
and VST4. VLD2 loads 16 bytes of data from memory and writes to 2
3
consecutive Qregs; VLD4 loads 16 bytes of data from memory and writes
4
to 4 consecutive Qregs. The 'pattern' field in the encoding
5
determines the offset into memory which is accessed and also which
6
elements in the Qregs are written to. (The intention is that a
7
sequence of four consecutive VLD4 with different pattern values
8
performs a complete de-interleaving load of 64 bytes into all
9
elements of the 4 Qregs.) VST2 and VST4 do the same, but for stores.
10
2
3
These SME instructions are nominally within the SVE decode space,
4
so we add them to sve.decode and translate-sve.c.
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-18-richard.henderson@linaro.org
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
13
---
10
---
14
target/arm/helper-mve.h | 48 ++++++
11
target/arm/translate-a64.h | 12 ++++++++++++
15
target/arm/mve.decode | 11 ++
12
target/arm/sve.decode | 5 ++++-
16
target/arm/mve_helper.c | 342 +++++++++++++++++++++++++++++++++++++
13
target/arm/translate-sve.c | 38 ++++++++++++++++++++++++++++++++++++++
17
target/arm/translate-mve.c | 94 ++++++++++
14
3 files changed, 54 insertions(+), 1 deletion(-)
18
4 files changed, 495 insertions(+)
19
15
20
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
16
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
21
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
22
--- a/target/arm/helper-mve.h
18
--- a/target/arm/translate-a64.h
23
+++ b/target/arm/helper-mve.h
19
+++ b/target/arm/translate-a64.h
24
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vldrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
@@ -XXX,XX +XXX,XX @@ static inline int vec_full_reg_size(DisasContext *s)
25
DEF_HELPER_FLAGS_4(mve_vstrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
return s->vl;
26
DEF_HELPER_FLAGS_4(mve_vstrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
}
27
23
28
+DEF_HELPER_FLAGS_3(mve_vld20b, TCG_CALL_NO_WG, void, env, i32, i32)
24
+/* Return the byte size of the vector register, SVL / 8. */
29
+DEF_HELPER_FLAGS_3(mve_vld20h, TCG_CALL_NO_WG, void, env, i32, i32)
25
+static inline int streaming_vec_reg_size(DisasContext *s)
30
+DEF_HELPER_FLAGS_3(mve_vld20w, TCG_CALL_NO_WG, void, env, i32, i32)
26
+{
31
+
27
+ return s->svl;
32
+DEF_HELPER_FLAGS_3(mve_vld21b, TCG_CALL_NO_WG, void, env, i32, i32)
28
+}
33
+DEF_HELPER_FLAGS_3(mve_vld21h, TCG_CALL_NO_WG, void, env, i32, i32)
34
+DEF_HELPER_FLAGS_3(mve_vld21w, TCG_CALL_NO_WG, void, env, i32, i32)
35
+
36
+DEF_HELPER_FLAGS_3(mve_vld40b, TCG_CALL_NO_WG, void, env, i32, i32)
37
+DEF_HELPER_FLAGS_3(mve_vld40h, TCG_CALL_NO_WG, void, env, i32, i32)
38
+DEF_HELPER_FLAGS_3(mve_vld40w, TCG_CALL_NO_WG, void, env, i32, i32)
39
+
40
+DEF_HELPER_FLAGS_3(mve_vld41b, TCG_CALL_NO_WG, void, env, i32, i32)
41
+DEF_HELPER_FLAGS_3(mve_vld41h, TCG_CALL_NO_WG, void, env, i32, i32)
42
+DEF_HELPER_FLAGS_3(mve_vld41w, TCG_CALL_NO_WG, void, env, i32, i32)
43
+
44
+DEF_HELPER_FLAGS_3(mve_vld42b, TCG_CALL_NO_WG, void, env, i32, i32)
45
+DEF_HELPER_FLAGS_3(mve_vld42h, TCG_CALL_NO_WG, void, env, i32, i32)
46
+DEF_HELPER_FLAGS_3(mve_vld42w, TCG_CALL_NO_WG, void, env, i32, i32)
47
+
48
+DEF_HELPER_FLAGS_3(mve_vld43b, TCG_CALL_NO_WG, void, env, i32, i32)
49
+DEF_HELPER_FLAGS_3(mve_vld43h, TCG_CALL_NO_WG, void, env, i32, i32)
50
+DEF_HELPER_FLAGS_3(mve_vld43w, TCG_CALL_NO_WG, void, env, i32, i32)
51
+
52
+DEF_HELPER_FLAGS_3(mve_vst20b, TCG_CALL_NO_WG, void, env, i32, i32)
53
+DEF_HELPER_FLAGS_3(mve_vst20h, TCG_CALL_NO_WG, void, env, i32, i32)
54
+DEF_HELPER_FLAGS_3(mve_vst20w, TCG_CALL_NO_WG, void, env, i32, i32)
55
+
56
+DEF_HELPER_FLAGS_3(mve_vst21b, TCG_CALL_NO_WG, void, env, i32, i32)
57
+DEF_HELPER_FLAGS_3(mve_vst21h, TCG_CALL_NO_WG, void, env, i32, i32)
58
+DEF_HELPER_FLAGS_3(mve_vst21w, TCG_CALL_NO_WG, void, env, i32, i32)
59
+
60
+DEF_HELPER_FLAGS_3(mve_vst40b, TCG_CALL_NO_WG, void, env, i32, i32)
61
+DEF_HELPER_FLAGS_3(mve_vst40h, TCG_CALL_NO_WG, void, env, i32, i32)
62
+DEF_HELPER_FLAGS_3(mve_vst40w, TCG_CALL_NO_WG, void, env, i32, i32)
63
+
64
+DEF_HELPER_FLAGS_3(mve_vst41b, TCG_CALL_NO_WG, void, env, i32, i32)
65
+DEF_HELPER_FLAGS_3(mve_vst41h, TCG_CALL_NO_WG, void, env, i32, i32)
66
+DEF_HELPER_FLAGS_3(mve_vst41w, TCG_CALL_NO_WG, void, env, i32, i32)
67
+
68
+DEF_HELPER_FLAGS_3(mve_vst42b, TCG_CALL_NO_WG, void, env, i32, i32)
69
+DEF_HELPER_FLAGS_3(mve_vst42h, TCG_CALL_NO_WG, void, env, i32, i32)
70
+DEF_HELPER_FLAGS_3(mve_vst42w, TCG_CALL_NO_WG, void, env, i32, i32)
71
+
72
+DEF_HELPER_FLAGS_3(mve_vst43b, TCG_CALL_NO_WG, void, env, i32, i32)
73
+DEF_HELPER_FLAGS_3(mve_vst43h, TCG_CALL_NO_WG, void, env, i32, i32)
74
+DEF_HELPER_FLAGS_3(mve_vst43w, TCG_CALL_NO_WG, void, env, i32, i32)
75
+
76
DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
77
78
DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
79
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
80
index XXXXXXX..XXXXXXX 100644
81
--- a/target/arm/mve.decode
82
+++ b/target/arm/mve.decode
83
@@ -XXX,XX +XXX,XX @@
84
&vabav qn qm rda size
85
&vldst_sg qd qm rn size msize os
86
&vldst_sg_imm qd qm a w imm
87
+&vldst_il qd rn size pat w
88
89
# scatter-gather memory size is in bits 6:4
90
%sg_msize 6:1 4:1
91
@@ -XXX,XX +XXX,XX @@
92
@vldst_sg_imm .... .... a:1 . w:1 . .... .... .... . imm:7 &vldst_sg_imm \
93
qd=%qd qm=%qn
94
95
+# Deinterleaving load/interleaving store
96
+@vldst_il .... .... .. w:1 . rn:4 .... ... size:2 pat:2 ..... &vldst_il \
97
+ qd=%qd
98
+
99
@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
100
@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
101
@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
102
@@ -XXX,XX +XXX,XX @@ VLDRD_sg_imm 111 1 1101 ... 1 ... 0 ... 1 1111 .... .... @vldst_sg_imm
103
VSTRW_sg_imm 111 1 1101 ... 0 ... 0 ... 1 1110 .... .... @vldst_sg_imm
104
VSTRD_sg_imm 111 1 1101 ... 0 ... 0 ... 1 1111 .... .... @vldst_sg_imm
105
106
+# deinterleaving loads/interleaving stores
107
+VLD2 1111 1100 1 .. 1 .... ... 1 111 .. .. 00000 @vldst_il
108
+VLD4 1111 1100 1 .. 1 .... ... 1 111 .. .. 00001 @vldst_il
109
+VST2 1111 1100 1 .. 0 .... ... 1 111 .. .. 00000 @vldst_il
110
+VST4 1111 1100 1 .. 0 .... ... 1 111 .. .. 00001 @vldst_il
111
+
112
# Moves between 2 32-bit vector lanes and 2 general purpose registers
113
VMOV_to_2gp 1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
114
VMOV_from_2gp 1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
115
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
116
index XXXXXXX..XXXXXXX 100644
117
--- a/target/arm/mve_helper.c
118
+++ b/target/arm/mve_helper.c
119
@@ -XXX,XX +XXX,XX @@ DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
120
DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
121
DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
122
123
+/*
124
+ * Deinterleaving loads/interleaving stores.
125
+ *
126
+ * For these helpers we are passed the index of the first Qreg
127
+ * (VLD2/VST2 will also access Qn+1, VLD4/VST4 access Qn .. Qn+3)
128
+ * and the value of the base address register Rn.
129
+ * The helpers are specialized for pattern and element size, so
130
+ * for instance vld42h is VLD4 with pattern 2, element size MO_16.
131
+ *
132
+ * These insns are beatwise but not predicated, so we must honour ECI,
133
+ * but need not look at mve_element_mask().
134
+ *
135
+ * The pseudocode implements these insns with multiple memory accesses
136
+ * of the element size, but rules R_VVVG and R_FXDM permit us to make
137
+ * one 32-bit memory access per beat.
138
+ */
139
+#define DO_VLD4B(OP, O1, O2, O3, O4) \
140
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
141
+ uint32_t base) \
142
+ { \
143
+ int beat, e; \
144
+ uint16_t mask = mve_eci_mask(env); \
145
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
146
+ uint32_t addr, data; \
147
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
148
+ if ((mask & 1) == 0) { \
149
+ /* ECI says skip this beat */ \
150
+ continue; \
151
+ } \
152
+ addr = base + off[beat] * 4; \
153
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
154
+ for (e = 0; e < 4; e++, data >>= 8) { \
155
+ uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
156
+ qd[H1(off[beat])] = data; \
157
+ } \
158
+ } \
159
+ }
160
+
161
+#define DO_VLD4H(OP, O1, O2) \
162
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
163
+ uint32_t base) \
164
+ { \
165
+ int beat; \
166
+ uint16_t mask = mve_eci_mask(env); \
167
+ static const uint8_t off[4] = { O1, O1, O2, O2 }; \
168
+ uint32_t addr, data; \
169
+ int y; /* y counts 0 2 0 2 */ \
170
+ uint16_t *qd; \
171
+ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \
172
+ if ((mask & 1) == 0) { \
173
+ /* ECI says skip this beat */ \
174
+ continue; \
175
+ } \
176
+ addr = base + off[beat] * 8 + (beat & 1) * 4; \
177
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
178
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y); \
179
+ qd[H2(off[beat])] = data; \
180
+ data >>= 16; \
181
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1); \
182
+ qd[H2(off[beat])] = data; \
183
+ } \
184
+ }
185
+
186
+#define DO_VLD4W(OP, O1, O2, O3, O4) \
187
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
188
+ uint32_t base) \
189
+ { \
190
+ int beat; \
191
+ uint16_t mask = mve_eci_mask(env); \
192
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
193
+ uint32_t addr, data; \
194
+ uint32_t *qd; \
195
+ int y; \
196
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
197
+ if ((mask & 1) == 0) { \
198
+ /* ECI says skip this beat */ \
199
+ continue; \
200
+ } \
201
+ addr = base + off[beat] * 4; \
202
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
203
+ y = (beat + (O1 & 2)) & 3; \
204
+ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \
205
+ qd[H4(off[beat] >> 2)] = data; \
206
+ } \
207
+ }
208
+
209
+DO_VLD4B(vld40b, 0, 1, 10, 11)
210
+DO_VLD4B(vld41b, 2, 3, 12, 13)
211
+DO_VLD4B(vld42b, 4, 5, 14, 15)
212
+DO_VLD4B(vld43b, 6, 7, 8, 9)
213
+
214
+DO_VLD4H(vld40h, 0, 5)
215
+DO_VLD4H(vld41h, 1, 6)
216
+DO_VLD4H(vld42h, 2, 7)
217
+DO_VLD4H(vld43h, 3, 4)
218
+
219
+DO_VLD4W(vld40w, 0, 1, 10, 11)
220
+DO_VLD4W(vld41w, 2, 3, 12, 13)
221
+DO_VLD4W(vld42w, 4, 5, 14, 15)
222
+DO_VLD4W(vld43w, 6, 7, 8, 9)
223
+
224
+#define DO_VLD2B(OP, O1, O2, O3, O4) \
225
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
226
+ uint32_t base) \
227
+ { \
228
+ int beat, e; \
229
+ uint16_t mask = mve_eci_mask(env); \
230
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
231
+ uint32_t addr, data; \
232
+ uint8_t *qd; \
233
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
234
+ if ((mask & 1) == 0) { \
235
+ /* ECI says skip this beat */ \
236
+ continue; \
237
+ } \
238
+ addr = base + off[beat] * 2; \
239
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
240
+ for (e = 0; e < 4; e++, data >>= 8) { \
241
+ qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \
242
+ qd[H1(off[beat] + (e >> 1))] = data; \
243
+ } \
244
+ } \
245
+ }
246
+
247
+#define DO_VLD2H(OP, O1, O2, O3, O4) \
248
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
249
+ uint32_t base) \
250
+ { \
251
+ int beat; \
252
+ uint16_t mask = mve_eci_mask(env); \
253
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
254
+ uint32_t addr, data; \
255
+ int e; \
256
+ uint16_t *qd; \
257
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
258
+ if ((mask & 1) == 0) { \
259
+ /* ECI says skip this beat */ \
260
+ continue; \
261
+ } \
262
+ addr = base + off[beat] * 4; \
263
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
264
+ for (e = 0; e < 2; e++, data >>= 16) { \
265
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \
266
+ qd[H2(off[beat])] = data; \
267
+ } \
268
+ } \
269
+ }
270
+
271
+#define DO_VLD2W(OP, O1, O2, O3, O4) \
272
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
273
+ uint32_t base) \
274
+ { \
275
+ int beat; \
276
+ uint16_t mask = mve_eci_mask(env); \
277
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
278
+ uint32_t addr, data; \
279
+ uint32_t *qd; \
280
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
281
+ if ((mask & 1) == 0) { \
282
+ /* ECI says skip this beat */ \
283
+ continue; \
284
+ } \
285
+ addr = base + off[beat]; \
286
+ data = cpu_ldl_le_data_ra(env, addr, GETPC()); \
287
+ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \
288
+ qd[H4(off[beat] >> 3)] = data; \
289
+ } \
290
+ }
291
+
292
+DO_VLD2B(vld20b, 0, 2, 12, 14)
293
+DO_VLD2B(vld21b, 4, 6, 8, 10)
294
+
295
+DO_VLD2H(vld20h, 0, 1, 6, 7)
296
+DO_VLD2H(vld21h, 2, 3, 4, 5)
297
+
298
+DO_VLD2W(vld20w, 0, 4, 24, 28)
299
+DO_VLD2W(vld21w, 8, 12, 16, 20)
300
+
301
+#define DO_VST4B(OP, O1, O2, O3, O4) \
302
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
303
+ uint32_t base) \
304
+ { \
305
+ int beat, e; \
306
+ uint16_t mask = mve_eci_mask(env); \
307
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
308
+ uint32_t addr, data; \
309
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
310
+ if ((mask & 1) == 0) { \
311
+ /* ECI says skip this beat */ \
312
+ continue; \
313
+ } \
314
+ addr = base + off[beat] * 4; \
315
+ data = 0; \
316
+ for (e = 3; e >= 0; e--) { \
317
+ uint8_t *qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + e); \
318
+ data = (data << 8) | qd[H1(off[beat])]; \
319
+ } \
320
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
321
+ } \
322
+ }
323
+
324
+#define DO_VST4H(OP, O1, O2) \
325
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
326
+ uint32_t base) \
327
+ { \
328
+ int beat; \
329
+ uint16_t mask = mve_eci_mask(env); \
330
+ static const uint8_t off[4] = { O1, O1, O2, O2 }; \
331
+ uint32_t addr, data; \
332
+ int y; /* y counts 0 2 0 2 */ \
333
+ uint16_t *qd; \
334
+ for (beat = 0, y = 0; beat < 4; beat++, mask >>= 4, y ^= 2) { \
335
+ if ((mask & 1) == 0) { \
336
+ /* ECI says skip this beat */ \
337
+ continue; \
338
+ } \
339
+ addr = base + off[beat] * 8 + (beat & 1) * 4; \
340
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y); \
341
+ data = qd[H2(off[beat])]; \
342
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + y + 1); \
343
+ data |= qd[H2(off[beat])] << 16; \
344
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
345
+ } \
346
+ }
347
+
348
+#define DO_VST4W(OP, O1, O2, O3, O4) \
349
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
350
+ uint32_t base) \
351
+ { \
352
+ int beat; \
353
+ uint16_t mask = mve_eci_mask(env); \
354
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
355
+ uint32_t addr, data; \
356
+ uint32_t *qd; \
357
+ int y; \
358
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
359
+ if ((mask & 1) == 0) { \
360
+ /* ECI says skip this beat */ \
361
+ continue; \
362
+ } \
363
+ addr = base + off[beat] * 4; \
364
+ y = (beat + (O1 & 2)) & 3; \
365
+ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + y); \
366
+ data = qd[H4(off[beat] >> 2)]; \
367
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
368
+ } \
369
+ }
370
+
371
+DO_VST4B(vst40b, 0, 1, 10, 11)
372
+DO_VST4B(vst41b, 2, 3, 12, 13)
373
+DO_VST4B(vst42b, 4, 5, 14, 15)
374
+DO_VST4B(vst43b, 6, 7, 8, 9)
375
+
376
+DO_VST4H(vst40h, 0, 5)
377
+DO_VST4H(vst41h, 1, 6)
378
+DO_VST4H(vst42h, 2, 7)
379
+DO_VST4H(vst43h, 3, 4)
380
+
381
+DO_VST4W(vst40w, 0, 1, 10, 11)
382
+DO_VST4W(vst41w, 2, 3, 12, 13)
383
+DO_VST4W(vst42w, 4, 5, 14, 15)
384
+DO_VST4W(vst43w, 6, 7, 8, 9)
385
+
386
+#define DO_VST2B(OP, O1, O2, O3, O4) \
387
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
388
+ uint32_t base) \
389
+ { \
390
+ int beat, e; \
391
+ uint16_t mask = mve_eci_mask(env); \
392
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
393
+ uint32_t addr, data; \
394
+ uint8_t *qd; \
395
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
396
+ if ((mask & 1) == 0) { \
397
+ /* ECI says skip this beat */ \
398
+ continue; \
399
+ } \
400
+ addr = base + off[beat] * 2; \
401
+ data = 0; \
402
+ for (e = 3; e >= 0; e--) { \
403
+ qd = (uint8_t *)aa32_vfp_qreg(env, qnidx + (e & 1)); \
404
+ data = (data << 8) | qd[H1(off[beat] + (e >> 1))]; \
405
+ } \
406
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
407
+ } \
408
+ }
409
+
410
+#define DO_VST2H(OP, O1, O2, O3, O4) \
411
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
412
+ uint32_t base) \
413
+ { \
414
+ int beat; \
415
+ uint16_t mask = mve_eci_mask(env); \
416
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
417
+ uint32_t addr, data; \
418
+ int e; \
419
+ uint16_t *qd; \
420
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
421
+ if ((mask & 1) == 0) { \
422
+ /* ECI says skip this beat */ \
423
+ continue; \
424
+ } \
425
+ addr = base + off[beat] * 4; \
426
+ data = 0; \
427
+ for (e = 1; e >= 0; e--) { \
428
+ qd = (uint16_t *)aa32_vfp_qreg(env, qnidx + e); \
429
+ data = (data << 16) | qd[H2(off[beat])]; \
430
+ } \
431
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
432
+ } \
433
+ }
434
+
435
+#define DO_VST2W(OP, O1, O2, O3, O4) \
436
+ void HELPER(mve_##OP)(CPUARMState *env, uint32_t qnidx, \
437
+ uint32_t base) \
438
+ { \
439
+ int beat; \
440
+ uint16_t mask = mve_eci_mask(env); \
441
+ static const uint8_t off[4] = { O1, O2, O3, O4 }; \
442
+ uint32_t addr, data; \
443
+ uint32_t *qd; \
444
+ for (beat = 0; beat < 4; beat++, mask >>= 4) { \
445
+ if ((mask & 1) == 0) { \
446
+ /* ECI says skip this beat */ \
447
+ continue; \
448
+ } \
449
+ addr = base + off[beat]; \
450
+ qd = (uint32_t *)aa32_vfp_qreg(env, qnidx + (beat & 1)); \
451
+ data = qd[H4(off[beat] >> 3)]; \
452
+ cpu_stl_le_data_ra(env, addr, data, GETPC()); \
453
+ } \
454
+ }
455
+
456
+DO_VST2B(vst20b, 0, 2, 12, 14)
457
+DO_VST2B(vst21b, 4, 6, 8, 10)
458
+
459
+DO_VST2H(vst20h, 0, 1, 6, 7)
460
+DO_VST2H(vst21h, 2, 3, 4, 5)
461
+
462
+DO_VST2W(vst20w, 0, 4, 24, 28)
463
+DO_VST2W(vst21w, 8, 12, 16, 20)
464
+
29
+
465
/*
30
/*
466
* The mergemask(D, R, M) macro performs the operation "*D = R" but
31
* Return the offset info CPUARMState of the predicate vector register Pn.
467
* storing only the bytes which correspond to 1 bits in M,
32
* Note for this purpose, FFR is P16.
468
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
33
@@ -XXX,XX +XXX,XX @@ static inline int pred_full_reg_size(DisasContext *s)
34
return s->vl >> 3;
35
}
36
37
+/* Return the byte size of the predicate register, SVL / 64. */
38
+static inline int streaming_pred_reg_size(DisasContext *s)
39
+{
40
+ return s->svl >> 3;
41
+}
42
+
43
/*
44
* Round up the size of a register to a size allowed by
45
* the tcg vector infrastructure. Any operation which uses this
46
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
469
index XXXXXXX..XXXXXXX 100644
47
index XXXXXXX..XXXXXXX 100644
470
--- a/target/arm/translate-mve.c
48
--- a/target/arm/sve.decode
471
+++ b/target/arm/translate-mve.c
49
+++ b/target/arm/sve.decode
472
@@ -XXX,XX +XXX,XX @@ static inline int vidup_imm(DisasContext *s, int x)
50
@@ -XXX,XX +XXX,XX @@ INDEX_ri 00000100 esz:2 1 imm:s5 010001 rn:5 rd:5
473
51
# SVE index generation (register start, register increment)
474
typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
52
INDEX_rr 00000100 .. 1 ..... 010011 ..... ..... @rd_rn_rm
475
typedef void MVEGenLdStSGFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
53
476
+typedef void MVEGenLdStIlFn(TCGv_ptr, TCGv_i32, TCGv_i32);
54
-### SVE Stack Allocation Group
477
typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
55
+### SVE / Streaming SVE Stack Allocation Group
478
typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
56
479
typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
57
# SVE stack frame adjustment
480
@@ -XXX,XX +XXX,XX @@ static bool trans_VSTRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
58
ADDVL 00000100 001 ..... 01010 ...... ..... @rd_rn_i6
481
return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
59
+ADDSVL 00000100 001 ..... 01011 ...... ..... @rd_rn_i6
60
ADDPL 00000100 011 ..... 01010 ...... ..... @rd_rn_i6
61
+ADDSPL 00000100 011 ..... 01011 ...... ..... @rd_rn_i6
62
63
# SVE stack frame size
64
RDVL 00000100 101 11111 01010 imm:s6 rd:5
65
+RDSVL 00000100 101 11111 01011 imm:s6 rd:5
66
67
### SVE Bitwise Shift - Unpredicated Group
68
69
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
70
index XXXXXXX..XXXXXXX 100644
71
--- a/target/arm/translate-sve.c
72
+++ b/target/arm/translate-sve.c
73
@@ -XXX,XX +XXX,XX @@ static bool trans_ADDVL(DisasContext *s, arg_ADDVL *a)
74
return true;
482
}
75
}
483
76
484
+static bool do_vldst_il(DisasContext *s, arg_vldst_il *a, MVEGenLdStIlFn *fn,
77
+static bool trans_ADDSVL(DisasContext *s, arg_ADDSVL *a)
485
+ int addrinc)
486
+{
78
+{
487
+ TCGv_i32 rn;
79
+ if (!dc_isar_feature(aa64_sme, s)) {
488
+
489
+ if (!dc_isar_feature(aa32_mve, s) ||
490
+ !mve_check_qreg_bank(s, a->qd) ||
491
+ !fn || (a->rn == 13 && a->w) || a->rn == 15) {
492
+ /* Variously UNPREDICTABLE or UNDEF or related-encoding */
493
+ return false;
80
+ return false;
494
+ }
81
+ }
495
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
82
+ if (sme_enabled_check(s)) {
496
+ return true;
83
+ TCGv_i64 rd = cpu_reg_sp(s, a->rd);
84
+ TCGv_i64 rn = cpu_reg_sp(s, a->rn);
85
+ tcg_gen_addi_i64(rd, rn, a->imm * streaming_vec_reg_size(s));
497
+ }
86
+ }
498
+
499
+ rn = load_reg(s, a->rn);
500
+ /*
501
+ * We pass the index of Qd, not a pointer, because the helper must
502
+ * access multiple Q registers starting at Qd and working up.
503
+ */
504
+ fn(cpu_env, tcg_constant_i32(a->qd), rn);
505
+
506
+ if (a->w) {
507
+ tcg_gen_addi_i32(rn, rn, addrinc);
508
+ store_reg(s, a->rn, rn);
509
+ } else {
510
+ tcg_temp_free_i32(rn);
511
+ }
512
+ mve_update_and_store_eci(s);
513
+ return true;
87
+ return true;
514
+}
88
+}
515
+
89
+
516
+/* This macro is just to make the arrays more compact in these functions */
90
static bool trans_ADDPL(DisasContext *s, arg_ADDPL *a)
517
+#define F(N) gen_helper_mve_##N
91
{
518
+
92
if (!dc_isar_feature(aa64_sve, s)) {
519
+static bool trans_VLD2(DisasContext *s, arg_vldst_il *a)
93
@@ -XXX,XX +XXX,XX @@ static bool trans_ADDPL(DisasContext *s, arg_ADDPL *a)
94
return true;
95
}
96
97
+static bool trans_ADDSPL(DisasContext *s, arg_ADDSPL *a)
520
+{
98
+{
521
+ static MVEGenLdStIlFn * const fns[4][4] = {
99
+ if (!dc_isar_feature(aa64_sme, s)) {
522
+ { F(vld20b), F(vld20h), F(vld20w), NULL, },
523
+ { F(vld21b), F(vld21h), F(vld21w), NULL, },
524
+ { NULL, NULL, NULL, NULL },
525
+ { NULL, NULL, NULL, NULL },
526
+ };
527
+ if (a->qd > 6) {
528
+ return false;
100
+ return false;
529
+ }
101
+ }
530
+ return do_vldst_il(s, a, fns[a->pat][a->size], 32);
102
+ if (sme_enabled_check(s)) {
103
+ TCGv_i64 rd = cpu_reg_sp(s, a->rd);
104
+ TCGv_i64 rn = cpu_reg_sp(s, a->rn);
105
+ tcg_gen_addi_i64(rd, rn, a->imm * streaming_pred_reg_size(s));
106
+ }
107
+ return true;
531
+}
108
+}
532
+
109
+
533
+static bool trans_VLD4(DisasContext *s, arg_vldst_il *a)
110
static bool trans_RDVL(DisasContext *s, arg_RDVL *a)
111
{
112
if (!dc_isar_feature(aa64_sve, s)) {
113
@@ -XXX,XX +XXX,XX @@ static bool trans_RDVL(DisasContext *s, arg_RDVL *a)
114
return true;
115
}
116
117
+static bool trans_RDSVL(DisasContext *s, arg_RDSVL *a)
534
+{
118
+{
535
+ static MVEGenLdStIlFn * const fns[4][4] = {
119
+ if (!dc_isar_feature(aa64_sme, s)) {
536
+ { F(vld40b), F(vld40h), F(vld40w), NULL, },
537
+ { F(vld41b), F(vld41h), F(vld41w), NULL, },
538
+ { F(vld42b), F(vld42h), F(vld42w), NULL, },
539
+ { F(vld43b), F(vld43h), F(vld43w), NULL, },
540
+ };
541
+ if (a->qd > 4) {
542
+ return false;
120
+ return false;
543
+ }
121
+ }
544
+ return do_vldst_il(s, a, fns[a->pat][a->size], 64);
122
+ if (sme_enabled_check(s)) {
123
+ TCGv_i64 reg = cpu_reg(s, a->rd);
124
+ tcg_gen_movi_i64(reg, a->imm * streaming_vec_reg_size(s));
125
+ }
126
+ return true;
545
+}
127
+}
546
+
128
+
547
+static bool trans_VST2(DisasContext *s, arg_vldst_il *a)
129
/*
548
+{
130
*** SVE Compute Vector Address Group
549
+ static MVEGenLdStIlFn * const fns[4][4] = {
131
*/
550
+ { F(vst20b), F(vst20h), F(vst20w), NULL, },
551
+ { F(vst21b), F(vst21h), F(vst21w), NULL, },
552
+ { NULL, NULL, NULL, NULL },
553
+ { NULL, NULL, NULL, NULL },
554
+ };
555
+ if (a->qd > 6) {
556
+ return false;
557
+ }
558
+ return do_vldst_il(s, a, fns[a->pat][a->size], 32);
559
+}
560
+
561
+static bool trans_VST4(DisasContext *s, arg_vldst_il *a)
562
+{
563
+ static MVEGenLdStIlFn * const fns[4][4] = {
564
+ { F(vst40b), F(vst40h), F(vst40w), NULL, },
565
+ { F(vst41b), F(vst41h), F(vst41w), NULL, },
566
+ { F(vst42b), F(vst42h), F(vst42w), NULL, },
567
+ { F(vst43b), F(vst43h), F(vst43w), NULL, },
568
+ };
569
+ if (a->qd > 4) {
570
+ return false;
571
+ }
572
+ return do_vldst_il(s, a, fns[a->pat][a->size], 64);
573
+}
574
+
575
+#undef F
576
+
577
static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
578
{
579
TCGv_ptr qd;
580
--
132
--
581
2.20.1
133
2.25.1
582
583
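As a reference for the RDSVL/ADDSVL/ADDSPL patch above, here is a standalone sketch (not QEMU code) of the arithmetic those translations emit, with the streaming vector length expressed in bytes (SVL/8) as returned by streaming_vec_reg_size().

#include <stdint.h>

static int64_t addsvl_ref(int64_t xn, int64_t imm, int svl_bytes)
{
    return xn + imm * svl_bytes;            /* scale by the streaming VL in bytes */
}

static int64_t addspl_ref(int64_t xn, int64_t imm, int svl_bytes)
{
    return xn + imm * (svl_bytes / 8);      /* scale by the predicate size, SVL/64 */
}

static int64_t rdsvl_ref(int64_t imm, int svl_bytes)
{
    return imm * svl_bytes;                 /* read a multiple of the streaming VL */
}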
1
Implement the MVE 1-operand saturating operations VQABS and VQNEG.
1
From: Richard Henderson <richard.henderson@linaro.org>
2
2
3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20220708151540.18136-19-richard.henderson@linaro.org
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
---
7
---
6
target/arm/helper-mve.h | 8 ++++++++
8
target/arm/helper-sme.h | 2 ++
7
target/arm/mve.decode | 3 +++
9
target/arm/sme.decode | 4 ++++
8
target/arm/mve_helper.c | 37 +++++++++++++++++++++++++++++++++++++
10
target/arm/sme_helper.c | 25 +++++++++++++++++++++++++
9
target/arm/translate-mve.c | 2 ++
11
target/arm/translate-sme.c | 13 +++++++++++++
10
4 files changed, 50 insertions(+)
12
4 files changed, 44 insertions(+)
11
13
12
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
13
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/helper-mve.h
16
--- a/target/arm/helper-sme.h
15
+++ b/target/arm/helper-mve.h
17
+++ b/target/arm/helper-sme.h
16
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
18
@@ -XXX,XX +XXX,XX @@
17
DEF_HELPER_FLAGS_3(mve_vfnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
19
18
DEF_HELPER_FLAGS_3(mve_vfnegs, TCG_CALL_NO_WG, void, env, ptr, ptr)
20
DEF_HELPER_FLAGS_2(set_pstate_sm, TCG_CALL_NO_RWG, void, env, i32)
19
21
DEF_HELPER_FLAGS_2(set_pstate_za, TCG_CALL_NO_RWG, void, env, i32)
20
+DEF_HELPER_FLAGS_3(mve_vqabsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
21
+DEF_HELPER_FLAGS_3(mve_vqabsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
22
+DEF_HELPER_FLAGS_3(mve_vqabsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
+
22
+
24
+DEF_HELPER_FLAGS_3(mve_vqnegb, TCG_CALL_NO_WG, void, env, ptr, ptr)
23
+DEF_HELPER_FLAGS_3(sme_zero, TCG_CALL_NO_RWG, void, env, i32, i32)
25
+DEF_HELPER_FLAGS_3(mve_vqnegh, TCG_CALL_NO_WG, void, env, ptr, ptr)
24
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
26
+DEF_HELPER_FLAGS_3(mve_vqnegw, TCG_CALL_NO_WG, void, env, ptr, ptr)
25
index XXXXXXX..XXXXXXX 100644
26
--- a/target/arm/sme.decode
27
+++ b/target/arm/sme.decode
28
@@ -XXX,XX +XXX,XX @@
29
#
30
# This file is processed by scripts/decodetree.py
31
#
27
+
32
+
28
DEF_HELPER_FLAGS_3(mve_vmovnbb, TCG_CALL_NO_WG, void, env, ptr, ptr)
33
+### SME Misc
29
DEF_HELPER_FLAGS_3(mve_vmovnbh, TCG_CALL_NO_WG, void, env, ptr, ptr)
34
+
30
DEF_HELPER_FLAGS_3(mve_vmovntb, TCG_CALL_NO_WG, void, env, ptr, ptr)
35
+ZERO 11000000 00 001 00000000000 imm:8
31
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
36
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
32
index XXXXXXX..XXXXXXX 100644
37
index XXXXXXX..XXXXXXX 100644
33
--- a/target/arm/mve.decode
38
--- a/target/arm/sme_helper.c
34
+++ b/target/arm/mve.decode
39
+++ b/target/arm/sme_helper.c
35
@@ -XXX,XX +XXX,XX @@ VABS_fp 1111 1111 1 . 11 .. 01 ... 0 0111 01 . 0 ... 0 @1op
40
@@ -XXX,XX +XXX,XX @@ void helper_set_pstate_za(CPUARMState *env, uint32_t i)
36
VNEG 1111 1111 1 . 11 .. 01 ... 0 0011 11 . 0 ... 0 @1op
41
memset(env->zarray, 0, sizeof(env->zarray));
37
VNEG_fp 1111 1111 1 . 11 .. 01 ... 0 0111 11 . 0 ... 0 @1op
38
39
+VQABS 1111 1111 1 . 11 .. 00 ... 0 0111 01 . 0 ... 0 @1op
40
+VQNEG 1111 1111 1 . 11 .. 00 ... 0 0111 11 . 0 ... 0 @1op
41
+
42
&vdup qd rt size
43
# Qd is in the fields usually named Qn
44
@vdup .... .... . . .. ... . rt:4 .... . . . . .... qd=%qn &vdup
45
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
46
index XXXXXXX..XXXXXXX 100644
47
--- a/target/arm/mve_helper.c
48
+++ b/target/arm/mve_helper.c
49
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
50
}
42
}
51
mve_advance_vpt(env);
52
}
43
}
53
+
44
+
54
+#define DO_1OP_SAT(OP, ESIZE, TYPE, FN) \
45
+void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl)
55
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
46
+{
56
+ { \
47
+ uint32_t i;
57
+ TYPE *d = vd, *m = vm; \
48
+
58
+ uint16_t mask = mve_element_mask(env); \
49
+ /*
59
+ unsigned e; \
50
+ * Special case clearing the entire ZA space.
60
+ bool qc = false; \
51
+ * This falls into the CONSTRAINED UNPREDICTABLE zeroing of any
61
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
52
+ * parts of the ZA storage outside of SVL.
62
+ bool sat = false; \
53
+ */
63
+ mergemask(&d[H##ESIZE(e)], FN(m[H##ESIZE(e)], &sat), mask); \
54
+ if (imm == 0xff) {
64
+ qc |= sat & mask & 1; \
55
+ memset(env->zarray, 0, sizeof(env->zarray));
65
+ } \
56
+ return;
66
+ if (qc) { \
67
+ env->vfp.qc[0] = qc; \
68
+ } \
69
+ mve_advance_vpt(env); \
70
+ }
57
+ }
71
+
58
+
72
+#define DO_VQABS_B(N, SATP) \
59
+ /*
73
+ do_sat_bhs(DO_ABS((int64_t)N), INT8_MIN, INT8_MAX, SATP)
60
+ * Recall that ZAnH.D[m] is spread across ZA[n+8*m],
74
+#define DO_VQABS_H(N, SATP) \
61
+ * so each row is discontiguous within ZA[].
75
+ do_sat_bhs(DO_ABS((int64_t)N), INT16_MIN, INT16_MAX, SATP)
62
+ */
76
+#define DO_VQABS_W(N, SATP) \
63
+ for (i = 0; i < svl; i++) {
77
+ do_sat_bhs(DO_ABS((int64_t)N), INT32_MIN, INT32_MAX, SATP)
64
+ if (imm & (1 << (i % 8))) {
65
+ memset(&env->zarray[i], 0, svl);
66
+ }
67
+ }
68
+}
69
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
70
index XXXXXXX..XXXXXXX 100644
71
--- a/target/arm/translate-sme.c
72
+++ b/target/arm/translate-sme.c
73
@@ -XXX,XX +XXX,XX @@
74
*/
75
76
#include "decode-sme.c.inc"
78
+
77
+
79
+#define DO_VQNEG_B(N, SATP) do_sat_bhs(-(int64_t)N, INT8_MIN, INT8_MAX, SATP)
80
+#define DO_VQNEG_H(N, SATP) do_sat_bhs(-(int64_t)N, INT16_MIN, INT16_MAX, SATP)
81
+#define DO_VQNEG_W(N, SATP) do_sat_bhs(-(int64_t)N, INT32_MIN, INT32_MAX, SATP)
82
+
78
+
83
+DO_1OP_SAT(vqabsb, 1, int8_t, DO_VQABS_B)
79
+static bool trans_ZERO(DisasContext *s, arg_ZERO *a)
84
+DO_1OP_SAT(vqabsh, 2, int16_t, DO_VQABS_H)
80
+{
85
+DO_1OP_SAT(vqabsw, 4, int32_t, DO_VQABS_W)
81
+ if (!dc_isar_feature(aa64_sme, s)) {
86
+
82
+ return false;
87
+DO_1OP_SAT(vqnegb, 1, int8_t, DO_VQNEG_B)
83
+ }
88
+DO_1OP_SAT(vqnegh, 2, int16_t, DO_VQNEG_H)
84
+ if (sme_za_enabled_check(s)) {
89
+DO_1OP_SAT(vqnegw, 4, int32_t, DO_VQNEG_W)
85
+ gen_helper_sme_zero(cpu_env, tcg_constant_i32(a->imm),
90
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
86
+ tcg_constant_i32(streaming_vec_reg_size(s)));
91
index XXXXXXX..XXXXXXX 100644
87
+ }
92
--- a/target/arm/translate-mve.c
88
+ return true;
93
+++ b/target/arm/translate-mve.c
89
+}
94
@@ -XXX,XX +XXX,XX @@ DO_1OP(VCLZ, vclz)
95
DO_1OP(VCLS, vcls)
96
DO_1OP(VABS, vabs)
97
DO_1OP(VNEG, vneg)
98
+DO_1OP(VQABS, vqabs)
99
+DO_1OP(VQNEG, vqneg)
100
101
/* Narrowing moves: only size 0 and 1 are valid */
102
#define DO_VMOVN(INSN, FN) \
103
--
90
--
104
2.20.1
91
2.25.1
Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so
1
From: Richard Henderson <richard.henderson@linaro.org>
2
that any element at index Rn or greater is predicated. As
3
with VPNOT, this insn itself is predicable and subject to beatwise
4
execution.
5
2
6
The calculation of the mask is the same as is used to determine
3
We can reuse the SVE functions for implementing moves to/from
7
ltpmask in mve_element_mask(), but we precalculate masklen in
4
horizontal tile slices, but we need new ones for moves to/from
8
generated code to avoid needing 4 helpers specialized by size.
5
vertical tile slices.
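A minimal sketch of the distinction (made-up names; the real stride comes from tile_vslice_index() in the patch below): a horizontal slice is a contiguous run of ZA storage, which is why the existing SVE code can be reused for it, while a vertical slice is strided through the storage and needs an element-by-element predicated copy:

    #include <stdint.h>
    #include <stddef.h>

    /* Copy a predicated vector into a vertical (column) tile slice:
     * element i lands 'stride' elements further into ZA each step. */
    static void mova_col_from_vec(uint64_t *za, const uint64_t *zn,
                                  const uint8_t *pg, size_t nelem,
                                  size_t stride)
    {
        for (size_t i = 0; i < nelem; i++) {
            if (pg[i] & 1) {            /* one predicate bit per element */
                za[i * stride] = zn[i];
            }
        }
    }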
9
6
10
We put the decode line in with the low-overhead-loop insns in
7
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
11
t32.decode because it's logically part of that collection of insn
8
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
12
patterns, even though it is an MVE-only insn.
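A minimal sketch of the masklen trick (simplified types and name; the translate-time formula is the one quoted in the patch, masklen = rn <= (1 << (4 - size)) ? (rn << size) : 16):

    #include <stdint.h>

    /* Expand the pre-computed byte length into a 16-bit predicate mask.
     * This is the size-independent step that lets a single helper serve
     * all element sizes (cf. MAKE_64BIT_MASK in the real helper below). */
    static uint16_t vctp_newmask(uint32_t masklen)
    {
        return masklen ? (uint16_t)((1u << masklen) - 1u) : 0;
    }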
9
Message-id: 20220708151540.18136-20-richard.henderson@linaro.org
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
12
target/arm/helper-sme.h | 12 +++
13
target/arm/helper-sve.h | 2 +
14
target/arm/translate-a64.h | 8 ++
15
target/arm/translate.h | 5 ++
16
target/arm/sme.decode | 15 ++++
17
target/arm/sme_helper.c | 151 ++++++++++++++++++++++++++++++++++++-
18
target/arm/sve_helper.c | 12 +++
19
target/arm/translate-sme.c | 127 +++++++++++++++++++++++++++++++
20
8 files changed, 331 insertions(+), 1 deletion(-)
13
21
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
22
diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
15
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
23
index XXXXXXX..XXXXXXX 100644
16
---
24
--- a/target/arm/helper-sme.h
17
target/arm/helper-mve.h | 2 ++
25
+++ b/target/arm/helper-sme.h
18
target/arm/translate-a32.h | 1 +
26
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_2(set_pstate_sm, TCG_CALL_NO_RWG, void, env, i32)
19
target/arm/t32.decode | 1 +
27
DEF_HELPER_FLAGS_2(set_pstate_za, TCG_CALL_NO_RWG, void, env, i32)
20
target/arm/mve_helper.c | 20 ++++++++++++++++++++
28
21
target/arm/translate-mve.c | 2 +-
29
DEF_HELPER_FLAGS_3(sme_zero, TCG_CALL_NO_RWG, void, env, i32, i32)
22
target/arm/translate.c | 33 +++++++++++++++++++++++++++++++++
30
+
23
6 files changed, 58 insertions(+), 1 deletion(-)
31
+/* Move to/from vertical array slices, i.e. columns, so 'c'. */
24
32
+DEF_HELPER_FLAGS_4(sme_mova_cz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
25
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
33
+DEF_HELPER_FLAGS_4(sme_mova_zc_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
26
index XXXXXXX..XXXXXXX 100644
34
+DEF_HELPER_FLAGS_4(sme_mova_cz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
27
--- a/target/arm/helper-mve.h
35
+DEF_HELPER_FLAGS_4(sme_mova_zc_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
28
+++ b/target/arm/helper-mve.h
36
+DEF_HELPER_FLAGS_4(sme_mova_cz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
29
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
37
+DEF_HELPER_FLAGS_4(sme_mova_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
30
DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
38
+DEF_HELPER_FLAGS_4(sme_mova_cz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
31
DEF_HELPER_FLAGS_1(mve_vpnot, TCG_CALL_NO_WG, void, env)
39
+DEF_HELPER_FLAGS_4(sme_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
32
40
+DEF_HELPER_FLAGS_4(sme_mova_cz_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
33
+DEF_HELPER_FLAGS_2(mve_vctp, TCG_CALL_NO_WG, void, env, i32)
41
+DEF_HELPER_FLAGS_4(sme_mova_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
34
+
42
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
35
DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
43
index XXXXXXX..XXXXXXX 100644
36
DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
44
--- a/target/arm/helper-sve.h
37
DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
45
+++ b/target/arm/helper-sve.h
38
diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
46
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_sel_zpzz_s, TCG_CALL_NO_RWG,
39
index XXXXXXX..XXXXXXX 100644
47
void, ptr, ptr, ptr, ptr, i32)
40
--- a/target/arm/translate-a32.h
48
DEF_HELPER_FLAGS_5(sve_sel_zpzz_d, TCG_CALL_NO_RWG,
41
+++ b/target/arm/translate-a32.h
49
void, ptr, ptr, ptr, ptr, i32)
42
@@ -XXX,XX +XXX,XX @@ long neon_element_offset(int reg, int element, MemOp memop);
50
+DEF_HELPER_FLAGS_5(sve_sel_zpzz_q, TCG_CALL_NO_RWG,
43
void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
51
+ void, ptr, ptr, ptr, ptr, i32)
44
void clear_eci_state(DisasContext *s);
52
45
bool mve_eci_check(DisasContext *s);
53
DEF_HELPER_FLAGS_5(sve2_addp_zpzz_b, TCG_CALL_NO_RWG,
46
+void mve_update_eci(DisasContext *s);
54
void, ptr, ptr, ptr, ptr, i32)
47
void mve_update_and_store_eci(DisasContext *s);
55
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
48
bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
56
index XXXXXXX..XXXXXXX 100644
49
57
--- a/target/arm/translate-a64.h
50
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
58
+++ b/target/arm/translate-a64.h
51
index XXXXXXX..XXXXXXX 100644
59
@@ -XXX,XX +XXX,XX @@ static inline int pred_gvec_reg_size(DisasContext *s)
52
--- a/target/arm/t32.decode
60
return size_for_gvec(pred_full_reg_size(s));
53
+++ b/target/arm/t32.decode
54
@@ -XXX,XX +XXX,XX @@ BL 1111 0. .......... 11.1 ............ @branch24
55
# This is DLSTP
56
DLS 1111 0 0000 0 size:2 rn:4 1110 0000 0000 0001
57
}
58
+ VCTP 1111 0 0000 0 size:2 rn:4 1110 1000 0000 0001
59
]
60
}
61
}
61
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
62
62
index XXXXXXX..XXXXXXX 100644
63
+/* Return a newly allocated pointer to the predicate register. */
63
--- a/target/arm/mve_helper.c
64
+static inline TCGv_ptr pred_full_reg_ptr(DisasContext *s, int regno)
64
+++ b/target/arm/mve_helper.c
65
+{
65
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpnot)(CPUARMState *env)
66
+ TCGv_ptr ret = tcg_temp_new_ptr();
66
mve_advance_vpt(env);
67
+ tcg_gen_addi_ptr(ret, cpu_env, pred_full_reg_offset(s, regno));
68
+ return ret;
69
+}
70
+
71
bool disas_sve(DisasContext *, uint32_t);
72
bool disas_sme(DisasContext *, uint32_t);
73
74
diff --git a/target/arm/translate.h b/target/arm/translate.h
75
index XXXXXXX..XXXXXXX 100644
76
--- a/target/arm/translate.h
77
+++ b/target/arm/translate.h
78
@@ -XXX,XX +XXX,XX @@ static inline int plus_2(DisasContext *s, int x)
79
return x + 2;
67
}
80
}
68
81
69
+/*
82
+static inline int plus_12(DisasContext *s, int x)
70
+ * VCTP: P0 unexecuted bits unchanged, predicated bits zeroed,
83
+{
71
+ * otherwise set according to value of Rn. The calculation of
84
+ return x + 12;
72
+ * newmask here works in the same way as the calculation of the
85
+}
73
+ * ltpmask in mve_element_mask(), but we have pre-calculated
86
+
74
+ * the masklen in the generated code.
87
static inline int times_2(DisasContext *s, int x)
75
+ */
88
{
76
+void HELPER(mve_vctp)(CPUARMState *env, uint32_t masklen)
89
return x * 2;
77
+{
90
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
78
+ uint16_t mask = mve_element_mask(env);
91
index XXXXXXX..XXXXXXX 100644
79
+ uint16_t eci_mask = mve_eci_mask(env);
92
--- a/target/arm/sme.decode
80
+ uint16_t newmask;
93
+++ b/target/arm/sme.decode
81
+
94
@@ -XXX,XX +XXX,XX @@
82
+ assert(masklen <= 16);
95
### SME Misc
83
+ newmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
96
84
+ newmask &= mask;
97
ZERO 11000000 00 001 00000000000 imm:8
85
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | (newmask & eci_mask);
98
+
86
+ mve_advance_vpt(env);
99
+### SME Move into/from Array
87
+}
100
+
88
+
101
+%mova_rs 13:2 !function=plus_12
89
#define DO_1OP_SAT(OP, ESIZE, TYPE, FN) \
102
+&mova esz rs pg zr za_imm v:bool to_vec:bool
90
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
103
+
91
{ \
104
+MOVA 11000000 esz:2 00000 0 v:1 .. pg:3 zr:5 0 za_imm:4 \
92
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
105
+ &mova to_vec=0 rs=%mova_rs
93
index XXXXXXX..XXXXXXX 100644
106
+MOVA 11000000 11 00000 1 v:1 .. pg:3 zr:5 0 za_imm:4 \
94
--- a/target/arm/translate-mve.c
107
+ &mova to_vec=0 rs=%mova_rs esz=4
95
+++ b/target/arm/translate-mve.c
108
+
96
@@ -XXX,XX +XXX,XX @@ bool mve_eci_check(DisasContext *s)
109
+MOVA 11000000 esz:2 00001 0 v:1 .. pg:3 0 za_imm:4 zr:5 \
110
+ &mova to_vec=1 rs=%mova_rs
111
+MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za_imm:4 zr:5 \
112
+ &mova to_vec=1 rs=%mova_rs esz=4
113
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
114
index XXXXXXX..XXXXXXX 100644
115
--- a/target/arm/sme_helper.c
116
+++ b/target/arm/sme_helper.c
117
@@ -XXX,XX +XXX,XX @@
118
119
#include "qemu/osdep.h"
120
#include "cpu.h"
121
-#include "internals.h"
122
+#include "tcg/tcg-gvec-desc.h"
123
#include "exec/helper-proto.h"
124
+#include "qemu/int128.h"
125
+#include "vec_internal.h"
126
127
/* ResetSVEState */
128
void arm_reset_sve_state(CPUARMState *env)
129
@@ -XXX,XX +XXX,XX @@ void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl)
130
}
97
}
131
}
98
}
132
}
99
133
+
100
-static void mve_update_eci(DisasContext *s)
134
+
101
+void mve_update_eci(DisasContext *s)
135
+/*
136
+ * When considering the ZA storage as an array of elements of
137
+ * type T, the index within that array of the Nth element of
138
+ * a vertical slice of a tile can be calculated like this,
139
+ * regardless of the size of type T. This is because the tiles
140
+ * are interleaved, so if type T is size N bytes then row 1 of
141
+ * the tile is N rows away from row 0. The division by N to
142
+ * convert a byte offset into an array index and the multiplication
143
+ * by N to convert from vslice-index-within-the-tile to
144
+ * the index within the ZA storage cancel out.
145
+ */
146
+#define tile_vslice_index(i) ((i) * sizeof(ARMVectorReg))
147
+
148
+/*
149
+ * When doing byte arithmetic on the ZA storage, the element
150
+ * byteoff bytes away in a tile vertical slice is always this
151
+ * many bytes away in the ZA storage, regardless of the
152
+ * size of the tile element, assuming that byteoff is a multiple
153
+ * of the element size. Again this is because of the interleaving
154
+ * of the tiles. For instance if we have 1 byte per element then
155
+ * each row of the ZA storage has one byte of the vslice data,
156
+ * and (counting from 0) byte 8 goes in row 8 of the storage
157
+ * at offset (8 * row-size-in-bytes).
158
+ * If we have 8 bytes per element then each row of the ZA storage
159
+ * has 8 bytes of the data, but there are 8 interleaved tiles and
160
+ * so byte 8 of the data goes into row 1 of the tile,
161
+ * which is again row 8 of the storage, so the offset is still
162
+ * (8 * row-size-in-bytes). Similarly for other element sizes.
163
+ */
164
+#define tile_vslice_offset(byteoff) ((byteoff) * sizeof(ARMVectorReg))
165
+
166
+
167
+/*
168
+ * Move Zreg vector to ZArray column.
169
+ */
170
+#define DO_MOVA_C(NAME, TYPE, H) \
171
+void HELPER(NAME)(void *za, void *vn, void *vg, uint32_t desc) \
172
+{ \
173
+ int i, oprsz = simd_oprsz(desc); \
174
+ for (i = 0; i < oprsz; ) { \
175
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
176
+ do { \
177
+ if (pg & 1) { \
178
+ *(TYPE *)(za + tile_vslice_offset(i)) = *(TYPE *)(vn + H(i)); \
179
+ } \
180
+ i += sizeof(TYPE); \
181
+ pg >>= sizeof(TYPE); \
182
+ } while (i & 15); \
183
+ } \
184
+}
185
+
186
+DO_MOVA_C(sme_mova_cz_b, uint8_t, H1)
187
+DO_MOVA_C(sme_mova_cz_h, uint16_t, H1_2)
188
+DO_MOVA_C(sme_mova_cz_s, uint32_t, H1_4)
189
+
190
+void HELPER(sme_mova_cz_d)(void *za, void *vn, void *vg, uint32_t desc)
191
+{
192
+ int i, oprsz = simd_oprsz(desc) / 8;
193
+ uint8_t *pg = vg;
194
+ uint64_t *n = vn;
195
+ uint64_t *a = za;
196
+
197
+ for (i = 0; i < oprsz; i++) {
198
+ if (pg[H1(i)] & 1) {
199
+ a[tile_vslice_index(i)] = n[i];
200
+ }
201
+ }
202
+}
203
+
204
+void HELPER(sme_mova_cz_q)(void *za, void *vn, void *vg, uint32_t desc)
205
+{
206
+ int i, oprsz = simd_oprsz(desc) / 16;
207
+ uint16_t *pg = vg;
208
+ Int128 *n = vn;
209
+ Int128 *a = za;
210
+
211
+ /*
212
+ * Int128 is used here simply to copy 16 bytes, and to simplify
213
+ * the address arithmetic.
214
+ */
215
+ for (i = 0; i < oprsz; i++) {
216
+ if (pg[H2(i)] & 1) {
217
+ a[tile_vslice_index(i)] = n[i];
218
+ }
219
+ }
220
+}
221
+
222
+#undef DO_MOVA_C
223
+
224
+/*
225
+ * Move ZArray column to Zreg vector.
226
+ */
227
+#define DO_MOVA_Z(NAME, TYPE, H) \
228
+void HELPER(NAME)(void *vd, void *za, void *vg, uint32_t desc) \
229
+{ \
230
+ int i, oprsz = simd_oprsz(desc); \
231
+ for (i = 0; i < oprsz; ) { \
232
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
233
+ do { \
234
+ if (pg & 1) { \
235
+ *(TYPE *)(vd + H(i)) = *(TYPE *)(za + tile_vslice_offset(i)); \
236
+ } \
237
+ i += sizeof(TYPE); \
238
+ pg >>= sizeof(TYPE); \
239
+ } while (i & 15); \
240
+ } \
241
+}
242
+
243
+DO_MOVA_Z(sme_mova_zc_b, uint8_t, H1)
244
+DO_MOVA_Z(sme_mova_zc_h, uint16_t, H1_2)
245
+DO_MOVA_Z(sme_mova_zc_s, uint32_t, H1_4)
246
+
247
+void HELPER(sme_mova_zc_d)(void *vd, void *za, void *vg, uint32_t desc)
248
+{
249
+ int i, oprsz = simd_oprsz(desc) / 8;
250
+ uint8_t *pg = vg;
251
+ uint64_t *d = vd;
252
+ uint64_t *a = za;
253
+
254
+ for (i = 0; i < oprsz; i++) {
255
+ if (pg[H1(i)] & 1) {
256
+ d[i] = a[tile_vslice_index(i)];
257
+ }
258
+ }
259
+}
260
+
261
+void HELPER(sme_mova_zc_q)(void *vd, void *za, void *vg, uint32_t desc)
262
+{
263
+ int i, oprsz = simd_oprsz(desc) / 16;
264
+ uint16_t *pg = vg;
265
+ Int128 *d = vd;
266
+ Int128 *a = za;
267
+
268
+ /*
269
+ * Int128 is used here simply to copy 16 bytes, and to simplify
270
+ * the address arithmetic.
271
+ */
272
+ for (i = 0; i < oprsz; i++, za += sizeof(ARMVectorReg)) {
273
+ if (pg[H2(i)] & 1) {
274
+ d[i] = a[tile_vslice_index(i)];
275
+ }
276
+ }
277
+}
278
+
279
+#undef DO_MOVA_Z
280
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
281
index XXXXXXX..XXXXXXX 100644
282
--- a/target/arm/sve_helper.c
283
+++ b/target/arm/sve_helper.c
284
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_sel_zpzz_d)(void *vd, void *vn, void *vm,
285
}
286
}
287
288
+void HELPER(sve_sel_zpzz_q)(void *vd, void *vn, void *vm,
289
+ void *vg, uint32_t desc)
290
+{
291
+ intptr_t i, opr_sz = simd_oprsz(desc) / 16;
292
+ Int128 *d = vd, *n = vn, *m = vm;
293
+ uint16_t *pg = vg;
294
+
295
+ for (i = 0; i < opr_sz; i += 1) {
296
+ d[i] = (pg[H2(i)] & 1 ? n : m)[i];
297
+ }
298
+}
299
+
300
/* Two operand comparison controlled by a predicate.
301
* ??? It is very tempting to want to be able to expand this inline
302
* with x86 instructions, e.g.
303
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
304
index XXXXXXX..XXXXXXX 100644
305
--- a/target/arm/translate-sme.c
306
+++ b/target/arm/translate-sme.c
307
@@ -XXX,XX +XXX,XX @@
308
#include "decode-sme.c.inc"
309
310
311
+/*
312
+ * Resolve tile.size[index] to a host pointer, where tile and index
313
+ * are always decoded together, dependent on the element size.
314
+ */
315
+static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs,
316
+ int tile_index, bool vertical)
317
+{
318
+ int tile = tile_index >> (4 - esz);
319
+ int index = esz == MO_128 ? 0 : extract32(tile_index, 0, 4 - esz);
320
+ int pos, len, offset;
321
+ TCGv_i32 tmp;
322
+ TCGv_ptr addr;
323
+
324
+ /* Compute the final index, which is Rs+imm. */
325
+ tmp = tcg_temp_new_i32();
326
+ tcg_gen_trunc_tl_i32(tmp, cpu_reg(s, rs));
327
+ tcg_gen_addi_i32(tmp, tmp, index);
328
+
329
+ /* Prepare a power-of-two modulo via extraction of @len bits. */
330
+ len = ctz32(streaming_vec_reg_size(s)) - esz;
331
+
332
+ if (vertical) {
333
+ /*
334
+ * Compute the byte offset of the index within the tile:
335
+ * (index % (svl / size)) * size
336
+ * = (index % (svl >> esz)) << esz
337
+ * Perform the power-of-two modulo via extraction of the low @len bits.
338
+ * Perform the multiply by shifting left by @pos bits.
339
+ * Perform these operations simultaneously via deposit into zero.
340
+ */
341
+ pos = esz;
342
+ tcg_gen_deposit_z_i32(tmp, tmp, pos, len);
343
+
344
+ /*
345
+ * For big-endian, adjust the indexed column byte offset within
346
+ * the uint64_t host words that make up env->zarray[].
347
+ */
348
+ if (HOST_BIG_ENDIAN && esz < MO_64) {
349
+ tcg_gen_xori_i32(tmp, tmp, 8 - (1 << esz));
350
+ }
351
+ } else {
352
+ /*
353
+ * Compute the byte offset of the index within the tile:
354
+ * (index % (svl / size)) * (size * sizeof(row))
355
+ * = (index % (svl >> esz)) << (esz + log2(sizeof(row)))
356
+ */
357
+ pos = esz + ctz32(sizeof(ARMVectorReg));
358
+ tcg_gen_deposit_z_i32(tmp, tmp, pos, len);
359
+
360
+ /* Row slices are always aligned and need no endian adjustment. */
361
+ }
362
+
363
+ /* The tile byte offset within env->zarray is the row. */
364
+ offset = tile * sizeof(ARMVectorReg);
365
+
366
+ /* Include the byte offset of zarray to make this relative to env. */
367
+ offset += offsetof(CPUARMState, zarray);
368
+ tcg_gen_addi_i32(tmp, tmp, offset);
369
+
370
+ /* Add the byte offset to env to produce the final pointer. */
371
+ addr = tcg_temp_new_ptr();
372
+ tcg_gen_ext_i32_ptr(addr, tmp);
373
+ tcg_temp_free_i32(tmp);
374
+ tcg_gen_add_ptr(addr, addr, cpu_env);
375
+
376
+ return addr;
377
+}
378
+
379
static bool trans_ZERO(DisasContext *s, arg_ZERO *a)
102
{
380
{
103
/*
381
if (!dc_isar_feature(aa64_sme, s)) {
104
* The helper function will always update the CPUState field,
382
@@ -XXX,XX +XXX,XX @@ static bool trans_ZERO(DisasContext *s, arg_ZERO *a)
105
diff --git a/target/arm/translate.c b/target/arm/translate.c
383
}
106
index XXXXXXX..XXXXXXX 100644
107
--- a/target/arm/translate.c
108
+++ b/target/arm/translate.c
109
@@ -XXX,XX +XXX,XX @@ static bool trans_LCTP(DisasContext *s, arg_LCTP *a)
110
return true;
384
return true;
111
}
385
}
112
386
+
113
+static bool trans_VCTP(DisasContext *s, arg_VCTP *a)
387
+static bool trans_MOVA(DisasContext *s, arg_MOVA *a)
114
+{
388
+{
115
+ /*
389
+ static gen_helper_gvec_4 * const h_fns[5] = {
116
+ * M-profile Create Vector Tail Predicate. This insn is itself
390
+ gen_helper_sve_sel_zpzz_b, gen_helper_sve_sel_zpzz_h,
117
+ * predicated and is subject to beatwise execution.
391
+ gen_helper_sve_sel_zpzz_s, gen_helper_sve_sel_zpzz_d,
118
+ */
392
+ gen_helper_sve_sel_zpzz_q
119
+ TCGv_i32 rn_shifted, masklen;
393
+ };
120
+
394
+ static gen_helper_gvec_3 * const cz_fns[5] = {
121
+ if (!dc_isar_feature(aa32_mve, s) || a->rn == 13 || a->rn == 15) {
395
+ gen_helper_sme_mova_cz_b, gen_helper_sme_mova_cz_h,
396
+ gen_helper_sme_mova_cz_s, gen_helper_sme_mova_cz_d,
397
+ gen_helper_sme_mova_cz_q,
398
+ };
399
+ static gen_helper_gvec_3 * const zc_fns[5] = {
400
+ gen_helper_sme_mova_zc_b, gen_helper_sme_mova_zc_h,
401
+ gen_helper_sme_mova_zc_s, gen_helper_sme_mova_zc_d,
402
+ gen_helper_sme_mova_zc_q,
403
+ };
404
+
405
+ TCGv_ptr t_za, t_zr, t_pg;
406
+ TCGv_i32 t_desc;
407
+ int svl;
408
+
409
+ if (!dc_isar_feature(aa64_sme, s)) {
122
+ return false;
410
+ return false;
123
+ }
411
+ }
124
+
412
+ if (!sme_smza_enabled_check(s)) {
125
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
126
+ return true;
413
+ return true;
127
+ }
414
+ }
128
+
415
+
129
+ /*
416
+ t_za = get_tile_rowcol(s, a->esz, a->rs, a->za_imm, a->v);
130
+ * We pre-calculate the mask length here to avoid having
417
+ t_zr = vec_full_reg_ptr(s, a->zr);
131
+ * to have multiple helpers specialized for size.
418
+ t_pg = pred_full_reg_ptr(s, a->pg);
132
+ * We pass the helper "rn <= (1 << (4 - size)) ? (rn << size) : 16".
419
+
133
+ */
420
+ svl = streaming_vec_reg_size(s);
134
+ rn_shifted = tcg_temp_new_i32();
421
+ t_desc = tcg_constant_i32(simd_desc(svl, svl, 0));
135
+ masklen = load_reg(s, a->rn);
422
+
136
+ tcg_gen_shli_i32(rn_shifted, masklen, a->size);
423
+ if (a->v) {
137
+ tcg_gen_movcond_i32(TCG_COND_LEU, masklen,
424
+ /* Vertical slice -- use sme mova helpers. */
138
+ masklen, tcg_constant_i32(1 << (4 - a->size)),
425
+ if (a->to_vec) {
139
+ rn_shifted, tcg_constant_i32(16));
426
+ zc_fns[a->esz](t_zr, t_za, t_pg, t_desc);
140
+ gen_helper_mve_vctp(cpu_env, masklen);
427
+ } else {
141
+ tcg_temp_free_i32(masklen);
428
+ cz_fns[a->esz](t_za, t_zr, t_pg, t_desc);
142
+ tcg_temp_free_i32(rn_shifted);
429
+ }
143
+ mve_update_eci(s);
430
+ } else {
431
+ /* Horizontal slice -- reuse sve sel helpers. */
432
+ if (a->to_vec) {
433
+ h_fns[a->esz](t_zr, t_za, t_zr, t_pg, t_desc);
434
+ } else {
435
+ h_fns[a->esz](t_za, t_zr, t_za, t_pg, t_desc);
436
+ }
437
+ }
438
+
439
+ tcg_temp_free_ptr(t_za);
440
+ tcg_temp_free_ptr(t_zr);
441
+ tcg_temp_free_ptr(t_pg);
442
+
144
+ return true;
443
+ return true;
145
+}
444
+}
146
147
static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
148
{
149
--
445
--
150
2.20.1
446
2.25.1
Implement the MVE integer vector comparison instructions. These are
1
From: Richard Henderson <richard.henderson@linaro.org>
2
"VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings
3
T1, T2 and T3.
4
2
5
These insns compare corresponding elements in each vector, and update
3
We cannot reuse the SVE functions for LD[1-4] and ST[1-4],
6
the VPR.P0 predicate bits with the results of the comparison. VPT
4
because those functions accept only a Zreg register number.
7
also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively
5
For SME, we want to pass a pointer into ZA storage.
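A rough sketch of the interface shape being described (hypothetical names and signatures, not the real QEMU ones): with the destination passed as a plain pointer plus a per-element accessor, the same walk can fill a ZA row, or a ZA column once the vertical accessors added below are plugged in:

    #include <stdint.h>

    /* Per-element accessor: copy one byte from host memory into the
     * destination at element offset 'off'. */
    typedef void ldst1_host_fn(void *dst, intptr_t off, void *host);

    /* Horizontal accessor: the destination is contiguous. */
    static void ld1b_h_host(void *dst, intptr_t off, void *host)
    {
        *((uint8_t *)dst + off) = *(uint8_t *)host;
    }

    static void ld1_bytes(void *dst, void *host, intptr_t nbytes,
                          ldst1_host_fn *host_fn)
    {
        for (intptr_t off = 0; off < nbytes; off++) {
            host_fn(dst, off, (uint8_t *)host + off);
        }
    }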
8
"VCMP then VPST".
9
6
7
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20220708151540.18136-21-richard.henderson@linaro.org
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
---
11
---
13
target/arm/helper-mve.h | 32 ++++++++++++++++++++++
12
target/arm/helper-sme.h | 82 +++++
14
target/arm/mve.decode | 18 +++++++++++-
13
target/arm/sme.decode | 9 +
15
target/arm/mve_helper.c | 56 ++++++++++++++++++++++++++++++++++++++
14
target/arm/sme_helper.c | 595 +++++++++++++++++++++++++++++++++++++
16
target/arm/translate-mve.c | 47 ++++++++++++++++++++++++++++++++
15
target/arm/translate-sme.c | 70 +++++
17
4 files changed, 152 insertions(+), 1 deletion(-)
16
4 files changed, 756 insertions(+)
18
17
19
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
18
diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
20
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper-mve.h
20
--- a/target/arm/helper-sme.h
22
+++ b/target/arm/helper-mve.h
21
+++ b/target/arm/helper-sme.h
23
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sme_mova_cz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
24
DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
23
DEF_HELPER_FLAGS_4(sme_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
25
DEF_HELPER_FLAGS_3(mve_uqrshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
24
DEF_HELPER_FLAGS_4(sme_mova_cz_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
26
DEF_HELPER_FLAGS_3(mve_sqrshr, TCG_CALL_NO_RWG, i32, env, i32, i32)
25
DEF_HELPER_FLAGS_4(sme_mova_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
27
+
26
+
28
+DEF_HELPER_FLAGS_3(mve_vcmpeqb, TCG_CALL_NO_WG, void, env, ptr, ptr)
27
+DEF_HELPER_FLAGS_5(sme_ld1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
29
+DEF_HELPER_FLAGS_3(mve_vcmpeqh, TCG_CALL_NO_WG, void, env, ptr, ptr)
28
+DEF_HELPER_FLAGS_5(sme_ld1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
30
+DEF_HELPER_FLAGS_3(mve_vcmpeqw, TCG_CALL_NO_WG, void, env, ptr, ptr)
29
+DEF_HELPER_FLAGS_5(sme_ld1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
31
+
30
+DEF_HELPER_FLAGS_5(sme_ld1b_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
32
+DEF_HELPER_FLAGS_3(mve_vcmpneb, TCG_CALL_NO_WG, void, env, ptr, ptr)
31
+
33
+DEF_HELPER_FLAGS_3(mve_vcmpneh, TCG_CALL_NO_WG, void, env, ptr, ptr)
32
+DEF_HELPER_FLAGS_5(sme_ld1h_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
34
+DEF_HELPER_FLAGS_3(mve_vcmpnew, TCG_CALL_NO_WG, void, env, ptr, ptr)
33
+DEF_HELPER_FLAGS_5(sme_ld1h_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
35
+
34
+DEF_HELPER_FLAGS_5(sme_ld1h_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
36
+DEF_HELPER_FLAGS_3(mve_vcmpcsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
35
+DEF_HELPER_FLAGS_5(sme_ld1h_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
37
+DEF_HELPER_FLAGS_3(mve_vcmpcsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
36
+DEF_HELPER_FLAGS_5(sme_ld1h_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
38
+DEF_HELPER_FLAGS_3(mve_vcmpcsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
37
+DEF_HELPER_FLAGS_5(sme_ld1h_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
39
+
38
+DEF_HELPER_FLAGS_5(sme_ld1h_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
40
+DEF_HELPER_FLAGS_3(mve_vcmphib, TCG_CALL_NO_WG, void, env, ptr, ptr)
39
+DEF_HELPER_FLAGS_5(sme_ld1h_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
41
+DEF_HELPER_FLAGS_3(mve_vcmphih, TCG_CALL_NO_WG, void, env, ptr, ptr)
40
+
42
+DEF_HELPER_FLAGS_3(mve_vcmphiw, TCG_CALL_NO_WG, void, env, ptr, ptr)
41
+DEF_HELPER_FLAGS_5(sme_ld1s_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
43
+
42
+DEF_HELPER_FLAGS_5(sme_ld1s_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
44
+DEF_HELPER_FLAGS_3(mve_vcmpgeb, TCG_CALL_NO_WG, void, env, ptr, ptr)
43
+DEF_HELPER_FLAGS_5(sme_ld1s_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
45
+DEF_HELPER_FLAGS_3(mve_vcmpgeh, TCG_CALL_NO_WG, void, env, ptr, ptr)
44
+DEF_HELPER_FLAGS_5(sme_ld1s_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
46
+DEF_HELPER_FLAGS_3(mve_vcmpgew, TCG_CALL_NO_WG, void, env, ptr, ptr)
45
+DEF_HELPER_FLAGS_5(sme_ld1s_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
47
+
46
+DEF_HELPER_FLAGS_5(sme_ld1s_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
48
+DEF_HELPER_FLAGS_3(mve_vcmpltb, TCG_CALL_NO_WG, void, env, ptr, ptr)
47
+DEF_HELPER_FLAGS_5(sme_ld1s_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
49
+DEF_HELPER_FLAGS_3(mve_vcmplth, TCG_CALL_NO_WG, void, env, ptr, ptr)
48
+DEF_HELPER_FLAGS_5(sme_ld1s_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
50
+DEF_HELPER_FLAGS_3(mve_vcmpltw, TCG_CALL_NO_WG, void, env, ptr, ptr)
49
+
51
+
50
+DEF_HELPER_FLAGS_5(sme_ld1d_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
52
+DEF_HELPER_FLAGS_3(mve_vcmpgtb, TCG_CALL_NO_WG, void, env, ptr, ptr)
51
+DEF_HELPER_FLAGS_5(sme_ld1d_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
53
+DEF_HELPER_FLAGS_3(mve_vcmpgth, TCG_CALL_NO_WG, void, env, ptr, ptr)
52
+DEF_HELPER_FLAGS_5(sme_ld1d_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
54
+DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, ptr, ptr)
53
+DEF_HELPER_FLAGS_5(sme_ld1d_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
55
+
54
+DEF_HELPER_FLAGS_5(sme_ld1d_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
56
+DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
55
+DEF_HELPER_FLAGS_5(sme_ld1d_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
57
+DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
56
+DEF_HELPER_FLAGS_5(sme_ld1d_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
58
+DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
57
+DEF_HELPER_FLAGS_5(sme_ld1d_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
59
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
58
+
59
+DEF_HELPER_FLAGS_5(sme_ld1q_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
60
+DEF_HELPER_FLAGS_5(sme_ld1q_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
61
+DEF_HELPER_FLAGS_5(sme_ld1q_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
62
+DEF_HELPER_FLAGS_5(sme_ld1q_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
63
+DEF_HELPER_FLAGS_5(sme_ld1q_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
64
+DEF_HELPER_FLAGS_5(sme_ld1q_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
65
+DEF_HELPER_FLAGS_5(sme_ld1q_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
66
+DEF_HELPER_FLAGS_5(sme_ld1q_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
67
+
68
+DEF_HELPER_FLAGS_5(sme_st1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
69
+DEF_HELPER_FLAGS_5(sme_st1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
70
+DEF_HELPER_FLAGS_5(sme_st1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
71
+DEF_HELPER_FLAGS_5(sme_st1b_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
72
+
73
+DEF_HELPER_FLAGS_5(sme_st1h_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
74
+DEF_HELPER_FLAGS_5(sme_st1h_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
75
+DEF_HELPER_FLAGS_5(sme_st1h_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
76
+DEF_HELPER_FLAGS_5(sme_st1h_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
77
+DEF_HELPER_FLAGS_5(sme_st1h_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
78
+DEF_HELPER_FLAGS_5(sme_st1h_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
79
+DEF_HELPER_FLAGS_5(sme_st1h_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
80
+DEF_HELPER_FLAGS_5(sme_st1h_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
81
+
82
+DEF_HELPER_FLAGS_5(sme_st1s_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
83
+DEF_HELPER_FLAGS_5(sme_st1s_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
84
+DEF_HELPER_FLAGS_5(sme_st1s_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
85
+DEF_HELPER_FLAGS_5(sme_st1s_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
86
+DEF_HELPER_FLAGS_5(sme_st1s_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
87
+DEF_HELPER_FLAGS_5(sme_st1s_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
88
+DEF_HELPER_FLAGS_5(sme_st1s_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
89
+DEF_HELPER_FLAGS_5(sme_st1s_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
90
+
91
+DEF_HELPER_FLAGS_5(sme_st1d_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
92
+DEF_HELPER_FLAGS_5(sme_st1d_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
93
+DEF_HELPER_FLAGS_5(sme_st1d_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
94
+DEF_HELPER_FLAGS_5(sme_st1d_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
95
+DEF_HELPER_FLAGS_5(sme_st1d_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
96
+DEF_HELPER_FLAGS_5(sme_st1d_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
97
+DEF_HELPER_FLAGS_5(sme_st1d_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
98
+DEF_HELPER_FLAGS_5(sme_st1d_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
99
+
100
+DEF_HELPER_FLAGS_5(sme_st1q_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
101
+DEF_HELPER_FLAGS_5(sme_st1q_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
102
+DEF_HELPER_FLAGS_5(sme_st1q_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
103
+DEF_HELPER_FLAGS_5(sme_st1q_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
104
+DEF_HELPER_FLAGS_5(sme_st1q_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
105
+DEF_HELPER_FLAGS_5(sme_st1q_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
106
+DEF_HELPER_FLAGS_5(sme_st1q_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
107
+DEF_HELPER_FLAGS_5(sme_st1q_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
108
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
60
index XXXXXXX..XXXXXXX 100644
109
index XXXXXXX..XXXXXXX 100644
61
--- a/target/arm/mve.decode
110
--- a/target/arm/sme.decode
62
+++ b/target/arm/mve.decode
111
+++ b/target/arm/sme.decode
112
@@ -XXX,XX +XXX,XX @@ MOVA 11000000 esz:2 00001 0 v:1 .. pg:3 0 za_imm:4 zr:5 \
113
&mova to_vec=1 rs=%mova_rs
114
MOVA 11000000 11 00001 1 v:1 .. pg:3 0 za_imm:4 zr:5 \
115
&mova to_vec=1 rs=%mova_rs esz=4
116
+
117
+### SME Memory
118
+
119
+&ldst esz rs pg rn rm za_imm v:bool st:bool
120
+
121
+LDST1 1110000 0 esz:2 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \
122
+ &ldst rs=%mova_rs
123
+LDST1 1110000 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \
124
+ &ldst esz=4 rs=%mova_rs
125
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
126
index XXXXXXX..XXXXXXX 100644
127
--- a/target/arm/sme_helper.c
128
+++ b/target/arm/sme_helper.c
63
@@ -XXX,XX +XXX,XX @@
129
@@ -XXX,XX +XXX,XX @@
64
&2shift qd qm shift size
130
65
&vidup qd rn size imm
131
#include "qemu/osdep.h"
66
&viwdup qd rn rm size imm
132
#include "cpu.h"
67
+&vcmp qm qn size mask
133
+#include "internals.h"
68
134
#include "tcg/tcg-gvec-desc.h"
69
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
135
#include "exec/helper-proto.h"
70
# Note that both Rn and Qd are 3 bits only (no D bit)
136
+#include "exec/cpu_ldst.h"
71
@@ -XXX,XX +XXX,XX @@
137
+#include "exec/exec-all.h"
72
@2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
138
#include "qemu/int128.h"
73
size=2 shift=%rshift_i5
139
#include "vec_internal.h"
74
140
+#include "sve_ldst_internal.h"
75
+# Vector comparison; 4-bit Qm but 3-bit Qn
141
76
+%mask_22_13 22:1 13:3
142
/* ResetSVEState */
77
+@vcmp .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
143
void arm_reset_sve_state(CPUARMState *env)
78
+
144
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_mova_zc_q)(void *vd, void *za, void *vg, uint32_t desc)
79
# Vector loads and stores
80
81
# Widening loads and narrowing stores:
82
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
83
}
145
}
84
146
85
# Predicate operations
147
#undef DO_MOVA_Z
86
-%mask_22_13 22:1 13:3
148
+
87
VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
149
+/*
88
150
+ * Clear elements in a tile slice comprising len bytes.
89
# Logical immediate operations (1 reg and modified-immediate)
151
+ */
90
@@ -XXX,XX +XXX,XX @@ VQRSHRUNT 111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
152
+
91
VQRSHRUNT 111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
153
+typedef void ClearFn(void *ptr, size_t off, size_t len);
92
154
+
93
VSHLC 111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
155
+static void clear_horizontal(void *ptr, size_t off, size_t len)
94
+
156
+{
95
+# Comparisons. We expand out the conditions which are split across
157
+ memset(ptr + off, 0, len);
96
+# encodings T1, T2, T3 and the fc bits. These include VPT, which is
158
+}
97
+# effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
159
+
98
+VCMPEQ 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
160
+static void clear_vertical_b(void *vptr, size_t off, size_t len)
99
+VCMPNE 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
161
+{
100
+VCMPCS 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
162
+ for (size_t i = 0; i < len; ++i) {
101
+VCMPHI 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
163
+ *(uint8_t *)(vptr + tile_vslice_offset(i + off)) = 0;
102
+VCMPGE 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
164
+ }
103
+VCMPLT 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
165
+}
104
+VCMPGT 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
166
+
105
+VCMPLE 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
167
+static void clear_vertical_h(void *vptr, size_t off, size_t len)
106
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
168
+{
169
+ for (size_t i = 0; i < len; i += 2) {
170
+ *(uint16_t *)(vptr + tile_vslice_offset(i + off)) = 0;
171
+ }
172
+}
173
+
174
+static void clear_vertical_s(void *vptr, size_t off, size_t len)
175
+{
176
+ for (size_t i = 0; i < len; i += 4) {
177
+ *(uint32_t *)(vptr + tile_vslice_offset(i + off)) = 0;
178
+ }
179
+}
180
+
181
+static void clear_vertical_d(void *vptr, size_t off, size_t len)
182
+{
183
+ for (size_t i = 0; i < len; i += 8) {
184
+ *(uint64_t *)(vptr + tile_vslice_offset(i + off)) = 0;
185
+ }
186
+}
187
+
188
+static void clear_vertical_q(void *vptr, size_t off, size_t len)
189
+{
190
+ for (size_t i = 0; i < len; i += 16) {
191
+ memset(vptr + tile_vslice_offset(i + off), 0, 16);
192
+ }
193
+}
194
+
195
+/*
196
+ * Copy elements from an array into a tile slice comprising len bytes.
197
+ */
198
+
199
+typedef void CopyFn(void *dst, const void *src, size_t len);
200
+
201
+static void copy_horizontal(void *dst, const void *src, size_t len)
202
+{
203
+ memcpy(dst, src, len);
204
+}
205
+
206
+static void copy_vertical_b(void *vdst, const void *vsrc, size_t len)
207
+{
208
+ const uint8_t *src = vsrc;
209
+ uint8_t *dst = vdst;
210
+ size_t i;
211
+
212
+ for (i = 0; i < len; ++i) {
213
+ dst[tile_vslice_index(i)] = src[i];
214
+ }
215
+}
216
+
217
+static void copy_vertical_h(void *vdst, const void *vsrc, size_t len)
218
+{
219
+ const uint16_t *src = vsrc;
220
+ uint16_t *dst = vdst;
221
+ size_t i;
222
+
223
+ for (i = 0; i < len / 2; ++i) {
224
+ dst[tile_vslice_index(i)] = src[i];
225
+ }
226
+}
227
+
228
+static void copy_vertical_s(void *vdst, const void *vsrc, size_t len)
229
+{
230
+ const uint32_t *src = vsrc;
231
+ uint32_t *dst = vdst;
232
+ size_t i;
233
+
234
+ for (i = 0; i < len / 4; ++i) {
235
+ dst[tile_vslice_index(i)] = src[i];
236
+ }
237
+}
238
+
239
+static void copy_vertical_d(void *vdst, const void *vsrc, size_t len)
240
+{
241
+ const uint64_t *src = vsrc;
242
+ uint64_t *dst = vdst;
243
+ size_t i;
244
+
245
+ for (i = 0; i < len / 8; ++i) {
246
+ dst[tile_vslice_index(i)] = src[i];
247
+ }
248
+}
249
+
250
+static void copy_vertical_q(void *vdst, const void *vsrc, size_t len)
251
+{
252
+ for (size_t i = 0; i < len; i += 16) {
253
+ memcpy(vdst + tile_vslice_offset(i), vsrc + i, 16);
254
+ }
255
+}
256
+
257
+/*
258
+ * Host and TLB primitives for vertical tile slice addressing.
259
+ */
260
+
261
+#define DO_LD(NAME, TYPE, HOST, TLB) \
262
+static inline void sme_##NAME##_v_host(void *za, intptr_t off, void *host) \
263
+{ \
264
+ TYPE val = HOST(host); \
265
+ *(TYPE *)(za + tile_vslice_offset(off)) = val; \
266
+} \
267
+static inline void sme_##NAME##_v_tlb(CPUARMState *env, void *za, \
268
+ intptr_t off, target_ulong addr, uintptr_t ra) \
269
+{ \
270
+ TYPE val = TLB(env, useronly_clean_ptr(addr), ra); \
271
+ *(TYPE *)(za + tile_vslice_offset(off)) = val; \
272
+}
273
+
274
+#define DO_ST(NAME, TYPE, HOST, TLB) \
275
+static inline void sme_##NAME##_v_host(void *za, intptr_t off, void *host) \
276
+{ \
277
+ TYPE val = *(TYPE *)(za + tile_vslice_offset(off)); \
278
+ HOST(host, val); \
279
+} \
280
+static inline void sme_##NAME##_v_tlb(CPUARMState *env, void *za, \
281
+ intptr_t off, target_ulong addr, uintptr_t ra) \
282
+{ \
283
+ TYPE val = *(TYPE *)(za + tile_vslice_offset(off)); \
284
+ TLB(env, useronly_clean_ptr(addr), val, ra); \
285
+}
286
+
287
+/*
288
+ * The ARMVectorReg elements are stored in host-endian 64-bit units.
289
+ * For 128-bit quantities, the sequence defined by the Elem[] pseudocode
290
+ * corresponds to storing the two 64-bit pieces in little-endian order.
291
+ */
292
+#define DO_LDQ(HNAME, VNAME, BE, HOST, TLB) \
293
+static inline void HNAME##_host(void *za, intptr_t off, void *host) \
294
+{ \
295
+ uint64_t val0 = HOST(host), val1 = HOST(host + 8); \
296
+ uint64_t *ptr = za + off; \
297
+ ptr[0] = BE ? val1 : val0, ptr[1] = BE ? val0 : val1; \
298
+} \
299
+static inline void VNAME##_v_host(void *za, intptr_t off, void *host) \
300
+{ \
301
+ HNAME##_host(za, tile_vslice_offset(off), host); \
302
+} \
303
+static inline void HNAME##_tlb(CPUARMState *env, void *za, intptr_t off, \
304
+ target_ulong addr, uintptr_t ra) \
305
+{ \
306
+ uint64_t val0 = TLB(env, useronly_clean_ptr(addr), ra); \
307
+ uint64_t val1 = TLB(env, useronly_clean_ptr(addr + 8), ra); \
308
+ uint64_t *ptr = za + off; \
309
+ ptr[0] = BE ? val1 : val0, ptr[1] = BE ? val0 : val1; \
310
+} \
311
+static inline void VNAME##_v_tlb(CPUARMState *env, void *za, intptr_t off, \
312
+ target_ulong addr, uintptr_t ra) \
313
+{ \
314
+ HNAME##_tlb(env, za, tile_vslice_offset(off), addr, ra); \
315
+}
316
+
317
+#define DO_STQ(HNAME, VNAME, BE, HOST, TLB) \
318
+static inline void HNAME##_host(void *za, intptr_t off, void *host) \
319
+{ \
320
+ uint64_t *ptr = za + off; \
321
+ HOST(host, ptr[BE]); \
322
+ HOST(host + 1, ptr[!BE]); \
323
+} \
324
+static inline void VNAME##_v_host(void *za, intptr_t off, void *host) \
325
+{ \
326
+ HNAME##_host(za, tile_vslice_offset(off), host); \
327
+} \
328
+static inline void HNAME##_tlb(CPUARMState *env, void *za, intptr_t off, \
329
+ target_ulong addr, uintptr_t ra) \
330
+{ \
331
+ uint64_t *ptr = za + off; \
332
+ TLB(env, useronly_clean_ptr(addr), ptr[BE], ra); \
333
+ TLB(env, useronly_clean_ptr(addr + 8), ptr[!BE], ra); \
334
+} \
335
+static inline void VNAME##_v_tlb(CPUARMState *env, void *za, intptr_t off, \
336
+ target_ulong addr, uintptr_t ra) \
337
+{ \
338
+ HNAME##_tlb(env, za, tile_vslice_offset(off), addr, ra); \
339
+}
340
+
341
+DO_LD(ld1b, uint8_t, ldub_p, cpu_ldub_data_ra)
342
+DO_LD(ld1h_be, uint16_t, lduw_be_p, cpu_lduw_be_data_ra)
343
+DO_LD(ld1h_le, uint16_t, lduw_le_p, cpu_lduw_le_data_ra)
344
+DO_LD(ld1s_be, uint32_t, ldl_be_p, cpu_ldl_be_data_ra)
345
+DO_LD(ld1s_le, uint32_t, ldl_le_p, cpu_ldl_le_data_ra)
346
+DO_LD(ld1d_be, uint64_t, ldq_be_p, cpu_ldq_be_data_ra)
347
+DO_LD(ld1d_le, uint64_t, ldq_le_p, cpu_ldq_le_data_ra)
348
+
349
+DO_LDQ(sve_ld1qq_be, sme_ld1q_be, 1, ldq_be_p, cpu_ldq_be_data_ra)
350
+DO_LDQ(sve_ld1qq_le, sme_ld1q_le, 0, ldq_le_p, cpu_ldq_le_data_ra)
351
+
352
+DO_ST(st1b, uint8_t, stb_p, cpu_stb_data_ra)
353
+DO_ST(st1h_be, uint16_t, stw_be_p, cpu_stw_be_data_ra)
354
+DO_ST(st1h_le, uint16_t, stw_le_p, cpu_stw_le_data_ra)
355
+DO_ST(st1s_be, uint32_t, stl_be_p, cpu_stl_be_data_ra)
356
+DO_ST(st1s_le, uint32_t, stl_le_p, cpu_stl_le_data_ra)
357
+DO_ST(st1d_be, uint64_t, stq_be_p, cpu_stq_be_data_ra)
358
+DO_ST(st1d_le, uint64_t, stq_le_p, cpu_stq_le_data_ra)
359
+
360
+DO_STQ(sve_st1qq_be, sme_st1q_be, 1, stq_be_p, cpu_stq_be_data_ra)
361
+DO_STQ(sve_st1qq_le, sme_st1q_le, 0, stq_le_p, cpu_stq_le_data_ra)
362
+
363
+#undef DO_LD
364
+#undef DO_ST
365
+#undef DO_LDQ
366
+#undef DO_STQ
367
+
368
+/*
369
+ * Common helper for all contiguous predicated loads.
370
+ */
371
+
372
+static inline QEMU_ALWAYS_INLINE
373
+void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
374
+ const target_ulong addr, uint32_t desc, const uintptr_t ra,
375
+ const int esz, uint32_t mtedesc, bool vertical,
376
+ sve_ldst1_host_fn *host_fn,
377
+ sve_ldst1_tlb_fn *tlb_fn,
378
+ ClearFn *clr_fn,
379
+ CopyFn *cpy_fn)
380
+{
381
+ const intptr_t reg_max = simd_oprsz(desc);
382
+ const intptr_t esize = 1 << esz;
383
+ intptr_t reg_off, reg_last;
384
+ SVEContLdSt info;
385
+ void *host;
386
+ int flags;
387
+
388
+ /* Find the active elements. */
389
+ if (!sve_cont_ldst_elements(&info, addr, vg, reg_max, esz, esize)) {
390
+ /* The entire predicate was false; no load occurs. */
391
+ clr_fn(za, 0, reg_max);
392
+ return;
393
+ }
394
+
395
+ /* Probe the page(s). Exit with exception for any invalid page. */
396
+ sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_LOAD, ra);
397
+
398
+ /* Handle watchpoints for all active elements. */
399
+ sve_cont_ldst_watchpoints(&info, env, vg, addr, esize, esize,
400
+ BP_MEM_READ, ra);
401
+
402
+ /*
403
+ * Handle mte checks for all active elements.
404
+ * Since TBI must be set for MTE, !mtedesc => !mte_active.
405
+ */
406
+ if (mtedesc) {
407
+ sve_cont_ldst_mte_check(&info, env, vg, addr, esize, esize,
408
+ mtedesc, ra);
409
+ }
410
+
411
+ flags = info.page[0].flags | info.page[1].flags;
412
+ if (unlikely(flags != 0)) {
413
+#ifdef CONFIG_USER_ONLY
414
+ g_assert_not_reached();
415
+#else
416
+ /*
417
+ * At least one page includes MMIO.
418
+ * Any bus operation can fail with cpu_transaction_failed,
419
+ * which for ARM will raise SyncExternal. Perform the load
420
+ * into scratch memory to preserve register state until the end.
421
+ */
422
+ ARMVectorReg scratch = { };
423
+
424
+ reg_off = info.reg_off_first[0];
425
+ reg_last = info.reg_off_last[1];
426
+ if (reg_last < 0) {
427
+ reg_last = info.reg_off_split;
428
+ if (reg_last < 0) {
429
+ reg_last = info.reg_off_last[0];
430
+ }
431
+ }
432
+
433
+ do {
434
+ uint64_t pg = vg[reg_off >> 6];
435
+ do {
436
+ if ((pg >> (reg_off & 63)) & 1) {
437
+ tlb_fn(env, &scratch, reg_off, addr + reg_off, ra);
438
+ }
439
+ reg_off += esize;
440
+ } while (reg_off & 63);
441
+ } while (reg_off <= reg_last);
442
+
443
+ cpy_fn(za, &scratch, reg_max);
444
+ return;
445
+#endif
446
+ }
447
+
448
+ /* The entire operation is in RAM, on valid pages. */
449
+
450
+ reg_off = info.reg_off_first[0];
451
+ reg_last = info.reg_off_last[0];
452
+ host = info.page[0].host;
453
+
454
+ if (!vertical) {
455
+ memset(za, 0, reg_max);
456
+ } else if (reg_off) {
457
+ clr_fn(za, 0, reg_off);
458
+ }
459
+
460
+ while (reg_off <= reg_last) {
461
+ uint64_t pg = vg[reg_off >> 6];
462
+ do {
463
+ if ((pg >> (reg_off & 63)) & 1) {
464
+ host_fn(za, reg_off, host + reg_off);
465
+ } else if (vertical) {
466
+ clr_fn(za, reg_off, esize);
467
+ }
468
+ reg_off += esize;
469
+ } while (reg_off <= reg_last && (reg_off & 63));
470
+ }
471
+
472
+ /*
473
+ * Use the slow path to manage the cross-page misalignment.
474
+ * But we know this is RAM and cannot trap.
475
+ */
476
+ reg_off = info.reg_off_split;
477
+ if (unlikely(reg_off >= 0)) {
478
+ tlb_fn(env, za, reg_off, addr + reg_off, ra);
479
+ }
480
+
481
+ reg_off = info.reg_off_first[1];
482
+ if (unlikely(reg_off >= 0)) {
483
+ reg_last = info.reg_off_last[1];
484
+ host = info.page[1].host;
485
+
486
+ do {
487
+ uint64_t pg = vg[reg_off >> 6];
488
+ do {
489
+ if ((pg >> (reg_off & 63)) & 1) {
490
+ host_fn(za, reg_off, host + reg_off);
491
+ } else if (vertical) {
492
+ clr_fn(za, reg_off, esize);
493
+ }
494
+ reg_off += esize;
495
+ } while (reg_off & 63);
496
+ } while (reg_off <= reg_last);
497
+ }
498
+}
499
+
500
+static inline QEMU_ALWAYS_INLINE
501
+void sme_ld1_mte(CPUARMState *env, void *za, uint64_t *vg,
502
+ target_ulong addr, uint32_t desc, uintptr_t ra,
503
+ const int esz, bool vertical,
504
+ sve_ldst1_host_fn *host_fn,
505
+ sve_ldst1_tlb_fn *tlb_fn,
506
+ ClearFn *clr_fn,
507
+ CopyFn *cpy_fn)
508
+{
509
+ uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
510
+ int bit55 = extract64(addr, 55, 1);
511
+
512
+ /* Remove mtedesc from the normal sve descriptor. */
513
+ desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
514
+
515
+ /* Perform gross MTE suppression early. */
516
+ if (!tbi_check(desc, bit55) ||
517
+ tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
518
+ mtedesc = 0;
519
+ }
520
+
521
+ sme_ld1(env, za, vg, addr, desc, ra, esz, mtedesc, vertical,
522
+ host_fn, tlb_fn, clr_fn, cpy_fn);
523
+}
524
+
525
+#define DO_LD(L, END, ESZ) \
526
+void HELPER(sme_ld1##L##END##_h)(CPUARMState *env, void *za, void *vg, \
527
+ target_ulong addr, uint32_t desc) \
528
+{ \
529
+ sme_ld1(env, za, vg, addr, desc, GETPC(), ESZ, 0, false, \
530
+ sve_ld1##L##L##END##_host, sve_ld1##L##L##END##_tlb, \
531
+ clear_horizontal, copy_horizontal); \
532
+} \
533
+void HELPER(sme_ld1##L##END##_v)(CPUARMState *env, void *za, void *vg, \
534
+ target_ulong addr, uint32_t desc) \
535
+{ \
536
+ sme_ld1(env, za, vg, addr, desc, GETPC(), ESZ, 0, true, \
537
+ sme_ld1##L##END##_v_host, sme_ld1##L##END##_v_tlb, \
538
+ clear_vertical_##L, copy_vertical_##L); \
539
+} \
540
+void HELPER(sme_ld1##L##END##_h_mte)(CPUARMState *env, void *za, void *vg, \
541
+ target_ulong addr, uint32_t desc) \
542
+{ \
543
+ sme_ld1_mte(env, za, vg, addr, desc, GETPC(), ESZ, false, \
544
+ sve_ld1##L##L##END##_host, sve_ld1##L##L##END##_tlb, \
545
+ clear_horizontal, copy_horizontal); \
546
+} \
547
+void HELPER(sme_ld1##L##END##_v_mte)(CPUARMState *env, void *za, void *vg, \
548
+ target_ulong addr, uint32_t desc) \
549
+{ \
550
+ sme_ld1_mte(env, za, vg, addr, desc, GETPC(), ESZ, true, \
551
+ sme_ld1##L##END##_v_host, sme_ld1##L##END##_v_tlb, \
552
+ clear_vertical_##L, copy_vertical_##L); \
553
+}
554
+
555
+DO_LD(b, , MO_8)
556
+DO_LD(h, _be, MO_16)
557
+DO_LD(h, _le, MO_16)
558
+DO_LD(s, _be, MO_32)
559
+DO_LD(s, _le, MO_32)
560
+DO_LD(d, _be, MO_64)
561
+DO_LD(d, _le, MO_64)
562
+DO_LD(q, _be, MO_128)
563
+DO_LD(q, _le, MO_128)
564
+
565
+#undef DO_LD
566
+
567
+/*
568
+ * Common helper for all contiguous predicated stores.
569
+ */
570
+
571
+static inline QEMU_ALWAYS_INLINE
572
+void sme_st1(CPUARMState *env, void *za, uint64_t *vg,
573
+ const target_ulong addr, uint32_t desc, const uintptr_t ra,
574
+ const int esz, uint32_t mtedesc, bool vertical,
575
+ sve_ldst1_host_fn *host_fn,
576
+ sve_ldst1_tlb_fn *tlb_fn)
577
+{
578
+ const intptr_t reg_max = simd_oprsz(desc);
579
+ const intptr_t esize = 1 << esz;
580
+ intptr_t reg_off, reg_last;
581
+ SVEContLdSt info;
582
+ void *host;
583
+ int flags;
584
+
585
+ /* Find the active elements. */
586
+ if (!sve_cont_ldst_elements(&info, addr, vg, reg_max, esz, esize)) {
587
+ /* The entire predicate was false; no store occurs. */
588
+ return;
589
+ }
590
+
591
+ /* Probe the page(s). Exit with exception for any invalid page. */
592
+ sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_STORE, ra);
593
+
594
+ /* Handle watchpoints for all active elements. */
595
+ sve_cont_ldst_watchpoints(&info, env, vg, addr, esize, esize,
596
+ BP_MEM_WRITE, ra);
597
+
598
+ /*
599
+ * Handle mte checks for all active elements.
600
+ * Since TBI must be set for MTE, !mtedesc => !mte_active.
601
+ */
602
+ if (mtedesc) {
603
+ sve_cont_ldst_mte_check(&info, env, vg, addr, esize, esize,
604
+ mtedesc, ra);
605
+ }
606
+
607
+ flags = info.page[0].flags | info.page[1].flags;
608
+ if (unlikely(flags != 0)) {
609
+#ifdef CONFIG_USER_ONLY
610
+ g_assert_not_reached();
611
+#else
612
+ /*
613
+ * At least one page includes MMIO.
614
+ * Any bus operation can fail with cpu_transaction_failed,
615
+ * which for ARM will raise SyncExternal. We cannot avoid
616
+ * this fault and will leave with the store incomplete.
617
+ */
618
+ reg_off = info.reg_off_first[0];
619
+ reg_last = info.reg_off_last[1];
620
+ if (reg_last < 0) {
621
+ reg_last = info.reg_off_split;
622
+ if (reg_last < 0) {
623
+ reg_last = info.reg_off_last[0];
624
+ }
625
+ }
626
+
627
+ do {
628
+ uint64_t pg = vg[reg_off >> 6];
629
+ do {
630
+ if ((pg >> (reg_off & 63)) & 1) {
631
+ tlb_fn(env, za, reg_off, addr + reg_off, ra);
632
+ }
633
+ reg_off += esize;
634
+ } while (reg_off & 63);
635
+ } while (reg_off <= reg_last);
636
+ return;
637
+#endif
638
+ }
639
+
640
+ reg_off = info.reg_off_first[0];
641
+ reg_last = info.reg_off_last[0];
642
+ host = info.page[0].host;
643
+
644
+ while (reg_off <= reg_last) {
645
+ uint64_t pg = vg[reg_off >> 6];
646
+ do {
647
+ if ((pg >> (reg_off & 63)) & 1) {
648
+ host_fn(za, reg_off, host + reg_off);
649
+ }
650
+ reg_off += 1 << esz;
651
+ } while (reg_off <= reg_last && (reg_off & 63));
652
+ }
653
+
654
+ /*
655
+ * Use the slow path to manage the cross-page misalignment.
656
+ * But we know this is RAM and cannot trap.
657
+ */
658
+ reg_off = info.reg_off_split;
659
+ if (unlikely(reg_off >= 0)) {
660
+ tlb_fn(env, za, reg_off, addr + reg_off, ra);
661
+ }
662
+
663
+ reg_off = info.reg_off_first[1];
664
+ if (unlikely(reg_off >= 0)) {
665
+ reg_last = info.reg_off_last[1];
666
+ host = info.page[1].host;
667
+
668
+ do {
669
+ uint64_t pg = vg[reg_off >> 6];
670
+ do {
671
+ if ((pg >> (reg_off & 63)) & 1) {
672
+ host_fn(za, reg_off, host + reg_off);
673
+ }
674
+ reg_off += 1 << esz;
675
+ } while (reg_off & 63);
676
+ } while (reg_off <= reg_last);
677
+ }
678
+}
679
+
680
+static inline QEMU_ALWAYS_INLINE
681
+void sme_st1_mte(CPUARMState *env, void *za, uint64_t *vg, target_ulong addr,
682
+ uint32_t desc, uintptr_t ra, int esz, bool vertical,
683
+ sve_ldst1_host_fn *host_fn,
684
+ sve_ldst1_tlb_fn *tlb_fn)
685
+{
686
+ uint32_t mtedesc = desc >> (SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
687
+ int bit55 = extract64(addr, 55, 1);
688
+
689
+ /* Remove mtedesc from the normal sve descriptor. */
690
+ desc = extract32(desc, 0, SIMD_DATA_SHIFT + SVE_MTEDESC_SHIFT);
691
+
692
+ /* Perform gross MTE suppression early. */
693
+ if (!tbi_check(desc, bit55) ||
694
+ tcma_check(desc, bit55, allocation_tag_from_addr(addr))) {
695
+ mtedesc = 0;
696
+ }
697
+
698
+ sme_st1(env, za, vg, addr, desc, ra, esz, mtedesc,
699
+ vertical, host_fn, tlb_fn);
700
+}
701
+
702
+#define DO_ST(L, END, ESZ) \
703
+void HELPER(sme_st1##L##END##_h)(CPUARMState *env, void *za, void *vg, \
704
+ target_ulong addr, uint32_t desc) \
705
+{ \
706
+ sme_st1(env, za, vg, addr, desc, GETPC(), ESZ, 0, false, \
707
+ sve_st1##L##L##END##_host, sve_st1##L##L##END##_tlb); \
708
+} \
709
+void HELPER(sme_st1##L##END##_v)(CPUARMState *env, void *za, void *vg, \
710
+ target_ulong addr, uint32_t desc) \
711
+{ \
712
+ sme_st1(env, za, vg, addr, desc, GETPC(), ESZ, 0, true, \
713
+ sme_st1##L##END##_v_host, sme_st1##L##END##_v_tlb); \
714
+} \
715
+void HELPER(sme_st1##L##END##_h_mte)(CPUARMState *env, void *za, void *vg, \
716
+ target_ulong addr, uint32_t desc) \
717
+{ \
718
+ sme_st1_mte(env, za, vg, addr, desc, GETPC(), ESZ, false, \
719
+ sve_st1##L##L##END##_host, sve_st1##L##L##END##_tlb); \
720
+} \
721
+void HELPER(sme_st1##L##END##_v_mte)(CPUARMState *env, void *za, void *vg, \
722
+ target_ulong addr, uint32_t desc) \
723
+{ \
724
+ sme_st1_mte(env, za, vg, addr, desc, GETPC(), ESZ, true, \
725
+ sme_st1##L##END##_v_host, sme_st1##L##END##_v_tlb); \
726
+}
727
+
728
+DO_ST(b, , MO_8)
729
+DO_ST(h, _be, MO_16)
730
+DO_ST(h, _le, MO_16)
731
+DO_ST(s, _be, MO_32)
732
+DO_ST(s, _le, MO_32)
733
+DO_ST(d, _be, MO_64)
734
+DO_ST(d, _le, MO_64)
735
+DO_ST(q, _be, MO_128)
736
+DO_ST(q, _le, MO_128)
737
+
738
+#undef DO_ST
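
(Aside, not part of the patch: the load/store helpers above all share the same
predicate-walking idiom. One predicate bit governs each byte of the vector, so
a single uint64_t of predicate bits covers 64 bytes of vector data and only the
low bit of each element's bit group is tested. A minimal standalone C sketch of
that walk; walk_active() is an invented name used only for illustration.)

#include <stdint.h>

/* Invoke op() for every active element; esize is the element size in bytes. */
static void walk_active(const uint64_t *vg, intptr_t reg_max, int esize,
                        void (*op)(intptr_t reg_off))
{
    intptr_t reg_off = 0;
    while (reg_off < reg_max) {
        uint64_t pg = vg[reg_off >> 6];         /* next 64 predicate bits */
        do {
            if ((pg >> (reg_off & 63)) & 1) {   /* low bit of this element's group */
                op(reg_off);
            }
            reg_off += esize;
        } while ((reg_off & 63) && reg_off < reg_max);
    }
}

(The inner do/while stops at each 64-byte boundary so the next pass can fetch
the next 64 predicate bits.)
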
739
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
107
index XXXXXXX..XXXXXXX 100644
740
index XXXXXXX..XXXXXXX 100644
108
--- a/target/arm/mve_helper.c
741
--- a/target/arm/translate-sme.c
109
+++ b/target/arm/mve_helper.c
742
+++ b/target/arm/translate-sme.c
110
@@ -XXX,XX +XXX,XX @@ static uint32_t do_sub_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
743
@@ -XXX,XX +XXX,XX @@ static bool trans_MOVA(DisasContext *s, arg_MOVA *a)
111
DO_VIDUP_ALL(vidup, DO_ADD)
744
112
DO_VIWDUP_ALL(viwdup, do_add_wrap)
745
return true;
113
DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
114
+
115
+/*
116
+ * Vector comparison.
117
+ * P0 bits for non-executed beats (where eci_mask is 0) are unchanged.
118
+ * P0 bits for predicated lanes in executed beats (where mask is 0) are 0.
119
+ * P0 bits otherwise are updated with the results of the comparisons.
120
+ * We must also keep unchanged the MASK fields at the top of v7m.vpr.
121
+ */
122
+#define DO_VCMP(OP, ESIZE, TYPE, FN) \
123
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, void *vm) \
124
+ { \
125
+ TYPE *n = vn, *m = vm; \
126
+ uint16_t mask = mve_element_mask(env); \
127
+ uint16_t eci_mask = mve_eci_mask(env); \
128
+ uint16_t beatpred = 0; \
129
+ uint16_t emask = MAKE_64BIT_MASK(0, ESIZE); \
130
+ unsigned e; \
131
+ for (e = 0; e < 16 / ESIZE; e++) { \
132
+ bool r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)]); \
133
+ /* Comparison sets 0/1 bits for each byte in the element */ \
134
+ beatpred |= r * emask; \
135
+ emask <<= ESIZE; \
136
+ } \
137
+ beatpred &= mask; \
138
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | \
139
+ (beatpred & eci_mask); \
140
+ mve_advance_vpt(env); \
141
+ }
142
+
143
+#define DO_VCMP_S(OP, FN) \
144
+ DO_VCMP(OP##b, 1, int8_t, FN) \
145
+ DO_VCMP(OP##h, 2, int16_t, FN) \
146
+ DO_VCMP(OP##w, 4, int32_t, FN)
147
+
148
+#define DO_VCMP_U(OP, FN) \
149
+ DO_VCMP(OP##b, 1, uint8_t, FN) \
150
+ DO_VCMP(OP##h, 2, uint16_t, FN) \
151
+ DO_VCMP(OP##w, 4, uint32_t, FN)
152
+
153
+#define DO_EQ(N, M) ((N) == (M))
154
+#define DO_NE(N, M) ((N) != (M))
155
157
+#define DO_GE(N, M) ((N) >= (M))
158
+#define DO_LT(N, M) ((N) < (M))
159
+#define DO_GT(N, M) ((N) > (M))
160
+#define DO_LE(N, M) ((N) <= (M))
161
+
162
+DO_VCMP_U(vcmpeq, DO_EQ)
163
+DO_VCMP_U(vcmpne, DO_NE)
164
+DO_VCMP_U(vcmpcs, DO_GE)
165
+DO_VCMP_U(vcmphi, DO_GT)
166
+DO_VCMP_S(vcmpge, DO_GE)
167
+DO_VCMP_S(vcmplt, DO_LT)
168
+DO_VCMP_S(vcmpgt, DO_GT)
169
+DO_VCMP_S(vcmple, DO_LE)
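
(Aside, not part of the patch: how the three masks combine when P0 is written
back is easier to see in isolation. A standalone sketch with made-up mask
values; update_p0() is an invented name.)

#include <stdint.h>
#include <stdio.h>

/* Fold the raw comparison results into the new P0 value. */
static uint16_t update_p0(uint16_t old_p0, uint16_t beatpred,
                          uint16_t mask, uint16_t eci_mask)
{
    beatpred &= mask;                                    /* predicated lanes read as 0 */
    return (old_p0 & ~eci_mask) | (beatpred & eci_mask); /* skipped beats keep old bits */
}

int main(void)
{
    /*
     * Beat 0 is skipped by ECI, so its four P0 bits (0xf here) survive;
     * every executed lane is unpredicated and compares true.
     */
    printf("0x%04x\n", update_p0(0x000f, 0xffff, 0xfff0, 0xfff0)); /* 0xffff */
    return 0;
}
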
170
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
171
index XXXXXXX..XXXXXXX 100644
172
--- a/target/arm/translate-mve.c
173
+++ b/target/arm/translate-mve.c
174
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
175
typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
176
typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
177
typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
178
+typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
179
180
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
181
static inline long mve_qreg_offset(unsigned reg)
182
@@ -XXX,XX +XXX,XX @@ static bool trans_VDWDUP(DisasContext *s, arg_viwdup *a)
183
};
184
return do_viwdup(s, a, fns[a->size]);
185
}
746
}
186
+
747
+
187
+static bool do_vcmp(DisasContext *s, arg_vcmp *a, MVEGenCmpFn *fn)
748
+static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
188
+{
749
+{
189
+ TCGv_ptr qn, qm;
750
+ typedef void GenLdSt1(TCGv_env, TCGv_ptr, TCGv_ptr, TCGv, TCGv_i32);
190
+
751
+
191
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qm) ||
752
+ /*
192
+ !fn) {
753
+ * Indexed by [esz][be][v][mte][st], which is (except for load/store)
754
+ * also the order in which the elements appear in the function names,
755
+ * and so how we must concatenate the pieces.
756
+ */
757
+
758
+#define FN_LS(F) { gen_helper_sme_ld1##F, gen_helper_sme_st1##F }
759
+#define FN_MTE(F) { FN_LS(F), FN_LS(F##_mte) }
760
+#define FN_HV(F) { FN_MTE(F##_h), FN_MTE(F##_v) }
761
+#define FN_END(L, B) { FN_HV(L), FN_HV(B) }
762
+
763
+ static GenLdSt1 * const fns[5][2][2][2][2] = {
764
+ FN_END(b, b),
765
+ FN_END(h_le, h_be),
766
+ FN_END(s_le, s_be),
767
+ FN_END(d_le, d_be),
768
+ FN_END(q_le, q_be),
769
+ };
770
+
771
+#undef FN_LS
772
+#undef FN_MTE
773
+#undef FN_HV
774
+#undef FN_END
775
+
776
+ TCGv_ptr t_za, t_pg;
777
+ TCGv_i64 addr;
778
+ int svl, desc = 0;
779
+ bool be = s->be_data == MO_BE;
780
+ bool mte = s->mte_active[0];
781
+
782
+ if (!dc_isar_feature(aa64_sme, s)) {
193
+ return false;
783
+ return false;
194
+ }
784
+ }
195
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
785
+ if (!sme_smza_enabled_check(s)) {
196
+ return true;
786
+ return true;
197
+ }
787
+ }
198
+
788
+
199
+ qn = mve_qreg_ptr(a->qn);
789
+ t_za = get_tile_rowcol(s, a->esz, a->rs, a->za_imm, a->v);
200
+ qm = mve_qreg_ptr(a->qm);
790
+ t_pg = pred_full_reg_ptr(s, a->pg);
201
+ fn(cpu_env, qn, qm);
791
+ addr = tcg_temp_new_i64();
202
+ tcg_temp_free_ptr(qn);
792
+
203
+ tcg_temp_free_ptr(qm);
793
+ tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
204
+ if (a->mask) {
794
+ tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
205
+ /* VPT */
795
+
206
+ gen_vpst(s, a->mask);
796
+ if (mte) {
207
+ }
797
+ desc = FIELD_DP32(desc, MTEDESC, MIDX, get_mem_index(s));
208
+ mve_update_eci(s);
798
+ desc = FIELD_DP32(desc, MTEDESC, TBI, s->tbid);
799
+ desc = FIELD_DP32(desc, MTEDESC, TCMA, s->tcma);
800
+ desc = FIELD_DP32(desc, MTEDESC, WRITE, a->st);
801
+ desc = FIELD_DP32(desc, MTEDESC, SIZEM1, (1 << a->esz) - 1);
802
+ desc <<= SVE_MTEDESC_SHIFT;
803
+ } else {
804
+ addr = clean_data_tbi(s, addr);
805
+ }
806
+ svl = streaming_vec_reg_size(s);
807
+ desc = simd_desc(svl, svl, desc);
808
+
809
+ fns[a->esz][be][a->v][mte][a->st](cpu_env, t_za, t_pg, addr,
810
+ tcg_constant_i32(desc));
811
+
812
+ tcg_temp_free_ptr(t_za);
813
+ tcg_temp_free_ptr(t_pg);
814
+ tcg_temp_free_i64(addr);
209
+ return true;
815
+ return true;
210
+}
816
+}
211
+
212
+#define DO_VCMP(INSN, FN) \
213
+ static bool trans_##INSN(DisasContext *s, arg_vcmp *a) \
214
+ { \
215
+ static MVEGenCmpFn * const fns[] = { \
216
+ gen_helper_mve_##FN##b, \
217
+ gen_helper_mve_##FN##h, \
218
+ gen_helper_mve_##FN##w, \
219
+ NULL, \
220
+ }; \
221
+ return do_vcmp(s, a, fns[a->size]); \
222
+ }
223
+
224
+DO_VCMP(VCMPEQ, vcmpeq)
225
+DO_VCMP(VCMPNE, vcmpne)
226
+DO_VCMP(VCMPCS, vcmpcs)
227
+DO_VCMP(VCMPHI, vcmphi)
228
+DO_VCMP(VCMPGE, vcmpge)
229
+DO_VCMP(VCMPLT, vcmplt)
230
+DO_VCMP(VCMPGT, vcmpgt)
231
+DO_VCMP(VCMPLE, vcmple)
232
--
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Add a TCGv_ptr base argument, which will be cpu_env for SVE.
We will reuse this for SME save and restore array insns.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220708151540.18136-22-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/translate-a64.h | 3 +++
target/arm/translate-sve.c | 48 ++++++++++++++++++++++++++++----------
2 files changed, 39 insertions(+), 12 deletions(-)

Factor out the "generate code to update VPR.MASK01/MASK23" part of
trans_VPST(); we are going to want to reuse it for the VPT insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-mve.c | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)
9
14
10
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
15
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
11
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
12
--- a/target/arm/translate-mve.c
17
--- a/target/arm/translate-a64.h
13
+++ b/target/arm/translate-mve.c
18
+++ b/target/arm/translate-a64.h
14
@@ -XXX,XX +XXX,XX @@ static bool trans_VRMLSLDAVH(DisasContext *s, arg_vmlaldav *a)
19
@@ -XXX,XX +XXX,XX @@ void gen_gvec_xar(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
15
return do_long_dual_acc(s, a, fns[a->x]);
20
uint32_t rm_ofs, int64_t shift,
21
uint32_t opr_sz, uint32_t max_sz);
22
23
+void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm);
24
+void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int imm);
25
+
26
#endif /* TARGET_ARM_TRANSLATE_A64_H */
27
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
28
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/translate-sve.c
30
+++ b/target/arm/translate-sve.c
31
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(UCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz,
32
* The load should begin at the address Rn + IMM.
33
*/
34
35
-static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
36
+void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs,
37
+ int len, int rn, int imm)
38
{
39
int len_align = QEMU_ALIGN_DOWN(len, 8);
40
int len_remain = len % 8;
41
@@ -XXX,XX +XXX,XX @@ static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
42
t0 = tcg_temp_new_i64();
43
for (i = 0; i < len_align; i += 8) {
44
tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ);
45
- tcg_gen_st_i64(t0, cpu_env, vofs + i);
46
+ tcg_gen_st_i64(t0, base, vofs + i);
47
tcg_gen_addi_i64(clean_addr, clean_addr, 8);
48
}
49
tcg_temp_free_i64(t0);
50
@@ -XXX,XX +XXX,XX @@ static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
51
clean_addr = new_tmp_a64_local(s);
52
tcg_gen_mov_i64(clean_addr, t0);
53
54
+ if (base != cpu_env) {
55
+ TCGv_ptr b = tcg_temp_local_new_ptr();
56
+ tcg_gen_mov_ptr(b, base);
57
+ base = b;
58
+ }
59
+
60
gen_set_label(loop);
61
62
t0 = tcg_temp_new_i64();
63
@@ -XXX,XX +XXX,XX @@ static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
64
tcg_gen_addi_i64(clean_addr, clean_addr, 8);
65
66
tp = tcg_temp_new_ptr();
67
- tcg_gen_add_ptr(tp, cpu_env, i);
68
+ tcg_gen_add_ptr(tp, base, i);
69
tcg_gen_addi_ptr(i, i, 8);
70
tcg_gen_st_i64(t0, tp, vofs);
71
tcg_temp_free_ptr(tp);
72
@@ -XXX,XX +XXX,XX @@ static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
73
74
tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop);
75
tcg_temp_free_ptr(i);
76
+
77
+ if (base != cpu_env) {
78
+ tcg_temp_free_ptr(base);
79
+ assert(len_remain == 0);
80
+ }
81
}
82
83
/*
84
@@ -XXX,XX +XXX,XX @@ static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
85
default:
86
g_assert_not_reached();
87
}
88
- tcg_gen_st_i64(t0, cpu_env, vofs + len_align);
89
+ tcg_gen_st_i64(t0, base, vofs + len_align);
90
tcg_temp_free_i64(t0);
91
}
16
}
92
}
17
93
18
-static bool trans_VPST(DisasContext *s, arg_VPST *a)
94
/* Similarly for stores. */
19
+static void gen_vpst(DisasContext *s, uint32_t mask)
95
-static void do_str(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
96
+void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
97
+ int len, int rn, int imm)
20
{
98
{
21
- TCGv_i32 vpr;
99
int len_align = QEMU_ALIGN_DOWN(len, 8);
22
-
100
int len_remain = len % 8;
23
- /* mask == 0 is a "related encoding" */
101
@@ -XXX,XX +XXX,XX @@ static void do_str(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
24
- if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
102
25
- return false;
103
t0 = tcg_temp_new_i64();
26
- }
104
for (i = 0; i < len_align; i += 8) {
27
- if (!mve_eci_check(s) || !vfp_access_check(s)) {
105
- tcg_gen_ld_i64(t0, cpu_env, vofs + i);
28
- return true;
106
+ tcg_gen_ld_i64(t0, base, vofs + i);
29
- }
107
tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ);
30
/*
108
tcg_gen_addi_i64(clean_addr, clean_addr, 8);
31
* Set the VPR mask fields. We take advantage of MASK01 and MASK23
109
}
32
* being adjacent fields in the register.
110
@@ -XXX,XX +XXX,XX @@ static void do_str(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
33
*
111
clean_addr = new_tmp_a64_local(s);
34
- * This insn is not predicated, but it is subject to beat-wise
112
tcg_gen_mov_i64(clean_addr, t0);
35
+ * Updating the masks is not predicated, but it is subject to beat-wise
113
36
* execution, and the mask is updated on the odd-numbered beats.
114
+ if (base != cpu_env) {
37
* So if PSR.ECI says we should skip beat 1, we mustn't update the
115
+ TCGv_ptr b = tcg_temp_local_new_ptr();
38
* 01 mask field.
116
+ tcg_gen_mov_ptr(b, base);
39
*/
117
+ base = b;
40
- vpr = load_cpu_field(v7m.vpr);
118
+ }
41
+ TCGv_i32 vpr = load_cpu_field(v7m.vpr);
119
+
42
switch (s->eci) {
120
gen_set_label(loop);
43
case ECI_NONE:
121
44
case ECI_A0:
122
t0 = tcg_temp_new_i64();
45
/* Update both 01 and 23 fields */
123
tp = tcg_temp_new_ptr();
46
tcg_gen_deposit_i32(vpr, vpr,
124
- tcg_gen_add_ptr(tp, cpu_env, i);
47
- tcg_constant_i32(a->mask | (a->mask << 4)),
125
+ tcg_gen_add_ptr(tp, base, i);
48
+ tcg_constant_i32(mask | (mask << 4)),
126
tcg_gen_ld_i64(t0, tp, vofs);
49
R_V7M_VPR_MASK01_SHIFT,
127
tcg_gen_addi_ptr(i, i, 8);
50
R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
128
tcg_temp_free_ptr(tp);
51
break;
129
@@ -XXX,XX +XXX,XX @@ static void do_str(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
52
@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
130
53
case ECI_A0A1A2B0:
131
tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop);
54
/* Update only the 23 mask field */
132
tcg_temp_free_ptr(i);
55
tcg_gen_deposit_i32(vpr, vpr,
133
+
56
- tcg_constant_i32(a->mask),
134
+ if (base != cpu_env) {
57
+ tcg_constant_i32(mask),
135
+ tcg_temp_free_ptr(base);
58
R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
136
+ assert(len_remain == 0);
59
break;
137
+ }
60
default:
61
g_assert_not_reached();
62
}
138
}
63
store_cpu_field(vpr, v7m.vpr);
139
64
+}
140
/* Predicate register stores can be any multiple of 2. */
65
+
141
if (len_remain) {
66
+static bool trans_VPST(DisasContext *s, arg_VPST *a)
142
t0 = tcg_temp_new_i64();
67
+{
143
- tcg_gen_ld_i64(t0, cpu_env, vofs + len_align);
68
+ /* mask == 0 is a "related encoding" */
144
+ tcg_gen_ld_i64(t0, base, vofs + len_align);
69
+ if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
145
70
+ return false;
146
switch (len_remain) {
71
+ }
147
case 2:
72
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
148
@@ -XXX,XX +XXX,XX @@ static bool trans_LDR_zri(DisasContext *s, arg_rri *a)
73
+ return true;
149
if (sve_access_check(s)) {
74
+ }
150
int size = vec_full_reg_size(s);
75
+ gen_vpst(s, a->mask);
151
int off = vec_full_reg_offset(s, a->rd);
76
mve_update_and_store_eci(s);
152
- do_ldr(s, off, size, a->rn, a->imm * size);
153
+ gen_sve_ldr(s, cpu_env, off, size, a->rn, a->imm * size);
154
}
155
return true;
156
}
157
@@ -XXX,XX +XXX,XX @@ static bool trans_LDR_pri(DisasContext *s, arg_rri *a)
158
if (sve_access_check(s)) {
159
int size = pred_full_reg_size(s);
160
int off = pred_full_reg_offset(s, a->rd);
161
- do_ldr(s, off, size, a->rn, a->imm * size);
162
+ gen_sve_ldr(s, cpu_env, off, size, a->rn, a->imm * size);
163
}
164
return true;
165
}
166
@@ -XXX,XX +XXX,XX @@ static bool trans_STR_zri(DisasContext *s, arg_rri *a)
167
if (sve_access_check(s)) {
168
int size = vec_full_reg_size(s);
169
int off = vec_full_reg_offset(s, a->rd);
170
- do_str(s, off, size, a->rn, a->imm * size);
171
+ gen_sve_str(s, cpu_env, off, size, a->rn, a->imm * size);
172
}
173
return true;
174
}
175
@@ -XXX,XX +XXX,XX @@ static bool trans_STR_pri(DisasContext *s, arg_rri *a)
176
if (sve_access_check(s)) {
177
int size = pred_full_reg_size(s);
178
int off = pred_full_reg_offset(s, a->rd);
179
- do_str(s, off, size, a->rn, a->imm * size);
180
+ gen_sve_str(s, cpu_env, off, size, a->rn, a->imm * size);
181
}
77
return true;
182
return true;
78
}
183
}
79
--
184
--
80
2.20.1
185
2.25.1
81
82
1
From: Richard Henderson <richard.henderson@linaro.org>

We can reuse the SVE functions for LDR and STR, passing in the
base of the ZA vector and a zero offset.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220708151540.18136-23-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/sme.decode | 7 +++++++
target/arm/translate-sme.c | 24 ++++++++++++++++++++++++
2 files changed, 31 insertions(+)

Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP,
VIWDUP and VDWDUP. These fill the elements of a vector with
successively incrementing values, starting at the offset specified in
a general purpose register. The final value of the offset is written
back to this register. The wrapping variants take a second general
purpose register which specifies the point where the count should
wrap back to 0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper-mve.h | 12 ++++
target/arm/mve.decode | 25 ++++++++
target/arm/mve_helper.c | 63 +++++++++++++++++++
target/arm/translate-mve.c | 120 +++++++++++++++++++++++++++++++++++++
4 files changed, 220 insertions(+)
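
(Aside, not part of the patch: a standalone C sketch of the VIWDUP element
sequence described in the commit message above, using the same wrap rule as
the patch's do_add_wrap(); the other names here are invented for illustration.)

#include <stdint.h>
#include <stdio.h>

/* Wrap back to 0 when the incremented offset reaches the wrap value. */
static uint32_t add_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
{
    offset += imm;
    if (offset == wrap) {
        offset = 0;
    }
    return offset;
}

int main(void)
{
    /* VIWDUP.8 with Rn = 6, Rm = 8, #2: elements are 6, 0, 2, 4, 6, 0, ... */
    uint32_t offset = 6, wrap = 8, imm = 2;
    for (int e = 0; e < 16; e++) {          /* 16 byte elements in a Q register */
        printf("%u ", (unsigned)offset);
        offset = add_wrap(offset, wrap, imm);
    }
    printf("\nRn is left holding %u\n", (unsigned)offset);
    return 0;
}
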
17
14
18
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
19
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper-mve.h
17
--- a/target/arm/sme.decode
21
+++ b/target/arm/helper-mve.h
18
+++ b/target/arm/sme.decode
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
19
@@ -XXX,XX +XXX,XX @@ LDST1 1110000 0 esz:2 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \
23
20
&ldst rs=%mova_rs
24
DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
21
LDST1 1110000 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \
25
22
&ldst esz=4 rs=%mova_rs
26
+DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
27
+DEF_HELPER_FLAGS_4(mve_viduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
28
+DEF_HELPER_FLAGS_4(mve_vidupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
29
+
23
+
30
+DEF_HELPER_FLAGS_5(mve_viwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
24
+&ldstr rv rn imm
31
+DEF_HELPER_FLAGS_5(mve_viwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
25
+@ldstr ....... ... . ...... .. ... rn:5 . imm:4 \
32
+DEF_HELPER_FLAGS_5(mve_viwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
26
+ &ldstr rv=%mova_rs
33
+
27
+
34
+DEF_HELPER_FLAGS_5(mve_vdwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
28
+LDR 1110000 100 0 000000 .. 000 ..... 0 .... @ldstr
35
+DEF_HELPER_FLAGS_5(mve_vdwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
29
+STR 1110000 100 1 000000 .. 000 ..... 0 .... @ldstr
36
+DEF_HELPER_FLAGS_5(mve_vdwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
30
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
37
+
38
DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
39
DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
40
DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
41
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
42
index XXXXXXX..XXXXXXX 100644
31
index XXXXXXX..XXXXXXX 100644
43
--- a/target/arm/mve.decode
32
--- a/target/arm/translate-sme.c
44
+++ b/target/arm/mve.decode
33
+++ b/target/arm/translate-sme.c
45
@@ -XXX,XX +XXX,XX @@
34
@@ -XXX,XX +XXX,XX @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
46
&2scalar qd qn rm size
35
tcg_temp_free_i64(addr);
47
&1imm qd imm cmode op
48
&2shift qd qm shift size
49
+&vidup qd rn size imm
50
+&viwdup qd rn rm size imm
51
52
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
53
# Note that both Rn and Qd are 3 bits only (no D bit)
54
@@ -XXX,XX +XXX,XX @@ VDUP 1110 1110 1 1 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=0
55
VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 1 1 0000 @vdup size=1
56
VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
57
58
+# Incrementing and decrementing dup
59
+
60
+# VIDUP, VDDUP format immediate: 1 << (immh:imml)
61
+%imm_vidup 7:1 0:1 !function=vidup_imm
62
+
63
+# VIDUP, VDDUP registers: Rm bits [3:1] from insn, bit 0 is 1;
64
+# Rn bits [3:1] from insn, bit 0 is 0
65
+%vidup_rm 1:3 !function=times_2_plus_1
66
+%vidup_rn 17:3 !function=times_2
67
+
68
+@vidup .... .... . . size:2 .... .... .... .... .... \
69
+ qd=%qd imm=%imm_vidup rn=%vidup_rn &vidup
70
+@viwdup .... .... . . size:2 .... .... .... .... .... \
71
+ qd=%qd imm=%imm_vidup rm=%vidup_rm rn=%vidup_rn &viwdup
72
+{
73
+ VIDUP 1110 1110 0 . .. ... 1 ... 0 1111 . 110 111 . @vidup
74
+ VIWDUP 1110 1110 0 . .. ... 1 ... 0 1111 . 110 ... . @viwdup
75
+}
76
+{
77
+ VDDUP 1110 1110 0 . .. ... 1 ... 1 1111 . 110 111 . @vidup
78
+ VDWDUP 1110 1110 0 . .. ... 1 ... 1 1111 . 110 ... . @viwdup
79
+}
80
+
81
# multiply-add long dual accumulate
82
# rdahi: bits [3:1] from insn, bit 0 is 1
83
# rdalo: bits [3:1] from insn, bit 0 is 0
84
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
85
index XXXXXXX..XXXXXXX 100644
86
--- a/target/arm/mve_helper.c
87
+++ b/target/arm/mve_helper.c
88
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_sqrshr)(CPUARMState *env, uint32_t n, uint32_t shift)
89
{
90
return do_sqrshl_bhs(n, -(int8_t)shift, 32, true, &env->QF);
91
}
92
+
93
+#define DO_VIDUP(OP, ESIZE, TYPE, FN) \
94
+ uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd, \
95
+ uint32_t offset, uint32_t imm) \
96
+ { \
97
+ TYPE *d = vd; \
98
+ uint16_t mask = mve_element_mask(env); \
99
+ unsigned e; \
100
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
101
+ mergemask(&d[H##ESIZE(e)], offset, mask); \
102
+ offset = FN(offset, imm); \
103
+ } \
104
+ mve_advance_vpt(env); \
105
+ return offset; \
106
+ }
107
+
108
+#define DO_VIWDUP(OP, ESIZE, TYPE, FN) \
109
+ uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd, \
110
+ uint32_t offset, uint32_t wrap, \
111
+ uint32_t imm) \
112
+ { \
113
+ TYPE *d = vd; \
114
+ uint16_t mask = mve_element_mask(env); \
115
+ unsigned e; \
116
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
117
+ mergemask(&d[H##ESIZE(e)], offset, mask); \
118
+ offset = FN(offset, wrap, imm); \
119
+ } \
120
+ mve_advance_vpt(env); \
121
+ return offset; \
122
+ }
123
+
124
+#define DO_VIDUP_ALL(OP, FN) \
125
+ DO_VIDUP(OP##b, 1, int8_t, FN) \
126
+ DO_VIDUP(OP##h, 2, int16_t, FN) \
127
+ DO_VIDUP(OP##w, 4, int32_t, FN)
128
+
129
+#define DO_VIWDUP_ALL(OP, FN) \
130
+ DO_VIWDUP(OP##b, 1, int8_t, FN) \
131
+ DO_VIWDUP(OP##h, 2, int16_t, FN) \
132
+ DO_VIWDUP(OP##w, 4, int32_t, FN)
133
+
134
+static uint32_t do_add_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
135
+{
136
+ offset += imm;
137
+ if (offset == wrap) {
138
+ offset = 0;
139
+ }
140
+ return offset;
141
+}
142
+
143
+static uint32_t do_sub_wrap(uint32_t offset, uint32_t wrap, uint32_t imm)
144
+{
145
+ if (offset == 0) {
146
+ offset = wrap;
147
+ }
148
+ offset -= imm;
149
+ return offset;
150
+}
151
+
152
+DO_VIDUP_ALL(vidup, DO_ADD)
153
+DO_VIWDUP_ALL(viwdup, do_add_wrap)
154
+DO_VIWDUP_ALL(vdwdup, do_sub_wrap)
155
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
156
index XXXXXXX..XXXXXXX 100644
157
--- a/target/arm/translate-mve.c
158
+++ b/target/arm/translate-mve.c
159
@@ -XXX,XX +XXX,XX @@
160
#include "translate.h"
161
#include "translate-a32.h"
162
163
+static inline int vidup_imm(DisasContext *s, int x)
164
+{
165
+ return 1 << x;
166
+}
167
+
168
/* Include the generated decoder */
169
#include "decode-mve.c.inc"
170
171
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
172
typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
173
typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
174
typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
175
+typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
176
+typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
177
178
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
179
static inline long mve_qreg_offset(unsigned reg)
180
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLC(DisasContext *s, arg_VSHLC *a)
181
mve_update_eci(s);
182
return true;
36
return true;
183
}
37
}
184
+
38
+
185
+static bool do_vidup(DisasContext *s, arg_vidup *a, MVEGenVIDUPFn *fn)
39
+typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int);
40
+
41
+static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn)
186
+{
42
+{
187
+ TCGv_ptr qd;
43
+ int svl = streaming_vec_reg_size(s);
188
+ TCGv_i32 rn;
44
+ int imm = a->imm;
45
+ TCGv_ptr base;
189
+
46
+
190
+ /*
47
+ if (!sme_za_enabled_check(s)) {
191
+ * Vector increment/decrement with wrap and duplicate (VIDUP, VDDUP).
192
+ * This fills the vector with elements of successively increasing
193
+ * or decreasing values, starting from Rn.
194
+ */
195
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
196
+ return false;
197
+ }
198
+ if (a->size == MO_64) {
199
+ /* size 0b11 is another encoding */
200
+ return false;
201
+ }
202
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
203
+ return true;
48
+ return true;
204
+ }
49
+ }
205
+
50
+
206
+ qd = mve_qreg_ptr(a->qd);
51
+ /* ZA[n] equates to ZA0H.B[n]. */
207
+ rn = load_reg(s, a->rn);
52
+ base = get_tile_rowcol(s, MO_8, a->rv, imm, false);
208
+ fn(rn, cpu_env, qd, rn, tcg_constant_i32(a->imm));
53
+
209
+ store_reg(s, a->rn, rn);
54
+ fn(s, base, 0, svl, a->rn, imm * svl);
210
+ tcg_temp_free_ptr(qd);
55
+
211
+ mve_update_eci(s);
56
+ tcg_temp_free_ptr(base);
212
+ return true;
57
+ return true;
213
+}
58
+}
214
+
59
+
215
+static bool do_viwdup(DisasContext *s, arg_viwdup *a, MVEGenVIWDUPFn *fn)
60
+TRANS_FEAT(LDR, aa64_sme, do_ldst_r, a, gen_sve_ldr)
216
+{
61
+TRANS_FEAT(STR, aa64_sme, do_ldst_r, a, gen_sve_str)
217
+ TCGv_ptr qd;
218
+ TCGv_i32 rn, rm;
219
+
220
+ /*
221
+ * Vector increment/decrement with wrap and duplicate (VIWDUp, VDWDUP)
222
+ * This fills the vector with elements of successively increasing
223
+ * or decreasing values, starting from Rn. Rm specifies a point where
224
+ * the count wraps back around to 0. The updated offset is written back
225
+ * to Rn.
226
+ */
227
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
228
+ return false;
229
+ }
230
+ if (!fn || a->rm == 13 || a->rm == 15) {
231
+ /*
232
+ * size 0b11 is another encoding; Rm == 13 is UNPREDICTABLE;
233
+ * Rm == 13 is VIWDUP, VDWDUP.
234
+ */
235
+ return false;
236
+ }
237
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
238
+ return true;
239
+ }
240
+
241
+ qd = mve_qreg_ptr(a->qd);
242
+ rn = load_reg(s, a->rn);
243
+ rm = load_reg(s, a->rm);
244
+ fn(rn, cpu_env, qd, rn, rm, tcg_constant_i32(a->imm));
245
+ store_reg(s, a->rn, rn);
246
+ tcg_temp_free_ptr(qd);
247
+ tcg_temp_free_i32(rm);
248
+ mve_update_eci(s);
249
+ return true;
250
+}
251
+
252
+static bool trans_VIDUP(DisasContext *s, arg_vidup *a)
253
+{
254
+ static MVEGenVIDUPFn * const fns[] = {
255
+ gen_helper_mve_vidupb,
256
+ gen_helper_mve_viduph,
257
+ gen_helper_mve_vidupw,
258
+ NULL,
259
+ };
260
+ return do_vidup(s, a, fns[a->size]);
261
+}
262
+
263
+static bool trans_VDDUP(DisasContext *s, arg_vidup *a)
264
+{
265
+ static MVEGenVIDUPFn * const fns[] = {
266
+ gen_helper_mve_vidupb,
267
+ gen_helper_mve_viduph,
268
+ gen_helper_mve_vidupw,
269
+ NULL,
270
+ };
271
+ /* VDDUP is just like VIDUP but with a negative immediate */
272
+ a->imm = -a->imm;
273
+ return do_vidup(s, a, fns[a->size]);
274
+}
275
+
276
+static bool trans_VIWDUP(DisasContext *s, arg_viwdup *a)
277
+{
278
+ static MVEGenVIWDUPFn * const fns[] = {
279
+ gen_helper_mve_viwdupb,
280
+ gen_helper_mve_viwduph,
281
+ gen_helper_mve_viwdupw,
282
+ NULL,
283
+ };
284
+ return do_viwdup(s, a, fns[a->size]);
285
+}
286
+
287
+static bool trans_VDWDUP(DisasContext *s, arg_viwdup *a)
288
+{
289
+ static MVEGenVIWDUPFn * const fns[] = {
290
+ gen_helper_mve_vdwdupb,
291
+ gen_helper_mve_vdwduph,
292
+ gen_helper_mve_vdwdupw,
293
+ NULL,
294
+ };
295
+ return do_viwdup(s, a, fns[a->size]);
296
+}
297
--
62
--
298
2.20.1
63
2.25.1
299
300
1
Implement the MVE integer min/max across vector insns
1
From: Richard Henderson <richard.henderson@linaro.org>
2
VMAXV, VMINV, VMAXAV and VMINAV, which find the maximum
or minimum of the vector elements combined with an initial
value from a general purpose register, and store the result
back into that general purpose register.
6
2
7
These insns overlap with VRMLALDAVH (they use what would
3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
be RdaHi=0b110).
4
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20220708151540.18136-24-richard.henderson@linaro.org
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
---
8
target/arm/helper-sme.h | 5 +++
9
target/arm/sme.decode | 11 +++++
10
target/arm/sme_helper.c | 90 ++++++++++++++++++++++++++++++++++++++
11
target/arm/translate-sme.c | 31 +++++++++++++
12
4 files changed, 137 insertions(+)
9
13
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
14
diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
---
13
target/arm/helper-mve.h | 20 ++++++++++++
14
target/arm/mve.decode | 18 +++++++++--
15
target/arm/mve_helper.c | 66 ++++++++++++++++++++++++++++++++++++++
16
target/arm/translate-mve.c | 48 +++++++++++++++++++++++++++
17
4 files changed, 150 insertions(+), 2 deletions(-)
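
(Aside, not part of the patch: the across-vector reduction described above,
boiled down to a standalone sketch. Predication and beat-wise execution are
ignored and vmaxv_s8() is an invented name; the point is that Rda is read at
element width, not at its full 32 bits.)

#include <stdint.h>
#include <stdio.h>

/* VMAXV.S8-style reduction: truncate Rda to the element size, then take the
 * maximum of it and every element, and hand the result back for Rda. */
static uint32_t vmaxv_s8(uint32_t rda_in, const int8_t *elem, int n)
{
    int64_t ra = (int8_t)rda_in;          /* Rda read at element width */
    for (int e = 0; e < n; e++) {
        if (elem[e] > ra) {
            ra = elem[e];
        }
    }
    return (uint32_t)ra;
}

int main(void)
{
    int8_t q[16] = { -5, 3, 120, -128, 7 };   /* remaining elements are 0 */
    /* Rda arrives holding 0x80, i.e. -128 once truncated to int8_t. */
    printf("%u\n", (unsigned)vmaxv_s8(0x80, q, 16));   /* prints 120 */
    return 0;
}
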
18
19
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
20
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper-mve.h
16
--- a/target/arm/helper-sme.h
22
+++ b/target/arm/helper-mve.h
17
+++ b/target/arm/helper-sme.h
23
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_st1q_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i
24
DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
19
DEF_HELPER_FLAGS_5(sme_st1q_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
25
DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
20
DEF_HELPER_FLAGS_5(sme_st1q_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
26
21
DEF_HELPER_FLAGS_5(sme_st1q_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
27
+DEF_HELPER_FLAGS_3(mve_vmaxvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
28
+DEF_HELPER_FLAGS_3(mve_vmaxvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
29
+DEF_HELPER_FLAGS_3(mve_vmaxvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
30
+DEF_HELPER_FLAGS_3(mve_vmaxvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
31
+DEF_HELPER_FLAGS_3(mve_vmaxvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
32
+DEF_HELPER_FLAGS_3(mve_vmaxvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
33
+DEF_HELPER_FLAGS_3(mve_vmaxavb, TCG_CALL_NO_WG, i32, env, ptr, i32)
34
+DEF_HELPER_FLAGS_3(mve_vmaxavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
35
+DEF_HELPER_FLAGS_3(mve_vmaxavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
36
+
22
+
37
+DEF_HELPER_FLAGS_3(mve_vminvsb, TCG_CALL_NO_WG, i32, env, ptr, i32)
23
+DEF_HELPER_FLAGS_5(sme_addha_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
38
+DEF_HELPER_FLAGS_3(mve_vminvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
24
+DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
39
+DEF_HELPER_FLAGS_3(mve_vminvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
25
+DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
40
+DEF_HELPER_FLAGS_3(mve_vminvub, TCG_CALL_NO_WG, i32, env, ptr, i32)
26
+DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
41
+DEF_HELPER_FLAGS_3(mve_vminvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
27
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
42
+DEF_HELPER_FLAGS_3(mve_vminvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
28
index XXXXXXX..XXXXXXX 100644
43
+DEF_HELPER_FLAGS_3(mve_vminavb, TCG_CALL_NO_WG, i32, env, ptr, i32)
29
--- a/target/arm/sme.decode
44
+DEF_HELPER_FLAGS_3(mve_vminavh, TCG_CALL_NO_WG, i32, env, ptr, i32)
30
+++ b/target/arm/sme.decode
45
+DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
31
@@ -XXX,XX +XXX,XX @@ LDST1 1110000 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4 \
32
33
LDR 1110000 100 0 000000 .. 000 ..... 0 .... @ldstr
34
STR 1110000 100 1 000000 .. 000 ..... 0 .... @ldstr
46
+
35
+
47
DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
36
+### SME Add Vector to Array
48
DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
37
+
49
38
+&adda zad zn pm pn
50
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
39
+@adda_32 ........ .. ..... . pm:3 pn:3 zn:5 ... zad:2 &adda
40
+@adda_64 ........ .. ..... . pm:3 pn:3 zn:5 .. zad:3 &adda
41
+
42
+ADDHA_s 11000000 10 01000 0 ... ... ..... 000 .. @adda_32
43
+ADDVA_s 11000000 10 01000 1 ... ... ..... 000 .. @adda_32
44
+ADDHA_d 11000000 11 01000 0 ... ... ..... 00 ... @adda_64
45
+ADDVA_d 11000000 11 01000 1 ... ... ..... 00 ... @adda_64
46
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
51
index XXXXXXX..XXXXXXX 100644
47
index XXXXXXX..XXXXXXX 100644
52
--- a/target/arm/mve.decode
48
--- a/target/arm/sme_helper.c
53
+++ b/target/arm/mve.decode
49
+++ b/target/arm/sme_helper.c
54
@@ -XXX,XX +XXX,XX @@
50
@@ -XXX,XX +XXX,XX @@ DO_ST(q, _be, MO_128)
55
&vcmp qm qn size mask
51
DO_ST(q, _le, MO_128)
56
&vcmp_scalar qn rm size mask
52
57
&shl_scalar qda rm size
53
#undef DO_ST
58
+&vmaxv qm rda size
59
60
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
61
# Note that both Rn and Qd are 3 bits only (no D bit)
62
@@ -XXX,XX +XXX,XX @@
63
@vcmp_scalar .... .... .. size:2 qn:3 . .... .... .... rm:4 &vcmp_scalar \
64
mask=%mask_22_13
65
66
+@vmaxv .... .... .... size:2 .. rda:4 .... .... .... &vmaxv qm=%qm
67
+
54
+
68
# Vector loads and stores
55
+void HELPER(sme_addha_s)(void *vzda, void *vzn, void *vpn,
69
56
+ void *vpm, uint32_t desc)
70
# Widening loads and narrowing stores:
71
@@ -XXX,XX +XXX,XX @@ VMLALDAV_U 1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
72
73
VMLSLDAV 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
74
75
-VRMLALDAVH_S 1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
76
-VRMLALDAVH_U 1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
77
+{
57
+{
78
+ VMAXV_S 1110 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
58
+ intptr_t row, col, oprsz = simd_oprsz(desc) / 4;
79
+ VMINV_S 1110 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
59
+ uint64_t *pn = vpn, *pm = vpm;
80
+ VMAXAV 1110 1110 1110 .. 00 .... 1111 0 0 . 0 ... 0 @vmaxv
60
+ uint32_t *zda = vzda, *zn = vzn;
81
+ VMINAV 1110 1110 1110 .. 00 .... 1111 1 0 . 0 ... 0 @vmaxv
61
+
82
+ VRMLALDAVH_S 1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
62
+ for (row = 0; row < oprsz; ) {
63
+ uint64_t pa = pn[row >> 4];
64
+ do {
65
+ if (pa & 1) {
66
+ for (col = 0; col < oprsz; ) {
67
+ uint64_t pb = pm[col >> 4];
68
+ do {
69
+ if (pb & 1) {
70
+ zda[tile_vslice_index(row) + H4(col)] += zn[H4(col)];
71
+ }
72
+ pb >>= 4;
73
+ } while (++col & 15);
74
+ }
75
+ }
76
+ pa >>= 4;
77
+ } while (++row & 15);
78
+ }
83
+}
79
+}
84
+
80
+
81
+void HELPER(sme_addha_d)(void *vzda, void *vzn, void *vpn,
82
+ void *vpm, uint32_t desc)
85
+{
83
+{
86
+ VMAXV_U 1111 1110 1110 .. 10 .... 1111 0 0 . 0 ... 0 @vmaxv
84
+ intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
87
+ VMINV_U 1111 1110 1110 .. 10 .... 1111 1 0 . 0 ... 0 @vmaxv
85
+ uint8_t *pn = vpn, *pm = vpm;
88
+ VRMLALDAVH_U 1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
86
+ uint64_t *zda = vzda, *zn = vzn;
89
+}
90
91
VRMLSLDAVH 1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
92
93
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/target/arm/mve_helper.c
96
+++ b/target/arm/mve_helper.c
97
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvub, 1, uint8_t)
98
DO_VADDV(vaddvuh, 2, uint16_t)
99
DO_VADDV(vaddvuw, 4, uint32_t)
100
101
+/*
102
+ * Vector max/min across vector. Unlike VADDV, we must
103
+ * read ra as the element size, not its full width.
104
+ * We work with int64_t internally for simplicity.
105
+ */
106
+#define DO_VMAXMINV(OP, ESIZE, TYPE, RATYPE, FN) \
107
+ uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
108
+ uint32_t ra_in) \
109
+ { \
110
+ uint16_t mask = mve_element_mask(env); \
111
+ unsigned e; \
112
+ TYPE *m = vm; \
113
+ int64_t ra = (RATYPE)ra_in; \
114
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
115
+ if (mask & 1) { \
116
+ ra = FN(ra, m[H##ESIZE(e)]); \
117
+ } \
118
+ } \
119
+ mve_advance_vpt(env); \
120
+ return ra; \
121
+ } \
122
+
87
+
123
+#define DO_VMAXMINV_U(INSN, FN) \
88
+ for (row = 0; row < oprsz; ++row) {
124
+ DO_VMAXMINV(INSN##b, 1, uint8_t, uint8_t, FN) \
89
+ if (pn[H1(row)] & 1) {
125
+ DO_VMAXMINV(INSN##h, 2, uint16_t, uint16_t, FN) \
90
+ for (col = 0; col < oprsz; ++col) {
126
+ DO_VMAXMINV(INSN##w, 4, uint32_t, uint32_t, FN)
91
+ if (pm[H1(col)] & 1) {
127
+#define DO_VMAXMINV_S(INSN, FN) \
92
+ zda[tile_vslice_index(row) + col] += zn[col];
128
+ DO_VMAXMINV(INSN##b, 1, int8_t, int8_t, FN) \
93
+ }
129
+ DO_VMAXMINV(INSN##h, 2, int16_t, int16_t, FN) \
94
+ }
130
+ DO_VMAXMINV(INSN##w, 4, int32_t, int32_t, FN)
95
+ }
131
+
132
+/*
133
+ * Helpers for max and min of absolute values across vector:
134
+ * note that we only take the absolute value of 'm', not 'n'
135
+ */
136
+static int64_t do_maxa(int64_t n, int64_t m)
137
+{
138
+ if (m < 0) {
139
+ m = -m;
140
+ }
96
+ }
141
+ return MAX(n, m);
142
+}
97
+}
143
+
98
+
144
+static int64_t do_mina(int64_t n, int64_t m)
99
+void HELPER(sme_addva_s)(void *vzda, void *vzn, void *vpn,
100
+ void *vpm, uint32_t desc)
145
+{
101
+{
146
+ if (m < 0) {
102
+ intptr_t row, col, oprsz = simd_oprsz(desc) / 4;
147
+ m = -m;
103
+ uint64_t *pn = vpn, *pm = vpm;
104
+ uint32_t *zda = vzda, *zn = vzn;
105
+
106
+ for (row = 0; row < oprsz; ) {
107
+ uint64_t pa = pn[row >> 4];
108
+ do {
109
+ if (pa & 1) {
110
+ uint32_t zn_row = zn[H4(row)];
111
+ for (col = 0; col < oprsz; ) {
112
+ uint64_t pb = pm[col >> 4];
113
+ do {
114
+ if (pb & 1) {
115
+ zda[tile_vslice_index(row) + H4(col)] += zn_row;
116
+ }
117
+ pb >>= 4;
118
+ } while (++col & 15);
119
+ }
120
+ }
121
+ pa >>= 4;
122
+ } while (++row & 15);
148
+ }
123
+ }
149
+ return MIN(n, m);
150
+}
124
+}
151
+
125
+
152
+DO_VMAXMINV_S(vmaxvs, DO_MAX)
126
+void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn,
153
+DO_VMAXMINV_U(vmaxvu, DO_MAX)
127
+ void *vpm, uint32_t desc)
154
+DO_VMAXMINV_S(vminvs, DO_MIN)
128
+{
155
+DO_VMAXMINV_U(vminvu, DO_MIN)
129
+ intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
156
+/*
130
+ uint8_t *pn = vpn, *pm = vpm;
157
+ * VMAXAV, VMINAV treat the general purpose input as unsigned
131
+ uint64_t *zda = vzda, *zn = vzn;
158
+ * and the vector elements as signed.
159
+ */
160
+DO_VMAXMINV(vmaxavb, 1, int8_t, uint8_t, do_maxa)
161
+DO_VMAXMINV(vmaxavh, 2, int16_t, uint16_t, do_maxa)
162
+DO_VMAXMINV(vmaxavw, 4, int32_t, uint32_t, do_maxa)
163
+DO_VMAXMINV(vminavb, 1, int8_t, uint8_t, do_mina)
164
+DO_VMAXMINV(vminavh, 2, int16_t, uint16_t, do_mina)
165
+DO_VMAXMINV(vminavw, 4, int32_t, uint32_t, do_mina)
166
+
132
+
167
#define DO_VADDLV(OP, TYPE, LTYPE) \
133
+ for (row = 0; row < oprsz; ++row) {
168
uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
134
+ if (pn[H1(row)] & 1) {
169
uint64_t ra) \
135
+ uint64_t zn_row = zn[row];
170
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
136
+ for (col = 0; col < oprsz; ++col) {
137
+ if (pm[H1(col)] & 1) {
138
+ zda[tile_vslice_index(row) + col] += zn_row;
139
+ }
140
+ }
141
+ }
142
+ }
143
+}
144
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
171
index XXXXXXX..XXXXXXX 100644
145
index XXXXXXX..XXXXXXX 100644
172
--- a/target/arm/translate-mve.c
146
--- a/target/arm/translate-sme.c
173
+++ b/target/arm/translate-mve.c
147
+++ b/target/arm/translate-sme.c
174
@@ -XXX,XX +XXX,XX @@ DO_VCMP(VCMPGE, vcmpge)
148
@@ -XXX,XX +XXX,XX @@ static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn)
175
DO_VCMP(VCMPLT, vcmplt)
149
176
DO_VCMP(VCMPGT, vcmpgt)
150
TRANS_FEAT(LDR, aa64_sme, do_ldst_r, a, gen_sve_ldr)
177
DO_VCMP(VCMPLE, vcmple)
151
TRANS_FEAT(STR, aa64_sme, do_ldst_r, a, gen_sve_str)
178
+
152
+
179
+static bool do_vmaxv(DisasContext *s, arg_vmaxv *a, MVEGenVADDVFn fn)
153
+static bool do_adda(DisasContext *s, arg_adda *a, MemOp esz,
154
+ gen_helper_gvec_4 *fn)
180
+{
155
+{
181
+ /*
156
+ int svl = streaming_vec_reg_size(s);
182
+ * MIN/MAX operations across a vector: compute the min or
157
+ uint32_t desc = simd_desc(svl, svl, 0);
183
+ * max of the initial value in a general purpose register
158
+ TCGv_ptr za, zn, pn, pm;
184
+ * and all the elements in the vector, and store it back
185
+ * into the general purpose register.
186
+ */
187
+ TCGv_ptr qm;
188
+ TCGv_i32 rda;
189
+
159
+
190
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qm) ||
160
+ if (!sme_smza_enabled_check(s)) {
191
+ !fn || a->rda == 13 || a->rda == 15) {
192
+ /* Rda cases are UNPREDICTABLE */
193
+ return false;
194
+ }
195
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
196
+ return true;
161
+ return true;
197
+ }
162
+ }
198
+
163
+
199
+ qm = mve_qreg_ptr(a->qm);
164
+ /* Sum XZR+zad to find ZAd. */
200
+ rda = load_reg(s, a->rda);
165
+ za = get_tile_rowcol(s, esz, 31, a->zad, false);
201
+ fn(rda, cpu_env, qm, rda);
166
+ zn = vec_full_reg_ptr(s, a->zn);
202
+ store_reg(s, a->rda, rda);
167
+ pn = pred_full_reg_ptr(s, a->pn);
203
+ tcg_temp_free_ptr(qm);
168
+ pm = pred_full_reg_ptr(s, a->pm);
204
+ mve_update_eci(s);
169
+
170
+ fn(za, zn, pn, pm, tcg_constant_i32(desc));
171
+
172
+ tcg_temp_free_ptr(za);
173
+ tcg_temp_free_ptr(zn);
174
+ tcg_temp_free_ptr(pn);
175
+ tcg_temp_free_ptr(pm);
205
+ return true;
176
+ return true;
206
+}
177
+}
207
+
178
+
208
+#define DO_VMAXV(INSN, FN) \
179
+TRANS_FEAT(ADDHA_s, aa64_sme, do_adda, a, MO_32, gen_helper_sme_addha_s)
209
+ static bool trans_##INSN(DisasContext *s, arg_vmaxv *a) \
180
+TRANS_FEAT(ADDVA_s, aa64_sme, do_adda, a, MO_32, gen_helper_sme_addva_s)
210
+ { \
181
+TRANS_FEAT(ADDHA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addha_d)
211
+ static MVEGenVADDVFn * const fns[] = { \
182
+TRANS_FEAT(ADDVA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addva_d)
212
+ gen_helper_mve_##FN##b, \
213
+ gen_helper_mve_##FN##h, \
214
+ gen_helper_mve_##FN##w, \
215
+ NULL, \
216
+ }; \
217
+ return do_vmaxv(s, a, fns[a->size]); \
218
+ }
219
+
220
+DO_VMAXV(VMAXV_S, vmaxvs)
221
+DO_VMAXV(VMAXV_U, vmaxvu)
222
+DO_VMAXV(VMAXAV, vmaxav)
223
+DO_VMAXV(VMINV_S, vminvs)
224
+DO_VMAXV(VMINV_U, vminvu)
225
+DO_VMAXV(VMINAV, vminav)
226
--
183
--
227
2.20.1
184
2.25.1
228
229
1
Implement the MVE instructions which perform shifts by a scalar.
1
From: Richard Henderson <richard.henderson@linaro.org>
2
These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2. They take the
3
shift amount in a general purpose register and shift every element in
4
the vector by that amount.
5
2
6
Mostly we can reuse the helper functions for shift-by-immediate; we
3
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
7
do need two new helpers for VQRSHL.
4
Message-id: 20220708151540.18136-25-richard.henderson@linaro.org
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
---
8
target/arm/helper-sme.h | 5 +++
9
target/arm/sme.decode | 9 +++++
10
target/arm/sme_helper.c | 69 ++++++++++++++++++++++++++++++++++++++
11
target/arm/translate-sme.c | 32 ++++++++++++++++++
12
4 files changed, 115 insertions(+)
8
13
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
14
diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
---
12
target/arm/helper-mve.h | 8 +++++++
13
target/arm/mve.decode | 23 ++++++++++++++++---
14
target/arm/mve_helper.c | 2 ++
15
target/arm/translate-mve.c | 46 ++++++++++++++++++++++++++++++++++++++
16
4 files changed, 76 insertions(+), 3 deletions(-)
17
18
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
19
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper-mve.h
16
--- a/target/arm/helper-sme.h
21
+++ b/target/arm/helper-mve.h
17
+++ b/target/arm/helper-sme.h
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addha_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
23
DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
19
DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
24
DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
25
21
DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
26
+DEF_HELPER_FLAGS_4(mve_vqrshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
+DEF_HELPER_FLAGS_4(mve_vqrshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
28
+DEF_HELPER_FLAGS_4(mve_vqrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
29
+
22
+
30
+DEF_HELPER_FLAGS_4(mve_vqrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
+DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
31
+DEF_HELPER_FLAGS_4(mve_vqrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
24
+ void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
32
+DEF_HELPER_FLAGS_4(mve_vqrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
25
+DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
26
+ void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
27
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
28
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/sme.decode
30
+++ b/target/arm/sme.decode
31
@@ -XXX,XX +XXX,XX @@ ADDHA_s 11000000 10 01000 0 ... ... ..... 000 .. @adda_32
32
ADDVA_s 11000000 10 01000 1 ... ... ..... 000 .. @adda_32
33
ADDHA_d 11000000 11 01000 0 ... ... ..... 00 ... @adda_64
34
ADDVA_d 11000000 11 01000 1 ... ... ..... 00 ... @adda_64
33
+
35
+
34
DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
36
+### SME Outer Product
35
DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
37
+
36
DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
38
+&op zad zn zm pm pn sub:bool
37
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
39
+@op_32 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 .. zad:2 &op
40
+@op_64 ........ ... zm:5 pm:3 pn:3 zn:5 sub:1 . zad:3 &op
41
+
42
+FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32
43
+FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64
44
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
38
index XXXXXXX..XXXXXXX 100644
45
index XXXXXXX..XXXXXXX 100644
39
--- a/target/arm/mve.decode
46
--- a/target/arm/sme_helper.c
40
+++ b/target/arm/mve.decode
47
+++ b/target/arm/sme_helper.c
41
@@ -XXX,XX +XXX,XX @@
48
@@ -XXX,XX +XXX,XX @@
42
&viwdup qd rn rm size imm
49
#include "exec/cpu_ldst.h"
43
&vcmp qm qn size mask
50
#include "exec/exec-all.h"
44
&vcmp_scalar qn rm size mask
51
#include "qemu/int128.h"
45
+&shl_scalar qda rm size
52
+#include "fpu/softfloat.h"
46
53
#include "vec_internal.h"
47
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
54
#include "sve_ldst_internal.h"
48
# Note that both Rn and Qd are 3 bits only (no D bit)
55
49
@@ -XXX,XX +XXX,XX @@
56
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn,
50
@2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
57
}
51
size=2 shift=%rshift_i5
58
}
52
59
}
53
+@shl_scalar .... .... .... size:2 .. .... .... .... rm:4 &shl_scalar qda=%qd
54
+
60
+
55
# Vector comparison; 4-bit Qm but 3-bit Qn
61
+void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn,
56
%mask_22_13 22:1 13:3
62
+ void *vpm, void *vst, uint32_t desc)
57
@vcmp .... .... .. size:2 qn:3 . .... .... .... .... &vcmp qm=%qm mask=%mask_22_13
63
+{
58
@@ -XXX,XX +XXX,XX @@ VRMLSLDAVH 1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
64
+ intptr_t row, col, oprsz = simd_maxsz(desc);
59
65
+ uint32_t neg = simd_data(desc) << 31;
60
VADD_scalar 1110 1110 0 . .. ... 1 ... 0 1111 . 100 .... @2scalar
66
+ uint16_t *pn = vpn, *pm = vpm;
61
VSUB_scalar 1110 1110 0 . .. ... 1 ... 1 1111 . 100 .... @2scalar
67
+ float_status fpst;
62
-VMUL_scalar 1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
63
+
68
+
64
+{
69
+ /*
65
+ VSHL_S_scalar 1110 1110 0 . 11 .. 01 ... 1 1110 0110 .... @shl_scalar
70
+ * Make a copy of float_status because this operation does not
66
+ VRSHL_S_scalar 1110 1110 0 . 11 .. 11 ... 1 1110 0110 .... @shl_scalar
71
+ * update the cumulative fp exception status. It also produces
67
+ VQSHL_S_scalar 1110 1110 0 . 11 .. 01 ... 1 1110 1110 .... @shl_scalar
72
+ * default nans.
68
+ VQRSHL_S_scalar 1110 1110 0 . 11 .. 11 ... 1 1110 1110 .... @shl_scalar
73
+ */
69
+ VMUL_scalar 1110 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
74
+ fpst = *(float_status *)vst;
75
+ set_default_nan_mode(true, &fpst);
76
+
77
+ for (row = 0; row < oprsz; ) {
78
+ uint16_t pa = pn[H2(row >> 4)];
79
+ do {
80
+ if (pa & 1) {
81
+ void *vza_row = vza + tile_vslice_offset(row);
82
+ uint32_t n = *(uint32_t *)(vzn + H1_4(row)) ^ neg;
83
+
84
+ for (col = 0; col < oprsz; ) {
85
+ uint16_t pb = pm[H2(col >> 4)];
86
+ do {
87
+ if (pb & 1) {
88
+ uint32_t *a = vza_row + H1_4(col);
89
+ uint32_t *m = vzm + H1_4(col);
90
+ *a = float32_muladd(n, *m, *a, 0, vst);
91
+ }
92
+ col += 4;
93
+ pb >>= 4;
94
+ } while (col & 15);
95
+ }
96
+ }
97
+ row += 4;
98
+ pa >>= 4;
99
+ } while (row & 15);
100
+ }
70
+}
101
+}
71
+
102
+
103
+void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn,
104
+ void *vpm, void *vst, uint32_t desc)
72
+{
105
+{
73
+ VSHL_U_scalar 1111 1110 0 . 11 .. 01 ... 1 1110 0110 .... @shl_scalar
106
+ intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
74
+ VRSHL_U_scalar 1111 1110 0 . 11 .. 11 ... 1 1110 0110 .... @shl_scalar
107
+ uint64_t neg = (uint64_t)simd_data(desc) << 63;
75
+ VQSHL_U_scalar 1111 1110 0 . 11 .. 01 ... 1 1110 1110 .... @shl_scalar
108
+ uint64_t *za = vza, *zn = vzn, *zm = vzm;
76
+ VQRSHL_U_scalar 1111 1110 0 . 11 .. 11 ... 1 1110 1110 .... @shl_scalar
109
+ uint8_t *pn = vpn, *pm = vpm;
77
+ VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
110
+ float_status fpst = *(float_status *)vst;
111
+
112
+ set_default_nan_mode(true, &fpst);
113
+
114
+ for (row = 0; row < oprsz; ++row) {
115
+ if (pn[H1(row)] & 1) {
116
+ uint64_t *za_row = &za[tile_vslice_index(row)];
117
+ uint64_t n = zn[row] ^ neg;
118
+
119
+ for (col = 0; col < oprsz; ++col) {
120
+ if (pm[H1(col)] & 1) {
121
+ uint64_t *a = &za_row[col];
122
+ *a = float64_muladd(n, zm[col], *a, 0, &fpst);
123
+ }
124
+ }
125
+ }
126
+ }
78
+}
127
+}
128
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
129
index XXXXXXX..XXXXXXX 100644
130
--- a/target/arm/translate-sme.c
131
+++ b/target/arm/translate-sme.c
132
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ADDHA_s, aa64_sme, do_adda, a, MO_32, gen_helper_sme_addha_s)
133
TRANS_FEAT(ADDVA_s, aa64_sme, do_adda, a, MO_32, gen_helper_sme_addva_s)
134
TRANS_FEAT(ADDHA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addha_d)
135
TRANS_FEAT(ADDVA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addva_d)
79
+
136
+
80
VHADD_S_scalar 1110 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
137
+static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
81
VHADD_U_scalar 1111 1110 0 . .. ... 0 ... 0 1111 . 100 .... @2scalar
138
+ gen_helper_gvec_5_ptr *fn)
82
VHSUB_S_scalar 1110 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
83
@@ -XXX,XX +XXX,XX @@ VHSUB_U_scalar 1111 1110 0 . .. ... 0 ... 1 1111 . 100 .... @2scalar
84
size=%size_28
85
}
86
87
-VBRSR 1111 1110 0 . .. ... 1 ... 1 1110 . 110 .... @2scalar
88
-
89
VQDMULH_scalar 1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
90
VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
91
92
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
93
index XXXXXXX..XXXXXXX 100644
94
--- a/target/arm/mve_helper.c
95
+++ b/target/arm/mve_helper.c
96
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
97
DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
98
DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
99
DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
100
+DO_2SHIFT_SAT_U(vqrshli_u, DO_UQRSHL_OP)
101
+DO_2SHIFT_SAT_S(vqrshli_s, DO_SQRSHL_OP)
102
103
/* Shift-and-insert; we always work with 64 bits at a time */
104
#define DO_2SHIFT_INSERT(OP, ESIZE, SHIFTFN, MASKFN) \
105
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
106
index XXXXXXX..XXXXXXX 100644
107
--- a/target/arm/translate-mve.c
108
+++ b/target/arm/translate-mve.c
109
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VRSHRI_U, vrshli_u, true)
110
DO_2SHIFT(VSRI, vsri, false)
111
DO_2SHIFT(VSLI, vsli, false)
112
113
+static bool do_2shift_scalar(DisasContext *s, arg_shl_scalar *a,
114
+ MVEGenTwoOpShiftFn *fn)
115
+{
139
+{
116
+ TCGv_ptr qda;
140
+ int svl = streaming_vec_reg_size(s);
117
+ TCGv_i32 rm;
141
+ uint32_t desc = simd_desc(svl, svl, a->sub);
142
+ TCGv_ptr za, zn, zm, pn, pm, fpst;
118
+
143
+
119
+ if (!dc_isar_feature(aa32_mve, s) ||
144
+ if (!sme_smza_enabled_check(s)) {
120
+ !mve_check_qreg_bank(s, a->qda) ||
121
+ a->rm == 13 || a->rm == 15 || !fn) {
122
+ /* Rm cases are UNPREDICTABLE */
123
+ return false;
124
+ }
125
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
126
+ return true;
145
+ return true;
127
+ }
146
+ }
128
+
147
+
129
+ qda = mve_qreg_ptr(a->qda);
148
+ /* Sum XZR+zad to find ZAd. */
130
+ rm = load_reg(s, a->rm);
149
+ za = get_tile_rowcol(s, esz, 31, a->zad, false);
131
+ fn(cpu_env, qda, qda, rm);
150
+ zn = vec_full_reg_ptr(s, a->zn);
132
+ tcg_temp_free_ptr(qda);
151
+ zm = vec_full_reg_ptr(s, a->zm);
133
+ tcg_temp_free_i32(rm);
152
+ pn = pred_full_reg_ptr(s, a->pn);
134
+ mve_update_eci(s);
153
+ pm = pred_full_reg_ptr(s, a->pm);
154
+ fpst = fpstatus_ptr(FPST_FPCR);
155
+
156
+ fn(za, zn, zm, pn, pm, fpst, tcg_constant_i32(desc));
157
+
158
+ tcg_temp_free_ptr(za);
159
+ tcg_temp_free_ptr(zn);
160
+ tcg_temp_free_ptr(pn);
161
+ tcg_temp_free_ptr(pm);
162
+ tcg_temp_free_ptr(fpst);
135
+ return true;
163
+ return true;
136
+}
164
+}
137
+
165
+
138
+#define DO_2SHIFT_SCALAR(INSN, FN) \
166
+TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_s)
139
+ static bool trans_##INSN(DisasContext *s, arg_shl_scalar *a) \
167
+TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, gen_helper_sme_fmopa_d)
140
+ { \
141
+ static MVEGenTwoOpShiftFn * const fns[] = { \
142
+ gen_helper_mve_##FN##b, \
143
+ gen_helper_mve_##FN##h, \
144
+ gen_helper_mve_##FN##w, \
145
+ NULL, \
146
+ }; \
147
+ return do_2shift_scalar(s, a, fns[a->size]); \
148
+ }
149
+
150
+DO_2SHIFT_SCALAR(VSHL_S_scalar, vshli_s)
151
+DO_2SHIFT_SCALAR(VSHL_U_scalar, vshli_u)
152
+DO_2SHIFT_SCALAR(VRSHL_S_scalar, vrshli_s)
153
+DO_2SHIFT_SCALAR(VRSHL_U_scalar, vrshli_u)
154
+DO_2SHIFT_SCALAR(VQSHL_S_scalar, vqshli_s)
155
+DO_2SHIFT_SCALAR(VQSHL_U_scalar, vqshli_u)
156
+DO_2SHIFT_SCALAR(VQRSHL_S_scalar, vqrshli_s)
157
+DO_2SHIFT_SCALAR(VQRSHL_U_scalar, vqrshli_u)
158
+
159
#define DO_VSHLL(INSN, FN) \
160
static bool trans_##INSN(DisasContext *s, arg_2shift *a) \
161
{ \
162
--
168
--
163
2.20.1
169
2.25.1
164
165
1
Implement the MVE gather-loads and scatter-stores which
1
From: Richard Henderson <richard.henderson@linaro.org>
2
form the address by adding a base value from a scalar
3
register to an offset in each element of a vector.
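
(As a rough reference sketch only, not the QEMU implementation: the
per-element addressing described above could be written in C roughly as
below. The helper name and the ld32() callback standing in for the
guest memory access are invented for illustration; predication and ECI
beat handling are left out.)

#include <stdint.h>
#include <stdbool.h>

/* Each element's address is the scalar base plus the corresponding
 * element of the offset vector, optionally scaled by the element size
 * (the "_os_" forms). */
static void gather_load_u32(uint32_t dest[4], const uint32_t offsets[4],
                            uint32_t base, bool scaled,
                            uint32_t (*ld32)(uint32_t addr))
{
    for (int e = 0; e < 4; e++) {
        uint32_t addr = base + (scaled ? offsets[e] << 2 : offsets[e]);
        dest[e] = ld32(addr);
    }
}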
4
2
3
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
4
Message-id: 20220708151540.18136-26-richard.henderson@linaro.org
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
---
7
---
8
target/arm/helper-mve.h | 32 +++++++++
8
target/arm/helper-sme.h | 2 ++
9
target/arm/mve.decode | 12 ++++
9
target/arm/sme.decode | 2 ++
10
target/arm/mve_helper.c | 129 +++++++++++++++++++++++++++++++++++++
10
target/arm/sme_helper.c | 56 ++++++++++++++++++++++++++++++++++++++
11
target/arm/translate-mve.c | 97 ++++++++++++++++++++++++++++
11
target/arm/translate-sme.c | 30 ++++++++++++++++++++
12
4 files changed, 270 insertions(+)
12
4 files changed, 90 insertions(+)
13
13
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
15
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
16
--- a/target/arm/helper-sme.h
17
+++ b/target/arm/helper-mve.h
17
+++ b/target/arm/helper-sme.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
19
DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
19
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
20
DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
20
DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
21
21
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
22
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
+DEF_HELPER_FLAGS_6(sme_bfmopa, TCG_CALL_NO_RWG,
23
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
23
+ void, ptr, ptr, ptr, ptr, ptr, i32)
24
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
24
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
25
index XXXXXXX..XXXXXXX 100644
26
--- a/target/arm/sme.decode
27
+++ b/target/arm/sme.decode
28
@@ -XXX,XX +XXX,XX @@ ADDVA_d 11000000 11 01000 1 ... ... ..... 00 ... @adda_64
29
30
FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32
31
FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64
25
+
32
+
26
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
33
+BFMOPA 10000001 100 ..... ... ... ..... . 00 .. @op_32
27
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
34
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
28
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
29
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
30
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
31
+DEF_HELPER_FLAGS_4(mve_vldrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
32
+DEF_HELPER_FLAGS_4(mve_vldrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
33
+
34
+DEF_HELPER_FLAGS_4(mve_vstrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
35
+DEF_HELPER_FLAGS_4(mve_vstrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
36
+DEF_HELPER_FLAGS_4(mve_vstrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
37
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
38
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
39
+DEF_HELPER_FLAGS_4(mve_vstrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
+DEF_HELPER_FLAGS_4(mve_vstrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
41
+
42
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
43
+
44
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
45
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
46
+DEF_HELPER_FLAGS_4(mve_vldrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
47
+DEF_HELPER_FLAGS_4(mve_vldrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
48
+
49
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
50
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
51
+DEF_HELPER_FLAGS_4(mve_vstrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
52
+DEF_HELPER_FLAGS_4(mve_vstrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
53
+
54
DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
55
56
DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
57
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
58
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
59
--- a/target/arm/mve.decode
36
--- a/target/arm/sme_helper.c
60
+++ b/target/arm/mve.decode
37
+++ b/target/arm/sme_helper.c
61
@@ -XXX,XX +XXX,XX @@
38
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn,
62
&shl_scalar qda rm size
39
}
63
&vmaxv qm rda size
40
}
64
&vabav qn qm rda size
41
}
65
+&vldst_sg qd qm rn size msize os
66
+
67
+# scatter-gather memory size is in bits 6:4
68
+%sg_msize 6:1 4:1
69
70
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
71
# Note that both Rn and Qd are 3 bits only (no D bit)
72
@vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
73
74
+@vldst_sg .... .... .... rn:4 .... ... size:2 ... ... os:1 &vldst_sg \
75
+ qd=%qd qm=%qm msize=%sg_msize
76
+
77
@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
78
@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
79
@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
80
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111101 ....... @vldr_vstr \
81
VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111110 ....... @vldr_vstr \
82
size=2 p=1
83
84
+# gather loads/scatter stores
85
+VLDR_S_sg 111 0 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
86
+VLDR_U_sg 111 1 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
87
+VSTR_sg 111 0 1100 1 . 00 .... ... 0 111 . .... .... @vldst_sg
88
+
89
# Moves between 2 32-bit vector lanes and 2 general purpose registers
90
VMOV_to_2gp 1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
91
VMOV_from_2gp 1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
92
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
93
index XXXXXXX..XXXXXXX 100644
94
--- a/target/arm/mve_helper.c
95
+++ b/target/arm/mve_helper.c
96
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
97
#undef DO_VLDR
98
#undef DO_VSTR
99
100
+/*
101
+ * Gather loads/scatter stores. Here each element of Qm specifies
102
+ * an offset to use from the base register Rm. In the _os_ versions
103
+ * that offset is scaled by the element size.
104
+ * For loads, predicated lanes are zeroed instead of retaining
105
+ * their previous values.
106
+ */
107
+#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN) \
108
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
109
+ uint32_t base) \
110
+ { \
111
+ TYPE *d = vd; \
112
+ OFFTYPE *m = vm; \
113
+ uint16_t mask = mve_element_mask(env); \
114
+ uint16_t eci_mask = mve_eci_mask(env); \
115
+ unsigned e; \
116
+ uint32_t addr; \
117
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
118
+ if (!(eci_mask & 1)) { \
119
+ continue; \
120
+ } \
121
+ addr = ADDRFN(base, m[H##ESIZE(e)]); \
122
+ d[H##ESIZE(e)] = (mask & 1) ? \
123
+ cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \
124
+ } \
125
+ mve_advance_vpt(env); \
126
+ }
127
+
128
+/* We know here TYPE is unsigned so always the same as the offset type */
129
+#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN) \
130
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
131
+ uint32_t base) \
132
+ { \
133
+ TYPE *d = vd; \
134
+ TYPE *m = vm; \
135
+ uint16_t mask = mve_element_mask(env); \
136
+ unsigned e; \
137
+ uint32_t addr; \
138
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
139
+ addr = ADDRFN(base, m[H##ESIZE(e)]); \
140
+ if (mask & 1) { \
141
+ cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
142
+ } \
143
+ } \
144
+ mve_advance_vpt(env); \
145
+ }
146
+
42
+
147
+/*
43
+/*
148
+ * 64-bit accesses are slightly different: they are done as two 32-bit
44
+ * Alter PAIR as needed for controlling predicates being false,
149
+ * accesses, controlled by the predicate mask for the relevant beat,
45
+ * and for NEG on an enabled row element.
150
+ * and with a single 32-bit offset in the first of the two Qm elements.
151
+ * Note that for QEMU our IMPDEF AIRCR.ENDIANNESS is always 0 (little).
152
+ */
46
+ */
153
+#define DO_VLDR64_SG(OP, ADDRFN) \
47
+static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg)
154
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
48
+{
155
+ uint32_t base) \
49
+ /*
156
+ { \
50
+ * The pseudocode uses a conditional negate after the conditional zero.
157
+ uint32_t *d = vd; \
51
+ * It is simpler here to unconditionally negate before conditional zero.
158
+ uint32_t *m = vm; \
52
+ */
159
+ uint16_t mask = mve_element_mask(env); \
53
+ pair ^= neg;
160
+ uint16_t eci_mask = mve_eci_mask(env); \
54
+ if (!(pg & 1)) {
161
+ unsigned e; \
55
+ pair &= 0xffff0000u;
162
+ uint32_t addr; \
163
+ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \
164
+ if (!(eci_mask & 1)) { \
165
+ continue; \
166
+ } \
167
+ addr = ADDRFN(base, m[H4(e & ~1)]); \
168
+ addr += 4 * (e & 1); \
169
+ d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \
170
+ } \
171
+ mve_advance_vpt(env); \
172
+ }
56
+ }
57
+ if (!(pg & 4)) {
58
+ pair &= 0x0000ffffu;
59
+ }
60
+ return pair;
61
+}
173
+
62
+
174
+#define DO_VSTR64_SG(OP, ADDRFN) \
63
+void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn,
175
+ void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
64
+ void *vpm, uint32_t desc)
176
+ uint32_t base) \
65
+{
177
+ { \
66
+ intptr_t row, col, oprsz = simd_maxsz(desc);
178
+ uint32_t *d = vd; \
67
+ uint32_t neg = simd_data(desc) * 0x80008000u;
179
+ uint32_t *m = vm; \
68
+ uint16_t *pn = vpn, *pm = vpm;
180
+ uint16_t mask = mve_element_mask(env); \
69
+
181
+ unsigned e; \
70
+ for (row = 0; row < oprsz; ) {
182
+ uint32_t addr; \
71
+ uint16_t prow = pn[H2(row >> 4)];
183
+ for (e = 0; e < 16 / 4; e++, mask >>= 4) { \
72
+ do {
184
+ addr = ADDRFN(base, m[H4(e & ~1)]); \
73
+ void *vza_row = vza + tile_vslice_offset(row);
185
+ addr += 4 * (e & 1); \
74
+ uint32_t n = *(uint32_t *)(vzn + H1_4(row));
186
+ if (mask & 1) { \
75
+
187
+ cpu_stl_data_ra(env, addr, d[H4(e)], GETPC()); \
76
+ n = f16mop_adj_pair(n, prow, neg);
188
+ } \
77
+
189
+ } \
78
+ for (col = 0; col < oprsz; ) {
190
+ mve_advance_vpt(env); \
79
+ uint16_t pcol = pm[H2(col >> 4)];
80
+ do {
81
+ if (prow & pcol & 0b0101) {
82
+ uint32_t *a = vza_row + H1_4(col);
83
+ uint32_t m = *(uint32_t *)(vzm + H1_4(col));
84
+
85
+ m = f16mop_adj_pair(m, pcol, 0);
86
+ *a = bfdotadd(*a, n, m);
87
+
88
+ col += 4;
89
+ pcol >>= 4;
90
+ }
91
+ } while (col & 15);
92
+ }
93
+ row += 4;
94
+ prow >>= 4;
95
+ } while (row & 15);
191
+ }
96
+ }
97
+}
98
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
99
index XXXXXXX..XXXXXXX 100644
100
--- a/target/arm/translate-sme.c
101
+++ b/target/arm/translate-sme.c
102
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(ADDVA_s, aa64_sme, do_adda, a, MO_32, gen_helper_sme_addva_s)
103
TRANS_FEAT(ADDHA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addha_d)
104
TRANS_FEAT(ADDVA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addva_d)
105
106
+static bool do_outprod(DisasContext *s, arg_op *a, MemOp esz,
107
+ gen_helper_gvec_5 *fn)
108
+{
109
+ int svl = streaming_vec_reg_size(s);
110
+ uint32_t desc = simd_desc(svl, svl, a->sub);
111
+ TCGv_ptr za, zn, zm, pn, pm;
192
+
112
+
193
+#define ADDR_ADD(BASE, OFFSET) ((BASE) + (OFFSET))
113
+ if (!sme_smza_enabled_check(s)) {
194
+#define ADDR_ADD_OSH(BASE, OFFSET) ((BASE) + ((OFFSET) << 1))
195
+#define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
196
+#define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))
197
+
198
+DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD)
199
+DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD)
200
+DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD)
201
+
202
+DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD)
203
+DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD)
204
+DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD)
205
+DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD)
206
+DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD)
207
+DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD)
208
+DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD)
209
+
210
+DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH)
211
+DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH)
212
+DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH)
213
+DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW)
214
+DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD)
215
+
216
+DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD)
217
+DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD)
218
+DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD)
219
+DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD)
220
+DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD)
221
+DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD)
222
+DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD)
223
+
224
+DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH)
225
+DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH)
226
+DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW)
227
+DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD)
228
+
229
/*
230
* The mergemask(D, R, M) macro performs the operation "*D = R" but
231
* storing only the bytes which correspond to 1 bits in M,
232
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
233
index XXXXXXX..XXXXXXX 100644
234
--- a/target/arm/translate-mve.c
235
+++ b/target/arm/translate-mve.c
236
@@ -XXX,XX +XXX,XX @@ static inline int vidup_imm(DisasContext *s, int x)
237
#include "decode-mve.c.inc"
238
239
typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
240
+typedef void MVEGenLdStSGFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
241
typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
242
typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
243
typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
244
@@ -XXX,XX +XXX,XX @@ DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
245
DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
246
DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
247
248
+static bool do_ldst_sg(DisasContext *s, arg_vldst_sg *a, MVEGenLdStSGFn fn)
249
+{
250
+ TCGv_i32 addr;
251
+ TCGv_ptr qd, qm;
252
+
253
+ if (!dc_isar_feature(aa32_mve, s) ||
254
+ !mve_check_qreg_bank(s, a->qd | a->qm) ||
255
+ !fn || a->rn == 15) {
256
+ /* Rn case is UNPREDICTABLE */
257
+ return false;
258
+ }
259
+
260
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
261
+ return true;
114
+ return true;
262
+ }
115
+ }
263
+
116
+
264
+ addr = load_reg(s, a->rn);
117
+ /* Sum XZR+zad to find ZAd. */
118
+ za = get_tile_rowcol(s, esz, 31, a->zad, false);
119
+ zn = vec_full_reg_ptr(s, a->zn);
120
+ zm = vec_full_reg_ptr(s, a->zm);
121
+ pn = pred_full_reg_ptr(s, a->pn);
122
+ pm = pred_full_reg_ptr(s, a->pm);
265
+
123
+
266
+ qd = mve_qreg_ptr(a->qd);
124
+ fn(za, zn, zm, pn, pm, tcg_constant_i32(desc));
267
+ qm = mve_qreg_ptr(a->qm);
125
+
268
+ fn(cpu_env, qd, qm, addr);
126
+ tcg_temp_free_ptr(za);
269
+ tcg_temp_free_ptr(qd);
127
+ tcg_temp_free_ptr(zn);
270
+ tcg_temp_free_ptr(qm);
128
+ tcg_temp_free_ptr(pn);
271
+ tcg_temp_free_i32(addr);
129
+ tcg_temp_free_ptr(pm);
272
+ mve_update_eci(s);
273
+ return true;
130
+ return true;
274
+}
131
+}
275
+
132
+
276
+/*
133
static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
277
+ * The naming scheme here is "vldrb_sg_sh == in-memory byte loads
134
gen_helper_gvec_5_ptr *fn)
278
+ * sign-extended to halfword elements in register". _os_ indicates that
135
{
279
+ * the offsets in Qm should be scaled by the element size.
136
@@ -XXX,XX +XXX,XX @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
280
+ */
137
281
+/* This macro is just to make the arrays more compact in these functions */
138
TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_s)
282
+#define F(N) gen_helper_mve_##N
139
TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, gen_helper_sme_fmopa_d)
283
+
140
+
284
+/* VLDRB/VSTRB (ie msize 1) with OS=1 is UNPREDICTABLE; we UNDEF */
141
+/* TODO: FEAT_EBF16 */
285
+static bool trans_VLDR_S_sg(DisasContext *s, arg_vldst_sg *a)
142
+TRANS_FEAT(BFMOPA, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_bfmopa)
286
+{
287
+ static MVEGenLdStSGFn * const fns[2][4][4] = { {
288
+ { NULL, F(vldrb_sg_sh), F(vldrb_sg_sw), NULL },
289
+ { NULL, NULL, F(vldrh_sg_sw), NULL },
290
+ { NULL, NULL, NULL, NULL },
291
+ { NULL, NULL, NULL, NULL }
292
+ }, {
293
+ { NULL, NULL, NULL, NULL },
294
+ { NULL, NULL, F(vldrh_sg_os_sw), NULL },
295
+ { NULL, NULL, NULL, NULL },
296
+ { NULL, NULL, NULL, NULL }
297
+ }
298
+ };
299
+ if (a->qd == a->qm) {
300
+ return false; /* UNPREDICTABLE */
301
+ }
302
+ return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
303
+}
304
+
305
+static bool trans_VLDR_U_sg(DisasContext *s, arg_vldst_sg *a)
306
+{
307
+ static MVEGenLdStSGFn * const fns[2][4][4] = { {
308
+ { F(vldrb_sg_ub), F(vldrb_sg_uh), F(vldrb_sg_uw), NULL },
309
+ { NULL, F(vldrh_sg_uh), F(vldrh_sg_uw), NULL },
310
+ { NULL, NULL, F(vldrw_sg_uw), NULL },
311
+ { NULL, NULL, NULL, F(vldrd_sg_ud) }
312
+ }, {
313
+ { NULL, NULL, NULL, NULL },
314
+ { NULL, F(vldrh_sg_os_uh), F(vldrh_sg_os_uw), NULL },
315
+ { NULL, NULL, F(vldrw_sg_os_uw), NULL },
316
+ { NULL, NULL, NULL, F(vldrd_sg_os_ud) }
317
+ }
318
+ };
319
+ if (a->qd == a->qm) {
320
+ return false; /* UNPREDICTABLE */
321
+ }
322
+ return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
323
+}
324
+
325
+static bool trans_VSTR_sg(DisasContext *s, arg_vldst_sg *a)
326
+{
327
+ static MVEGenLdStSGFn * const fns[2][4][4] = { {
328
+ { F(vstrb_sg_ub), F(vstrb_sg_uh), F(vstrb_sg_uw), NULL },
329
+ { NULL, F(vstrh_sg_uh), F(vstrh_sg_uw), NULL },
330
+ { NULL, NULL, F(vstrw_sg_uw), NULL },
331
+ { NULL, NULL, NULL, F(vstrd_sg_ud) }
332
+ }, {
333
+ { NULL, NULL, NULL, NULL },
334
+ { NULL, F(vstrh_sg_os_uh), F(vstrh_sg_os_uw), NULL },
335
+ { NULL, NULL, F(vstrw_sg_os_uw), NULL },
336
+ { NULL, NULL, NULL, F(vstrd_sg_os_ud) }
337
+ }
338
+ };
339
+ return do_ldst_sg(s, a, fns[a->os][a->msize][a->size]);
340
+}
341
+
342
+#undef F
343
+
344
static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
345
{
346
TCGv_ptr qd;
347
--
143
--
348
2.20.1
144
2.25.1
349
350
1
Implement the MVE VMOV forms that move data between 2 general-purpose
1
From: Richard Henderson <richard.henderson@linaro.org>
2
registers and 2 32-bit lanes in a vector register.
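
(A hypothetical sketch of just the data movement, ignoring the
beat-wise/ECI rules discussed below; the struct and function names are
invented. The "to GP" form copies two 32-bit lanes of Qd, one from each
half of the register, into Rt and Rt2.)

#include <stdint.h>

typedef struct { uint32_t w[4]; } QRegSketch;

static void vmov_q_to_2gp(const QRegSketch *qd, int idx,
                          uint32_t *rt, uint32_t *rt2)
{
    /* idx is 0 or 1: Rt gets lane idx, Rt2 gets lane idx + 2 */
    *rt  = qd->w[idx];
    *rt2 = qd->w[idx + 2];
}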
3
2
3
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
4
Message-id: 20220708151540.18136-27-richard.henderson@linaro.org
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
---
7
---
7
target/arm/translate-a32.h | 1 +
8
target/arm/helper-sme.h | 2 ++
8
target/arm/mve.decode | 4 ++
9
target/arm/sme.decode | 1 +
9
target/arm/translate-mve.c | 85 ++++++++++++++++++++++++++++++++++++++
10
target/arm/sme_helper.c | 74 ++++++++++++++++++++++++++++++++++++++
10
target/arm/translate-vfp.c | 2 +-
11
target/arm/translate-sme.c | 1 +
11
4 files changed, 91 insertions(+), 1 deletion(-)
12
4 files changed, 78 insertions(+)
12
13
13
diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
14
diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
14
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/translate-a32.h
16
--- a/target/arm/helper-sme.h
16
+++ b/target/arm/translate-a32.h
17
+++ b/target/arm/helper-sme.h
17
@@ -XXX,XX +XXX,XX @@ void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
18
void clear_eci_state(DisasContext *s);
19
DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
19
bool mve_eci_check(DisasContext *s);
20
DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
20
void mve_update_and_store_eci(DisasContext *s);
21
21
+bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
22
+DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG,
22
23
+ void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
23
static inline TCGv_i32 load_cpu_offset(int offset)
24
DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
24
{
25
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
25
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
26
DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
27
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
26
index XXXXXXX..XXXXXXX 100644
28
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/mve.decode
29
--- a/target/arm/sme.decode
28
+++ b/target/arm/mve.decode
30
+++ b/target/arm/sme.decode
29
@@ -XXX,XX +XXX,XX @@ VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111101 ....... @vldr_vstr \
31
@@ -XXX,XX +XXX,XX @@ FMOPA_s 10000000 100 ..... ... ... ..... . 00 .. @op_32
30
VLDR_VSTR 1110110 1 a:1 . w:1 . .... ... 111110 ....... @vldr_vstr \
32
FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64
31
size=2 p=1
33
32
34
BFMOPA 10000001 100 ..... ... ... ..... . 00 .. @op_32
33
+# Moves between 2 32-bit vector lanes and 2 general purpose registers
35
+FMOPA_h 10000001 101 ..... ... ... ..... . 00 .. @op_32
34
+VMOV_to_2gp 1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
36
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
35
+VMOV_from_2gp 1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
37
index XXXXXXX..XXXXXXX 100644
38
--- a/target/arm/sme_helper.c
39
+++ b/target/arm/sme_helper.c
40
@@ -XXX,XX +XXX,XX @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg)
41
return pair;
42
}
43
44
+static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
45
+ float_status *s_std, float_status *s_odd)
46
+{
47
+ float64 e1r = float16_to_float64(e1 & 0xffff, true, s_std);
48
+ float64 e1c = float16_to_float64(e1 >> 16, true, s_std);
49
+ float64 e2r = float16_to_float64(e2 & 0xffff, true, s_std);
50
+ float64 e2c = float16_to_float64(e2 >> 16, true, s_std);
51
+ float64 t64;
52
+ float32 t32;
36
+
53
+
37
# Vector 2-op
54
+ /*
38
VAND 1110 1111 0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
55
+ * The ARM pseudocode function FPDot performs both multiplies
39
VBIC 1110 1111 0 . 01 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
56
+ * and the add with a single rounding operation. Emulate this
40
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
57
+ * by performing the first multiply in round-to-odd, then doing
41
index XXXXXXX..XXXXXXX 100644
58
+ * the second multiply as fused multiply-add, and rounding to
42
--- a/target/arm/translate-mve.c
59
+ * float32 all in one step.
43
+++ b/target/arm/translate-mve.c
60
+ */
44
@@ -XXX,XX +XXX,XX @@ static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
61
+ t64 = float64_mul(e1r, e2r, s_odd);
45
62
+ t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std);
46
DO_VABAV(VABAV_S, vabavs)
47
DO_VABAV(VABAV_U, vabavu)
48
+
63
+
49
+static bool trans_VMOV_to_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
64
+ /* This conversion is exact, because we've already rounded. */
50
+{
65
+ t32 = float64_to_float32(t64, s_std);
51
+ /*
52
+ * VMOV two 32-bit vector lanes to two general-purpose registers.
53
+ * This insn is not predicated but it is subject to beat-wise
54
+ * execution if it is not in an IT block. For us this means
55
+ * only that if PSR.ECI says we should not be executing the beat
56
+ * corresponding to the lane of the vector register being accessed
57
+ * then we should skip performing the move, and that we need to do
58
+ * the usual check for bad ECI state and advance of ECI state.
59
+ * (If PSR.ECI is non-zero then we cannot be in an IT block.)
60
+ */
61
+ TCGv_i32 tmp;
62
+ int vd;
63
+
66
+
64
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
67
+ /* The final accumulation step is not fused. */
65
+ a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15 ||
68
+ return float32_add(sum, t32, s_std);
66
+ a->rt == a->rt2) {
67
+ /* Rt/Rt2 cases are UNPREDICTABLE */
68
+ return false;
69
+ }
70
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
71
+ return true;
72
+ }
73
+
74
+ /* Convert Qreg index to Dreg for read_neon_element32() etc */
75
+ vd = a->qd * 2;
76
+
77
+ if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
78
+ tmp = tcg_temp_new_i32();
79
+ read_neon_element32(tmp, vd, a->idx, MO_32);
80
+ store_reg(s, a->rt, tmp);
81
+ }
82
+ if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
83
+ tmp = tcg_temp_new_i32();
84
+ read_neon_element32(tmp, vd + 1, a->idx, MO_32);
85
+ store_reg(s, a->rt2, tmp);
86
+ }
87
+
88
+ mve_update_and_store_eci(s);
89
+ return true;
90
+}
69
+}
91
+
70
+
92
+static bool trans_VMOV_from_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
71
+void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
72
+ void *vpm, void *vst, uint32_t desc)
93
+{
73
+{
74
+ intptr_t row, col, oprsz = simd_maxsz(desc);
75
+ uint32_t neg = simd_data(desc) * 0x80008000u;
76
+ uint16_t *pn = vpn, *pm = vpm;
77
+ float_status fpst_odd, fpst_std;
78
+
94
+ /*
79
+ /*
95
+ * VMOV two general-purpose registers to two 32-bit vector lanes.
80
+ * Make a copy of float_status because this operation does not
96
+ * This insn is not predicated but it is subject to beat-wise
81
+ * update the cumulative fp exception status. It also produces
97
+ * execution if it is not in an IT block. For us this means
82
+ * default nans. Make a second copy with round-to-odd -- see above.
98
+ * only that if PSR.ECI says we should not be executing the beat
99
+ * corresponding to the lane of the vector register being accessed
100
+ * then we should skip perfoming the move, and that we need to do
101
+ * the usual check for bad ECI state and advance of ECI state.
102
+ * (If PSR.ECI is non-zero then we cannot be in an IT block.)
103
+ */
83
+ */
104
+ TCGv_i32 tmp;
84
+ fpst_std = *(float_status *)vst;
105
+ int vd;
85
+ set_default_nan_mode(true, &fpst_std);
86
+ fpst_odd = fpst_std;
87
+ set_float_rounding_mode(float_round_to_odd, &fpst_odd);
106
+
88
+
107
+ if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
89
+ for (row = 0; row < oprsz; ) {
108
+ a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15) {
90
+ uint16_t prow = pn[H2(row >> 4)];
109
+ /* Rt/Rt2 cases are UNPREDICTABLE */
91
+ do {
110
+ return false;
92
+ void *vza_row = vza + tile_vslice_offset(row);
93
+ uint32_t n = *(uint32_t *)(vzn + H1_4(row));
94
+
95
+ n = f16mop_adj_pair(n, prow, neg);
96
+
97
+ for (col = 0; col < oprsz; ) {
98
+ uint16_t pcol = pm[H2(col >> 4)];
99
+ do {
100
+ if (prow & pcol & 0b0101) {
101
+ uint32_t *a = vza_row + H1_4(col);
102
+ uint32_t m = *(uint32_t *)(vzm + H1_4(col));
103
+
104
+ m = f16mop_adj_pair(m, pcol, 0);
105
+ *a = f16_dotadd(*a, n, m, &fpst_std, &fpst_odd);
106
+
107
+ col += 4;
108
+ pcol >>= 4;
109
+ }
110
+ } while (col & 15);
111
+ }
112
+ row += 4;
113
+ prow >>= 4;
114
+ } while (row & 15);
111
+ }
115
+ }
112
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
116
+}
113
+ return true;
114
+ }
115
+
117
+
116
+ /* Convert Qreg idx to Dreg for read_neon_element32() etc */
118
void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn,
117
+ vd = a->qd * 2;
119
void *vpm, uint32_t desc)
118
+
120
{
119
+ if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
121
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
120
+ tmp = load_reg(s, a->rt);
121
+ write_neon_element32(tmp, vd, a->idx, MO_32);
122
+ tcg_temp_free_i32(tmp);
123
+ }
124
+ if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
125
+ tmp = load_reg(s, a->rt2);
126
+ write_neon_element32(tmp, vd + 1, a->idx, MO_32);
127
+ tcg_temp_free_i32(tmp);
128
+ }
129
+
130
+ mve_update_and_store_eci(s);
131
+ return true;
132
+}
133
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
134
index XXXXXXX..XXXXXXX 100644
122
index XXXXXXX..XXXXXXX 100644
135
--- a/target/arm/translate-vfp.c
123
--- a/target/arm/translate-sme.c
136
+++ b/target/arm/translate-vfp.c
124
+++ b/target/arm/translate-sme.c
137
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
125
@@ -XXX,XX +XXX,XX @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
138
return true;
126
return true;
139
}
127
}
140
128
141
-static bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
129
+TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_h)
142
+bool mve_skip_vmov(DisasContext *s, int vn, int index, int size)
130
TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_s)
143
{
131
TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, gen_helper_sme_fmopa_d)
144
/*
132
145
* In a CPU with MVE, the VMOV (vector lane to general-purpose register)
146
--
133
--
147
2.20.1
134
2.25.1
148
149
1
Implement the MVE saturating doubling multiply accumulate insns
1
From: Richard Henderson <richard.henderson@linaro.org>
2
VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH. These perform a multiply,
3
double, add the accumulator shifted by the element size, possibly
4
round, saturate to twice the element size, then take the high half of
5
the result. The *MLAH insns do vector * scalar + vector, and the
6
*MLASH insns do vector * vector + scalar.
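
(Illustrative only, not the code added by this patch: for 16-bit
elements the per-element operation just described works out to the C
sketch below, where "acc" is the accumulator element, "round" selects
the rounding forms and "sat" reports saturation.)

#include <stdint.h>
#include <stdbool.h>

static int16_t qdmlah16_sketch(int16_t acc, int16_t n, int16_t m,
                               bool round, bool *sat)
{
    /* multiply, double, add acc shifted by the element size,
     * optionally round, saturate to 32 bits, take the high half */
    int64_t r = (int64_t)n * m * 2 + ((int64_t)acc << 16);
    if (round) {
        r += 1 << 15;
    }
    if (r > INT32_MAX) {
        *sat = true;
        r = INT32_MAX;
    } else if (r < INT32_MIN) {
        *sat = true;
        r = INT32_MIN;
    }
    return r >> 16;
}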
7
2
3
This is SMOPA, SUMOPA, USMOPA_s, UMOPA, for both Int8 and Int16.
4
5
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20220708151540.18136-28-richard.henderson@linaro.org
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
---
9
---
11
target/arm/helper-mve.h | 16 +++++++
10
target/arm/helper-sme.h | 16 ++++++++
12
target/arm/mve.decode | 5 ++
11
target/arm/sme.decode | 10 +++++
13
target/arm/mve_helper.c | 95 ++++++++++++++++++++++++++++++++++++++
12
target/arm/sme_helper.c | 82 ++++++++++++++++++++++++++++++++++++++
14
target/arm/translate-mve.c | 4 ++
13
target/arm/translate-sme.c | 10 +++++
15
4 files changed, 120 insertions(+)
14
4 files changed, 118 insertions(+)
16
15
17
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
16
diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
18
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/helper-mve.h
18
--- a/target/arm/helper-sme.h
20
+++ b/target/arm/helper-mve.h
19
+++ b/target/arm/helper-sme.h
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
22
DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
23
DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
22
DEF_HELPER_FLAGS_6(sme_bfmopa, TCG_CALL_NO_RWG,
24
23
void, ptr, ptr, ptr, ptr, ptr, i32)
25
+DEF_HELPER_FLAGS_4(mve_vqdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
24
+DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG,
26
+DEF_HELPER_FLAGS_4(mve_vqdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
25
+ void, ptr, ptr, ptr, ptr, ptr, i32)
27
+DEF_HELPER_FLAGS_4(mve_vqdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
26
+DEF_HELPER_FLAGS_6(sme_umopa_s, TCG_CALL_NO_RWG,
27
+ void, ptr, ptr, ptr, ptr, ptr, i32)
28
+DEF_HELPER_FLAGS_6(sme_sumopa_s, TCG_CALL_NO_RWG,
29
+ void, ptr, ptr, ptr, ptr, ptr, i32)
30
+DEF_HELPER_FLAGS_6(sme_usmopa_s, TCG_CALL_NO_RWG,
31
+ void, ptr, ptr, ptr, ptr, ptr, i32)
32
+DEF_HELPER_FLAGS_6(sme_smopa_d, TCG_CALL_NO_RWG,
33
+ void, ptr, ptr, ptr, ptr, ptr, i32)
34
+DEF_HELPER_FLAGS_6(sme_umopa_d, TCG_CALL_NO_RWG,
35
+ void, ptr, ptr, ptr, ptr, ptr, i32)
36
+DEF_HELPER_FLAGS_6(sme_sumopa_d, TCG_CALL_NO_RWG,
37
+ void, ptr, ptr, ptr, ptr, ptr, i32)
38
+DEF_HELPER_FLAGS_6(sme_usmopa_d, TCG_CALL_NO_RWG,
39
+ void, ptr, ptr, ptr, ptr, ptr, i32)
40
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
41
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/sme.decode
43
+++ b/target/arm/sme.decode
44
@@ -XXX,XX +XXX,XX @@ FMOPA_d 10000000 110 ..... ... ... ..... . 0 ... @op_64
45
46
BFMOPA 10000001 100 ..... ... ... ..... . 00 .. @op_32
47
FMOPA_h 10000001 101 ..... ... ... ..... . 00 .. @op_32
28
+
48
+
29
+DEF_HELPER_FLAGS_4(mve_vqrdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
49
+SMOPA_s 1010000 0 10 0 ..... ... ... ..... . 00 .. @op_32
30
+DEF_HELPER_FLAGS_4(mve_vqrdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
50
+SUMOPA_s 1010000 0 10 1 ..... ... ... ..... . 00 .. @op_32
31
+DEF_HELPER_FLAGS_4(mve_vqrdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
51
+USMOPA_s 1010000 1 10 0 ..... ... ... ..... . 00 .. @op_32
52
+UMOPA_s 1010000 1 10 1 ..... ... ... ..... . 00 .. @op_32
32
+
53
+
33
+DEF_HELPER_FLAGS_4(mve_vqdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
54
+SMOPA_d 1010000 0 11 0 ..... ... ... ..... . 0 ... @op_64
34
+DEF_HELPER_FLAGS_4(mve_vqdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
55
+SUMOPA_d 1010000 0 11 1 ..... ... ... ..... . 0 ... @op_64
35
+DEF_HELPER_FLAGS_4(mve_vqdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
56
+USMOPA_d 1010000 1 11 0 ..... ... ... ..... . 0 ... @op_64
57
+UMOPA_d 1010000 1 11 1 ..... ... ... ..... . 0 ... @op_64
58
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
59
index XXXXXXX..XXXXXXX 100644
60
--- a/target/arm/sme_helper.c
61
+++ b/target/arm/sme_helper.c
62
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn,
63
} while (row & 15);
64
}
65
}
36
+
66
+
37
+DEF_HELPER_FLAGS_4(mve_vqrdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
67
+typedef uint64_t IMOPFn(uint64_t, uint64_t, uint64_t, uint8_t, bool);
38
+DEF_HELPER_FLAGS_4(mve_vqrdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
39
+DEF_HELPER_FLAGS_4(mve_vqrdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
40
+
68
+
41
DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
69
+static inline void do_imopa(uint64_t *za, uint64_t *zn, uint64_t *zm,
42
DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
70
+ uint8_t *pn, uint8_t *pm,
43
DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
71
+ uint32_t desc, IMOPFn *fn)
44
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
72
+{
45
index XXXXXXX..XXXXXXX 100644
73
+ intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
46
--- a/target/arm/mve.decode
74
+ bool neg = simd_data(desc);
47
+++ b/target/arm/mve.decode
48
@@ -XXX,XX +XXX,XX @@ VQRDMULH_scalar 1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
49
VMLA 111- 1110 0 . .. ... 1 ... 0 1110 . 100 .... @2scalar
50
VMLAS 111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
51
52
+VQRDMLAH 1110 1110 0 . .. ... 0 ... 0 1110 . 100 .... @2scalar
53
+VQRDMLASH 1110 1110 0 . .. ... 0 ... 1 1110 . 100 .... @2scalar
54
+VQDMLAH 1110 1110 0 . .. ... 0 ... 0 1110 . 110 .... @2scalar
55
+VQDMLASH 1110 1110 0 . .. ... 0 ... 1 1110 . 110 .... @2scalar
56
+
75
+
57
# Vector add across vector
76
+ for (row = 0; row < oprsz; ++row) {
58
{
77
+ uint8_t pa = pn[H1(row)];
59
VADDV 111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
78
+ uint64_t *za_row = &za[tile_vslice_index(row)];
60
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
79
+ uint64_t n = zn[row];
61
index XXXXXXX..XXXXXXX 100644
80
+
62
--- a/target/arm/mve_helper.c
81
+ for (col = 0; col < oprsz; ++col) {
63
+++ b/target/arm/mve_helper.c
82
+ uint8_t pb = pm[H1(col)];
64
@@ -XXX,XX +XXX,XX @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, do_vqdmlsdh_w)
83
+ uint64_t *a = &za_row[col];
65
mve_advance_vpt(env); \
84
+
66
}
85
+ *a = fn(n, zm[col], *a, pa & pb, neg);
67
86
+ }
68
+#define DO_2OP_SAT_ACC_SCALAR(OP, ESIZE, TYPE, FN) \
69
+ void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn, \
70
+ uint32_t rm) \
71
+ { \
72
+ TYPE *d = vd, *n = vn; \
73
+ TYPE m = rm; \
74
+ uint16_t mask = mve_element_mask(env); \
75
+ unsigned e; \
76
+ bool qc = false; \
77
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
78
+ bool sat = false; \
79
+ mergemask(&d[H##ESIZE(e)], \
80
+ FN(d[H##ESIZE(e)], n[H##ESIZE(e)], m, &sat), \
81
+ mask); \
82
+ qc |= sat & mask & 1; \
83
+ } \
84
+ if (qc) { \
85
+ env->vfp.qc[0] = qc; \
86
+ } \
87
+ mve_advance_vpt(env); \
88
+ }
87
+ }
89
+
90
/* provide unsigned 2-op scalar helpers for all sizes */
91
#define DO_2OP_SCALAR_U(OP, FN) \
92
DO_2OP_SCALAR(OP##b, 1, uint8_t, FN) \
93
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT_SCALAR(vqrdmulh_scalarb, 1, int8_t, DO_QRDMULH_B)
94
DO_2OP_SAT_SCALAR(vqrdmulh_scalarh, 2, int16_t, DO_QRDMULH_H)
95
DO_2OP_SAT_SCALAR(vqrdmulh_scalarw, 4, int32_t, DO_QRDMULH_W)
96
97
+static int8_t do_vqdmlah_b(int8_t a, int8_t b, int8_t c, int round, bool *sat)
98
+{
99
+ int64_t r = (int64_t)a * b * 2 + ((int64_t)c << 8) + (round << 7);
100
+ return do_sat_bhw(r, INT16_MIN, INT16_MAX, sat) >> 8;
101
+}
88
+}
102
+
89
+
103
+static int16_t do_vqdmlah_h(int16_t a, int16_t b, int16_t c,
90
+#define DEF_IMOP_32(NAME, NTYPE, MTYPE) \
104
+ int round, bool *sat)
91
+static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \
105
+{
92
+{ \
106
+ int64_t r = (int64_t)a * b * 2 + ((int64_t)c << 16) + (round << 15);
93
+ uint32_t sum0 = 0, sum1 = 0; \
107
+ return do_sat_bhw(r, INT32_MIN, INT32_MAX, sat) >> 16;
94
+ /* Apply P to N as a mask, making the inactive elements 0. */ \
95
+ n &= expand_pred_b(p); \
96
+ sum0 += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); \
97
+ sum0 += (NTYPE)(n >> 8) * (MTYPE)(m >> 8); \
98
+ sum0 += (NTYPE)(n >> 16) * (MTYPE)(m >> 16); \
99
+ sum0 += (NTYPE)(n >> 24) * (MTYPE)(m >> 24); \
100
+ sum1 += (NTYPE)(n >> 32) * (MTYPE)(m >> 32); \
101
+ sum1 += (NTYPE)(n >> 40) * (MTYPE)(m >> 40); \
102
+ sum1 += (NTYPE)(n >> 48) * (MTYPE)(m >> 48); \
103
+ sum1 += (NTYPE)(n >> 56) * (MTYPE)(m >> 56); \
104
+ if (neg) { \
105
+ sum0 = (uint32_t)a - sum0, sum1 = (uint32_t)(a >> 32) - sum1; \
106
+ } else { \
107
+ sum0 = (uint32_t)a + sum0, sum1 = (uint32_t)(a >> 32) + sum1; \
108
+ } \
109
+ return ((uint64_t)sum1 << 32) | sum0; \
108
+}
110
+}
109
+
111
+
110
+static int32_t do_vqdmlah_w(int32_t a, int32_t b, int32_t c,
112
+#define DEF_IMOP_64(NAME, NTYPE, MTYPE) \
111
+ int round, bool *sat)
113
+static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \
112
+{
114
+{ \
113
+ /*
115
+ uint64_t sum = 0; \
114
+ * Architecturally we should do the entire add, double, round
116
+ /* Apply P to N as a mask, making the inactive elements 0. */ \
115
+ * and then check for saturation. We do three saturating adds,
117
+ n &= expand_pred_h(p); \
116
+ * but we need to be careful about the order. If the first
118
+ sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); \
117
+ * m1 + m2 saturates then it's impossible for the *2+rc to
119
+ sum += (NTYPE)(n >> 16) * (MTYPE)(m >> 16); \
118
+ * bring it back into the non-saturated range. However, if
120
+ sum += (NTYPE)(n >> 32) * (MTYPE)(m >> 32); \
119
+ * m1 + m2 is negative then it's possible that doing the doubling
121
+ sum += (NTYPE)(n >> 48) * (MTYPE)(m >> 48); \
120
+ * would take the intermediate result below INT64_MAX and the
122
+ return neg ? a - sum : a + sum; \
121
+ * addition of the rounding constant then brings it back in range.
122
+ * So we add half the rounding constant and half the "c << esize"
123
+ * before doubling rather than adding the rounding constant after
124
+ * the doubling.
125
+ */
126
+ int64_t m1 = (int64_t)a * b;
127
+ int64_t m2 = (int64_t)c << 31;
128
+ int64_t r;
129
+ if (sadd64_overflow(m1, m2, &r) ||
130
+ sadd64_overflow(r, (round << 30), &r) ||
131
+ sadd64_overflow(r, r, &r)) {
132
+ *sat = true;
133
+ return r < 0 ? INT32_MAX : INT32_MIN;
134
+ }
135
+ return r >> 32;
136
+}
123
+}
137
+
124
+
138
+/*
125
+DEF_IMOP_32(smopa_s, int8_t, int8_t)
139
+ * The *MLAH insns are vector * scalar + vector;
126
+DEF_IMOP_32(umopa_s, uint8_t, uint8_t)
140
+ * the *MLASH insns are vector * vector + scalar
127
+DEF_IMOP_32(sumopa_s, int8_t, uint8_t)
141
+ */
128
+DEF_IMOP_32(usmopa_s, uint8_t, int8_t)
142
+#define DO_VQDMLAH_B(D, N, M, S) do_vqdmlah_b(N, M, D, 0, S)
143
+#define DO_VQDMLAH_H(D, N, M, S) do_vqdmlah_h(N, M, D, 0, S)
144
+#define DO_VQDMLAH_W(D, N, M, S) do_vqdmlah_w(N, M, D, 0, S)
145
+#define DO_VQRDMLAH_B(D, N, M, S) do_vqdmlah_b(N, M, D, 1, S)
146
+#define DO_VQRDMLAH_H(D, N, M, S) do_vqdmlah_h(N, M, D, 1, S)
147
+#define DO_VQRDMLAH_W(D, N, M, S) do_vqdmlah_w(N, M, D, 1, S)
148
+
129
+
149
+#define DO_VQDMLASH_B(D, N, M, S) do_vqdmlah_b(N, D, M, 0, S)
130
+DEF_IMOP_64(smopa_d, int16_t, int16_t)
150
+#define DO_VQDMLASH_H(D, N, M, S) do_vqdmlah_h(N, D, M, 0, S)
131
+DEF_IMOP_64(umopa_d, uint16_t, uint16_t)
151
+#define DO_VQDMLASH_W(D, N, M, S) do_vqdmlah_w(N, D, M, 0, S)
132
+DEF_IMOP_64(sumopa_d, int16_t, uint16_t)
152
+#define DO_VQRDMLASH_B(D, N, M, S) do_vqdmlah_b(N, D, M, 1, S)
133
+DEF_IMOP_64(usmopa_d, uint16_t, int16_t)
153
+#define DO_VQRDMLASH_H(D, N, M, S) do_vqdmlah_h(N, D, M, 1, S)
154
+#define DO_VQRDMLASH_W(D, N, M, S) do_vqdmlah_w(N, D, M, 1, S)
155
+
134
+
156
+DO_2OP_SAT_ACC_SCALAR(vqdmlahb, 1, int8_t, DO_VQDMLAH_B)
135
+#define DEF_IMOPH(NAME) \
157
+DO_2OP_SAT_ACC_SCALAR(vqdmlahh, 2, int16_t, DO_VQDMLAH_H)
136
+ void HELPER(sme_##NAME)(void *vza, void *vzn, void *vzm, void *vpn, \
158
+DO_2OP_SAT_ACC_SCALAR(vqdmlahw, 4, int32_t, DO_VQDMLAH_W)
137
+ void *vpm, uint32_t desc) \
159
+DO_2OP_SAT_ACC_SCALAR(vqrdmlahb, 1, int8_t, DO_VQRDMLAH_B)
138
+ { do_imopa(vza, vzn, vzm, vpn, vpm, desc, NAME); }
160
+DO_2OP_SAT_ACC_SCALAR(vqrdmlahh, 2, int16_t, DO_VQRDMLAH_H)
161
+DO_2OP_SAT_ACC_SCALAR(vqrdmlahw, 4, int32_t, DO_VQRDMLAH_W)
162
+
139
+
163
+DO_2OP_SAT_ACC_SCALAR(vqdmlashb, 1, int8_t, DO_VQDMLASH_B)
140
+DEF_IMOPH(smopa_s)
164
+DO_2OP_SAT_ACC_SCALAR(vqdmlashh, 2, int16_t, DO_VQDMLASH_H)
141
+DEF_IMOPH(umopa_s)
165
+DO_2OP_SAT_ACC_SCALAR(vqdmlashw, 4, int32_t, DO_VQDMLASH_W)
142
+DEF_IMOPH(sumopa_s)
166
+DO_2OP_SAT_ACC_SCALAR(vqrdmlashb, 1, int8_t, DO_VQRDMLASH_B)
143
+DEF_IMOPH(usmopa_s)
167
+DO_2OP_SAT_ACC_SCALAR(vqrdmlashh, 2, int16_t, DO_VQRDMLASH_H)
144
+DEF_IMOPH(smopa_d)
168
+DO_2OP_SAT_ACC_SCALAR(vqrdmlashw, 4, int32_t, DO_VQRDMLASH_W)
145
+DEF_IMOPH(umopa_d)
146
+DEF_IMOPH(sumopa_d)
147
+DEF_IMOPH(usmopa_d)
148
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
149
index XXXXXXX..XXXXXXX 100644
150
--- a/target/arm/translate-sme.c
151
+++ b/target/arm/translate-sme.c
152
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, gen_helper_sme_f
153
154
/* TODO: FEAT_EBF16 */
155
TRANS_FEAT(BFMOPA, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_bfmopa)
169
+
156
+
170
/* Vector by scalar plus vector */
157
+TRANS_FEAT(SMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_smopa_s)
171
#define DO_VMLA(D, N, M) ((N) * (M) + (D))
158
+TRANS_FEAT(UMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_umopa_s)
172
159
+TRANS_FEAT(SUMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_sumopa_s)
173
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
160
+TRANS_FEAT(USMOPA_s, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_usmopa_s)
174
index XXXXXXX..XXXXXXX 100644
161
+
175
--- a/target/arm/translate-mve.c
162
+TRANS_FEAT(SMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_smopa_d)
176
+++ b/target/arm/translate-mve.c
163
+TRANS_FEAT(UMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_umopa_d)
177
@@ -XXX,XX +XXX,XX @@ DO_2OP_SCALAR(VQRDMULH_scalar, vqrdmulh_scalar)
164
+TRANS_FEAT(SUMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_sumopa_d)
178
DO_2OP_SCALAR(VBRSR, vbrsr)
165
+TRANS_FEAT(USMOPA_d, aa64_sme_i16i64, do_outprod, a, MO_64, gen_helper_sme_usmopa_d)
179
DO_2OP_SCALAR(VMLA, vmla)
180
DO_2OP_SCALAR(VMLAS, vmlas)
181
+DO_2OP_SCALAR(VQDMLAH, vqdmlah)
182
+DO_2OP_SCALAR(VQRDMLAH, vqrdmlah)
183
+DO_2OP_SCALAR(VQDMLASH, vqdmlash)
184
+DO_2OP_SCALAR(VQRDMLASH, vqrdmlash)
185
186
static bool trans_VQDMULLB_scalar(DisasContext *s, arg_2scalar *a)
187
{
188
--
166
--
189
2.20.1
167
2.25.1
190
191
1
Implement the MVE VLDR/VSTR insns which do scatter-gather using base
1
From: Richard Henderson <richard.henderson@linaro.org>
2
addresses from Qm plus or minus an immediate offset (possibly with
3
writeback). Note that writeback is not predicated but it does have
4
to honour ECI state, so we have to add an eci_mask check to the
5
VSTR_SG macros (the VLDR_SG macros already needed this to be able
6
to distinguish "skip beat" from "set predicated element to 0").
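
(Again only a hypothetical sketch of the semantics described above, with
made-up names and without the predication and ECI handling that the
real helpers need: each element of Qm supplies a base address, the
scaled immediate is added or subtracted, and with writeback the
computed address replaces that element of Qm.)

#include <stdint.h>
#include <stdbool.h>

static void vldrw_sg_imm_sketch(uint32_t qd[4], uint32_t qm[4],
                                uint32_t offset, bool add, bool wb,
                                uint32_t (*ld32)(uint32_t addr))
{
    for (int e = 0; e < 4; e++) {
        uint32_t addr = add ? qm[e] + offset : qm[e] - offset;
        qd[e] = ld32(addr);
        if (wb) {
            qm[e] = addr;    /* writeback is not predicated */
        }
    }
}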
7
2
3
This is an SVE instruction that operates using the SVE vector
4
length but that it is present only if SME is implemented.
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-29-richard.henderson@linaro.org
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
---
10
---
11
target/arm/helper-mve.h | 5 +++
11
target/arm/sve.decode | 20 +++++++++++++
12
target/arm/mve.decode | 10 +++++
12
target/arm/translate-sve.c | 57 ++++++++++++++++++++++++++++++++++++++
13
target/arm/mve_helper.c | 91 ++++++++++++++++++++++++--------------
13
2 files changed, 77 insertions(+)
14
target/arm/translate-mve.c | 72 ++++++++++++++++++++++++++++++
15
4 files changed, 146 insertions(+), 32 deletions(-)
16
14
17
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
18
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/helper-mve.h
17
--- a/target/arm/sve.decode
20
+++ b/target/arm/helper-mve.h
18
+++ b/target/arm/sve.decode
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
19
@@ -XXX,XX +XXX,XX @@ BFMLALT_zzxw 01100100 11 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2
22
DEF_HELPER_FLAGS_4(mve_vstrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
20
23
DEF_HELPER_FLAGS_4(mve_vstrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
21
### SVE2 floating-point bfloat16 dot-product (indexed)
24
22
BFDOT_zzxz 01100100 01 1 ..... 010000 ..... ..... @rrxr_2 esz=2
25
+DEF_HELPER_FLAGS_4(mve_vldrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
26
+DEF_HELPER_FLAGS_4(mve_vldrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
27
+DEF_HELPER_FLAGS_4(mve_vstrw_sg_wb_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
28
+DEF_HELPER_FLAGS_4(mve_vstrd_sg_wb_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
29
+
23
+
30
DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
24
+### SVE broadcast predicate element
31
25
+
32
DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
26
+&psel esz pd pn pm rv imm
33
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
27
+%psel_rv 16:2 !function=plus_12
28
+%psel_imm_b 22:2 19:2
29
+%psel_imm_h 22:2 20:1
30
+%psel_imm_s 22:2
31
+%psel_imm_d 23:1
32
+@psel ........ .. . ... .. .. pn:4 . pm:4 . pd:4 \
33
+ &psel rv=%psel_rv
34
+
35
+PSEL 00100101 .. 1 ..1 .. 01 .... 0 .... 0 .... \
36
+ @psel esz=0 imm=%psel_imm_b
37
+PSEL 00100101 .. 1 .10 .. 01 .... 0 .... 0 .... \
38
+ @psel esz=1 imm=%psel_imm_h
39
+PSEL 00100101 .. 1 100 .. 01 .... 0 .... 0 .... \
40
+ @psel esz=2 imm=%psel_imm_s
41
+PSEL 00100101 .1 1 000 .. 01 .... 0 .... 0 .... \
42
+ @psel esz=3 imm=%psel_imm_d
43
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
34
index XXXXXXX..XXXXXXX 100644
44
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/mve.decode
45
--- a/target/arm/translate-sve.c
36
+++ b/target/arm/mve.decode
46
+++ b/target/arm/translate-sve.c
37
@@ -XXX,XX +XXX,XX @@
47
@@ -XXX,XX +XXX,XX @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
38
&vmaxv qm rda size
48
39
&vabav qn qm rda size
49
TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
40
&vldst_sg qd qm rn size msize os
50
TRANS_FEAT(BFMLALT_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, true)
41
+&vldst_sg_imm qd qm a w imm
42
43
# scatter-gather memory size is in bits 6:4
44
%sg_msize 6:1 4:1
45
@@ -XXX,XX +XXX,XX @@
46
@vldst_sg .... .... .... rn:4 .... ... size:2 ... ... os:1 &vldst_sg \
47
qd=%qd qm=%qm msize=%sg_msize
48
49
+# Qm is in the fields usually labeled Qn
50
+@vldst_sg_imm .... .... a:1 . w:1 . .... .... .... . imm:7 &vldst_sg_imm \
51
+ qd=%qd qm=%qn
52
+
51
+
53
@1op .... .... .... size:2 .. .... .... .... .... &1op qd=%qd qm=%qm
52
+static bool trans_PSEL(DisasContext *s, arg_psel *a)
54
@1op_nosz .... .... .... .... .... .... .... .... &1op qd=%qd qm=%qm size=0
53
+{
55
@2op .... .... .. size:2 .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn
54
+ int vl = vec_full_reg_size(s);
56
@@ -XXX,XX +XXX,XX @@ VLDR_S_sg 111 0 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
55
+ int pl = pred_gvec_reg_size(s);
57
VLDR_U_sg 111 1 1100 1 . 01 .... ... 0 111 . .... .... @vldst_sg
56
+ int elements = vl >> a->esz;
58
VSTR_sg 111 0 1100 1 . 00 .... ... 0 111 . .... .... @vldst_sg
57
+ TCGv_i64 tmp, didx, dbit;
59
58
+ TCGv_ptr ptr;
60
+VLDRW_sg_imm 111 1 1101 ... 1 ... 0 ... 1 1110 .... .... @vldst_sg_imm
61
+VLDRD_sg_imm 111 1 1101 ... 1 ... 0 ... 1 1111 .... .... @vldst_sg_imm
62
+VSTRW_sg_imm 111 1 1101 ... 0 ... 0 ... 1 1110 .... .... @vldst_sg_imm
63
+VSTRD_sg_imm 111 1 1101 ... 0 ... 0 ... 1 1111 .... .... @vldst_sg_imm
64
+
59
+
65
# Moves between 2 32-bit vector lanes and 2 general purpose registers
60
+ if (!dc_isar_feature(aa64_sme, s)) {
66
VMOV_to_2gp 1110 1100 0 . 00 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
67
VMOV_from_2gp 1110 1100 0 . 01 rt2:4 ... 0 1111 000 idx:1 rt:4 qd=%qd
68
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
69
index XXXXXXX..XXXXXXX 100644
70
--- a/target/arm/mve_helper.c
71
+++ b/target/arm/mve_helper.c
72
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
73
* For loads, predicated lanes are zeroed instead of retaining
74
* their previous values.
75
*/
76
-#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN) \
77
+#define DO_VLDR_SG(OP, LDTYPE, ESIZE, TYPE, OFFTYPE, ADDRFN, WB) \
78
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
79
uint32_t base) \
80
{ \
81
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
82
addr = ADDRFN(base, m[H##ESIZE(e)]); \
83
d[H##ESIZE(e)] = (mask & 1) ? \
84
cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \
85
+ if (WB) { \
86
+ m[H##ESIZE(e)] = addr; \
87
+ } \
88
} \
89
mve_advance_vpt(env); \
90
}
91
92
/* We know here TYPE is unsigned so always the same as the offset type */
93
-#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN) \
94
+#define DO_VSTR_SG(OP, STTYPE, ESIZE, TYPE, ADDRFN, WB) \
95
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
96
uint32_t base) \
97
{ \
98
TYPE *d = vd; \
99
TYPE *m = vm; \
100
uint16_t mask = mve_element_mask(env); \
101
+ uint16_t eci_mask = mve_eci_mask(env); \
102
unsigned e; \
103
uint32_t addr; \
104
- for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
105
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE, eci_mask >>= ESIZE) { \
106
+ if (!(eci_mask & 1)) { \
107
+ continue; \
108
+ } \
109
addr = ADDRFN(base, m[H##ESIZE(e)]); \
110
if (mask & 1) { \
111
cpu_##STTYPE##_data_ra(env, addr, d[H##ESIZE(e)], GETPC()); \
112
} \
113
+ if (WB) { \
114
+ m[H##ESIZE(e)] = addr; \
115
+ } \
116
} \
117
mve_advance_vpt(env); \
118
}
119
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
120
* accesses, controlled by the predicate mask for the relevant beat,
121
* and with a single 32-bit offset in the first of the two Qm elements.
122
* Note that for QEMU our IMPDEF AIRCR.ENDIANNESS is always 0 (little).
123
+ * Address writeback happens on the odd beats and updates the address
124
+ * stored in the even-beat element.
125
*/
126
-#define DO_VLDR64_SG(OP, ADDRFN) \
127
+#define DO_VLDR64_SG(OP, ADDRFN, WB) \
128
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
129
uint32_t base) \
130
{ \
131
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
132
addr = ADDRFN(base, m[H4(e & ~1)]); \
133
addr += 4 * (e & 1); \
134
d[H4(e)] = (mask & 1) ? cpu_ldl_data_ra(env, addr, GETPC()) : 0; \
135
+ if (WB && (e & 1)) { \
136
+ m[H4(e & ~1)] = addr - 4; \
137
+ } \
138
} \
139
mve_advance_vpt(env); \
140
}
141
142
-#define DO_VSTR64_SG(OP, ADDRFN) \
143
+#define DO_VSTR64_SG(OP, ADDRFN, WB) \
144
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm, \
145
uint32_t base) \
146
{ \
147
uint32_t *d = vd; \
148
uint32_t *m = vm; \
149
uint16_t mask = mve_element_mask(env); \
150
+ uint16_t eci_mask = mve_eci_mask(env); \
151
unsigned e; \
152
uint32_t addr; \
153
- for (e = 0; e < 16 / 4; e++, mask >>= 4) { \
154
+ for (e = 0; e < 16 / 4; e++, mask >>= 4, eci_mask >>= 4) { \
155
+ if (!(eci_mask & 1)) { \
156
+ continue; \
157
+ } \
158
addr = ADDRFN(base, m[H4(e & ~1)]); \
159
addr += 4 * (e & 1); \
160
if (mask & 1) { \
161
cpu_stl_data_ra(env, addr, d[H4(e)], GETPC()); \
162
} \
163
+ if (WB && (e & 1)) { \
164
+ m[H4(e & ~1)] = addr - 4; \
165
+ } \
166
} \
167
mve_advance_vpt(env); \
168
}
169
@@ -XXX,XX +XXX,XX @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
170
#define ADDR_ADD_OSW(BASE, OFFSET) ((BASE) + ((OFFSET) << 2))
171
#define ADDR_ADD_OSD(BASE, OFFSET) ((BASE) + ((OFFSET) << 3))
172
173
-DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD)
174
-DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD)
175
-DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD)
176
+DO_VLDR_SG(vldrb_sg_sh, ldsb, 2, int16_t, uint16_t, ADDR_ADD, false)
177
+DO_VLDR_SG(vldrb_sg_sw, ldsb, 4, int32_t, uint32_t, ADDR_ADD, false)
178
+DO_VLDR_SG(vldrh_sg_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD, false)
179
180
-DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD)
181
-DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD)
182
-DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD)
183
-DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD)
184
-DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD)
185
-DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD)
186
-DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD)
187
+DO_VLDR_SG(vldrb_sg_ub, ldub, 1, uint8_t, uint8_t, ADDR_ADD, false)
188
+DO_VLDR_SG(vldrb_sg_uh, ldub, 2, uint16_t, uint16_t, ADDR_ADD, false)
189
+DO_VLDR_SG(vldrb_sg_uw, ldub, 4, uint32_t, uint32_t, ADDR_ADD, false)
190
+DO_VLDR_SG(vldrh_sg_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD, false)
191
+DO_VLDR_SG(vldrh_sg_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD, false)
192
+DO_VLDR_SG(vldrw_sg_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, false)
193
+DO_VLDR64_SG(vldrd_sg_ud, ADDR_ADD, false)
194
195
-DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH)
196
-DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH)
197
-DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH)
198
-DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW)
199
-DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD)
200
+DO_VLDR_SG(vldrh_sg_os_sw, ldsw, 4, int32_t, uint32_t, ADDR_ADD_OSH, false)
201
+DO_VLDR_SG(vldrh_sg_os_uh, lduw, 2, uint16_t, uint16_t, ADDR_ADD_OSH, false)
202
+DO_VLDR_SG(vldrh_sg_os_uw, lduw, 4, uint32_t, uint32_t, ADDR_ADD_OSH, false)
203
+DO_VLDR_SG(vldrw_sg_os_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD_OSW, false)
204
+DO_VLDR64_SG(vldrd_sg_os_ud, ADDR_ADD_OSD, false)
205
206
-DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD)
207
-DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD)
208
-DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD)
209
-DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD)
210
-DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD)
211
-DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD)
212
-DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD)
213
+DO_VSTR_SG(vstrb_sg_ub, stb, 1, uint8_t, ADDR_ADD, false)
214
+DO_VSTR_SG(vstrb_sg_uh, stb, 2, uint16_t, ADDR_ADD, false)
215
+DO_VSTR_SG(vstrb_sg_uw, stb, 4, uint32_t, ADDR_ADD, false)
216
+DO_VSTR_SG(vstrh_sg_uh, stw, 2, uint16_t, ADDR_ADD, false)
217
+DO_VSTR_SG(vstrh_sg_uw, stw, 4, uint32_t, ADDR_ADD, false)
218
+DO_VSTR_SG(vstrw_sg_uw, stl, 4, uint32_t, ADDR_ADD, false)
219
+DO_VSTR64_SG(vstrd_sg_ud, ADDR_ADD, false)
220
221
-DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH)
222
-DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH)
223
-DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW)
224
-DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD)
225
+DO_VSTR_SG(vstrh_sg_os_uh, stw, 2, uint16_t, ADDR_ADD_OSH, false)
226
+DO_VSTR_SG(vstrh_sg_os_uw, stw, 4, uint32_t, ADDR_ADD_OSH, false)
227
+DO_VSTR_SG(vstrw_sg_os_uw, stl, 4, uint32_t, ADDR_ADD_OSW, false)
228
+DO_VSTR64_SG(vstrd_sg_os_ud, ADDR_ADD_OSD, false)
229
+
230
+DO_VLDR_SG(vldrw_sg_wb_uw, ldl, 4, uint32_t, uint32_t, ADDR_ADD, true)
231
+DO_VLDR64_SG(vldrd_sg_wb_ud, ADDR_ADD, true)
232
+DO_VSTR_SG(vstrw_sg_wb_uw, stl, 4, uint32_t, ADDR_ADD, true)
233
+DO_VSTR64_SG(vstrd_sg_wb_ud, ADDR_ADD, true)
234
235
/*
236
* The mergemask(D, R, M) macro performs the operation "*D = R" but
237
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
238
index XXXXXXX..XXXXXXX 100644
239
--- a/target/arm/translate-mve.c
240
+++ b/target/arm/translate-mve.c
241
@@ -XXX,XX +XXX,XX @@ static bool trans_VSTR_sg(DisasContext *s, arg_vldst_sg *a)
242
243
#undef F
244
245
+static bool do_ldst_sg_imm(DisasContext *s, arg_vldst_sg_imm *a,
246
+ MVEGenLdStSGFn *fn, unsigned msize)
247
+{
248
+ uint32_t offset;
249
+ TCGv_ptr qd, qm;
250
+
251
+ if (!dc_isar_feature(aa32_mve, s) ||
252
+ !mve_check_qreg_bank(s, a->qd | a->qm) ||
253
+ !fn) {
254
+ return false;
61
+ return false;
255
+ }
62
+ }
256
+
63
+ if (!sve_access_check(s)) {
257
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
258
+ return true;
64
+ return true;
259
+ }
65
+ }
260
+
66
+
261
+ offset = a->imm << msize;
67
+ tmp = tcg_temp_new_i64();
262
+ if (!a->a) {
68
+ dbit = tcg_temp_new_i64();
263
+ offset = -offset;
69
+ didx = tcg_temp_new_i64();
70
+ ptr = tcg_temp_new_ptr();
71
+
72
+ /* Compute the predicate element. */
73
+ tcg_gen_addi_i64(tmp, cpu_reg(s, a->rv), a->imm);
74
+ if (is_power_of_2(elements)) {
75
+ tcg_gen_andi_i64(tmp, tmp, elements - 1);
76
+ } else {
77
+ tcg_gen_remu_i64(tmp, tmp, tcg_constant_i64(elements));
264
+ }
78
+ }
265
+
79
+
266
+ qd = mve_qreg_ptr(a->qd);
80
+ /* Extract the predicate byte and bit indices. */
267
+ qm = mve_qreg_ptr(a->qm);
81
+ tcg_gen_shli_i64(tmp, tmp, a->esz);
268
+ fn(cpu_env, qd, qm, tcg_constant_i32(offset));
82
+ tcg_gen_andi_i64(dbit, tmp, 7);
269
+ tcg_temp_free_ptr(qd);
83
+ tcg_gen_shri_i64(didx, tmp, 3);
270
+ tcg_temp_free_ptr(qm);
84
+ if (HOST_BIG_ENDIAN) {
271
+ mve_update_eci(s);
85
+ tcg_gen_xori_i64(didx, didx, 7);
86
+ }
87
+
88
+ /* Load the predicate word. */
89
+ tcg_gen_trunc_i64_ptr(ptr, didx);
90
+ tcg_gen_add_ptr(ptr, ptr, cpu_env);
91
+ tcg_gen_ld8u_i64(tmp, ptr, pred_full_reg_offset(s, a->pm));
92
+
93
+ /* Extract the predicate bit and replicate to MO_64. */
94
+ tcg_gen_shr_i64(tmp, tmp, dbit);
95
+ tcg_gen_andi_i64(tmp, tmp, 1);
96
+ tcg_gen_neg_i64(tmp, tmp);
97
+
98
+ /* Apply to either copy the source, or write zeros. */
99
+ tcg_gen_gvec_ands(MO_64, pred_full_reg_offset(s, a->pd),
100
+ pred_full_reg_offset(s, a->pn), tmp, pl, pl);
101
+
102
+ tcg_temp_free_i64(tmp);
103
+ tcg_temp_free_i64(dbit);
104
+ tcg_temp_free_i64(didx);
105
+ tcg_temp_free_ptr(ptr);
272
+ return true;
106
+ return true;
273
+}
107
+}
274
+
275
+static bool trans_VLDRW_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
276
+{
277
+ static MVEGenLdStSGFn * const fns[] = {
278
+ gen_helper_mve_vldrw_sg_uw,
279
+ gen_helper_mve_vldrw_sg_wb_uw,
280
+ };
281
+ if (a->qd == a->qm) {
282
+ return false; /* UNPREDICTABLE */
283
+ }
284
+ return do_ldst_sg_imm(s, a, fns[a->w], MO_32);
285
+}
286
+
287
+static bool trans_VLDRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
288
+{
289
+ static MVEGenLdStSGFn * const fns[] = {
290
+ gen_helper_mve_vldrd_sg_ud,
291
+ gen_helper_mve_vldrd_sg_wb_ud,
292
+ };
293
+ if (a->qd == a->qm) {
294
+ return false; /* UNPREDICTABLE */
295
+ }
296
+ return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
297
+}
298
+
299
+static bool trans_VSTRW_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
300
+{
301
+ static MVEGenLdStSGFn * const fns[] = {
302
+ gen_helper_mve_vstrw_sg_uw,
303
+ gen_helper_mve_vstrw_sg_wb_uw,
304
+ };
305
+ return do_ldst_sg_imm(s, a, fns[a->w], MO_32);
306
+}
307
+
308
+static bool trans_VSTRD_sg_imm(DisasContext *s, arg_vldst_sg_imm *a)
309
+{
310
+ static MVEGenLdStSGFn * const fns[] = {
311
+ gen_helper_mve_vstrd_sg_ud,
312
+ gen_helper_mve_vstrd_sg_wb_ud,
313
+ };
314
+ return do_ldst_sg_imm(s, a, fns[a->w], MO_64);
315
+}
316
+
317
static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
318
{
319
TCGv_ptr qd;
320
--
108
--
321
2.20.1
109
2.25.1
322
323
1
In some situations we need a mask telling us which parts of the
1
From: Richard Henderson <richard.henderson@linaro.org>
2
vector correspond to beats that are not being executed because of
3
ECI, separately from the combined "which bytes are predicated away"
4
mask. Factor this mask calculation out of mve_element_mask() into
5
its own function.
6
2
3
This is an SVE instruction that operates using the SVE vector
4
length but is present only if SME is implemented.
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-30-richard.henderson@linaro.org
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
---
10
---
10
target/arm/mve_helper.c | 58 ++++++++++++++++++++++++-----------------
11
target/arm/helper-sve.h | 2 ++
11
1 file changed, 34 insertions(+), 24 deletions(-)
12
target/arm/sve.decode | 1 +
13
target/arm/sve_helper.c | 16 ++++++++++++++++
14
target/arm/translate-sve.c | 2 ++
15
4 files changed, 21 insertions(+)
12
16
13
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
17
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
14
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/mve_helper.c
19
--- a/target/arm/helper-sve.h
16
+++ b/target/arm/mve_helper.c
20
+++ b/target/arm/helper-sve.h
17
@@ -XXX,XX +XXX,XX @@
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_revh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
18
#include "exec/exec-all.h"
22
19
#include "tcg/tcg.h"
23
DEF_HELPER_FLAGS_4(sve_revw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
20
24
21
+static uint16_t mve_eci_mask(CPUARMState *env)
25
+DEF_HELPER_FLAGS_4(sme_revd_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
26
+
27
DEF_HELPER_FLAGS_4(sve_rbit_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
28
DEF_HELPER_FLAGS_4(sve_rbit_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
29
DEF_HELPER_FLAGS_4(sve_rbit_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
30
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
31
index XXXXXXX..XXXXXXX 100644
32
--- a/target/arm/sve.decode
33
+++ b/target/arm/sve.decode
34
@@ -XXX,XX +XXX,XX @@ REVB 00000101 .. 1001 00 100 ... ..... ..... @rd_pg_rn
35
REVH 00000101 .. 1001 01 100 ... ..... ..... @rd_pg_rn
36
REVW 00000101 .. 1001 10 100 ... ..... ..... @rd_pg_rn
37
RBIT 00000101 .. 1001 11 100 ... ..... ..... @rd_pg_rn
38
+REVD 00000101 00 1011 10 100 ... ..... ..... @rd_pg_rn_e0
39
40
# SVE vector splice (predicated, destructive)
41
SPLICE 00000101 .. 101 100 100 ... ..... ..... @rdn_pg_rm
42
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
43
index XXXXXXX..XXXXXXX 100644
44
--- a/target/arm/sve_helper.c
45
+++ b/target/arm/sve_helper.c
46
@@ -XXX,XX +XXX,XX @@ DO_ZPZ_D(sve_revh_d, uint64_t, hswap64)
47
48
DO_ZPZ_D(sve_revw_d, uint64_t, wswap64)
49
50
+void HELPER(sme_revd_q)(void *vd, void *vn, void *vg, uint32_t desc)
22
+{
51
+{
23
+ /*
52
+ intptr_t i, opr_sz = simd_oprsz(desc) / 8;
24
+ * Return the mask of which elements in the MVE vector correspond
53
+ uint64_t *d = vd, *n = vn;
25
+ * to beats being executed. The mask has 1 bits for executed lanes
54
+ uint8_t *pg = vg;
26
+ * and 0 bits where ECI says this beat was already executed.
27
+ */
28
+ int eci;
29
+
55
+
30
+ if ((env->condexec_bits & 0xf) != 0) {
56
+ for (i = 0; i < opr_sz; i += 2) {
31
+ return 0xffff;
57
+ if (pg[H1(i)] & 1) {
32
+ }
58
+ uint64_t n0 = n[i + 0];
33
+
59
+ uint64_t n1 = n[i + 1];
34
+ eci = env->condexec_bits >> 4;
60
+ d[i + 0] = n1;
35
+ switch (eci) {
61
+ d[i + 1] = n0;
36
+ case ECI_NONE:
62
+ }
37
+ return 0xffff;
38
+ case ECI_A0:
39
+ return 0xfff0;
40
+ case ECI_A0A1:
41
+ return 0xff00;
42
+ case ECI_A0A1A2:
43
+ case ECI_A0A1A2B0:
44
+ return 0xf000;
45
+ default:
46
+ g_assert_not_reached();
47
+ }
63
+ }
48
+}
64
+}
49
+
65
+
50
static uint16_t mve_element_mask(CPUARMState *env)
66
DO_ZPZ(sve_rbit_b, uint8_t, H1, revbit8)
51
{
67
DO_ZPZ(sve_rbit_h, uint16_t, H1_2, revbit16)
52
/*
68
DO_ZPZ(sve_rbit_s, uint32_t, H1_4, revbit32)
53
@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
69
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
54
mask &= ltpmask;
70
index XXXXXXX..XXXXXXX 100644
55
}
71
--- a/target/arm/translate-sve.c
56
72
+++ b/target/arm/translate-sve.c
57
- if ((env->condexec_bits & 0xf) == 0) {
73
@@ -XXX,XX +XXX,XX @@ TRANS_FEAT(REVH, aa64_sve, gen_gvec_ool_arg_zpz, revh_fns[a->esz], a, 0)
58
- /*
74
TRANS_FEAT(REVW, aa64_sve, gen_gvec_ool_arg_zpz,
59
- * ECI bits indicate which beats are already executed;
75
a->esz == 3 ? gen_helper_sve_revw_d : NULL, a, 0)
60
- * we handle this by effectively predicating them out.
76
61
- */
77
+TRANS_FEAT(REVD, aa64_sme, gen_gvec_ool_arg_zpz, gen_helper_sme_revd_q, a, 0)
62
- int eci = env->condexec_bits >> 4;
78
+
63
- switch (eci) {
79
TRANS_FEAT(SPLICE, aa64_sve, gen_gvec_ool_arg_zpzz,
64
- case ECI_NONE:
80
gen_helper_sve_splice, a, a->esz)
65
- break;
66
- case ECI_A0:
67
- mask &= 0xfff0;
68
- break;
69
- case ECI_A0A1:
70
- mask &= 0xff00;
71
- break;
72
- case ECI_A0A1A2:
73
- case ECI_A0A1A2B0:
74
- mask &= 0xf000;
75
- break;
76
- default:
77
- g_assert_not_reached();
78
- }
79
- }
80
-
81
+ /*
82
+ * ECI bits indicate which beats are already executed;
83
+ * we handle this by effectively predicating them out.
84
+ */
85
+ mask &= mve_eci_mask(env);
86
return mask;
87
}
88
81
89
--
82
--
90
2.20.1
83
2.25.1
91
92
1
Implement the MVE VMULL (polynomial) insn. Unlike Neon, this comes
1
From: Richard Henderson <richard.henderson@linaro.org>
2
in two flavours: an 8x8->16 and a 16x16->32. Also unlike Neon, the
3
inputs are in either the low or the high half of each double-width
4
element.
5
2
6
The assembler for this insn indicates the size with "P8" or "P16",
3
This is an SVE instruction that operates using the SVE vector
7
encoded into bit 28 as size = 0 or 1. We choose to follow the
4
length but is present only if SME is implemented.
8
same encoding as VQDMULL and decode this into a->size as MO_16
9
or MO_32 indicating the size of the result elements. This then
10
carries through to the helper function names where it then
11
matches up with the existing pmull_h() which does an 8x8->16
12
operation and a new pmull_w() which does the 16x16->32.
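
For reference, the per-lane operation is a carry-less multiply; a minimal
scalar sketch (not the vectorised helper added below, assuming nothing
beyond stdint types) looks like:

    #include <stdint.h>

    /* Hedged sketch: 8x8->16 carry-less (polynomial) multiply of one lane,
     * the operation VMULL.P8 applies to each byte pair.
     */
    static uint16_t clmul_8x8(uint8_t a, uint8_t b)
    {
        uint16_t r = 0;
        for (int i = 0; i < 8; i++) {
            if (a & (1 << i)) {
                r ^= (uint16_t)b << i;     /* XOR, not add: no carries */
            }
        }
        return r;
    }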
13
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-31-richard.henderson@linaro.org
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
15
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
16
---
10
---
17
target/arm/helper-mve.h | 5 +++++
11
target/arm/helper.h | 18 +++++++
18
target/arm/vec_internal.h | 11 +++++++++++
12
target/arm/sve.decode | 5 ++
19
target/arm/mve.decode | 14 ++++++++++----
13
target/arm/translate-sve.c | 102 +++++++++++++++++++++++++++++++++++++
20
target/arm/mve_helper.c | 16 ++++++++++++++++
14
target/arm/vec_helper.c | 24 +++++++++
21
target/arm/translate-mve.c | 28 ++++++++++++++++++++++++++++
15
4 files changed, 149 insertions(+)
22
target/arm/vec_helper.c | 14 +++++++++++++-
23
6 files changed, 83 insertions(+), 5 deletions(-)
24
16
25
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
17
diff --git a/target/arm/helper.h b/target/arm/helper.h
26
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/helper-mve.h
19
--- a/target/arm/helper.h
28
+++ b/target/arm/helper-mve.h
20
+++ b/target/arm/helper.h
29
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
30
DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
22
DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
31
DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
void, ptr, ptr, ptr, ptr, ptr, i32)
32
24
33
+DEF_HELPER_FLAGS_4(mve_vmullpbh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+DEF_HELPER_FLAGS_5(gvec_sclamp_b, TCG_CALL_NO_RWG,
34
+DEF_HELPER_FLAGS_4(mve_vmullpth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
+ void, ptr, ptr, ptr, ptr, i32)
35
+DEF_HELPER_FLAGS_4(mve_vmullpbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
+DEF_HELPER_FLAGS_5(gvec_sclamp_h, TCG_CALL_NO_RWG,
36
+DEF_HELPER_FLAGS_4(mve_vmullptw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
28
+ void, ptr, ptr, ptr, ptr, i32)
37
+
29
+DEF_HELPER_FLAGS_5(gvec_sclamp_s, TCG_CALL_NO_RWG,
38
DEF_HELPER_FLAGS_4(mve_vqdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
30
+ void, ptr, ptr, ptr, ptr, i32)
39
DEF_HELPER_FLAGS_4(mve_vqdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
31
+DEF_HELPER_FLAGS_5(gvec_sclamp_d, TCG_CALL_NO_RWG,
40
DEF_HELPER_FLAGS_4(mve_vqdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
32
+ void, ptr, ptr, ptr, ptr, i32)
41
diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
33
+
42
index XXXXXXX..XXXXXXX 100644
34
+DEF_HELPER_FLAGS_5(gvec_uclamp_b, TCG_CALL_NO_RWG,
43
--- a/target/arm/vec_internal.h
35
+ void, ptr, ptr, ptr, ptr, i32)
44
+++ b/target/arm/vec_internal.h
36
+DEF_HELPER_FLAGS_5(gvec_uclamp_h, TCG_CALL_NO_RWG,
45
@@ -XXX,XX +XXX,XX @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *);
37
+ void, ptr, ptr, ptr, ptr, i32)
46
int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
38
+DEF_HELPER_FLAGS_5(gvec_uclamp_s, TCG_CALL_NO_RWG,
47
int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
39
+ void, ptr, ptr, ptr, ptr, i32)
48
40
+DEF_HELPER_FLAGS_5(gvec_uclamp_d, TCG_CALL_NO_RWG,
49
+/*
41
+ void, ptr, ptr, ptr, ptr, i32)
50
+ * 8 x 8 -> 16 vector polynomial multiply where the inputs are
42
+
51
+ * in the low 8 bits of each 16-bit element
43
#ifdef TARGET_AARCH64
52
+*/
44
#include "helper-a64.h"
53
+uint64_t pmull_h(uint64_t op1, uint64_t op2);
45
#include "helper-sve.h"
54
+/*
46
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
55
+ * 16 x 16 -> 32 vector polynomial multiply where the inputs are
47
index XXXXXXX..XXXXXXX 100644
56
+ * in the low 16 bits of each 32-bit element
48
--- a/target/arm/sve.decode
57
+ */
49
+++ b/target/arm/sve.decode
58
+uint64_t pmull_w(uint64_t op1, uint64_t op2);
50
@@ -XXX,XX +XXX,XX @@ PSEL 00100101 .. 1 100 .. 01 .... 0 .... 0 .... \
59
+
51
@psel esz=2 imm=%psel_imm_s
60
#endif /* TARGET_ARM_VEC_INTERNALS_H */
52
PSEL 00100101 .1 1 000 .. 01 .... 0 .... 0 .... \
61
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
53
@psel esz=3 imm=%psel_imm_d
62
index XXXXXXX..XXXXXXX 100644
54
+
63
--- a/target/arm/mve.decode
55
+### SVE clamp
64
+++ b/target/arm/mve.decode
56
+
65
@@ -XXX,XX +XXX,XX @@ VHADD_U 111 1 1111 0 . .. ... 0 ... 0 0000 . 1 . 0 ... 0 @2op
57
+SCLAMP 01000100 .. 0 ..... 110000 ..... ..... @rda_rn_rm
66
VHSUB_S 111 0 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
58
+UCLAMP 01000100 .. 0 ..... 110001 ..... ..... @rda_rn_rm
67
VHSUB_U 111 1 1111 0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
59
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
68
60
index XXXXXXX..XXXXXXX 100644
69
-VMULL_BS 111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
61
--- a/target/arm/translate-sve.c
70
-VMULL_BU 111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
62
+++ b/target/arm/translate-sve.c
71
-VMULL_TS 111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
63
@@ -XXX,XX +XXX,XX @@ static bool trans_PSEL(DisasContext *s, arg_psel *a)
72
-VMULL_TU 111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
64
tcg_temp_free_ptr(ptr);
73
+{
65
return true;
74
+ VMULLP_B 111 . 1110 0 . 11 ... 1 ... 0 1110 . 0 . 0 ... 0 @2op_sz28
75
+ VMULL_BS 111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
76
+ VMULL_BU 111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
77
+}
78
+{
79
+ VMULLP_T 111 . 1110 0 . 11 ... 1 ... 1 1110 . 0 . 0 ... 0 @2op_sz28
80
+ VMULL_TS 111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
81
+ VMULL_TU 111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
82
+}
83
84
VQDMULH 1110 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
85
VQRDMULH 1111 1111 0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
86
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
87
index XXXXXXX..XXXXXXX 100644
88
--- a/target/arm/mve_helper.c
89
+++ b/target/arm/mve_helper.c
90
@@ -XXX,XX +XXX,XX @@ DO_2OP_L(vmulltub, 1, 1, uint8_t, 2, uint16_t, DO_MUL)
91
DO_2OP_L(vmulltuh, 1, 2, uint16_t, 4, uint32_t, DO_MUL)
92
DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
93
94
+/*
95
+ * Polynomial multiply. We can always do this generating 64 bits
96
+ * of the result at a time, so we don't need to use DO_2OP_L.
97
+ */
98
+#define VMULLPH_MASK 0x00ff00ff00ff00ffULL
99
+#define VMULLPW_MASK 0x0000ffff0000ffffULL
100
+#define DO_VMULLPBH(N, M) pmull_h((N) & VMULLPH_MASK, (M) & VMULLPH_MASK)
101
+#define DO_VMULLPTH(N, M) DO_VMULLPBH((N) >> 8, (M) >> 8)
102
+#define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
103
+#define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
104
+
105
+DO_2OP(vmullpbh, 8, uint64_t, DO_VMULLPBH)
106
+DO_2OP(vmullpth, 8, uint64_t, DO_VMULLPTH)
107
+DO_2OP(vmullpbw, 8, uint64_t, DO_VMULLPBW)
108
+DO_2OP(vmullptw, 8, uint64_t, DO_VMULLPTW)
109
+
110
/*
111
* Because the computation type is at least twice as large as required,
112
* these work for both signed and unsigned source types.
113
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
114
index XXXXXXX..XXXXXXX 100644
115
--- a/target/arm/translate-mve.c
116
+++ b/target/arm/translate-mve.c
117
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMULLT(DisasContext *s, arg_2op *a)
118
return do_2op(s, a, fns[a->size]);
119
}
66
}
120
67
+
121
+static bool trans_VMULLP_B(DisasContext *s, arg_2op *a)
68
+static void gen_sclamp_i32(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_i32 a)
122
+{
69
+{
123
+ /*
70
+ tcg_gen_smax_i32(d, a, n);
124
+ * Note that a->size indicates the output size, ie VMULL.P8
71
+ tcg_gen_smin_i32(d, d, m);
125
+ * is the 8x8->16 operation and a->size is MO_16; VMULL.P16
72
+}
126
+ * is the 16x16->32 operation and a->size is MO_32.
73
+
127
+ */
74
+static void gen_sclamp_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_i64 a)
128
+ static MVEGenTwoOpFn * const fns[] = {
75
+{
129
+ NULL,
76
+ tcg_gen_smax_i64(d, a, n);
130
+ gen_helper_mve_vmullpbh,
77
+ tcg_gen_smin_i64(d, d, m);
131
+ gen_helper_mve_vmullpbw,
78
+}
132
+ NULL,
79
+
133
+ };
80
+static void gen_sclamp_vec(unsigned vece, TCGv_vec d, TCGv_vec n,
134
+ return do_2op(s, a, fns[a->size]);
81
+ TCGv_vec m, TCGv_vec a)
135
+}
82
+{
136
+
83
+ tcg_gen_smax_vec(vece, d, a, n);
137
+static bool trans_VMULLP_T(DisasContext *s, arg_2op *a)
84
+ tcg_gen_smin_vec(vece, d, d, m);
138
+{
85
+}
139
+ /* a->size is as for trans_VMULLP_B */
86
+
140
+ static MVEGenTwoOpFn * const fns[] = {
87
+static void gen_sclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
141
+ NULL,
88
+ uint32_t a, uint32_t oprsz, uint32_t maxsz)
142
+ gen_helper_mve_vmullpth,
89
+{
143
+ gen_helper_mve_vmullptw,
90
+ static const TCGOpcode vecop[] = {
144
+ NULL,
91
+ INDEX_op_smin_vec, INDEX_op_smax_vec, 0
145
+ };
92
+ };
146
+ return do_2op(s, a, fns[a->size]);
93
+ static const GVecGen4 ops[4] = {
147
+}
94
+ { .fniv = gen_sclamp_vec,
148
+
95
+ .fno = gen_helper_gvec_sclamp_b,
149
/*
96
+ .opt_opc = vecop,
150
* VADC and VSBC: these perform an add-with-carry or subtract-with-carry
97
+ .vece = MO_8 },
151
* of the 32-bit elements in each lane of the input vectors, where the
98
+ { .fniv = gen_sclamp_vec,
99
+ .fno = gen_helper_gvec_sclamp_h,
100
+ .opt_opc = vecop,
101
+ .vece = MO_16 },
102
+ { .fni4 = gen_sclamp_i32,
103
+ .fniv = gen_sclamp_vec,
104
+ .fno = gen_helper_gvec_sclamp_s,
105
+ .opt_opc = vecop,
106
+ .vece = MO_32 },
107
+ { .fni8 = gen_sclamp_i64,
108
+ .fniv = gen_sclamp_vec,
109
+ .fno = gen_helper_gvec_sclamp_d,
110
+ .opt_opc = vecop,
111
+ .vece = MO_64,
112
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64 }
113
+ };
114
+ tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &ops[vece]);
115
+}
116
+
117
+TRANS_FEAT(SCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_sclamp, a)
118
+
119
+static void gen_uclamp_i32(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_i32 a)
120
+{
121
+ tcg_gen_umax_i32(d, a, n);
122
+ tcg_gen_umin_i32(d, d, m);
123
+}
124
+
125
+static void gen_uclamp_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_i64 a)
126
+{
127
+ tcg_gen_umax_i64(d, a, n);
128
+ tcg_gen_umin_i64(d, d, m);
129
+}
130
+
131
+static void gen_uclamp_vec(unsigned vece, TCGv_vec d, TCGv_vec n,
132
+ TCGv_vec m, TCGv_vec a)
133
+{
134
+ tcg_gen_umax_vec(vece, d, a, n);
135
+ tcg_gen_umin_vec(vece, d, d, m);
136
+}
137
+
138
+static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
139
+ uint32_t a, uint32_t oprsz, uint32_t maxsz)
140
+{
141
+ static const TCGOpcode vecop[] = {
142
+ INDEX_op_umin_vec, INDEX_op_umax_vec, 0
143
+ };
144
+ static const GVecGen4 ops[4] = {
145
+ { .fniv = gen_uclamp_vec,
146
+ .fno = gen_helper_gvec_uclamp_b,
147
+ .opt_opc = vecop,
148
+ .vece = MO_8 },
149
+ { .fniv = gen_uclamp_vec,
150
+ .fno = gen_helper_gvec_uclamp_h,
151
+ .opt_opc = vecop,
152
+ .vece = MO_16 },
153
+ { .fni4 = gen_uclamp_i32,
154
+ .fniv = gen_uclamp_vec,
155
+ .fno = gen_helper_gvec_uclamp_s,
156
+ .opt_opc = vecop,
157
+ .vece = MO_32 },
158
+ { .fni8 = gen_uclamp_i64,
159
+ .fniv = gen_uclamp_vec,
160
+ .fno = gen_helper_gvec_uclamp_d,
161
+ .opt_opc = vecop,
162
+ .vece = MO_64,
163
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64 }
164
+ };
165
+ tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &ops[vece]);
166
+}
167
+
168
+TRANS_FEAT(UCLAMP, aa64_sme, gen_gvec_fn_arg_zzzz, gen_uclamp, a)
152
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
169
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
153
index XXXXXXX..XXXXXXX 100644
170
index XXXXXXX..XXXXXXX 100644
154
--- a/target/arm/vec_helper.c
171
--- a/target/arm/vec_helper.c
155
+++ b/target/arm/vec_helper.c
172
+++ b/target/arm/vec_helper.c
156
@@ -XXX,XX +XXX,XX @@ static uint64_t expand_byte_to_half(uint64_t x)
173
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm,
157
| ((x & 0xff000000) << 24);
174
}
175
clear_tail(d, opr_sz, simd_maxsz(desc));
158
}
176
}
159
177
+
160
-static uint64_t pmull_h(uint64_t op1, uint64_t op2)
178
+#define DO_CLAMP(NAME, TYPE) \
161
+uint64_t pmull_w(uint64_t op1, uint64_t op2)
179
+void HELPER(NAME)(void *d, void *n, void *m, void *a, uint32_t desc) \
162
{
180
+{ \
163
uint64_t result = 0;
181
+ intptr_t i, opr_sz = simd_oprsz(desc); \
164
int i;
182
+ for (i = 0; i < opr_sz; i += sizeof(TYPE)) { \
165
+ for (i = 0; i < 16; ++i) {
183
+ TYPE aa = *(TYPE *)(a + i); \
166
+ uint64_t mask = (op1 & 0x0000000100000001ull) * 0xffffffff;
184
+ TYPE nn = *(TYPE *)(n + i); \
167
+ result ^= op2 & mask;
185
+ TYPE mm = *(TYPE *)(m + i); \
168
+ op1 >>= 1;
186
+ TYPE dd = MIN(MAX(aa, nn), mm); \
169
+ op2 <<= 1;
187
+ *(TYPE *)(d + i) = dd; \
170
+ }
188
+ } \
171
+ return result;
189
+ clear_tail(d, opr_sz, simd_maxsz(desc)); \
172
+}
190
+}
173
191
+
174
+uint64_t pmull_h(uint64_t op1, uint64_t op2)
192
+DO_CLAMP(gvec_sclamp_b, int8_t)
175
+{
193
+DO_CLAMP(gvec_sclamp_h, int16_t)
176
+ uint64_t result = 0;
194
+DO_CLAMP(gvec_sclamp_s, int32_t)
177
+ int i;
195
+DO_CLAMP(gvec_sclamp_d, int64_t)
178
for (i = 0; i < 8; ++i) {
196
+
179
uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
197
+DO_CLAMP(gvec_uclamp_b, uint8_t)
180
result ^= op2 & mask;
198
+DO_CLAMP(gvec_uclamp_h, uint16_t)
199
+DO_CLAMP(gvec_uclamp_s, uint32_t)
200
+DO_CLAMP(gvec_uclamp_d, uint64_t)
181
--
201
--
182
2.20.1
202
2.25.1
183
184
1
We're about to make a code change to the sdiv and udiv helper
1
From: Richard Henderson <richard.henderson@linaro.org>
2
functions, so first fix their indentation and coding style.
3
2
3
We can handle both exception entry and exception return by
4
hooking into aarch64_sve_change_el.
5
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20220708151540.18136-32-richard.henderson@linaro.org
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20210730151636.17254-2-peter.maydell@linaro.org
7
---
10
---
8
target/arm/helper.c | 15 +++++++++------
11
target/arm/helper.c | 15 +++++++++++++--
9
1 file changed, 9 insertions(+), 6 deletions(-)
12
1 file changed, 13 insertions(+), 2 deletions(-)
10
13
11
diff --git a/target/arm/helper.c b/target/arm/helper.c
14
diff --git a/target/arm/helper.c b/target/arm/helper.c
12
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/helper.c
16
--- a/target/arm/helper.c
14
+++ b/target/arm/helper.c
17
+++ b/target/arm/helper.c
15
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(uxtb16)(uint32_t x)
18
@@ -XXX,XX +XXX,XX @@ void aarch64_sve_change_el(CPUARMState *env, int old_el,
16
19
return;
17
int32_t HELPER(sdiv)(int32_t num, int32_t den)
20
}
18
{
21
19
- if (den == 0)
22
+ old_a64 = old_el ? arm_el_is_aa64(env, old_el) : el0_a64;
20
- return 0;
23
+ new_a64 = new_el ? arm_el_is_aa64(env, new_el) : el0_a64;
21
- if (num == INT_MIN && den == -1)
24
+
22
- return INT_MIN;
25
+ /*
23
+ if (den == 0) {
26
+ * Both AArch64.TakeException and AArch64.ExceptionReturn
24
+ return 0;
27
+ * invoke ResetSVEState when taking an exception from, or
28
+ * returning to, AArch32 state when PSTATE.SM is enabled.
29
+ */
30
+ if (old_a64 != new_a64 && FIELD_EX64(env->svcr, SVCR, SM)) {
31
+ arm_reset_sve_state(env);
32
+ return;
25
+ }
33
+ }
26
+ if (num == INT_MIN && den == -1) {
34
+
27
+ return INT_MIN;
35
/*
28
+ }
36
* DDI0584A.d sec 3.2: "If SVE instructions are disabled or trapped
29
return num / den;
37
* at ELx, or not available because the EL is in AArch32 state, then
30
}
38
@@ -XXX,XX +XXX,XX @@ void aarch64_sve_change_el(CPUARMState *env, int old_el,
31
39
* we already have the correct register contents when encountering the
32
uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
40
* vq0->vq0 transition between EL0->EL1.
33
{
41
*/
34
- if (den == 0)
42
- old_a64 = old_el ? arm_el_is_aa64(env, old_el) : el0_a64;
35
- return 0;
43
old_len = (old_a64 && !sve_exception_el(env, old_el)
36
+ if (den == 0) {
44
? sve_vqm1_for_el(env, old_el) : 0);
37
+ return 0;
45
- new_a64 = new_el ? arm_el_is_aa64(env, new_el) : el0_a64;
38
+ }
46
new_len = (new_a64 && !sve_exception_el(env, new_el)
39
return num / den;
47
? sve_vqm1_for_el(env, new_el) : 0);
40
}
41
48
42
--
49
--
43
2.20.1
50
2.25.1
44
45
1
All the users of the vmlaldav formats have an 'x' bit in bit 12 and an
1
From: Richard Henderson <richard.henderson@linaro.org>
2
'a' bit in bit 5; move these to the format rather than specifying them
3
in each insn pattern.
4
2
3
Note that SME remains effectively disabled for user-only,
4
because we do not yet set CPACR_EL1.SMEN. This needs to
5
wait until the kernel ABI is implemented.
6
7
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20220708151540.18136-33-richard.henderson@linaro.org
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
---
11
---
8
target/arm/mve.decode | 16 ++++++++--------
12
docs/system/arm/emulation.rst | 4 ++++
9
1 file changed, 8 insertions(+), 8 deletions(-)
13
target/arm/cpu64.c | 11 +++++++++++
14
2 files changed, 15 insertions(+)
10
15
11
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
16
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
12
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/mve.decode
18
--- a/docs/system/arm/emulation.rst
14
+++ b/target/arm/mve.decode
19
+++ b/docs/system/arm/emulation.rst
15
@@ -XXX,XX +XXX,XX @@ VDUP 1110 1110 1 0 10 ... 0 .... 1011 . 0 0 1 0000 @vdup size=2
20
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
16
21
- FEAT_SHA512 (Advanced SIMD SHA512 instructions)
17
&vmlaldav rdahi rdalo size qn qm x a
22
- FEAT_SM3 (Advanced SIMD SM3 instructions)
18
23
- FEAT_SM4 (Advanced SIMD SM4 instructions)
19
-@vmlaldav .... .... . ... ... . ... . .... .... qm:3 . \
24
+- FEAT_SME (Scalable Matrix Extension)
20
+@vmlaldav .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
25
+- FEAT_SME_FA64 (Full A64 instruction set in Streaming SVE mode)
21
qn=%qn rdahi=%rdahi rdalo=%rdalo size=%size_16 &vmlaldav
26
+- FEAT_SME_F64F64 (Double-precision floating-point outer product instructions)
22
-@vmlaldav_nosz .... .... . ... ... . ... . .... .... qm:3 . \
27
+- FEAT_SME_I16I64 (16-bit to 64-bit integer widening outer product instructions)
23
+@vmlaldav_nosz .... .... . ... ... . ... x:1 .... .. a:1 . qm:3 . \
28
- FEAT_SPECRES (Speculation restriction instructions)
24
qn=%qn rdahi=%rdahi rdalo=%rdalo size=0 &vmlaldav
29
- FEAT_SSBS (Speculative Store Bypass Safe)
25
-VMLALDAV_S 1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
30
- FEAT_TLBIOS (TLB invalidate instructions in Outer Shareable domain)
26
-VMLALDAV_U 1111 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 0 @vmlaldav
31
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
27
+VMLALDAV_S 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
32
index XXXXXXX..XXXXXXX 100644
28
+VMLALDAV_U 1111 1110 1 ... ... . ... . 1110 . 0 . 0 ... 0 @vmlaldav
33
--- a/target/arm/cpu64.c
29
34
+++ b/target/arm/cpu64.c
30
-VMLSLDAV 1110 1110 1 ... ... . ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav
35
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
31
+VMLSLDAV 1110 1110 1 ... ... . ... . 1110 . 0 . 0 ... 1 @vmlaldav
36
*/
32
37
t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3); /* FEAT_MTE3 */
33
-VRMLALDAVH_S 1110 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
38
t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 0); /* FEAT_RASv1p1 + FEAT_DoubleFault */
34
-VRMLALDAVH_U 1111 1110 1 ... ... 0 ... x:1 1111 . 0 a:1 0 ... 0 @vmlaldav_nosz
39
+ t = FIELD_DP64(t, ID_AA64PFR1, SME, 1); /* FEAT_SME */
35
+VRMLALDAVH_S 1110 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
40
t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_2 */
36
+VRMLALDAVH_U 1111 1110 1 ... ... 0 ... . 1111 . 0 . 0 ... 0 @vmlaldav_nosz
41
cpu->isar.id_aa64pfr1 = t;
37
42
38
-VRMLSLDAVH 1111 1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_nosz
43
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
39
+VRMLSLDAVH 1111 1110 1 ... ... 0 ... . 1110 . 0 . 0 ... 1 @vmlaldav_nosz
44
t = FIELD_DP64(t, ID_AA64DFR0, PMUVER, 5); /* FEAT_PMUv3p4 */
40
45
cpu->isar.id_aa64dfr0 = t;
41
# Scalar operations
46
47
+ t = cpu->isar.id_aa64smfr0;
48
+ t = FIELD_DP64(t, ID_AA64SMFR0, F32F32, 1); /* FEAT_SME */
49
+ t = FIELD_DP64(t, ID_AA64SMFR0, B16F32, 1); /* FEAT_SME */
50
+ t = FIELD_DP64(t, ID_AA64SMFR0, F16F32, 1); /* FEAT_SME */
51
+ t = FIELD_DP64(t, ID_AA64SMFR0, I8I32, 0xf); /* FEAT_SME */
52
+ t = FIELD_DP64(t, ID_AA64SMFR0, F64F64, 1); /* FEAT_SME_F64F64 */
53
+ t = FIELD_DP64(t, ID_AA64SMFR0, I16I64, 0xf); /* FEAT_SME_I16I64 */
54
+ t = FIELD_DP64(t, ID_AA64SMFR0, FA64, 1); /* FEAT_SME_FA64 */
55
+ cpu->isar.id_aa64smfr0 = t;
56
+
57
/* Replicate the same data to the 32-bit id registers. */
58
aa32_max_features(cpu);
42
59
43
--
60
--
44
2.20.1
61
2.25.1
45
46
1
For vector loads, predicated elements are zeroed, instead of
1
From: Richard Henderson <richard.henderson@linaro.org>
2
retaining their previous values (as happens for most data
3
processing operations). This means we need to distinguish
4
"beat not executed due to ECI" (don't touch destination
5
element) from "beat executed but predicated out" (zero
6
destination element).
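
A minimal per-byte sketch of that distinction (illustrative only, using a
hypothetical load_byte() helper rather than the real cpu_*_data_ra() calls):

    extern uint8_t load_byte(uint32_t addr);     /* hypothetical */

    static void sketch_vldr_byte(uint8_t *d, int b, uint32_t addr,
                                 uint16_t mask, uint16_t eci_mask)
    {
        if (!(eci_mask & (1 << b))) {
            return;              /* beat already executed: leave d[b] alone */
        }
        /* beat executes: load if predicated in, otherwise write zero */
        d[b] = (mask & (1 << b)) ? load_byte(addr + b) : 0;
    }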
7
2
3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20220708151540.18136-34-richard.henderson@linaro.org
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
---
7
---
11
target/arm/mve_helper.c | 8 +++++---
8
linux-user/aarch64/target_cpu.h | 5 ++++-
12
1 file changed, 5 insertions(+), 3 deletions(-)
9
1 file changed, 4 insertions(+), 1 deletion(-)
13
10
14
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
11
diff --git a/linux-user/aarch64/target_cpu.h b/linux-user/aarch64/target_cpu.h
15
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/mve_helper.c
13
--- a/linux-user/aarch64/target_cpu.h
17
+++ b/target/arm/mve_helper.c
14
+++ b/linux-user/aarch64/target_cpu.h
18
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
15
@@ -XXX,XX +XXX,XX @@ static inline void cpu_clone_regs_parent(CPUARMState *env, unsigned flags)
19
env->v7m.vpr = vpr;
16
17
static inline void cpu_set_tls(CPUARMState *env, target_ulong newtls)
18
{
19
- /* Note that AArch64 Linux keeps the TLS pointer in TPIDR; this is
20
+ /*
21
+ * Note that AArch64 Linux keeps the TLS pointer in TPIDR; this is
22
* different from AArch32 Linux, which uses TPIDRRO.
23
*/
24
env->cp15.tpidr_el[0] = newtls;
25
+ /* TPIDR2_EL0 is cleared with CLONE_SETTLS. */
26
+ env->cp15.tpidr2_el0 = 0;
20
}
27
}
21
28
22
-
29
static inline abi_ulong get_sp_from_cpustate(CPUARMState *state)
23
+/* For loads, predicated lanes are zeroed instead of keeping their old values */
24
#define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE) \
25
void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr) \
26
{ \
27
TYPE *d = vd; \
28
uint16_t mask = mve_element_mask(env); \
29
+ uint16_t eci_mask = mve_eci_mask(env); \
30
unsigned b, e; \
31
/* \
32
* R_SXTM allows the dest reg to become UNKNOWN for abandoned \
33
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
34
* then take an exception. \
35
*/ \
36
for (b = 0, e = 0; b < 16; b += ESIZE, e++) { \
37
- if (mask & (1 << b)) { \
38
- d[H##ESIZE(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC()); \
39
+ if (eci_mask & (1 << b)) { \
40
+ d[H##ESIZE(e)] = (mask & (1 << b)) ? \
41
+ cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \
42
} \
43
addr += MSIZE; \
44
} \
45
--
30
--
46
2.20.1
31
2.25.1
47
48
1
In mve_element_mask(), we calculate a mask for tail predication which
1
From: Richard Henderson <richard.henderson@linaro.org>
2
should have a number of 1 bits based on the value of LR. However,
3
our MAKE_64BIT_MASK() macro has undefined behaviour when passed a
4
zero length. Special case this to give the all-zeroes mask we
5
require.
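
As a hedged illustration of the problem (this is the general shape of such
a mask macro, not necessarily QEMU's exact definition):

    /* Shifting a 64-bit value by 64 is undefined behaviour in C, so a mask
     * built as ~0ULL >> (64 - len) cannot be used with len == 0; guard it.
     */
    static uint64_t low_mask(unsigned len)      /* 0 <= len <= 64 */
    {
        return len ? (~0ULL >> (64 - len)) : 0;
    }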
6
2
3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20220708151540.18136-35-richard.henderson@linaro.org
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
---
7
---
10
target/arm/mve_helper.c | 3 ++-
8
linux-user/aarch64/cpu_loop.c | 9 +++++++++
11
1 file changed, 2 insertions(+), 1 deletion(-)
9
1 file changed, 9 insertions(+)
12
10
13
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
11
diff --git a/linux-user/aarch64/cpu_loop.c b/linux-user/aarch64/cpu_loop.c
14
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/mve_helper.c
13
--- a/linux-user/aarch64/cpu_loop.c
16
+++ b/target/arm/mve_helper.c
14
+++ b/linux-user/aarch64/cpu_loop.c
17
@@ -XXX,XX +XXX,XX @@ static uint16_t mve_element_mask(CPUARMState *env)
15
@@ -XXX,XX +XXX,XX @@ void cpu_loop(CPUARMState *env)
18
*/
16
19
int masklen = env->regs[14] << env->v7m.ltpsize;
17
switch (trapnr) {
20
assert(masklen <= 16);
18
case EXCP_SWI:
21
- mask &= MAKE_64BIT_MASK(0, masklen);
19
+ /*
22
+ uint16_t ltpmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
20
+ * On syscall, PSTATE.ZA is preserved, along with the ZA matrix.
23
+ mask &= ltpmask;
21
+ * PSTATE.SM is cleared, per SMSTOP, which does ResetSVEState.
24
}
22
+ */
25
23
+ if (FIELD_EX64(env->svcr, SVCR, SM)) {
26
if ((env->condexec_bits & 0xf) == 0) {
24
+ env->svcr = FIELD_DP64(env->svcr, SVCR, SM, 0);
25
+ arm_rebuild_hflags(env);
26
+ arm_reset_sve_state(env);
27
+ }
28
ret = do_syscall(env,
29
env->xregs[8],
30
env->xregs[0],
27
--
31
--
28
2.20.1
32
2.25.1
29
30
1
We got an edge case wrong in the 48-bit SQRSHRL implementation: if
1
From: Richard Henderson <richard.henderson@linaro.org>
2
the shift is to the right, although it always makes the result
3
smaller than the input value, it might not be within the 48-bit range
4
the result is supposed to be in if the input had some bits in [63..48]
5
set and the shift didn't bring all of those within the [47..0] range.
6
2
7
Handle this similarly to the way we already do for this case in
3
Make sure to zero the currently reserved fields.
8
do_uqrshl48_d(): extend the calculated result from 48 bits,
9
and return that if not saturating or if it doesn't change the
10
result; otherwise fall through to return a saturated value.
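
A worked example of the failing case (illustrative values only):

    /* src has bit 50 set; a right shift by 1 still leaves bit 49 set, which
     * is outside the signed 48-bit range, so the result must saturate.
     */
    int64_t val = (1LL << 50) >> 1;              /* 1 << 49 */
    int64_t extval = sextract64(val, 0, 48);     /* 0: bits [63..48] discarded */
    /* val != extval, so fall through to the saturation path */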
11
4
5
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20220708151540.18136-36-richard.henderson@linaro.org
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
14
---
9
---
15
target/arm/mve_helper.c | 11 +++++++++--
10
linux-user/aarch64/signal.c | 9 ++++++++-
16
1 file changed, 9 insertions(+), 2 deletions(-)
11
1 file changed, 8 insertions(+), 1 deletion(-)
17
12
18
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
13
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
19
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/mve_helper.c
15
--- a/linux-user/aarch64/signal.c
21
+++ b/target/arm/mve_helper.c
16
+++ b/linux-user/aarch64/signal.c
22
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
17
@@ -XXX,XX +XXX,XX @@ struct target_extra_context {
23
static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
18
struct target_sve_context {
24
bool round, uint32_t *sat)
19
struct target_aarch64_ctx head;
20
uint16_t vl;
21
- uint16_t reserved[3];
22
+ uint16_t flags;
23
+ uint16_t reserved[2];
24
/* The actual SVE data immediately follows. It is laid out
25
* according to TARGET_SVE_SIG_{Z,P}REG_OFFSET, based off of
26
* the original struct pointer.
27
@@ -XXX,XX +XXX,XX @@ struct target_sve_context {
28
#define TARGET_SVE_SIG_CONTEXT_SIZE(VQ) \
29
(TARGET_SVE_SIG_PREG_OFFSET(VQ, 17))
30
31
+#define TARGET_SVE_SIG_FLAG_SM 1
32
+
33
struct target_rt_sigframe {
34
struct target_siginfo info;
35
struct target_ucontext uc;
36
@@ -XXX,XX +XXX,XX @@ static void target_setup_sve_record(struct target_sve_context *sve,
25
{
37
{
26
+ int64_t val, extval;
38
int i, j;
27
+
39
28
if (shift <= -48) {
40
+ memset(sve, 0, sizeof(*sve));
29
/* Rounding the sign bit always produces 0. */
41
__put_user(TARGET_SVE_MAGIC, &sve->head.magic);
30
if (round) {
42
__put_user(size, &sve->head.size);
31
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
43
__put_user(vq * TARGET_SVE_VQ_BYTES, &sve->vl);
32
} else if (shift < 0) {
44
+ if (FIELD_EX64(env->svcr, SVCR, SM)) {
33
if (round) {
45
+ __put_user(TARGET_SVE_SIG_FLAG_SM, &sve->flags);
34
src >>= -shift - 1;
46
+ }
35
- return (src >> 1) + (src & 1);
47
36
+ val = (src >> 1) + (src & 1);
48
/* Note that SVE regs are stored as a byte stream, with each byte element
37
+ } else {
49
* at a subsequent address. This corresponds to a little-endian store
38
+ val = src >> -shift;
39
+ }
40
+ extval = sextract64(val, 0, 48);
41
+ if (!sat || val == extval) {
42
+ return extval;
43
}
44
- return src >> -shift;
45
} else if (shift < 48) {
46
int64_t extval = sextract64(src << shift, 0, 48);
47
if (!sat || src == (extval >> shift)) {
48
--
50
--
49
2.20.1
51
2.25.1
50
51
diff view generated by jsdifflib
1
In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge
1
From: Richard Henderson <richard.henderson@linaro.org>
2
cases wrong and failed to saturate correctly:
3
2
4
(1) In do_sqrshl48_d() we used the same code that do_sqrshl_bhs()
3
Fold the return value setting into the goto, so each
5
does to obtain the saturated most-negative and most-positive 48-bit
4
point of failure need not do both.
6
signed values for the large-shift-left case. This gives (1 << 47)
7
for saturate-to-most-negative, but we weren't sign-extending this
8
value to the 64-bit output as the pseudocode requires.
9
5
10
(2) For left shifts by less than 48, we copied the "8/16 bit" code
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
11
from do_sqrshl_bhs() and do_uqrshl_bhs(). This doesn't do the right
7
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
12
thing because it assumes the C type we're working with is at least
8
Message-id: 20220708151540.18136-37-richard.henderson@linaro.org
13
twice the number of bits we're saturating to (so that a shift left by
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
14
bits-1 can't shift anything off the top of the value). This isn't
10
---
15
true for bits == 48, so we would incorrectly return 0 rather than the
11
linux-user/aarch64/signal.c | 26 +++++++++++---------------
16
most-positive value for situations like "shift (1 << 44) left by
12
1 file changed, 11 insertions(+), 15 deletions(-)
17
20". Instead check for saturation by doing the shift and signextend
18
and then testing whether shifting back left again gives the original
19
value.
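
A worked example of case (2) with the "shift back and compare" check
(illustrative values only):

    int64_t src = 1LL << 44;
    int64_t shift = 20;
    /* src << shift overflows 64 bits and wraps to 0 on the usual
     * two's-complement hosts, so the old "val == extval" test passed.
     */
    int64_t extval = sextract64(src << shift, 0, 48);   /* 0 */
    /* (extval >> shift) == 0 != src, so the new check saturates instead */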
20
13
21
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
14
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
22
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
23
---
24
target/arm/mve_helper.c | 12 +++++-------
25
1 file changed, 5 insertions(+), 7 deletions(-)
26
27
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
28
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/mve_helper.c
16
--- a/linux-user/aarch64/signal.c
30
+++ b/target/arm/mve_helper.c
17
+++ b/linux-user/aarch64/signal.c
31
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
18
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
19
struct target_sve_context *sve = NULL;
20
uint64_t extra_datap = 0;
21
bool used_extra = false;
22
- bool err = false;
23
int vq = 0, sve_size = 0;
24
25
target_restore_general_frame(env, sf);
26
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
27
switch (magic) {
28
case 0:
29
if (size != 0) {
30
- err = true;
31
- goto exit;
32
+ goto err;
33
}
34
if (used_extra) {
35
ctx = NULL;
36
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
37
38
case TARGET_FPSIMD_MAGIC:
39
if (fpsimd || size != sizeof(struct target_fpsimd_context)) {
40
- err = true;
41
- goto exit;
42
+ goto err;
43
}
44
fpsimd = (struct target_fpsimd_context *)ctx;
45
break;
46
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
47
break;
48
}
49
}
50
- err = true;
51
- goto exit;
52
+ goto err;
53
54
case TARGET_EXTRA_MAGIC:
55
if (extra || size != sizeof(struct target_extra_context)) {
56
- err = true;
57
- goto exit;
58
+ goto err;
59
}
60
__get_user(extra_datap,
61
&((struct target_extra_context *)ctx)->datap);
62
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
63
/* Unknown record -- we certainly didn't generate it.
64
* Did we in fact get out of sync?
65
*/
66
- err = true;
67
- goto exit;
68
+ goto err;
32
}
69
}
33
return src >> -shift;
70
ctx = (void *)ctx + size;
34
} else if (shift < 48) {
35
- int64_t val = src << shift;
36
- int64_t extval = sextract64(val, 0, 48);
37
- if (!sat || val == extval) {
38
+ int64_t extval = sextract64(src << shift, 0, 48);
39
+ if (!sat || src == (extval >> shift)) {
40
return extval;
41
}
42
} else if (!sat || src == 0) {
43
@@ -XXX,XX +XXX,XX @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
44
}
71
}
45
72
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
46
*sat = 1;
73
if (fpsimd) {
47
- return (1ULL << 47) - (src >= 0);
74
target_restore_fpsimd_record(env, fpsimd);
48
+ return src >= 0 ? MAKE_64BIT_MASK(0, 47) : MAKE_64BIT_MASK(47, 17);
75
} else {
76
- err = true;
77
+ goto err;
78
}
79
80
/* SVE data, if present, overwrites FPSIMD data. */
81
if (sve) {
82
target_restore_sve_record(env, sve, vq);
83
}
84
-
85
- exit:
86
unlock_user(extra, extra_datap, 0);
87
- return err;
88
+ return 0;
89
+
90
+ err:
91
+ unlock_user(extra, extra_datap, 0);
92
+ return 1;
49
}
93
}
50
94
51
/* Operate on 64-bit values, but saturate at 48 bits */
95
static abi_ulong get_sigframe(struct target_sigaction *ka,
52
@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
53
return extval;
54
}
55
} else if (shift < 48) {
56
- uint64_t val = src << shift;
57
- uint64_t extval = extract64(val, 0, 48);
58
- if (!sat || val == extval) {
59
+ uint64_t extval = extract64(src << shift, 0, 48);
60
+ if (!sat || src == (extval >> shift)) {
61
return extval;
62
}
63
} else if (!sat || src == 0) {
64
--
96
--
65
2.20.1
97
2.25.1
66
67
1
In the MVE helpers for the narrowing operations (DO_VSHRN and
1
From: Richard Henderson <richard.henderson@linaro.org>
2
DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for
3
the 'top' versions of the insn. This is because the loop works over
4
the double-sized input elements and shifts the predicate mask by that
5
many bits each time, but when we write out the half-sized output we
6
must look at the mask bits for whichever half of the element we are
7
writing to.
8
2
9
Correct this by shifting the whole mask right by ESIZE bits for the
3
In parse_user_sigframe, the kernel rejects duplicate sve records,
10
'top' insns. This allows us also to simplify the saturation bit
4
or records that are smaller than the header. We were silently
11
checking (where we had noticed that we needed to look at a different
5
allowing these cases to pass, dropping the record.
12
mask bit for the 'top' insn.)
13
6
7
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20220708151540.18136-38-richard.henderson@linaro.org
14
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
15
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
16
---
11
---
17
target/arm/mve_helper.c | 4 +++-
12
linux-user/aarch64/signal.c | 5 ++++-
18
1 file changed, 3 insertions(+), 1 deletion(-)
13
1 file changed, 4 insertions(+), 1 deletion(-)
19
14
20
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
15
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
21
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
22
--- a/target/arm/mve_helper.c
17
--- a/linux-user/aarch64/signal.c
23
+++ b/target/arm/mve_helper.c
18
+++ b/linux-user/aarch64/signal.c
24
@@ -XXX,XX +XXX,XX @@ DO_VSHLL_ALL(vshllt, true)
19
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
25
TYPE *d = vd; \
20
break;
26
uint16_t mask = mve_element_mask(env); \
21
27
unsigned le; \
22
case TARGET_SVE_MAGIC:
28
+ mask >>= ESIZE * TOP; \
23
+ if (sve || size < sizeof(struct target_sve_context)) {
29
for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
24
+ goto err;
30
TYPE r = FN(m[H##LESIZE(le)], shift); \
25
+ }
31
mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask); \
26
if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
32
@@ -XXX,XX +XXX,XX @@ static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
27
vq = sve_vq(env);
33
uint16_t mask = mve_element_mask(env); \
28
sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
34
bool qc = false; \
29
- if (!sve && size == sve_size) {
35
unsigned le; \
30
+ if (size == sve_size) {
36
+ mask >>= ESIZE * TOP; \
31
sve = (struct target_sve_context *)ctx;
37
for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
32
break;
38
bool sat = false; \
33
}
39
TYPE r = FN(m[H##LESIZE(le)], shift, &sat); \
40
mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask); \
41
- qc |= sat && (mask & 1 << (TOP * ESIZE)); \
42
+ qc |= sat & mask & 1; \
43
} \
44
if (qc) { \
45
env->vfp.qc[0] = qc; \
46
--
34
--
47
2.20.1
35
2.25.1
48
49
1
A cut-and-paste error meant we handled signed VADDV like
1
From: Richard Henderson <richard.henderson@linaro.org>
2
unsigned VADDV; fix the type used.
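
For example (illustrative values only), an element holding 0xff must
contribute -1 to the signed sum but 255 to the unsigned one:

    uint8_t lane = 0xff;
    int32_t  signed_contrib   = (int8_t)lane;   /* -1  */
    uint32_t unsigned_contrib = lane;           /* 255 */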
3
2
3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20220708151540.18136-39-richard.henderson@linaro.org
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
---
7
---
7
target/arm/mve_helper.c | 6 +++---
8
linux-user/aarch64/signal.c | 3 +++
8
1 file changed, 3 insertions(+), 3 deletions(-)
9
1 file changed, 3 insertions(+)
9
10
10
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
11
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
11
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
12
--- a/target/arm/mve_helper.c
13
--- a/linux-user/aarch64/signal.c
13
+++ b/target/arm/mve_helper.c
14
+++ b/linux-user/aarch64/signal.c
14
@@ -XXX,XX +XXX,XX @@ DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
15
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
15
return ra; \
16
__get_user(extra_size,
16
} \
17
&((struct target_extra_context *)ctx)->size);
17
18
extra = lock_user(VERIFY_READ, extra_datap, extra_size, 0);
18
-DO_VADDV(vaddvsb, 1, uint8_t)
19
+ if (!extra) {
19
-DO_VADDV(vaddvsh, 2, uint16_t)
20
+ return 1;
20
-DO_VADDV(vaddvsw, 4, uint32_t)
21
+ }
21
+DO_VADDV(vaddvsb, 1, int8_t)
22
break;
22
+DO_VADDV(vaddvsh, 2, int16_t)
23
23
+DO_VADDV(vaddvsw, 4, int32_t)
24
default:
24
DO_VADDV(vaddvub, 1, uint8_t)
25
DO_VADDV(vaddvuh, 2, uint16_t)
26
DO_VADDV(vaddvuw, 4, uint32_t)
27
--
25
--
28
2.20.1
26
2.25.1
29
30
1
Implement the MVE VABAV insn, which computes absolute differences
1
From: Richard Henderson <richard.henderson@linaro.org>
2
between elements of two vectors and accumulates the result into
3
a general purpose register.
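
A scalar, unpredicated reference model of the operation (a sketch only; the
real helpers below also apply the predicate mask and beat handling):

    static uint32_t vabav_ref_s8(uint32_t ra, const int8_t *n,
                                 const int8_t *m, int elems)
    {
        for (int e = 0; e < elems; e++) {
            int32_t diff = (int32_t)n[e] - m[e];
            ra += (diff >= 0) ? diff : -diff;    /* accumulate |n[e] - m[e]| */
        }
        return ra;
    }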
4
2
3
Move the checks out of the parsing loop and into the
4
restore function. This more closely mirrors the code
5
structure in the kernel, and is slightly clearer.
6
7
Reject rather than silently skip incorrect VL and SVE record sizes,
8
bringing our checks into line with those the kernel does.
9
10
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
12
Message-id: 20220708151540.18136-40-richard.henderson@linaro.org
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
---
14
---
8
target/arm/helper-mve.h | 7 +++++++
15
linux-user/aarch64/signal.c | 51 +++++++++++++++++++++++++------------
9
target/arm/mve.decode | 6 ++++++
16
1 file changed, 35 insertions(+), 16 deletions(-)
10
target/arm/mve_helper.c | 26 +++++++++++++++++++++++
11
target/arm/translate-mve.c | 43 ++++++++++++++++++++++++++++++++++++++
12
4 files changed, 82 insertions(+)
13
17
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
18
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
15
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
20
--- a/linux-user/aarch64/signal.c
17
+++ b/target/arm/helper-mve.h
21
+++ b/linux-user/aarch64/signal.c
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, ptr, i32)
22
@@ -XXX,XX +XXX,XX @@ static void target_restore_fpsimd_record(CPUARMState *env,
19
DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
23
}
20
DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
21
22
+DEF_HELPER_FLAGS_4(mve_vabavsb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
23
+DEF_HELPER_FLAGS_4(mve_vabavsh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
24
+DEF_HELPER_FLAGS_4(mve_vabavsw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
25
+DEF_HELPER_FLAGS_4(mve_vabavub, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
26
+DEF_HELPER_FLAGS_4(mve_vabavuh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
27
+DEF_HELPER_FLAGS_4(mve_vabavuw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
28
+
29
DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
30
DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
31
DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
32
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
33
index XXXXXXX..XXXXXXX 100644
34
--- a/target/arm/mve.decode
35
+++ b/target/arm/mve.decode
36
@@ -XXX,XX +XXX,XX @@
37
&vcmp_scalar qn rm size mask
38
&shl_scalar qda rm size
39
&vmaxv qm rda size
40
+&vabav qn qm rda size
41
42
@vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
43
# Note that both Rn and Qd are 3 bits only (no D bit)
44
@@ -XXX,XX +XXX,XX @@ VMLAS 111- 1110 0 . .. ... 1 ... 1 1110 . 100 .... @2scalar
45
rdahi=%rdahi rdalo=%rdalo
46
}
24
}
47
25
48
+@vabav .... .... .. size:2 .... rda:4 .... .... .... &vabav qn=%qn qm=%qm
26
-static void target_restore_sve_record(CPUARMState *env,
49
+
27
- struct target_sve_context *sve, int vq)
50
+VABAV_S 111 0 1110 10 .. ... 0 .... 1111 . 0 . 0 ... 1 @vabav
28
+static bool target_restore_sve_record(CPUARMState *env,
51
+VABAV_U 111 1 1110 10 .. ... 0 .... 1111 . 0 . 0 ... 1 @vabav
29
+ struct target_sve_context *sve,
52
+
30
+ int size)
53
# Logical immediate operations (1 reg and modified-immediate)
31
{
54
32
- int i, j;
55
# The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
33
+ int i, j, vl, vq;
56
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
34
57
index XXXXXXX..XXXXXXX 100644
35
- /* Note that SVE regs are stored as a byte stream, with each byte element
58
--- a/target/arm/mve_helper.c
36
+ if (!cpu_isar_feature(aa64_sve, env_archcpu(env))) {
59
+++ b/target/arm/mve_helper.c
37
+ return false;
60
@@ -XXX,XX +XXX,XX @@ DO_VMAXMINV(vminavb, 1, int8_t, uint8_t, do_mina)
61
DO_VMAXMINV(vminavh, 2, int16_t, uint16_t, do_mina)
62
DO_VMAXMINV(vminavw, 4, int32_t, uint32_t, do_mina)
63
64
+#define DO_VABAV(OP, ESIZE, TYPE) \
65
+ uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
66
+ void *vm, uint32_t ra) \
67
+ { \
68
+ uint16_t mask = mve_element_mask(env); \
69
+ unsigned e; \
70
+ TYPE *m = vm, *n = vn; \
71
+ for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
72
+ if (mask & 1) { \
73
+ int64_t n0 = n[H##ESIZE(e)]; \
74
+ int64_t m0 = m[H##ESIZE(e)]; \
75
+ uint32_t r = n0 >= m0 ? (n0 - m0) : (m0 - n0); \
76
+ ra += r; \
77
+ } \
78
+ } \
79
+ mve_advance_vpt(env); \
80
+ return ra; \
81
+ }
38
+ }
82
+
39
+
83
+DO_VABAV(vabavsb, 1, int8_t)
40
+ __get_user(vl, &sve->vl);
84
+DO_VABAV(vabavsh, 2, int16_t)
41
+ vq = sve_vq(env);
85
+DO_VABAV(vabavsw, 4, int32_t)
86
+DO_VABAV(vabavub, 1, uint8_t)
87
+DO_VABAV(vabavuh, 2, uint16_t)
88
+DO_VABAV(vabavuw, 4, uint32_t)
89
+
42
+
90
#define DO_VADDLV(OP, TYPE, LTYPE) \
43
+ /* Reject mismatched VL. */
91
uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
44
+ if (vl != vq * TARGET_SVE_VQ_BYTES) {
92
uint64_t ra) \
93
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/target/arm/translate-mve.c
96
+++ b/target/arm/translate-mve.c
97
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32);
98
typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
99
typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
100
typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
101
+typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
102
103
/* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
104
static inline long mve_qreg_offset(unsigned reg)
105
@@ -XXX,XX +XXX,XX @@ DO_VMAXV(VMAXAV, vmaxav)
106
DO_VMAXV(VMINV_S, vminvs)
107
DO_VMAXV(VMINV_U, vminvu)
108
DO_VMAXV(VMINAV, vminav)
109
+
110
+static bool do_vabav(DisasContext *s, arg_vabav *a, MVEGenVABAVFn *fn)
111
+{
112
+ /* Absolute difference accumulated across vector */
113
+ TCGv_ptr qn, qm;
114
+ TCGv_i32 rda;
115
+
116
+ if (!dc_isar_feature(aa32_mve, s) ||
117
+ !mve_check_qreg_bank(s, a->qm | a->qn) ||
118
+ !fn || a->rda == 13 || a->rda == 15) {
119
+ /* Rda cases are UNPREDICTABLE */
120
+ return false;
45
+ return false;
121
+ }
46
+ }
122
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
47
+
48
+ /* Accept empty record -- used to clear PSTATE.SM. */
49
+ if (size <= sizeof(*sve)) {
123
+ return true;
50
+ return true;
124
+ }
51
+ }
125
+
52
+
126
+ qm = mve_qreg_ptr(a->qm);
53
+ /* Reject non-empty but incomplete record. */
127
+ qn = mve_qreg_ptr(a->qn);
54
+ if (size < TARGET_SVE_SIG_CONTEXT_SIZE(vq)) {
128
+ rda = load_reg(s, a->rda);
55
+ return false;
129
+ fn(rda, cpu_env, qn, qm, rda);
130
+ store_reg(s, a->rda, rda);
131
+ tcg_temp_free_ptr(qm);
132
+ tcg_temp_free_ptr(qn);
133
+ mve_update_eci(s);
134
+ return true;
135
+}
136
+
137
+#define DO_VABAV(INSN, FN) \
138
+ static bool trans_##INSN(DisasContext *s, arg_vabav *a) \
139
+ { \
140
+ static MVEGenVABAVFn * const fns[] = { \
141
+ gen_helper_mve_##FN##b, \
142
+ gen_helper_mve_##FN##h, \
143
+ gen_helper_mve_##FN##w, \
144
+ NULL, \
145
+ }; \
146
+ return do_vabav(s, a, fns[a->size]); \
147
+ }
56
+ }
148
+
57
+
149
+DO_VABAV(VABAV_S, vabavs)
58
+ /*
150
+DO_VABAV(VABAV_U, vabavu)
59
+ * Note that SVE regs are stored as a byte stream, with each byte element
60
* at a subsequent address. This corresponds to a little-endian load
61
* of our 64-bit hunks.
62
*/
63
@@ -XXX,XX +XXX,XX @@ static void target_restore_sve_record(CPUARMState *env,
64
}
65
}
66
}
67
+ return true;
68
}
69
70
static int target_restore_sigframe(CPUARMState *env,
71
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
72
struct target_sve_context *sve = NULL;
73
uint64_t extra_datap = 0;
74
bool used_extra = false;
75
- int vq = 0, sve_size = 0;
76
+ int sve_size = 0;
77
78
target_restore_general_frame(env, sf);
79
80
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
81
if (sve || size < sizeof(struct target_sve_context)) {
82
goto err;
83
}
84
- if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
85
- vq = sve_vq(env);
86
- sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
87
- if (size == sve_size) {
88
- sve = (struct target_sve_context *)ctx;
89
- break;
90
- }
91
- }
92
- goto err;
93
+ sve = (struct target_sve_context *)ctx;
94
+ sve_size = size;
95
+ break;
96
97
case TARGET_EXTRA_MAGIC:
98
if (extra || size != sizeof(struct target_extra_context)) {
99
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
100
}
101
102
/* SVE data, if present, overwrites FPSIMD data. */
103
- if (sve) {
104
- target_restore_sve_record(env, sve, vq);
105
+ if (sve && !target_restore_sve_record(env, sve, sve_size)) {
106
+ goto err;
107
}
108
unlock_user(extra, extra_datap, 0);
109
return 0;
151
--
2.25.1
From: Richard Henderson <richard.henderson@linaro.org>

Set the SM bit in the SVE record on signal delivery, and create the
ZA record.  Restore SM and ZA state according to the records present
on return.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220708151540.18136-41-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/aarch64/signal.c | 167 +++++++++++++++++++++++++++++++++---
 1 file changed, 154 insertions(+), 13 deletions(-)

Implement the MVE VPNOT insn, which inverts the bits in VPR.P0
(subject both to predication and to beatwise execution).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  1 +
 target/arm/mve.decode      |  1 +
 target/arm/mve_helper.c    | 17 +++++++++++++++++
 target/arm/translate-mve.c | 19 +++++++++++++++++++
 4 files changed, 38 insertions(+)
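As a rough size check on the new ZA record layout: it is a small header
aligned to 16 bytes, followed by an SVL-by-SVL byte matrix, so it grows
quadratically with the streaming vector length.  A standalone sketch (the
16-byte header size is only an assumption for illustration):

    #include <stdio.h>

    /* ZA signal record size for a given VQ (SVL = VQ * 16 bytes). */
    static unsigned za_record_size(unsigned vq, unsigned header_bytes)
    {
        unsigned svl = vq * 16;
        unsigned regs_off = (header_bytes + 15) & ~15u;  /* align header */
        return regs_off + svl * svl;                     /* SVL x SVL bytes */
    }

    int main(void)
    {
        /* e.g. SVL = 256 bits (vq = 2): 16 + 32 * 32 = 1040 bytes */
        printf("%u\n", za_record_size(2, 16));
        return 0;
    }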
12
13
13
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
14
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
14
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper-mve.h
16
--- a/linux-user/aarch64/signal.c
16
+++ b/target/arm/helper-mve.h
17
+++ b/linux-user/aarch64/signal.c
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
18
@@ -XXX,XX +XXX,XX @@ struct target_sve_context {
18
DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
19
19
20
#define TARGET_SVE_SIG_FLAG_SM 1
20
DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
21
+DEF_HELPER_FLAGS_1(mve_vpnot, TCG_CALL_NO_WG, void, env)
22
+#define TARGET_ZA_MAGIC 0x54366345
22
23
+
23
DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
24
+struct target_za_context {
24
DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
25
+ struct target_aarch64_ctx head;
25
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
26
+ uint16_t vl;
26
index XXXXXXX..XXXXXXX 100644
27
+ uint16_t reserved[3];
27
--- a/target/arm/mve.decode
28
+ /* The actual ZA data immediately follows. */
28
+++ b/target/arm/mve.decode
29
+};
29
@@ -XXX,XX +XXX,XX @@ VCMPGT 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
30
+
30
VCMPLE 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 1 @vcmp
31
+#define TARGET_ZA_SIG_REGS_OFFSET \
31
32
+ QEMU_ALIGN_UP(sizeof(struct target_za_context), TARGET_SVE_VQ_BYTES)
32
{
33
+#define TARGET_ZA_SIG_ZAV_OFFSET(VQ, N) \
33
+ VPNOT 1111 1110 0 0 11 000 1 000 0 1111 0100 1101
34
+ (TARGET_ZA_SIG_REGS_OFFSET + (VQ) * TARGET_SVE_VQ_BYTES * (N))
34
VPST 1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
35
+#define TARGET_ZA_SIG_CONTEXT_SIZE(VQ) \
35
VCMPEQ_scalar 1111 1110 0 . .. ... 1 ... 0 1111 0 1 0 0 .... @vcmp_scalar
36
+ TARGET_ZA_SIG_ZAV_OFFSET(VQ, VQ * TARGET_SVE_VQ_BYTES)
37
+
38
struct target_rt_sigframe {
39
struct target_siginfo info;
40
struct target_ucontext uc;
41
@@ -XXX,XX +XXX,XX @@ static void target_setup_end_record(struct target_aarch64_ctx *end)
36
}
42
}
37
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
43
38
index XXXXXXX..XXXXXXX 100644
44
static void target_setup_sve_record(struct target_sve_context *sve,
39
--- a/target/arm/mve_helper.c
45
- CPUARMState *env, int vq, int size)
40
+++ b/target/arm/mve_helper.c
46
+ CPUARMState *env, int size)
41
@@ -XXX,XX +XXX,XX @@ void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
47
{
42
mve_advance_vpt(env);
48
- int i, j;
49
+ int i, j, vq = sve_vq(env);
50
51
memset(sve, 0, sizeof(*sve));
52
__put_user(TARGET_SVE_MAGIC, &sve->head.magic);
53
@@ -XXX,XX +XXX,XX @@ static void target_setup_sve_record(struct target_sve_context *sve,
54
}
43
}
55
}
44
56
45
+void HELPER(mve_vpnot)(CPUARMState *env)
57
+static void target_setup_za_record(struct target_za_context *za,
58
+ CPUARMState *env, int size)
46
+{
59
+{
60
+ int vq = sme_vq(env);
61
+ int vl = vq * TARGET_SVE_VQ_BYTES;
62
+ int i, j;
63
+
64
+ memset(za, 0, sizeof(*za));
65
+ __put_user(TARGET_ZA_MAGIC, &za->head.magic);
66
+ __put_user(size, &za->head.size);
67
+ __put_user(vl, &za->vl);
68
+
69
+ if (size == TARGET_ZA_SIG_CONTEXT_SIZE(0)) {
70
+ return;
71
+ }
72
+ assert(size == TARGET_ZA_SIG_CONTEXT_SIZE(vq));
73
+
47
+ /*
74
+ /*
48
+ * P0 bits for unexecuted beats (where eci_mask is 0) are unchanged.
75
+ * Note that ZA vectors are stored as a byte stream,
49
+ * P0 bits for predicated lanes in executed bits (where mask is 0) are 0.
76
+ * with each byte element at a subsequent address.
50
+ * P0 bits otherwise are inverted.
51
+ * (This is the same logic as VCMP.)
52
+ * This insn is itself subject to predication and to beat-wise execution,
53
+ * and after it executes VPT state advances in the usual way.
54
+ */
77
+ */
55
+ uint16_t mask = mve_element_mask(env);
78
+ for (i = 0; i < vl; ++i) {
56
+ uint16_t eci_mask = mve_eci_mask(env);
79
+ uint64_t *z = (void *)za + TARGET_ZA_SIG_ZAV_OFFSET(vq, i);
57
+ uint16_t beatpred = ~env->v7m.vpr & mask;
80
+ for (j = 0; j < vq * 2; ++j) {
58
+ env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | (beatpred & eci_mask);
81
+ __put_user_e(env->zarray[i].d[j], z + j, le);
59
+ mve_advance_vpt(env);
82
+ }
83
+ }
60
+}
84
+}
61
+
85
+
62
#define DO_1OP_SAT(OP, ESIZE, TYPE, FN) \
86
static void target_restore_general_frame(CPUARMState *env,
63
void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
87
struct target_rt_sigframe *sf)
64
{ \
88
{
65
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
89
@@ -XXX,XX +XXX,XX @@ static void target_restore_fpsimd_record(CPUARMState *env,
66
index XXXXXXX..XXXXXXX 100644
90
67
--- a/target/arm/translate-mve.c
91
static bool target_restore_sve_record(CPUARMState *env,
68
+++ b/target/arm/translate-mve.c
92
struct target_sve_context *sve,
69
@@ -XXX,XX +XXX,XX @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
93
- int size)
94
+ int size, int *svcr)
95
{
96
- int i, j, vl, vq;
97
+ int i, j, vl, vq, flags;
98
+ bool sm;
99
100
- if (!cpu_isar_feature(aa64_sve, env_archcpu(env))) {
101
+ __get_user(vl, &sve->vl);
102
+ __get_user(flags, &sve->flags);
103
+
104
+ sm = flags & TARGET_SVE_SIG_FLAG_SM;
105
+
106
+ /* The cpu must support Streaming or Non-streaming SVE. */
107
+ if (sm
108
+ ? !cpu_isar_feature(aa64_sme, env_archcpu(env))
109
+ : !cpu_isar_feature(aa64_sve, env_archcpu(env))) {
110
return false;
111
}
112
113
- __get_user(vl, &sve->vl);
114
- vq = sve_vq(env);
115
+ /*
116
+ * Note that we cannot use sve_vq() because that depends on the
117
+ * current setting of PSTATE.SM, not the state to be restored.
118
+ */
119
+ vq = sve_vqm1_for_el_sm(env, 0, sm) + 1;
120
121
/* Reject mismatched VL. */
122
if (vl != vq * TARGET_SVE_VQ_BYTES) {
123
@@ -XXX,XX +XXX,XX @@ static bool target_restore_sve_record(CPUARMState *env,
124
return false;
125
}
126
127
+ *svcr = FIELD_DP64(*svcr, SVCR, SM, sm);
128
+
129
/*
130
* Note that SVE regs are stored as a byte stream, with each byte element
131
* at a subsequent address. This corresponds to a little-endian load
132
@@ -XXX,XX +XXX,XX @@ static bool target_restore_sve_record(CPUARMState *env,
70
return true;
133
return true;
71
}
134
}
72
135
73
+static bool trans_VPNOT(DisasContext *s, arg_VPNOT *a)
136
+static bool target_restore_za_record(CPUARMState *env,
137
+ struct target_za_context *za,
138
+ int size, int *svcr)
74
+{
139
+{
75
+ /*
140
+ int i, j, vl, vq;
76
+ * Invert the predicate in VPR.P0. We have call out to
141
+
77
+ * a helper because this insn itself is beatwise and can
142
+ if (!cpu_isar_feature(aa64_sme, env_archcpu(env))) {
78
+ * be predicated.
79
+ */
80
+ if (!dc_isar_feature(aa32_mve, s)) {
81
+ return false;
143
+ return false;
82
+ }
144
+ }
83
+ if (!mve_eci_check(s) || !vfp_access_check(s)) {
145
+
146
+ __get_user(vl, &za->vl);
147
+ vq = sme_vq(env);
148
+
149
+ /* Reject mismatched VL. */
150
+ if (vl != vq * TARGET_SVE_VQ_BYTES) {
151
+ return false;
152
+ }
153
+
154
+ /* Accept empty record -- used to clear PSTATE.ZA. */
155
+ if (size <= TARGET_ZA_SIG_CONTEXT_SIZE(0)) {
84
+ return true;
156
+ return true;
85
+ }
157
+ }
86
+
158
+
87
+ gen_helper_mve_vpnot(cpu_env);
159
+ /* Reject non-empty but incomplete record. */
88
+ mve_update_eci(s);
160
+ if (size < TARGET_ZA_SIG_CONTEXT_SIZE(vq)) {
161
+ return false;
162
+ }
163
+
164
+ *svcr = FIELD_DP64(*svcr, SVCR, ZA, 1);
165
+
166
+ for (i = 0; i < vl; ++i) {
167
+ uint64_t *z = (void *)za + TARGET_ZA_SIG_ZAV_OFFSET(vq, i);
168
+ for (j = 0; j < vq * 2; ++j) {
169
+ __get_user_e(env->zarray[i].d[j], z + j, le);
170
+ }
171
+ }
89
+ return true;
172
+ return true;
90
+}
173
+}
91
+
174
+
92
static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
175
static int target_restore_sigframe(CPUARMState *env,
93
{
176
struct target_rt_sigframe *sf)
94
/* VADDV: vector add across vector */
177
{
178
struct target_aarch64_ctx *ctx, *extra = NULL;
179
struct target_fpsimd_context *fpsimd = NULL;
180
struct target_sve_context *sve = NULL;
181
+ struct target_za_context *za = NULL;
182
uint64_t extra_datap = 0;
183
bool used_extra = false;
184
int sve_size = 0;
185
+ int za_size = 0;
186
+ int svcr = 0;
187
188
target_restore_general_frame(env, sf);
189
190
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
191
sve_size = size;
192
break;
193
194
+ case TARGET_ZA_MAGIC:
195
+ if (za || size < sizeof(struct target_za_context)) {
196
+ goto err;
197
+ }
198
+ za = (struct target_za_context *)ctx;
199
+ za_size = size;
200
+ break;
201
+
202
case TARGET_EXTRA_MAGIC:
203
if (extra || size != sizeof(struct target_extra_context)) {
204
goto err;
205
@@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env,
206
}
207
208
/* SVE data, if present, overwrites FPSIMD data. */
209
- if (sve && !target_restore_sve_record(env, sve, sve_size)) {
210
+ if (sve && !target_restore_sve_record(env, sve, sve_size, &svcr)) {
211
goto err;
212
}
213
+ if (za && !target_restore_za_record(env, za, za_size, &svcr)) {
214
+ goto err;
215
+ }
216
+ if (env->svcr != svcr) {
217
+ env->svcr = svcr;
218
+ arm_rebuild_hflags(env);
219
+ }
220
unlock_user(extra, extra_datap, 0);
221
return 0;
222
223
@@ -XXX,XX +XXX,XX @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
224
.total_size = offsetof(struct target_rt_sigframe,
225
uc.tuc_mcontext.__reserved),
226
};
227
- int fpsimd_ofs, fr_ofs, sve_ofs = 0, vq = 0, sve_size = 0;
228
+ int fpsimd_ofs, fr_ofs, sve_ofs = 0, za_ofs = 0;
229
+ int sve_size = 0, za_size = 0;
230
struct target_rt_sigframe *frame;
231
struct target_rt_frame_record *fr;
232
abi_ulong frame_addr, return_addr;
233
@@ -XXX,XX +XXX,XX @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
234
&layout);
235
236
/* SVE state needs saving only if it exists. */
237
- if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
238
- vq = sve_vq(env);
239
- sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
240
+ if (cpu_isar_feature(aa64_sve, env_archcpu(env)) ||
241
+ cpu_isar_feature(aa64_sme, env_archcpu(env))) {
242
+ sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(sve_vq(env)), 16);
243
sve_ofs = alloc_sigframe_space(sve_size, &layout);
244
}
245
+ if (cpu_isar_feature(aa64_sme, env_archcpu(env))) {
246
+ /* ZA state needs saving only if it is enabled. */
247
+ if (FIELD_EX64(env->svcr, SVCR, ZA)) {
248
+ za_size = TARGET_ZA_SIG_CONTEXT_SIZE(sme_vq(env));
249
+ } else {
250
+ za_size = TARGET_ZA_SIG_CONTEXT_SIZE(0);
251
+ }
252
+ za_ofs = alloc_sigframe_space(za_size, &layout);
253
+ }
254
255
if (layout.extra_ofs) {
256
/* Reserve space for the extra end marker. The standard end marker
257
@@ -XXX,XX +XXX,XX @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
258
target_setup_end_record((void *)frame + layout.extra_end_ofs);
259
}
260
if (sve_ofs) {
261
- target_setup_sve_record((void *)frame + sve_ofs, env, vq, sve_size);
262
+ target_setup_sve_record((void *)frame + sve_ofs, env, sve_size);
263
+ }
264
+ if (za_ofs) {
265
+ target_setup_za_record((void *)frame + za_ofs, env, za_size);
266
}
267
268
/* Set up the stack frame for unwinding. */
269
@@ -XXX,XX +XXX,XX @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
270
env->btype = 2;
271
}
272
273
+ /*
274
+ * Invoke the signal handler with both SM and ZA disabled.
275
+ * When clearing SM, ResetSVEState, per SMSTOP.
276
+ */
277
+ if (FIELD_EX64(env->svcr, SVCR, SM)) {
278
+ arm_reset_sve_state(env);
279
+ }
280
+ if (env->svcr) {
281
+ env->svcr = 0;
282
+ arm_rebuild_hflags(env);
283
+ }
284
+
285
if (info) {
286
tswap_siginfo(&frame->info, info);
287
env->xregs[1] = frame_addr + offsetof(struct target_rt_sigframe, info);
95
--
2.25.1
From: Richard Henderson <richard.henderson@linaro.org>

Add "sve" to the sve prctl functions, to distinguish
them from the coming "sme" prctls with similar names.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220708151540.18136-42-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/aarch64/target_prctl.h |  8 ++++----
 linux-user/syscall.c              | 12 ++++++------
 2 files changed, 10 insertions(+), 10 deletions(-)

Although the architecture doesn't define it as an alias, VMOVL
(vector move long) is encoded as a VSHLL with a zero shift.
Add a comment in the decode file noting that we handle VMOVL
as part of VSHLL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve.decode | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
15
diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
13
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/mve.decode
17
--- a/linux-user/aarch64/target_prctl.h
15
+++ b/target/arm/mve.decode
18
+++ b/linux-user/aarch64/target_prctl.h
16
@@ -XXX,XX +XXX,XX @@ VRSHRI_U 111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
19
@@ -XXX,XX +XXX,XX @@
17
VRSHRI_U 111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
20
#ifndef AARCH64_TARGET_PRCTL_H
18
21
#define AARCH64_TARGET_PRCTL_H
19
# VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
22
20
+# Note that VMOVL is encoded as "VSHLL with a zero shift count"; we
23
-static abi_long do_prctl_get_vl(CPUArchState *env)
21
+# implement it that way rather than special-casing it in the decode.
24
+static abi_long do_prctl_sve_get_vl(CPUArchState *env)
22
VSHLL_BS 111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
25
{
23
VSHLL_BS 111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
26
ARMCPU *cpu = env_archcpu(env);
24
27
if (cpu_isar_feature(aa64_sve, cpu)) {
28
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_get_vl(CPUArchState *env)
29
}
30
return -TARGET_EINVAL;
31
}
32
-#define do_prctl_get_vl do_prctl_get_vl
33
+#define do_prctl_sve_get_vl do_prctl_sve_get_vl
34
35
-static abi_long do_prctl_set_vl(CPUArchState *env, abi_long arg2)
36
+static abi_long do_prctl_sve_set_vl(CPUArchState *env, abi_long arg2)
37
{
38
/*
39
* We cannot support either PR_SVE_SET_VL_ONEXEC or PR_SVE_VL_INHERIT.
40
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_set_vl(CPUArchState *env, abi_long arg2)
41
}
42
return -TARGET_EINVAL;
43
}
44
-#define do_prctl_set_vl do_prctl_set_vl
45
+#define do_prctl_sve_set_vl do_prctl_sve_set_vl
46
47
static abi_long do_prctl_reset_keys(CPUArchState *env, abi_long arg2)
48
{
49
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
50
index XXXXXXX..XXXXXXX 100644
51
--- a/linux-user/syscall.c
52
+++ b/linux-user/syscall.c
53
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_inval1(CPUArchState *env, abi_long arg2)
54
#ifndef do_prctl_set_fp_mode
55
#define do_prctl_set_fp_mode do_prctl_inval1
56
#endif
57
-#ifndef do_prctl_get_vl
58
-#define do_prctl_get_vl do_prctl_inval0
59
+#ifndef do_prctl_sve_get_vl
60
+#define do_prctl_sve_get_vl do_prctl_inval0
61
#endif
62
-#ifndef do_prctl_set_vl
63
-#define do_prctl_set_vl do_prctl_inval1
64
+#ifndef do_prctl_sve_set_vl
65
+#define do_prctl_sve_set_vl do_prctl_inval1
66
#endif
67
#ifndef do_prctl_reset_keys
68
#define do_prctl_reset_keys do_prctl_inval1
69
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl(CPUArchState *env, abi_long option, abi_long arg2,
70
case PR_SET_FP_MODE:
71
return do_prctl_set_fp_mode(env, arg2);
72
case PR_SVE_GET_VL:
73
- return do_prctl_get_vl(env);
74
+ return do_prctl_sve_get_vl(env);
75
case PR_SVE_SET_VL:
76
- return do_prctl_set_vl(env, arg2);
77
+ return do_prctl_sve_set_vl(env, arg2);
78
case PR_PAC_RESET_KEYS:
79
if (arg3 || arg4 || arg5) {
80
return -TARGET_EINVAL;
25
--
2.25.1
From: Richard Henderson <richard.henderson@linaro.org>

These prctls set the Streaming SVE vector length, which may
be completely different from the Normal SVE vector length.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220708151540.18136-43-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/aarch64/target_prctl.h | 54 +++++++++++++++++++++++++++++++
 linux-user/syscall.c              | 16 +++++++++
 2 files changed, 70 insertions(+)

Implement the MVE VPSEL insn, which sets each byte of the destination
vector Qd to the byte from either Qn or Qm depending on the value of
the corresponding bit in VPR.P0.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-mve.h    |  2 ++
 target/arm/mve.decode      |  7 +++++--
 target/arm/mve_helper.c    | 19 +++++++++++++++++++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 28 insertions(+), 2 deletions(-)
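To illustrate how a guest would exercise the new prctls, a small user-space
sketch follows.  It assumes an SME-capable guest; the PR_SME_* values are
the ones added to syscall.c below, mirroring the kernel ABI:

    #include <stdio.h>
    #include <sys/prctl.h>

    #ifndef PR_SME_SET_VL
    # define PR_SME_SET_VL      63
    # define PR_SME_GET_VL      64
    # define PR_SME_VL_LEN_MASK 0xffff
    #endif

    int main(void)
    {
        /* Request a 32-byte (256-bit) streaming vector length... */
        if (prctl(PR_SME_SET_VL, 32) < 0) {
            perror("PR_SME_SET_VL");
            return 1;
        }
        /* ...then read back what was actually granted. */
        int vl = prctl(PR_SME_GET_VL);
        if (vl < 0) {
            perror("PR_SME_GET_VL");
            return 1;
        }
        printf("streaming VL: %d bytes\n", vl & PR_SME_VL_LEN_MASK);
        return 0;
    }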
13
14
14
diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
15
diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
15
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-mve.h
17
--- a/linux-user/aarch64/target_prctl.h
17
+++ b/target/arm/helper-mve.h
18
+++ b/linux-user/aarch64/target_prctl.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
19
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_sve_get_vl(CPUArchState *env)
19
DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
20
{
20
DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
21
ARMCPU *cpu = env_archcpu(env);
21
22
if (cpu_isar_feature(aa64_sve, cpu)) {
22
+DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
23
+ /* PSTATE.SM is always unset on syscall entry. */
24
return sve_vq(env) * 16;
25
}
26
return -TARGET_EINVAL;
27
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_sve_set_vl(CPUArchState *env, abi_long arg2)
28
&& arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) {
29
uint32_t vq, old_vq;
30
31
+ /* PSTATE.SM is always unset on syscall entry. */
32
old_vq = sve_vq(env);
33
34
/*
35
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_sve_set_vl(CPUArchState *env, abi_long arg2)
36
}
37
#define do_prctl_sve_set_vl do_prctl_sve_set_vl
38
39
+static abi_long do_prctl_sme_get_vl(CPUArchState *env)
40
+{
41
+ ARMCPU *cpu = env_archcpu(env);
42
+ if (cpu_isar_feature(aa64_sme, cpu)) {
43
+ return sme_vq(env) * 16;
44
+ }
45
+ return -TARGET_EINVAL;
46
+}
47
+#define do_prctl_sme_get_vl do_prctl_sme_get_vl
23
+
48
+
24
DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
49
+static abi_long do_prctl_sme_set_vl(CPUArchState *env, abi_long arg2)
25
DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
26
DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
27
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
28
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/mve.decode
30
+++ b/target/arm/mve.decode
31
@@ -XXX,XX +XXX,XX @@ VSHLC 111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
32
# effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
33
VCMPEQ 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 0 @vcmp
34
VCMPNE 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 0 @vcmp
35
-VCMPCS 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
36
-VCMPHI 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
37
+{
38
+ VPSEL 1111 1110 0 . 11 ... 1 ... 0 1111 . 0 . 0 ... 1 @2op_nosz
39
+ VCMPCS 1111 1110 0 . .. ... 1 ... 0 1111 0 0 . 0 ... 1 @vcmp
40
+ VCMPHI 1111 1110 0 . .. ... 1 ... 0 1111 1 0 . 0 ... 1 @vcmp
41
+}
42
VCMPGE 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 0 @vcmp
43
VCMPLT 1111 1110 0 . .. ... 1 ... 1 1111 1 0 . 0 ... 0 @vcmp
44
VCMPGT 1111 1110 0 . .. ... 1 ... 1 1111 0 0 . 0 ... 1 @vcmp
45
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
46
index XXXXXXX..XXXXXXX 100644
47
--- a/target/arm/mve_helper.c
48
+++ b/target/arm/mve_helper.c
49
@@ -XXX,XX +XXX,XX @@ DO_VCMP_S(vcmpge, DO_GE)
50
DO_VCMP_S(vcmplt, DO_LT)
51
DO_VCMP_S(vcmpgt, DO_GT)
52
DO_VCMP_S(vcmple, DO_LE)
53
+
54
+void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
55
+{
50
+{
56
+ /*
51
+ /*
57
+ * Qd[n] = VPR.P0[n] ? Qn[n] : Qm[n]
52
+ * We cannot support either PR_SME_SET_VL_ONEXEC or PR_SME_VL_INHERIT.
58
+ * but note that whether bytes are written to Qd is still subject
53
+ * Note the kernel definition of sve_vl_valid allows for VQ=512,
59
+ * to (all forms of) predication in the usual way.
54
+ * i.e. VL=8192, even though the architectural maximum is VQ=16.
60
+ */
55
+ */
61
+ uint64_t *d = vd, *n = vn, *m = vm;
56
+ if (cpu_isar_feature(aa64_sme, env_archcpu(env))
62
+ uint16_t mask = mve_element_mask(env);
57
+ && arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) {
63
+ uint16_t p0 = FIELD_EX32(env->v7m.vpr, V7M_VPR, P0);
58
+ int vq, old_vq;
64
+ unsigned e;
59
+
65
+ for (e = 0; e < 16 / 8; e++, mask >>= 8, p0 >>= 8) {
60
+ old_vq = sme_vq(env);
66
+ uint64_t r = m[H8(e)];
61
+
67
+ mergemask(&r, n[H8(e)], p0);
62
+ /*
68
+ mergemask(&d[H8(e)], r, mask);
63
+ * Bound the value of vq, so that we know that it fits into
64
+ * the 4-bit field in SMCR_EL1. Because PSTATE.SM is cleared
65
+ * on syscall entry, we are not modifying the current SVE
66
+ * vector length.
67
+ */
68
+ vq = MAX(arg2 / 16, 1);
69
+ vq = MIN(vq, 16);
70
+ env->vfp.smcr_el[1] =
71
+ FIELD_DP64(env->vfp.smcr_el[1], SMCR, LEN, vq - 1);
72
+
73
+ /* Delay rebuilding hflags until we know if ZA must change. */
74
+ vq = sve_vqm1_for_el_sm(env, 0, true) + 1;
75
+
76
+ if (vq != old_vq) {
77
+ /*
78
+ * PSTATE.ZA state is cleared on any change to SVL.
79
+ * We need not call arm_rebuild_hflags because PSTATE.SM was
80
+ * cleared on syscall entry, so this hasn't changed VL.
81
+ */
82
+ env->svcr = FIELD_DP64(env->svcr, SVCR, ZA, 0);
83
+ arm_rebuild_hflags(env);
84
+ }
85
+ return vq * 16;
69
+ }
86
+ }
70
+ mve_advance_vpt(env);
87
+ return -TARGET_EINVAL;
71
+}
88
+}
72
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
89
+#define do_prctl_sme_set_vl do_prctl_sme_set_vl
90
+
91
static abi_long do_prctl_reset_keys(CPUArchState *env, abi_long arg2)
92
{
93
ARMCPU *cpu = env_archcpu(env);
94
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
73
index XXXXXXX..XXXXXXX 100644
95
index XXXXXXX..XXXXXXX 100644
74
--- a/target/arm/translate-mve.c
96
--- a/linux-user/syscall.c
75
+++ b/target/arm/translate-mve.c
97
+++ b/linux-user/syscall.c
76
@@ -XXX,XX +XXX,XX @@ DO_LOGIC(VORR, gen_helper_mve_vorr)
98
@@ -XXX,XX +XXX,XX @@ abi_long do_arch_prctl(CPUX86State *env, int code, abi_ulong addr)
77
DO_LOGIC(VORN, gen_helper_mve_vorn)
99
#ifndef PR_SET_SYSCALL_USER_DISPATCH
78
DO_LOGIC(VEOR, gen_helper_mve_veor)
100
# define PR_SET_SYSCALL_USER_DISPATCH 59
79
101
#endif
80
+DO_LOGIC(VPSEL, gen_helper_mve_vpsel)
102
+#ifndef PR_SME_SET_VL
81
+
103
+# define PR_SME_SET_VL 63
82
#define DO_2OP(INSN, FN) \
104
+# define PR_SME_GET_VL 64
83
static bool trans_##INSN(DisasContext *s, arg_2op *a) \
105
+# define PR_SME_VL_LEN_MASK 0xffff
84
{ \
106
+# define PR_SME_VL_INHERIT (1 << 17)
107
+#endif
108
109
#include "target_prctl.h"
110
111
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl_inval1(CPUArchState *env, abi_long arg2)
112
#ifndef do_prctl_set_unalign
113
#define do_prctl_set_unalign do_prctl_inval1
114
#endif
115
+#ifndef do_prctl_sme_get_vl
116
+#define do_prctl_sme_get_vl do_prctl_inval0
117
+#endif
118
+#ifndef do_prctl_sme_set_vl
119
+#define do_prctl_sme_set_vl do_prctl_inval1
120
+#endif
121
122
static abi_long do_prctl(CPUArchState *env, abi_long option, abi_long arg2,
123
abi_long arg3, abi_long arg4, abi_long arg5)
124
@@ -XXX,XX +XXX,XX @@ static abi_long do_prctl(CPUArchState *env, abi_long option, abi_long arg2,
125
return do_prctl_sve_get_vl(env);
126
case PR_SVE_SET_VL:
127
return do_prctl_sve_set_vl(env, arg2);
128
+ case PR_SME_GET_VL:
129
+ return do_prctl_sme_get_vl(env);
130
+ case PR_SME_SET_VL:
131
+ return do_prctl_sme_set_vl(env, arg2);
132
case PR_PAC_RESET_KEYS:
133
if (arg3 || arg4 || arg5) {
134
return -TARGET_EINVAL;
85
--
2.25.1
From: Richard Henderson <richard.henderson@linaro.org>

There's no reason to set CPACR_EL1.ZEN if SVE is disabled.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220708151540.18136-44-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

In the MVE shift-and-insert insns, we special case VSLI by 0
and VSRI by <dt>.  VSRI by <dt> means "don't update the destination",
which is what we've implemented.  However VSLI by 0 is "set
destination to the input", so we don't want to use the same
special-casing that we do for VSRI by <dt>.

Since the generic logic gives the right answer for a shift
by 0, just use that.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
13
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
17
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/mve_helper.c
15
--- a/target/arm/cpu.c
19
+++ b/target/arm/mve_helper.c
16
+++ b/target/arm/cpu.c
20
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
17
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
21
uint16_t mask; \
18
/* and to the FP/Neon instructions */
22
uint64_t shiftmask; \
19
env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
23
unsigned e; \
20
CPACR_EL1, FPEN, 3);
24
- if (shift == 0 || shift == ESIZE * 8) { \
21
- /* and to the SVE instructions */
25
+ if (shift == ESIZE * 8) { \
22
- env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
26
/* \
23
- CPACR_EL1, ZEN, 3);
27
- * Only VSLI can shift by 0; only VSRI can shift by <dt>. \
24
- /* with reasonable vector length */
28
- * The generic logic would give the right answer for 0 but \
25
+ /* and to the SVE instructions, with default vector length */
29
- * fails for <dt>. \
26
if (cpu_isar_feature(aa64_sve, cpu)) {
30
+ * Only VSRI can shift by <dt>; it should mean "don't \
27
+ env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
31
+ * update the destination". The generic logic can't handle \
28
+ CPACR_EL1, ZEN, 3);
32
+ * this because it would try to shift by an out-of-range \
29
env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
33
+ * amount, so special case it here. \
30
}
34
*/ \
31
/*
35
goto done; \
36
} \
37
--
2.25.1
From: Richard Henderson <richard.henderson@linaro.org>

Enable SME, TPIDR2_EL0, and FA64 if supported by the cpu.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220708151540.18136-45-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Include the MVE VPR register value in the CPU dumps produced by
arm_cpu_dump_state() if we are printing FPU information.  This
makes it easier to interpret debug logs when predication is
active.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
13
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
13
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/cpu.c
15
--- a/target/arm/cpu.c
15
+++ b/target/arm/cpu.c
16
+++ b/target/arm/cpu.c
16
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_dump_state(CPUState *cs, FILE *f, int flags)
17
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
17
i, v);
18
CPACR_EL1, ZEN, 3);
19
env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
18
}
20
}
19
qemu_fprintf(f, "FPSCR: %08x\n", vfp_get_fpscr(env));
21
+ /* and for SME instructions, with default vector length, and TPIDR2 */
20
+ if (cpu_isar_feature(aa32_mve, cpu)) {
22
+ if (cpu_isar_feature(aa64_sme, cpu)) {
21
+ qemu_fprintf(f, "VPR: %08x\n", env->v7m.vpr);
23
+ env->cp15.sctlr_el[1] |= SCTLR_EnTP2;
24
+ env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
25
+ CPACR_EL1, SMEN, 3);
26
+ env->vfp.smcr_el[1] = cpu->sme_default_vq - 1;
27
+ if (cpu_isar_feature(aa64_sme_fa64, cpu)) {
28
+ env->vfp.smcr_el[1] = FIELD_DP64(env->vfp.smcr_el[1],
29
+ SMCR, FA64, 1);
30
+ }
22
+ }
31
+ }
23
}
32
/*
24
}
33
* Enable 48-bit address space (TODO: take reserved_va into account).
25
34
* Enable TBI0 but not TBI1.
26
--
2.25.1
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220708151540.18136-46-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

We were not paying attention to the ECI state when advancing the VPT
state.  Architecturally, VPT state advance happens for every beat
(see the pseudocode VPTAdvance()), so on every beat the 4 bits of
VPR.P0 corresponding to the current beat are inverted if required,
and at the end of beats 1 and 3 the VPR MASK fields are updated.
This means that if the ECI state says we should not be executing all
4 beats then we need to skip some of the updating of the VPR that we
currently do in mve_advance_vpt().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/mve_helper.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)
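On the guest side these new capability bits surface through the ELF auxiliary
vector.  A small sketch of how a program might probe them (the bit positions
are those defined in the elfload.c hunk below; recent glibc headers may
already provide HWCAP2_SME):

    #include <stdio.h>
    #include <sys/auxv.h>

    #ifndef HWCAP2_SME
    # define HWCAP2_SME      (1 << 23)   /* matches ARM_HWCAP2_A64_SME below */
    # define HWCAP2_SME_FA64 (1 << 30)   /* matches ARM_HWCAP2_A64_SME_FA64 */
    #endif

    int main(void)
    {
        unsigned long hwcap2 = getauxval(AT_HWCAP2);

        printf("SME:  %s\n", (hwcap2 & HWCAP2_SME) ? "yes" : "no");
        printf("FA64: %s\n", (hwcap2 & HWCAP2_SME_FA64) ? "yes" : "no");
        return 0;
    }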
15
10
16
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
11
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
17
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/mve_helper.c
13
--- a/linux-user/elfload.c
19
+++ b/target/arm/mve_helper.c
14
+++ b/linux-user/elfload.c
20
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
15
@@ -XXX,XX +XXX,XX @@ enum {
21
/* Advance the VPT and ECI state if necessary */
16
ARM_HWCAP2_A64_RNG = 1 << 16,
22
uint32_t vpr = env->v7m.vpr;
17
ARM_HWCAP2_A64_BTI = 1 << 17,
23
unsigned mask01, mask23;
18
ARM_HWCAP2_A64_MTE = 1 << 18,
24
+ uint16_t inv_mask;
19
+ ARM_HWCAP2_A64_ECV = 1 << 19,
25
+ uint16_t eci_mask = mve_eci_mask(env);
20
+ ARM_HWCAP2_A64_AFP = 1 << 20,
26
21
+ ARM_HWCAP2_A64_RPRES = 1 << 21,
27
if ((env->condexec_bits & 0xf) == 0) {
22
+ ARM_HWCAP2_A64_MTE3 = 1 << 22,
28
env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
23
+ ARM_HWCAP2_A64_SME = 1 << 23,
29
@@ -XXX,XX +XXX,XX @@ static void mve_advance_vpt(CPUARMState *env)
24
+ ARM_HWCAP2_A64_SME_I16I64 = 1 << 24,
30
return;
25
+ ARM_HWCAP2_A64_SME_F64F64 = 1 << 25,
31
}
26
+ ARM_HWCAP2_A64_SME_I8I32 = 1 << 26,
32
27
+ ARM_HWCAP2_A64_SME_F16F32 = 1 << 27,
33
+ /* Invert P0 bits if needed, but only for beats we actually executed */
28
+ ARM_HWCAP2_A64_SME_B16F32 = 1 << 28,
34
mask01 = FIELD_EX32(vpr, V7M_VPR, MASK01);
29
+ ARM_HWCAP2_A64_SME_F32F32 = 1 << 29,
35
mask23 = FIELD_EX32(vpr, V7M_VPR, MASK23);
30
+ ARM_HWCAP2_A64_SME_FA64 = 1 << 30,
36
- if (mask01 > 8) {
31
};
37
- /* high bit set, but not 0b1000: invert the relevant half of P0 */
32
38
- vpr ^= 0xff;
33
#define ELF_HWCAP get_elf_hwcap()
39
+ /* Start by assuming we invert all bits corresponding to executed beats */
34
@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void)
40
+ inv_mask = eci_mask;
35
GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
41
+ if (mask01 <= 8) {
36
GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
42
+ /* MASK01 says don't invert low half of P0 */
37
GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
43
+ inv_mask &= ~0xff;
38
+ GET_FEATURE_ID(aa64_sme, (ARM_HWCAP2_A64_SME |
44
}
39
+ ARM_HWCAP2_A64_SME_F32F32 |
45
- if (mask23 > 8) {
40
+ ARM_HWCAP2_A64_SME_B16F32 |
46
- /* high bit set, but not 0b1000: invert the relevant half of P0 */
41
+ ARM_HWCAP2_A64_SME_F16F32 |
47
- vpr ^= 0xff00;
42
+ ARM_HWCAP2_A64_SME_I8I32));
48
+ if (mask23 <= 8) {
43
+ GET_FEATURE_ID(aa64_sme_f64f64, ARM_HWCAP2_A64_SME_F64F64);
49
+ /* MASK23 says don't invert high half of P0 */
44
+ GET_FEATURE_ID(aa64_sme_i16i64, ARM_HWCAP2_A64_SME_I16I64);
50
+ inv_mask &= ~0xff00;
45
+ GET_FEATURE_ID(aa64_sme_fa64, ARM_HWCAP2_A64_SME_FA64);
51
}
46
52
- vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
47
return hwcaps;
53
+ vpr ^= inv_mask;
54
+ /* Only update MASK01 if beat 1 executed */
55
+ if (eci_mask & 0xf0) {
56
+ vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
57
+ }
58
+ /* Beat 3 always executes, so update MASK23 */
59
vpr = FIELD_DP32(vpr, V7M_VPR, MASK23, mask23 << 1);
60
env->v7m.vpr = vpr;
61
}
48
}
62
--
2.25.1