Another target-arm queue, since we're over 30 patches
already. Most of this is RTH's SVE-patches-part-1.

thanks
-- PMM

The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:

  Merge remote-tracking branch 'remotes/famz/tags/docker-and-block-pull-request' into staging (2018-05-18 14:11:52 +0100)

are available in the Git repository at:

  git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180518

for you to fetch changes up to b94f8f60bd841c5b737185cd38263e26822f77ab:

  target/arm: Implement SVE Permute - Extract Group (2018-05-18 17:48:09 +0100)

----------------------------------------------------------------
target-arm queue:
 * Initial part of SVE implementation (currently disabled)
 * smmuv3: fix some minor Coverity issues
 * add model of Xilinx ZynqMP generic DMA controller
 * expose (most) Arm coprocessor/system registers to
   gdb via QEMU's gdbstub, for reads only

----------------------------------------------------------------
Abdallah Bouassida (3):
      target/arm: Add "ARM_CP_NO_GDB" as a new bit field for ARMCPRegInfo type
      target/arm: Add "_S" suffix to the secure version of a sysreg
      target/arm: Add the XML dynamic generation

Eric Auger (2):
      hw/arm/smmuv3: Fix Coverity issue in smmuv3_record_event
      hw/arm/smmu-common: Fix coverity issue in get_block_pte_address

Francisco Iglesias (2):
      xlnx-zdma: Add a model of the Xilinx ZynqMP generic DMA
      xlnx-zynqmp: Connect the ZynqMP GDMA and ADMA

Richard Henderson (25):
      target/arm: Introduce translate-a64.h
      target/arm: Add SVE decode skeleton
      target/arm: Implement SVE Bitwise Logical - Unpredicated Group
      target/arm: Implement SVE load vector/predicate
      target/arm: Implement SVE predicate test
      target/arm: Implement SVE Predicate Logical Operations Group
      target/arm: Implement SVE Predicate Misc Group
      target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group
      target/arm: Implement SVE Integer Reduction Group
      target/arm: Implement SVE bitwise shift by immediate (predicated)
      target/arm: Implement SVE bitwise shift by vector (predicated)
      target/arm: Implement SVE bitwise shift by wide elements (predicated)
      target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
      target/arm: Implement SVE Integer Multiply-Add Group
      target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
      target/arm: Implement SVE Index Generation Group
      target/arm: Implement SVE Stack Allocation Group
      target/arm: Implement SVE Bitwise Shift - Unpredicated Group
      target/arm: Implement SVE Compute Vector Address Group
      target/arm: Implement SVE floating-point exponential accelerator
      target/arm: Implement SVE floating-point trig select coefficient
      target/arm: Implement SVE Element Count Group
      target/arm: Implement SVE Bitwise Immediate Group
      target/arm: Implement SVE Integer Wide Immediate - Predicated Group
      target/arm: Implement SVE Permute - Extract Group

 hw/dma/Makefile.objs         |    1 +
 target/arm/Makefile.objs     |   10 +
 include/hw/arm/xlnx-zynqmp.h |    5 +
 include/hw/dma/xlnx-zdma.h   |   84 ++
 include/qom/cpu.h            |    5 +-
 target/arm/cpu.h             |   37 +-
 target/arm/helper-sve.h      |  427 +++++++++
 target/arm/helper.h          |    1 +
 target/arm/translate-a64.h   |  118 +++
 gdbstub.c                    |   10 +
 hw/arm/smmu-common.c         |    4 +-
 hw/arm/smmuv3.c              |    2 +-
 hw/arm/xlnx-zynqmp.c         |   53 ++
 hw/dma/xlnx-zdma.c           |  832 +++++++++++++++++
 target/arm/cpu.c             |    1 +
 target/arm/gdbstub.c         |   76 ++
 target/arm/helper.c          |   57 +-
 target/arm/sve_helper.c      | 1562 +++++++++++++++++++++++++++++++
 target/arm/translate-a64.c   |  119 +--
 target/arm/translate-sve.c   | 2070 ++++++++++++++++++++++++++++++++++++++++++
 .gitignore                   |    1 +
 target/arm/sve.decode        |  419 +++++++++
 22 files changed, 5778 insertions(+), 116 deletions(-)
 create mode 100644 include/hw/dma/xlnx-zdma.h
 create mode 100644 target/arm/helper-sve.h
 create mode 100644 target/arm/translate-a64.h
 create mode 100644 hw/dma/xlnx-zdma.c
 create mode 100644 target/arm/sve_helper.c
 create mode 100644 target/arm/translate-sve.c
 create mode 100644 target/arm/sve.decode

----------------------------------------------------------------

Mostly my decodetree stuff, but also some patches for various
smaller bugs/features from others.

thanks
-- PMM

The following changes since commit d32e41a1188e929cc0fb16829ce3736046951e39:

  Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616

for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:

  hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)

----------------------------------------------------------------
target-arm queue:
 * hw: arm: Set vendor property for IMX SDHCI emulations
 * sd: sdhci: Implement basic vendor specific register support
 * hw/net/imx_fec: Convert debug fprintf() to trace events
 * target/arm/cpu: adjust virtual time for all KVM arm cpus
 * Implement configurable descriptor size in ftgmac100
 * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
 * target/arm: More Neon decodetree conversion work

----------------------------------------------------------------
Erik Smit (1):
      Implement configurable descriptor size in ftgmac100

Guenter Roeck (2):
      sd: sdhci: Implement basic vendor specific register support
      hw: arm: Set vendor property for IMX SDHCI emulations

Jean-Christophe Dubois (2):
      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
      hw/net/imx_fec: Convert debug fprintf() to trace events

Peter Maydell (17):
      target/arm: Fix missing temp frees in do_vshll_2sh
      target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
      target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
      target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
      target/arm: Convert Neon 3-reg-diff long multiplies
      target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
      target/arm: Convert Neon 3-reg-diff polynomial VMULL
      target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
      target/arm: Add missing TCG temp free in do_2shift_env_64()
      target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
      target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
      target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
      target/arm: Convert Neon VEXT to decodetree
      target/arm: Convert Neon VTBL, VTBX to decodetree
      target/arm: Convert Neon VDUP (scalar) to decodetree

fangying (1):
      target/arm/cpu: adjust virtual time for all KVM arm cpus

 hw/sd/sdhci-internal.h          |    5 +
 include/hw/sd/sdhci.h           |    5 +
 target/arm/translate.h          |    1 +
 target/arm/neon-dp.decode       |  130 +++++
 hw/arm/fsl-imx25.c              |    6 +
 hw/arm/fsl-imx6.c               |    6 +
 hw/arm/fsl-imx6ul.c             |    2 +
 hw/arm/fsl-imx7.c               |    2 +
 hw/misc/imx6ul_ccm.c            |   76 ++-
 hw/net/ftgmac100.c              |   26 +-
 hw/net/imx_fec.c                |  106 ++--
 hw/sd/sdhci.c                   |   18 +-
 target/arm/cpu.c                |    6 +-
 target/arm/cpu64.c              |    1 -
 target/arm/kvm.c                |   21 +-
 target/arm/translate-neon.inc.c | 1148 ++++++++++++++++++++++++++++++++-
 target/arm/translate.c          |  684 +----------------------
 hw/net/trace-events             |   18 +
 18 files changed, 1495 insertions(+), 766 deletions(-)
From: Abdallah Bouassida <abdallah.bouassida@lauterbach.com>

This is a preparation for the coming feature of dynamically creating an
XML description for the ARM sysregs.
A register that has ARM_CP_NO_GDB enabled will not be shown in the
dynamic XML. This bit is enabled automatically when creating CP_ANY
wildcard aliases, and can also be enabled manually for any register we
want to remove from the dynamic XML description.

Signed-off-by: Abdallah Bouassida <abdallah.bouassida@lauterbach.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1524153386-3550-2-git-send-email-abdallah.bouassida@lauterbach.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    | 3 ++-
 target/arm/helper.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
 #define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
 #define ARM_CP_FPU 0x1000
 #define ARM_CP_SVE 0x2000
+#define ARM_CP_NO_GDB 0x4000
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0xffff
 /* Mask of only the flag bits in a type field */
-#define ARM_CP_FLAG_MASK 0x30ff
+#define ARM_CP_FLAG_MASK 0x70ff
 
 /* Valid values for ARMCPRegInfo state field, indicating which of
  * the AArch32 and AArch64 execution states this register is visible in.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void add_cpreg_to_hashtable(ARMCPU *cpu, const ARMCPRegInfo *r,
     if (((r->crm == CP_ANY) && crm != 0) ||
         ((r->opc1 == CP_ANY) && opc1 != 0) ||
         ((r->opc2 == CP_ANY) && opc2 != 0)) {
-        r2->type |= ARM_CP_ALIAS;
+        r2->type |= ARM_CP_ALIAS | ARM_CP_NO_GDB;
     }
 
     /* Check that raw accesses are either forbidden or handled. Note that
-- 
2.17.0
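A note for readers following along: the reason ARM_CP_FLAG_MASK has to grow
from 0x30ff to 0x70ff is that this mask is what preserves the flag bits of a
register's type field; without the extra bit, the new ARM_CP_NO_GDB flag
would be silently stripped by the mask. A tiny standalone C sketch of that
point (the two flag values are copied from the patch above; ARM_CP_SPECIAL's
value here is only an illustrative placeholder, not necessarily QEMU's):

    #include <stdint.h>
    #include <stdio.h>

    #define ARM_CP_SPECIAL 0x0001  /* placeholder flag for illustration */
    #define ARM_CP_NO_GDB  0x4000  /* value from the patch */
    #define OLD_FLAG_MASK  0x30ff  /* before the patch */
    #define NEW_FLAG_MASK  0x70ff  /* after the patch */

    int main(void)
    {
        uint32_t type = ARM_CP_SPECIAL | ARM_CP_NO_GDB;

        /* The old mask drops the new bit; the new mask keeps it. */
        printf("old mask keeps: 0x%04x\n", type & OLD_FLAG_MASK); /* 0x0001 */
        printf("new mask keeps: 0x%04x\n", type & NEW_FLAG_MASK); /* 0x4001 */
        return 0;
    }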
From: Abdallah Bouassida <abdallah.bouassida@lauterbach.com>

This is a preparation for the coming feature of dynamically creating an
XML description for the ARM sysregs.
Add an "_S" suffix to the secure version of sysregs that have both S and
NS views, and replace (S) and (NS) with _S and _NS for the registers
that are manually defined, so that all the registers follow the same
convention.

Signed-off-by: Abdallah Bouassida <abdallah.bouassida@lauterbach.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1524153386-3550-3-git-send-email-abdallah.bouassida@lauterbach.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cp_reginfo[] = {
      * the secure register to be properly reset and migrated. There is also no
      * v8 EL1 version of the register so the non-secure instance stands alone.
      */
-    { .name = "FCSEIDR(NS)",
+    { .name = "FCSEIDR",
       .cp = 15, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 0,
       .access = PL1_RW, .secure = ARM_CP_SECSTATE_NS,
       .fieldoffset = offsetof(CPUARMState, cp15.fcseidr_ns),
       .resetvalue = 0, .writefn = fcse_write, .raw_writefn = raw_write, },
-    { .name = "FCSEIDR(S)",
+    { .name = "FCSEIDR_S",
       .cp = 15, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 0,
       .access = PL1_RW, .secure = ARM_CP_SECSTATE_S,
       .fieldoffset = offsetof(CPUARMState, cp15.fcseidr_s),
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cp_reginfo[] = {
       .access = PL1_RW, .secure = ARM_CP_SECSTATE_NS,
       .fieldoffset = offsetof(CPUARMState, cp15.contextidr_el[1]),
       .resetvalue = 0, .writefn = contextidr_write, .raw_writefn = raw_write, },
-    { .name = "CONTEXTIDR(S)", .state = ARM_CP_STATE_AA32,
+    { .name = "CONTEXTIDR_S", .state = ARM_CP_STATE_AA32,
       .cp = 15, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 1,
       .access = PL1_RW, .secure = ARM_CP_SECSTATE_S,
       .fieldoffset = offsetof(CPUARMState, cp15.contextidr_s),
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
                              cp15.c14_timer[GTIMER_PHYS].ctl),
       .writefn = gt_phys_ctl_write, .raw_writefn = raw_write,
     },
-    { .name = "CNTP_CTL(S)",
+    { .name = "CNTP_CTL_S",
       .cp = 15, .crn = 14, .crm = 2, .opc1 = 0, .opc2 = 1,
       .secure = ARM_CP_SECSTATE_S,
       .type = ARM_CP_IO | ARM_CP_ALIAS, .access = PL1_RW | PL0_R,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
       .accessfn = gt_ptimer_access,
       .readfn = gt_phys_tval_read, .writefn = gt_phys_tval_write,
     },
-    { .name = "CNTP_TVAL(S)",
+    { .name = "CNTP_TVAL_S",
       .cp = 15, .crn = 14, .crm = 2, .opc1 = 0, .opc2 = 0,
       .secure = ARM_CP_SECSTATE_S,
       .type = ARM_CP_NO_RAW | ARM_CP_IO, .access = PL1_RW | PL0_R,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
       .accessfn = gt_ptimer_access,
       .writefn = gt_phys_cval_write, .raw_writefn = raw_write,
     },
-    { .name = "CNTP_CVAL(S)", .cp = 15, .crm = 14, .opc1 = 2,
+    { .name = "CNTP_CVAL_S", .cp = 15, .crm = 14, .opc1 = 2,
       .secure = ARM_CP_SECSTATE_S,
       .access = PL1_RW | PL0_R,
       .type = ARM_CP_64BIT | ARM_CP_IO | ARM_CP_ALIAS,
@@ -XXX,XX +XXX,XX @@ CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
 
 static void add_cpreg_to_hashtable(ARMCPU *cpu, const ARMCPRegInfo *r,
                                    void *opaque, int state, int secstate,
-                                   int crm, int opc1, int opc2)
+                                   int crm, int opc1, int opc2,
+                                   const char *name)
 {
     /* Private utility function for define_one_arm_cp_reg_with_opaque():
      * add a single reginfo struct to the hash table.
@@ -XXX,XX +XXX,XX @@ static void add_cpreg_to_hashtable(ARMCPU *cpu, const ARMCPRegInfo *r,
     int is64 = (r->type & ARM_CP_64BIT) ? 1 : 0;
     int ns = (secstate & ARM_CP_SECSTATE_NS) ? 1 : 0;
 
+    r2->name = g_strdup(name);
     /* Reset the secure state to the specific incoming state. This is
      * necessary as the register may have been defined with both states.
      */
@@ -XXX,XX +XXX,XX @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
                 /* Under AArch32 CP registers can be common
                  * (same for secure and non-secure world) or banked.
                  */
+                char *name;
+
                 switch (r->secure) {
                 case ARM_CP_SECSTATE_S:
                 case ARM_CP_SECSTATE_NS:
                     add_cpreg_to_hashtable(cpu, r, opaque, state,
-                                           r->secure, crm, opc1, opc2);
+                                           r->secure, crm, opc1, opc2,
+                                           r->name);
                     break;
                 default:
+                    name = g_strdup_printf("%s_S", r->name);
                     add_cpreg_to_hashtable(cpu, r, opaque, state,
                                            ARM_CP_SECSTATE_S,
-                                           crm, opc1, opc2);
+                                           crm, opc1, opc2, name);
+                    g_free(name);
                     add_cpreg_to_hashtable(cpu, r, opaque, state,
                                            ARM_CP_SECSTATE_NS,
-                                           crm, opc1, opc2);
+                                           crm, opc1, opc2, r->name);
                     break;
                 }
             } else {
@@ -XXX,XX +XXX,XX @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
                  * of AArch32 */
                 add_cpreg_to_hashtable(cpu, r, opaque, state,
                                        ARM_CP_SECSTATE_NS,
-                                       crm, opc1, opc2);
+                                       crm, opc1, opc2, r->name);
             }
         }
     }
-- 
2.17.0
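The renaming itself is plain GLib string construction, as the hunk above
shows. A minimal standalone sketch of that step, assuming only GLib is
available ("CNTP_CTL" stands in for any banked register from the patch):

    /* Build with: gcc demo.c `pkg-config --cflags --libs glib-2.0` */
    #include <glib.h>
    #include <stdio.h>

    int main(void)
    {
        const char *base = "CNTP_CTL";                /* banked register name */
        char *secure = g_strdup_printf("%s_S", base); /* secure-bank view */

        /* The non-secure view keeps the plain name, e.g. in gdb's XML. */
        printf("S bank: %s, NS bank: %s\n", secure, base);
        g_free(secure);
        return 0;
    }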
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-24-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-sve.c | 49 ++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 17 +++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_SINCDEC_v(DisasContext *s, arg_incdec2_cnt *a,
     return true;
 }
 
+/*
+ *** SVE Bitwise Immediate Group
+ */
+
+static bool do_zz_dbm(DisasContext *s, arg_rr_dbm *a, GVecGen2iFn *gvec_fn)
+{
+    uint64_t imm;
+    if (!logic_imm_decode_wmask(&imm, extract32(a->dbm, 12, 1),
+                                extract32(a->dbm, 0, 6),
+                                extract32(a->dbm, 6, 6))) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        gvec_fn(MO_64, vec_full_reg_offset(s, a->rd),
+                vec_full_reg_offset(s, a->rn), imm, vsz, vsz);
+    }
+    return true;
+}
+
+static bool trans_AND_zzi(DisasContext *s, arg_rr_dbm *a, uint32_t insn)
+{
+    return do_zz_dbm(s, a, tcg_gen_gvec_andi);
+}
+
+static bool trans_ORR_zzi(DisasContext *s, arg_rr_dbm *a, uint32_t insn)
+{
+    return do_zz_dbm(s, a, tcg_gen_gvec_ori);
+}
+
+static bool trans_EOR_zzi(DisasContext *s, arg_rr_dbm *a, uint32_t insn)
+{
+    return do_zz_dbm(s, a, tcg_gen_gvec_xori);
+}
+
+static bool trans_DUPM(DisasContext *s, arg_DUPM *a, uint32_t insn)
+{
+    uint64_t imm;
+    if (!logic_imm_decode_wmask(&imm, extract32(a->dbm, 12, 1),
+                                extract32(a->dbm, 0, 6),
+                                extract32(a->dbm, 6, 6))) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        do_dupi_z(s, a->rd, imm);
+    }
+    return true;
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
 
 &rr_esz         rd rn esz
 &rri            rd rn imm
+&rr_dbm         rd rn dbm
 &rrri           rd rn rm imm
 &rri_esz        rd rn imm esz
 &rrr_esz        rd rn rm esz
@@ -XXX,XX +XXX,XX @@
 @rd_rn_tszimm   ........ .. ... ... ...... rn:5 rd:5 \
                 &rri_esz esz=%tszimm16_esz
 
+# Two register operand, one encoded bitmask.
+@rdn_dbm        ........ .. .... dbm:13 rd:5 \
+                &rr_dbm rn=%reg_movprfx
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9       ........ ........ ...... rn:5 . rd:4 \
                 &rri imm=%imm9_16_10
@@ -XXX,XX +XXX,XX @@ INCDEC_v 00000100 .. 1 1 .... 1100 0 d:1 ..... ..... @incdec2_cnt u=1
 # Note these require esz != 0.
 SINCDEC_v       00000100 .. 1 0 .... 1100 d:1 u:1 ..... ..... @incdec2_cnt
 
+### SVE Bitwise Immediate Group
+
+# SVE bitwise logical with immediate (unpredicated)
+ORR_zzi         00000101 00 0000 ............. .....  @rdn_dbm
+EOR_zzi         00000101 01 0000 ............. .....  @rdn_dbm
+AND_zzi         00000101 10 0000 ............. .....  @rdn_dbm
+
+# SVE broadcast bitmask immediate
+DUPM            00000101 11 0000 dbm:13 rd:5
+
+### SVE Predicate Logical Operations Group
+
 # SVE predicate logical operations
 AND_pppp        00100101 0. 00 .... 01 .... 0 .... 0 ....  @pd_pg_pn_pm_s
 BIC_pppp        00100101 0. 00 .... 01 .... 0 .... 1 ....  @pd_pg_pn_pm_s
-- 
2.17.0

The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 target/arm/translate-neon.inc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
+    tcg_temp_free_i32(rm0);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     neon_store_reg64(tmp, a->vd);
 
     widenfn(tmp, rm1);
+    tcg_temp_free_i32(rm1);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
-- 
2.20.1
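Both do_zz_dbm() and trans_DUPM() above lean on logic_imm_decode_wmask(),
which expands the 13-bit dbm field into a 64-bit repeating mask. For readers
unfamiliar with the AArch64 "logical immediate" format, here is a standalone
sketch of that expansion following the ARM pseudocode algorithm; it is not
QEMU's exact implementation, and it assumes GCC/Clang builtins:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Expand an AArch64 logical immediate (N:immr:imms) to a 64-bit mask. */
    static bool expand_wmask(uint64_t *result, unsigned immn,
                             unsigned imms, unsigned immr)
    {
        unsigned v = (immn << 6) | (~imms & 0x3f);
        if (v == 0) {
            return false;              /* reserved encoding */
        }
        int len = 31 - __builtin_clz(v); /* log2 of the element size */
        if (len < 1) {
            return false;              /* 1-bit elements are invalid */
        }
        unsigned e = 1u << len;        /* element size: 2..64 bits */
        unsigned levels = e - 1;
        unsigned s = imms & levels;    /* run of s+1 one-bits */
        unsigned r = immr & levels;    /* rotated right by r */
        if (s == levels) {
            return false;              /* all-ones element is reserved */
        }
        uint64_t mask = ((uint64_t)1 << (s + 1)) - 1;
        if (r) {                       /* rotate within the element */
            uint64_t emask = (e == 64) ? ~(uint64_t)0 : ((uint64_t)1 << e) - 1;
            mask = ((mask >> r) | (mask << (e - r))) & emask;
        }
        for (unsigned w = e; w < 64; w *= 2) {
            mask |= mask << w;         /* replicate element to 64 bits */
        }
        *result = mask;
        return true;
    }

    int main(void)
    {
        uint64_t m;
        /* N=0, imms=0b111100, immr=0: one set bit per 2-bit element. */
        if (expand_wmask(&m, 0, 0x3c, 0)) {
            printf("0x%016llx\n", (unsigned long long)m); /* 0x5555... */
        }
        return 0;
    }

This is why a single 13-bit field can encode masks such as
0x5555555555555555 or 0x00ff00ff00ff00ff, but never an arbitrary constant.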
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-21-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    |  4 ++
 target/arm/sve_helper.c    | 90 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 24 ++++++++++
 target/arm/sve.decode      |  7 +++
 4 files changed, 125 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_adr_p64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_adr_s32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_adr_u32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_fexpa_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fexpa_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_fexpa_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_adr_u32)(void *vd, void *vn, void *vm, uint32_t desc)
         d[i] = n[i] + ((uint64_t)(uint32_t)m[i] << sh);
     }
 }
+
+void HELPER(sve_fexpa_h)(void *vd, void *vn, uint32_t desc)
+{
+    /* These constants are cut-and-paste directly from the ARM pseudocode. */
+    static const uint16_t coeff[] = {
+        0x0000, 0x0016, 0x002d, 0x0045, 0x005d, 0x0075, 0x008e, 0x00a8,
+        0x00c2, 0x00dc, 0x00f8, 0x0114, 0x0130, 0x014d, 0x016b, 0x0189,
+        0x01a8, 0x01c8, 0x01e8, 0x0209, 0x022b, 0x024e, 0x0271, 0x0295,
+        0x02ba, 0x02e0, 0x0306, 0x032e, 0x0356, 0x037f, 0x03a9, 0x03d4,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+    uint16_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; i++) {
+        uint16_t nn = n[i];
+        intptr_t idx = extract32(nn, 0, 5);
+        uint16_t exp = extract32(nn, 5, 5);
+        d[i] = coeff[idx] | (exp << 10);
+    }
+}
+
+void HELPER(sve_fexpa_s)(void *vd, void *vn, uint32_t desc)
+{
+    /* These constants are cut-and-paste directly from the ARM pseudocode. */
+    static const uint32_t coeff[] = {
+        0x000000, 0x0164d2, 0x02cd87, 0x043a29,
+        0x05aac3, 0x071f62, 0x08980f, 0x0a14d5,
+        0x0b95c2, 0x0d1adf, 0x0ea43a, 0x1031dc,
+        0x11c3d3, 0x135a2b, 0x14f4f0, 0x16942d,
+        0x1837f0, 0x19e046, 0x1b8d3a, 0x1d3eda,
+        0x1ef532, 0x20b051, 0x227043, 0x243516,
+        0x25fed7, 0x27cd94, 0x29a15b, 0x2b7a3a,
+        0x2d583f, 0x2f3b79, 0x3123f6, 0x3311c4,
+        0x3504f3, 0x36fd92, 0x38fbaf, 0x3aff5b,
+        0x3d08a4, 0x3f179a, 0x412c4d, 0x4346cd,
+        0x45672a, 0x478d75, 0x49b9be, 0x4bec15,
+        0x4e248c, 0x506334, 0x52a81e, 0x54f35b,
+        0x5744fd, 0x599d16, 0x5bfbb8, 0x5e60f5,
+        0x60ccdf, 0x633f89, 0x65b907, 0x68396a,
+        0x6ac0c7, 0x6d4f30, 0x6fe4ba, 0x728177,
+        0x75257d, 0x77d0df, 0x7a83b3, 0x7d3e0c,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; i++) {
+        uint32_t nn = n[i];
+        intptr_t idx = extract32(nn, 0, 6);
+        uint32_t exp = extract32(nn, 6, 8);
+        d[i] = coeff[idx] | (exp << 23);
+    }
+}
+
+void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
+{
+    /* These constants are cut-and-paste directly from the ARM pseudocode. */
+    static const uint64_t coeff[] = {
+        0x0000000000000ull, 0x02C9A3E778061ull, 0x059B0D3158574ull,
+        0x0874518759BC8ull, 0x0B5586CF9890Full, 0x0E3EC32D3D1A2ull,
+        0x11301D0125B51ull, 0x1429AAEA92DE0ull, 0x172B83C7D517Bull,
+        0x1A35BEB6FCB75ull, 0x1D4873168B9AAull, 0x2063B88628CD6ull,
+        0x2387A6E756238ull, 0x26B4565E27CDDull, 0x29E9DF51FDEE1ull,
+        0x2D285A6E4030Bull, 0x306FE0A31B715ull, 0x33C08B26416FFull,
+        0x371A7373AA9CBull, 0x3A7DB34E59FF7ull, 0x3DEA64C123422ull,
+        0x4160A21F72E2Aull, 0x44E086061892Dull, 0x486A2B5C13CD0ull,
+        0x4BFDAD5362A27ull, 0x4F9B2769D2CA7ull, 0x5342B569D4F82ull,
+        0x56F4736B527DAull, 0x5AB07DD485429ull, 0x5E76F15AD2148ull,
+        0x6247EB03A5585ull, 0x6623882552225ull, 0x6A09E667F3BCDull,
+        0x6DFB23C651A2Full, 0x71F75E8EC5F74ull, 0x75FEB564267C9ull,
+        0x7A11473EB0187ull, 0x7E2F336CF4E62ull, 0x82589994CCE13ull,
+        0x868D99B4492EDull, 0x8ACE5422AA0DBull, 0x8F1AE99157736ull,
+        0x93737B0CDC5E5ull, 0x97D829FDE4E50ull, 0x9C49182A3F090ull,
+        0xA0C667B5DE565ull, 0xA5503B23E255Dull, 0xA9E6B5579FDBFull,
+        0xAE89F995AD3ADull, 0xB33A2B84F15FBull, 0xB7F76F2FB5E47ull,
+        0xBCC1E904BC1D2ull, 0xC199BDD85529Cull, 0xC67F12E57D14Bull,
+        0xCB720DCEF9069ull, 0xD072D4A07897Cull, 0xD5818DCFBA487ull,
+        0xDA9E603DB3285ull, 0xDFC97337B9B5Full, 0xE502EE78B3FF6ull,
+        0xEA4AFA2A490DAull, 0xEFA1BEE615A27ull, 0xF50765B6E4540ull,
+        0xFA7C1819E90D8ull,
+    };
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+
+    for (i = 0; i < opr_sz; i++) {
+        uint64_t nn = n[i];
+        intptr_t idx = extract32(nn, 0, 6);
+        uint64_t exp = extract32(nn, 6, 11);
+        d[i] = coeff[idx] | (exp << 52);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_ADR_u32(DisasContext *s, arg_rrri *a, uint32_t insn)
     return do_adr(s, a, gen_helper_sve_adr_u32);
 }
 
+/*
+ *** SVE Integer Misc - Unpredicated Group
+ */
+
+static bool trans_FEXPA(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_2 * const fns[4] = {
+        NULL,
+        gen_helper_sve_fexpa_h,
+        gen_helper_sve_fexpa_s,
+        gen_helper_sve_fexpa_d,
+    };
+    if (a->esz == 0) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_2_ool(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vsz, vsz, 0, fns[a->esz]);
+    }
+    return true;
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
 
 # Two operand
 @pd_pn          ........ esz:2 .. .... ....... rn:4 . rd:4    &rr_esz
+@rd_rn          ........ esz:2 ...... ...... rn:5 rd:5        &rr_esz
 
 # Three operand with unused vector element size
 @rd_rn_rm_e0    ........ ... rm:5 ... ... rn:5 rd:5           &rrr_esz esz=0
@@ -XXX,XX +XXX,XX @@ ADR_u32 00000100 01 1 ..... 1010 .. ..... ..... @rd_rn_msz_rm
 ADR_p32         00000100 10 1 ..... 1010 .. ..... .....       @rd_rn_msz_rm
 ADR_p64         00000100 11 1 ..... 1010 .. ..... .....       @rd_rn_msz_rm
 
+### SVE Integer Misc - Unpredicated Group
+
+# SVE floating-point exponential accelerator
+# Note esz != 0
+FEXPA           00000100 .. 1 00000 101110 ..... .....        @rd_rn
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.17.0

Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.

As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  43 +++++++++++++
 target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++
 target/arm/translate.c          |  16 ++---
 3 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh 1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
 # So we have a single decode line and check the cmode/op in the
 # trans function.
 Vimm_1r      1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+
+######################################################################
+# Within the "two registers, or three registers of different lengths"
+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
+# or they are a size field for the three-reg-different-lengths and
+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
+# is slightly awkward for decodetree: we handle it with this
+# non-exclusive group which contains within it two exclusive groups:
+# one for the size=0b11 patterns, and one for the size-not-0b11
+# patterns. This allows us to check that none of the insns within
+# each subgroup accidentally overlap each other. Note that all the
+# trans functions for the size-not-0b11 patterns must check and
+# return false for size==3.
+######################################################################
+{
+  # 0b11 subgroup will go here
+
+  # Subgroup for size != 0b11
+  [
+    ##################################################################
+    # 3-reg-different-length grouping:
+    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
+    ##################################################################
+
+    &3diff vm vn vd size
+
+    @3diff .... ... . . . size:2 .... .... .... . . . . .... \
+           &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VADDL_S_3d 1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+    VADDL_U_3d 1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+
+    VADDW_S_3d 1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+    VADDW_U_3d 1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+
+    VSUBL_S_3d 1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+    VSUBL_U_3d 1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+
+    VSUBW_S_3d 1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+    VSUBW_U_3d 1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+  ]
+}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
     }
     return do_1reg_imm(s, a, fn);
 }
+
+static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+                           NeonGenWidenFn *widenfn,
+                           NeonGenTwo64OpFn *opfn,
+                           bool src1_wide)
+{
+    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
+    TCGv_i64 rn0_64, rn1_64, rm_64;
+    TCGv_i32 rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!widenfn || !opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn0_64 = tcg_temp_new_i64();
+    rn1_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+
+    if (src1_wide) {
+        neon_load_reg64(rn0_64, a->vn);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        widenfn(rn0_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 0);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn0_64, rn0_64, rm_64);
+
+    /*
+     * Load second pass inputs before storing the first pass result, to
+     * avoid incorrect results if a narrow input overlaps with the result.
+     */
+    if (src1_wide) {
+        neon_load_reg64(rn1_64, a->vn + 1);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        widenfn(rn1_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 1);
+
+    neon_store_reg64(rn0_64, a->vd);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn1_64, rn1_64, rm_64);
+    neon_store_reg64(rn1_64, a->vd + 1);
+
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE) \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a) \
+    { \
+        static NeonGenWidenFn * const widenfn[] = { \
+            gen_helper_neon_widen_##S##8, \
+            gen_helper_neon_widen_##S##16, \
+            tcg_gen_##EXT##_i32_i64, \
+            NULL, \
+        }; \
+        static NeonGenTwo64OpFn * const addfn[] = { \
+            gen_helper_neon_##OP##l_u16, \
+            gen_helper_neon_##OP##l_u32, \
+            tcg_gen_##OP##_i64, \
+            NULL, \
+        }; \
+        return do_prewiden_3d(s, a, widenfn[a->size], \
+                              addfn[a->size], SRC1WIDE); \
+    }
+
+DO_PREWIDEN(VADDL_S, s, ext, add, false)
+DO_PREWIDEN(VADDL_U, u, extu, add, false)
+DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
+DO_PREWIDEN(VADDW_S, s, ext, add, true)
+DO_PREWIDEN(VADDW_U, u, extu, add, true)
+DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             /* Three registers of different lengths. */
             int src1_wide;
             int src2_wide;
-            int prewiden;
             /* undefreq: bit 0 : UNDEF if size == 0
              * bit 1 : UNDEF if size == 1
              * bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             int undefreq;
             /* prewiden, src1_wide, src2_wide, undefreq */
             static const int neon_3reg_wide[16][4] = {
-                {1, 0, 0, 0}, /* VADDL */
-                {1, 1, 0, 0}, /* VADDW */
-                {1, 0, 0, 0}, /* VSUBL */
-                {1, 1, 0, 0}, /* VSUBW */
+                {0, 0, 0, 7}, /* VADDL: handled by decodetree */
+                {0, 0, 0, 7}, /* VADDW: handled by decodetree */
+                {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
+                {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                 {0, 1, 1, 0}, /* VADDHN */
                 {0, 0, 0, 0}, /* VABAL */
                 {0, 1, 1, 0}, /* VSUBHN */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 {0, 0, 0, 7}, /* Reserved: always UNDEF */
             };
 
-            prewiden = neon_3reg_wide[op][0];
             src1_wide = neon_3reg_wide[op][1];
             src2_wide = neon_3reg_wide[op][2];
             undefreq = neon_3reg_wide[op][3];
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 } else {
                     tmp = neon_load_reg(rn, pass);
                 }
-                if (prewiden) {
-                    gen_neon_widen(cpu_V0, tmp, size, u);
-                }
             }
             if (src2_wide) {
                 neon_load_reg64(cpu_V1, rm + pass);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 } else {
                     tmp2 = neon_load_reg(rm, pass);
                 }
-                if (prewiden) {
-                    gen_neon_widen(cpu_V1, tmp2, size, u);
-                }
             }
             switch (op) {
             case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-- 
2.20.1
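Two asides on the patches above. First, the FEXPA helpers are pure table
lookups: the low bits of each element index the coefficient table and the
next bits become the float exponent field. A standalone model of the
half-precision case, reusing the 32-entry table from the patch (the helper
name here is only for the sketch):

    #include <stdint.h>
    #include <stdio.h>

    /* Coefficient table copied from sve_fexpa_h in the patch above. */
    static const uint16_t coeff[32] = {
        0x0000, 0x0016, 0x002d, 0x0045, 0x005d, 0x0075, 0x008e, 0x00a8,
        0x00c2, 0x00dc, 0x00f8, 0x0114, 0x0130, 0x014d, 0x016b, 0x0189,
        0x01a8, 0x01c8, 0x01e8, 0x0209, 0x022b, 0x024e, 0x0271, 0x0295,
        0x02ba, 0x02e0, 0x0306, 0x032e, 0x0356, 0x037f, 0x03a9, 0x03d4,
    };

    /* FEXPA on one float16 element: bits [4:0] index the mantissa table,
     * bits [9:5] land in the float16 exponent field (bit 10 upward).
     */
    static uint16_t fexpa_h(uint16_t nn)
    {
        return coeff[nn & 0x1f] | (uint16_t)(((nn >> 5) & 0x1f) << 10);
    }

    int main(void)
    {
        /* index 0 with exponent 15 (the float16 bias) gives 1.0 (0x3c00) */
        printf("0x%04x\n", fexpa_h(15u << 5));
        return 0;
    }

Second, "prewidening" in scalar terms: each input element is widened to
double size before the add or subtract, so the operation cannot wrap at the
narrow width. A sketch of VADDL.S8's per-element arithmetic:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int8_t n[8] = { 100, -100, 127, -128, 1, 2, 3, 4 };
        int8_t m[8] = { 100, -100, 127, -128, 5, 6, 7, 8 };
        int16_t d[8];

        for (int i = 0; i < 8; i++) {
            /* widen both inputs, then add at the doubled size */
            d[i] = (int16_t)n[i] + (int16_t)m[i];
        }
        /* 200 and 254 are representable: no int8_t wraparound */
        printf("d[0]=%d d[2]=%d d[3]=%d\n", d[0], d[2], d[3]);
        return 0;
    }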
From: Richard Henderson <richard.henderson@linaro.org>

Move some stuff that will be common to both translate-a64.c
and translate-sve.c.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.h | 118 +++++++++++++++++++++++++++++++
 target/arm/translate-a64.c | 112 +++++------------------------
 2 files changed, 133 insertions(+), 97 deletions(-)
 create mode 100644 target/arm/translate-a64.h

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-a64.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * AArch64 translation, common definitions.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TARGET_ARM_TRANSLATE_A64_H
+#define TARGET_ARM_TRANSLATE_A64_H
+
+void unallocated_encoding(DisasContext *s);
+
+#define unsupported_encoding(s, insn)                                    \
+    do {                                                                 \
+        qemu_log_mask(LOG_UNIMP,                                         \
+                      "%s:%d: unsupported instruction encoding 0x%08x "  \
+                      "at pc=%016" PRIx64 "\n",                          \
+                      __FILE__, __LINE__, insn, s->pc - 4);              \
+        unallocated_encoding(s);                                         \
+    } while (0)
+
+TCGv_i64 new_tmp_a64(DisasContext *s);
+TCGv_i64 new_tmp_a64_zero(DisasContext *s);
+TCGv_i64 cpu_reg(DisasContext *s, int reg);
+TCGv_i64 cpu_reg_sp(DisasContext *s, int reg);
+TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf);
+TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf);
+void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v);
+TCGv_ptr get_fpstatus_ptr(bool);
+bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
+                            unsigned int imms, unsigned int immr);
+uint64_t vfp_expand_imm(int size, uint8_t imm8);
+bool sve_access_check(DisasContext *s);
+
+/* We should have at some point before trying to access an FP register
+ * done the necessary access check, so assert that
+ * (a) we did the check and
+ * (b) we didn't then just plough ahead anyway if it failed.
+ * Print the instruction pattern in the abort message so we can figure
+ * out what we need to fix if a user encounters this problem in the wild.
+ */
+static inline void assert_fp_access_checked(DisasContext *s)
+{
+#ifdef CONFIG_DEBUG_TCG
+    if (unlikely(!s->fp_access_checked || s->fp_excp_el)) {
+        fprintf(stderr, "target-arm: FP access check missing for "
+                "instruction 0x%08x\n", s->insn);
+        abort();
+    }
+#endif
+}
+
+/* Return the offset into CPUARMState of an element of specified
+ * size, 'element' places in from the least significant end of
+ * the FP/vector register Qn.
+ */
+static inline int vec_reg_offset(DisasContext *s, int regno,
+                                 int element, TCGMemOp size)
+{
+    int offs = 0;
+#ifdef HOST_WORDS_BIGENDIAN
+    /* This is complicated slightly because vfp.zregs[n].d[0] is
+     * still the low half and vfp.zregs[n].d[1] the high half
+     * of the 128 bit vector, even on big endian systems.
+     * Calculate the offset assuming a fully bigendian 128 bits,
+     * then XOR to account for the order of the two 64 bit halves.
+     */
+    offs += (16 - ((element + 1) * (1 << size)));
+    offs ^= 8;
+#else
+    offs += element * (1 << size);
+#endif
+    offs += offsetof(CPUARMState, vfp.zregs[regno]);
+    assert_fp_access_checked(s);
+    return offs;
+}
+
+/* Return the offset info CPUARMState of the "whole" vector register Qn.  */
+static inline int vec_full_reg_offset(DisasContext *s, int regno)
+{
+    assert_fp_access_checked(s);
+    return offsetof(CPUARMState, vfp.zregs[regno]);
+}
+
+/* Return a newly allocated pointer to the vector register.  */
+static inline TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
+{
+    TCGv_ptr ret = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(ret, cpu_env, vec_full_reg_offset(s, regno));
+    return ret;
+}
+
+/* Return the byte size of the "whole" vector register, VL / 8.  */
+static inline int vec_full_reg_size(DisasContext *s)
+{
+    return s->sve_len;
+}
+
+bool disas_sve(DisasContext *, uint32_t);
+
+/* Note that the gvec expanders operate on offsets + sizes.  */
+typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
+                         uint32_t, uint32_t);
+typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+
+#endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/log.h"
 
 #include "trace-tcg.h"
+#include "translate-a64.h"
 
 static TCGv_i64 cpu_X[32];
 static TCGv_i64 cpu_pc;
 
 /* Load/store exclusive handling */
 static TCGv_i64 cpu_exclusive_high;
-static TCGv_i64 cpu_reg(DisasContext *s, int reg);
 
 static const char *regnames[] = {
     "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, TCGMemOp);
 
-/* Note that the gvec expanders operate on offsets + sizes.  */
-typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
-typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
-                         uint32_t, uint32_t);
-typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
-                        uint32_t, uint32_t, uint32_t);
-
 /* initialize TCG globals.  */
 void a64_translate_init(void)
 {
@@ -XXX,XX +XXX,XX @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
     }
 }
 
-static void unallocated_encoding(DisasContext *s)
+void unallocated_encoding(DisasContext *s)
 {
     /* Unallocated and reserved encodings are uncategorized */
     gen_exception_insn(s, 4, EXCP_UDEF, syn_uncategorized(),
                        default_exception_el(s));
 }
 
-#define unsupported_encoding(s, insn)                                    \
-    do {                                                                 \
-        qemu_log_mask(LOG_UNIMP,                                         \
-                      "%s:%d: unsupported instruction encoding 0x%08x "  \
-                      "at pc=%016" PRIx64 "\n",                          \
-                      __FILE__, __LINE__, insn, s->pc - 4);              \
-        unallocated_encoding(s);                                         \
-    } while (0)
-
 static void init_tmp_a64_array(DisasContext *s)
 {
 #ifdef CONFIG_DEBUG_TCG
@@ -XXX,XX +XXX,XX @@ static void free_tmp_a64(DisasContext *s)
     init_tmp_a64_array(s);
 }
 
-static TCGv_i64 new_tmp_a64(DisasContext *s)
+TCGv_i64 new_tmp_a64(DisasContext *s)
 {
     assert(s->tmp_a64_count < TMP_A64_MAX);
     return s->tmp_a64[s->tmp_a64_count++] = tcg_temp_new_i64();
 }
 
-static TCGv_i64 new_tmp_a64_zero(DisasContext *s)
+TCGv_i64 new_tmp_a64_zero(DisasContext *s)
 {
     TCGv_i64 t = new_tmp_a64(s);
     tcg_gen_movi_i64(t, 0);
@@ -XXX,XX +XXX,XX @@ static TCGv_i64 new_tmp_a64_zero(DisasContext *s)
  * to cpu_X[31] and ZR accesses to a temporary which can be discarded.
  * This is the point of the _sp forms.
  */
-static TCGv_i64 cpu_reg(DisasContext *s, int reg)
+TCGv_i64 cpu_reg(DisasContext *s, int reg)
 {
     if (reg == 31) {
         return new_tmp_a64_zero(s);
@@ -XXX,XX +XXX,XX @@ static TCGv_i64 cpu_reg(DisasContext *s, int reg)
 }
 
 /* register access for when 31 == SP */
-static TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
+TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
 {
     return cpu_X[reg];
 }
@@ -XXX,XX +XXX,XX @@ static TCGv_i64 cpu_reg_sp(DisasContext *s, int reg)
  * representing the register contents. This TCGv is an auto-freed
  * temporary so it need not be explicitly freed, and may be modified.
  */
-static TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
+TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
 {
     TCGv_i64 v = new_tmp_a64(s);
     if (reg != 31) {
@@ -XXX,XX +XXX,XX @@ static TCGv_i64 read_cpu_reg(DisasContext *s, int reg, int sf)
     return v;
 }
 
-static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
+TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
 {
     TCGv_i64 v = new_tmp_a64(s);
     if (sf) {
@@ -XXX,XX +XXX,XX @@ static TCGv_i64 read_cpu_reg_sp(DisasContext *s, int reg, int sf)
     return v;
 }
 
-/* We should have at some point before trying to access an FP register
- * done the necessary access check, so assert that
- * (a) we did the check and
- * (b) we didn't then just plough ahead anyway if it failed.
- * Print the instruction pattern in the abort message so we can figure
- * out what we need to fix if a user encounters this problem in the wild.
- */
-static inline void assert_fp_access_checked(DisasContext *s)
-{
-#ifdef CONFIG_DEBUG_TCG
-    if (unlikely(!s->fp_access_checked || s->fp_excp_el)) {
-        fprintf(stderr, "target-arm: FP access check missing for "
-                "instruction 0x%08x\n", s->insn);
-        abort();
-    }
-#endif
-}
-
-/* Return the offset into CPUARMState of an element of specified
- * size, 'element' places in from the least significant end of
- * the FP/vector register Qn.
- */
-static inline int vec_reg_offset(DisasContext *s, int regno,
-                                 int element, TCGMemOp size)
-{
-    int offs = 0;
-#ifdef HOST_WORDS_BIGENDIAN
-    /* This is complicated slightly because vfp.zregs[n].d[0] is
-     * still the low half and vfp.zregs[n].d[1] the high half
-     * of the 128 bit vector, even on big endian systems.
-     * Calculate the offset assuming a fully bigendian 128 bits,
-     * then XOR to account for the order of the two 64 bit halves.
-     */
-    offs += (16 - ((element + 1) * (1 << size)));
-    offs ^= 8;
-#else
-    offs += element * (1 << size);
-#endif
-    offs += offsetof(CPUARMState, vfp.zregs[regno]);
-    assert_fp_access_checked(s);
-    return offs;
-}
-
-/* Return the offset info CPUARMState of the "whole" vector register Qn.  */
-static inline int vec_full_reg_offset(DisasContext *s, int regno)
-{
-    assert_fp_access_checked(s);
-    return offsetof(CPUARMState, vfp.zregs[regno]);
-}
-
-/* Return a newly allocated pointer to the vector register.  */
-static TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
-{
-    TCGv_ptr ret = tcg_temp_new_ptr();
-    tcg_gen_addi_ptr(ret, cpu_env, vec_full_reg_offset(s, regno));
-    return ret;
-}
-
-/* Return the byte size of the "whole" vector register, VL / 8.  */
-static inline int vec_full_reg_size(DisasContext *s)
-{
-    /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags.
-       In the meantime this is just the AdvSIMD length of 128.  */
-    return 128 / 8;
-}
-
 /* Return the offset into CPUARMState of a slice (from
  * the least significant end) of FP register Qn (ie
  * Dn, Sn, Hn or Bn).
@@ -XXX,XX +XXX,XX @@ static void clear_vec_high(DisasContext *s, bool is_q, int rd)
     }
 }
 
-static void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
+void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v)
 {
     unsigned ofs = fp_reg_offset(s, reg, MO_64);
 
@@ -XXX,XX +XXX,XX @@ static void write_fp_sreg(DisasContext *s, int reg, TCGv_i32 v)
     tcg_temp_free_i64(tmp);
 }
 
-static TCGv_ptr get_fpstatus_ptr(bool is_f16)
+TCGv_ptr get_fpstatus_ptr(bool is_f16)
 {
     TCGv_ptr statusptr = tcg_temp_new_ptr();
     int offset;
@@ -XXX,XX +XXX,XX @@ static inline bool fp_access_check(DisasContext *s)
 /* Check that SVE access is enabled.  If it is, return true.
  * If not, emit code to generate an appropriate exception and return false.
  */
-static inline bool sve_access_check(DisasContext *s)
+bool sve_access_check(DisasContext *s)
 {
     if (s->sve_excp_el) {
         gen_exception_insn(s, 4, EXCP_UDEF, syn_sve_access_trap(),
                            s->sve_excp_el);
         return false;
     }
-    return true;
+    return fp_access_check(s);
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static inline uint64_t bitmask64(unsigned int length)
  * value (ie should cause a guest UNDEF exception), and true if they are
  * valid, in which case the decoded bit pattern is written to result.
  */
-static bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
-                                   unsigned int imms, unsigned int immr)
+bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
+                            unsigned int imms, unsigned int immr)
 {
     uint64_t mask;
     unsigned e, levels, s, r;
@@ -XXX,XX +XXX,XX @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
  * the range 01....1xx to 10....0xx, and the most significant 4 bits of
  * the mantissa; see VFPExpandImm() in the v8 ARM ARM.
  */
-static uint64_t vfp_expand_imm(int size, uint8_t imm8)
+uint64_t vfp_expand_imm(int size, uint8_t imm8)
 {
     uint64_t imm;
 
-- 
2.17.0

Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
 target/arm/translate.c          | 91 ++++-----------------------
 3 files changed, 104 insertions(+), 80 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VSUBW_S_3d 1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
     VSUBW_U_3d 1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+
+    VADDHN_3d  1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+    VRADDHN_3d 1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+
+    VSUBHN_3d  1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+    VRSUBHN_3d 1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
 DO_PREWIDEN(VADDW_U, u, extu, add, true)
 DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+
+static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
+{
+    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
+    TCGv_i64 rn_64, rm_64;
+    TCGv_i32 rd0, rd1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn || !narrowfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vn | a->vm) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+    rd0 = tcg_temp_new_i32();
+    rd1 = tcg_temp_new_i32();
+
+    neon_load_reg64(rn_64, a->vn);
+    neon_load_reg64(rm_64, a->vm);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd0, rn_64);
+
+    neon_load_reg64(rn_64, a->vn + 1);
+    neon_load_reg64(rm_64, a->vm + 1);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd1, rn_64);
+
+    neon_store_reg(a->vd, 0, rd0);
+    neon_store_reg(a->vd, 1, rd1);
+
+    tcg_temp_free_i64(rn_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP) \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a) \
+    { \
+        static NeonGenTwo64OpFn * const addfn[] = { \
+            gen_helper_neon_##OP##l_u16, \
+            gen_helper_neon_##OP##l_u32, \
+            tcg_gen_##OP##_i64, \
+            NULL, \
+        }; \
+        static NeonGenNarrowFn * const narrowfn[] = { \
+            gen_helper_neon_##NARROWTYPE##_high_u8, \
+            gen_helper_neon_##NARROWTYPE##_high_u16, \
+            EXTOP, \
+            NULL, \
+        }; \
+        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]); \
+    }
+
+static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
+{
+    tcg_gen_addi_i64(rn, rn, 1u << 31);
+    tcg_gen_extrh_i64_i32(rd, rn);
+}
+
+DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
+DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_subl(int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
-    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
-    case 2: tcg_gen_sub_i64(CPU_V001); break;
-    default: abort();
-    }
-}
-
 static inline void gen_neon_negl(TCGv_i64 var, int size)
 {
     switch (size) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         op = (insn >> 8) & 0xf;
         if ((insn & (1 << 6)) == 0) {
             /* Three registers of different lengths. */
-            int src1_wide;
-            int src2_wide;
             /* undefreq: bit 0 : UNDEF if size == 0
              * bit 1 : UNDEF if size == 1
              * bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 {0, 0, 0, 7}, /* VADDW: handled by decodetree */
                 {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                 {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                {0, 1, 1, 0}, /* VADDHN */
+                {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
                 {0, 0, 0, 0}, /* VABAL */
-                {0, 1, 1, 0}, /* VSUBHN */
+                {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                 {0, 0, 0, 0}, /* VABDL */
                 {0, 0, 0, 0}, /* VMLAL */
                 {0, 0, 0, 9}, /* VQDMLAL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 {0, 0, 0, 7}, /* Reserved: always UNDEF */
             };
 
-            src1_wide = neon_3reg_wide[op][1];
-            src2_wide = neon_3reg_wide[op][2];
             undefreq = neon_3reg_wide[op][3];
 
             if ((undefreq & (1 << size)) ||
                 ((undefreq & 8) && u)) {
                 return 1;
             }
-            if ((src1_wide && (rn & 1)) ||
-                (src2_wide && (rm & 1)) ||
-                (!src2_wide && (rd & 1))) {
+            if (rd & 1) {
                 return 1;
             }
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             /* Avoid overlapping operands.  Wide source operands are
                always aligned so will never overlap with wide
                destinations in problematic ways.  */
-            if (rd == rm && !src2_wide) {
+            if (rd == rm) {
                 tmp = neon_load_reg(rm, 1);
                 neon_store_scratch(2, tmp);
-            } else if (rd == rn && !src1_wide) {
+            } else if (rd == rn) {
                 tmp = neon_load_reg(rn, 1);
                 neon_store_scratch(2, tmp);
             }
             tmp3 = NULL;
             for (pass = 0; pass < 2; pass++) {
-                if (src1_wide) {
-                    neon_load_reg64(cpu_V0, rn + pass);
-                    tmp = NULL;
+                if (pass == 1 && rd == rn) {
+                    tmp = neon_load_scratch(2);
                 } else {
-                    if (pass == 1 && rd == rn) {
-                        tmp = neon_load_scratch(2);
-                    } else {
-                        tmp = neon_load_reg(rn, pass);
-                    }
+                    tmp = neon_load_reg(rn, pass);
                 }
-                if (src2_wide) {
-                    neon_load_reg64(cpu_V1, rm + pass);
-                    tmp2 = NULL;
+                if (pass == 1 && rd == rm) {
+                    tmp2 = neon_load_scratch(2);
                 } else {
-                    if (pass == 1 && rd == rm) {
-                        tmp2 = neon_load_scratch(2);
-                    } else {
-                        tmp2 = neon_load_reg(rm, pass);
-                    }
+                    tmp2 = neon_load_reg(rm, pass);
                 }
                 switch (op) {
-                case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-                    gen_neon_addl(size);
-                    break;
-                case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
-                    gen_neon_subl(size);
-                    break;
                 case 5: case 7: /* VABAL, VABDL */
                     switch ((size << 1) | u) {
                     case 0:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     abort();
                 }
                 neon_store_reg64(cpu_V0, rd + pass);
-            } else if (op == 4 || op == 6) {
-                /* Narrowing operation. */
-                tmp = tcg_temp_new_i32();
-                if (!u) {
-                    switch (size) {
-                    case 0:
-                        gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
-                        break;
-                    case 1:
-                        gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
-                        break;
-                    case 2:
-                        tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                        break;
-                    default: abort();
-                    }
-                } else {
-                    switch (size) {
-                    case 0:
-                        gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
-                        break;
-                    case 1:
-                        gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
-                        break;
-                    case 2:
-                        tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
-                        tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                        break;
-                    default: abort();
-                    }
-                }
-                if (pass == 0) {
-                    tmp3 = tmp;
-                } else {
-                    neon_store_reg(rd, 0, tmp3);
-                    neon_store_reg(rd, 1, tmp);
-                }
             } else {
                 /* Write back the result. */
                 neon_store_reg64(cpu_V0, rd + pass);
-- 
2.20.1
diff view generated by jsdifflib
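As an aside, the narrowing-high operation removed from the old decoder above
can be sketched for one lane in plain C.  This is an illustrative model only,
under the assumption of 32-bit inputs narrowing to 16 bits; the function name
is invented for this note and is not a QEMU helper:

    #include <stdint.h>

    /* Model of VADDHN on one lane pair: add two double-width (32-bit)
     * elements and keep only the high half of the sum.  The rounding
     * variant (VRADDHN) adds 1 << 15 before taking the high half.
     */
    static uint16_t addhn_u32(uint32_t n, uint32_t m, int round)
    {
        uint32_t sum = n + m + (round ? 1u << 15 : 0);
        return (uint16_t)(sum >> 16);
    }

The decodetree versions express exactly this as a wide add followed by a
narrow-high step, rather than as special cases in one big switch.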
From: Richard Henderson <richard.henderson@linaro.org>

Excepting MOVPRFX, which isn't a reduction.  Presumably it is
placed within the group because of its encoding.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    | 44 ++++++++++++++++++
 target/arm/sve_helper.c    | 91 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 68 ++++++++++++++++++++++++++++
 target/arm/sve.decode      | 22 +++++++++
 4 files changed, 225 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_udiv_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_udiv_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(sve_orv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_orv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_orv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_orv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_eorv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_eorv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_eorv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_eorv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_andv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_andv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_andv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_andv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_saddv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_saddv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_saddv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_uaddv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uaddv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uaddv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uaddv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_smaxv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_smaxv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_smaxv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_smaxv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_umaxv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_umaxv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_umaxv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_umaxv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_sminv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_sminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(sve_uminv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_uminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
 
 #undef DO_ZPZZ
 #undef DO_ZPZZ_D
 
+/* Two-operand reduction expander, controlled by a predicate.
+ * The difference between TYPERED and TYPERET has to do with
+ * sign-extension.  E.g. for SMAX, TYPERED must be signed,
+ * but TYPERET must be unsigned so that e.g. a 32-bit value
+ * is not sign-extended to the ABI uint64_t return type.
+ */
+/* ??? If we were to vectorize this by hand the reduction ordering
+ * would change.  For integer operands, this is perfectly fine.
+ */
+#define DO_VPZ(NAME, TYPEELT, TYPERED, TYPERET, H, INIT, OP) \
+uint64_t HELPER(NAME)(void *vn, void *vg, uint32_t desc)     \
+{                                                            \
+    intptr_t i, opr_sz = simd_oprsz(desc);                   \
+    TYPERED ret = INIT;                                      \
+    for (i = 0; i < opr_sz; ) {                              \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));      \
+        do {                                                 \
+            if (pg & 1) {                                    \
+                TYPEELT nn = *(TYPEELT *)(vn + H(i));        \
+                ret = OP(ret, nn);                           \
+            }                                                \
+            i += sizeof(TYPEELT), pg >>= sizeof(TYPEELT);    \
+        } while (i & 15);                                    \
+    }                                                        \
+    return (TYPERET)ret;                                     \
+}
+
+#define DO_VPZ_D(NAME, TYPEE, TYPER, INIT, OP)               \
+uint64_t HELPER(NAME)(void *vn, void *vg, uint32_t desc)     \
+{                                                            \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;               \
+    TYPEE *n = vn;                                           \
+    uint8_t *pg = vg;                                        \
+    TYPER ret = INIT;                                        \
+    for (i = 0; i < opr_sz; i += 1) {                        \
+        if (pg[H1(i)] & 1) {                                 \
+            TYPEE nn = n[i];                                 \
+            ret = OP(ret, nn);                               \
+        }                                                    \
+    }                                                        \
+    return ret;                                              \
+}
+
+DO_VPZ(sve_orv_b, uint8_t, uint8_t, uint8_t, H1, 0, DO_ORR)
+DO_VPZ(sve_orv_h, uint16_t, uint16_t, uint16_t, H1_2, 0, DO_ORR)
+DO_VPZ(sve_orv_s, uint32_t, uint32_t, uint32_t, H1_4, 0, DO_ORR)
+DO_VPZ_D(sve_orv_d, uint64_t, uint64_t, 0, DO_ORR)
+
+DO_VPZ(sve_eorv_b, uint8_t, uint8_t, uint8_t, H1, 0, DO_EOR)
+DO_VPZ(sve_eorv_h, uint16_t, uint16_t, uint16_t, H1_2, 0, DO_EOR)
+DO_VPZ(sve_eorv_s, uint32_t, uint32_t, uint32_t, H1_4, 0, DO_EOR)
+DO_VPZ_D(sve_eorv_d, uint64_t, uint64_t, 0, DO_EOR)
+
+DO_VPZ(sve_andv_b, uint8_t, uint8_t, uint8_t, H1, -1, DO_AND)
+DO_VPZ(sve_andv_h, uint16_t, uint16_t, uint16_t, H1_2, -1, DO_AND)
+DO_VPZ(sve_andv_s, uint32_t, uint32_t, uint32_t, H1_4, -1, DO_AND)
+DO_VPZ_D(sve_andv_d, uint64_t, uint64_t, -1, DO_AND)
+
+DO_VPZ(sve_saddv_b, int8_t, uint64_t, uint64_t, H1, 0, DO_ADD)
+DO_VPZ(sve_saddv_h, int16_t, uint64_t, uint64_t, H1_2, 0, DO_ADD)
+DO_VPZ(sve_saddv_s, int32_t, uint64_t, uint64_t, H1_4, 0, DO_ADD)
+
+DO_VPZ(sve_uaddv_b, uint8_t, uint64_t, uint64_t, H1, 0, DO_ADD)
+DO_VPZ(sve_uaddv_h, uint16_t, uint64_t, uint64_t, H1_2, 0, DO_ADD)
+DO_VPZ(sve_uaddv_s, uint32_t, uint64_t, uint64_t, H1_4, 0, DO_ADD)
+DO_VPZ_D(sve_uaddv_d, uint64_t, uint64_t, 0, DO_ADD)
+
+DO_VPZ(sve_smaxv_b, int8_t, int8_t, uint8_t, H1, INT8_MIN, DO_MAX)
+DO_VPZ(sve_smaxv_h, int16_t, int16_t, uint16_t, H1_2, INT16_MIN, DO_MAX)
+DO_VPZ(sve_smaxv_s, int32_t, int32_t, uint32_t, H1_4, INT32_MIN, DO_MAX)
+DO_VPZ_D(sve_smaxv_d, int64_t, int64_t, INT64_MIN, DO_MAX)
+
+DO_VPZ(sve_umaxv_b, uint8_t, uint8_t, uint8_t, H1, 0, DO_MAX)
+DO_VPZ(sve_umaxv_h, uint16_t, uint16_t, uint16_t, H1_2, 0, DO_MAX)
+DO_VPZ(sve_umaxv_s, uint32_t, uint32_t, uint32_t, H1_4, 0, DO_MAX)
+DO_VPZ_D(sve_umaxv_d, uint64_t, uint64_t, 0, DO_MAX)
+
+DO_VPZ(sve_sminv_b, int8_t, int8_t, uint8_t, H1, INT8_MAX, DO_MIN)
+DO_VPZ(sve_sminv_h, int16_t, int16_t, uint16_t, H1_2, INT16_MAX, DO_MIN)
+DO_VPZ(sve_sminv_s, int32_t, int32_t, uint32_t, H1_4, INT32_MAX, DO_MIN)
+DO_VPZ_D(sve_sminv_d, int64_t, int64_t, INT64_MAX, DO_MIN)
+
+DO_VPZ(sve_uminv_b, uint8_t, uint8_t, uint8_t, H1, -1, DO_MIN)
+DO_VPZ(sve_uminv_h, uint16_t, uint16_t, uint16_t, H1_2, -1, DO_MIN)
+DO_VPZ(sve_uminv_s, uint32_t, uint32_t, uint32_t, H1_4, -1, DO_MIN)
+DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN)
+
+#undef DO_VPZ
+#undef DO_VPZ_D
+
 #undef DO_AND
 #undef DO_ORR
 #undef DO_EOR
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 
 #undef DO_ZPZZ
 
+/*
+ *** SVE Integer Reduction Group
+ */
+
+typedef void gen_helper_gvec_reduc(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_i32);
+static bool do_vpz_ool(DisasContext *s, arg_rpr_esz *a,
+                       gen_helper_gvec_reduc *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_ptr t_zn, t_pg;
+    TCGv_i32 desc;
+    TCGv_i64 temp;
+
+    if (fn == NULL) {
+        return false;
+    }
+    if (!sve_access_check(s)) {
+        return true;
+    }
+
+    desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    temp = tcg_temp_new_i64();
+    t_zn = tcg_temp_new_ptr();
+    t_pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, a->rn));
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, a->pg));
+    fn(temp, t_zn, t_pg, desc);
+    tcg_temp_free_ptr(t_zn);
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_i32(desc);
+
+    write_fp_dreg(s, a->rd, temp);
+    tcg_temp_free_i64(temp);
+    return true;
+}
+
+#define DO_VPZ(NAME, name) \
+static bool trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn) \
+{                                                                        \
+    static gen_helper_gvec_reduc * const fns[4] = {                      \
+        gen_helper_sve_##name##_b, gen_helper_sve_##name##_h,            \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,            \
+    };                                                                   \
+    return do_vpz_ool(s, a, fns[a->esz]);                                \
+}
+
+DO_VPZ(ORV, orv)
+DO_VPZ(ANDV, andv)
+DO_VPZ(EORV, eorv)
+
+DO_VPZ(UADDV, uaddv)
+DO_VPZ(SMAXV, smaxv)
+DO_VPZ(UMAXV, umaxv)
+DO_VPZ(SMINV, sminv)
+DO_VPZ(UMINV, uminv)
+
+static bool trans_SADDV(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_reduc * const fns[4] = {
+        gen_helper_sve_saddv_b, gen_helper_sve_saddv_h,
+        gen_helper_sve_saddv_s, NULL
+    };
+    return do_vpz_ool(s, a, fns[a->esz]);
+}
+
+#undef DO_VPZ
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
 &rr_esz         rd rn esz
 &rri            rd rn imm
 &rrr_esz        rd rn rm esz
+&rpr_esz        rd pg rn esz
 &rprr_s         rd pg rn rm s
 &rprr_esz       rd pg rn rm esz
 
@@ -XXX,XX +XXX,XX @@
 @rdm_pg_rn      ........ esz:2 ... ... ... pg:3 rn:5 rd:5 \
                 &rprr_esz rm=%reg_movprfx
 
+# One register operand, with governing predicate, vector element size
+@rd_pg_rn       ........ esz:2 ... ... ... pg:3 rn:5 rd:5 &rpr_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9       ........ ........ ...... rn:5 . rd:4 \
                 &rri imm=%imm9_16_10
@@ -XXX,XX +XXX,XX @@ UDIV_zpzz       00000100 .. 010 101 000 ... ..... .....  @rdn_pg_rm
 SDIV_zpzz       00000100 .. 010 110 000 ... ..... .....  @rdm_pg_rn # SDIVR
 UDIV_zpzz       00000100 .. 010 111 000 ... ..... .....  @rdm_pg_rn # UDIVR
 
+### SVE Integer Reduction Group
+
+# SVE bitwise logical reduction (predicated)
+ORV             00000100 .. 011 000 001 ... ..... .....  @rd_pg_rn
+EORV            00000100 .. 011 001 001 ... ..... .....  @rd_pg_rn
+ANDV            00000100 .. 011 010 001 ... ..... .....  @rd_pg_rn
+
+# SVE integer add reduction (predicated)
+# Note that saddv requires size != 3.
+UADDV           00000100 .. 000 001 001 ... ..... .....  @rd_pg_rn
+SADDV           00000100 .. 000 000 001 ... ..... .....  @rd_pg_rn
+
+# SVE integer min/max reduction (predicated)
+SMAXV           00000100 .. 001 000 001 ... ..... .....  @rd_pg_rn
+UMAXV           00000100 .. 001 001 001 ... ..... .....  @rd_pg_rn
+SMINV           00000100 .. 001 010 001 ... ..... .....  @rd_pg_rn
+UMINV           00000100 .. 001 011 001 ... ..... .....  @rd_pg_rn
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
-- 
2.17.0


Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double width
result and then a possible accumulation of that double width
result into the destination.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.h          |   1 +
 target/arm/neon-dp.decode       |   6 ++
 target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  31 +-------
 4 files changed, 142 insertions(+), 28 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
 typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
 typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
 typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
 typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
 typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r        1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
   VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
   VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
 
+  VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+  VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+
   VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
   VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+
+  VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+  VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
 ]
}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
+
+static bool do_long_3d(DisasContext *s, arg_3diff *a,
+                       NeonGenTwoOpWidenFn *opfn,
+                       NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * 3-regs different lengths, long operations.
+     * These perform an operation on two inputs that returns a double-width
+     * result, and then possibly perform an accumulation operation of
+     * that result into the double-width destination.
+     */
+    TCGv_i64 rd0, rd1, tmp;
+    TCGv_i32 rn, rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rd0 = tcg_temp_new_i64();
+    rd1 = tcg_temp_new_i64();
+
+    rn = neon_load_reg(a->vn, 0);
+    rm = neon_load_reg(a->vm, 0);
+    opfn(rd0, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    rn = neon_load_reg(a->vn, 1);
+    rm = neon_load_reg(a->vm, 1);
+    opfn(rd1, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    /* Don't store results until after all loads: they might overlap */
+    if (accfn) {
+        tmp = tcg_temp_new_i64();
+        neon_load_reg64(tmp, a->vd);
+        accfn(tmp, tmp, rd0);
+        neon_store_reg64(tmp, a->vd);
+        neon_load_reg64(tmp, a->vd + 1);
+        accfn(tmp, tmp, rd1);
+        neon_store_reg64(tmp, a->vd + 1);
+        tcg_temp_free_i64(tmp);
+    } else {
+        neon_store_reg64(rd0, a->vd);
+        neon_store_reg64(rd1, a->vd + 1);
+    }
+
+    tcg_temp_free_i64(rd0);
+    tcg_temp_free_i64(rd1);
+
+    return true;
+}
+
+static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
+
+static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                 {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                 {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
-                {0, 0, 0, 0}, /* VABAL */
+                {0, 0, 0, 7}, /* VABAL */
                 {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                {0, 0, 0, 0}, /* VABDL */
+                {0, 0, 0, 7}, /* VABDL */
                 {0, 0, 0, 0}, /* VMLAL */
                 {0, 0, 0, 9}, /* VQDMLAL */
                 {0, 0, 0, 0}, /* VMLSL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     tmp2 = neon_load_reg(rm, pass);
                 }
                 switch (op) {
-                case 5: case 7: /* VABAL, VABDL */
-                    switch ((size << 1) | u) {
-                    case 0:
-                        gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
-                        break;
-                    case 1:
-                        gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
-                        break;
-                    case 2:
-                        gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
-                        break;
-                    case 3:
-                        gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
-                        break;
-                    case 4:
-                        gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
-                        break;
-                    case 5:
-                        gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
-                        break;
-                    default: abort();
-                    }
-                    tcg_temp_free_i32(tmp2);
-                    tcg_temp_free_i32(tmp);
-                    break;
                 case 8: case 9: case 10: case 11: case 12: case 13:
                     /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                     gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 case 10: /* VMLSL */
                     gen_neon_negl(cpu_V0, size);
                     /* Fall through */
-                case 5: case 8: /* VABAL, VMLAL */
+                case 8: /* VABAL, VMLAL */
                     gen_neon_addl(size);
                     break;
                 case 9: case 11: /* VQDMLAL, VQDMLSL */
-- 
2.20.1
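For readers unfamiliar with the SVE predicated reductions added above, a
lane-by-lane model in plain C may help.  This is a simplified sketch only:
it assumes one predicate byte per element, unlike the packed predicate
layout the DO_VPZ expander walks, and the function name is invented here:

    #include <stdint.h>
    #include <stddef.h>

    /* Model of a predicated SMAX reduction over 32-bit elements,
     * mirroring the DO_VPZ expansion: start from INT32_MIN and
     * fold in only the active (predicated) elements.
     */
    static int32_t smaxv_s(const int32_t *n, const uint8_t *pg, size_t elems)
    {
        int32_t ret = INT32_MIN;
        for (size_t i = 0; i < elems; i++) {
            if (pg[i] & 1) {          /* predicate bit for this element */
                ret = ret >= n[i] ? ret : n[i];
            }
        }
        return ret;
    }

The INIT argument of DO_VPZ is exactly this starting value: the identity
element of the operation (0 for ORV and the adds, -1 for ANDV, the type's
minimum for SMAXV, and so on), so inactive elements leave the result alone.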
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-20-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    |  5 +++++
 target/arm/sve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 36 ++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 12 ++++++++++++
 4 files changed, 93 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_lsl_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_lsl_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_lsl_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_adr_p32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_adr_p64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_adr_s32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_adr_u32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_index_d)(void *vd, uint64_t start,
         d[i] = start + i * incr;
     }
 }
+
+void HELPER(sve_adr_p32)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t sh = simd_data(desc);
+    uint32_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + (m[i] << sh);
+    }
+}
+
+void HELPER(sve_adr_p64)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t sh = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + (m[i] << sh);
+    }
+}
+
+void HELPER(sve_adr_s32)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t sh = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + ((uint64_t)(int32_t)m[i] << sh);
+    }
+}
+
+void HELPER(sve_adr_u32)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t sh = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = n[i] + ((uint64_t)(uint32_t)m[i] << sh);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_RDVL(DisasContext *s, arg_RDVL *a, uint32_t insn)
     return true;
 }
 
+/*
+ *** SVE Compute Vector Address Group
+ */
+
+static bool do_adr(DisasContext *s, arg_rrri *a, gen_helper_gvec_3 *fn)
+{
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vsz, vsz, a->imm, fn);
+    }
+    return true;
+}
+
+static bool trans_ADR_p32(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    return do_adr(s, a, gen_helper_sve_adr_p32);
+}
+
+static bool trans_ADR_p64(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    return do_adr(s, a, gen_helper_sve_adr_p64);
+}
+
+static bool trans_ADR_s32(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    return do_adr(s, a, gen_helper_sve_adr_s32);
+}
+
+static bool trans_ADR_u32(DisasContext *s, arg_rrri *a, uint32_t insn)
+{
+    return do_adr(s, a, gen_helper_sve_adr_u32);
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
 &rr_esz         rd rn esz
 &rri            rd rn imm
+&rrri           rd rn rm imm
 &rri_esz        rd rn imm esz
 &rrr_esz        rd rn rm esz
 &rpr_esz        rd pg rn esz
@@ -XXX,XX +XXX,XX @@
 # Three operand, vector element size
 @rd_rn_rm       ........ esz:2 . rm:5 ... ... rn:5 rd:5 &rrr_esz
 
+# Three operand with "memory" size, aka immediate left shift
+@rd_rn_msz_rm   ........ ... rm:5 .... imm:2 rn:5 rd:5 &rrri
+
 # Two register operand, with governing predicate, vector element size
 @rdn_pg_rm      ........ esz:2 ... ... ... pg:3 rm:5 rd:5 \
                 &rprr_esz rn=%reg_movprfx
@@ -XXX,XX +XXX,XX @@ ASR_zzw         00000100 .. 1 ..... 1000 00 ..... .....  @rd_rn_rm
 LSR_zzw         00000100 .. 1 ..... 1000 01 ..... .....  @rd_rn_rm
 LSL_zzw         00000100 .. 1 ..... 1000 11 ..... .....  @rd_rn_rm
 
+### SVE Compute Vector Address Group
+
+# SVE vector address generation
+ADR_s32         00000100 00 1 ..... 1010 .. ..... .....  @rd_rn_msz_rm
+ADR_u32         00000100 01 1 ..... 1010 .. ..... .....  @rd_rn_msz_rm
+ADR_p32         00000100 10 1 ..... 1010 .. ..... .....  @rd_rn_msz_rm
+ADR_p64         00000100 11 1 ..... 1010 .. ..... .....  @rd_rn_msz_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.17.0


Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.

Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  9 +++++
 target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 21 +++-------
 3 files changed, 86 insertions(+), 15 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r        1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
   VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
   VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+
+  VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
+  VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
+
+  VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
+  VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
+
+  VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
+  VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
 ]
}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
 
     return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
 }
+
+static void gen_mull_s32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    TCGv_i32 lo = tcg_temp_new_i32();
+    TCGv_i32 hi = tcg_temp_new_i32();
+
+    tcg_gen_muls2_i32(lo, hi, rn, rm);
+    tcg_gen_concat_i32_i64(rd, lo, hi);
+
+    tcg_temp_free_i32(lo);
+    tcg_temp_free_i32(hi);
+}
+
+static void gen_mull_u32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    TCGv_i32 lo = tcg_temp_new_i32();
+    TCGv_i32 hi = tcg_temp_new_i32();
+
+    tcg_gen_mulu2_i32(lo, hi, rn, rm);
+    tcg_gen_concat_i32_i64(rd, lo, hi);
+
+    tcg_temp_free_i32(lo);
+    tcg_temp_free_i32(hi);
+}
+
+static bool trans_VMULL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_mull_s8,
+        gen_helper_neon_mull_s16,
+        gen_mull_s32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMULL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_mull_u8,
+        gen_helper_neon_mull_u16,
+        gen_mull_u32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+#define DO_VMLAL(INSN,MULL,ACC) \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)  \
+    {                                                             \
+        static NeonGenTwoOpWidenFn * const opfn[] = {             \
+            gen_helper_neon_##MULL##8,                            \
+            gen_helper_neon_##MULL##16,                           \
+            gen_##MULL##32,                                       \
+            NULL,                                                 \
+        };                                                        \
+        static NeonGenTwo64OpFn * const accfn[] = {               \
+            gen_helper_neon_##ACC##l_u16,                         \
+            gen_helper_neon_##ACC##l_u32,                         \
+            tcg_gen_##ACC##_i64,                                  \
+            NULL,                                                 \
+        };                                                        \
+        return do_long_3d(s, a, opfn[a->size], accfn[a->size]);   \
+    }
+
+DO_VMLAL(VMLAL_S,mull_s,add)
+DO_VMLAL(VMLAL_U,mull_u,add)
+DO_VMLAL(VMLSL_S,mull_s,sub)
+DO_VMLAL(VMLSL_U,mull_u,sub)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 {0, 0, 0, 7}, /* VABAL */
                 {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                 {0, 0, 0, 7}, /* VABDL */
-                {0, 0, 0, 0}, /* VMLAL */
+                {0, 0, 0, 7}, /* VMLAL */
                 {0, 0, 0, 9}, /* VQDMLAL */
-                {0, 0, 0, 0}, /* VMLSL */
+                {0, 0, 0, 7}, /* VMLSL */
                 {0, 0, 0, 9}, /* VQDMLSL */
-                {0, 0, 0, 0}, /* Integer VMULL */
+                {0, 0, 0, 7}, /* Integer VMULL */
                 {0, 0, 0, 9}, /* VQDMULL */
                 {0, 0, 0, 0xa}, /* Polynomial VMULL */
                 {0, 0, 0, 7}, /* Reserved: always UNDEF */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     tmp2 = neon_load_reg(rm, pass);
                 }
                 switch (op) {
-                case 8: case 9: case 10: case 11: case 12: case 13:
-                    /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
+                case 9: case 11: case 13:
+                    /* VQDMLAL, VQDMLSL, VQDMULL */
                     gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
                     break;
                 default: /* 15 is RESERVED: caught earlier */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     /* VQDMULL */
                     gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
                     neon_store_reg64(cpu_V0, rd + pass);
-                } else if (op == 5 || (op >= 8 && op <= 11)) {
+                } else {
                     /* Accumulate. */
                     neon_load_reg64(cpu_V1, rd + pass);
                     switch (op) {
-                    case 10: /* VMLSL */
-                        gen_neon_negl(cpu_V0, size);
-                        /* Fall through */
-                    case 5: case 8: /* VABAL, VMLAL */
-                        gen_neon_addl(size);
-                        break;
                     case 9: case 11: /* VQDMLAL, VQDMLSL */
                         gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
                         if (op == 11) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         abort();
                     }
                     neon_store_reg64(cpu_V0, rd + pass);
-                } else {
-                    /* Write back the result. */
-                    neon_store_reg64(cpu_V0, rd + pass);
                 }
             }
         } else {
-- 
2.20.1
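The opfn-plus-accfn split that do_long_3d encodes reduces, per lane, to a
widening multiply followed by an optional accumulate.  A plain-C sketch of
one VMLAL.S32 lane follows; the name is illustrative, not a QEMU helper:

    #include <stdint.h>

    /* Model of one lane of VMLAL.S32: widen two 32-bit inputs to a
     * 64-bit product and accumulate into the double-width destination.
     * VMLSL is identical except the product is subtracted.
     */
    static int64_t vmlal_s32(int64_t acc, int32_t n, int32_t m)
    {
        return acc + (int64_t)n * (int64_t)m;
    }

Because the product of two 32-bit values always fits in 64 bits, no
saturation is involved here; that is what distinguishes these insns from
the VQDMLAL group converted in the next patch.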
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-19-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    | 12 ++++++
 target/arm/sve_helper.c    | 30 ++++++++++++++
 target/arm/translate-sve.c | 85 ++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 26 ++++++++++++
 4 files changed, 153 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_index_h, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
 DEF_HELPER_FLAGS_4(sve_index_s, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
 DEF_HELPER_FLAGS_4(sve_index_d, TCG_CALL_NO_RWG, void, ptr, i64, i64, i32)
 
+DEF_HELPER_FLAGS_4(sve_asr_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_asr_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsr_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsr_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_lsl_zzw_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zzw_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_lsl_zzw_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZ(sve_neg_h, uint16_t, H1_2, DO_NEG)
 DO_ZPZ(sve_neg_s, uint32_t, H1_4, DO_NEG)
 DO_ZPZ_D(sve_neg_d, uint64_t, DO_NEG)
 
+/* Three-operand expander, unpredicated, in which the third operand is "wide".
+ */
+#define DO_ZZW(NAME, TYPE, TYPEW, H, OP)                       \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
+{                                                              \
+    intptr_t i, opr_sz = simd_oprsz(desc);                     \
+    for (i = 0; i < opr_sz; ) {                                \
+        TYPEW mm = *(TYPEW *)(vm + i);                         \
+        do {                                                   \
+            TYPE nn = *(TYPE *)(vn + H(i));                    \
+            *(TYPE *)(vd + H(i)) = OP(nn, mm);                 \
+            i += sizeof(TYPE);                                 \
+        } while (i & 7);                                       \
+    }                                                          \
+}
+
+DO_ZZW(sve_asr_zzw_b, int8_t, uint64_t, H1, DO_ASR)
+DO_ZZW(sve_lsr_zzw_b, uint8_t, uint64_t, H1, DO_LSR)
+DO_ZZW(sve_lsl_zzw_b, uint8_t, uint64_t, H1, DO_LSL)
+
+DO_ZZW(sve_asr_zzw_h, int16_t, uint64_t, H1_2, DO_ASR)
+DO_ZZW(sve_lsr_zzw_h, uint16_t, uint64_t, H1_2, DO_LSR)
+DO_ZZW(sve_lsl_zzw_h, uint16_t, uint64_t, H1_2, DO_LSL)
+
+DO_ZZW(sve_asr_zzw_s, int32_t, uint64_t, H1_4, DO_ASR)
+DO_ZZW(sve_lsr_zzw_s, uint32_t, uint64_t, H1_4, DO_LSR)
+DO_ZZW(sve_lsl_zzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
+
+#undef DO_ZZW
+
 #undef DO_CLS_B
 #undef DO_CLS_H
 #undef DO_CLZ_B
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool do_mov_z(DisasContext *s, int rd, int rn)
     return do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
 }
 
+/* Initialize a Zreg with replications of a 64-bit immediate. */
+static void do_dupi_z(DisasContext *s, int rd, uint64_t word)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_dup64i(vec_full_reg_offset(s, rd), vsz, vsz, word);
+}
+
 /* Invoke a vector expander on two Pregs.  */
 static bool do_vector2_p(DisasContext *s, GVecGen2Fn *gvec_fn,
                          int esz, int rd, int rn)
@@ -XXX,XX +XXX,XX @@ DO_ZPZW(LSL, lsl)
 
 #undef DO_ZPZW
 
+/*
+ *** SVE Bitwise Shift - Unpredicated Group
+ */
+
+static bool do_shift_imm(DisasContext *s, arg_rri_esz *a, bool asr,
+                         void (*gvec_fn)(unsigned, uint32_t, uint32_t,
+                                         int64_t, uint32_t, uint32_t))
+{
+    if (a->esz < 0) {
+        /* Invalid tsz encoding -- see tszimm_esz. */
+        return false;
+    }
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        /* Shift by element size is architecturally valid.  For
+           arithmetic right-shift, it's the same as by one less.
+           Otherwise it is a zeroing operation.  */
+        if (a->imm >= 8 << a->esz) {
+            if (asr) {
+                a->imm = (8 << a->esz) - 1;
+            } else {
+                do_dupi_z(s, a->rd, 0);
+                return true;
+            }
+        }
+        gvec_fn(a->esz, vec_full_reg_offset(s, a->rd),
+                vec_full_reg_offset(s, a->rn), a->imm, vsz, vsz);
+    }
+    return true;
+}
+
+static bool trans_ASR_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    return do_shift_imm(s, a, true, tcg_gen_gvec_sari);
+}
+
+static bool trans_LSR_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    return do_shift_imm(s, a, false, tcg_gen_gvec_shri);
+}
+
+static bool trans_LSL_zzi(DisasContext *s, arg_rri_esz *a, uint32_t insn)
+{
+    return do_shift_imm(s, a, false, tcg_gen_gvec_shli);
+}
+
+static bool do_zzw_ool(DisasContext *s, arg_rrr_esz *a, gen_helper_gvec_3 *fn)
+{
+    if (fn == NULL) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vsz, vsz, 0, fn);
+    }
+    return true;
+}
+
+#define DO_ZZW(NAME, name) \
+static bool trans_##NAME##_zzw(DisasContext *s, arg_rrr_esz *a,       \
+                               uint32_t insn)                         \
+{                                                                     \
+    static gen_helper_gvec_3 * const fns[4] = {                       \
+        gen_helper_sve_##name##_zzw_b, gen_helper_sve_##name##_zzw_h, \
+        gen_helper_sve_##name##_zzw_s, NULL                           \
+    };                                                                \
+    return do_zzw_ool(s, a, fns[a->esz]);                             \
+}
+
+DO_ZZW(ASR, asr)
+DO_ZZW(LSR, lsr)
+DO_ZZW(LSL, lsl)
+
+#undef DO_ZZW
+
 /*
  *** SVE Integer Multiply-Add Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
 # A combination of tsz:imm3 -- extract (tsz:imm3) - esize
 %tszimm_shl     22:2 5:5 !function=tszimm_shl
 
+# Similarly for the tszh/tszl pair at 22/16 for zzi
+%tszimm16_esz   22:2 16:5 !function=tszimm_esz
+%tszimm16_shr   22:2 16:5 !function=tszimm_shr
+%tszimm16_shl   22:2 16:5 !function=tszimm_shl
+
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
 %reg_movprfx    0:5
@@ -XXX,XX +XXX,XX @@
 
 &rr_esz         rd rn esz
 &rri            rd rn imm
+&rri_esz        rd rn imm esz
 &rrr_esz        rd rn rm esz
 &rpr_esz        rd pg rn esz
 &rprr_s         rd pg rn rm s
@@ -XXX,XX +XXX,XX @@
 @rdn_pg_tszimm  ........ .. ... ... ... pg:3 ..... rd:5 \
                 &rpri_esz rn=%reg_movprfx esz=%tszimm_esz
 
+# Similarly without predicate.
+@rd_rn_tszimm   ........ .. ... ... ...... rn:5 rd:5 \
+                &rri_esz esz=%tszimm16_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9       ........ ........ ...... rn:5 . rd:4 \
                 &rri imm=%imm9_16_10
@@ -XXX,XX +XXX,XX @@ ADDPL           00000100 011 ..... 01010 ...... .....  @rd_rn_i6
 # SVE stack frame size
 RDVL            00000100 101 11111 01010 imm:s6 rd:5
 
+### SVE Bitwise Shift - Unpredicated Group
+
+# SVE bitwise shift by immediate (unpredicated)
+ASR_zzi         00000100 .. 1 ..... 1001 00 ..... ..... \
+                @rd_rn_tszimm imm=%tszimm16_shr
+LSR_zzi         00000100 .. 1 ..... 1001 01 ..... ..... \
+                @rd_rn_tszimm imm=%tszimm16_shr
+LSL_zzi         00000100 .. 1 ..... 1001 11 ..... ..... \
+                @rd_rn_tszimm imm=%tszimm16_shl
+
+# SVE bitwise shift by wide elements (unpredicated)
+# Note esz != 3
+ASR_zzw         00000100 .. 1 ..... 1000 00 ..... .....  @rd_rn_rm
+LSR_zzw         00000100 .. 1 ..... 1000 01 ..... .....  @rd_rn_rm
+LSL_zzw         00000100 .. 1 ..... 1000 11 ..... .....  @rd_rn_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
-- 
2.17.0


Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.

These are the last insns in the group which use the pass-over-each
elements loop, so we can delete that code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 59 ++----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r        1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
   VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
   VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
 
+  VQDMLAL_3d   1111 001 0 1 . .. .... .... 1001 . 0 . 0 .... @3diff
+
   VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
   VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
 
+  VQDMLSL_3d   1111 001 0 1 . .. .... .... 1011 . 0 . 0 .... @3diff
+
   VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
   VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
+
+  VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
 ]
}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_VMLAL(VMLAL_S,mull_s,add)
 DO_VMLAL(VMLAL_U,mull_u,add)
 DO_VMLAL(VMLSL_S,mull_s,sub)
 DO_VMLAL(VMLSL_U,mull_u,sub)
+
+static void gen_VQDMULL_16(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    gen_helper_neon_mull_s16(rd, rn, rm);
+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rd, rd);
+}
+
+static void gen_VQDMULL_32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
+{
+    gen_mull_s32(rd, rn, rm);
+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rd, rd);
+}
+
+static bool trans_VQDMULL_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static void gen_VQDMLAL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
+}
+
+static void gen_VQDMLAL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
+}
+
+static bool trans_VQDMLAL_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLAL_acc_16,
+        gen_VQDMLAL_acc_32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static void gen_VQDMLSL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    gen_helper_neon_negl_u32(rm, rm);
+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
+}
+
+static void gen_VQDMLSL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
+{
+    tcg_gen_neg_i64(rm, rm);
+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
+}
+
+static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLSL_acc_16,
+        gen_VQDMLSL_acc_32,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                 {0, 0, 0, 7}, /* VABDL */
                 {0, 0, 0, 7}, /* VMLAL */
-                {0, 0, 0, 9}, /* VQDMLAL */
+                {0, 0, 0, 7}, /* VQDMLAL */
                 {0, 0, 0, 7}, /* VMLSL */
-                {0, 0, 0, 9}, /* VQDMLSL */
+                {0, 0, 0, 7}, /* VQDMLSL */
                 {0, 0, 0, 7}, /* Integer VMULL */
-                {0, 0, 0, 9}, /* VQDMULL */
+                {0, 0, 0, 7}, /* VQDMULL */
                 {0, 0, 0, 0xa}, /* Polynomial VMULL */
                 {0, 0, 0, 7}, /* Reserved: always UNDEF */
             };
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 0;
         }
-
-        /* Avoid overlapping operands.  Wide source operands are
-           always aligned so will never overlap with wide
-           destinations in problematic ways.  */
-        if (rd == rm) {
-            tmp = neon_load_reg(rm, 1);
-            neon_store_scratch(2, tmp);
-        } else if (rd == rn) {
-            tmp = neon_load_reg(rn, 1);
-            neon_store_scratch(2, tmp);
-        }
-        tmp3 = NULL;
-        for (pass = 0; pass < 2; pass++) {
-            if (pass == 1 && rd == rn) {
-                tmp = neon_load_scratch(2);
-            } else {
-                tmp = neon_load_reg(rn, pass);
-            }
-            if (pass == 1 && rd == rm) {
-                tmp2 = neon_load_scratch(2);
-            } else {
-                tmp2 = neon_load_reg(rm, pass);
-            }
-            switch (op) {
-            case 9: case 11: case 13:
-                /* VQDMLAL, VQDMLSL, VQDMULL */
-                gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-                break;
-            default: /* 15 is RESERVED: caught earlier */
-                abort();
-            }
-            if (op == 13) {
-                /* VQDMULL */
-                gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                neon_store_reg64(cpu_V0, rd + pass);
-            } else {
-                /* Accumulate. */
-                neon_load_reg64(cpu_V1, rd + pass);
-                switch (op) {
-                case 9: case 11: /* VQDMLAL, VQDMLSL */
-                    gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                    if (op == 11) {
-                        gen_neon_negl(cpu_V0, size);
-                    }
-                    gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
-                    break;
-                default:
-                    abort();
-                }
-                neon_store_reg64(cpu_V0, rd + pass);
-            }
-        }
+        abort(); /* all others handled by decodetree */
     } else {
         /* Two registers and a scalar. NB that for ops of this form
          * the ARM ARM labels bit 24 as Q, but it is in our variable
-- 
2.20.1
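The saturating doubling multiply underlying VQDMULL can be modelled per
lane in plain C as below.  This is a sketch of the architectural semantics
only (QEMU builds it from gen_helper_neon_addl_saturate_* as shown above);
the function name is invented and the Q-flag handling is simplified:

    #include <stdint.h>

    /* Model of one lane of VQDMULL.S16: double the 32-bit product of
     * two 16-bit inputs, saturating the doubling step.  The only input
     * pair whose doubled product overflows is INT16_MIN * INT16_MIN.
     */
    static int32_t qdmull_s16(int16_t n, int16_t m, int *qflag)
    {
        int32_t prod = (int32_t)n * (int32_t)m;
        if (prod == 0x40000000) {   /* (-32768 * -32768): 2*prod overflows */
            *qflag = 1;
            return INT32_MAX;
        }
        return prod * 2;
    }

This is why the conversion implements VQDMULL as a plain widening multiply
followed by a saturating self-addition: adding the product to itself is the
doubling, and the saturating-add helper supplies the Q-flag behaviour.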
1
From: Richard Henderson <richard.henderson@linaro.org>
1
Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
2
insn in this group to be converted.
2
3
3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20180516223007.10256-26-richard.henderson@linaro.org
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
---
6
---
8
target/arm/helper-sve.h | 2 +
7
target/arm/neon-dp.decode | 2 ++
9
target/arm/sve_helper.c | 81 ++++++++++++++++++++++++++++++++++++++
8
target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
10
target/arm/translate-sve.c | 34 ++++++++++++++++
9
target/arm/translate.c | 60 ++-------------------------------
11
target/arm/sve.decode | 7 ++++
10
3 files changed, 48 insertions(+), 57 deletions(-)
12
4 files changed, 124 insertions(+)
13
11
14
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
12
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
15
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-sve.h
14
--- a/target/arm/neon-dp.decode
17
+++ b/target/arm/helper-sve.h
15
+++ b/target/arm/neon-dp.decode
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_cpy_z_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
16
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
19
DEF_HELPER_FLAGS_4(sve_cpy_z_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
17
VMULL_U_3d 1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
20
DEF_HELPER_FLAGS_4(sve_cpy_z_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
18
21
19
VQDMULL_3d 1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
22
+DEF_HELPER_FLAGS_4(sve_ext, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
23
+
20
+
24
DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
21
+ VMULL_P_3d 1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
25
DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
22
]
26
DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
23
}
27
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
24
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
28
index XXXXXXX..XXXXXXX 100644
25
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/sve_helper.c
26
--- a/target/arm/translate-neon.inc.c
30
+++ b/target/arm/sve_helper.c
27
+++ b/target/arm/translate-neon.inc.c
31
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_cpy_z_d)(void *vd, void *vg, uint64_t val, uint32_t desc)
28
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
32
d[i] = (pg[H1(i)] & 1 ? val : 0);
29
33
}
30
return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
34
}
31
}
35
+
32
+
36
+/* Big-endian hosts need to frob the byte indicies. If the copy
33
+static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
37
+ * happens to be 8-byte aligned, then no frobbing necessary.
38
+ */
39
+static void swap_memmove(void *vd, void *vs, size_t n)
40
+{
34
+{
41
+ uintptr_t d = (uintptr_t)vd;
35
+ gen_helper_gvec_3 *fn_gvec;
42
+ uintptr_t s = (uintptr_t)vs;
43
+    uintptr_t o = (d | s | n) & 7;
+    size_t i;
+
+#ifndef HOST_WORDS_BIGENDIAN
+    o = 0;
+#endif
+    switch (o) {
+    case 0:
+        memmove(vd, vs, n);
+        break;
+
+    case 4:
+        if (d < s || d >= s + n) {
+            for (i = 0; i < n; i += 4) {
+                *(uint32_t *)H1_4(d + i) = *(uint32_t *)H1_4(s + i);
+            }
+        } else {
+            for (i = n; i > 0; ) {
+                i -= 4;
+                *(uint32_t *)H1_4(d + i) = *(uint32_t *)H1_4(s + i);
+            }
+        }
+        break;
+
+    case 2:
+    case 6:
+        if (d < s || d >= s + n) {
+            for (i = 0; i < n; i += 2) {
+                *(uint16_t *)H1_2(d + i) = *(uint16_t *)H1_2(s + i);
+            }
+        } else {
+            for (i = n; i > 0; ) {
+                i -= 2;
+                *(uint16_t *)H1_2(d + i) = *(uint16_t *)H1_2(s + i);
+            }
+        }
+        break;
+
+    default:
+        if (d < s || d >= s + n) {
+            for (i = 0; i < n; i++) {
+                *(uint8_t *)H1(d + i) = *(uint8_t *)H1(s + i);
+            }
+        } else {
+            for (i = n; i > 0; ) {
+                i -= 1;
+                *(uint8_t *)H1(d + i) = *(uint8_t *)H1(s + i);
+            }
+        }
+        break;
+    }
+}
+
+void HELPER(sve_ext)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t opr_sz = simd_oprsz(desc);
+    size_t n_ofs = simd_data(desc);
+    size_t n_siz = opr_sz - n_ofs;
+
+    if (vd != vm) {
+        swap_memmove(vd, vn + n_ofs, n_siz);
+        swap_memmove(vd + n_siz, vm, n_ofs);
+    } else if (vd != vn) {
+        swap_memmove(vd + n_siz, vd, n_ofs);
+        swap_memmove(vd, vn + n_ofs, n_siz);
+    } else {
+        /* vd == vn == vm.  Need temp space. */
+        ARMVectorReg tmp;
+        swap_memmove(&tmp, vm, n_ofs);
+        swap_memmove(vd, vd + n_ofs, n_siz);
+        memcpy(vd + n_siz, &tmp, n_ofs);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_CPY_z_i(DisasContext *s, arg_CPY_z_i *a, uint32_t insn)
     return true;
 }
 
+/*
+ *** SVE Permute Extract Group
+ */
+
+static bool trans_EXT(DisasContext *s, arg_EXT *a, uint32_t insn)
+{
+    if (!sve_access_check(s)) {
+        return true;
+    }
+
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned n_ofs = a->imm >= vsz ? 0 : a->imm;
+    unsigned n_siz = vsz - n_ofs;
+    unsigned d = vec_full_reg_offset(s, a->rd);
+    unsigned n = vec_full_reg_offset(s, a->rn);
+    unsigned m = vec_full_reg_offset(s, a->rm);
+
+    /* Use host vector move insns if we have appropriate sizes
+     * and no unfortunate overlap.
+     */
+    if (m != d
+        && n_ofs == size_for_gvec(n_ofs)
+        && n_siz == size_for_gvec(n_siz)
+        && (d != n || n_siz <= n_ofs)) {
+        tcg_gen_gvec_mov(0, d, n + n_ofs, n_siz, n_siz);
+        if (n_ofs != 0) {
+            tcg_gen_gvec_mov(0, d + n_siz, m, n_ofs, n_ofs);
+        }
+    } else {
+        tcg_gen_gvec_3_ool(d, n, m, vsz, vsz, n_ofs, gen_helper_sve_ext);
+    }
+    return true;
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
 
 %imm4_16_p1     16:4 !function=plus1
 %imm6_22_5      22:1 5:5
+%imm8_16_10     16:5 10:3
 %imm9_16_10     16:s6 10:3
 
 # A combination of tsz:imm3 -- extract esize.
@@ -XXX,XX +XXX,XX @@ FCPY 00000101 .. 01 .... 110 imm:8 ..... @rdn_pg4
 CPY_m_i         00000101 .. 01 .... 01 . ........ ..... @rdn_pg4 imm=%sh8_i8s
 CPY_z_i         00000101 .. 01 .... 00 . ........ ..... @rdn_pg4 imm=%sh8_i8s
 
+### SVE Permute - Extract Group
+
+# SVE extract vector (immediate offset)
+EXT             00000101 001 ..... 000 ... rm:5 rd:5 \
+                &rrri rn=%reg_movprfx imm=%imm8_16_10
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
--
2.17.0
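
For readers following along: EXT extracts a full vector of bytes from the
concatenation Zn:Zm, starting at the immediate byte offset, and treats an
out-of-range immediate as zero. A minimal reference model of those
byte-level semantics in standalone C (illustrative only; ext_ref() is a
hypothetical name, and this ignores the host big-endian lane swapping
that swap_memmove() above exists to handle):

#include <stdint.h>
#include <string.h>

static void ext_ref(uint8_t *zd, const uint8_t *zn, const uint8_t *zm,
                    unsigned vl, unsigned imm)
{
    uint8_t concat[2 * 256];            /* SVE VL is at most 256 bytes */
    unsigned ofs = imm >= vl ? 0 : imm; /* out-of-range imm just reads Zn */

    memcpy(concat, zn, vl);
    memcpy(concat + vl, zm, vl);
    memcpy(zd, concat + ofs, vl);       /* vl bytes starting at ofs */
}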
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    switch (a->size) {
+    case 0:
+        fn_gvec = gen_helper_neon_pmull_h;
+        break;
+    case 2:
+        if (!dc_isar_feature(aa32_pmull, s)) {
+            return false;
+        }
+        fn_gvec = gen_helper_gvec_pmull_q;
+        break;
+    default:
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
+                       neon_reg_offset(a->vn, 0),
+                       neon_reg_offset(a->vm, 0),
+                       16, 16, 0, fn_gvec);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 {
     int op;
     int q;
-    int rd, rn, rm, rd_ofs, rn_ofs, rm_ofs;
+    int rd, rn, rm, rd_ofs, rm_ofs;
     int size;
     int pass;
     int u;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     size = (insn >> 20) & 3;
     vec_size = q ? 16 : 8;
     rd_ofs = neon_reg_offset(rd, 0);
-    rn_ofs = neon_reg_offset(rn, 0);
     rm_ofs = neon_reg_offset(rm, 0);
 
     if ((insn & (1 << 23)) == 0) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     if (size != 3) {
         op = (insn >> 8) & 0xf;
         if ((insn & (1 << 6)) == 0) {
-            /* Three registers of different lengths. */
-            /* undefreq: bit 0 : UNDEF if size == 0
-             *           bit 1 : UNDEF if size == 1
-             *           bit 2 : UNDEF if size == 2
-             *           bit 3 : UNDEF if U == 1
-             * Note that [2:0] set implies 'always UNDEF'
-             */
-            int undefreq;
-            /* prewiden, src1_wide, src2_wide, undefreq */
-            static const int neon_3reg_wide[16][4] = {
-                {0, 0, 0, 7}, /* VADDL: handled by decodetree */
-                {0, 0, 0, 7}, /* VADDW: handled by decodetree */
-                {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
-                {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
-                {0, 0, 0, 7}, /* VABAL */
-                {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                {0, 0, 0, 7}, /* VABDL */
-                {0, 0, 0, 7}, /* VMLAL */
-                {0, 0, 0, 7}, /* VQDMLAL */
-                {0, 0, 0, 7}, /* VMLSL */
-                {0, 0, 0, 7}, /* VQDMLSL */
-                {0, 0, 0, 7}, /* Integer VMULL */
-                {0, 0, 0, 7}, /* VQDMULL */
-                {0, 0, 0, 0xa}, /* Polynomial VMULL */
-                {0, 0, 0, 7}, /* Reserved: always UNDEF */
-            };
-
-            undefreq = neon_3reg_wide[op][3];
-
-            if ((undefreq & (1 << size)) ||
-                ((undefreq & 8) && u)) {
-                return 1;
-            }
-            if (rd & 1) {
-                return 1;
-            }
-
-            /* Handle polynomial VMULL in a single pass. */
-            if (op == 14) {
-                if (size == 0) {
-                    /* VMULL.P8 */
-                    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
-                                       0, gen_helper_neon_pmull_h);
-                } else {
-                    /* VMULL.P64 */
-                    if (!dc_isar_feature(aa32_pmull, s)) {
-                        return 1;
-                    }
-                    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
-                                       0, gen_helper_gvec_pmull_q);
-                }
-                return 0;
-            }
-            abort(); /* all others handled by decodetree */
+            /* Three registers of different lengths: handled by decodetree */
+            return 1;
         } else {
             /* Two registers and a scalar. NB that for ops of this form
              * the ARM ARM labels bit 24 as Q, but it is in our variable
--
2.20.1
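
VMULL.P8/P64 above are polynomial (carryless) multiplies: partial
products are combined with XOR rather than addition. A self-contained
reference model of one 8-bit lane, for illustration only (pmull8() is a
made-up name, not a QEMU function):

#include <stdint.h>

/* Carryless multiply: each set bit of b XORs in a shifted copy of a. */
static uint16_t pmull8(uint8_t a, uint8_t b)
{
    uint16_t result = 0;

    for (int i = 0; i < 8; i++) {
        if (b & (1u << i)) {
            result ^= (uint16_t)(a << i);
        }
    }
    return result;
}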
From: Eric Auger <eric.auger@redhat.com>

Coverity points out that this can overflow if n > 31,
because it's only doing 32-bit arithmetic. Let's use 1ULL instead
of 1. Also the formulae used to compute n can be replaced by
the level_shift() macro.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1526493784-25328-3-git-send-email-eric.auger@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/smmu-common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -XXX,XX +XXX,XX @@ static inline hwaddr get_table_pte_address(uint64_t pte, int granule_sz)
 static inline hwaddr get_block_pte_address(uint64_t pte, int level,
                                            int granule_sz, uint64_t *bsz)
 {
-    int n = (granule_sz - 3) * (4 - level) + 3;
+    int n = level_shift(level, granule_sz);
 
-    *bsz = 1 << n;
+    *bsz = 1ULL << n;
     return PTE_ADDRESS(pte, n);
 }
--
2.17.0
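
The arithmetic behind the fix, as a standalone illustration (block_size()
is a hypothetical name): with a 4KB granule (granule_sz = 12) at level 0,
n = (12 - 3) * (4 - 0) + 3 = 39, so the shift must be done in 64 bits.

#include <stdint.h>

static uint64_t block_size(int n)
{
    /* "1 << n" is a 32-bit int shift and is undefined for n > 31;
     * promoting the constant keeps the whole computation 64-bit.
     */
    return 1ULL << n;
}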
Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-neon.inc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
 
 static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_s8,
         gen_helper_neon_widen_s16,
         tcg_gen_ext_i32_i64,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 
 static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_u8,
         gen_helper_neon_widen_u16,
         tcg_gen_extu_i32_i64,
--
2.20.1
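
The 'static const' change is more than style: a non-static array of
function pointers may be re-initialised (typically on the stack) every
time the translator function runs, while a static const table is built
once in read-only storage. A generic sketch of the pattern, with
hypothetical functions f0/f1/f2:

extern int f0(int), f1(int), f2(int);

int dispatch(int idx, int x)
{
    static int (* const table[])(int) = { f0, f1, f2 };

    return table[idx](x);
}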
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-18-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-sve.c | 27 +++++++++++++++++++++++++++
 target/arm/sve.decode      | 12 ++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_INDEX_rr(DisasContext *s, arg_INDEX_rr *a, uint32_t insn)
     return true;
 }
 
+/*
+ *** SVE Stack Allocation Group
+ */
+
+static bool trans_ADDVL(DisasContext *s, arg_ADDVL *a, uint32_t insn)
+{
+    TCGv_i64 rd = cpu_reg_sp(s, a->rd);
+    TCGv_i64 rn = cpu_reg_sp(s, a->rn);
+    tcg_gen_addi_i64(rd, rn, a->imm * vec_full_reg_size(s));
+    return true;
+}
+
+static bool trans_ADDPL(DisasContext *s, arg_ADDPL *a, uint32_t insn)
+{
+    TCGv_i64 rd = cpu_reg_sp(s, a->rd);
+    TCGv_i64 rn = cpu_reg_sp(s, a->rn);
+    tcg_gen_addi_i64(rd, rn, a->imm * pred_full_reg_size(s));
+    return true;
+}
+
+static bool trans_RDVL(DisasContext *s, arg_RDVL *a, uint32_t insn)
+{
+    TCGv_i64 reg = cpu_reg(s, a->rd);
+    tcg_gen_movi_i64(reg, a->imm * vec_full_reg_size(s));
+    return true;
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn       ........ esz:2 ... ... ... pg:3 rn:5 rd:5       &rpr_esz
 
+# Two register operands with a 6-bit signed immediate.
+@rd_rn_i6       ........ ... rn:5 ..... imm:s6 rd:5             &rri
+
 # Two register operand, one immediate operand, with predicate,
 # element size encoded as TSZHL.  User must fill in imm.
 @rdn_pg_tszimm  ........ .. ... ... ... pg:3 ..... rd:5 \
@@ -XXX,XX +XXX,XX @@ INDEX_ri 00000100 esz:2 1 imm:s5 010001 rn:5 rd:5
 # SVE index generation (register start, register increment)
 INDEX_rr        00000100 .. 1 ..... 010011 ..... ..... @rd_rn_rm
 
+### SVE Stack Allocation Group
+
+# SVE stack frame adjustment
+ADDVL           00000100 001 ..... 01010 ...... ..... @rd_rn_i6
+ADDPL           00000100 011 ..... 01010 ...... ..... @rd_rn_i6
+
+# SVE stack frame size
+RDVL            00000100 101 11111 01010 imm:s6 rd:5
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
--
2.17.0
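
The semantics being generated are plain scaled arithmetic on the vector
length. A reference model, illustrative only (vl is the vector length in
bytes; a predicate register is VL/8 bytes):

#include <stdint.h>

static uint64_t addvl(uint64_t rn, int64_t imm, unsigned vl)
{
    return rn + imm * vl;        /* ADDVL */
}

static uint64_t addpl(uint64_t rn, int64_t imm, unsigned vl)
{
    return rn + imm * (vl / 8);  /* ADDPL */
}

static uint64_t rdvl(int64_t imm, unsigned vl)
{
    return imm * vl;             /* RDVL */
}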
In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
temporary in do_2shift_env_64(); free it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-neon.inc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
         neon_load_reg64(tmp, a->vm + pass);
         fn(tmp, cpu_env, tmp, constimm);
         neon_store_reg64(tmp, a->vd + pass);
+        tcg_temp_free_i64(tmp);
     }
     tcg_temp_free_i64(constimm);
     return true;
 }
--
2.20.1
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    |  4 ++++
 target/arm/sve_helper.c    | 43 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 21 +++++++++++++++++++
 target/arm/sve.decode      |  4 ++++
 4 files changed, 72 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sve_fexpa_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fexpa_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_fexpa_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_ftssel_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ftssel_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_ftssel_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/cpu_ldst.h"
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
+#include "fpu/softfloat.h"
 
 
 /* Note that vector data is stored in host-endian 64-bit chunks,
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_fexpa_d)(void *vd, void *vn, uint32_t desc)
         d[i] = coeff[idx] | (exp << 52);
     }
 }
+
+void HELPER(sve_ftssel_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+    uint16_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        uint16_t nn = n[i];
+        uint16_t mm = m[i];
+        if (mm & 1) {
+            nn = float16_one;
+        }
+        d[i] = nn ^ (mm & 2) << 14;
+    }
+}
+
+void HELPER(sve_ftssel_s)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+    uint32_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        uint32_t nn = n[i];
+        uint32_t mm = m[i];
+        if (mm & 1) {
+            nn = float32_one;
+        }
+        d[i] = nn ^ (mm & 2) << 30;
+    }
+}
+
+void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn, *m = vm;
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        uint64_t mm = m[i];
+        if (mm & 1) {
+            nn = float64_one;
+        }
+        d[i] = nn ^ (mm & 2) << 62;
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FEXPA(DisasContext *s, arg_rr_esz *a, uint32_t insn)
     return true;
 }
 
+static bool trans_FTSSEL(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_ftssel_h,
+        gen_helper_sve_ftssel_s,
+        gen_helper_sve_ftssel_d,
+    };
+    if (a->esz == 0) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vsz, vsz, 0, fns[a->esz]);
+    }
+    return true;
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ ADR_p64 00000100 11 1 ..... 1010 .. ..... ..... @rd_rn_msz_rm
 # Note esz != 0
 FEXPA           00000100 .. 1 00000 101110 ..... ..... @rd_rn
 
+# SVE floating-point trig select coefficient
+# Note esz != 0
+FTSSEL          00000100 .. 1 ..... 101100 ..... ..... @rd_rn_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
--
2.17.0
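
FTSSEL's per-element behaviour is easiest to see on a single 32-bit lane.
A standalone reference model (ftssel32() is a hypothetical name): bit 0
of the control element selects the constant 1.0 and bit 1 flips the sign,
which is exactly the "nn ^ (mm & 2) << 30" in the helper above.

#include <stdint.h>

static uint32_t ftssel32(uint32_t nn, uint32_t mm)
{
    if (mm & 1) {
        nn = 0x3f800000;            /* IEEE-754 single-precision 1.0 */
    }
    return nn ^ ((mm & 2) << 30);   /* bit 1 of mm flips the sign bit */
}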
Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree.  These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.

The refactoring removes some of the oddities of the old decoder:
 * operands to the operation and accumulation were often
   reversed (taking advantage of the fact that most of these ops
   are commutative); the new code follows the pseudocode order
 * the Q bit in the insn was in a local variable 'u'; in the
   new code it is decoded into a->q

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  15 ++++
 target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  77 ++----------------
 3 files changed, 154 insertions(+), 71 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
   VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
 
   VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
+
+  ##################################################################
+  # 2-regs-plus-scalar grouping:
+  # 1111 001 Q 1 D sz!=11 Vn:4 Vd:4 opc:4 N 1 M 0 Vm:4
+  ##################################################################
+  &2scalar vm vn vd size q
+
+  @2scalar .... ... q:1 . . size:2 .... .... .... . . . . .... \
+           &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+  VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
+
+  VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
+
+  VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
 ]
}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
                        16, 16, 0, fn_gvec);
     return true;
 }
 
+static void gen_neon_dup_low16(TCGv_i32 var)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    tcg_gen_ext16u_i32(var, var);
+    tcg_gen_shli_i32(tmp, var, 16);
+    tcg_gen_or_i32(var, var, tmp);
+    tcg_temp_free_i32(tmp);
+}
+
+static void gen_neon_dup_high16(TCGv_i32 var)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    tcg_gen_andi_i32(var, var, 0xffff0000);
+    tcg_gen_shri_i32(tmp, var, 16);
+    tcg_gen_or_i32(var, var, tmp);
+    tcg_temp_free_i32(tmp);
+}
+
+static inline TCGv_i32 neon_get_scalar(int size, int reg)
+{
+    TCGv_i32 tmp;
+    if (size == 1) {
+        tmp = neon_load_reg(reg & 7, reg >> 4);
+        if (reg & 8) {
+            gen_neon_dup_high16(tmp);
+        } else {
+            gen_neon_dup_low16(tmp);
+        }
+    } else {
+        tmp = neon_load_reg(reg & 15, reg >> 4);
+    }
+    return tmp;
+}
+
+static bool do_2scalar(DisasContext *s, arg_2scalar *a,
+                       NeonGenTwoOpFn *opfn, NeonGenTwoOpFn *accfn)
+{
+    /*
+     * Two registers and a scalar: perform an operation between
+     * the input elements and the scalar, and then possibly
+     * perform an accumulation operation of that result into the
+     * destination.
+     */
+    TCGv_i32 scalar;
+    int pass;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* Bad size (including size == 3, which is a different insn group) */
+        return false;
+    }
+
+    if (a->q && ((a->vd | a->vn) & 1)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    scalar = neon_get_scalar(a->size, a->vm);
+
+    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
+        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
+        opfn(tmp, tmp, scalar);
+        if (accfn) {
+            TCGv_i32 rd = neon_load_reg(a->vd, pass);
+            accfn(tmp, rd, tmp);
+            tcg_temp_free_i32(rd);
+        }
+        neon_store_reg(a->vd, pass, tmp);
+    }
+    tcg_temp_free_i32(scalar);
+    return true;
+}
+
+static bool trans_VMUL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mul_u16,
+        tcg_gen_mul_i32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMLA_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mul_u16,
+        tcg_gen_mul_i32,
+        NULL,
+    };
+    static NeonGenTwoOpFn * const accfn[] = {
+        NULL,
+        gen_helper_neon_add_u16,
+        tcg_gen_add_i32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mul_u16,
+        tcg_gen_mul_i32,
+        NULL,
+    };
+    static NeonGenTwoOpFn * const accfn[] = {
+        NULL,
+        gen_helper_neon_sub_u16,
+        tcg_gen_sub_i32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
 #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16, 7)
 #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0, 5)
 
-static void gen_neon_dup_low16(TCGv_i32 var)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ext16u_i32(var, var);
-    tcg_gen_shli_i32(tmp, var, 16);
-    tcg_gen_or_i32(var, var, tmp);
-    tcg_temp_free_i32(tmp);
-}
-
-static void gen_neon_dup_high16(TCGv_i32 var)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_andi_i32(var, var, 0xffff0000);
-    tcg_gen_shri_i32(tmp, var, 16);
-    tcg_gen_or_i32(var, var, tmp);
-    tcg_temp_free_i32(tmp);
-}
-
 static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
 {
 #ifndef CONFIG_USER_ONLY
@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
 
 #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
 
-static inline void gen_neon_add(int size, TCGv_i32 t0, TCGv_i32 t1)
-{
-    switch (size) {
-    case 0: gen_helper_neon_add_u8(t0, t0, t1); break;
-    case 1: gen_helper_neon_add_u16(t0, t0, t1); break;
-    case 2: tcg_gen_add_i32(t0, t0, t1); break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
-{
-    switch (size) {
-    case 0: gen_helper_neon_sub_u8(t0, t1, t0); break;
-    case 1: gen_helper_neon_sub_u16(t0, t1, t0); break;
-    case 2: tcg_gen_sub_i32(t0, t1, t0); break;
-    default: return;
-    }
-}
-
 static TCGv_i32 neon_load_scratch(int scratch)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static void neon_store_scratch(int scratch, TCGv_i32 var)
     tcg_temp_free_i32(var);
 }
 
-static inline TCGv_i32 neon_get_scalar(int size, int reg)
-{
-    TCGv_i32 tmp;
-    if (size == 1) {
-        tmp = neon_load_reg(reg & 7, reg >> 4);
-        if (reg & 8) {
-            gen_neon_dup_high16(tmp);
-        } else {
-            gen_neon_dup_low16(tmp);
-        }
-    } else {
-        tmp = neon_load_reg(reg & 15, reg >> 4);
-    }
-    return tmp;
-}
-
 static int gen_neon_unzip(int rd, int rm, int size, int q)
 {
     TCGv_ptr pd, pm;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 return 1;
             }
             switch (op) {
+            case 0: /* Integer VMLA scalar */
+            case 4: /* Integer VMLS scalar */
+            case 8: /* Integer VMUL scalar */
+                return 1; /* handled by decodetree */
+
             case 1: /* Float VMLA scalar */
             case 5: /* Floating point VMLS scalar */
             case 9: /* Floating point VMUL scalar */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     return 1;
                 }
                 /* fall through */
-            case 0: /* Integer VMLA scalar */
-            case 4: /* Integer VMLS scalar */
-            case 8: /* Integer VMUL scalar */
             case 12: /* VQDMULH scalar */
             case 13: /* VQRDMULH scalar */
                 if (u && ((rd | rn) & 1)) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     } else {
                         gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
                     }
-                } else if (op & 1) {
+                } else {
                     TCGv_ptr fpstatus = get_fpstatus_ptr(1);
                     gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
                     tcg_temp_free_ptr(fpstatus);
-                } else {
-                    switch (size) {
-                    case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break;
-                    case 1: gen_helper_neon_mul_u16(tmp, tmp, tmp2); break;
-                    case 2: tcg_gen_mul_i32(tmp, tmp, tmp2); break;
-                    default: abort();
-                    }
                 }
                 tcg_temp_free_i32(tmp2);
                 if (op < 8) {
                     /* Accumulate. */
                     tmp2 = neon_load_reg(rd, pass);
                     switch (op) {
-                    case 0:
-                        gen_neon_add(size, tmp, tmp2);
-                        break;
                     case 1:
                     {
                         TCGv_ptr fpstatus = get_fpstatus_ptr(1);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         tcg_temp_free_ptr(fpstatus);
                         break;
                     }
-                    case 4:
-                        gen_neon_rsb(size, tmp, tmp2);
-                        break;
                     case 5:
                     {
                         TCGv_ptr fpstatus = get_fpstatus_ptr(1);
--
2.20.1
329
diff view generated by jsdifflib
1
From: Richard Henderson <richard.henderson@linaro.org>
1
Convert the float versions of VMLA, VMLS and VMUL in the Neon
2
2-reg-scalar group to decodetree.
2
3
3
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
4
Message-id: 20180516223007.10256-5-richard.henderson@linaro.org
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
---
5
---
7
target/arm/translate-sve.c | 127 +++++++++++++++++++++++++++++++++++++
6
As noted in the comment on the WRAP_FP_FN macro, we could have
8
target/arm/sve.decode | 20 ++++++
7
had a do_2scalar_fp() function, but for 3 insns it seemed
9
2 files changed, 147 insertions(+)
8
simpler to just do the wrapping to get hold of the fpstatus ptr.
9
(These are the only fp insns in the group.)
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
---
12
target/arm/neon-dp.decode | 3 ++
13
target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
14
target/arm/translate.c | 37 ++-----------------
15
3 files changed, 71 insertions(+), 34 deletions(-)
10
16
11
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
17
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
12
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate-sve.c
19
--- a/target/arm/neon-dp.decode
14
+++ b/target/arm/translate-sve.c
20
+++ b/target/arm/neon-dp.decode
15
@@ -XXX,XX +XXX,XX @@
21
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
16
* Implement all of the translator functions referenced by the decoder.
22
&2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
17
*/
23
18
24
VMLA_2sc 1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
19
+/* Return the offset info CPUARMState of the predicate vector register Pn.
25
+ VMLA_F_2sc 1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
20
+ * Note for this purpose, FFR is P16.
26
21
+ */
27
VMLS_2sc 1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
22
+static inline int pred_full_reg_offset(DisasContext *s, int regno)
28
+ VMLS_F_2sc 1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
23
+{
29
24
+ return offsetof(CPUARMState, vfp.pregs[regno]);
30
VMUL_2sc 1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
25
+}
31
+ VMUL_F_2sc 1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
26
+
32
]
27
+/* Return the byte size of the whole predicate register, VL / 64. */
33
}
28
+static inline int pred_full_reg_size(DisasContext *s)
34
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
29
+{
35
index XXXXXXX..XXXXXXX 100644
30
+ return s->sve_len >> 3;
36
--- a/target/arm/translate-neon.inc.c
31
+}
37
+++ b/target/arm/translate-neon.inc.c
32
+
38
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
33
/* Invoke a vector expander on two Zregs. */
39
34
static bool do_vector2_z(DisasContext *s, GVecGen2Fn *gvec_fn,
40
return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
35
int esz, int rd, int rn)
36
@@ -XXX,XX +XXX,XX @@ static bool trans_BIC_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
37
{
38
return do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
39
}
41
}
40
+
42
+
41
+/*
43
+/*
42
+ *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
44
+ * Rather than have a float-specific version of do_2scalar just for
45
+ * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
46
+ * a NeonGenTwoOpFn.
43
+ */
47
+ */
44
+
48
+#define WRAP_FP_FN(WRAPNAME, FUNC) \
45
+/* Subroutine loading a vector register at VOFS of LEN bytes.
49
+ static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
46
+ * The load should begin at the address Rn + IMM.
50
+ { \
47
+ */
51
+ TCGv_ptr fpstatus = get_fpstatus_ptr(1); \
48
+
52
+ FUNC(rd, rn, rm, fpstatus); \
49
+static void do_ldr(DisasContext *s, uint32_t vofs, uint32_t len,
53
+ tcg_temp_free_ptr(fpstatus); \
50
+ int rn, int imm)
51
+{
52
+ uint32_t len_align = QEMU_ALIGN_DOWN(len, 8);
53
+ uint32_t len_remain = len % 8;
54
+ uint32_t nparts = len / 8 + ctpop8(len_remain);
55
+ int midx = get_mem_index(s);
56
+ TCGv_i64 addr, t0, t1;
57
+
58
+ addr = tcg_temp_new_i64();
59
+ t0 = tcg_temp_new_i64();
60
+
61
+ /* Note that unpredicated load/store of vector/predicate registers
62
+ * are defined as a stream of bytes, which equates to little-endian
63
+ * operations on larger quantities. There is no nice way to force
64
+ * a little-endian load for aarch64_be-linux-user out of line.
65
+ *
66
+ * Attempt to keep code expansion to a minimum by limiting the
67
+ * amount of unrolling done.
68
+ */
69
+ if (nparts <= 4) {
70
+ int i;
71
+
72
+ for (i = 0; i < len_align; i += 8) {
73
+ tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + i);
74
+ tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEQ);
75
+ tcg_gen_st_i64(t0, cpu_env, vofs + i);
76
+ }
77
+ } else {
78
+ TCGLabel *loop = gen_new_label();
79
+ TCGv_ptr tp, i = tcg_const_local_ptr(0);
80
+
81
+ gen_set_label(loop);
82
+
83
+ /* Minimize the number of local temps that must be re-read from
84
+ * the stack each iteration. Instead, re-compute values other
85
+ * than the loop counter.
86
+ */
87
+ tp = tcg_temp_new_ptr();
88
+ tcg_gen_addi_ptr(tp, i, imm);
89
+ tcg_gen_extu_ptr_i64(addr, tp);
90
+ tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, rn));
91
+
92
+ tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEQ);
93
+
94
+ tcg_gen_add_ptr(tp, cpu_env, i);
95
+ tcg_gen_addi_ptr(i, i, 8);
96
+ tcg_gen_st_i64(t0, tp, vofs);
97
+ tcg_temp_free_ptr(tp);
98
+
99
+ tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop);
100
+ tcg_temp_free_ptr(i);
101
+ }
54
+ }
102
+
55
+
103
+ /* Predicate register loads can be any multiple of 2.
56
+WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
104
+ * Note that we still store the entire 64-bit unit into cpu_env.
57
+WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
105
+ */
58
+WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
106
+ if (len_remain) {
107
+ tcg_gen_addi_i64(addr, cpu_reg_sp(s, rn), imm + len_align);
108
+
59
+
109
+ switch (len_remain) {
60
+static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
110
+ case 2:
61
+{
111
+ case 4:
62
+ static NeonGenTwoOpFn * const opfn[] = {
112
+ case 8:
63
+ NULL,
113
+ tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LE | ctz32(len_remain));
64
+ NULL, /* TODO: fp16 support */
114
+ break;
65
+ gen_VMUL_F_mul,
66
+ NULL,
67
+ };
115
+
68
+
116
+ case 6:
69
+ return do_2scalar(s, a, opfn[a->size], NULL);
117
+ t1 = tcg_temp_new_i64();
118
+ tcg_gen_qemu_ld_i64(t0, addr, midx, MO_LEUL);
119
+ tcg_gen_addi_i64(addr, addr, 4);
120
+ tcg_gen_qemu_ld_i64(t1, addr, midx, MO_LEUW);
121
+ tcg_gen_deposit_i64(t0, t0, t1, 32, 32);
122
+ tcg_temp_free_i64(t1);
123
+ break;
124
+
125
+ default:
126
+ g_assert_not_reached();
127
+ }
128
+ tcg_gen_st_i64(t0, cpu_env, vofs + len_align);
129
+ }
130
+ tcg_temp_free_i64(addr);
131
+ tcg_temp_free_i64(t0);
132
+}
70
+}
133
+
71
+
134
+static bool trans_LDR_zri(DisasContext *s, arg_rri *a, uint32_t insn)
72
+static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
135
+{
73
+{
136
+ if (sve_access_check(s)) {
74
+ static NeonGenTwoOpFn * const opfn[] = {
137
+ int size = vec_full_reg_size(s);
75
+ NULL,
138
+ int off = vec_full_reg_offset(s, a->rd);
76
+ NULL, /* TODO: fp16 support */
139
+ do_ldr(s, off, size, a->rn, a->imm * size);
77
+ gen_VMUL_F_mul,
140
+ }
78
+ NULL,
141
+ return true;
79
+ };
80
+ static NeonGenTwoOpFn * const accfn[] = {
81
+ NULL,
82
+ NULL, /* TODO: fp16 support */
83
+ gen_VMUL_F_add,
84
+ NULL,
85
+ };
86
+
87
+ return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
142
+}
88
+}
143
+
89
+
144
+static bool trans_LDR_pri(DisasContext *s, arg_rri *a, uint32_t insn)
90
+static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
145
+{
91
+{
146
+ if (sve_access_check(s)) {
92
+ static NeonGenTwoOpFn * const opfn[] = {
147
+ int size = pred_full_reg_size(s);
93
+ NULL,
148
+ int off = pred_full_reg_offset(s, a->rd);
94
+ NULL, /* TODO: fp16 support */
149
+ do_ldr(s, off, size, a->rn, a->imm * size);
95
+ gen_VMUL_F_mul,
150
+ }
96
+ NULL,
151
+ return true;
97
+ };
98
+ static NeonGenTwoOpFn * const accfn[] = {
99
+ NULL,
100
+ NULL, /* TODO: fp16 support */
101
+ gen_VMUL_F_sub,
102
+ NULL,
103
+ };
104
+
105
+ return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
152
+}
106
+}
153
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
107
diff --git a/target/arm/translate.c b/target/arm/translate.c
154
index XXXXXXX..XXXXXXX 100644
108
index XXXXXXX..XXXXXXX 100644
155
--- a/target/arm/sve.decode
109
--- a/target/arm/translate.c
156
+++ b/target/arm/sve.decode
110
+++ b/target/arm/translate.c
157
@@ -XXX,XX +XXX,XX @@
111
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
158
# This file is processed by scripts/decodetree.py
112
case 0: /* Integer VMLA scalar */
159
#
113
case 4: /* Integer VMLS scalar */
160
114
case 8: /* Integer VMUL scalar */
161
+###########################################################################
115
- return 1; /* handled by decodetree */
162
+# Named fields. These are primarily for disjoint fields.
116
-
117
case 1: /* Float VMLA scalar */
118
case 5: /* Floating point VMLS scalar */
119
case 9: /* Floating point VMUL scalar */
120
- if (size == 1) {
121
- return 1;
122
- }
123
- /* fall through */
124
+ return 1; /* handled by decodetree */
163
+
125
+
164
+%imm9_16_10 16:s6 10:3
126
case 12: /* VQDMULH scalar */
165
+
127
case 13: /* VQRDMULH scalar */
166
###########################################################################
128
if (u && ((rd | rn) & 1)) {
167
# Named attribute sets. These are used to make nice(er) names
129
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
168
# when creating helpers common to those for the individual
130
} else {
169
# instruction patterns.
131
gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
170
132
}
171
+&rri rd rn imm
133
- } else if (op == 13) {
172
&rrr_esz rd rn rm esz
134
+ } else {
173
135
if (size == 1) {
174
###########################################################################
136
gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
175
@@ -XXX,XX +XXX,XX @@
137
} else {
176
# Three operand with unused vector element size
138
gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
177
@rd_rn_rm_e0 ........ ... rm:5 ... ... rn:5 rd:5 &rrr_esz esz=0
139
}
178
140
- } else {
179
+# Basic Load/Store with 9-bit immediate offset
141
- TCGv_ptr fpstatus = get_fpstatus_ptr(1);
180
+@pd_rn_i9 ........ ........ ...... rn:5 . rd:4 \
142
- gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
181
+ &rri imm=%imm9_16_10
143
- tcg_temp_free_ptr(fpstatus);
182
+@rd_rn_i9 ........ ........ ...... rn:5 rd:5 \
144
}
183
+ &rri imm=%imm9_16_10
145
tcg_temp_free_i32(tmp2);
184
+
146
- if (op < 8) {
185
###########################################################################
147
- /* Accumulate. */
186
# Instruction patterns. Grouped according to the SVE encodingindex.xhtml.
148
- tmp2 = neon_load_reg(rd, pass);
187
149
- switch (op) {
188
@@ -XXX,XX +XXX,XX @@ AND_zzz 00000100 00 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
150
- case 1:
189
ORR_zzz 00000100 01 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
151
- {
190
EOR_zzz 00000100 10 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
152
- TCGv_ptr fpstatus = get_fpstatus_ptr(1);
191
BIC_zzz 00000100 11 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
153
- gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
192
+
154
- tcg_temp_free_ptr(fpstatus);
193
+### SVE Memory - 32-bit Gather and Unsized Contiguous Group
155
- break;
194
+
156
- }
195
+# SVE load predicate register
157
- case 5:
196
+LDR_pri 10000101 10 ...... 000 ... ..... 0 .... @pd_rn_i9
158
- {
197
+
159
- TCGv_ptr fpstatus = get_fpstatus_ptr(1);
198
+# SVE load vector register
160
- gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
199
+LDR_zri 10000101 10 ...... 010 ... ..... ..... @rd_rn_i9
161
- tcg_temp_free_ptr(fpstatus);
162
- break;
163
- }
164
- default:
165
- abort();
166
- }
167
- tcg_temp_free_i32(tmp2);
168
- }
169
neon_store_reg(rd, pass, tmp);
170
}
171
break;
200
--
172
--
201
2.17.0
173
2.20.1
202
174
203
175
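
For clarity, here is what one use of the macro above expands to; this is
just the preprocessor output of WRAP_FP_FN(gen_VMUL_F_mul,
gen_helper_vfp_muls), reformatted:

static void gen_VMUL_F_mul(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm)
{
    TCGv_ptr fpstatus = get_fpstatus_ptr(1);
    gen_helper_vfp_muls(rd, rn, rm, fpstatus);
    tcg_temp_free_ptr(fpstatus);
}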
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-25-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    |  10 ++++
 target/arm/sve_helper.c    | 108 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c |  88 ++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  19 ++++++-
 4 files changed, 224 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_uqaddi_s, TCG_CALL_NO_RWG, void, ptr, ptr, s64, i32)
 DEF_HELPER_FLAGS_4(sve_uqaddi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 DEF_HELPER_FLAGS_4(sve_uqsubi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
 
+DEF_HELPER_FLAGS_5(sve_cpy_m_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_5(sve_cpy_m_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_5(sve_cpy_m_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_5(sve_cpy_m_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(sve_cpy_z_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_cpy_z_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_cpy_z_s, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(sve_cpy_z_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_uqsubi_d)(void *d, void *a, uint64_t b, uint32_t desc)
         *(uint64_t *)(d + i) = (ai < b ? 0 : ai - b);
     }
 }
+
+/* Two operand predicated copy immediate with merge.  All valid immediates
+ * can fit within 17 signed bits in the simd_data field.
+ */
+void HELPER(sve_cpy_m_b)(void *vd, void *vn, void *vg,
+                         uint64_t mm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    mm = dup_const(MO_8, mm);
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        uint64_t pp = expand_pred_b(pg[H1(i)]);
+        d[i] = (mm & pp) | (nn & ~pp);
+    }
+}
+
+void HELPER(sve_cpy_m_h)(void *vd, void *vn, void *vg,
+                         uint64_t mm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    mm = dup_const(MO_16, mm);
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        uint64_t pp = expand_pred_h(pg[H1(i)]);
+        d[i] = (mm & pp) | (nn & ~pp);
+    }
+}
+
+void HELPER(sve_cpy_m_s)(void *vd, void *vn, void *vg,
+                         uint64_t mm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    mm = dup_const(MO_32, mm);
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        uint64_t pp = expand_pred_s(pg[H1(i)]);
+        d[i] = (mm & pp) | (nn & ~pp);
+    }
+}
+
+void HELPER(sve_cpy_m_d)(void *vd, void *vn, void *vg,
+                         uint64_t mm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd, *n = vn;
+    uint8_t *pg = vg;
+
+    for (i = 0; i < opr_sz; i += 1) {
+        uint64_t nn = n[i];
+        d[i] = (pg[H1(i)] & 1 ? mm : nn);
+    }
+}
+
+void HELPER(sve_cpy_z_b)(void *vd, void *vg, uint64_t val, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+
+    val = dup_const(MO_8, val);
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = val & expand_pred_b(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_cpy_z_h)(void *vd, void *vg, uint64_t val, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+
+    val = dup_const(MO_16, val);
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = val & expand_pred_h(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_cpy_z_s)(void *vd, void *vg, uint64_t val, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+
+    val = dup_const(MO_32, val);
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = val & expand_pred_s(pg[H1(i)]);
+    }
+}
+
+void HELPER(sve_cpy_z_d)(void *vd, void *vg, uint64_t val, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+    uint64_t *d = vd;
+    uint8_t *pg = vg;
+
+    for (i = 0; i < opr_sz; i += 1) {
+        d[i] = (pg[H1(i)] & 1 ? val : 0);
+    }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static inline int plus1(int x)
     return x + 1;
 }
 
+/* The SH bit is in bit 8.  Extract the low 8 and shift.  */
+static inline int expand_imm_sh8s(int x)
+{
+    return (int8_t)x << (x & 0x100 ? 8 : 0);
+}
+
 /*
  * Include the generated decoder.
  */
@@ -XXX,XX +XXX,XX @@ static bool trans_DUPM(DisasContext *s, arg_DUPM *a, uint32_t insn)
     return true;
 }
 
+/*
+ *** SVE Integer Wide Immediate - Predicated Group
+ */
+
+/* Implement all merging copies.  This is used for CPY (immediate),
+ * FCPY, CPY (scalar), CPY (SIMD&FP scalar).
+ */
+static void do_cpy_m(DisasContext *s, int esz, int rd, int rn, int pg,
+                     TCGv_i64 val)
+{
+    typedef void gen_cpy(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64, TCGv_i32);
+    static gen_cpy * const fns[4] = {
+        gen_helper_sve_cpy_m_b, gen_helper_sve_cpy_m_h,
+        gen_helper_sve_cpy_m_s, gen_helper_sve_cpy_m_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+    TCGv_ptr t_zd = tcg_temp_new_ptr();
+    TCGv_ptr t_zn = tcg_temp_new_ptr();
+    TCGv_ptr t_pg = tcg_temp_new_ptr();
+
+    tcg_gen_addi_ptr(t_zd, cpu_env, vec_full_reg_offset(s, rd));
+    tcg_gen_addi_ptr(t_zn, cpu_env, vec_full_reg_offset(s, rn));
+    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
+
+    fns[esz](t_zd, t_zn, t_pg, val, desc);
+
+    tcg_temp_free_ptr(t_zd);
+    tcg_temp_free_ptr(t_zn);
+    tcg_temp_free_ptr(t_pg);
+    tcg_temp_free_i32(desc);
+}
+
+static bool trans_FCPY(DisasContext *s, arg_FCPY *a, uint32_t insn)
+{
+    if (a->esz == 0) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        /* Decode the VFP immediate.  */
+        uint64_t imm = vfp_expand_imm(a->esz, a->imm);
+        TCGv_i64 t_imm = tcg_const_i64(imm);
+        do_cpy_m(s, a->esz, a->rd, a->rn, a->pg, t_imm);
+        tcg_temp_free_i64(t_imm);
+    }
+    return true;
+}
+
+static bool trans_CPY_m_i(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    if (a->esz == 0 && extract32(insn, 13, 1)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        TCGv_i64 t_imm = tcg_const_i64(a->imm);
+        do_cpy_m(s, a->esz, a->rd, a->rn, a->pg, t_imm);
+        tcg_temp_free_i64(t_imm);
+    }
+    return true;
+}
+
+static bool trans_CPY_z_i(DisasContext *s, arg_CPY_z_i *a, uint32_t insn)
+{
+    static gen_helper_gvec_2i * const fns[4] = {
+        gen_helper_sve_cpy_z_b, gen_helper_sve_cpy_z_h,
+        gen_helper_sve_cpy_z_s, gen_helper_sve_cpy_z_d,
+    };
+
+    if (a->esz == 0 && extract32(insn, 13, 1)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        TCGv_i64 t_imm = tcg_const_i64(a->imm);
+        tcg_gen_gvec_2i_ool(vec_full_reg_offset(s, a->rd),
+                            pred_full_reg_offset(s, a->pg),
+                            t_imm, vsz, vsz, 0, fns[a->esz]);
+        tcg_temp_free_i64(t_imm);
+    }
+    return true;
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
 ###########################################################################
 # Named fields.  These are primarily for disjoint fields.
 
-%imm4_16_p1             16:4 !function=plus1
+%imm4_16_p1     16:4 !function=plus1
 %imm6_22_5      22:1 5:5
 %imm9_16_10     16:s6 10:3
 
@@ -XXX,XX +XXX,XX @@
 %tszimm16_shr   22:2 16:5 !function=tszimm_shr
 %tszimm16_shl   22:2 16:5 !function=tszimm_shl
 
+# Signed 8-bit immediate, optionally shifted left by 8.
+%sh8_i8s        5:9 !function=expand_imm_sh8s
+
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
 %reg_movprfx    0:5
@@ -XXX,XX +XXX,XX @@
 @rd_rn_tszimm   ........ .. ... ... ...... rn:5 rd:5 \
                 &rri_esz esz=%tszimm16_esz
 
+# Two register operand, one immediate operand, with 4-bit predicate.
+# User must fill in imm.
+@rdn_pg4        ........ esz:2 .. pg:4 ... ........ rd:5 \
+                &rpri_esz rn=%reg_movprfx
+
 # Two register operand, one encoded bitmask.
 @rdn_dbm        ........ .. .... dbm:13 rd:5 \
                 &rr_dbm rn=%reg_movprfx
@@ -XXX,XX +XXX,XX @@ AND_zzi 00000101 10 0000 ............. ..... @rdn_dbm
 # SVE broadcast bitmask immediate
 DUPM            00000101 11 0000 dbm:13 rd:5
 
+### SVE Integer Wide Immediate - Predicated Group
+
+# SVE copy floating-point immediate (predicated)
+FCPY            00000101 .. 01 .... 110 imm:8 ..... @rdn_pg4
+
+# SVE copy integer immediate (predicated)
+CPY_m_i         00000101 .. 01 .... 01 . ........ ..... @rdn_pg4 imm=%sh8_i8s
+CPY_z_i         00000101 .. 01 .... 00 . ........ ..... @rdn_pg4 imm=%sh8_i8s
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
--
2.17.0

Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 +++
 target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
 target/arm/translate.c          | 42 ++-------------------------------
 3 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
   VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
   VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
+
+  VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
+  VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 ]
}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
 
     return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 }
+
+WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
+WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
+WRAP_ENV_FN(gen_VQRDMULH_16, gen_helper_neon_qrdmulh_s16)
+WRAP_ENV_FN(gen_VQRDMULH_32, gen_helper_neon_qrdmulh_s32)
+
+static bool trans_VQDMULH_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_VQDMULH_16,
+        gen_VQDMULH_32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpFn * const opfn[] = {
+        NULL,
+        gen_VQRDMULH_16,
+        gen_VQRDMULH_32,
+        NULL,
+    };
+
+    return do_2scalar(s, a, opfn[a->size], NULL);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
 
 #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
 
-static TCGv_i32 neon_load_scratch(int scratch)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
-    return tmp;
-}
-
-static void neon_store_scratch(int scratch, TCGv_i32 var)
-{
-    tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
-    tcg_temp_free_i32(var);
-}
-
 static int gen_neon_unzip(int rd, int rm, int size, int q)
 {
     TCGv_ptr pd, pm;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             case 1: /* Float VMLA scalar */
             case 5: /* Floating point VMLS scalar */
             case 9: /* Floating point VMUL scalar */
-                return 1; /* handled by decodetree */
-
             case 12: /* VQDMULH scalar */
             case 13: /* VQRDMULH scalar */
-                if (u && ((rd | rn) & 1)) {
-                    return 1;
-                }
-                tmp = neon_get_scalar(size, rm);
-                neon_store_scratch(0, tmp);
-                for (pass = 0; pass < (u ? 4 : 2); pass++) {
-                    tmp = neon_load_scratch(0);
-                    tmp2 = neon_load_reg(rn, pass);
-                    if (op == 12) {
-                        if (size == 1) {
-                            gen_helper_neon_qdmulh_s16(tmp, cpu_env, tmp, tmp2);
-                        } else {
-                            gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                        }
-                    } else {
-                        if (size == 1) {
-                            gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
-                        } else {
-                            gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                        }
-                    }
-                    tcg_temp_free_i32(tmp2);
-                    neon_store_reg(rd, pass, tmp);
-                }
-                break;
+                return 1; /* handled by decodetree */
+
             case 3: /* VQDMLAL scalar */
             case 7: /* VQDMLSL scalar */
             case 11: /* VQDMULL scalar */
--
2.20.1
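
Two of the building blocks above are worth unpacking. The merging copy
combines dup_const(), which replicates an element across a 64-bit lane,
with expand_pred_b(), which turns each predicate bit into a byte mask;
reference models of both, with hypothetical _ref names:

#include <stdint.h>

static uint64_t dup_const_b_ref(uint8_t v)
{
    return v * 0x0101010101010101ull;   /* copy the byte into all 8 lanes */
}

static uint64_t expand_pred_b_ref(uint8_t pg)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        if (pg & (1u << i)) {
            r |= 0xffull << (i * 8);    /* active bit -> all-ones byte */
        }
    }
    return r;   /* then d = (mm & r) | (nn & ~r) merges per element */
}

And VQRDMULH (saturating rounding doubling multiply returning the high
half), as a standalone 16-bit reference (illustrative only; note only
-32768 * -32768 can saturate):

#include <stdint.h>

static int16_t vqrdmulh16_ref(int16_t a, int16_t b)
{
    int64_t p = ((int64_t)a * b * 2 + (1 << 15)) >> 16;

    if (p > INT16_MAX) {
        p = INT16_MAX;
    }
    return (int16_t)p;
}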
1
From: Richard Henderson <richard.henderson@linaro.org>
1
Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
2
group to decodetree.
2
3
3
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
4
Message-id: 20180516223007.10256-8-richard.henderson@linaro.org
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
---
6
---
7
target/arm/cpu.h | 4 +
7
target/arm/neon-dp.decode | 3 ++
8
target/arm/helper-sve.h | 3 +
8
target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
9
target/arm/sve_helper.c | 84 +++++++++++++++
9
target/arm/translate.c | 38 +----------------
10
target/arm/translate-sve.c | 209 +++++++++++++++++++++++++++++++++++++
10
3 files changed, 79 insertions(+), 36 deletions(-)
11
target/arm/sve.decode | 31 ++++++
12
5 files changed, 331 insertions(+)
13
11
14
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
12
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
15
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/cpu.h
14
--- a/target/arm/neon-dp.decode
17
+++ b/target/arm/cpu.h
15
+++ b/target/arm/neon-dp.decode
18
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
16
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
19
17
20
#ifdef TARGET_AARCH64
18
VQDMULH_2sc 1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
21
/* Store FFR as pregs[16] to make it easier to treat as any other. */
19
VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
22
+#define FFR_PRED_NUM 16
20
+
23
ARMPredicateReg pregs[17];
21
+ VQRDMLAH_2sc 1111 001 . 1 . .. .... .... 1110 . 1 . 0 .... @2scalar
24
/* Scratch space for aa64 sve predicate temporary. */
22
+ VQRDMLSH_2sc 1111 001 . 1 . .. .... .... 1111 . 1 . 0 .... @2scalar
25
ARMPredicateReg preg_tmp;
23
]
26
@@ -XXX,XX +XXX,XX @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
27
return &env->vfp.zregs[regno].d[0];
28
}
24
}
29
25
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
30
+/* Shared between translate-sve.c and sve_helper.c. */
26
index XXXXXXX..XXXXXXX 100644
31
+extern const uint64_t pred_esz_masks[4];
27
--- a/target/arm/translate-neon.inc.c
28
+++ b/target/arm/translate-neon.inc.c
29
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
30
31
return do_2scalar(s, a, opfn[a->size], NULL);
32
}
32
+
33
+
33
#endif
34
+static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
34
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
35
+ NeonGenThreeOpEnvFn *opfn)
35
index XXXXXXX..XXXXXXX 100644
36
+{
36
--- a/target/arm/helper-sve.h
37
+ /*
37
+++ b/target/arm/helper-sve.h
38
+ * VQRDMLAH/VQRDMLSH: this is like do_2scalar, but the opfn
38
@@ -XXX,XX +XXX,XX @@
39
+ * performs a kind of fused op-then-accumulate using a helper
39
DEF_HELPER_FLAGS_2(sve_predtest1, TCG_CALL_NO_WG, i32, i64, i64)
40
+ * function that takes all of rd, rn and the scalar at once.
40
DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
41
+ */
41
42
+ TCGv_i32 scalar;
42
+DEF_HELPER_FLAGS_3(sve_pfirst, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
43
+ int pass;
43
+DEF_HELPER_FLAGS_3(sve_pnext, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
44
+
44
+
45
DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
45
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
46
DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
46
+ return false;
47
DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
48
+#define FFR_PRED_NUM 16
ARMPredicateReg pregs[17];
/* Scratch space for aa64 sve predicate temporary. */
ARMPredicateReg preg_tmp;
@@ -XXX,XX +XXX,XX @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
return &env->vfp.zregs[regno].d[0];
}

+/* Shared between translate-sve.c and sve_helper.c. */
+extern const uint64_t pred_esz_masks[4];
+
#endif
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@
DEF_HELPER_FLAGS_2(sve_predtest1, TCG_CALL_NO_WG, i32, i64, i64)
DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)

+DEF_HELPER_FLAGS_3(sve_pfirst, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(sve_pnext, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
+
DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ LOGICAL_PPPP(sve_nand_pppp, DO_NAND)
#undef DO_NAND
#undef DO_SEL
#undef LOGICAL_PPPP
+
+/* Similar to the ARM LastActiveElement pseudocode function, except the
+ result is multiplied by the element size. This includes the not found
+ indication; e.g. not found for esz=3 is -8. */
+static intptr_t last_active_element(uint64_t *g, intptr_t words, intptr_t esz)
+{
+ uint64_t mask = pred_esz_masks[esz];
+ intptr_t i = words;
+
+ do {
+ uint64_t this_g = g[--i] & mask;
+ if (this_g) {
+ return i * 64 + (63 - clz64(this_g));
+ }
+ } while (i > 0);
+ return (intptr_t)-1 << esz;
+}
+
+uint32_t HELPER(sve_pfirst)(void *vd, void *vg, uint32_t words)
+{
+ uint32_t flags = PREDTEST_INIT;
+ uint64_t *d = vd, *g = vg;
+ intptr_t i = 0;
+
+ do {
+ uint64_t this_d = d[i];
+ uint64_t this_g = g[i];
+
+ if (this_g) {
+ if (!(flags & 4)) {
+ /* Set in D the first bit of G. */
+ this_d |= this_g & -this_g;
+ d[i] = this_d;
+ }
+ flags = iter_predtest_fwd(this_d, this_g, flags);
+ }
+ } while (++i < words);
+
+ return flags;
+}
+
+uint32_t HELPER(sve_pnext)(void *vd, void *vg, uint32_t pred_desc)
+{
+ intptr_t words = extract32(pred_desc, 0, SIMD_OPRSZ_BITS);
+ intptr_t esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
+ uint32_t flags = PREDTEST_INIT;
+ uint64_t *d = vd, *g = vg, esz_mask;
+ intptr_t i, next;
+
+ next = last_active_element(vd, words, esz) + (1 << esz);
+ esz_mask = pred_esz_masks[esz];
+
+ /* Similar to the pseudocode for pnext, but scaled by ESZ
+ so that we find the correct bit. */
+ if (next < words * 64) {
+ uint64_t mask = -1;
+
+ if (next & 63) {
+ mask = ~((1ull << (next & 63)) - 1);
+ next &= -64;
+ }
+ do {
+ uint64_t this_g = g[next / 64] & esz_mask & mask;
+ if (this_g != 0) {
+ next = (next & -64) + ctz64(this_g);
+ break;
+ }
+ next += 64;
+ mask = -1;
+ } while (next < words * 64);
+ }
+
+ i = 0;
+ do {
+ uint64_t this_d = 0;
+ if (i == next / 64) {
+ this_d = 1ull << (next & 63);
+ }
+ d[i] = this_d;
+ flags = iter_predtest_fwd(this_d, g[i] & esz_mask, flags);
+ } while (++i < words);
+
+ return flags;
+}
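
The pred_esz_masks[] table used here encodes which predicate bits are
significant for each element size: SVE predicates hold one bit per vector
byte, so for elements of (1 << esz) bytes only every (1 << esz)-th bit can be
active. A small self-contained illustration, written under that assumption
(the table itself is defined in translate-sve.c below; main and masks are
just illustrative names):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* One predicate bit per vector byte; for (1 << esz)-byte elements
     * only every (1 << esz)-th bit is significant.
     */
    static const uint64_t masks[4] = {
        0xffffffffffffffffull, 0x5555555555555555ull,
        0x1111111111111111ull, 0x0101010101010101ull
    };

    int main(void)
    {
        uint64_t g = 0x00000000ffffffffull; /* low 32 predicate bits set */
        /* For 32-bit elements (esz = 2) only bits 0, 4, 8, ... matter: */
        printf("active: %016" PRIx64 "\n", g & masks[2]); /* 0x11111111 */
        return 0;
    }

This is also why last_active_element() above ANDs each word with the mask
before scanning: bits between element positions are ignored.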
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@
#include "exec/exec-all.h"
#include "tcg-op.h"
#include "tcg-op-gvec.h"
+#include "tcg-gvec-desc.h"
#include "qemu/log.h"
#include "arm_ldst.h"
#include "translate.h"
@@ -XXX,XX +XXX,XX @@ static void do_predtest(DisasContext *s, int dofs, int gofs, int words)
tcg_temp_free_i32(t);
}

+/* For each element size, the bits within a predicate word that are active. */
+const uint64_t pred_esz_masks[4] = {
+ 0xffffffffffffffffull, 0x5555555555555555ull,
+ 0x1111111111111111ull, 0x0101010101010101ull
+};
+
/*
*** SVE Logical - Unpredicated Group
*/
@@ -XXX,XX +XXX,XX @@ static bool trans_PTEST(DisasContext *s, arg_PTEST *a, uint32_t insn)
return true;
}

+/* See the ARM pseudocode DecodePredCount. */
+static unsigned decode_pred_count(unsigned fullsz, int pattern, int esz)
+{
+ unsigned elements = fullsz >> esz;
+ unsigned bound;
+
+ switch (pattern) {
+ case 0x0: /* POW2 */
+ return pow2floor(elements);
+ case 0x1: /* VL1 */
+ case 0x2: /* VL2 */
+ case 0x3: /* VL3 */
+ case 0x4: /* VL4 */
+ case 0x5: /* VL5 */
+ case 0x6: /* VL6 */
+ case 0x7: /* VL7 */
+ case 0x8: /* VL8 */
+ bound = pattern;
+ break;
+ case 0x9: /* VL16 */
+ case 0xa: /* VL32 */
+ case 0xb: /* VL64 */
+ case 0xc: /* VL128 */
+ case 0xd: /* VL256 */
+ bound = 16 << (pattern - 9);
+ break;
+ case 0x1d: /* MUL4 */
+ return elements - elements % 4;
+ case 0x1e: /* MUL3 */
+ return elements - elements % 3;
+ case 0x1f: /* ALL */
+ return elements;
+ default: /* #uimm5 */
+ return 0;
+ }
+ return elements >= bound ? bound : 0;
+}
+
+/* This handles all of the predicate initialization instructions,
+ * PTRUE, PFALSE, SETFFR. For PFALSE, we will have set PAT == 32
+ * so that decode_pred_count returns 0. For SETFFR, we will have
+ * set RD == 16 == FFR.
+ */
+static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag)
+{
+ if (!sve_access_check(s)) {
+ return true;
+ }
+
+ unsigned fullsz = vec_full_reg_size(s);
+ unsigned ofs = pred_full_reg_offset(s, rd);
+ unsigned numelem, setsz, i;
+ uint64_t word, lastword;
+ TCGv_i64 t;
+
+ numelem = decode_pred_count(fullsz, pat, esz);
+
+ /* Determine what we must store into each bit, and how many. */
+ if (numelem == 0) {
+ lastword = word = 0;
+ setsz = fullsz;
+ } else {
+ setsz = numelem << esz;
+ lastword = word = pred_esz_masks[esz];
+ if (setsz % 64) {
+ lastword &= ~(-1ull << (setsz % 64));
+ }
+ }
+
+ t = tcg_temp_new_i64();
+ if (fullsz <= 64) {
+ tcg_gen_movi_i64(t, lastword);
+ tcg_gen_st_i64(t, cpu_env, ofs);
+ goto done;
+ }
+
+ if (word == lastword) {
+ unsigned maxsz = size_for_gvec(fullsz / 8);
+ unsigned oprsz = size_for_gvec(setsz / 8);
+
+ if (oprsz * 8 == setsz) {
+ tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
+ goto done;
+ }
+ if (oprsz * 8 == setsz + 8) {
+ tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
+ tcg_gen_movi_i64(t, 0);
+ tcg_gen_st_i64(t, cpu_env, ofs + oprsz - 8);
+ goto done;
+ }
+ }
+
+ setsz /= 8;
+ fullsz /= 8;
+
+ tcg_gen_movi_i64(t, word);
+ for (i = 0; i < setsz; i += 8) {
+ tcg_gen_st_i64(t, cpu_env, ofs + i);
+ }
+ if (lastword != word) {
+ tcg_gen_movi_i64(t, lastword);
+ tcg_gen_st_i64(t, cpu_env, ofs + i);
+ i += 8;
+ }
+ if (i < fullsz) {
+ tcg_gen_movi_i64(t, 0);
+ for (; i < fullsz; i += 8) {
+ tcg_gen_st_i64(t, cpu_env, ofs + i);
+ }
+ }
+
+ done:
+ tcg_temp_free_i64(t);
+
+ /* PTRUES */
+ if (setflag) {
+ tcg_gen_movi_i32(cpu_NF, -(word != 0));
+ tcg_gen_movi_i32(cpu_CF, word == 0);
+ tcg_gen_movi_i32(cpu_VF, 0);
+ tcg_gen_mov_i32(cpu_ZF, cpu_NF);
+ }
+ return true;
+}
+
+static bool trans_PTRUE(DisasContext *s, arg_PTRUE *a, uint32_t insn)
+{
+ return do_predset(s, a->esz, a->rd, a->pat, a->s);
+}
+
+static bool trans_SETFFR(DisasContext *s, arg_SETFFR *a, uint32_t insn)
+{
+ /* Note pat == 31 is #all, to set all elements. */
+ return do_predset(s, 0, FFR_PRED_NUM, 31, false);
+}
+
+static bool trans_PFALSE(DisasContext *s, arg_PFALSE *a, uint32_t insn)
+{
+ /* Note pat == 32 is #unimp, to set no elements. */
+ return do_predset(s, 0, a->rd, 32, false);
+}
+
+static bool trans_RDFFR_p(DisasContext *s, arg_RDFFR_p *a, uint32_t insn)
+{
+ /* The path through do_pppp_flags is complicated enough to want to avoid
+ * duplication. Frob the arguments into the form of a predicated AND.
+ */
+ arg_rprr_s alt_a = {
+ .rd = a->rd, .pg = a->pg, .s = a->s,
+ .rn = FFR_PRED_NUM, .rm = FFR_PRED_NUM,
+ };
+ return trans_AND_pppp(s, &alt_a, insn);
+}
+
+static bool trans_RDFFR(DisasContext *s, arg_RDFFR *a, uint32_t insn)
+{
+ return do_mov_p(s, a->rd, FFR_PRED_NUM);
+}
+
+static bool trans_WRFFR(DisasContext *s, arg_WRFFR *a, uint32_t insn)
+{
+ return do_mov_p(s, FFR_PRED_NUM, a->rn);
+}
+
+static bool do_pfirst_pnext(DisasContext *s, arg_rr_esz *a,
+ void (*gen_fn)(TCGv_i32, TCGv_ptr,
+ TCGv_ptr, TCGv_i32))
+{
+ if (!sve_access_check(s)) {
+ return true;
+ }
+
+ TCGv_ptr t_pd = tcg_temp_new_ptr();
+ TCGv_ptr t_pg = tcg_temp_new_ptr();
+ TCGv_i32 t;
+ unsigned desc;
+
+ desc = DIV_ROUND_UP(pred_full_reg_size(s), 8);
+ desc = deposit32(desc, SIMD_DATA_SHIFT, 2, a->esz);
+
+ tcg_gen_addi_ptr(t_pd, cpu_env, pred_full_reg_offset(s, a->rd));
+ tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, a->rn));
+ t = tcg_const_i32(desc);
+
+ gen_fn(t, t_pd, t_pg, t);
+ tcg_temp_free_ptr(t_pd);
+ tcg_temp_free_ptr(t_pg);
+
+ do_pred_flags(t);
+ tcg_temp_free_i32(t);
+ return true;
+}
+
+static bool trans_PFIRST(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+ return do_pfirst_pnext(s, a, gen_helper_sve_pfirst);
+}
+
+static bool trans_PNEXT(DisasContext *s, arg_rr_esz *a, uint32_t insn)
+{
+ return do_pfirst_pnext(s, a, gen_helper_sve_pnext);
+}
+
/*
*** SVE Memory - 32-bit Gather and Unsized Contiguous Group
*/
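
To make the predicate-count patterns concrete: with a 256-bit vector
(fullsz = 32 bytes) and 16-bit elements (esz = 1) there are 16 elements, so
POW2 yields 16, VL8 yields 8 and MUL3 yields 15. A usage sketch calling the
decode_pred_count() shown above (the extern declaration is only there to make
the sketch self-contained):

    #include <stdio.h>

    /* Uses decode_pred_count() exactly as defined in the patch above. */
    extern unsigned decode_pred_count(unsigned fullsz, int pattern, int esz);

    int main(void)
    {
        unsigned fullsz = 32; /* a 256-bit vector is 32 bytes */
        int esz = 1;          /* 16-bit elements: 16 of them */

        printf("POW2: %u\n", decode_pred_count(fullsz, 0x00, esz)); /* 16 */
        printf("VL8:  %u\n", decode_pred_count(fullsz, 0x08, esz)); /*  8 */
        printf("MUL3: %u\n", decode_pred_count(fullsz, 0x1e, esz)); /* 15 */
        printf("ALL:  %u\n", decode_pred_count(fullsz, 0x1f, esz)); /* 16 */
        return 0;
    }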
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
# when creating helpers common to those for the individual
# instruction patterns.

+&rr_esz rd rn esz
&rri rd rn imm
&rrr_esz rd rn rm esz
&rprr_s rd pg rn rm s
@@ -XXX,XX +XXX,XX @@
# Named instruction formats. These are generally used to
# reduce the amount of duplication between instruction patterns.

+# Two operand with unused vector element size
+@pd_pn_e0 ........ ........ ....... rn:4 . rd:4 &rr_esz esz=0
+
+# Two operand
+@pd_pn ........ esz:2 .. .... ....... rn:4 . rd:4 &rr_esz
+
# Three operand with unused vector element size
@rd_rn_rm_e0 ........ ... rm:5 ... ... rn:5 rd:5 &rrr_esz esz=0

@@ -XXX,XX +XXX,XX @@ NAND_pppp 00100101 1. 00 .... 01 .... 1 .... 1 .... @pd_pg_pn_pm_s
# SVE predicate test
PTEST 00100101 01 010000 11 pg:4 0 rn:4 0 0000

+# SVE predicate initialize
+PTRUE 00100101 esz:2 01100 s:1 111000 pat:5 0 rd:4
+
+# SVE initialize FFR
+SETFFR 00100101 0010 1100 1001 0000 0000 0000
+
+# SVE zero predicate register
+PFALSE 00100101 0001 1000 1110 0100 0000 rd:4
+
+# SVE predicate read from FFR (predicated)
+RDFFR_p 00100101 0 s:1 0110001111000 pg:4 0 rd:4
+
+# SVE predicate read from FFR (unpredicated)
+RDFFR 00100101 0001 1001 1111 0000 0000 rd:4
+
+# SVE FFR write from predicate (WRFFR)
+WRFFR 00100101 0010 1000 1001 000 rn:4 00000
+
+# SVE predicate first active
+PFIRST 00100101 01 011 000 11000 00 .... 0 .... @pd_pn_e0
+
+# SVE predicate next active
+PNEXT 00100101 .. 011 001 11000 10 .... 0 .... @pd_pn
+
### SVE Memory - 32-bit Gather and Unsized Contiguous Group

# SVE load predicate register
--
2.17.0
Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/neon-dp.decode | 18 ++++
target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
target/arm/translate.c | 182 ++------------------------
3 files changed, 187 insertions(+), 176 deletions(-)
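
For reference, the "long" 2-reg-scalar ops this patch converts produce a
double-width result and optionally accumulate it into the destination. A
scalar model of one signed VMLAL lane (an illustrative sketch with an
invented name, not the QEMU implementation):

    #include <stdint.h>

    /* One signed 16-bit lane of VMLAL: widening multiply by the scalar,
     * then plain (non-saturating) accumulate into the 32-bit element.
     */
    static int32_t vmlal_s16_model(int32_t acc, int16_t elem, int16_t scalar)
    {
        return acc + (int32_t)elem * (int32_t)scalar;
    }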

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode

From: Francisco Iglesias <frasse.iglesias@gmail.com>

Add a model of the generic DMA found on Xilinx ZynqMP.

Signed-off-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20180503214201.29082-2-frasse.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
hw/dma/Makefile.objs | 1 +
include/hw/dma/xlnx-zdma.h | 84 ++++
hw/dma/xlnx-zdma.c | 832 +++++++++++++++++++++++++++++++++++++
3 files changed, 917 insertions(+)
create mode 100644 include/hw/dma/xlnx-zdma.h
create mode 100644 hw/dma/xlnx-zdma.c

diff --git a/hw/dma/Makefile.objs b/hw/dma/Makefile.objs
19
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
20
--- a/hw/dma/Makefile.objs
14
--- a/target/arm/neon-dp.decode
21
+++ b/hw/dma/Makefile.objs
15
+++ b/target/arm/neon-dp.decode
22
@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_ETRAXFS) += etraxfs_dma.o
16
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
23
common-obj-$(CONFIG_STP2000) += sparc32_dma.o
17
24
obj-$(CONFIG_XLNX_ZYNQMP) += xlnx_dpdma.o
18
@2scalar .... ... q:1 . . size:2 .... .... .... . . . . .... \
25
obj-$(CONFIG_XLNX_ZYNQMP_ARM) += xlnx_dpdma.o
19
&2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
26
+common-obj-$(CONFIG_XLNX_ZYNQMP_ARM) += xlnx-zdma.o
20
+ # For the 'long' ops the Q bit is part of insn decode
27
21
+ @2scalar_q0 .... ... . . . size:2 .... .... .... . . . . .... \
28
obj-$(CONFIG_OMAP) += omap_dma.o soc_dma.o
22
+ &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
29
obj-$(CONFIG_PXA2XX) += pxa2xx_dma.o
23
30
diff --git a/include/hw/dma/xlnx-zdma.h b/include/hw/dma/xlnx-zdma.h
24
VMLA_2sc 1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
31
new file mode 100644
25
VMLA_F_2sc 1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
32
index XXXXXXX..XXXXXXX
26
33
--- /dev/null
27
+ VMLAL_S_2sc 1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
34
+++ b/include/hw/dma/xlnx-zdma.h
28
+ VMLAL_U_2sc 1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
35
@@ -XXX,XX +XXX,XX @@
29
+
36
+/*
30
+ VQDMLAL_2sc 1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
37
+ * QEMU model of the ZynqMP generic DMA
31
+
38
+ *
32
VMLS_2sc 1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
39
+ * Copyright (c) 2014 Xilinx Inc.
33
VMLS_F_2sc 1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
40
+ * Copyright (c) 2018 FEIMTECH AB
34
41
+ *
35
+ VMLSL_S_2sc 1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
42
+ * Written by Edgar E. Iglesias <edgar.iglesias@xilinx.com>,
36
+ VMLSL_U_2sc 1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
43
+ * Francisco Iglesias <francisco.iglesias@feimtech.se>
37
+
44
+ *
38
+ VQDMLSL_2sc 1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
45
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
39
+
46
+ * of this software and associated documentation files (the "Software"), to deal
40
VMUL_2sc 1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
47
+ * in the Software without restriction, including without limitation the rights
41
VMUL_F_2sc 1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
48
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
42
49
+ * copies of the Software, and to permit persons to whom the Software is
43
+ VMULL_S_2sc 1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
50
+ * furnished to do so, subject to the following conditions:
44
+ VMULL_U_2sc 1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
51
+ *
45
+
52
+ * The above copyright notice and this permission notice shall be included in
46
+ VQDMULL_2sc 1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
53
+ * all copies or substantial portions of the Software.
47
+
54
+ *
48
VQDMULH_2sc 1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
55
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
49
VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
56
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
50
57
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
51
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
58
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
52
index XXXXXXX..XXXXXXX 100644
59
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
53
--- a/target/arm/translate-neon.inc.c
60
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
54
+++ b/target/arm/translate-neon.inc.c
61
+ * THE SOFTWARE.
55
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
62
+ */
56
};
63
+
57
return do_vqrdmlah_2sc(s, a, opfn[a->size]);
64
+#ifndef XLNX_ZDMA_H
58
}
65
+#define XLNX_ZDMA_H
59
+
66
+
60
+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
67
+#include "hw/sysbus.h"
61
+ NeonGenTwoOpWidenFn *opfn,
68
+#include "hw/register.h"
62
+ NeonGenTwo64OpFn *accfn)
69
+#include "sysemu/dma.h"
63
+{
70
+
64
+ /*
71
+#define ZDMA_R_MAX (0x204 / 4)
65
+ * Two registers and a scalar, long operations: perform an
72
+
66
+ * operation on the input elements and the scalar which produces
73
+typedef enum {
67
+ * a double-width result, and then possibly perform an accumulation
74
+ DISABLED = 0,
68
+ * operation of that result into the destination.
75
+ ENABLED = 1,
69
+ */
76
+ PAUSED = 2,
70
+ TCGv_i32 scalar, rn;
77
+} XlnxZDMAState;
71
+ TCGv_i64 rn0_64, rn1_64;
78
+
72
+
79
+typedef union {
73
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
80
+ struct {
81
+ uint64_t addr;
82
+ uint32_t size;
83
+ uint32_t attr;
84
+ };
85
+ uint32_t words[4];
86
+} XlnxZDMADescr;
87
+
88
+typedef struct XlnxZDMA {
89
+ SysBusDevice parent_obj;
90
+ MemoryRegion iomem;
91
+ MemTxAttrs attr;
92
+ MemoryRegion *dma_mr;
93
+ AddressSpace *dma_as;
94
+ qemu_irq irq_zdma_ch_imr;
95
+
96
+ struct {
97
+ uint32_t bus_width;
98
+ } cfg;
99
+
100
+ XlnxZDMAState state;
101
+ bool error;
102
+
103
+ XlnxZDMADescr dsc_src;
104
+ XlnxZDMADescr dsc_dst;
105
+
106
+ uint32_t regs[ZDMA_R_MAX];
107
+ RegisterInfo regs_info[ZDMA_R_MAX];
108
+
109
+ /* We don't model the common bufs. Must be at least 16 bytes
110
+ to model write only mode. */
111
+ uint8_t buf[2048];
112
+} XlnxZDMA;
113
+
114
+#define TYPE_XLNX_ZDMA "xlnx.zdma"
115
+
116
+#define XLNX_ZDMA(obj) \
117
+ OBJECT_CHECK(XlnxZDMA, (obj), TYPE_XLNX_ZDMA)
118
+
119
+#endif /* XLNX_ZDMA_H */
120
diff --git a/hw/dma/xlnx-zdma.c b/hw/dma/xlnx-zdma.c
121
new file mode 100644
122
index XXXXXXX..XXXXXXX
123
--- /dev/null
124
+++ b/hw/dma/xlnx-zdma.c
125
@@ -XXX,XX +XXX,XX @@
126
+/*
127
+ * QEMU model of the ZynqMP generic DMA
128
+ *
129
+ * Copyright (c) 2014 Xilinx Inc.
130
+ * Copyright (c) 2018 FEIMTECH AB
131
+ *
132
+ * Written by Edgar E. Iglesias <edgar.iglesias@xilinx.com>,
133
+ * Francisco Iglesias <francisco.iglesias@feimtech.se>
134
+ *
135
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
136
+ * of this software and associated documentation files (the "Software"), to deal
137
+ * in the Software without restriction, including without limitation the rights
138
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
139
+ * copies of the Software, and to permit persons to whom the Software is
140
+ * furnished to do so, subject to the following conditions:
141
+ *
142
+ * The above copyright notice and this permission notice shall be included in
143
+ * all copies or substantial portions of the Software.
144
+ *
145
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
146
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
147
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
148
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
149
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
150
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
151
+ * THE SOFTWARE.
152
+ */
153
+
154
+#include "qemu/osdep.h"
155
+#include "hw/dma/xlnx-zdma.h"
156
+#include "qemu/bitops.h"
157
+#include "qemu/log.h"
158
+#include "qapi/error.h"
159
+
160
+#ifndef XLNX_ZDMA_ERR_DEBUG
161
+#define XLNX_ZDMA_ERR_DEBUG 0
162
+#endif
163
+
164
+REG32(ZDMA_ERR_CTRL, 0x0)
165
+ FIELD(ZDMA_ERR_CTRL, APB_ERR_RES, 0, 1)
166
+REG32(ZDMA_CH_ISR, 0x100)
167
+ FIELD(ZDMA_CH_ISR, DMA_PAUSE, 11, 1)
168
+ FIELD(ZDMA_CH_ISR, DMA_DONE, 10, 1)
169
+ FIELD(ZDMA_CH_ISR, AXI_WR_DATA, 9, 1)
170
+ FIELD(ZDMA_CH_ISR, AXI_RD_DATA, 8, 1)
171
+ FIELD(ZDMA_CH_ISR, AXI_RD_DST_DSCR, 7, 1)
172
+ FIELD(ZDMA_CH_ISR, AXI_RD_SRC_DSCR, 6, 1)
173
+ FIELD(ZDMA_CH_ISR, IRQ_DST_ACCT_ERR, 5, 1)
174
+ FIELD(ZDMA_CH_ISR, IRQ_SRC_ACCT_ERR, 4, 1)
175
+ FIELD(ZDMA_CH_ISR, BYTE_CNT_OVRFL, 3, 1)
176
+ FIELD(ZDMA_CH_ISR, DST_DSCR_DONE, 2, 1)
177
+ FIELD(ZDMA_CH_ISR, SRC_DSCR_DONE, 1, 1)
178
+ FIELD(ZDMA_CH_ISR, INV_APB, 0, 1)
179
+REG32(ZDMA_CH_IMR, 0x104)
180
+ FIELD(ZDMA_CH_IMR, DMA_PAUSE, 11, 1)
181
+ FIELD(ZDMA_CH_IMR, DMA_DONE, 10, 1)
182
+ FIELD(ZDMA_CH_IMR, AXI_WR_DATA, 9, 1)
183
+ FIELD(ZDMA_CH_IMR, AXI_RD_DATA, 8, 1)
184
+ FIELD(ZDMA_CH_IMR, AXI_RD_DST_DSCR, 7, 1)
185
+ FIELD(ZDMA_CH_IMR, AXI_RD_SRC_DSCR, 6, 1)
186
+ FIELD(ZDMA_CH_IMR, IRQ_DST_ACCT_ERR, 5, 1)
187
+ FIELD(ZDMA_CH_IMR, IRQ_SRC_ACCT_ERR, 4, 1)
188
+ FIELD(ZDMA_CH_IMR, BYTE_CNT_OVRFL, 3, 1)
189
+ FIELD(ZDMA_CH_IMR, DST_DSCR_DONE, 2, 1)
190
+ FIELD(ZDMA_CH_IMR, SRC_DSCR_DONE, 1, 1)
191
+ FIELD(ZDMA_CH_IMR, INV_APB, 0, 1)
192
+REG32(ZDMA_CH_IEN, 0x108)
193
+ FIELD(ZDMA_CH_IEN, DMA_PAUSE, 11, 1)
194
+ FIELD(ZDMA_CH_IEN, DMA_DONE, 10, 1)
195
+ FIELD(ZDMA_CH_IEN, AXI_WR_DATA, 9, 1)
196
+ FIELD(ZDMA_CH_IEN, AXI_RD_DATA, 8, 1)
197
+ FIELD(ZDMA_CH_IEN, AXI_RD_DST_DSCR, 7, 1)
198
+ FIELD(ZDMA_CH_IEN, AXI_RD_SRC_DSCR, 6, 1)
199
+ FIELD(ZDMA_CH_IEN, IRQ_DST_ACCT_ERR, 5, 1)
200
+ FIELD(ZDMA_CH_IEN, IRQ_SRC_ACCT_ERR, 4, 1)
201
+ FIELD(ZDMA_CH_IEN, BYTE_CNT_OVRFL, 3, 1)
202
+ FIELD(ZDMA_CH_IEN, DST_DSCR_DONE, 2, 1)
203
+ FIELD(ZDMA_CH_IEN, SRC_DSCR_DONE, 1, 1)
204
+ FIELD(ZDMA_CH_IEN, INV_APB, 0, 1)
205
+REG32(ZDMA_CH_IDS, 0x10c)
206
+ FIELD(ZDMA_CH_IDS, DMA_PAUSE, 11, 1)
207
+ FIELD(ZDMA_CH_IDS, DMA_DONE, 10, 1)
208
+ FIELD(ZDMA_CH_IDS, AXI_WR_DATA, 9, 1)
209
+ FIELD(ZDMA_CH_IDS, AXI_RD_DATA, 8, 1)
210
+ FIELD(ZDMA_CH_IDS, AXI_RD_DST_DSCR, 7, 1)
211
+ FIELD(ZDMA_CH_IDS, AXI_RD_SRC_DSCR, 6, 1)
212
+ FIELD(ZDMA_CH_IDS, IRQ_DST_ACCT_ERR, 5, 1)
213
+ FIELD(ZDMA_CH_IDS, IRQ_SRC_ACCT_ERR, 4, 1)
214
+ FIELD(ZDMA_CH_IDS, BYTE_CNT_OVRFL, 3, 1)
215
+ FIELD(ZDMA_CH_IDS, DST_DSCR_DONE, 2, 1)
216
+ FIELD(ZDMA_CH_IDS, SRC_DSCR_DONE, 1, 1)
217
+ FIELD(ZDMA_CH_IDS, INV_APB, 0, 1)
218
+REG32(ZDMA_CH_CTRL0, 0x110)
219
+ FIELD(ZDMA_CH_CTRL0, OVR_FETCH, 7, 1)
220
+ FIELD(ZDMA_CH_CTRL0, POINT_TYPE, 6, 1)
221
+ FIELD(ZDMA_CH_CTRL0, MODE, 4, 2)
222
+ FIELD(ZDMA_CH_CTRL0, RATE_CTRL, 3, 1)
223
+ FIELD(ZDMA_CH_CTRL0, CONT_ADDR, 2, 1)
224
+ FIELD(ZDMA_CH_CTRL0, CONT, 1, 1)
225
+REG32(ZDMA_CH_CTRL1, 0x114)
226
+ FIELD(ZDMA_CH_CTRL1, DST_ISSUE, 5, 5)
227
+ FIELD(ZDMA_CH_CTRL1, SRC_ISSUE, 0, 5)
228
+REG32(ZDMA_CH_FCI, 0x118)
229
+ FIELD(ZDMA_CH_FCI, PROG_CELL_CNT, 2, 2)
230
+ FIELD(ZDMA_CH_FCI, SIDE, 1, 1)
231
+ FIELD(ZDMA_CH_FCI, EN, 0, 1)
232
+REG32(ZDMA_CH_STATUS, 0x11c)
233
+ FIELD(ZDMA_CH_STATUS, STATE, 0, 2)
234
+REG32(ZDMA_CH_DATA_ATTR, 0x120)
235
+ FIELD(ZDMA_CH_DATA_ATTR, ARBURST, 26, 2)
236
+ FIELD(ZDMA_CH_DATA_ATTR, ARCACHE, 22, 4)
237
+ FIELD(ZDMA_CH_DATA_ATTR, ARQOS, 18, 4)
238
+ FIELD(ZDMA_CH_DATA_ATTR, ARLEN, 14, 4)
239
+ FIELD(ZDMA_CH_DATA_ATTR, AWBURST, 12, 2)
240
+ FIELD(ZDMA_CH_DATA_ATTR, AWCACHE, 8, 4)
241
+ FIELD(ZDMA_CH_DATA_ATTR, AWQOS, 4, 4)
242
+ FIELD(ZDMA_CH_DATA_ATTR, AWLEN, 0, 4)
243
+REG32(ZDMA_CH_DSCR_ATTR, 0x124)
244
+ FIELD(ZDMA_CH_DSCR_ATTR, AXCOHRNT, 8, 1)
245
+ FIELD(ZDMA_CH_DSCR_ATTR, AXCACHE, 4, 4)
246
+ FIELD(ZDMA_CH_DSCR_ATTR, AXQOS, 0, 4)
247
+REG32(ZDMA_CH_SRC_DSCR_WORD0, 0x128)
248
+REG32(ZDMA_CH_SRC_DSCR_WORD1, 0x12c)
249
+ FIELD(ZDMA_CH_SRC_DSCR_WORD1, MSB, 0, 17)
250
+REG32(ZDMA_CH_SRC_DSCR_WORD2, 0x130)
251
+ FIELD(ZDMA_CH_SRC_DSCR_WORD2, SIZE, 0, 30)
252
+REG32(ZDMA_CH_SRC_DSCR_WORD3, 0x134)
253
+ FIELD(ZDMA_CH_SRC_DSCR_WORD3, CMD, 3, 2)
254
+ FIELD(ZDMA_CH_SRC_DSCR_WORD3, INTR, 2, 1)
255
+ FIELD(ZDMA_CH_SRC_DSCR_WORD3, TYPE, 1, 1)
256
+ FIELD(ZDMA_CH_SRC_DSCR_WORD3, COHRNT, 0, 1)
257
+REG32(ZDMA_CH_DST_DSCR_WORD0, 0x138)
258
+REG32(ZDMA_CH_DST_DSCR_WORD1, 0x13c)
259
+ FIELD(ZDMA_CH_DST_DSCR_WORD1, MSB, 0, 17)
260
+REG32(ZDMA_CH_DST_DSCR_WORD2, 0x140)
261
+ FIELD(ZDMA_CH_DST_DSCR_WORD2, SIZE, 0, 30)
262
+REG32(ZDMA_CH_DST_DSCR_WORD3, 0x144)
263
+ FIELD(ZDMA_CH_DST_DSCR_WORD3, INTR, 2, 1)
264
+ FIELD(ZDMA_CH_DST_DSCR_WORD3, TYPE, 1, 1)
265
+ FIELD(ZDMA_CH_DST_DSCR_WORD3, COHRNT, 0, 1)
266
+REG32(ZDMA_CH_WR_ONLY_WORD0, 0x148)
267
+REG32(ZDMA_CH_WR_ONLY_WORD1, 0x14c)
268
+REG32(ZDMA_CH_WR_ONLY_WORD2, 0x150)
269
+REG32(ZDMA_CH_WR_ONLY_WORD3, 0x154)
270
+REG32(ZDMA_CH_SRC_START_LSB, 0x158)
271
+REG32(ZDMA_CH_SRC_START_MSB, 0x15c)
272
+ FIELD(ZDMA_CH_SRC_START_MSB, ADDR, 0, 17)
273
+REG32(ZDMA_CH_DST_START_LSB, 0x160)
274
+REG32(ZDMA_CH_DST_START_MSB, 0x164)
275
+ FIELD(ZDMA_CH_DST_START_MSB, ADDR, 0, 17)
276
+REG32(ZDMA_CH_RATE_CTRL, 0x18c)
277
+ FIELD(ZDMA_CH_RATE_CTRL, CNT, 0, 12)
278
+REG32(ZDMA_CH_SRC_CUR_PYLD_LSB, 0x168)
279
+REG32(ZDMA_CH_SRC_CUR_PYLD_MSB, 0x16c)
280
+ FIELD(ZDMA_CH_SRC_CUR_PYLD_MSB, ADDR, 0, 17)
281
+REG32(ZDMA_CH_DST_CUR_PYLD_LSB, 0x170)
282
+REG32(ZDMA_CH_DST_CUR_PYLD_MSB, 0x174)
283
+ FIELD(ZDMA_CH_DST_CUR_PYLD_MSB, ADDR, 0, 17)
284
+REG32(ZDMA_CH_SRC_CUR_DSCR_LSB, 0x178)
285
+REG32(ZDMA_CH_SRC_CUR_DSCR_MSB, 0x17c)
286
+ FIELD(ZDMA_CH_SRC_CUR_DSCR_MSB, ADDR, 0, 17)
287
+REG32(ZDMA_CH_DST_CUR_DSCR_LSB, 0x180)
288
+REG32(ZDMA_CH_DST_CUR_DSCR_MSB, 0x184)
289
+ FIELD(ZDMA_CH_DST_CUR_DSCR_MSB, ADDR, 0, 17)
290
+REG32(ZDMA_CH_TOTAL_BYTE, 0x188)
291
+REG32(ZDMA_CH_RATE_CNTL, 0x18c)
292
+ FIELD(ZDMA_CH_RATE_CNTL, CNT, 0, 12)
293
+REG32(ZDMA_CH_IRQ_SRC_ACCT, 0x190)
294
+ FIELD(ZDMA_CH_IRQ_SRC_ACCT, CNT, 0, 8)
295
+REG32(ZDMA_CH_IRQ_DST_ACCT, 0x194)
296
+ FIELD(ZDMA_CH_IRQ_DST_ACCT, CNT, 0, 8)
297
+REG32(ZDMA_CH_DBG0, 0x198)
298
+ FIELD(ZDMA_CH_DBG0, CMN_BUF_FREE, 0, 9)
299
+REG32(ZDMA_CH_DBG1, 0x19c)
300
+ FIELD(ZDMA_CH_DBG1, CMN_BUF_OCC, 0, 9)
301
+REG32(ZDMA_CH_CTRL2, 0x200)
302
+ FIELD(ZDMA_CH_CTRL2, EN, 0, 1)
303
+
304
+enum {
305
+ PT_REG = 0,
306
+ PT_MEM = 1,
307
+};
308
+
309
+enum {
310
+ CMD_HALT = 1,
311
+ CMD_STOP = 2,
312
+};
313
+
314
+enum {
315
+ RW_MODE_RW = 0,
316
+ RW_MODE_WO = 1,
317
+ RW_MODE_RO = 2,
318
+};
319
+
320
+enum {
321
+ DTYPE_LINEAR = 0,
322
+ DTYPE_LINKED = 1,
323
+};
324
+
325
+enum {
326
+ AXI_BURST_FIXED = 0,
327
+ AXI_BURST_INCR = 1,
328
+};
329
+
330
+static void zdma_ch_imr_update_irq(XlnxZDMA *s)
331
+{
332
+ bool pending;
333
+
334
+ pending = s->regs[R_ZDMA_CH_ISR] & ~s->regs[R_ZDMA_CH_IMR];
335
+
336
+ qemu_set_irq(s->irq_zdma_ch_imr, pending);
337
+}
338
+
339
+static void zdma_ch_isr_postw(RegisterInfo *reg, uint64_t val64)
340
+{
341
+ XlnxZDMA *s = XLNX_ZDMA(reg->opaque);
342
+ zdma_ch_imr_update_irq(s);
343
+}
344
+
345
+static uint64_t zdma_ch_ien_prew(RegisterInfo *reg, uint64_t val64)
346
+{
347
+ XlnxZDMA *s = XLNX_ZDMA(reg->opaque);
348
+ uint32_t val = val64;
349
+
350
+ s->regs[R_ZDMA_CH_IMR] &= ~val;
351
+ zdma_ch_imr_update_irq(s);
352
+ return 0;
353
+}
354
+
355
+static uint64_t zdma_ch_ids_prew(RegisterInfo *reg, uint64_t val64)
356
+{
357
+ XlnxZDMA *s = XLNX_ZDMA(reg->opaque);
358
+ uint32_t val = val64;
359
+
360
+ s->regs[R_ZDMA_CH_IMR] |= val;
361
+ zdma_ch_imr_update_irq(s);
362
+ return 0;
363
+}
364
+
365
+static void zdma_set_state(XlnxZDMA *s, XlnxZDMAState state)
366
+{
367
+ s->state = state;
368
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_STATUS, STATE, state);
369
+
370
+ /* Signal error if we have an error condition. */
371
+ if (s->error) {
372
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_STATUS, STATE, 3);
373
+ }
374
+}
375
+
376
+static void zdma_src_done(XlnxZDMA *s)
377
+{
378
+ unsigned int cnt;
379
+ cnt = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_IRQ_SRC_ACCT, CNT);
380
+ cnt++;
381
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_IRQ_SRC_ACCT, CNT, cnt);
382
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, SRC_DSCR_DONE, true);
383
+
384
+ /* Did we overflow? */
385
+ if (cnt != ARRAY_FIELD_EX32(s->regs, ZDMA_CH_IRQ_SRC_ACCT, CNT)) {
386
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, IRQ_SRC_ACCT_ERR, true);
387
+ }
388
+ zdma_ch_imr_update_irq(s);
389
+}
390
+
391
+static void zdma_dst_done(XlnxZDMA *s)
392
+{
393
+ unsigned int cnt;
394
+ cnt = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_IRQ_DST_ACCT, CNT);
395
+ cnt++;
396
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_IRQ_DST_ACCT, CNT, cnt);
397
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, DST_DSCR_DONE, true);
398
+
399
+ /* Did we overflow? */
400
+ if (cnt != ARRAY_FIELD_EX32(s->regs, ZDMA_CH_IRQ_DST_ACCT, CNT)) {
401
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, IRQ_DST_ACCT_ERR, true);
402
+ }
403
+ zdma_ch_imr_update_irq(s);
404
+}
405
+
406
+static uint64_t zdma_get_regaddr64(XlnxZDMA *s, unsigned int basereg)
407
+{
408
+ uint64_t addr;
409
+
410
+ addr = s->regs[basereg + 1];
411
+ addr <<= 32;
412
+ addr |= s->regs[basereg];
413
+
414
+ return addr;
415
+}
416
+
417
+static void zdma_put_regaddr64(XlnxZDMA *s, unsigned int basereg, uint64_t addr)
418
+{
419
+ s->regs[basereg] = addr;
420
+ s->regs[basereg + 1] = addr >> 32;
421
+}
422
+
423
+static bool zdma_load_descriptor(XlnxZDMA *s, uint64_t addr, void *buf)
424
+{
425
+ /* ZDMA descriptors must be aligned to their own size. */
426
+ if (addr % sizeof(XlnxZDMADescr)) {
427
+ qemu_log_mask(LOG_GUEST_ERROR,
428
+ "zdma: unaligned descriptor at %" PRIx64,
429
+ addr);
430
+ memset(buf, 0xdeadbeef, sizeof(XlnxZDMADescr));
431
+ s->error = true;
432
+ return false;
74
+ return false;
433
+ }
75
+ }
434
+
76
+
435
+ address_space_rw(s->dma_as, addr, s->attr,
77
+ /* UNDEF accesses to D16-D31 if they don't exist. */
436
+ buf, sizeof(XlnxZDMADescr), false);
78
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
79
+ ((a->vd | a->vn | a->vm) & 0x10)) {
80
+ return false;
81
+ }
82
+
83
+ if (!opfn) {
84
+ /* Bad size (including size == 3, which is a different insn group) */
85
+ return false;
86
+ }
87
+
88
+ if (a->vd & 1) {
89
+ return false;
90
+ }
91
+
92
+ if (!vfp_access_check(s)) {
93
+ return true;
94
+ }
95
+
96
+ scalar = neon_get_scalar(a->size, a->vm);
97
+
98
+ /* Load all inputs before writing any outputs, in case of overlap */
99
+ rn = neon_load_reg(a->vn, 0);
100
+ rn0_64 = tcg_temp_new_i64();
101
+ opfn(rn0_64, rn, scalar);
102
+ tcg_temp_free_i32(rn);
103
+
104
+ rn = neon_load_reg(a->vn, 1);
105
+ rn1_64 = tcg_temp_new_i64();
106
+ opfn(rn1_64, rn, scalar);
107
+ tcg_temp_free_i32(rn);
108
+ tcg_temp_free_i32(scalar);
109
+
110
+ if (accfn) {
111
+ TCGv_i64 t64 = tcg_temp_new_i64();
112
+ neon_load_reg64(t64, a->vd);
113
+ accfn(t64, t64, rn0_64);
114
+ neon_store_reg64(t64, a->vd);
115
+ neon_load_reg64(t64, a->vd + 1);
116
+ accfn(t64, t64, rn1_64);
117
+ neon_store_reg64(t64, a->vd + 1);
118
+ tcg_temp_free_i64(t64);
119
+ } else {
120
+ neon_store_reg64(rn0_64, a->vd);
121
+ neon_store_reg64(rn1_64, a->vd + 1);
122
+ }
123
+ tcg_temp_free_i64(rn0_64);
124
+ tcg_temp_free_i64(rn1_64);
437
+ return true;
125
+ return true;
438
+}
126
+}
439
+
127
+
440
+static void zdma_load_src_descriptor(XlnxZDMA *s)
128
+static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
441
+{
129
+{
442
+ uint64_t src_addr;
130
+ static NeonGenTwoOpWidenFn * const opfn[] = {
443
+ unsigned int ptype = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL0, POINT_TYPE);
131
+ NULL,
444
+
132
+ gen_helper_neon_mull_s16,
445
+ if (ptype == PT_REG) {
133
+ gen_mull_s32,
446
+ memcpy(&s->dsc_src, &s->regs[R_ZDMA_CH_SRC_DSCR_WORD0],
134
+ NULL,
447
+ sizeof(s->dsc_src));
135
+ };
448
+ return;
136
+
449
+ }
137
+ return do_2scalar_long(s, a, opfn[a->size], NULL);
450
+
138
+}
451
+ src_addr = zdma_get_regaddr64(s, R_ZDMA_CH_SRC_CUR_DSCR_LSB);
139
+
452
+
140
+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
453
+ if (!zdma_load_descriptor(s, src_addr, &s->dsc_src)) {
141
+{
454
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, AXI_RD_SRC_DSCR, true);
142
+ static NeonGenTwoOpWidenFn * const opfn[] = {
455
+ }
143
+ NULL,
456
+}
144
+ gen_helper_neon_mull_u16,
457
+
145
+ gen_mull_u32,
458
+static void zdma_load_dst_descriptor(XlnxZDMA *s)
146
+ NULL,
459
+{
147
+ };
460
+ uint64_t dst_addr;
148
+
461
+ unsigned int ptype = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL0, POINT_TYPE);
149
+ return do_2scalar_long(s, a, opfn[a->size], NULL);
462
+
150
+}
463
+ if (ptype == PT_REG) {
151
+
464
+ memcpy(&s->dsc_dst, &s->regs[R_ZDMA_CH_DST_DSCR_WORD0],
152
+#define DO_VMLAL_2SC(INSN, MULL, ACC) \
465
+ sizeof(s->dsc_dst));
153
+ static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a) \
466
+ return;
154
+ { \
467
+ }
155
+ static NeonGenTwoOpWidenFn * const opfn[] = { \
468
+
156
+ NULL, \
469
+ dst_addr = zdma_get_regaddr64(s, R_ZDMA_CH_DST_CUR_DSCR_LSB);
157
+ gen_helper_neon_##MULL##16, \
470
+
158
+ gen_##MULL##32, \
471
+ if (!zdma_load_descriptor(s, dst_addr, &s->dsc_dst)) {
159
+ NULL, \
472
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, AXI_RD_DST_DSCR, true);
160
+ }; \
473
+ }
161
+ static NeonGenTwo64OpFn * const accfn[] = { \
474
+}
162
+ NULL, \
475
+
163
+ gen_helper_neon_##ACC##l_u32, \
476
+static uint64_t zdma_update_descr_addr(XlnxZDMA *s, bool type,
164
+ tcg_gen_##ACC##_i64, \
477
+ unsigned int basereg)
165
+ NULL, \
478
+{
166
+ }; \
479
+ uint64_t addr, next;
167
+ return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]); \
480
+
168
+ }
481
+ if (type == DTYPE_LINEAR) {
169
+
482
+ next = zdma_get_regaddr64(s, basereg);
170
+DO_VMLAL_2SC(VMLAL_S, mull_s, add)
483
+ next += sizeof(s->dsc_dst);
171
+DO_VMLAL_2SC(VMLAL_U, mull_u, add)
484
+ zdma_put_regaddr64(s, basereg, next);
172
+DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
485
+ } else {
173
+DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
486
+ addr = zdma_get_regaddr64(s, basereg);
174
+
487
+ addr += sizeof(s->dsc_dst);
175
+static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
488
+ address_space_rw(s->dma_as, addr, s->attr, (void *) &next, 8, false);
176
+{
489
+ zdma_put_regaddr64(s, basereg, next);
177
+ static NeonGenTwoOpWidenFn * const opfn[] = {
490
+ }
178
+ NULL,
491
+ return next;
179
+ gen_VQDMULL_16,
492
+}
180
+ gen_VQDMULL_32,
493
+
181
+ NULL,
494
+static void zdma_write_dst(XlnxZDMA *s, uint8_t *buf, uint32_t len)
182
+ };
495
+{
183
+
496
+ uint32_t dst_size, dlen;
184
+ return do_2scalar_long(s, a, opfn[a->size], NULL);
497
+ bool dst_intr, dst_type;
185
+}
498
+ unsigned int ptype = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL0, POINT_TYPE);
186
+
499
+ unsigned int rw_mode = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL0, MODE);
187
+static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
500
+ unsigned int burst_type = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_DATA_ATTR,
188
+{
501
+ AWBURST);
189
+ static NeonGenTwoOpWidenFn * const opfn[] = {
502
+
190
+ NULL,
503
+ /* FIXED burst types are only supported in simple dma mode. */
191
+ gen_VQDMULL_16,
504
+ if (ptype != PT_REG) {
192
+ gen_VQDMULL_32,
505
+ burst_type = AXI_BURST_INCR;
193
+ NULL,
506
+ }
194
+ };
507
+
195
+ static NeonGenTwo64OpFn * const accfn[] = {
508
+ while (len) {
196
+ NULL,
509
+ dst_size = FIELD_EX32(s->dsc_dst.words[2], ZDMA_CH_DST_DSCR_WORD2,
197
+ gen_VQDMLAL_acc_16,
510
+ SIZE);
198
+ gen_VQDMLAL_acc_32,
511
+ dst_type = FIELD_EX32(s->dsc_dst.words[3], ZDMA_CH_DST_DSCR_WORD3,
199
+ NULL,
512
+ TYPE);
200
+ };
513
+ if (dst_size == 0 && ptype == PT_MEM) {
201
+
514
+ uint64_t next;
202
+ return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
515
+ next = zdma_update_descr_addr(s, dst_type,
203
+}
516
+ R_ZDMA_CH_DST_CUR_DSCR_LSB);
204
+
517
+ zdma_load_descriptor(s, next, &s->dsc_dst);
205
+static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
518
+ dst_size = FIELD_EX32(s->dsc_dst.words[2], ZDMA_CH_DST_DSCR_WORD2,
206
+{
519
+ SIZE);
207
+ static NeonGenTwoOpWidenFn * const opfn[] = {
520
+ dst_type = FIELD_EX32(s->dsc_dst.words[3], ZDMA_CH_DST_DSCR_WORD3,
208
+ NULL,
521
+ TYPE);
209
+ gen_VQDMULL_16,
522
+ }
210
+ gen_VQDMULL_32,
523
+
211
+ NULL,
524
+ /* Match what hardware does by ignoring the dst_size and only using
212
+ };
525
+ * the src size for Simple register mode. */
213
+ static NeonGenTwo64OpFn * const accfn[] = {
526
+ if (ptype == PT_REG && rw_mode != RW_MODE_WO) {
214
+ NULL,
527
+ dst_size = len;
215
+ gen_VQDMLSL_acc_16,
528
+ }
216
+ gen_VQDMLSL_acc_32,
529
+
217
+ NULL,
530
+ dst_intr = FIELD_EX32(s->dsc_dst.words[3], ZDMA_CH_DST_DSCR_WORD3,
218
+ };
531
+ INTR);
219
+
532
+
220
+ return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
533
+ dlen = len > dst_size ? dst_size : len;
221
+}
534
+ if (burst_type == AXI_BURST_FIXED) {
222
diff --git a/target/arm/translate.c b/target/arm/translate.c
535
+ if (dlen > (s->cfg.bus_width / 8)) {
223
index XXXXXXX..XXXXXXX 100644
536
+ dlen = s->cfg.bus_width / 8;
224
--- a/target/arm/translate.c
537
+ }
225
+++ b/target/arm/translate.c
538
+ }
226
@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
539
+
227
tcg_gen_ext16s_i32(dest, var);
540
+ address_space_rw(s->dma_as, s->dsc_dst.addr, s->attr, buf, dlen,
228
}
541
+ true);
229
542
+ if (burst_type == AXI_BURST_INCR) {
230
-/* 32x32->64 multiply. Marks inputs as dead. */
543
+ s->dsc_dst.addr += dlen;
231
-static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
544
+ }
232
-{
545
+ dst_size -= dlen;
233
- TCGv_i32 lo = tcg_temp_new_i32();
546
+ buf += dlen;
234
- TCGv_i32 hi = tcg_temp_new_i32();
547
+ len -= dlen;
235
- TCGv_i64 ret;
548
+
236
-
549
+ if (dst_size == 0 && dst_intr) {
237
- tcg_gen_mulu2_i32(lo, hi, a, b);
550
+ zdma_dst_done(s);
238
- tcg_temp_free_i32(a);
551
+ }
239
- tcg_temp_free_i32(b);
552
+
240
-
553
+ /* Write back to buffered descriptor. */
241
- ret = tcg_temp_new_i64();
554
+ s->dsc_dst.words[2] = FIELD_DP32(s->dsc_dst.words[2],
242
- tcg_gen_concat_i32_i64(ret, lo, hi);
555
+ ZDMA_CH_DST_DSCR_WORD2,
243
- tcg_temp_free_i32(lo);
556
+ SIZE,
244
- tcg_temp_free_i32(hi);
557
+ dst_size);
245
-
558
+ }
246
- return ret;
559
+}
247
-}
560
+
248
-
561
+static void zdma_process_descr(XlnxZDMA *s)
249
-static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
562
+{
250
-{
563
+ uint64_t src_addr;
251
- TCGv_i32 lo = tcg_temp_new_i32();
564
+ uint32_t src_size, len;
252
- TCGv_i32 hi = tcg_temp_new_i32();
565
+ unsigned int src_cmd;
253
- TCGv_i64 ret;
566
+ bool src_intr, src_type;
254
-
567
+ unsigned int ptype = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL0, POINT_TYPE);
255
- tcg_gen_muls2_i32(lo, hi, a, b);
568
+ unsigned int rw_mode = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL0, MODE);
256
- tcg_temp_free_i32(a);
569
+ unsigned int burst_type = ARRAY_FIELD_EX32(s->regs, ZDMA_CH_DATA_ATTR,
257
- tcg_temp_free_i32(b);
570
+ ARBURST);
258
-
571
+
259
- ret = tcg_temp_new_i64();
572
+ src_addr = s->dsc_src.addr;
260
- tcg_gen_concat_i32_i64(ret, lo, hi);
573
+ src_size = FIELD_EX32(s->dsc_src.words[2], ZDMA_CH_SRC_DSCR_WORD2, SIZE);
261
- tcg_temp_free_i32(lo);
574
+ src_cmd = FIELD_EX32(s->dsc_src.words[3], ZDMA_CH_SRC_DSCR_WORD3, CMD);
262
- tcg_temp_free_i32(hi);
575
+ src_type = FIELD_EX32(s->dsc_src.words[3], ZDMA_CH_SRC_DSCR_WORD3, TYPE);
263
-
576
+ src_intr = FIELD_EX32(s->dsc_src.words[3], ZDMA_CH_SRC_DSCR_WORD3, INTR);
264
- return ret;
577
+
265
-}
578
+ /* FIXED burst types and non-rw modes are only supported in
266
-
579
+ * simple dma mode.
267
/* Swap low and high halfwords. */
580
+ */
268
static void gen_swap_half(TCGv_i32 var)
581
+ if (ptype != PT_REG) {
269
{
582
+ if (rw_mode != RW_MODE_RW) {
270
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
583
+ qemu_log_mask(LOG_GUEST_ERROR,
271
}
584
+ "zDMA: rw-mode=%d but not simple DMA mode.\n",
272
}
585
+ rw_mode);
273
586
+ }
274
-static inline void gen_neon_negl(TCGv_i64 var, int size)
587
+ if (burst_type != AXI_BURST_INCR) {
275
-{
588
+ qemu_log_mask(LOG_GUEST_ERROR,
276
- switch (size) {
589
+ "zDMA: burst_type=%d but not simple DMA mode.\n",
277
- case 0: gen_helper_neon_negl_u16(var, var); break;
590
+ burst_type);
278
- case 1: gen_helper_neon_negl_u32(var, var); break;
591
+ }
279
- case 2:
592
+ burst_type = AXI_BURST_INCR;
280
- tcg_gen_neg_i64(var, var);
593
+ rw_mode = RW_MODE_RW;
281
- break;
594
+ }
282
- default: abort();
595
+
283
- }
596
+ if (rw_mode == RW_MODE_WO) {
284
-}
597
+ /* In Simple DMA Write-Only, we need to push DST size bytes
285
-
598
+ * regardless of what SRC size is set to. */
286
-static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
599
+ src_size = FIELD_EX32(s->dsc_dst.words[2], ZDMA_CH_DST_DSCR_WORD2,
287
-{
600
+ SIZE);
288
- switch (size) {
601
+ memcpy(s->buf, &s->regs[R_ZDMA_CH_WR_ONLY_WORD0], s->cfg.bus_width / 8);
289
- case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
602
+ }
290
- case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
603
+
291
- default: abort();
604
+ while (src_size) {
292
- }
605
+ len = src_size > ARRAY_SIZE(s->buf) ? ARRAY_SIZE(s->buf) : src_size;
293
-}
606
+ if (burst_type == AXI_BURST_FIXED) {
294
-
607
+ if (len > (s->cfg.bus_width / 8)) {
295
-static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
608
+ len = s->cfg.bus_width / 8;
296
- int size, int u)
609
+ }
297
-{
610
+ }
298
- TCGv_i64 tmp;
611
+
299
-
612
+ if (rw_mode == RW_MODE_WO) {
300
- switch ((size << 1) | u) {
613
+ if (len > s->cfg.bus_width / 8) {
301
- case 0: gen_helper_neon_mull_s8(dest, a, b); break;
614
+ len = s->cfg.bus_width / 8;
302
- case 1: gen_helper_neon_mull_u8(dest, a, b); break;
615
+ }
303
- case 2: gen_helper_neon_mull_s16(dest, a, b); break;
616
+ } else {
304
- case 3: gen_helper_neon_mull_u16(dest, a, b); break;
617
+ address_space_rw(s->dma_as, src_addr, s->attr, s->buf, len,
305
- case 4:
618
+ false);
306
- tmp = gen_muls_i64_i32(a, b);
619
+ if (burst_type == AXI_BURST_INCR) {
307
- tcg_gen_mov_i64(dest, tmp);
620
+ src_addr += len;
308
- tcg_temp_free_i64(tmp);
621
+ }
309
- break;
622
+ }
310
- case 5:
623
+
311
- tmp = gen_mulu_i64_i32(a, b);
624
+ if (rw_mode != RW_MODE_RO) {
312
- tcg_gen_mov_i64(dest, tmp);
625
+ zdma_write_dst(s, s->buf, len);
313
- tcg_temp_free_i64(tmp);
626
+ }
314
- break;
627
+
315
- default: abort();
628
+ s->regs[R_ZDMA_CH_TOTAL_BYTE] += len;
316
- }
629
+ src_size -= len;
317
-
630
+ }
318
- /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
631
+
319
- Don't forget to clean them now. */
632
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, DMA_DONE, true);
320
- if (size < 2) {
633
+
321
- tcg_temp_free_i32(a);
634
+ if (src_intr) {
322
- tcg_temp_free_i32(b);
635
+ zdma_src_done(s);
323
- }
636
+ }
324
-}
637
+
325
-
638
+ /* Load next descriptor. */
326
static void gen_neon_narrow_op(int op, int u, int size,
639
+ if (ptype == PT_REG || src_cmd == CMD_STOP) {
327
TCGv_i32 dest, TCGv_i64 src)
640
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_CTRL2, EN, 0);
328
{
641
+ zdma_set_state(s, DISABLED);
329
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
642
+ return;
330
int u;
643
+ }
331
int vec_size;
644
+
332
uint32_t imm;
645
+ if (src_cmd == CMD_HALT) {
333
- TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
646
+ zdma_set_state(s, PAUSED);
334
+ TCGv_i32 tmp, tmp2, tmp3, tmp5;
647
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, DMA_PAUSE, 1);
335
TCGv_ptr ptr1;
648
+ zdma_ch_imr_update_irq(s);
336
TCGv_i64 tmp64;
649
+ return;
337
650
+ }
338
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
651
+
339
return 1;
652
+ zdma_update_descr_addr(s, src_type, R_ZDMA_CH_SRC_CUR_DSCR_LSB);
340
} else { /* (insn & 0x00800010 == 0x00800000) */
653
+}
341
if (size != 3) {
654
+
342
- op = (insn >> 8) & 0xf;
655
+static void zdma_run(XlnxZDMA *s)
343
- if ((insn & (1 << 6)) == 0) {
656
+{
344
- /* Three registers of different lengths: handled by decodetree */
657
+ while (s->state == ENABLED && !s->error) {
345
- return 1;
658
+ zdma_load_src_descriptor(s);
346
- } else {
659
+
347
- /* Two registers and a scalar. NB that for ops of this form
660
+ if (s->error) {
348
- * the ARM ARM labels bit 24 as Q, but it is in our variable
661
+ zdma_set_state(s, DISABLED);
349
- * 'u', not 'q'.
662
+ } else {
350
- */
663
+ zdma_process_descr(s);
351
- if (size == 0) {
664
+ }
352
- return 1;
665
+ }
353
- }
666
+
354
- switch (op) {
667
+ zdma_ch_imr_update_irq(s);
355
- case 0: /* Integer VMLA scalar */
668
+}
356
- case 4: /* Integer VMLS scalar */
669
+
357
- case 8: /* Integer VMUL scalar */
670
+static void zdma_update_descr_addr_from_start(XlnxZDMA *s)
358
- case 1: /* Float VMLA scalar */
671
+{
359
- case 5: /* Floating point VMLS scalar */
672
+ uint64_t src_addr, dst_addr;
360
- case 9: /* Floating point VMUL scalar */
673
+
361
- case 12: /* VQDMULH scalar */
674
+ src_addr = zdma_get_regaddr64(s, R_ZDMA_CH_SRC_START_LSB);
362
- case 13: /* VQRDMULH scalar */
675
+ zdma_put_regaddr64(s, R_ZDMA_CH_SRC_CUR_DSCR_LSB, src_addr);
363
- case 14: /* VQRDMLAH scalar */
676
+ dst_addr = zdma_get_regaddr64(s, R_ZDMA_CH_DST_START_LSB);
364
- case 15: /* VQRDMLSH scalar */
677
+ zdma_put_regaddr64(s, R_ZDMA_CH_DST_CUR_DSCR_LSB, dst_addr);
365
- return 1; /* handled by decodetree */
678
+ zdma_load_dst_descriptor(s);
366
-
679
+}
367
- case 3: /* VQDMLAL scalar */
680
+
368
- case 7: /* VQDMLSL scalar */
681
+static void zdma_ch_ctrlx_postw(RegisterInfo *reg, uint64_t val64)
369
- case 11: /* VQDMULL scalar */
682
+{
370
- if (u == 1) {
683
+ XlnxZDMA *s = XLNX_ZDMA(reg->opaque);
371
- return 1;
684
+
372
- }
685
+ if (ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL2, EN)) {
373
- /* fall through */
686
+ s->error = false;
374
- case 2: /* VMLAL sclar */
687
+
375
- case 6: /* VMLSL scalar */
688
+ if (s->state == PAUSED &&
376
- case 10: /* VMULL scalar */
689
+ ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL0, CONT)) {
377
- if (rd & 1) {
690
+ if (ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL0, CONT_ADDR) == 1) {
378
- return 1;
691
+ zdma_update_descr_addr_from_start(s);
379
- }
692
+ } else {
380
- tmp2 = neon_get_scalar(size, rm);
693
+ bool src_type = FIELD_EX32(s->dsc_src.words[3],
381
- /* We need a copy of tmp2 because gen_neon_mull
694
+ ZDMA_CH_SRC_DSCR_WORD3, TYPE);
382
- * deletes it during pass 0. */
695
+ zdma_update_descr_addr(s, src_type,
383
- tmp4 = tcg_temp_new_i32();
696
+ R_ZDMA_CH_SRC_CUR_DSCR_LSB);
384
- tcg_gen_mov_i32(tmp4, tmp2);
697
+ }
385
- tmp3 = neon_load_reg(rn, 1);
698
+ ARRAY_FIELD_DP32(s->regs, ZDMA_CH_CTRL0, CONT, false);
386
-
699
+ zdma_set_state(s, ENABLED);
387
- for (pass = 0; pass < 2; pass++) {
700
+ } else if (s->state == DISABLED) {
388
- if (pass == 0) {
701
+ zdma_update_descr_addr_from_start(s);
389
- tmp = neon_load_reg(rn, 0);
702
+ zdma_set_state(s, ENABLED);
390
- } else {
703
+ }
391
- tmp = tmp3;
704
+ } else {
392
- tmp2 = tmp4;
705
+ /* Leave Paused state? */
393
- }
706
+ if (s->state == PAUSED &&
394
- gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
707
+ ARRAY_FIELD_EX32(s->regs, ZDMA_CH_CTRL0, CONT)) {
395
- if (op != 11) {
708
+ zdma_set_state(s, DISABLED);
396
- neon_load_reg64(cpu_V1, rd + pass);
709
+ }
397
- }
710
+ }
398
- switch (op) {
711
+
399
- case 6:
712
+ zdma_run(s);
400
- gen_neon_negl(cpu_V0, size);
713
+}
401
- /* Fall through */
714
+
402
- case 2:
715
+static RegisterAccessInfo zdma_regs_info[] = {
403
- gen_neon_addl(size);
716
+ { .name = "ZDMA_ERR_CTRL", .addr = A_ZDMA_ERR_CTRL,
404
- break;
717
+ .rsvd = 0xfffffffe,
405
- case 3: case 7:
718
+ },{ .name = "ZDMA_CH_ISR", .addr = A_ZDMA_CH_ISR,
406
- gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
719
+ .rsvd = 0xfffff000,
407
- if (op == 7) {
720
+ .w1c = 0xfff,
408
- gen_neon_negl(cpu_V0, size);
721
+ .post_write = zdma_ch_isr_postw,
409
- }
722
+ },{ .name = "ZDMA_CH_IMR", .addr = A_ZDMA_CH_IMR,
410
- gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
723
+ .reset = 0xfff,
411
- break;
724
+ .rsvd = 0xfffff000,
412
- case 10:
725
+ .ro = 0xfff,
413
- /* no-op */
726
+ },{ .name = "ZDMA_CH_IEN", .addr = A_ZDMA_CH_IEN,
414
- break;
727
+ .rsvd = 0xfffff000,
415
- case 11:
728
+ .pre_write = zdma_ch_ien_prew,
416
- gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
729
+ },{ .name = "ZDMA_CH_IDS", .addr = A_ZDMA_CH_IDS,
417
- break;
730
+ .rsvd = 0xfffff000,
418
- default:
731
+ .pre_write = zdma_ch_ids_prew,
419
- abort();
732
+ },{ .name = "ZDMA_CH_CTRL0", .addr = A_ZDMA_CH_CTRL0,
420
- }
733
+ .reset = 0x80,
421
- neon_store_reg64(cpu_V0, rd + pass);
734
+ .rsvd = 0xffffff01,
422
- }
735
+ .post_write = zdma_ch_ctrlx_postw,
423
- break;
736
+ },{ .name = "ZDMA_CH_CTRL1", .addr = A_ZDMA_CH_CTRL1,
424
- default:
737
+ .reset = 0x3ff,
425
- g_assert_not_reached();
738
+ .rsvd = 0xfffffc00,
426
- }
739
+ },{ .name = "ZDMA_CH_FCI", .addr = A_ZDMA_CH_FCI,
427
- }
740
+ .rsvd = 0xffffffc0,
428
+ /*
741
+ },{ .name = "ZDMA_CH_STATUS", .addr = A_ZDMA_CH_STATUS,
429
+ * Three registers of different lengths, or two registers and
742
+ .rsvd = 0xfffffffc,
430
+ * a scalar: handled by decodetree
743
+ .ro = 0x3,
431
+ */
744
+ },{ .name = "ZDMA_CH_DATA_ATTR", .addr = A_ZDMA_CH_DATA_ATTR,
432
+ return 1;
745
+ .reset = 0x483d20f,
433
} else { /* size == 3 */
746
+ .rsvd = 0xf0000000,
434
if (!u) {
747
+ },{ .name = "ZDMA_CH_DSCR_ATTR", .addr = A_ZDMA_CH_DSCR_ATTR,
435
/* Extract. */
748
+ .rsvd = 0xfffffe00,
749
+ },{ .name = "ZDMA_CH_SRC_DSCR_WORD0", .addr = A_ZDMA_CH_SRC_DSCR_WORD0,
750
+ },{ .name = "ZDMA_CH_SRC_DSCR_WORD1", .addr = A_ZDMA_CH_SRC_DSCR_WORD1,
751
+ .rsvd = 0xfffe0000,
752
+ },{ .name = "ZDMA_CH_SRC_DSCR_WORD2", .addr = A_ZDMA_CH_SRC_DSCR_WORD2,
753
+ .rsvd = 0xc0000000,
754
+ },{ .name = "ZDMA_CH_SRC_DSCR_WORD3", .addr = A_ZDMA_CH_SRC_DSCR_WORD3,
755
+ .rsvd = 0xffffffe0,
756
+ },{ .name = "ZDMA_CH_DST_DSCR_WORD0", .addr = A_ZDMA_CH_DST_DSCR_WORD0,
757
+ },{ .name = "ZDMA_CH_DST_DSCR_WORD1", .addr = A_ZDMA_CH_DST_DSCR_WORD1,
758
+ .rsvd = 0xfffe0000,
759
+ },{ .name = "ZDMA_CH_DST_DSCR_WORD2", .addr = A_ZDMA_CH_DST_DSCR_WORD2,
760
+ .rsvd = 0xc0000000,
761
+ },{ .name = "ZDMA_CH_DST_DSCR_WORD3", .addr = A_ZDMA_CH_DST_DSCR_WORD3,
762
+ .rsvd = 0xfffffffa,
763
+ },{ .name = "ZDMA_CH_WR_ONLY_WORD0", .addr = A_ZDMA_CH_WR_ONLY_WORD0,
764
+ },{ .name = "ZDMA_CH_WR_ONLY_WORD1", .addr = A_ZDMA_CH_WR_ONLY_WORD1,
765
+ },{ .name = "ZDMA_CH_WR_ONLY_WORD2", .addr = A_ZDMA_CH_WR_ONLY_WORD2,
766
+ },{ .name = "ZDMA_CH_WR_ONLY_WORD3", .addr = A_ZDMA_CH_WR_ONLY_WORD3,
767
+ },{ .name = "ZDMA_CH_SRC_START_LSB", .addr = A_ZDMA_CH_SRC_START_LSB,
768
+ },{ .name = "ZDMA_CH_SRC_START_MSB", .addr = A_ZDMA_CH_SRC_START_MSB,
769
+ .rsvd = 0xfffe0000,
770
+ },{ .name = "ZDMA_CH_DST_START_LSB", .addr = A_ZDMA_CH_DST_START_LSB,
771
+ },{ .name = "ZDMA_CH_DST_START_MSB", .addr = A_ZDMA_CH_DST_START_MSB,
772
+ .rsvd = 0xfffe0000,
773
+ },{ .name = "ZDMA_CH_SRC_CUR_PYLD_LSB", .addr = A_ZDMA_CH_SRC_CUR_PYLD_LSB,
774
+ .ro = 0xffffffff,
775
+ },{ .name = "ZDMA_CH_SRC_CUR_PYLD_MSB", .addr = A_ZDMA_CH_SRC_CUR_PYLD_MSB,
776
+ .rsvd = 0xfffe0000,
777
+ .ro = 0x1ffff,
778
+ },{ .name = "ZDMA_CH_DST_CUR_PYLD_LSB", .addr = A_ZDMA_CH_DST_CUR_PYLD_LSB,
779
+ .ro = 0xffffffff,
780
+ },{ .name = "ZDMA_CH_DST_CUR_PYLD_MSB", .addr = A_ZDMA_CH_DST_CUR_PYLD_MSB,
781
+ .rsvd = 0xfffe0000,
782
+ .ro = 0x1ffff,
783
+ },{ .name = "ZDMA_CH_SRC_CUR_DSCR_LSB", .addr = A_ZDMA_CH_SRC_CUR_DSCR_LSB,
784
+ .ro = 0xffffffff,
785
+ },{ .name = "ZDMA_CH_SRC_CUR_DSCR_MSB", .addr = A_ZDMA_CH_SRC_CUR_DSCR_MSB,
786
+ .rsvd = 0xfffe0000,
787
+        .ro = 0x1ffff,
+    },{ .name = "ZDMA_CH_DST_CUR_DSCR_LSB", .addr = A_ZDMA_CH_DST_CUR_DSCR_LSB,
+        .ro = 0xffffffff,
+    },{ .name = "ZDMA_CH_DST_CUR_DSCR_MSB", .addr = A_ZDMA_CH_DST_CUR_DSCR_MSB,
+        .rsvd = 0xfffe0000,
+        .ro = 0x1ffff,
+    },{ .name = "ZDMA_CH_TOTAL_BYTE", .addr = A_ZDMA_CH_TOTAL_BYTE,
+        .w1c = 0xffffffff,
+    },{ .name = "ZDMA_CH_RATE_CNTL", .addr = A_ZDMA_CH_RATE_CNTL,
+        .rsvd = 0xfffff000,
+    },{ .name = "ZDMA_CH_IRQ_SRC_ACCT", .addr = A_ZDMA_CH_IRQ_SRC_ACCT,
+        .rsvd = 0xffffff00,
+        .ro = 0xff,
+        .cor = 0xff,
+    },{ .name = "ZDMA_CH_IRQ_DST_ACCT", .addr = A_ZDMA_CH_IRQ_DST_ACCT,
+        .rsvd = 0xffffff00,
+        .ro = 0xff,
+        .cor = 0xff,
+    },{ .name = "ZDMA_CH_DBG0", .addr = A_ZDMA_CH_DBG0,
+        .rsvd = 0xfffffe00,
+        .ro = 0x1ff,
+    },{ .name = "ZDMA_CH_DBG1", .addr = A_ZDMA_CH_DBG1,
+        .rsvd = 0xfffffe00,
+        .ro = 0x1ff,
+    },{ .name = "ZDMA_CH_CTRL2", .addr = A_ZDMA_CH_CTRL2,
+        .rsvd = 0xfffffffe,
+        .post_write = zdma_ch_ctrlx_postw,
+    }
+};
+
+static void zdma_reset(DeviceState *dev)
+{
+    XlnxZDMA *s = XLNX_ZDMA(dev);
+    unsigned int i;
+
+    for (i = 0; i < ARRAY_SIZE(s->regs_info); ++i) {
+        register_reset(&s->regs_info[i]);
+    }
+
+    zdma_ch_imr_update_irq(s);
+}
+
+static uint64_t zdma_read(void *opaque, hwaddr addr, unsigned size)
+{
+    XlnxZDMA *s = XLNX_ZDMA(opaque);
+    RegisterInfo *r = &s->regs_info[addr / 4];
+
+    if (!r->data) {
+        qemu_log("%s: Decode error: read from %" HWADDR_PRIx "\n",
+                 object_get_canonical_path(OBJECT(s)),
+                 addr);
+        ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, INV_APB, true);
+        zdma_ch_imr_update_irq(s);
+        return 0;
+    }
+    return register_read(r, ~0, NULL, false);
+}
+
+static void zdma_write(void *opaque, hwaddr addr, uint64_t value,
+                       unsigned size)
+{
+    XlnxZDMA *s = XLNX_ZDMA(opaque);
+    RegisterInfo *r = &s->regs_info[addr / 4];
+
+    if (!r->data) {
+        qemu_log("%s: Decode error: write to %" HWADDR_PRIx "=%" PRIx64 "\n",
+                 object_get_canonical_path(OBJECT(s)),
+                 addr, value);
+        ARRAY_FIELD_DP32(s->regs, ZDMA_CH_ISR, INV_APB, true);
+        zdma_ch_imr_update_irq(s);
+        return;
+    }
+    register_write(r, value, ~0, NULL, false);
+}
+
+static const MemoryRegionOps zdma_ops = {
+    .read = zdma_read,
+    .write = zdma_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static void zdma_realize(DeviceState *dev, Error **errp)
+{
+    XlnxZDMA *s = XLNX_ZDMA(dev);
+    unsigned int i;
+
+    for (i = 0; i < ARRAY_SIZE(zdma_regs_info); ++i) {
+        RegisterInfo *r = &s->regs_info[zdma_regs_info[i].addr / 4];
+
+        *r = (RegisterInfo) {
+            .data = (uint8_t *)&s->regs[zdma_regs_info[i].addr / 4],
+            .data_size = sizeof(uint32_t),
+            .access = &zdma_regs_info[i],
+            .opaque = s,
+        };
+    }
+
+    if (s->dma_mr) {
+        s->dma_as = g_malloc0(sizeof(AddressSpace));
+        address_space_init(s->dma_as, s->dma_mr, NULL);
+    } else {
+        s->dma_as = &address_space_memory;
+    }
+    s->attr = MEMTXATTRS_UNSPECIFIED;
+}
+
+static void zdma_init(Object *obj)
+{
+    XlnxZDMA *s = XLNX_ZDMA(obj);
+    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+
+    memory_region_init_io(&s->iomem, obj, &zdma_ops, s,
+                          TYPE_XLNX_ZDMA, ZDMA_R_MAX * 4);
+    sysbus_init_mmio(sbd, &s->iomem);
+    sysbus_init_irq(sbd, &s->irq_zdma_ch_imr);
+
+    object_property_add_link(obj, "dma", TYPE_MEMORY_REGION,
+                             (Object **)&s->dma_mr,
+                             qdev_prop_allow_set_link_before_realize,
+                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
+                             &error_abort);
+}
+
+static const VMStateDescription vmstate_zdma = {
+    .name = TYPE_XLNX_ZDMA,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32_ARRAY(regs, XlnxZDMA, ZDMA_R_MAX),
+        VMSTATE_UINT32(state, XlnxZDMA),
+        VMSTATE_UINT32_ARRAY(dsc_src.words, XlnxZDMA, 4),
+        VMSTATE_UINT32_ARRAY(dsc_dst.words, XlnxZDMA, 4),
+        VMSTATE_END_OF_LIST(),
+    }
+};
+
+static Property zdma_props[] = {
+    DEFINE_PROP_UINT32("bus-width", XlnxZDMA, cfg.bus_width, 64),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void zdma_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset = zdma_reset;
+    dc->realize = zdma_realize;
+    dc->props = zdma_props;
+    dc->vmsd = &vmstate_zdma;
+}
+
+static const TypeInfo zdma_info = {
+    .name = TYPE_XLNX_ZDMA,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(XlnxZDMA),
+    .class_init = zdma_class_init,
+    .instance_init = zdma_init,
+};
+
+static void zdma_register_types(void)
+{
+    type_register_static(&zdma_info);
+}
+
+type_init(zdma_register_types)
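A rough sketch of what the .ro and .w1c masks in the register table above mean for a 32-bit register write (illustration only, not the QEMU register API implementation; apply_write is a made-up name):

#include <stdint.h>

static uint32_t apply_write(uint32_t old, uint32_t val,
                            uint32_t ro, uint32_t w1c)
{
    uint32_t keep = ro | w1c;
    uint32_t new_val = (old & keep) | (val & ~keep);

    new_val &= ~(val & w1c);    /* writing 1 clears a w1c bit */
    return new_val;
}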
--
2.17.0

--
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---

Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation, which used the fixed temporaries cpu_V0 and cpu_V1
and did the extraction by hand with shift and logic ops, we use
the TCG extract2 insn.

We no longer need to special-case immediates of 0 or 8, as the
optimizer is smart enough to throw away the dead code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
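As a scalar illustration of the conversion described above (a sketch only, not QEMU code; vext64 is a made-up name), the 64-bit VEXT case is the extraction that a single tcg_gen_extract2_i64() call performs:

#include <stdint.h>

/* Extract 64 bits from the 128-bit concatenation <left:right>,
 * starting imm bytes into 'right'. imm == 0 must be special-cased
 * in C because shifting by 64 is undefined behaviour; in TCG the
 * optimizer simply discards the dead path instead.
 */
static uint64_t vext64(uint64_t right, uint64_t left, unsigned imm)
{
    unsigned shift = imm * 8;

    if (shift == 0) {
        return right;
    }
    return (right >> shift) | (left << (64 - shift));
}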
 target/arm/Makefile.objs        |  2 +-
 target/arm/helper-sve.h         | 21 ++++++++++
 target/arm/helper.h             |  1 +
 target/arm/sve_helper.c         | 78 ++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c      | 65 +++++++++++++++++++++++++++++++
 target/arm/sve.decode           |  5 +++
 6 files changed, 171 insertions(+), 1 deletion(-)
 create mode 100644 target/arm/helper-sve.h
 create mode 100644 target/arm/sve_helper.c

 target/arm/neon-dp.decode       |  8 +++-
 target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 58 +------------------------
 3 files changed, 85 insertions(+), 57 deletions(-)

diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
21
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
23
     "GEN", $(TARGET_DIR)$@)
22
# return false for size==3.
24
23
######################################################################
25
target/arm/translate-sve.o: target/arm/decode-sve.inc.c
24
{
26
-obj-$(TARGET_AARCH64) += translate-sve.o
25
- # 0b11 subgroup will go here
27
+obj-$(TARGET_AARCH64) += translate-sve.o sve_helper.o
26
+ [
28
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
27
+ ##################################################################
29
new file mode 100644
28
+ # Miscellaneous size=0b11 insns
30
index XXXXXXX..XXXXXXX
29
+ ##################################################################
31
--- /dev/null
30
+ VEXT 1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
32
+++ b/target/arm/helper-sve.h
31
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
33
@@ -XXX,XX +XXX,XX @@
32
+ ]
34
+/*
33
35
+ * AArch64 SVE specific helper definitions
34
# Subgroup for size != 0b11
36
+ *
35
[
37
+ * Copyright (c) 2018 Linaro, Ltd
36
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
38
+ *
39
+ * This library is free software; you can redistribute it and/or
40
+ * modify it under the terms of the GNU Lesser General Public
41
+ * License as published by the Free Software Foundation; either
42
+ * version 2 of the License, or (at your option) any later version.
43
+ *
44
+ * This library is distributed in the hope that it will be useful,
45
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
46
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
47
+ * Lesser General Public License for more details.
48
+ *
49
+ * You should have received a copy of the GNU Lesser General Public
50
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
51
+ */
52
+
53
+DEF_HELPER_FLAGS_2(sve_predtest1, TCG_CALL_NO_WG, i32, i64, i64)
54
+DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
55
diff --git a/target/arm/helper.h b/target/arm/helper.h
56
index XXXXXXX..XXXXXXX 100644
37
index XXXXXXX..XXXXXXX 100644
57
--- a/target/arm/helper.h
38
--- a/target/arm/translate-neon.inc.c
58
+++ b/target/arm/helper.h
39
+++ b/target/arm/translate-neon.inc.c
59
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG,
40
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
60
41
61
#ifdef TARGET_AARCH64
42
return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
62
#include "helper-a64.h"
43
}
63
+#include "helper-sve.h"
44
+
64
#endif
45
+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
65
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
66
new file mode 100644
67
index XXXXXXX..XXXXXXX
68
--- /dev/null
69
+++ b/target/arm/sve_helper.c
70
@@ -XXX,XX +XXX,XX @@
71
+/*
72
+ * ARM SVE Operations
73
+ *
74
+ * Copyright (c) 2018 Linaro, Ltd.
75
+ *
76
+ * This library is free software; you can redistribute it and/or
77
+ * modify it under the terms of the GNU Lesser General Public
78
+ * License as published by the Free Software Foundation; either
79
+ * version 2 of the License, or (at your option) any later version.
80
+ *
81
+ * This library is distributed in the hope that it will be useful,
82
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
83
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
84
+ * Lesser General Public License for more details.
85
+ *
86
+ * You should have received a copy of the GNU Lesser General Public
87
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
88
+ */
89
+
90
+#include "qemu/osdep.h"
91
+#include "cpu.h"
92
+#include "exec/exec-all.h"
93
+#include "exec/cpu_ldst.h"
94
+#include "exec/helper-proto.h"
95
+#include "tcg/tcg-gvec-desc.h"
96
+
97
+
98
+/* Return a value for NZCV as per the ARM PredTest pseudofunction.
99
+ *
100
+ * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
101
+ * and bit 0 set if C is set. Compare the definitions of these variables
102
+ * within CPUARMState.
103
+ */
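(An illustration of the packing just described; a sketch, not QEMU code, and pack_predtest_flags is a made-up name:)

#include <stdbool.h>
#include <stdint.h>

/* Bit 31 holds N, bit 1 holds "Z is clear", bit 0 holds C. */
static uint32_t pack_predtest_flags(bool n, bool z, bool c)
{
    return ((uint32_t)n << 31) | ((uint32_t)!z << 1) | (uint32_t)c;
}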
104
+
105
+/* For no G bits set, NZCV = C. */
106
+#define PREDTEST_INIT 1
107
+
108
+/* This is an iterative function, called for each Pd and Pg word
109
+ * moving forward.
110
+ */
111
+static uint32_t iter_predtest_fwd(uint64_t d, uint64_t g, uint32_t flags)
112
+{
46
+{
113
+ if (likely(g)) {
47
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
114
+ /* Compute N from first D & G.
48
+ return false;
115
+ Use bit 2 to signal first G bit seen. */
49
+ }
116
+ if (!(flags & 4)) {
50
+
117
+ flags |= ((d & (g & -g)) != 0) << 31;
51
+ /* UNDEF accesses to D16-D31 if they don't exist. */
118
+ flags |= 4;
52
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
53
+ ((a->vd | a->vn | a->vm) & 0x10)) {
54
+ return false;
55
+ }
56
+
57
+ if ((a->vn | a->vm | a->vd) & a->q) {
58
+ return false;
59
+ }
60
+
61
+ if (a->imm > 7 && !a->q) {
62
+ return false;
63
+ }
64
+
65
+ if (!vfp_access_check(s)) {
66
+ return true;
67
+ }
68
+
69
+ if (!a->q) {
70
+ /* Extract 64 bits from <Vm:Vn> */
71
+ TCGv_i64 left, right, dest;
72
+
73
+ left = tcg_temp_new_i64();
74
+ right = tcg_temp_new_i64();
75
+ dest = tcg_temp_new_i64();
76
+
77
+ neon_load_reg64(right, a->vn);
78
+ neon_load_reg64(left, a->vm);
79
+ tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
80
+ neon_store_reg64(dest, a->vd);
81
+
82
+ tcg_temp_free_i64(left);
83
+ tcg_temp_free_i64(right);
84
+ tcg_temp_free_i64(dest);
85
+ } else {
86
+ /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
87
+ TCGv_i64 left, middle, right, destleft, destright;
88
+
89
+ left = tcg_temp_new_i64();
90
+ middle = tcg_temp_new_i64();
91
+ right = tcg_temp_new_i64();
92
+ destleft = tcg_temp_new_i64();
93
+ destright = tcg_temp_new_i64();
94
+
95
+ if (a->imm < 8) {
96
+ neon_load_reg64(right, a->vn);
97
+ neon_load_reg64(middle, a->vn + 1);
98
+ tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
99
+ neon_load_reg64(left, a->vm);
100
+ tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
101
+ } else {
102
+ neon_load_reg64(right, a->vn + 1);
103
+ neon_load_reg64(middle, a->vm);
104
+ tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
105
+ neon_load_reg64(left, a->vm + 1);
106
+ tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
119
+ }
107
+ }
120
+
108
+
121
+ /* Accumulate Z from each D & G. */
109
+ neon_store_reg64(destright, a->vd);
122
+ flags |= ((d & g) != 0) << 1;
110
+ neon_store_reg64(destleft, a->vd + 1);
123
+
111
+
124
+ /* Compute C from last !(D & G). Replace previous. */
112
+ tcg_temp_free_i64(destright);
125
+ flags = deposit32(flags, 0, 1, (d & pow2floor(g)) == 0);
113
+ tcg_temp_free_i64(destleft);
126
+ }
114
+ tcg_temp_free_i64(right);
127
+ return flags;
115
+ tcg_temp_free_i64(middle);
128
+}
116
+ tcg_temp_free_i64(left);
129
+
130
+/* The same for a single word predicate. */
131
+uint32_t HELPER(sve_predtest1)(uint64_t d, uint64_t g)
132
+{
133
+ return iter_predtest_fwd(d, g, PREDTEST_INIT);
134
+}
135
+
136
+/* The same for a multi-word predicate. */
137
+uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
138
+{
139
+ uint32_t flags = PREDTEST_INIT;
140
+ uint64_t *d = vd, *g = vg;
141
+ uintptr_t i = 0;
142
+
143
+ do {
144
+ flags = iter_predtest_fwd(d[i], g[i], flags);
145
+ } while (++i < words);
146
+
147
+ return flags;
148
+}
149
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
150
index XXXXXXX..XXXXXXX 100644
151
--- a/target/arm/translate-sve.c
152
+++ b/target/arm/translate-sve.c
153
@@ -XXX,XX +XXX,XX @@ static bool do_mov_z(DisasContext *s, int rd, int rn)
154
return do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
155
}
156
157
+/* Set the cpu flags as per a return from an SVE helper. */
158
+static void do_pred_flags(TCGv_i32 t)
159
+{
160
+ tcg_gen_mov_i32(cpu_NF, t);
161
+ tcg_gen_andi_i32(cpu_ZF, t, 2);
162
+ tcg_gen_andi_i32(cpu_CF, t, 1);
163
+ tcg_gen_movi_i32(cpu_VF, 0);
164
+}
165
+
166
+/* Subroutines computing the ARM PredTest pseudofunction. */
167
+static void do_predtest1(TCGv_i64 d, TCGv_i64 g)
168
+{
169
+ TCGv_i32 t = tcg_temp_new_i32();
170
+
171
+ gen_helper_sve_predtest1(t, d, g);
172
+ do_pred_flags(t);
173
+ tcg_temp_free_i32(t);
174
+}
175
+
176
+static void do_predtest(DisasContext *s, int dofs, int gofs, int words)
177
+{
178
+ TCGv_ptr dptr = tcg_temp_new_ptr();
179
+ TCGv_ptr gptr = tcg_temp_new_ptr();
180
+ TCGv_i32 t;
181
+
182
+ tcg_gen_addi_ptr(dptr, cpu_env, dofs);
183
+ tcg_gen_addi_ptr(gptr, cpu_env, gofs);
184
+ t = tcg_const_i32(words);
185
+
186
+ gen_helper_sve_predtest(t, dptr, gptr, t);
187
+ tcg_temp_free_ptr(dptr);
188
+ tcg_temp_free_ptr(gptr);
189
+
190
+ do_pred_flags(t);
191
+ tcg_temp_free_i32(t);
192
+}
193
+
194
/*
195
*** SVE Logical - Unpredicated Group
196
*/
197
@@ -XXX,XX +XXX,XX @@ static bool trans_BIC_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
198
return do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
199
}
200
201
+/*
202
+ *** SVE Predicate Misc Group
203
+ */
204
+
205
+static bool trans_PTEST(DisasContext *s, arg_PTEST *a, uint32_t insn)
206
+{
207
+ if (sve_access_check(s)) {
208
+ int nofs = pred_full_reg_offset(s, a->rn);
209
+ int gofs = pred_full_reg_offset(s, a->pg);
210
+ int words = DIV_ROUND_UP(pred_full_reg_size(s), 8);
211
+
212
+ if (words == 1) {
213
+ TCGv_i64 pn = tcg_temp_new_i64();
214
+ TCGv_i64 pg = tcg_temp_new_i64();
215
+
216
+ tcg_gen_ld_i64(pn, cpu_env, nofs);
217
+ tcg_gen_ld_i64(pg, cpu_env, gofs);
218
+ do_predtest1(pn, pg);
219
+
220
+ tcg_temp_free_i64(pn);
221
+ tcg_temp_free_i64(pg);
222
+ } else {
223
+ do_predtest(s, nofs, gofs, words);
224
+ }
225
+ }
117
+ }
226
+ return true;
118
+ return true;
227
+}
119
+}
228
+
120
diff --git a/target/arm/translate.c b/target/arm/translate.c
229
/*
230
*** SVE Memory - 32-bit Gather and Unsized Contiguous Group
231
*/
232
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
233
index XXXXXXX..XXXXXXX 100644
121
index XXXXXXX..XXXXXXX 100644
234
--- a/target/arm/sve.decode
122
--- a/target/arm/translate.c
235
+++ b/target/arm/sve.decode
123
+++ b/target/arm/translate.c
236
@@ -XXX,XX +XXX,XX @@ ORR_zzz 00000100 01 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
124
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
237
EOR_zzz 00000100 10 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
125
int pass;
238
BIC_zzz 00000100 11 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
126
int u;
239
127
int vec_size;
240
+### SVE Predicate Misc Group
128
- uint32_t imm;
241
+
129
TCGv_i32 tmp, tmp2, tmp3, tmp5;
242
+# SVE predicate test
130
TCGv_ptr ptr1;
243
+PTEST 00100101 01 010000 11 pg:4 0 rn:4 0 0000
131
- TCGv_i64 tmp64;
244
+
132
245
### SVE Memory - 32-bit Gather and Unsized Contiguous Group
133
if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
246
134
return 1;
247
# SVE load predicate register
135
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
136
return 1;
137
} else { /* size == 3 */
138
if (!u) {
139
- /* Extract. */
140
- imm = (insn >> 8) & 0xf;
141
-
142
- if (imm > 7 && !q)
143
- return 1;
144
-
145
- if (q && ((rd | rn | rm) & 1)) {
146
- return 1;
147
- }
148
-
149
- if (imm == 0) {
150
- neon_load_reg64(cpu_V0, rn);
151
- if (q) {
152
- neon_load_reg64(cpu_V1, rn + 1);
153
- }
154
- } else if (imm == 8) {
155
- neon_load_reg64(cpu_V0, rn + 1);
156
- if (q) {
157
- neon_load_reg64(cpu_V1, rm);
158
- }
159
- } else if (q) {
160
- tmp64 = tcg_temp_new_i64();
161
- if (imm < 8) {
162
- neon_load_reg64(cpu_V0, rn);
163
- neon_load_reg64(tmp64, rn + 1);
164
- } else {
165
- neon_load_reg64(cpu_V0, rn + 1);
166
- neon_load_reg64(tmp64, rm);
167
- }
168
- tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
169
- tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
170
- tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
171
- if (imm < 8) {
172
- neon_load_reg64(cpu_V1, rm);
173
- } else {
174
- neon_load_reg64(cpu_V1, rm + 1);
175
- imm -= 8;
176
- }
177
- tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
178
- tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
179
- tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
180
- tcg_temp_free_i64(tmp64);
181
- } else {
182
- /* BUGFIX */
183
- neon_load_reg64(cpu_V0, rn);
184
- tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
185
- neon_load_reg64(cpu_V1, rm);
186
- tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
187
- tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
188
- }
189
- neon_store_reg64(cpu_V0, rd);
190
- if (q) {
191
- neon_store_reg64(cpu_V1, rd + 1);
192
- }
193
+ /* Extract: handled by decodetree */
194
+ return 1;
195
} else if ((insn & (1 << 11)) == 0) {
196
/* Two register misc. */
197
op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
248
--
198
--
249
2.17.0
199
2.20.1
250
200
251
201
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-23-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---

Convert the Neon VTBL, VTBX instructions to decodetree. The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
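For background, the architectural byte-lookup semantics being moved here (a sketch, not the QEMU helper; tbl_byte is a made-up name): an out-of-range index yields 0 for VTBL and leaves the old destination byte for VTBX.

#include <stdbool.h>
#include <stdint.h>

static uint8_t tbl_byte(const uint8_t *table, unsigned table_len,
                        uint8_t index, uint8_t old_byte, bool is_vtbx)
{
    if (index < table_len) {
        return table[index];
    }
    return is_vtbx ? old_byte : 0;   /* VTBX keeps, VTBL zeroes */
}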
 target/arm/helper-sve.h    |  11 ++
 target/arm/sve_helper.c    | 136 ++++++++++++
 target/arm/translate-sve.c | 288 +++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  31 +++-
 4 files changed, 465 insertions(+), 1 deletion(-)

 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++
 target/arm/translate.c          | 41 +++----------------
 3 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_ftssel_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
17
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
19
DEF_HELPER_FLAGS_4(sve_ftssel_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
18
##################################################################
20
DEF_HELPER_FLAGS_4(sve_ftssel_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
19
VEXT 1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
21
20
vm=%vm_dp vn=%vn_dp vd=%vd_dp
22
+DEF_HELPER_FLAGS_4(sve_sqaddi_b, TCG_CALL_NO_RWG, void, ptr, ptr, s32, i32)
23
+DEF_HELPER_FLAGS_4(sve_sqaddi_h, TCG_CALL_NO_RWG, void, ptr, ptr, s32, i32)
24
+DEF_HELPER_FLAGS_4(sve_sqaddi_s, TCG_CALL_NO_RWG, void, ptr, ptr, s64, i32)
25
+DEF_HELPER_FLAGS_4(sve_sqaddi_d, TCG_CALL_NO_RWG, void, ptr, ptr, s64, i32)
26
+
21
+
27
+DEF_HELPER_FLAGS_4(sve_uqaddi_b, TCG_CALL_NO_RWG, void, ptr, ptr, s32, i32)
22
+ VTBL 1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
28
+DEF_HELPER_FLAGS_4(sve_uqaddi_h, TCG_CALL_NO_RWG, void, ptr, ptr, s32, i32)
23
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
29
+DEF_HELPER_FLAGS_4(sve_uqaddi_s, TCG_CALL_NO_RWG, void, ptr, ptr, s64, i32)
24
]
30
+DEF_HELPER_FLAGS_4(sve_uqaddi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
25
31
+DEF_HELPER_FLAGS_4(sve_uqsubi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
26
# Subgroup for size != 0b11
32
+
27
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
33
DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
34
DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
35
DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
36
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
37
index XXXXXXX..XXXXXXX 100644
28
index XXXXXXX..XXXXXXX 100644
38
--- a/target/arm/sve_helper.c
29
--- a/target/arm/translate-neon.inc.c
39
+++ b/target/arm/sve_helper.c
30
+++ b/target/arm/translate-neon.inc.c
40
@@ -XXX,XX +XXX,XX @@ void HELPER(sve_ftssel_d)(void *vd, void *vn, void *vm, uint32_t desc)
31
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
41
d[i] = nn ^ (mm & 2) << 62;
42
}
32
}
33
return true;
43
}
34
}
44
+
35
+
45
+/*
36
+static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
46
+ * Signed saturating addition with scalar operand.
37
+{
47
+ */
38
+ int n;
39
+ TCGv_i32 tmp, tmp2, tmp3, tmp4;
40
+ TCGv_ptr ptr1;
48
+
41
+
49
+void HELPER(sve_sqaddi_b)(void *d, void *a, int32_t b, uint32_t desc)
42
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
50
+{
43
+ return false;
51
+ intptr_t i, oprsz = simd_oprsz(desc);
52
+
53
+ for (i = 0; i < oprsz; i += sizeof(int8_t)) {
54
+ int r = *(int8_t *)(a + i) + b;
55
+ if (r > INT8_MAX) {
56
+ r = INT8_MAX;
57
+ } else if (r < INT8_MIN) {
58
+ r = INT8_MIN;
59
+ }
60
+ *(int8_t *)(d + i) = r;
61
+ }
62
+}
63
+
64
+void HELPER(sve_sqaddi_h)(void *d, void *a, int32_t b, uint32_t desc)
65
+{
66
+ intptr_t i, oprsz = simd_oprsz(desc);
67
+
68
+ for (i = 0; i < oprsz; i += sizeof(int16_t)) {
69
+ int r = *(int16_t *)(a + i) + b;
70
+ if (r > INT16_MAX) {
71
+ r = INT16_MAX;
72
+ } else if (r < INT16_MIN) {
73
+ r = INT16_MIN;
74
+ }
75
+ *(int16_t *)(d + i) = r;
76
+ }
77
+}
78
+
79
+void HELPER(sve_sqaddi_s)(void *d, void *a, int64_t b, uint32_t desc)
80
+{
81
+ intptr_t i, oprsz = simd_oprsz(desc);
82
+
83
+ for (i = 0; i < oprsz; i += sizeof(int32_t)) {
84
+ int64_t r = *(int32_t *)(a + i) + b;
85
+ if (r > INT32_MAX) {
86
+ r = INT32_MAX;
87
+ } else if (r < INT32_MIN) {
88
+ r = INT32_MIN;
89
+ }
90
+ *(int32_t *)(d + i) = r;
91
+ }
92
+}
93
+
94
+void HELPER(sve_sqaddi_d)(void *d, void *a, int64_t b, uint32_t desc)
95
+{
96
+ intptr_t i, oprsz = simd_oprsz(desc);
97
+
98
+ for (i = 0; i < oprsz; i += sizeof(int64_t)) {
99
+ int64_t ai = *(int64_t *)(a + i);
100
+ int64_t r = ai + b;
101
+ if (((r ^ ai) & ~(ai ^ b)) < 0) {
102
+ /* Signed overflow. */
103
+ r = (r < 0 ? INT64_MAX : INT64_MIN);
104
+ }
105
+ *(int64_t *)(d + i) = r;
106
+ }
107
+}
108
+
109
+/*
110
+ * Unsigned saturating addition with scalar operand.
111
+ */
112
+
113
+void HELPER(sve_uqaddi_b)(void *d, void *a, int32_t b, uint32_t desc)
114
+{
115
+ intptr_t i, oprsz = simd_oprsz(desc);
116
+
117
+ for (i = 0; i < oprsz; i += sizeof(uint8_t)) {
118
+ int r = *(uint8_t *)(a + i) + b;
119
+ if (r > UINT8_MAX) {
120
+ r = UINT8_MAX;
121
+ } else if (r < 0) {
122
+ r = 0;
123
+ }
124
+ *(uint8_t *)(d + i) = r;
125
+ }
126
+}
127
+
128
+void HELPER(sve_uqaddi_h)(void *d, void *a, int32_t b, uint32_t desc)
129
+{
130
+ intptr_t i, oprsz = simd_oprsz(desc);
131
+
132
+ for (i = 0; i < oprsz; i += sizeof(uint16_t)) {
133
+ int r = *(uint16_t *)(a + i) + b;
134
+ if (r > UINT16_MAX) {
135
+ r = UINT16_MAX;
136
+ } else if (r < 0) {
137
+ r = 0;
138
+ }
139
+ *(uint16_t *)(d + i) = r;
140
+ }
141
+}
142
+
143
+void HELPER(sve_uqaddi_s)(void *d, void *a, int64_t b, uint32_t desc)
144
+{
145
+ intptr_t i, oprsz = simd_oprsz(desc);
146
+
147
+ for (i = 0; i < oprsz; i += sizeof(uint32_t)) {
148
+ int64_t r = *(uint32_t *)(a + i) + b;
149
+ if (r > UINT32_MAX) {
150
+ r = UINT32_MAX;
151
+ } else if (r < 0) {
152
+ r = 0;
153
+ }
154
+ *(uint32_t *)(d + i) = r;
155
+ }
156
+}
157
+
158
+void HELPER(sve_uqaddi_d)(void *d, void *a, uint64_t b, uint32_t desc)
159
+{
160
+ intptr_t i, oprsz = simd_oprsz(desc);
161
+
162
+ for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
163
+ uint64_t r = *(uint64_t *)(a + i) + b;
164
+ if (r < b) {
165
+ r = UINT64_MAX;
166
+ }
167
+ *(uint64_t *)(d + i) = r;
168
+ }
169
+}
170
+
171
+void HELPER(sve_uqsubi_d)(void *d, void *a, uint64_t b, uint32_t desc)
172
+{
173
+ intptr_t i, oprsz = simd_oprsz(desc);
174
+
175
+ for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
176
+ uint64_t ai = *(uint64_t *)(a + i);
177
+ *(uint64_t *)(d + i) = (ai < b ? 0 : ai - b);
178
+ }
179
+}
180
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
181
index XXXXXXX..XXXXXXX 100644
182
--- a/target/arm/translate-sve.c
183
+++ b/target/arm/translate-sve.c
184
@@ -XXX,XX +XXX,XX @@ static int tszimm_shl(int x)
185
return x - (8 << tszimm_esz(x));
186
}
187
188
+static inline int plus1(int x)
189
+{
190
+ return x + 1;
191
+}
192
+
193
/*
194
* Include the generated decoder.
195
*/
196
@@ -XXX,XX +XXX,XX @@ static bool trans_PNEXT(DisasContext *s, arg_rr_esz *a, uint32_t insn)
197
return do_pfirst_pnext(s, a, gen_helper_sve_pnext);
198
}
199
200
+/*
201
+ *** SVE Element Count Group
202
+ */
203
+
204
+/* Perform an inline saturating addition of a 32-bit value within
205
+ * a 64-bit register. The second operand is known to be positive,
206
+ * which halves the comparisons we must perform to bound the result.
207
+ */
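(A scalar model of the addition direction described in the comment above; a sketch, not the TCG expansion, and sat_add32_in64 is a made-up name:)

#include <stdbool.h>
#include <stdint.h>

/* With val known to be >= 0, only the upper bound can be exceeded. */
static int64_t sat_add32_in64(int64_t reg, int64_t val, bool is_unsigned)
{
    int64_t r = reg + val;    /* 64-bit arithmetic, so no wraparound */
    int64_t bound = is_unsigned ? (int64_t)UINT32_MAX : INT32_MAX;

    return r > bound ? bound : r;
}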
208
+static void do_sat_addsub_32(TCGv_i64 reg, TCGv_i64 val, bool u, bool d)
209
+{
210
+ int64_t ibound;
211
+ TCGv_i64 bound;
212
+ TCGCond cond;
213
+
214
+ /* Use normal 64-bit arithmetic to detect 32-bit overflow. */
215
+ if (u) {
216
+ tcg_gen_ext32u_i64(reg, reg);
217
+ } else {
218
+ tcg_gen_ext32s_i64(reg, reg);
219
+ }
220
+ if (d) {
221
+ tcg_gen_sub_i64(reg, reg, val);
222
+ ibound = (u ? 0 : INT32_MIN);
223
+ cond = TCG_COND_LT;
224
+ } else {
225
+ tcg_gen_add_i64(reg, reg, val);
226
+ ibound = (u ? UINT32_MAX : INT32_MAX);
227
+ cond = TCG_COND_GT;
228
+ }
229
+ bound = tcg_const_i64(ibound);
230
+ tcg_gen_movcond_i64(cond, reg, reg, bound, bound, reg);
231
+ tcg_temp_free_i64(bound);
232
+}
233
+
234
+/* Similarly with 64-bit values. */
235
+static void do_sat_addsub_64(TCGv_i64 reg, TCGv_i64 val, bool u, bool d)
236
+{
237
+ TCGv_i64 t0 = tcg_temp_new_i64();
238
+ TCGv_i64 t1 = tcg_temp_new_i64();
239
+ TCGv_i64 t2;
240
+
241
+ if (u) {
242
+ if (d) {
243
+ tcg_gen_sub_i64(t0, reg, val);
244
+ tcg_gen_movi_i64(t1, 0);
245
+ tcg_gen_movcond_i64(TCG_COND_LTU, reg, reg, val, t1, t0);
246
+ } else {
247
+ tcg_gen_add_i64(t0, reg, val);
248
+ tcg_gen_movi_i64(t1, -1);
249
+ tcg_gen_movcond_i64(TCG_COND_LTU, reg, t0, reg, t1, t0);
250
+ }
251
+ } else {
252
+ if (d) {
253
+ /* Detect signed overflow for subtraction. */
254
+ tcg_gen_xor_i64(t0, reg, val);
255
+ tcg_gen_sub_i64(t1, reg, val);
256
+ tcg_gen_xor_i64(reg, reg, t0);
257
+ tcg_gen_and_i64(t0, t0, reg);
258
+
259
+ /* Bound the result. */
260
+ tcg_gen_movi_i64(reg, INT64_MIN);
261
+ t2 = tcg_const_i64(0);
262
+ tcg_gen_movcond_i64(TCG_COND_LT, reg, t0, t2, reg, t1);
263
+ } else {
264
+ /* Detect signed overflow for addition. */
265
+ tcg_gen_xor_i64(t0, reg, val);
266
+ tcg_gen_add_i64(reg, reg, val);
267
+ tcg_gen_xor_i64(t1, reg, val);
268
+ tcg_gen_andc_i64(t0, t1, t0);
269
+
270
+ /* Bound the result. */
271
+ tcg_gen_movi_i64(t1, INT64_MAX);
272
+ t2 = tcg_const_i64(0);
273
+ tcg_gen_movcond_i64(TCG_COND_LT, reg, t0, t2, t1, reg);
274
+ }
275
+ tcg_temp_free_i64(t2);
276
+ }
277
+ tcg_temp_free_i64(t0);
278
+ tcg_temp_free_i64(t1);
279
+}
280
+
281
+/* Similarly with a vector and a scalar operand. */
282
+static void do_sat_addsub_vec(DisasContext *s, int esz, int rd, int rn,
283
+ TCGv_i64 val, bool u, bool d)
284
+{
285
+ unsigned vsz = vec_full_reg_size(s);
286
+ TCGv_ptr dptr, nptr;
287
+ TCGv_i32 t32, desc;
288
+ TCGv_i64 t64;
289
+
290
+ dptr = tcg_temp_new_ptr();
291
+ nptr = tcg_temp_new_ptr();
292
+ tcg_gen_addi_ptr(dptr, cpu_env, vec_full_reg_offset(s, rd));
293
+ tcg_gen_addi_ptr(nptr, cpu_env, vec_full_reg_offset(s, rn));
294
+ desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
295
+
296
+ switch (esz) {
297
+ case MO_8:
298
+ t32 = tcg_temp_new_i32();
299
+ tcg_gen_extrl_i64_i32(t32, val);
300
+ if (d) {
301
+ tcg_gen_neg_i32(t32, t32);
302
+ }
303
+ if (u) {
304
+ gen_helper_sve_uqaddi_b(dptr, nptr, t32, desc);
305
+ } else {
306
+ gen_helper_sve_sqaddi_b(dptr, nptr, t32, desc);
307
+ }
308
+ tcg_temp_free_i32(t32);
309
+ break;
310
+
311
+ case MO_16:
312
+ t32 = tcg_temp_new_i32();
313
+ tcg_gen_extrl_i64_i32(t32, val);
314
+ if (d) {
315
+ tcg_gen_neg_i32(t32, t32);
316
+ }
317
+ if (u) {
318
+ gen_helper_sve_uqaddi_h(dptr, nptr, t32, desc);
319
+ } else {
320
+ gen_helper_sve_sqaddi_h(dptr, nptr, t32, desc);
321
+ }
322
+ tcg_temp_free_i32(t32);
323
+ break;
324
+
325
+ case MO_32:
326
+ t64 = tcg_temp_new_i64();
327
+ if (d) {
328
+ tcg_gen_neg_i64(t64, val);
329
+ } else {
330
+ tcg_gen_mov_i64(t64, val);
331
+ }
332
+ if (u) {
333
+ gen_helper_sve_uqaddi_s(dptr, nptr, t64, desc);
334
+ } else {
335
+ gen_helper_sve_sqaddi_s(dptr, nptr, t64, desc);
336
+ }
337
+ tcg_temp_free_i64(t64);
338
+ break;
339
+
340
+ case MO_64:
341
+ if (u) {
342
+ if (d) {
343
+ gen_helper_sve_uqsubi_d(dptr, nptr, val, desc);
344
+ } else {
345
+ gen_helper_sve_uqaddi_d(dptr, nptr, val, desc);
346
+ }
347
+ } else if (d) {
348
+ t64 = tcg_temp_new_i64();
349
+ tcg_gen_neg_i64(t64, val);
350
+ gen_helper_sve_sqaddi_d(dptr, nptr, t64, desc);
351
+ tcg_temp_free_i64(t64);
352
+ } else {
353
+ gen_helper_sve_sqaddi_d(dptr, nptr, val, desc);
354
+ }
355
+ break;
356
+
357
+ default:
358
+ g_assert_not_reached();
359
+ }
44
+ }
360
+
45
+
361
+ tcg_temp_free_ptr(dptr);
46
+ /* UNDEF accesses to D16-D31 if they don't exist. */
362
+ tcg_temp_free_ptr(nptr);
47
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
363
+ tcg_temp_free_i32(desc);
48
+ ((a->vd | a->vn | a->vm) & 0x10)) {
364
+}
49
+ return false;
50
+ }
365
+
51
+
366
+static bool trans_CNT_r(DisasContext *s, arg_CNT_r *a, uint32_t insn)
52
+ if (!vfp_access_check(s)) {
367
+{
368
+ if (sve_access_check(s)) {
369
+ unsigned fullsz = vec_full_reg_size(s);
370
+ unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
371
+ tcg_gen_movi_i64(cpu_reg(s, a->rd), numelem * a->imm);
372
+ }
373
+ return true;
374
+}
375
+
376
+static bool trans_INCDEC_r(DisasContext *s, arg_incdec_cnt *a, uint32_t insn)
377
+{
378
+ if (sve_access_check(s)) {
379
+ unsigned fullsz = vec_full_reg_size(s);
380
+ unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
381
+ int inc = numelem * a->imm * (a->d ? -1 : 1);
382
+ TCGv_i64 reg = cpu_reg(s, a->rd);
383
+
384
+ tcg_gen_addi_i64(reg, reg, inc);
385
+ }
386
+ return true;
387
+}
388
+
389
+static bool trans_SINCDEC_r_32(DisasContext *s, arg_incdec_cnt *a,
390
+ uint32_t insn)
391
+{
392
+ if (!sve_access_check(s)) {
393
+ return true;
53
+ return true;
394
+ }
54
+ }
395
+
55
+
396
+ unsigned fullsz = vec_full_reg_size(s);
56
+ n = a->len + 1;
397
+ unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
57
+ if ((a->vn + n) > 32) {
398
+ int inc = numelem * a->imm;
58
+ /*
399
+ TCGv_i64 reg = cpu_reg(s, a->rd);
59
+ * This is UNPREDICTABLE; we choose to UNDEF to avoid the
400
+
60
+ * helper function running off the end of the register file.
401
+ /* Use normal 64-bit arithmetic to detect 32-bit overflow. */
61
+ */
402
+ if (inc == 0) {
62
+ return false;
403
+ if (a->u) {
63
+ }
404
+ tcg_gen_ext32u_i64(reg, reg);
64
+ n <<= 3;
405
+ } else {
65
+ if (a->op) {
406
+ tcg_gen_ext32s_i64(reg, reg);
66
+ tmp = neon_load_reg(a->vd, 0);
407
+ }
408
+ } else {
67
+ } else {
409
+ TCGv_i64 t = tcg_const_i64(inc);
68
+ tmp = tcg_temp_new_i32();
410
+ do_sat_addsub_32(reg, t, a->u, a->d);
69
+ tcg_gen_movi_i32(tmp, 0);
411
+ tcg_temp_free_i64(t);
412
+ }
70
+ }
71
+ tmp2 = neon_load_reg(a->vm, 0);
72
+ ptr1 = vfp_reg_ptr(true, a->vn);
73
+ tmp4 = tcg_const_i32(n);
74
+ gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
75
+ tcg_temp_free_i32(tmp);
76
+ if (a->op) {
77
+ tmp = neon_load_reg(a->vd, 1);
78
+ } else {
79
+ tmp = tcg_temp_new_i32();
80
+ tcg_gen_movi_i32(tmp, 0);
81
+ }
82
+ tmp3 = neon_load_reg(a->vm, 1);
83
+ gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
84
+ tcg_temp_free_i32(tmp4);
85
+ tcg_temp_free_ptr(ptr1);
86
+ neon_store_reg(a->vd, 0, tmp2);
87
+ neon_store_reg(a->vd, 1, tmp3);
88
+ tcg_temp_free_i32(tmp);
413
+ return true;
89
+ return true;
414
+}
90
+}
415
+
91
diff --git a/target/arm/translate.c b/target/arm/translate.c
416
+static bool trans_SINCDEC_r_64(DisasContext *s, arg_incdec_cnt *a,
417
+ uint32_t insn)
418
+{
419
+ if (!sve_access_check(s)) {
420
+ return true;
421
+ }
422
+
423
+ unsigned fullsz = vec_full_reg_size(s);
424
+ unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
425
+ int inc = numelem * a->imm;
426
+ TCGv_i64 reg = cpu_reg(s, a->rd);
427
+
428
+ if (inc != 0) {
429
+ TCGv_i64 t = tcg_const_i64(inc);
430
+ do_sat_addsub_64(reg, t, a->u, a->d);
431
+ tcg_temp_free_i64(t);
432
+ }
433
+ return true;
434
+}
435
+
436
+static bool trans_INCDEC_v(DisasContext *s, arg_incdec2_cnt *a, uint32_t insn)
437
+{
438
+ if (a->esz == 0) {
439
+ return false;
440
+ }
441
+
442
+ unsigned fullsz = vec_full_reg_size(s);
443
+ unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
444
+ int inc = numelem * a->imm;
445
+
446
+ if (inc != 0) {
447
+ if (sve_access_check(s)) {
448
+ TCGv_i64 t = tcg_const_i64(a->d ? -inc : inc);
449
+ tcg_gen_gvec_adds(a->esz, vec_full_reg_offset(s, a->rd),
450
+ vec_full_reg_offset(s, a->rn),
451
+ t, fullsz, fullsz);
452
+ tcg_temp_free_i64(t);
453
+ }
454
+ } else {
455
+ do_mov_z(s, a->rd, a->rn);
456
+ }
457
+ return true;
458
+}
459
+
460
+static bool trans_SINCDEC_v(DisasContext *s, arg_incdec2_cnt *a,
461
+ uint32_t insn)
462
+{
463
+ if (a->esz == 0) {
464
+ return false;
465
+ }
466
+
467
+ unsigned fullsz = vec_full_reg_size(s);
468
+ unsigned numelem = decode_pred_count(fullsz, a->pat, a->esz);
469
+ int inc = numelem * a->imm;
470
+
471
+ if (inc != 0) {
472
+ if (sve_access_check(s)) {
473
+ TCGv_i64 t = tcg_const_i64(inc);
474
+ do_sat_addsub_vec(s, a->esz, a->rd, a->rn, t, a->u, a->d);
475
+ tcg_temp_free_i64(t);
476
+ }
477
+ } else {
478
+ do_mov_z(s, a->rd, a->rn);
479
+ }
480
+ return true;
481
+}
482
+
483
/*
484
*** SVE Memory - 32-bit Gather and Unsized Contiguous Group
485
*/
486
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
487
index XXXXXXX..XXXXXXX 100644
92
index XXXXXXX..XXXXXXX 100644
488
--- a/target/arm/sve.decode
93
--- a/target/arm/translate.c
489
+++ b/target/arm/sve.decode
94
+++ b/target/arm/translate.c
490
@@ -XXX,XX +XXX,XX @@
95
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
491
###########################################################################
96
{
492
# Named fields. These are primarily for disjoint fields.
97
int op;
493
98
int q;
494
+%imm4_16_p1 16:4 !function=plus1
99
- int rd, rn, rm, rd_ofs, rm_ofs;
495
%imm6_22_5 22:1 5:5
100
+ int rd, rm, rd_ofs, rm_ofs;
496
%imm9_16_10 16:s6 10:3
101
int size;
497
102
int pass;
498
@@ -XXX,XX +XXX,XX @@
103
int u;
499
&rprr_esz rd pg rn rm esz
104
int vec_size;
500
&rprrr_esz rd pg rn rm ra esz
105
- TCGv_i32 tmp, tmp2, tmp3, tmp5;
501
&rpri_esz rd pg rn imm esz
106
- TCGv_ptr ptr1;
502
+&ptrue rd esz pat s
107
+ TCGv_i32 tmp, tmp2, tmp3;
503
+&incdec_cnt rd pat esz imm d u
108
504
+&incdec2_cnt rd rn pat esz imm d u
109
if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
505
110
return 1;
506
###########################################################################
111
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
507
# Named instruction formats. These are generally used to
112
q = (insn & (1 << 6)) != 0;
508
@@ -XXX,XX +XXX,XX @@
113
u = (insn >> 24) & 1;
509
@rd_rn_i9 ........ ........ ...... rn:5 rd:5 \
114
VFP_DREG_D(rd, insn);
510
&rri imm=%imm9_16_10
115
- VFP_DREG_N(rn, insn);
511
116
VFP_DREG_M(rm, insn);
512
+# One register, pattern, and uint4+1.
117
size = (insn >> 20) & 3;
513
+# User must fill in U and D.
118
vec_size = q ? 16 : 8;
514
+@incdec_cnt ........ esz:2 .. .... ...... pat:5 rd:5 \
119
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
515
+ &incdec_cnt imm=%imm4_16_p1
120
break;
516
+@incdec2_cnt ........ esz:2 .. .... ...... pat:5 rd:5 \
121
}
517
+ &incdec2_cnt imm=%imm4_16_p1 rn=%reg_movprfx
122
} else if ((insn & (1 << 10)) == 0) {
518
+
123
- /* VTBL, VTBX. */
519
###########################################################################
124
- int n = ((insn >> 8) & 3) + 1;
520
# Instruction patterns. Grouped according to the SVE encodingindex.xhtml.
125
- if ((rn + n) > 32) {
521
126
- /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
522
@@ -XXX,XX +XXX,XX @@ FEXPA 00000100 .. 1 00000 101110 ..... ..... @rd_rn
127
- * helper function running off the end of the register file.
523
# Note esz != 0
128
- */
524
FTSSEL 00000100 .. 1 ..... 101100 ..... ..... @rd_rn_rm
129
- return 1;
525
130
- }
526
-### SVE Predicate Logical Operations Group
131
- n <<= 3;
527
+### SVE Element Count Group
132
- if (insn & (1 << 6)) {
528
+
133
- tmp = neon_load_reg(rd, 0);
529
+# SVE element count
134
- } else {
530
+CNT_r 00000100 .. 10 .... 1110 0 0 ..... ..... @incdec_cnt d=0 u=1
135
- tmp = tcg_temp_new_i32();
531
+
136
- tcg_gen_movi_i32(tmp, 0);
532
+# SVE inc/dec register by element count
137
- }
533
+INCDEC_r 00000100 .. 11 .... 1110 0 d:1 ..... ..... @incdec_cnt u=1
138
- tmp2 = neon_load_reg(rm, 0);
534
+
139
- ptr1 = vfp_reg_ptr(true, rn);
535
+# SVE saturating inc/dec register by element count
140
- tmp5 = tcg_const_i32(n);
536
+SINCDEC_r_32 00000100 .. 10 .... 1111 d:1 u:1 ..... ..... @incdec_cnt
141
- gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
537
+SINCDEC_r_64 00000100 .. 11 .... 1111 d:1 u:1 ..... ..... @incdec_cnt
142
- tcg_temp_free_i32(tmp);
538
+
143
- if (insn & (1 << 6)) {
539
+# SVE inc/dec vector by element count
144
- tmp = neon_load_reg(rd, 1);
540
+# Note this requires esz != 0.
145
- } else {
541
+INCDEC_v 00000100 .. 1 1 .... 1100 0 d:1 ..... ..... @incdec2_cnt u=1
146
- tmp = tcg_temp_new_i32();
542
+
147
- tcg_gen_movi_i32(tmp, 0);
543
+# SVE saturating inc/dec vector by element count
148
- }
544
+# Note these require esz != 0.
149
- tmp3 = neon_load_reg(rm, 1);
545
+SINCDEC_v 00000100 .. 1 0 .... 1100 d:1 u:1 ..... ..... @incdec2_cnt
150
- gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
546
151
- tcg_temp_free_i32(tmp5);
547
# SVE predicate logical operations
152
- tcg_temp_free_ptr(ptr1);
548
AND_pppp 00100101 0. 00 .... 01 .... 0 .... 0 .... @pd_pg_pn_pm_s
153
- neon_store_reg(rd, 0, tmp2);
154
- neon_store_reg(rd, 1, tmp3);
155
- tcg_temp_free_i32(tmp);
156
+ /* VTBL, VTBX: handled by decodetree */
157
+ return 1;
158
} else if ((insn & 0x380) == 0) {
159
/* VDUP */
160
int element;
549
--
161
--
550
2.17.0
162
2.20.1
551
163
552
164
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---

Convert the Neon VDUP (scalar) insn to decodetree. (Note that we
can't call this just "VDUP", as we used that name already in vfp.decode
for the "VDUP (general purpose register)" insn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
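A scalar model of what the new trans function expands to via tcg_gen_gvec_dup_mem() (a sketch, not QEMU code; vdup_scalar is a made-up name):

#include <stdint.h>
#include <string.h>

/* Replicate one element of 'esize' bytes across a 'dsize'-byte vector. */
static void vdup_scalar(void *dst, const void *elem,
                        unsigned esize, unsigned dsize)
{
    unsigned i;

    for (i = 0; i < dsize; i += esize) {
        memcpy((char *)dst + i, elem, esize);
    }
}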
 target/arm/cpu.h           |   4 +-
 target/arm/helper-sve.h    |  10 +
 target/arm/sve_helper.c    |  39 ++++
 target/arm/translate-sve.c | 361 +++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  16 ++
 5 files changed, 429 insertions(+), 1 deletion(-)

 target/arm/neon-dp.decode       |  7 +++++++
 target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
 target/arm/translate.c          | 25 +------------------------
 3 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
17
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
20
#ifdef TARGET_AARCH64
18
21
/* Store FFR as pregs[16] to make it easier to treat as any other. */
19
VTBL 1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
22
ARMPredicateReg pregs[17];
20
vm=%vm_dp vn=%vn_dp vd=%vd_dp
23
+ /* Scratch space for aa64 sve predicate temporary. */
21
+
24
+ ARMPredicateReg preg_tmp;
22
+ VDUP_scalar 1111 001 1 1 . 11 index:3 1 .... 11 000 q:1 . 0 .... \
25
#endif
23
+ vm=%vm_dp vd=%vd_dp size=0
26
24
+ VDUP_scalar 1111 001 1 1 . 11 index:2 10 .... 11 000 q:1 . 0 .... \
27
uint32_t xregs[16];
25
+ vm=%vm_dp vd=%vd_dp size=1
28
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
26
+ VDUP_scalar 1111 001 1 1 . 11 index:1 100 .... 11 000 q:1 . 0 .... \
29
int vec_len;
27
+ vm=%vm_dp vd=%vd_dp size=2
30
int vec_stride;
28
]
31
29
32
- /* scratch space when Tn are not sufficient. */
30
# Subgroup for size != 0b11
33
+ /* Scratch space for aa32 neon expansion. */
31
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
34
uint32_t scratch[8];
35
36
/* There are a number of distinct float control structures:
37
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
38
index XXXXXXX..XXXXXXX 100644
32
index XXXXXXX..XXXXXXX 100644
39
--- a/target/arm/helper-sve.h
33
--- a/target/arm/translate-neon.inc.c
40
+++ b/target/arm/helper-sve.h
34
+++ b/target/arm/translate-neon.inc.c
41
@@ -XXX,XX +XXX,XX @@
35
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
42
36
tcg_temp_free_i32(tmp);
43
DEF_HELPER_FLAGS_2(sve_predtest1, TCG_CALL_NO_WG, i32, i64, i64)
37
return true;
44
DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
45
+
46
+DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
47
+DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
48
+DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
49
+DEF_HELPER_FLAGS_5(sve_sel_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
50
+DEF_HELPER_FLAGS_5(sve_orr_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
51
+DEF_HELPER_FLAGS_5(sve_orn_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
52
+DEF_HELPER_FLAGS_5(sve_nor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
53
+DEF_HELPER_FLAGS_5(sve_nand_pppp, TCG_CALL_NO_RWG,
54
+ void, ptr, ptr, ptr, ptr, i32)
55
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
56
index XXXXXXX..XXXXXXX 100644
57
--- a/target/arm/sve_helper.c
58
+++ b/target/arm/sve_helper.c
59
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
60
61
return flags;
62
}
38
}
63
+
39
+
64
+#define LOGICAL_PPPP(NAME, FUNC) \
40
+static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
65
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
41
+{
66
+{ \
42
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
67
+ uintptr_t opr_sz = simd_oprsz(desc); \
43
+ return false;
68
+ uint64_t *d = vd, *n = vn, *m = vm, *g = vg; \
44
+ }
69
+ uintptr_t i; \
70
+ for (i = 0; i < opr_sz / 8; ++i) { \
71
+ d[i] = FUNC(n[i], m[i], g[i]); \
72
+ } \
73
+}
74
+
45
+
75
+#define DO_AND(N, M, G) (((N) & (M)) & (G))
46
+ /* UNDEF accesses to D16-D31 if they don't exist. */
76
+#define DO_BIC(N, M, G) (((N) & ~(M)) & (G))
47
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
77
+#define DO_EOR(N, M, G) (((N) ^ (M)) & (G))
48
+ ((a->vd | a->vm) & 0x10)) {
78
+#define DO_ORR(N, M, G) (((N) | (M)) & (G))
49
+ return false;
79
+#define DO_ORN(N, M, G) (((N) | ~(M)) & (G))
50
+ }
80
+#define DO_NOR(N, M, G) (~((N) | (M)) & (G))
81
+#define DO_NAND(N, M, G) (~((N) & (M)) & (G))
82
+#define DO_SEL(N, M, G) (((N) & (G)) | ((M) & ~(G)))
83
+
51
+
84
+LOGICAL_PPPP(sve_and_pppp, DO_AND)
52
+ if (a->vd & a->q) {
85
+LOGICAL_PPPP(sve_bic_pppp, DO_BIC)
53
+ return false;
86
+LOGICAL_PPPP(sve_eor_pppp, DO_EOR)
54
+ }
87
+LOGICAL_PPPP(sve_sel_pppp, DO_SEL)
88
+LOGICAL_PPPP(sve_orr_pppp, DO_ORR)
89
+LOGICAL_PPPP(sve_orn_pppp, DO_ORN)
90
+LOGICAL_PPPP(sve_nor_pppp, DO_NOR)
91
+LOGICAL_PPPP(sve_nand_pppp, DO_NAND)
92
+
55
+
93
+#undef DO_AND
56
+ if (!vfp_access_check(s)) {
94
+#undef DO_BIC
95
+#undef DO_EOR
96
+#undef DO_ORR
97
+#undef DO_ORN
98
+#undef DO_NOR
99
+#undef DO_NAND
100
+#undef DO_SEL
101
+#undef LOGICAL_PPPP
102
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
103
index XXXXXXX..XXXXXXX 100644
104
--- a/target/arm/translate-sve.c
105
+++ b/target/arm/translate-sve.c
106
@@ -XXX,XX +XXX,XX @@ static inline int pred_full_reg_size(DisasContext *s)
107
return s->sve_len >> 3;
108
}
109
110
+/* Round up the size of a register to a size allowed by
111
+ * the tcg vector infrastructure. Any operation which uses this
112
+ * size may assume that the bits above pred_full_reg_size are zero,
113
+ * and must leave them the same way.
114
+ *
115
+ * Note that this is not needed for the vector registers as they
116
+ * are always properly sized for tcg vectors.
117
+ */
118
+static int size_for_gvec(int size)
119
+{
120
+ if (size <= 8) {
121
+ return 8;
122
+ } else {
123
+ return QEMU_ALIGN_UP(size, 16);
124
+ }
125
+}
126
+
127
+static int pred_gvec_reg_size(DisasContext *s)
128
+{
129
+ return size_for_gvec(pred_full_reg_size(s));
130
+}
131
+
132
/* Invoke a vector expander on two Zregs. */
133
static bool do_vector2_z(DisasContext *s, GVecGen2Fn *gvec_fn,
134
int esz, int rd, int rn)
135
@@ -XXX,XX +XXX,XX @@ static bool do_mov_z(DisasContext *s, int rd, int rn)
136
return do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
137
}
138
139
+/* Invoke a vector expander on two Pregs. */
140
+static bool do_vector2_p(DisasContext *s, GVecGen2Fn *gvec_fn,
141
+ int esz, int rd, int rn)
142
+{
143
+ if (sve_access_check(s)) {
144
+ unsigned psz = pred_gvec_reg_size(s);
145
+ gvec_fn(esz, pred_full_reg_offset(s, rd),
146
+ pred_full_reg_offset(s, rn), psz, psz);
147
+ }
148
+ return true;
149
+}
150
+
151
+/* Invoke a vector expander on three Pregs. */
152
+static bool do_vector3_p(DisasContext *s, GVecGen3Fn *gvec_fn,
153
+ int esz, int rd, int rn, int rm)
154
+{
155
+ if (sve_access_check(s)) {
156
+ unsigned psz = pred_gvec_reg_size(s);
157
+ gvec_fn(esz, pred_full_reg_offset(s, rd),
158
+ pred_full_reg_offset(s, rn),
159
+ pred_full_reg_offset(s, rm), psz, psz);
160
+ }
161
+ return true;
162
+}
163
+
164
+/* Invoke a vector operation on four Pregs. */
165
+static bool do_vecop4_p(DisasContext *s, const GVecGen4 *gvec_op,
166
+ int rd, int rn, int rm, int rg)
167
+{
168
+ if (sve_access_check(s)) {
169
+ unsigned psz = pred_gvec_reg_size(s);
170
+ tcg_gen_gvec_4(pred_full_reg_offset(s, rd),
171
+ pred_full_reg_offset(s, rn),
172
+ pred_full_reg_offset(s, rm),
173
+ pred_full_reg_offset(s, rg),
174
+ psz, psz, gvec_op);
175
+ }
176
+ return true;
177
+}
178
+
179
+/* Invoke a vector move on two Pregs. */
180
+static bool do_mov_p(DisasContext *s, int rd, int rn)
181
+{
182
+ return do_vector2_p(s, tcg_gen_gvec_mov, 0, rd, rn);
183
+}
184
+
185
/* Set the cpu flags as per a return from an SVE helper. */
186
static void do_pred_flags(TCGv_i32 t)
187
{
188
@@ -XXX,XX +XXX,XX @@ static bool trans_BIC_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
189
return do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
190
}
191
192
+/*
193
+ *** SVE Predicate Logical Operations Group
194
+ */
195
+
196
+static bool do_pppp_flags(DisasContext *s, arg_rprr_s *a,
197
+ const GVecGen4 *gvec_op)
198
+{
199
+ if (!sve_access_check(s)) {
200
+ return true;
57
+ return true;
201
+ }
58
+ }
202
+
59
+
203
+ unsigned psz = pred_gvec_reg_size(s);
60
+ tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
204
+ int dofs = pred_full_reg_offset(s, a->rd);
61
+ neon_element_offset(a->vm, a->index, a->size),
205
+ int nofs = pred_full_reg_offset(s, a->rn);
62
+ a->q ? 16 : 8, a->q ? 16 : 8);
206
+ int mofs = pred_full_reg_offset(s, a->rm);
207
+ int gofs = pred_full_reg_offset(s, a->pg);
208
+
209
+ if (psz == 8) {
210
+ /* Do the operation and the flags generation in temps. */
211
+ TCGv_i64 pd = tcg_temp_new_i64();
212
+ TCGv_i64 pn = tcg_temp_new_i64();
213
+ TCGv_i64 pm = tcg_temp_new_i64();
214
+ TCGv_i64 pg = tcg_temp_new_i64();
215
+
216
+ tcg_gen_ld_i64(pn, cpu_env, nofs);
217
+ tcg_gen_ld_i64(pm, cpu_env, mofs);
218
+ tcg_gen_ld_i64(pg, cpu_env, gofs);
219
+
220
+ gvec_op->fni8(pd, pn, pm, pg);
221
+ tcg_gen_st_i64(pd, cpu_env, dofs);
222
+
223
+ do_predtest1(pd, pg);
224
+
225
+ tcg_temp_free_i64(pd);
226
+ tcg_temp_free_i64(pn);
227
+ tcg_temp_free_i64(pm);
228
+ tcg_temp_free_i64(pg);
229
+ } else {
230
+ /* The operation and flags generation is large. The computation
231
+ * of the flags depends on the original contents of the guarding
232
+ * predicate. If the destination overwrites the guarding predicate,
233
+ * then the easiest way to get this right is to save a copy.
234
+ */
235
+ int tofs = gofs;
236
+ if (a->rd == a->pg) {
237
+ tofs = offsetof(CPUARMState, vfp.preg_tmp);
238
+ tcg_gen_gvec_mov(0, tofs, gofs, psz, psz);
239
+ }
240
+
241
+ tcg_gen_gvec_4(dofs, nofs, mofs, gofs, psz, psz, gvec_op);
242
+ do_predtest(s, dofs, tofs, psz / 8);
243
+ }
244
+ return true;
63
+ return true;
245
+}
64
+}
246
+
65
diff --git a/target/arm/translate.c b/target/arm/translate.c
247
+static void gen_and_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
248
+{
249
+ tcg_gen_and_i64(pd, pn, pm);
250
+ tcg_gen_and_i64(pd, pd, pg);
251
+}
252
+
253
+static void gen_and_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
254
+ TCGv_vec pm, TCGv_vec pg)
255
+{
256
+ tcg_gen_and_vec(vece, pd, pn, pm);
257
+ tcg_gen_and_vec(vece, pd, pd, pg);
258
+}
259
+
260
+static bool trans_AND_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
261
+{
262
+ static const GVecGen4 op = {
263
+ .fni8 = gen_and_pg_i64,
264
+ .fniv = gen_and_pg_vec,
265
+ .fno = gen_helper_sve_and_pppp,
266
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64,
267
+ };
268
+ if (a->s) {
269
+ return do_pppp_flags(s, a, &op);
270
+ } else if (a->rn == a->rm) {
271
+ if (a->pg == a->rn) {
272
+ return do_mov_p(s, a->rd, a->rn);
273
+ } else {
274
+ return do_vector3_p(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->pg);
275
+ }
276
+ } else if (a->pg == a->rn || a->pg == a->rm) {
277
+ return do_vector3_p(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->rm);
278
+ } else {
279
+ return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
280
+ }
281
+}
282
+
283
+static void gen_bic_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
284
+{
285
+ tcg_gen_andc_i64(pd, pn, pm);
286
+ tcg_gen_and_i64(pd, pd, pg);
287
+}
288
+
289
+static void gen_bic_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
290
+ TCGv_vec pm, TCGv_vec pg)
291
+{
292
+ tcg_gen_andc_vec(vece, pd, pn, pm);
293
+ tcg_gen_and_vec(vece, pd, pd, pg);
294
+}
295
+
296
+static bool trans_BIC_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
297
+{
298
+ static const GVecGen4 op = {
299
+ .fni8 = gen_bic_pg_i64,
300
+ .fniv = gen_bic_pg_vec,
301
+ .fno = gen_helper_sve_bic_pppp,
302
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64,
303
+ };
304
+ if (a->s) {
305
+ return do_pppp_flags(s, a, &op);
306
+ } else if (a->pg == a->rn) {
307
+ return do_vector3_p(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
308
+ } else {
309
+ return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
310
+ }
311
+}
312
+
313
+static void gen_eor_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
314
+{
315
+ tcg_gen_xor_i64(pd, pn, pm);
316
+ tcg_gen_and_i64(pd, pd, pg);
317
+}
318
+
319
+static void gen_eor_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
320
+ TCGv_vec pm, TCGv_vec pg)
321
+{
322
+ tcg_gen_xor_vec(vece, pd, pn, pm);
323
+ tcg_gen_and_vec(vece, pd, pd, pg);
324
+}
325
+
326
+static bool trans_EOR_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
327
+{
328
+ static const GVecGen4 op = {
329
+ .fni8 = gen_eor_pg_i64,
330
+ .fniv = gen_eor_pg_vec,
331
+ .fno = gen_helper_sve_eor_pppp,
332
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64,
333
+ };
334
+ if (a->s) {
335
+ return do_pppp_flags(s, a, &op);
336
+ } else {
337
+ return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
338
+ }
339
+}
340
+
341
+static void gen_sel_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
342
+{
343
+ tcg_gen_and_i64(pn, pn, pg);
344
+ tcg_gen_andc_i64(pm, pm, pg);
345
+ tcg_gen_or_i64(pd, pn, pm);
346
+}
347
+
348
+static void gen_sel_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
349
+ TCGv_vec pm, TCGv_vec pg)
350
+{
351
+ tcg_gen_and_vec(vece, pn, pn, pg);
352
+ tcg_gen_andc_vec(vece, pm, pm, pg);
353
+ tcg_gen_or_vec(vece, pd, pn, pm);
354
+}
355
+
356
+static bool trans_SEL_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
357
+{
358
+ static const GVecGen4 op = {
359
+ .fni8 = gen_sel_pg_i64,
360
+ .fniv = gen_sel_pg_vec,
361
+ .fno = gen_helper_sve_sel_pppp,
362
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64,
363
+ };
364
+ if (a->s) {
365
+ return false;
366
+ } else {
367
+ return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
368
+ }
369
+}
370
+
371
+static void gen_orr_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
372
+{
373
+ tcg_gen_or_i64(pd, pn, pm);
374
+ tcg_gen_and_i64(pd, pd, pg);
375
+}
376
+
377
+static void gen_orr_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
378
+ TCGv_vec pm, TCGv_vec pg)
379
+{
380
+ tcg_gen_or_vec(vece, pd, pn, pm);
381
+ tcg_gen_and_vec(vece, pd, pd, pg);
382
+}
383
+
384
+static bool trans_ORR_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
385
+{
386
+ static const GVecGen4 op = {
387
+ .fni8 = gen_orr_pg_i64,
388
+ .fniv = gen_orr_pg_vec,
+ .fno = gen_helper_sve_orr_pppp,
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+ };
+ if (a->s) {
+ return do_pppp_flags(s, a, &op);
+ } else if (a->pg == a->rn && a->rn == a->rm) {
+ return do_mov_p(s, a->rd, a->rn);
+ } else {
+ return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+ }
+}
+
+static void gen_orn_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+ tcg_gen_orc_i64(pd, pn, pm);
+ tcg_gen_and_i64(pd, pd, pg);
+}
+
+static void gen_orn_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+ TCGv_vec pm, TCGv_vec pg)
+{
+ tcg_gen_orc_vec(vece, pd, pn, pm);
+ tcg_gen_and_vec(vece, pd, pd, pg);
+}
+
+static bool trans_ORN_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+ static const GVecGen4 op = {
+ .fni8 = gen_orn_pg_i64,
+ .fniv = gen_orn_pg_vec,
+ .fno = gen_helper_sve_orn_pppp,
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+ };
+ if (a->s) {
+ return do_pppp_flags(s, a, &op);
+ } else {
+ return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+ }
+}
+
+static void gen_nor_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+ tcg_gen_or_i64(pd, pn, pm);
+ tcg_gen_andc_i64(pd, pg, pd);
+}
+
+static void gen_nor_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+ TCGv_vec pm, TCGv_vec pg)
+{
+ tcg_gen_or_vec(vece, pd, pn, pm);
+ tcg_gen_andc_vec(vece, pd, pg, pd);
+}
+
+static bool trans_NOR_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+ static const GVecGen4 op = {
+ .fni8 = gen_nor_pg_i64,
+ .fniv = gen_nor_pg_vec,
+ .fno = gen_helper_sve_nor_pppp,
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+ };
+ if (a->s) {
+ return do_pppp_flags(s, a, &op);
+ } else {
+ return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+ }
+}
+
+static void gen_nand_pg_i64(TCGv_i64 pd, TCGv_i64 pn, TCGv_i64 pm, TCGv_i64 pg)
+{
+ tcg_gen_and_i64(pd, pn, pm);
+ tcg_gen_andc_i64(pd, pg, pd);
+}
+
+static void gen_nand_pg_vec(unsigned vece, TCGv_vec pd, TCGv_vec pn,
+ TCGv_vec pm, TCGv_vec pg)
+{
+ tcg_gen_and_vec(vece, pd, pn, pm);
+ tcg_gen_andc_vec(vece, pd, pg, pd);
+}
+
+static bool trans_NAND_pppp(DisasContext *s, arg_rprr_s *a, uint32_t insn)
+{
+ static const GVecGen4 op = {
+ .fni8 = gen_nand_pg_i64,
+ .fniv = gen_nand_pg_vec,
+ .fno = gen_helper_sve_nand_pppp,
+ .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+ };
+ if (a->s) {
+ return do_pppp_flags(s, a, &op);
+ } else {
+ return do_vecop4_p(s, &op, a->rd, a->rn, a->rm, a->pg);
+ }
+}
+
/*
*** SVE Predicate Misc Group
*/
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@

&rri rd rn imm
&rrr_esz rd rn rm esz
+&rprr_s rd pg rn rm s

###########################################################################
# Named instruction formats. These are generally used to
@@ -XXX,XX +XXX,XX @@
# Three operand with unused vector element size
@rd_rn_rm_e0 ........ ... rm:5 ... ... rn:5 rd:5 &rrr_esz esz=0

+# Three predicate operand, with governing predicate, flag setting
+@pd_pg_pn_pm_s ........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4 &rprr_s
+
# Basic Load/Store with 9-bit immediate offset
@pd_rn_i9 ........ ........ ...... rn:5 . rd:4 \
&rri imm=%imm9_16_10
@@ -XXX,XX +XXX,XX @@ ORR_zzz 00000100 01 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
EOR_zzz 00000100 10 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
BIC_zzz 00000100 11 1 ..... 001 100 ..... ..... @rd_rn_rm_e0

+### SVE Predicate Logical Operations Group
+
+# SVE predicate logical operations
+AND_pppp 00100101 0. 00 .... 01 .... 0 .... 0 .... @pd_pg_pn_pm_s
+BIC_pppp 00100101 0. 00 .... 01 .... 0 .... 1 .... @pd_pg_pn_pm_s
+EOR_pppp 00100101 0. 00 .... 01 .... 1 .... 0 .... @pd_pg_pn_pm_s
+SEL_pppp 00100101 0. 00 .... 01 .... 1 .... 1 .... @pd_pg_pn_pm_s
+ORR_pppp 00100101 1. 00 .... 01 .... 0 .... 0 .... @pd_pg_pn_pm_s
+ORN_pppp 00100101 1. 00 .... 01 .... 0 .... 1 .... @pd_pg_pn_pm_s
+NOR_pppp 00100101 1. 00 .... 01 .... 1 .... 0 .... @pd_pg_pn_pm_s
+NAND_pppp 00100101 1. 00 .... 01 .... 1 .... 1 .... @pd_pg_pn_pm_s
+
### SVE Predicate Misc Group

# SVE predicate test
--
2.17.0

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
}
break;
}
- } else if ((insn & (1 << 10)) == 0) {
- /* VTBL, VTBX: handled by decodetree */
- return 1;
- } else if ((insn & 0x380) == 0) {
- /* VDUP */
- int element;
- MemOp size;
-
- if ((insn & (7 << 16)) == 0 || (q && (rd & 1))) {
- return 1;
- }
- if (insn & (1 << 16)) {
- size = MO_8;
- element = (insn >> 17) & 7;
- } else if (insn & (1 << 17)) {
- size = MO_16;
- element = (insn >> 18) & 3;
- } else {
- size = MO_32;
- element = (insn >> 19) & 1;
- }
- tcg_gen_gvec_dup_mem(size, neon_reg_offset(rd, 0),
- neon_element_offset(rm, element, size),
- q ? 16 : 8, q ? 16 : 8);
} else {
+ /* VTBL, VTBX, VDUP: handled by decodetree */
return 1;
}
}
--
2.20.1
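Taken together, the expanders above reduce to simple bitwise identities on each predicate word, with the governing predicate pg masking the result. The standalone C sketch below (illustrative only, plain C rather than QEMU's TCG API) spells out what the generated code computes per 64-bit predicate chunk:

#include <stdint.h>

/* Per-64-bit-word semantics of the gen_*_pg_i64 expanders above. */
static uint64_t orn_pg(uint64_t pn, uint64_t pm, uint64_t pg)
{
    return (pn | ~pm) & pg;     /* tcg_gen_orc then tcg_gen_and */
}

static uint64_t nor_pg(uint64_t pn, uint64_t pm, uint64_t pg)
{
    return pg & ~(pn | pm);     /* tcg_gen_or then tcg_gen_andc */
}

static uint64_t nand_pg(uint64_t pn, uint64_t pm, uint64_t pg)
{
    return pg & ~(pn & pm);     /* tcg_gen_and then tcg_gen_andc */
}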
From: Richard Henderson <richard.henderson@linaro.org>

These were the instructions that were stubbed out when
introducing the decode skeleton.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/translate-sve.c | 55 ++++++++++++++++++++++++++++++++------
1 file changed, 47 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@
* Implement all of the translator functions referenced by the decoder.
*/

-static bool trans_AND_zzz(DisasContext *s, arg_AND_zzz *a, uint32_t insn)
+/* Invoke a vector expander on two Zregs. */
+static bool do_vector2_z(DisasContext *s, GVecGen2Fn *gvec_fn,
+ int esz, int rd, int rn)
{
- return false;
+ if (sve_access_check(s)) {
+ unsigned vsz = vec_full_reg_size(s);
+ gvec_fn(esz, vec_full_reg_offset(s, rd),
+ vec_full_reg_offset(s, rn), vsz, vsz);
+ }
+ return true;
}

-static bool trans_ORR_zzz(DisasContext *s, arg_ORR_zzz *a, uint32_t insn)
+/* Invoke a vector expander on three Zregs. */
+static bool do_vector3_z(DisasContext *s, GVecGen3Fn *gvec_fn,
+ int esz, int rd, int rn, int rm)
{
- return false;
+ if (sve_access_check(s)) {
+ unsigned vsz = vec_full_reg_size(s);
+ gvec_fn(esz, vec_full_reg_offset(s, rd),
+ vec_full_reg_offset(s, rn),
+ vec_full_reg_offset(s, rm), vsz, vsz);
+ }
+ return true;
}

-static bool trans_EOR_zzz(DisasContext *s, arg_EOR_zzz *a, uint32_t insn)
+/* Invoke a vector move on two Zregs. */
+static bool do_mov_z(DisasContext *s, int rd, int rn)
{
- return false;
+ return do_vector2_z(s, tcg_gen_gvec_mov, 0, rd, rn);
}

-static bool trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
+/*
+ *** SVE Logical - Unpredicated Group
+ */
+
+static bool trans_AND_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
{
- return false;
+ return do_vector3_z(s, tcg_gen_gvec_and, 0, a->rd, a->rn, a->rm);
+}
+
+static bool trans_ORR_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+ if (a->rn == a->rm) { /* MOV */
+ return do_mov_z(s, a->rd, a->rn);
+ } else {
+ return do_vector3_z(s, tcg_gen_gvec_or, 0, a->rd, a->rn, a->rm);
+ }
+}
+
+static bool trans_EOR_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+ return do_vector3_z(s, tcg_gen_gvec_xor, 0, a->rd, a->rn, a->rm);
+}
+
+static bool trans_BIC_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+ return do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
}
--
2.17.0

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Some bits of the CCM registers are non-writable.

This was left undone in the initial commit (all bits of registers were
writable).

This patch adds the required code to protect the non-writable bits.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 20200608133508.550046-1-jcd@tribudubois.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 63 insertions(+), 13 deletions(-)

diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/imx6ul_ccm.c
+++ b/hw/misc/imx6ul_ccm.c
@@ -XXX,XX +XXX,XX @@

#include "trace.h"

+static const uint32_t ccm_mask[CCM_MAX] = {
+ [CCM_CCR] = 0xf01fef80,
+ [CCM_CCDR] = 0xfffeffff,
+ [CCM_CSR] = 0xffffffff,
+ [CCM_CCSR] = 0xfffffef2,
+ [CCM_CACRR] = 0xfffffff8,
+ [CCM_CBCDR] = 0xc1f8e000,
+ [CCM_CBCMR] = 0xfc03cfff,
+ [CCM_CSCMR1] = 0x80700000,
+ [CCM_CSCMR2] = 0xe01ff003,
+ [CCM_CSCDR1] = 0xfe00c780,
+ [CCM_CS1CDR] = 0xfe00fe00,
+ [CCM_CS2CDR] = 0xf8007000,
+ [CCM_CDCDR] = 0xf00fffff,
+ [CCM_CHSCCDR] = 0xfffc01ff,
+ [CCM_CSCDR2] = 0xfe0001ff,
+ [CCM_CSCDR3] = 0xffffc1ff,
+ [CCM_CDHIPR] = 0xffffffff,
+ [CCM_CTOR] = 0x00000000,
+ [CCM_CLPCR] = 0xf39ff01c,
+ [CCM_CISR] = 0xfb85ffbe,
+ [CCM_CIMR] = 0xfb85ffbf,
+ [CCM_CCOSR] = 0xfe00fe00,
+ [CCM_CGPR] = 0xfffc3fea,
+ [CCM_CCGR0] = 0x00000000,
+ [CCM_CCGR1] = 0x00000000,
+ [CCM_CCGR2] = 0x00000000,
+ [CCM_CCGR3] = 0x00000000,
+ [CCM_CCGR4] = 0x00000000,
+ [CCM_CCGR5] = 0x00000000,
+ [CCM_CCGR6] = 0x00000000,
+ [CCM_CMEOR] = 0xafffff1f,
+};
+
+static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
+ [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
+ [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
+ [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
+ [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
+ [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
+ [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
+ [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
+ [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
+ [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
+ [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
+ [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
+ [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
+ [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
+ [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
+ [CCM_ANALOG_PFD_480] = 0x40404040,
+ [CCM_ANALOG_PFD_528] = 0x40404040,
+ [PMU_MISC0] = 0x01fe8306,
+ [PMU_MISC1] = 0x07fcede0,
+ [PMU_MISC2] = 0x005f5f5f,
+};
+
static const char *imx6ul_ccm_reg_name(uint32_t reg)
{
static char unknown[20];
@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,

trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);

- /*
- * We will do a better implementation later. In particular some bits
- * cannot be written to.
- */
- s->ccm[index] = (uint32_t)value;
+ s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
+ ((uint32_t)value & ~ccm_mask[index]);
}

static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
* the REG_NAME register. So we change the value of the
* REG_NAME register, setting bits passed in the value.
*/
- s->analog[index - 1] |= value;
+ s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
break;
case CCM_ANALOG_PLL_ARM_CLR:
case CCM_ANALOG_PLL_USB1_CLR:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
* the REG_NAME register. So we change the value of the
* REG_NAME register, unsetting bits passed in the value.
*/
- s->analog[index - 2] &= ~value;
+ s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
break;
case CCM_ANALOG_PLL_ARM_TOG:
case CCM_ANALOG_PLL_USB1_TOG:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
* the REG_NAME register. So we change the value of the
* REG_NAME register, toggling bits passed in the value.
*/
- s->analog[index - 3] ^= value;
+ s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
break;
default:
- /*
- * We will do a better implementation later. In particular some bits
- * cannot be written to.
- */
- s->analog[index] = value;
+ s->analog[index] = (s->analog[index] & analog_mask[index]) |
+ (value & ~analog_mask[index]);
break;
}
}
--
2.20.1
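The CCM change boils down to one read-modify-write idiom: bits set in the per-register mask are read-only and keep their current value, while all other bits take the guest-written value. A minimal standalone C sketch of that idiom (illustrative only, not QEMU code):

#include <stdint.h>

/* Bits set in mask are preserved; the rest come from the new value. */
static uint32_t masked_write(uint32_t old, uint32_t val, uint32_t mask)
{
    return (old & mask) | (val & ~mask);
}

With [CCM_CACRR] = 0xfffffff8, for example, only the low three bits (the ARM_PODF divider field) accept guest writes; everything else reads back unchanged.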
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/helper-sve.h | 5 +++
target/arm/sve_helper.c | 40 +++++++++++++++++++
target/arm/translate-sve.c | 79 ++++++++++++++++++++++++++++++++++++++
target/arm/sve.decode | 14 +++++++
4 files changed, 138 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_mls_s, TCG_CALL_NO_RWG,
DEF_HELPER_FLAGS_6(sve_mls_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, i32)

+DEF_HELPER_FLAGS_4(sve_index_b, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_4(sve_index_h, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_4(sve_index_s, TCG_CALL_NO_RWG, void, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_4(sve_index_d, TCG_CALL_NO_RWG, void, ptr, i64, i64, i32)
+
DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZZZ_D(sve_mls_d, uint64_t, DO_MLS)
#undef DO_MLS
#undef DO_ZPZZZ
#undef DO_ZPZZZ_D
+
+void HELPER(sve_index_b)(void *vd, uint32_t start,
+ uint32_t incr, uint32_t desc)
+{
+ intptr_t i, opr_sz = simd_oprsz(desc);
+ uint8_t *d = vd;
+ for (i = 0; i < opr_sz; i += 1) {
+ d[H1(i)] = start + i * incr;
+ }
+}
+
+void HELPER(sve_index_h)(void *vd, uint32_t start,
+ uint32_t incr, uint32_t desc)
+{
+ intptr_t i, opr_sz = simd_oprsz(desc) / 2;
+ uint16_t *d = vd;
+ for (i = 0; i < opr_sz; i += 1) {
+ d[H2(i)] = start + i * incr;
+ }
+}
+
+void HELPER(sve_index_s)(void *vd, uint32_t start,
+ uint32_t incr, uint32_t desc)
+{
+ intptr_t i, opr_sz = simd_oprsz(desc) / 4;
+ uint32_t *d = vd;
+ for (i = 0; i < opr_sz; i += 1) {
+ d[H4(i)] = start + i * incr;
+ }
+}
+
+void HELPER(sve_index_d)(void *vd, uint64_t start,
+ uint64_t incr, uint32_t desc)
+{
+ intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+ uint64_t *d = vd;
+ for (i = 0; i < opr_sz; i += 1) {
+ d[i] = start + i * incr;
+ }
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZZZ(MLS, mls)

#undef DO_ZPZZZ

+/*
+ *** SVE Index Generation Group
+ */
+
+static void do_index(DisasContext *s, int esz, int rd,
+ TCGv_i64 start, TCGv_i64 incr)
+{
+ unsigned vsz = vec_full_reg_size(s);
+ TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, 0));
+ TCGv_ptr t_zd = tcg_temp_new_ptr();
+
+ tcg_gen_addi_ptr(t_zd, cpu_env, vec_full_reg_offset(s, rd));
+ if (esz == 3) {
+ gen_helper_sve_index_d(t_zd, start, incr, desc);
+ } else {
+ typedef void index_fn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
+ static index_fn * const fns[3] = {
+ gen_helper_sve_index_b,
+ gen_helper_sve_index_h,
+ gen_helper_sve_index_s,
+ };
+ TCGv_i32 s32 = tcg_temp_new_i32();
+ TCGv_i32 i32 = tcg_temp_new_i32();
+
+ tcg_gen_extrl_i64_i32(s32, start);
+ tcg_gen_extrl_i64_i32(i32, incr);
+ fns[esz](t_zd, s32, i32, desc);
+
+ tcg_temp_free_i32(s32);
+ tcg_temp_free_i32(i32);
+ }
+ tcg_temp_free_ptr(t_zd);
+ tcg_temp_free_i32(desc);
+}
+
+static bool trans_INDEX_ii(DisasContext *s, arg_INDEX_ii *a, uint32_t insn)
+{
+ if (sve_access_check(s)) {
+ TCGv_i64 start = tcg_const_i64(a->imm1);
+ TCGv_i64 incr = tcg_const_i64(a->imm2);
+ do_index(s, a->esz, a->rd, start, incr);
+ tcg_temp_free_i64(start);
+ tcg_temp_free_i64(incr);
+ }
+ return true;
+}
+
+static bool trans_INDEX_ir(DisasContext *s, arg_INDEX_ir *a, uint32_t insn)
+{
+ if (sve_access_check(s)) {
+ TCGv_i64 start = tcg_const_i64(a->imm);
+ TCGv_i64 incr = cpu_reg(s, a->rm);
+ do_index(s, a->esz, a->rd, start, incr);
+ tcg_temp_free_i64(start);
+ }
+ return true;
+}
+
+static bool trans_INDEX_ri(DisasContext *s, arg_INDEX_ri *a, uint32_t insn)
+{
+ if (sve_access_check(s)) {
+ TCGv_i64 start = cpu_reg(s, a->rn);
+ TCGv_i64 incr = tcg_const_i64(a->imm);
+ do_index(s, a->esz, a->rd, start, incr);
+ tcg_temp_free_i64(incr);
+ }
+ return true;
+}
+
+static bool trans_INDEX_rr(DisasContext *s, arg_INDEX_rr *a, uint32_t insn)
+{
+ if (sve_access_check(s)) {
+ TCGv_i64 start = cpu_reg(s, a->rn);
+ TCGv_i64 incr = cpu_reg(s, a->rm);
+ do_index(s, a->esz, a->rd, start, incr);
+ }
+ return true;
+}
+
/*
*** SVE Predicate Logical Operations Group
*/
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ ORR_zzz 00000100 01 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
EOR_zzz 00000100 10 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
BIC_zzz 00000100 11 1 ..... 001 100 ..... ..... @rd_rn_rm_e0

+### SVE Index Generation Group
+
+# SVE index generation (immediate start, immediate increment)
+INDEX_ii 00000100 esz:2 1 imm2:s5 010000 imm1:s5 rd:5
+
+# SVE index generation (immediate start, register increment)
+INDEX_ir 00000100 esz:2 1 rm:5 010010 imm:s5 rd:5
+
+# SVE index generation (register start, immediate increment)
+INDEX_ri 00000100 esz:2 1 imm:s5 010001 rn:5 rd:5
+
+# SVE index generation (register start, register increment)
+INDEX_rr 00000100 .. 1 ..... 010011 ..... ..... @rd_rn_rm
+
### SVE Predicate Logical Operations Group

# SVE predicate logical operations
--
2.17.0

From: Erik Smit <erik.lucas.smit@gmail.com>

The hardware supports configurable descriptor sizes, configured in the DBLAC
register.

Most drivers use the default 4-word descriptor, which is currently hardcoded,
but Aspeed SDK configures 8 words to store extra data.

Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[PMM: removed unnecessary parens]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/ftgmac100.c
+++ b/hw/net/ftgmac100.c
@@ -XXX,XX +XXX,XX @@
#define FTGMAC100_APTC_TXPOLL_CNT(x) (((x) >> 8) & 0xf)
#define FTGMAC100_APTC_TXPOLL_TIME_SEL (1 << 12)

+/*
+ * DMA burst length and arbitration control register
+ */
+#define FTGMAC100_DBLAC_RXBURST_SIZE(x) (((x) >> 8) & 0x3)
+#define FTGMAC100_DBLAC_TXBURST_SIZE(x) (((x) >> 10) & 0x3)
+#define FTGMAC100_DBLAC_RXDES_SIZE(x) ((((x) >> 12) & 0xf) * 8)
+#define FTGMAC100_DBLAC_TXDES_SIZE(x) ((((x) >> 16) & 0xf) * 8)
+#define FTGMAC100_DBLAC_IFG_CNT(x) (((x) >> 20) & 0x7)
+#define FTGMAC100_DBLAC_IFG_INC (1 << 23)
+
/*
* PHY control register
*/
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
if (bd.des0 & s->txdes0_edotr) {
addr = tx_ring;
} else {
- addr += sizeof(FTGMAC100Desc);
+ addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
}
}

@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
s->phydata = value & 0xffff;
break;
case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
+ if (FTGMAC100_DBLAC_TXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "%s: transmit descriptor too small : %d bytes\n",
+ __func__, FTGMAC100_DBLAC_TXDES_SIZE(s->dblac));
+ break;
+ }
+ if (FTGMAC100_DBLAC_RXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
+ qemu_log_mask(LOG_GUEST_ERROR,
+ "%s: receive descriptor too small : %d bytes\n",
+ __func__, FTGMAC100_DBLAC_RXDES_SIZE(s->dblac));
+ break;
+ }
s->dblac = value;
break;
case FTGMAC100_REVR: /* Feature Register */
@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
if (bd.des0 & s->rxdes0_edorr) {
addr = s->rx_ring;
} else {
- addr += sizeof(FTGMAC100Desc);
+ addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
}
}
s->rx_descriptor = addr;
--
2.20.1
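The key decoding detail in the ftgmac100 change is that the 4-bit RXDES/TXDES fields of DBLAC count 8-byte units: the default 4-word (16-byte) descriptor is encoded as 2, and the Aspeed SDK's 8-word layout as 4. A self-contained check of that arithmetic (illustrative only; the macro mirrors FTGMAC100_DBLAC_TXDES_SIZE() from the patch):

#include <assert.h>
#include <stdint.h>

#define DBLAC_TXDES_SIZE(x) ((((uint32_t)(x) >> 16) & 0xf) * 8)

int main(void)
{
    assert(DBLAC_TXDES_SIZE(2u << 16) == 16);   /* 4 words */
    assert(DBLAC_TXDES_SIZE(4u << 16) == 32);   /* 8 words */
    return 0;
}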
From: Abdallah Bouassida <abdallah.bouassida@lauterbach.com>

Generate an XML description for the cp-regs.
Register these regs with the gdb_register_coprocessor().
Add arm_gdb_get_sysreg() to use it as a callback to read those regs.
Add a dummy arm_gdb_set_sysreg().

Signed-off-by: Abdallah Bouassida <abdallah.bouassida@lauterbach.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1524153386-3550-4-git-send-email-abdallah.bouassida@lauterbach.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
include/qom/cpu.h | 5 ++-
target/arm/cpu.h | 26 +++++++++++++++
gdbstub.c | 10 ++++++
target/arm/cpu.c | 1 +
target/arm/gdbstub.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
target/arm/helper.c | 26 +++++++++++++++
6 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -XXX,XX +XXX,XX @@ struct TranslationBlock;
* before the insn which triggers a watchpoint rather than after it.
* @gdb_arch_name: Optional callback that returns the architecture name known
* to GDB. The caller must free the returned string with g_free.
+ * @gdb_get_dynamic_xml: Callback to return dynamically generated XML for the
+ * gdb stub. Returns a pointer to the XML contents for the specified XML file
+ * or NULL if the CPU doesn't have a dynamically generated content for it.
* @cpu_exec_enter: Callback for cpu_exec preparation.
* @cpu_exec_exit: Callback for cpu_exec cleanup.
* @cpu_exec_interrupt: Callback for processing interrupts in cpu_exec.
@@ -XXX,XX +XXX,XX @@ typedef struct CPUClass {
const struct VMStateDescription *vmsd;
const char *gdb_core_xml_file;
gchar * (*gdb_arch_name)(CPUState *cpu);
-
+ const char * (*gdb_get_dynamic_xml)(CPUState *cpu, const char *xmlname);
void (*cpu_exec_enter)(CPUState *cpu);
void (*cpu_exec_exit)(CPUState *cpu);
bool (*cpu_exec_interrupt)(CPUState *cpu, int interrupt_request);
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ enum {
s<2n+1> maps to the most significant half of d<n>
*/

+/**
+ * DynamicGDBXMLInfo:
+ * @desc: Contains the XML descriptions.
+ * @num_cpregs: Number of the Coprocessor registers seen by GDB.
+ * @cpregs_keys: Array that contains the corresponding Key of
+ * a given cpreg with the same order of the cpreg in the XML description.
+ */
+typedef struct DynamicGDBXMLInfo {
+ char *desc;
+ int num_cpregs;
+ uint32_t *cpregs_keys;
+} DynamicGDBXMLInfo;
+
/* CPU state for each instance of a generic timer (in cp15 c14) */
typedef struct ARMGenericTimer {
uint64_t cval; /* Timer CompareValue register */
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
uint64_t *cpreg_vmstate_values;
int32_t cpreg_vmstate_array_len;

+ DynamicGDBXMLInfo dyn_xml;
+
/* Timers used by the generic (architected) timer */
QEMUTimer *gt_timer[NUM_GTIMERS];
/* GPIO outputs for generic timer */
@@ -XXX,XX +XXX,XX @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr addr,
int arm_cpu_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg);
int arm_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);

+/* Dynamically generates for gdb stub an XML description of the sysregs from
+ * the cp_regs hashtable. Returns the registered sysregs number.
+ */
+int arm_gen_dynamic_xml(CPUState *cpu);
+
+/* Returns the dynamically generated XML for the gdb stub.
+ * Returns a pointer to the XML contents for the specified XML file or NULL
+ * if the XML name doesn't match the predefined one.
+ */
+const char *arm_gdb_get_dynamic_xml(CPUState *cpu, const char *xmlname);
+
int arm_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
int cpuid, void *opaque);
int arm_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
diff --git a/gdbstub.c b/gdbstub.c
index XXXXXXX..XXXXXXX 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -XXX,XX +XXX,XX @@ static const char *get_feature_xml(const char *p, const char **newp,
}
return target_xml;
}
+ if (cc->gdb_get_dynamic_xml) {
+ CPUState *cpu = first_cpu;
+ char *xmlname = g_strndup(p, len);
+ const char *xml = cc->gdb_get_dynamic_xml(cpu, xmlname);
+
+ g_free(xmlname);
+ if (xml) {
+ return xml;
+ }
+ }
for (i = 0; ; i++) {
name = xml_builtin[i][0];
if (!name || (strncmp(name, p, len) == 0 && strlen(name) == len))
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_class_init(ObjectClass *oc, void *data)
cc->gdb_num_core_regs = 26;
cc->gdb_core_xml_file = "arm-core.xml";
cc->gdb_arch_name = arm_gdb_arch_name;
+ cc->gdb_get_dynamic_xml = arm_gdb_get_dynamic_xml;
cc->gdb_stop_before_watchpoint = true;
cc->debug_excp_handler = arm_debug_excp_handler;
cc->debug_check_watchpoint = arm_debug_check_watchpoint;
diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -XXX,XX +XXX,XX @@
#include "cpu.h"
#include "exec/gdbstub.h"

+typedef struct RegisterSysregXmlParam {
+ CPUState *cs;
+ GString *s;
+} RegisterSysregXmlParam;
+
/* Old gdb always expect FPA registers. Newer (xml-aware) gdb only expect
whatever the target description contains. Due to a historical mishap
the FPA registers appear in between core integer regs and the CPSR.
@@ -XXX,XX +XXX,XX @@ int arm_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
/* Unknown register. */
return 0;
}
+
+static void arm_gen_one_xml_reg_tag(GString *s, DynamicGDBXMLInfo *dyn_xml,
+ ARMCPRegInfo *ri, uint32_t ri_key,
+ int bitsize)
+{
+ g_string_append_printf(s, "<reg name=\"%s\"", ri->name);
+ g_string_append_printf(s, " bitsize=\"%d\"", bitsize);
+ g_string_append_printf(s, " group=\"cp_regs\"/>");
+ dyn_xml->num_cpregs++;
+ dyn_xml->cpregs_keys[dyn_xml->num_cpregs - 1] = ri_key;
+}
+
+static void arm_register_sysreg_for_xml(gpointer key, gpointer value,
+ gpointer p)
+{
+ uint32_t ri_key = *(uint32_t *)key;
+ ARMCPRegInfo *ri = value;
+ RegisterSysregXmlParam *param = (RegisterSysregXmlParam *)p;
+ GString *s = param->s;
+ ARMCPU *cpu = ARM_CPU(param->cs);
+ CPUARMState *env = &cpu->env;
+ DynamicGDBXMLInfo *dyn_xml = &cpu->dyn_xml;
+
+ if (!(ri->type & (ARM_CP_NO_RAW | ARM_CP_NO_GDB))) {
+ if (arm_feature(env, ARM_FEATURE_AARCH64)) {
+ if (ri->state == ARM_CP_STATE_AA64) {
+ arm_gen_one_xml_reg_tag(s , dyn_xml, ri, ri_key, 64);
+ }
+ } else {
+ if (ri->state == ARM_CP_STATE_AA32) {
+ if (!arm_feature(env, ARM_FEATURE_EL3) &&
+ (ri->secure & ARM_CP_SECSTATE_S)) {
+ return;
+ }
+ if (ri->type & ARM_CP_64BIT) {
+ arm_gen_one_xml_reg_tag(s , dyn_xml, ri, ri_key, 64);
+ } else {
+ arm_gen_one_xml_reg_tag(s , dyn_xml, ri, ri_key, 32);
+ }
+ }
+ }
+ }
+}
+
+int arm_gen_dynamic_xml(CPUState *cs)
+{
+ ARMCPU *cpu = ARM_CPU(cs);
+ GString *s = g_string_new(NULL);
+ RegisterSysregXmlParam param = {cs, s};
+
+ cpu->dyn_xml.num_cpregs = 0;
+ cpu->dyn_xml.cpregs_keys = g_malloc(sizeof(uint32_t *) *
+ g_hash_table_size(cpu->cp_regs));
+ g_string_printf(s, "<?xml version=\"1.0\"?>");
+ g_string_append_printf(s, "<!DOCTYPE target SYSTEM \"gdb-target.dtd\">");
+ g_string_append_printf(s, "<feature name=\"org.qemu.gdb.arm.sys.regs\">");
+ g_hash_table_foreach(cpu->cp_regs, arm_register_sysreg_for_xml, &param);
+ g_string_append_printf(s, "</feature>");
+ cpu->dyn_xml.desc = g_string_free(s, false);
+ return cpu->dyn_xml.num_cpregs;
+}
+
+const char *arm_gdb_get_dynamic_xml(CPUState *cs, const char *xmlname)
+{
+ ARMCPU *cpu = ARM_CPU(cs);
+
+ if (strcmp(xmlname, "system-registers.xml") == 0) {
+ return cpu->dyn_xml.desc;
+ }
+ return NULL;
+}
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void write_raw_cp_reg(CPUARMState *env, const ARMCPRegInfo *ri,
}
}

+static int arm_gdb_get_sysreg(CPUARMState *env, uint8_t *buf, int reg)
+{
+ ARMCPU *cpu = arm_env_get_cpu(env);
+ const ARMCPRegInfo *ri;
+ uint32_t key;
+
+ key = cpu->dyn_xml.cpregs_keys[reg];
+ ri = get_arm_cp_reginfo(cpu->cp_regs, key);
+ if (ri) {
+ if (cpreg_field_is_64bit(ri)) {
+ return gdb_get_reg64(buf, (uint64_t)read_raw_cp_reg(env, ri));
+ } else {
+ return gdb_get_reg32(buf, (uint32_t)read_raw_cp_reg(env, ri));
+ }
+ }
+ return 0;
+}
+
+static int arm_gdb_set_sysreg(CPUARMState *env, uint8_t *buf, int reg)
+{
+ return 0;
+}
+
static bool raw_accessors_invalid(const ARMCPRegInfo *ri)
{
/* Return true if the regdef would cause an assertion if you called
@@ -XXX,XX +XXX,XX @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
19, "arm-vfp.xml", 0);
}
+ gdb_register_coprocessor(cs, arm_gdb_get_sysreg, arm_gdb_set_sysreg,
+ arm_gen_dynamic_xml(cs),
+ "system-registers.xml", 0);
}

/* Sort alphabetically by type name, except for "any". */
--
2.17.0

From: fangying <fangying1@huawei.com>

Virtual time adjustment was implemented for virt-5.0 machine type,
but the cpu property was enabled only for host-passthrough and max
cpu model. Let's add it for any KVM arm cpu which has the generic
timer feature enabled.

Signed-off-by: Ying Fang <fangying1@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 20200608121243.2076-1-fangying1@huawei.com
[PMM: minor commit message tweak, removed inaccurate
suggested-by tag]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/cpu.c | 6 ++++--
target/arm/cpu64.c | 1 -
target/arm/kvm.c | 21 +++++++++++----------
3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
}
+
+ if (kvm_enabled()) {
+ kvm_arm_add_vcpu_properties(obj);
+ }
}

static void arm_cpu_finalizefn(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)

if (kvm_enabled()) {
kvm_arm_set_cpu_features_from_host(cpu);
- kvm_arm_add_vcpu_properties(obj);
} else {
cortex_a15_initfn(obj);

@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
aarch64_add_sve_properties(obj);
}
- kvm_arm_add_vcpu_properties(obj);
arm_cpu_post_init(obj);
}

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)

if (kvm_enabled()) {
kvm_arm_set_cpu_features_from_host(cpu);
- kvm_arm_add_vcpu_properties(obj);
} else {
uint64_t t;
uint32_t u;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
/* KVM VCPU properties should be prefixed with "kvm-". */
void kvm_arm_add_vcpu_properties(Object *obj)
{
- if (!kvm_enabled()) {
- return;
- }
+ ARMCPU *cpu = ARM_CPU(obj);
+ CPUARMState *env = &cpu->env;

- ARM_CPU(obj)->kvm_adjvtime = true;
- object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
- kvm_no_adjvtime_set);
- object_property_set_description(obj, "kvm-no-adjvtime",
- "Set on to disable the adjustment of "
- "the virtual counter. VM stopped time "
- "will be counted.");
+ if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
+ cpu->kvm_adjvtime = true;
+ object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
+ kvm_no_adjvtime_set);
+ object_property_set_description(obj, "kvm-no-adjvtime",
+ "Set on to disable the adjustment of "
+ "the virtual counter. VM stopped time "
+ "will be counted.");
+ }
}

bool kvm_arm_pmu_supported(CPUState *cpu)
--
2.20.1
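For readers wondering what arm_gen_dynamic_xml() actually hands to gdb: it emits one <reg> tag per exposed cp-reg inside a single feature element. The standalone sketch below prints a document of the same shape (the register name is a made-up example, not captured QEMU output):

#include <stdio.h>

int main(void)
{
    printf("<?xml version=\"1.0\"?>"
           "<!DOCTYPE target SYSTEM \"gdb-target.dtd\">"
           "<feature name=\"org.qemu.gdb.arm.sys.regs\">"
           "<reg name=\"MIDR_EL1\" bitsize=\"64\" group=\"cp_regs\"/>"
           "</feature>\n");
    return 0;
}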
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-16-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/translate-sve.c | 34 ++++++++++++++++++++++++++++++++++
target/arm/sve.decode | 13 +++++++++++++
2 files changed, 47 insertions(+)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BIC_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
return do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
}

+/*
+ *** SVE Integer Arithmetic - Unpredicated Group
+ */
+
+static bool trans_ADD_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+ return do_vector3_z(s, tcg_gen_gvec_add, a->esz, a->rd, a->rn, a->rm);
+}
+
+static bool trans_SUB_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+ return do_vector3_z(s, tcg_gen_gvec_sub, a->esz, a->rd, a->rn, a->rm);
+}
+
+static bool trans_SQADD_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+ return do_vector3_z(s, tcg_gen_gvec_ssadd, a->esz, a->rd, a->rn, a->rm);
+}
+
+static bool trans_SQSUB_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+ return do_vector3_z(s, tcg_gen_gvec_sssub, a->esz, a->rd, a->rn, a->rm);
+}
+
+static bool trans_UQADD_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+ return do_vector3_z(s, tcg_gen_gvec_usadd, a->esz, a->rd, a->rn, a->rm);
+}
+
+static bool trans_UQSUB_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+ return do_vector3_z(s, tcg_gen_gvec_ussub, a->esz, a->rd, a->rn, a->rm);
+}
+
/*
*** SVE Integer Arithmetic - Binary Predicated Group
*/
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
# Three predicate operand, with governing predicate, flag setting
@pd_pg_pn_pm_s ........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4 &rprr_s

+# Three operand, vector element size
+@rd_rn_rm ........ esz:2 . rm:5 ... ... rn:5 rd:5 &rrr_esz
+
# Two register operand, with governing predicate, vector element size
@rdn_pg_rm ........ esz:2 ... ... ... pg:3 rm:5 rd:5 \
&rprr_esz rn=%reg_movprfx
@@ -XXX,XX +XXX,XX @@ MLS 00000100 .. 0 ..... 011 ... ..... ..... @rda_pg_rn_rm
MLA 00000100 .. 0 ..... 110 ... ..... ..... @rdn_pg_ra_rm # MAD
MLS 00000100 .. 0 ..... 111 ... ..... ..... @rdn_pg_ra_rm # MSB

+### SVE Integer Arithmetic - Unpredicated Group
+
+# SVE integer add/subtract vectors (unpredicated)
+ADD_zzz 00000100 .. 1 ..... 000 000 ..... ..... @rd_rn_rm
+SUB_zzz 00000100 .. 1 ..... 000 001 ..... ..... @rd_rn_rm
+SQADD_zzz 00000100 .. 1 ..... 000 100 ..... ..... @rd_rn_rm
+UQADD_zzz 00000100 .. 1 ..... 000 101 ..... ..... @rd_rn_rm
+SQSUB_zzz 00000100 .. 1 ..... 000 110 ..... ..... @rd_rn_rm
+UQSUB_zzz 00000100 .. 1 ..... 000 111 ..... ..... @rd_rn_rm
+
### SVE Logical - Unpredicated Group

# SVE bitwise logical operations (unpredicated)
--
2.17.0

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[PMD: Fixed 32-bit format string using PRIx32/PRIx64]
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
hw/net/imx_fec.c | 106 +++++++++++++++++++-------------------------
hw/net/trace-events | 18 ++++++++
2 files changed, 63 insertions(+), 61 deletions(-)

diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/imx_fec.c
+++ b/hw/net/imx_fec.c
@@ -XXX,XX +XXX,XX @@
#include "qemu/module.h"
#include "net/checksum.h"
#include "net/eth.h"
+#include "trace.h"

/* For crc32 */
#include <zlib.h>

-#ifndef DEBUG_IMX_FEC
-#define DEBUG_IMX_FEC 0
-#endif
-
-#define FEC_PRINTF(fmt, args...) \
- do { \
- if (DEBUG_IMX_FEC) { \
- fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
- __func__, ##args); \
- } \
- } while (0)
-
-#ifndef DEBUG_IMX_PHY
-#define DEBUG_IMX_PHY 0
-#endif
-
-#define PHY_PRINTF(fmt, args...) \
- do { \
- if (DEBUG_IMX_PHY) { \
- fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
- __func__, ##args); \
- } \
- } while (0)
-
#define IMX_MAX_DESC 1024

static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
* For now we don't handle any GPIO/interrupt line, so the OS will
* have to poll for the PHY status.
*/
-static void phy_update_irq(IMXFECState *s)
+static void imx_phy_update_irq(IMXFECState *s)
{
imx_eth_update(s);
}

-static void phy_update_link(IMXFECState *s)
+static void imx_phy_update_link(IMXFECState *s)
{
/* Autonegotiation status mirrors link status. */
if (qemu_get_queue(s->nic)->link_down) {
- PHY_PRINTF("link is down\n");
+ trace_imx_phy_update_link("down");
s->phy_status &= ~0x0024;
s->phy_int |= PHY_INT_DOWN;
} else {
- PHY_PRINTF("link is up\n");
+ trace_imx_phy_update_link("up");
s->phy_status |= 0x0024;
s->phy_int |= PHY_INT_ENERGYON;
s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
}
- phy_update_irq(s);
+ imx_phy_update_irq(s);
}

static void imx_eth_set_link(NetClientState *nc)
{
- phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
+ imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
}

-static void phy_reset(IMXFECState *s)
+static void imx_phy_reset(IMXFECState *s)
{
+ trace_imx_phy_reset();
+
s->phy_status = 0x7809;
s->phy_control = 0x3000;
s->phy_advertise = 0x01e1;
s->phy_int_mask = 0;
s->phy_int = 0;
- phy_update_link(s);
+ imx_phy_update_link(s);
}

-static uint32_t do_phy_read(IMXFECState *s, int reg)
+static uint32_t imx_phy_read(IMXFECState *s, int reg)
{
uint32_t val;

@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
case 29: /* Interrupt source. */
val = s->phy_int;
s->phy_int = 0;
- phy_update_irq(s);
+ imx_phy_update_irq(s);
break;
case 30: /* Interrupt mask */
val = s->phy_int_mask;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
break;
}

- PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
+ trace_imx_phy_read(val, reg);

return val;
}

-static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
+static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
{
- PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
+ trace_imx_phy_write(val, reg);

if (reg > 31) {
/* we only advertise one phy */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
switch (reg) {
case 0: /* Basic Control */
if (val & 0x8000) {
- phy_reset(s);
+ imx_phy_reset(s);
} else {
s->phy_control = val & 0x7980;
/* Complete autonegotiation immediately. */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
break;
case 30: /* Interrupt mask */
s->phy_int_mask = val & 0xff;
- phy_update_irq(s);
+ imx_phy_update_irq(s);
break;
case 17:
case 18:
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
{
dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+ trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
}

static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
{
dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+ trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
+ bd->option, bd->status);
}

static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
int len;

imx_fec_read_bd(&bd, addr);
- FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
- addr, bd.flags, bd.length, bd.data);
if ((bd.flags & ENET_BD_R) == 0) {
+
/* Run out of descriptors to transmit. */
- FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
+ trace_imx_eth_tx_bd_busy();
+
break;
}
len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
int len;

imx_enet_read_bd(&bd, addr);
- FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
- "status %04x\n", addr, bd.flags, bd.length, bd.data,
- bd.option, bd.status);
if ((bd.flags & ENET_BD_R) == 0) {
/* Run out of descriptors to transmit. */
+
+ trace_imx_eth_tx_bd_busy();
+
break;
}
len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;

if (!s->regs[ENET_RDAR]) {
- FEC_PRINTF("RX buffer full\n");
+ trace_imx_eth_rx_bd_full();
} else if (flush) {
qemu_flush_queued_packets(qemu_get_queue(s->nic));
}
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));

/* We also reset the PHY */
- phy_reset(s);
+ imx_phy_reset(s);
}

static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
break;
}

- FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
- value);
+ trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);

return value;
}
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
uint32_t index = offset >> 2;

- FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
- (uint32_t)value);
+ trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);

switch (index) {
case ENET_EIR:
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
if (extract32(value, 29, 1)) {
/* This is a read operation */
s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
- do_phy_read(s,
+ imx_phy_read(s,
extract32(value,
18, 10)));
} else {
/* This a write operation */
- do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
+ imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
}
/* raise the interrupt as the PHY operation is done */
s->regs[ENET_EIR] |= ENET_INT_MII;
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
{
IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));

- FEC_PRINTF("\n");
-
return !!s->regs[ENET_RDAR];
}

@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
unsigned int buf_len;
size_t size = len;

- FEC_PRINTF("len %d\n", (int)size);
+ trace_imx_fec_receive(size);

if (!s->regs[ENET_RDAR]) {
qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
bd.length = buf_len;
size -= buf_len;

- FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+ trace_imx_fec_receive_len(addr, bd.length);

/* The last 4 bytes are the CRC. */
if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
if (size == 0) {
/* Last buffer in frame. */
bd.flags |= flags | ENET_BD_L;
- FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+ trace_imx_fec_receive_last(bd.flags);
+
s->regs[ENET_EIR] |= ENET_INT_RXF;
} else {
s->regs[ENET_EIR] |= ENET_INT_RXB;
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
size_t size = len;
bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;

- FEC_PRINTF("len %d\n", (int)size);
+ trace_imx_enet_receive(size);

if (!s->regs[ENET_RDAR]) {
qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
bd.length = buf_len;
size -= buf_len;

- FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+ trace_imx_enet_receive_len(addr, bd.length);

/* The last 4 bytes are the CRC. */
if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
if (size == 0) {
/* Last buffer in frame. */
bd.flags |= flags | ENET_BD_L;
- FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+ trace_imx_enet_receive_last(bd.flags);
+
/* Indicate that we've updated the last buffer descriptor. */
bd.last_buffer = ENET_BD_BDU;
if (bd.option & ENET_BD_RX_INT) {
diff --git a/hw/net/trace-events b/hw/net/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
i82596_set_multicast(uint16_t count) "Added %d multicast entries"
i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
+
+# imx_fec.c
+imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
+imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
+imx_phy_update_link(const char *s) "%s"
+imx_phy_reset(void) ""
+imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
+imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
+imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
+imx_eth_rx_bd_full(void) "RX buffer is full"
+imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
+imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
+imx_fec_receive(size_t size) "len %zu"
+imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_fec_receive_last(int last) "rx frame flags 0x%04x"
+imx_enet_receive(size_t size) "len %zu"
+imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_enet_receive_last(int last) "rx frame flags 0x%04x"
--
2.20.1
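Of the ops added above, only the saturating variants have non-obvious element semantics; ADD_zzz and SUB_zzz are plain modular arithmetic. As an illustration, this is what the tcg_gen_gvec_ssadd expansion behind trans_SQADD_zzz computes per byte element (a standalone sketch, not QEMU's implementation):

#include <stdint.h>

/* Signed saturating add, byte (esz=0) case. */
static int8_t ssadd8(int8_t n, int8_t m)
{
    int r = n + m;
    if (r > INT8_MAX) {
        r = INT8_MAX;
    } else if (r < INT8_MIN) {
        r = INT8_MIN;
    }
    return (int8_t)r;
}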
1
From: Richard Henderson <richard.henderson@linaro.org>
1
From: Guenter Roeck <linux@roeck-us.net>
2
2
3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
3
The Linux kernel's IMX code now uses vendor specific commands.
4
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
4
This results in endless warnings when booting the Linux kernel.
5
Message-id: 20180516223007.10256-15-richard.henderson@linaro.org
5
6
sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
7
    card clock still not gate off in 100us!.
8
9
Implement support for the vendor specific command implemented in IMX hardware
10
to be able to avoid this warning.
11
12
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
13
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
14
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
15
Message-id: 20200603145258.195920-2-linux@roeck-us.net
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
---
17
---
8
target/arm/helper-sve.h | 18 ++++++++++++
18
hw/sd/sdhci-internal.h | 5 +++++
9
target/arm/sve_helper.c | 57 ++++++++++++++++++++++++++++++++++++++
19
include/hw/sd/sdhci.h | 5 +++++
10
target/arm/translate-sve.c | 34 +++++++++++++++++++++++
20
hw/sd/sdhci.c | 18 +++++++++++++++++-
11
target/arm/sve.decode | 17 ++++++++++++
21
3 files changed, 27 insertions(+), 1 deletion(-)
12
4 files changed, 126 insertions(+)
13
22
14
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
23
diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
15
index XXXXXXX..XXXXXXX 100644
24
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-sve.h
25
--- a/hw/sd/sdhci-internal.h
17
+++ b/target/arm/helper-sve.h
26
+++ b/hw/sd/sdhci-internal.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_neg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
27
@@ -XXX,XX +XXX,XX @@
19
DEF_HELPER_FLAGS_4(sve_neg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
28
#define SDHC_CMD_INHIBIT 0x00000001
20
DEF_HELPER_FLAGS_4(sve_neg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
29
#define SDHC_DATA_INHIBIT 0x00000002
21
30
#define SDHC_DAT_LINE_ACTIVE 0x00000004
22
+DEF_HELPER_FLAGS_6(sve_mla_b, TCG_CALL_NO_RWG,
31
+#define SDHC_IMX_CLOCK_GATE_OFF 0x00000080
23
+ void, ptr, ptr, ptr, ptr, ptr, i32)
32
#define SDHC_DOING_WRITE 0x00000100
24
+DEF_HELPER_FLAGS_6(sve_mla_h, TCG_CALL_NO_RWG,
33
#define SDHC_DOING_READ 0x00000200
25
+ void, ptr, ptr, ptr, ptr, ptr, i32)
34
#define SDHC_SPACE_AVAILABLE 0x00000400
26
+DEF_HELPER_FLAGS_6(sve_mla_s, TCG_CALL_NO_RWG,
35
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
27
+ void, ptr, ptr, ptr, ptr, ptr, i32)
36
28
+DEF_HELPER_FLAGS_6(sve_mla_d, TCG_CALL_NO_RWG,
37
29
+ void, ptr, ptr, ptr, ptr, ptr, i32)
38
#define ESDHC_MIX_CTRL 0x48
30
+
39
+
31
+DEF_HELPER_FLAGS_6(sve_mls_b, TCG_CALL_NO_RWG,
40
#define ESDHC_VENDOR_SPEC 0xc0
32
+ void, ptr, ptr, ptr, ptr, ptr, i32)
41
+#define ESDHC_IMX_FRC_SDCLK_ON (1 << 8)
33
+DEF_HELPER_FLAGS_6(sve_mls_h, TCG_CALL_NO_RWG,
34
+ void, ptr, ptr, ptr, ptr, ptr, i32)
35
+DEF_HELPER_FLAGS_6(sve_mls_s, TCG_CALL_NO_RWG,
36
+ void, ptr, ptr, ptr, ptr, ptr, i32)
37
+DEF_HELPER_FLAGS_6(sve_mls_d, TCG_CALL_NO_RWG,
38
+ void, ptr, ptr, ptr, ptr, ptr, i32)
39
+
42
+
40
DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
43
#define ESDHC_DLL_CTRL 0x60
41
DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
44
42
DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
45
#define ESDHC_TUNING_CTRL 0xcc
43
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
46
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
47
#define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
48
DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
49
DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
50
+ DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
51
\
52
/* Capabilities registers provide information on supported
53
* features of this specific host controller implementation */ \
54
diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
44
index XXXXXXX..XXXXXXX 100644
55
index XXXXXXX..XXXXXXX 100644
45
--- a/target/arm/sve_helper.c
56
--- a/include/hw/sd/sdhci.h
46
+++ b/target/arm/sve_helper.c
57
+++ b/include/hw/sd/sdhci.h
47
@@ -XXX,XX +XXX,XX @@ DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD)
58
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
48
#undef DO_ASRD
59
uint16_t acmd12errsts; /* Auto CMD12 error status register */
49
#undef DO_ZPZI
60
uint16_t hostctl2; /* Host Control 2 */
50
#undef DO_ZPZI_D
61
uint64_t admasysaddr; /* ADMA System Address Register */
51
+
62
+ uint16_t vendor_spec; /* Vendor specific register */
52
+/* Fully general four-operand expander, controlled by a predicate.
63
53
+ */
64
/* Read-only registers */
54
+#define DO_ZPZZZ(NAME, TYPE, H, OP) \
65
uint64_t capareg; /* Capabilities Register */
55
+void HELPER(NAME)(void *vd, void *va, void *vn, void *vm, \
66
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
56
+ void *vg, uint32_t desc) \
67
uint32_t quirks;
57
+{ \
68
uint8_t sd_spec_version;
58
+ intptr_t i, opr_sz = simd_oprsz(desc); \
69
uint8_t uhs_mode;
59
+ for (i = 0; i < opr_sz; ) { \
70
+ uint8_t vendor; /* For vendor specific functionality */
60
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
+ do { \
+ if (pg & 1) { \
+ TYPE nn = *(TYPE *)(vn + H(i)); \
+ TYPE mm = *(TYPE *)(vm + H(i)); \
+ TYPE aa = *(TYPE *)(va + H(i)); \
+ *(TYPE *)(vd + H(i)) = OP(aa, nn, mm); \
+ } \
+ i += sizeof(TYPE), pg >>= sizeof(TYPE); \
+ } while (i & 15); \
+ } \
+}
+
+/* Similarly, specialized for 64-bit operands. */
+#define DO_ZPZZZ_D(NAME, TYPE, OP) \
+void HELPER(NAME)(void *vd, void *va, void *vn, void *vm, \
+ void *vg, uint32_t desc) \
+{ \
+ intptr_t i, opr_sz = simd_oprsz(desc) / 8; \
+ TYPE *d = vd, *a = va, *n = vn, *m = vm; \
+ uint8_t *pg = vg; \
+ for (i = 0; i < opr_sz; i += 1) { \
+ if (pg[H1(i)] & 1) { \
+ TYPE aa = a[i], nn = n[i], mm = m[i]; \
+ d[i] = OP(aa, nn, mm); \
+ } \
+ } \
+}
+
+#define DO_MLA(A, N, M) (A + N * M)
+#define DO_MLS(A, N, M) (A - N * M)
+
+DO_ZPZZZ(sve_mla_b, uint8_t, H1, DO_MLA)
+DO_ZPZZZ(sve_mls_b, uint8_t, H1, DO_MLS)
+
+DO_ZPZZZ(sve_mla_h, uint16_t, H1_2, DO_MLA)
+DO_ZPZZZ(sve_mls_h, uint16_t, H1_2, DO_MLS)
+
+DO_ZPZZZ(sve_mla_s, uint32_t, H1_4, DO_MLA)
+DO_ZPZZZ(sve_mls_s, uint32_t, H1_4, DO_MLS)
+
+DO_ZPZZZ_D(sve_mla_d, uint64_t, DO_MLA)
+DO_ZPZZZ_D(sve_mls_d, uint64_t, DO_MLS)
+
+#undef DO_MLA
+#undef DO_MLS
+#undef DO_ZPZZZ
+#undef DO_ZPZZZ_D
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZW(LSL, lsl)

#undef DO_ZPZW

+/*
+ *** SVE Integer Multiply-Add Group
+ */
+
+static bool do_zpzzz_ool(DisasContext *s, arg_rprrr_esz *a,
+ gen_helper_gvec_5 *fn)
+{
+ if (sve_access_check(s)) {
+ unsigned vsz = vec_full_reg_size(s);
+ tcg_gen_gvec_5_ool(vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->ra),
+ vec_full_reg_offset(s, a->rn),
+ vec_full_reg_offset(s, a->rm),
+ pred_full_reg_offset(s, a->pg),
+ vsz, vsz, 0, fn);
+ }
+ return true;
+}
+
+#define DO_ZPZZZ(NAME, name) \
+static bool trans_##NAME(DisasContext *s, arg_rprrr_esz *a, uint32_t insn) \
+{ \
+ static gen_helper_gvec_5 * const fns[4] = { \
+ gen_helper_sve_##name##_b, gen_helper_sve_##name##_h, \
+ gen_helper_sve_##name##_s, gen_helper_sve_##name##_d, \
+ }; \
+ return do_zpzzz_ool(s, a, fns[a->esz]); \
+}
+
+DO_ZPZZZ(MLA, mla)
+DO_ZPZZZ(MLS, mls)
+
+#undef DO_ZPZZZ
+
/*
*** SVE Predicate Logical Operations Group
*/
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
&rpr_esz rd pg rn esz
&rprr_s rd pg rn rm s
&rprr_esz rd pg rn rm esz
+&rprrr_esz rd pg rn rm ra esz
&rpri_esz rd pg rn imm esz

###########################################################################
@@ -XXX,XX +XXX,XX @@
@rdm_pg_rn ........ esz:2 ... ... ... pg:3 rn:5 rd:5 \
&rprr_esz rm=%reg_movprfx

+# Three register operand, with governing predicate, vector element size
+@rda_pg_rn_rm ........ esz:2 . rm:5 ... pg:3 rn:5 rd:5 \
+ &rprrr_esz ra=%reg_movprfx
+@rdn_pg_ra_rm ........ esz:2 . rm:5 ... pg:3 ra:5 rd:5 \
+ &rprrr_esz rn=%reg_movprfx
+
# One register operand, with governing predicate, vector element size
@rd_pg_rn ........ esz:2 ... ... ... pg:3 rn:5 rd:5 &rpr_esz

@@ -XXX,XX +XXX,XX @@ UXTH 00000100 .. 010 011 101 ... ..... ..... @rd_pg_rn
SXTW 00000100 .. 010 100 101 ... ..... ..... @rd_pg_rn
UXTW 00000100 .. 010 101 101 ... ..... ..... @rd_pg_rn

+### SVE Integer Multiply-Add Group
+
+# SVE integer multiply-add writing addend (predicated)
+MLA 00000100 .. 0 ..... 010 ... ..... ..... @rda_pg_rn_rm
+MLS 00000100 .. 0 ..... 011 ... ..... ..... @rda_pg_rn_rm
+
+# SVE integer multiply-add writing multiplicand (predicated)
+MLA 00000100 .. 0 ..... 110 ... ..... ..... @rdn_pg_ra_rm # MAD
+MLS 00000100 .. 0 ..... 111 ... ..... ..... @rdn_pg_ra_rm # MSB
+
### SVE Logical - Unpredicated Group

# SVE bitwise logical operations (unpredicated)
--
2.17.0

} SDHCIState;

+#define SDHCI_VENDOR_NONE 0
+#define SDHCI_VENDOR_IMX 1
+
/*
* Controller does not provide transfer-complete interrupt when not
* busy.
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
}
break;

+ case ESDHC_VENDOR_SPEC:
+ ret = s->vendor_spec;
+ break;
case ESDHC_DLL_CTRL:
case ESDHC_TUNE_CTRL_STATUS:
case ESDHC_UNDOCUMENTED_REG27:
case ESDHC_TUNING_CTRL:
- case ESDHC_VENDOR_SPEC:
case ESDHC_MIX_CTRL:
case ESDHC_WTMK_LVL:
ret = 0;
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
case ESDHC_UNDOCUMENTED_REG27:
case ESDHC_TUNING_CTRL:
case ESDHC_WTMK_LVL:
+ break;
+
case ESDHC_VENDOR_SPEC:
+ s->vendor_spec = value;
+ switch (s->vendor) {
+ case SDHCI_VENDOR_IMX:
+ if (value & ESDHC_IMX_FRC_SDCLK_ON) {
+ s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
+ } else {
+ s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
+ }
+ break;
+ default:
+ break;
+ }
break;

case SDHC_HOSTCTL:
--
2.20.1

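As an aside on the MLA/MLS helpers above: a standalone C sketch of the predicated multiply-accumulate semantics they implement. This is illustrative only; it uses one predicate byte per element rather than the packed one-bit-per-vector-byte layout the real helpers walk.

    #include <stdint.h>
    #include <stdio.h>

    /* Per-element predicated MLA over byte elements, as in DO_ZPZZZ with
     * OP = DO_MLA: active elements compute a + n * m, inactive elements
     * leave the destination unchanged.
     */
    static void mla_b(uint8_t *d, const uint8_t *a, const uint8_t *n,
                      const uint8_t *m, const uint8_t *pred, int elems)
    {
        for (int i = 0; i < elems; i++) {
            if (pred[i] & 1) {
                d[i] = a[i] + n[i] * m[i];
            }
        }
    }

    int main(void)
    {
        uint8_t a[4] = {1, 2, 3, 4}, n[4] = {10, 10, 10, 10};
        uint8_t m[4] = {2, 2, 2, 2}, d[4] = {0}, pred[4] = {1, 0, 1, 0};

        mla_b(d, a, n, m, pred, 4);
        for (int i = 0; i < 4; i++) {
            printf("%d ", d[i]);    /* prints: 21 0 23 0 */
        }
        return 0;
    }
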
From: Francisco Iglesias <frasse.iglesias@gmail.com>

The ZynqMP contains two instances of a generic DMA, the GDMA, located in the
FPD (full power domain), and the ADMA, located in LPD (low power domain). This
patch adds these two DMAs to the ZynqMP board.

Signed-off-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20180503214201.29082-3-frasse.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
include/hw/arm/xlnx-zynqmp.h | 5 ++++
hw/arm/xlnx-zynqmp.c | 53 ++++++++++++++++++++++++++++++++++++
2 files changed, 58 insertions(+)

diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-zynqmp.h
+++ b/include/hw/arm/xlnx-zynqmp.h
@@ -XXX,XX +XXX,XX @@
#include "hw/sd/sdhci.h"
#include "hw/ssi/xilinx_spips.h"
#include "hw/dma/xlnx_dpdma.h"
+#include "hw/dma/xlnx-zdma.h"
#include "hw/display/xlnx_dp.h"
#include "hw/intc/xlnx-zynqmp-ipi.h"
#include "hw/timer/xlnx-zynqmp-rtc.h"
@@ -XXX,XX +XXX,XX @@
#define XLNX_ZYNQMP_NUM_UARTS 2
#define XLNX_ZYNQMP_NUM_SDHCI 2
#define XLNX_ZYNQMP_NUM_SPIS 2
+#define XLNX_ZYNQMP_NUM_GDMA_CH 8
+#define XLNX_ZYNQMP_NUM_ADMA_CH 8

#define XLNX_ZYNQMP_NUM_QSPI_BUS 2
#define XLNX_ZYNQMP_NUM_QSPI_BUS_CS 2
@@ -XXX,XX +XXX,XX @@ typedef struct XlnxZynqMPState {
XlnxDPDMAState dpdma;
XlnxZynqMPIPI ipi;
XlnxZynqMPRTC rtc;
+ XlnxZDMA gdma[XLNX_ZYNQMP_NUM_GDMA_CH];
+ XlnxZDMA adma[XLNX_ZYNQMP_NUM_ADMA_CH];

char *boot_cpu;
ARMCPU *boot_cpu_ptr;
diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -XXX,XX +XXX,XX @@ static const int spi_intr[XLNX_ZYNQMP_NUM_SPIS] = {
19, 20,
};

+static const uint64_t gdma_ch_addr[XLNX_ZYNQMP_NUM_GDMA_CH] = {
+ 0xFD500000, 0xFD510000, 0xFD520000, 0xFD530000,
+ 0xFD540000, 0xFD550000, 0xFD560000, 0xFD570000
+};
+
+static const int gdma_ch_intr[XLNX_ZYNQMP_NUM_GDMA_CH] = {
+ 124, 125, 126, 127, 128, 129, 130, 131
+};
+
+static const uint64_t adma_ch_addr[XLNX_ZYNQMP_NUM_ADMA_CH] = {
+ 0xFFA80000, 0xFFA90000, 0xFFAA0000, 0xFFAB0000,
+ 0xFFAC0000, 0xFFAD0000, 0xFFAE0000, 0xFFAF0000
+};
+
+static const int adma_ch_intr[XLNX_ZYNQMP_NUM_ADMA_CH] = {
+ 77, 78, 79, 80, 81, 82, 83, 84
+};
+
typedef struct XlnxZynqMPGICRegion {
int region_index;
uint32_t address;
@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_init(Object *obj)

object_initialize(&s->rtc, sizeof(s->rtc), TYPE_XLNX_ZYNQMP_RTC);
qdev_set_parent_bus(DEVICE(&s->rtc), sysbus_get_default());
+
+ for (i = 0; i < XLNX_ZYNQMP_NUM_GDMA_CH; i++) {
+ object_initialize(&s->gdma[i], sizeof(s->gdma[i]), TYPE_XLNX_ZDMA);
+ qdev_set_parent_bus(DEVICE(&s->gdma[i]), sysbus_get_default());
+ }
+
+ for (i = 0; i < XLNX_ZYNQMP_NUM_ADMA_CH; i++) {
+ object_initialize(&s->adma[i], sizeof(s->adma[i]), TYPE_XLNX_ZDMA);
+ qdev_set_parent_bus(DEVICE(&s->adma[i]), sysbus_get_default());
+ }
}

static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
}
sysbus_mmio_map(SYS_BUS_DEVICE(&s->rtc), 0, RTC_ADDR);
sysbus_connect_irq(SYS_BUS_DEVICE(&s->rtc), 0, gic_spi[RTC_IRQ]);
+
+ for (i = 0; i < XLNX_ZYNQMP_NUM_GDMA_CH; i++) {
+ object_property_set_uint(OBJECT(&s->gdma[i]), 128, "bus-width", &err);
+ object_property_set_bool(OBJECT(&s->gdma[i]), true, "realized", &err);
+ if (err) {
+ error_propagate(errp, err);
+ return;
+ }
+
+ sysbus_mmio_map(SYS_BUS_DEVICE(&s->gdma[i]), 0, gdma_ch_addr[i]);
+ sysbus_connect_irq(SYS_BUS_DEVICE(&s->gdma[i]), 0,
+ gic_spi[gdma_ch_intr[i]]);
+ }
+
+ for (i = 0; i < XLNX_ZYNQMP_NUM_ADMA_CH; i++) {
+ object_property_set_bool(OBJECT(&s->adma[i]), true, "realized", &err);
+ if (err) {
+ error_propagate(errp, err);
+ return;
+ }
+
+ sysbus_mmio_map(SYS_BUS_DEVICE(&s->adma[i]), 0, adma_ch_addr[i]);
+ sysbus_connect_irq(SYS_BUS_DEVICE(&s->adma[i]), 0,
+ gic_spi[adma_ch_intr[i]]);
+ }
}

static Property xlnx_zynqmp_props[] = {
--
2.17.0

From: Guenter Roeck <linux@roeck-us.net>

Set vendor property to IMX to enable IMX specific functionality
in sdhci code.

Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200603145258.195920-3-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
hw/arm/fsl-imx25.c | 6 ++++++
hw/arm/fsl-imx6.c | 6 ++++++
hw/arm/fsl-imx6ul.c | 2 ++
hw/arm/fsl-imx7.c | 2 ++
4 files changed, 16 insertions(+)

diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx25.c
+++ b/hw/arm/fsl-imx25.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
&err);
object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
"capareg", &err);
+ object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+ "vendor", &err);
+ if (err) {
+ error_propagate(errp, err);
+ return;
+ }
object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
if (err) {
error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6.c
+++ b/hw/arm/fsl-imx6.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
&err);
object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
"capareg", &err);
+ object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+ "vendor", &err);
+ if (err) {
+ error_propagate(errp, err);
+ return;
+ }
object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
if (err) {
error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6ul.c
+++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
FSL_IMX6UL_USDHC2_IRQ,
};

+ object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+ "vendor", &error_abort);
object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
&error_abort);

diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx7.c
+++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
FSL_IMX7_USDHC3_IRQ,
};

+ object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+ "vendor", &error_abort);
object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
&error_abort);

--
2.20.1

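One design point worth noting in the hunks above: the "vendor" property is set before the device is realized. qdev properties generally become read-only once "realized" flips to true, so the set order matters. A condensed sketch of the pattern, with placeholder names (mydev is illustrative, not an actual QEMU identifier):

    /* Sketch only: configure QOM properties first, then realize.
     * Attempting to set "vendor" after realize would report an error.
     */
    Error *err = NULL;

    object_property_set_uint(OBJECT(mydev), SDHCI_VENDOR_IMX,
                             "vendor", &err);
    if (err) {
        error_propagate(errp, err);
        return;
    }
    object_property_set_bool(OBJECT(mydev), true, "realized", &err);
    if (err) {
        error_propagate(errp, err);
        return;
    }
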
Deleted patch
From: Eric Auger <eric.auger@redhat.com>

Coverity complains about use of uninitialized Evt struct.
The EVT_SET_TYPE and similar setters use deposit32() on fields
in the struct, so they read the uninitialized existing values.
In cases where we don't set all the fields in the event struct
we'll end up leaking random uninitialized data from QEMU's
stack into the guest.

Initializing the struct with "Evt evt = {};" ought to satisfy
Coverity and fix the data leak.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1526493784-25328-2-git-send-email-eric.auger@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
hw/arm/smmuv3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@ static MemTxResult smmuv3_write_eventq(SMMUv3State *s, Evt *evt)

void smmuv3_record_event(SMMUv3State *s, SMMUEventInfo *info)
{
- Evt evt;
+ Evt evt = {};
MemTxResult r;

if (!smmuv3_eventq_enabled(s)) {
--
2.17.0

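A minimal sketch of the bug class this fixes. deposit32() is read-modify-write, so any field never explicitly set keeps whatever happened to be on the stack; zero-initializing the struct closes the leak. The deposit32() below is a simplified local re-implementation of QEMU's helper, and the struct shape is illustrative:

    #include <stdint.h>

    /* Simplified version of QEMU's deposit32(): replace 'length' bits of
     * 'value' starting at 'start' with 'fieldval'. Note it reads 'value'.
     */
    static inline uint32_t deposit32(uint32_t value, int start, int length,
                                     uint32_t fieldval)
    {
        uint32_t mask = (~0U >> (32 - length)) << start;
        return (value & ~mask) | ((fieldval << start) & mask);
    }

    typedef struct { uint32_t word[8]; } Evt;

    void record(void)
    {
        Evt evt;        /* word[] starts as stale stack data (the bug) */
        Evt evt2 = {};  /* word[] starts as all zeroes (the fix)       */

        /* Setting only bits [7:0] leaves bits [31:8] of evt.word[0]
         * holding uninitialized garbage; evt2.word[0] has them zero. */
        evt.word[0] = deposit32(evt.word[0], 0, 8, 0x2a);
        evt2.word[0] = deposit32(evt2.word[0], 0, 8, 0x2a);
    }
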
Deleted patch
From: Richard Henderson <richard.henderson@linaro.org>

Including only 4, as-yet unimplemented, instruction patterns
so that the whole thing compiles.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/Makefile.objs | 10 ++++++
target/arm/translate-a64.c | 7 ++++-
target/arm/translate-sve.c | 63 ++++++++++++++++++++++++++++++++++++
.gitignore | 1 +
target/arm/sve.decode | 45 +++++++++++++++++++++++++++
5 files changed, 125 insertions(+), 1 deletion(-)
create mode 100644 target/arm/translate-sve.c
create mode 100644 target/arm/sve.decode

diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ obj-y += gdbstub.o
obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
obj-y += crypto_helper.o
obj-$(CONFIG_SOFTMMU) += arm-powerctl.o
+
+DECODETREE = $(SRC_PATH)/scripts/decodetree.py
+
+target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
+target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+obj-$(TARGET_AARCH64) += translate-sve.o
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s)
s->fp_access_checked = false;

switch (extract32(insn, 25, 4)) {
- case 0x0: case 0x1: case 0x2: case 0x3: /* UNALLOCATED */
+ case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
unallocated_encoding(s);
break;
+ case 0x2:
+ if (!arm_dc_feature(s, ARM_FEATURE_SVE) || !disas_sve(s, insn)) {
+ unallocated_encoding(s);
+ }
+ break;
case 0x8: case 0x9: /* Data processing - immediate */
disas_data_proc_imm(s, insn);
break;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * AArch64 SVE translation
+ *
+ * Copyright (c) 2018 Linaro, Ltd
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "tcg-op.h"
+#include "tcg-op-gvec.h"
+#include "qemu/log.h"
+#include "arm_ldst.h"
+#include "translate.h"
+#include "internals.h"
+#include "exec/helper-proto.h"
+#include "exec/helper-gen.h"
+#include "exec/log.h"
+#include "trace-tcg.h"
+#include "translate-a64.h"
+
+/*
+ * Include the generated decoder.
+ */
+
+#include "decode-sve.inc.c"
+
+/*
+ * Implement all of the translator functions referenced by the decoder.
+ */
+
+static bool trans_AND_zzz(DisasContext *s, arg_AND_zzz *a, uint32_t insn)
+{
+ return false;
+}
+
+static bool trans_ORR_zzz(DisasContext *s, arg_ORR_zzz *a, uint32_t insn)
+{
+ return false;
+}
+
+static bool trans_EOR_zzz(DisasContext *s, arg_EOR_zzz *a, uint32_t insn)
+{
+ return false;
+}
+
+static bool trans_BIC_zzz(DisasContext *s, arg_BIC_zzz *a, uint32_t insn)
+{
+ return false;
+}
diff --git a/.gitignore b/.gitignore
index XXXXXXX..XXXXXXX 100644
--- a/.gitignore
+++ b/.gitignore
@@ -XXX,XX +XXX,XX @@ trace-dtrace-root.h
trace-dtrace-root.dtrace
trace-ust-all.h
trace-ust-all.c
+/target/arm/decode-sve.inc.c
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
+# AArch64 SVE instruction descriptions
+#
+# Copyright (c) 2017 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+###########################################################################
+# Named attribute sets. These are used to make nice(er) names
+# when creating helpers common to those for the individual
+# instruction patterns.
+
+&rrr_esz rd rn rm esz
+
+###########################################################################
+# Named instruction formats. These are generally used to
+# reduce the amount of duplication between instruction patterns.
+
+# Three operand with unused vector element size
+@rd_rn_rm_e0 ........ ... rm:5 ... ... rn:5 rd:5 &rrr_esz esz=0
+
+###########################################################################
+# Instruction patterns. Grouped according to the SVE encodingindex.xhtml.
+
+### SVE Logical - Unpredicated Group
+
+# SVE bitwise logical operations (unpredicated)
+AND_zzz 00000100 00 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
+ORR_zzz 00000100 01 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
+EOR_zzz 00000100 10 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
+BIC_zzz 00000100 11 1 ..... 001 100 ..... ..... @rd_rn_rm_e0
--
2.17.0

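The @rd_rn_rm_e0 format line above pins the field layout: rm at bits [20:16], rn at [9:5], rd at [4:0], with esz fixed to 0. A hand-written sketch of the extraction the generated decoder performs for this format; extract32() is re-implemented locally here with QEMU's semantics so the fragment stands alone:

    #include <stdint.h>

    /* Local stand-in for QEMU's extract32(value, start, length). */
    static inline uint32_t extract32(uint32_t value, int start, int length)
    {
        return (value >> start) & (~0U >> (32 - length));
    }

    typedef struct { int rd, rn, rm, esz; } arg_rrr_esz;

    /* Sketch of what decodetree emits for @rd_rn_rm_e0. */
    static void decode_rd_rn_rm_e0(uint32_t insn, arg_rrr_esz *a)
    {
        a->rm = extract32(insn, 16, 5);
        a->rn = extract32(insn, 5, 5);
        a->rd = extract32(insn, 0, 5);
        a->esz = 0;
    }
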
Deleted patch
From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/helper-sve.h | 145 +++++++++++++++++++++++++++
target/arm/sve_helper.c | 194 +++++++++++++++++++++++++++++++++++++
target/arm/translate-sve.c | 68 +++++++++++++
target/arm/sve.decode | 42 ++++++++
4 files changed, 449 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sve_predtest, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(sve_pfirst, TCG_CALL_NO_WG, i32, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(sve_pnext, TCG_CALL_NO_WG, i32, ptr, ptr, i32)

+DEF_HELPER_FLAGS_5(sve_and_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_and_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_and_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_and_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_eor_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_orr_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bic_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_add_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_add_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_add_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_add_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sub_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smax_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umax_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smin_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umin_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sabd_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_uabd_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_mul_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_smulh_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_b, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_h, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_umulh_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sdiv_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_udiv_zpzz_s, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_udiv_zpzz_d, TCG_CALL_NO_RWG,
+ void, ptr, ptr, ptr, ptr, i32)
+
DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@
#include "tcg/tcg-gvec-desc.h"


+/* Note that vector data is stored in host-endian 64-bit chunks,
+ so addressing units smaller than that needs a host-endian fixup. */
+#ifdef HOST_WORDS_BIGENDIAN
+#define H1(x) ((x) ^ 7)
+#define H1_2(x) ((x) ^ 6)
+#define H1_4(x) ((x) ^ 4)
+#define H2(x) ((x) ^ 3)
+#define H4(x) ((x) ^ 1)
+#else
+#define H1(x) (x)
+#define H1_2(x) (x)
+#define H1_4(x) (x)
+#define H2(x) (x)
+#define H4(x) (x)
+#endif
+
/* Return a value for NZCV as per the ARM PredTest pseudofunction.
*
* The return value has bit 31 set if N is set, bit 1 set if Z is clear,
@@ -XXX,XX +XXX,XX @@ LOGICAL_PPPP(sve_nand_pppp, DO_NAND)
#undef DO_SEL
#undef LOGICAL_PPPP

+/* Fully general three-operand expander, controlled by a predicate.
+ * This is complicated by the host-endian storage of the register file.
+ */
+/* ??? I don't expect the compiler could ever vectorize this itself.
+ * With some tables we can convert bit masks to byte masks, and with
+ * extra care wrt byte/word ordering we could use gcc generic vectors
+ * and do 16 bytes at a time.
+ */
+#define DO_ZPZZ(NAME, TYPE, H, OP) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{ \
+ intptr_t i, opr_sz = simd_oprsz(desc); \
+ for (i = 0; i < opr_sz; ) { \
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
+ do { \
+ if (pg & 1) { \
+ TYPE nn = *(TYPE *)(vn + H(i)); \
+ TYPE mm = *(TYPE *)(vm + H(i)); \
+ *(TYPE *)(vd + H(i)) = OP(nn, mm); \
+ } \
+ i += sizeof(TYPE), pg >>= sizeof(TYPE); \
+ } while (i & 15); \
+ } \
+}
+
+/* Similarly, specialized for 64-bit operands. */
+#define DO_ZPZZ_D(NAME, TYPE, OP) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{ \
+ intptr_t i, opr_sz = simd_oprsz(desc) / 8; \
+ TYPE *d = vd, *n = vn, *m = vm; \
+ uint8_t *pg = vg; \
+ for (i = 0; i < opr_sz; i += 1) { \
+ if (pg[H1(i)] & 1) { \
+ TYPE nn = n[i], mm = m[i]; \
+ d[i] = OP(nn, mm); \
+ } \
+ } \
+}
+
+#define DO_AND(N, M) (N & M)
+#define DO_EOR(N, M) (N ^ M)
+#define DO_ORR(N, M) (N | M)
+#define DO_BIC(N, M) (N & ~M)
+#define DO_ADD(N, M) (N + M)
+#define DO_SUB(N, M) (N - M)
+#define DO_MAX(N, M) ((N) >= (M) ? (N) : (M))
+#define DO_MIN(N, M) ((N) >= (M) ? (M) : (N))
+#define DO_ABD(N, M) ((N) >= (M) ? (N) - (M) : (M) - (N))
+#define DO_MUL(N, M) (N * M)
+#define DO_DIV(N, M) (M ? N / M : 0)
+
+DO_ZPZZ(sve_and_zpzz_b, uint8_t, H1, DO_AND)
+DO_ZPZZ(sve_and_zpzz_h, uint16_t, H1_2, DO_AND)
+DO_ZPZZ(sve_and_zpzz_s, uint32_t, H1_4, DO_AND)
+DO_ZPZZ_D(sve_and_zpzz_d, uint64_t, DO_AND)
+
+DO_ZPZZ(sve_orr_zpzz_b, uint8_t, H1, DO_ORR)
+DO_ZPZZ(sve_orr_zpzz_h, uint16_t, H1_2, DO_ORR)
+DO_ZPZZ(sve_orr_zpzz_s, uint32_t, H1_4, DO_ORR)
+DO_ZPZZ_D(sve_orr_zpzz_d, uint64_t, DO_ORR)
+
+DO_ZPZZ(sve_eor_zpzz_b, uint8_t, H1, DO_EOR)
+DO_ZPZZ(sve_eor_zpzz_h, uint16_t, H1_2, DO_EOR)
+DO_ZPZZ(sve_eor_zpzz_s, uint32_t, H1_4, DO_EOR)
+DO_ZPZZ_D(sve_eor_zpzz_d, uint64_t, DO_EOR)
+
+DO_ZPZZ(sve_bic_zpzz_b, uint8_t, H1, DO_BIC)
+DO_ZPZZ(sve_bic_zpzz_h, uint16_t, H1_2, DO_BIC)
+DO_ZPZZ(sve_bic_zpzz_s, uint32_t, H1_4, DO_BIC)
+DO_ZPZZ_D(sve_bic_zpzz_d, uint64_t, DO_BIC)
+
+DO_ZPZZ(sve_add_zpzz_b, uint8_t, H1, DO_ADD)
+DO_ZPZZ(sve_add_zpzz_h, uint16_t, H1_2, DO_ADD)
+DO_ZPZZ(sve_add_zpzz_s, uint32_t, H1_4, DO_ADD)
+DO_ZPZZ_D(sve_add_zpzz_d, uint64_t, DO_ADD)
+
+DO_ZPZZ(sve_sub_zpzz_b, uint8_t, H1, DO_SUB)
+DO_ZPZZ(sve_sub_zpzz_h, uint16_t, H1_2, DO_SUB)
+DO_ZPZZ(sve_sub_zpzz_s, uint32_t, H1_4, DO_SUB)
+DO_ZPZZ_D(sve_sub_zpzz_d, uint64_t, DO_SUB)
+
+DO_ZPZZ(sve_smax_zpzz_b, int8_t, H1, DO_MAX)
+DO_ZPZZ(sve_smax_zpzz_h, int16_t, H1_2, DO_MAX)
+DO_ZPZZ(sve_smax_zpzz_s, int32_t, H1_4, DO_MAX)
+DO_ZPZZ_D(sve_smax_zpzz_d, int64_t, DO_MAX)
+
+DO_ZPZZ(sve_umax_zpzz_b, uint8_t, H1, DO_MAX)
+DO_ZPZZ(sve_umax_zpzz_h, uint16_t, H1_2, DO_MAX)
+DO_ZPZZ(sve_umax_zpzz_s, uint32_t, H1_4, DO_MAX)
+DO_ZPZZ_D(sve_umax_zpzz_d, uint64_t, DO_MAX)
+
+DO_ZPZZ(sve_smin_zpzz_b, int8_t, H1, DO_MIN)
+DO_ZPZZ(sve_smin_zpzz_h, int16_t, H1_2, DO_MIN)
+DO_ZPZZ(sve_smin_zpzz_s, int32_t, H1_4, DO_MIN)
+DO_ZPZZ_D(sve_smin_zpzz_d, int64_t, DO_MIN)
+
+DO_ZPZZ(sve_umin_zpzz_b, uint8_t, H1, DO_MIN)
+DO_ZPZZ(sve_umin_zpzz_h, uint16_t, H1_2, DO_MIN)
+DO_ZPZZ(sve_umin_zpzz_s, uint32_t, H1_4, DO_MIN)
+DO_ZPZZ_D(sve_umin_zpzz_d, uint64_t, DO_MIN)
+
+DO_ZPZZ(sve_sabd_zpzz_b, int8_t, H1, DO_ABD)
+DO_ZPZZ(sve_sabd_zpzz_h, int16_t, H1_2, DO_ABD)
+DO_ZPZZ(sve_sabd_zpzz_s, int32_t, H1_4, DO_ABD)
+DO_ZPZZ_D(sve_sabd_zpzz_d, int64_t, DO_ABD)
+
+DO_ZPZZ(sve_uabd_zpzz_b, uint8_t, H1, DO_ABD)
+DO_ZPZZ(sve_uabd_zpzz_h, uint16_t, H1_2, DO_ABD)
+DO_ZPZZ(sve_uabd_zpzz_s, uint32_t, H1_4, DO_ABD)
+DO_ZPZZ_D(sve_uabd_zpzz_d, uint64_t, DO_ABD)
+
+/* Because the computation type is at least twice as large as required,
+ these work for both signed and unsigned source types. */
+static inline uint8_t do_mulh_b(int32_t n, int32_t m)
+{
+ return (n * m) >> 8;
+}
+
+static inline uint16_t do_mulh_h(int32_t n, int32_t m)
+{
+ return (n * m) >> 16;
+}
+
+static inline uint32_t do_mulh_s(int64_t n, int64_t m)
+{
+ return (n * m) >> 32;
+}
+
+static inline uint64_t do_smulh_d(uint64_t n, uint64_t m)
+{
+ uint64_t lo, hi;
+ muls64(&lo, &hi, n, m);
+ return hi;
+}
+
+static inline uint64_t do_umulh_d(uint64_t n, uint64_t m)
+{
+ uint64_t lo, hi;
+ mulu64(&lo, &hi, n, m);
+ return hi;
+}
+
+DO_ZPZZ(sve_mul_zpzz_b, uint8_t, H1, DO_MUL)
+DO_ZPZZ(sve_mul_zpzz_h, uint16_t, H1_2, DO_MUL)
+DO_ZPZZ(sve_mul_zpzz_s, uint32_t, H1_4, DO_MUL)
+DO_ZPZZ_D(sve_mul_zpzz_d, uint64_t, DO_MUL)
+
+DO_ZPZZ(sve_smulh_zpzz_b, int8_t, H1, do_mulh_b)
+DO_ZPZZ(sve_smulh_zpzz_h, int16_t, H1_2, do_mulh_h)
+DO_ZPZZ(sve_smulh_zpzz_s, int32_t, H1_4, do_mulh_s)
+DO_ZPZZ_D(sve_smulh_zpzz_d, uint64_t, do_smulh_d)
+
+DO_ZPZZ(sve_umulh_zpzz_b, uint8_t, H1, do_mulh_b)
+DO_ZPZZ(sve_umulh_zpzz_h, uint16_t, H1_2, do_mulh_h)
+DO_ZPZZ(sve_umulh_zpzz_s, uint32_t, H1_4, do_mulh_s)
+DO_ZPZZ_D(sve_umulh_zpzz_d, uint64_t, do_umulh_d)
+
+DO_ZPZZ(sve_sdiv_zpzz_s, int32_t, H1_4, DO_DIV)
+DO_ZPZZ_D(sve_sdiv_zpzz_d, int64_t, DO_DIV)
+
+DO_ZPZZ(sve_udiv_zpzz_s, uint32_t, H1_4, DO_DIV)
+DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
+
+#undef DO_ZPZZ
+#undef DO_ZPZZ_D
+#undef DO_AND
+#undef DO_ORR
+#undef DO_EOR
+#undef DO_BIC
+#undef DO_ADD
+#undef DO_SUB
+#undef DO_MAX
+#undef DO_MIN
+#undef DO_ABD
+#undef DO_MUL
+#undef DO_DIV
+
/* Similar to the ARM LastActiveElement pseudocode function, except the
result is multiplied by the element size. This includes the not found
indication; e.g. not found for esz=3 is -8. */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BIC_zzz(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
return do_vector3_z(s, tcg_gen_gvec_andc, 0, a->rd, a->rn, a->rm);
}

+/*
+ *** SVE Integer Arithmetic - Binary Predicated Group
+ */
+
+static bool do_zpzz_ool(DisasContext *s, arg_rprr_esz *a, gen_helper_gvec_4 *fn)
+{
+ unsigned vsz = vec_full_reg_size(s);
+ if (fn == NULL) {
+ return false;
+ }
+ if (sve_access_check(s)) {
+ tcg_gen_gvec_4_ool(vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn),
+ vec_full_reg_offset(s, a->rm),
+ pred_full_reg_offset(s, a->pg),
+ vsz, vsz, 0, fn);
+ }
+ return true;
+}
+
+#define DO_ZPZZ(NAME, name) \
+static bool trans_##NAME##_zpzz(DisasContext *s, arg_rprr_esz *a, \
+ uint32_t insn) \
+{ \
+ static gen_helper_gvec_4 * const fns[4] = { \
+ gen_helper_sve_##name##_zpzz_b, gen_helper_sve_##name##_zpzz_h, \
+ gen_helper_sve_##name##_zpzz_s, gen_helper_sve_##name##_zpzz_d, \
+ }; \
+ return do_zpzz_ool(s, a, fns[a->esz]); \
+}
+
+DO_ZPZZ(AND, and)
+DO_ZPZZ(EOR, eor)
+DO_ZPZZ(ORR, orr)
+DO_ZPZZ(BIC, bic)
+
+DO_ZPZZ(ADD, add)
+DO_ZPZZ(SUB, sub)
+
+DO_ZPZZ(SMAX, smax)
+DO_ZPZZ(UMAX, umax)
+DO_ZPZZ(SMIN, smin)
+DO_ZPZZ(UMIN, umin)
+DO_ZPZZ(SABD, sabd)
+DO_ZPZZ(UABD, uabd)
+
+DO_ZPZZ(MUL, mul)
+DO_ZPZZ(SMULH, smulh)
+DO_ZPZZ(UMULH, umulh)
+
+static bool trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+ static gen_helper_gvec_4 * const fns[4] = {
+ NULL, NULL, gen_helper_sve_sdiv_zpzz_s, gen_helper_sve_sdiv_zpzz_d
+ };
+ return do_zpzz_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
+{
+ static gen_helper_gvec_4 * const fns[4] = {
+ NULL, NULL, gen_helper_sve_udiv_zpzz_s, gen_helper_sve_udiv_zpzz_d
+ };
+ return do_zpzz_ool(s, a, fns[a->esz]);
+}
+
+#undef DO_ZPZZ
+
/*
*** SVE Predicate Logical Operations Group
*/
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@

%imm9_16_10 16:s6 10:3

+# Either a copy of rd (at bit 0), or a different source
+# as propagated via the MOVPRFX instruction.
+%reg_movprfx 0:5
+
###########################################################################
# Named attribute sets. These are used to make nice(er) names
# when creating helpers common to those for the individual
@@ -XXX,XX +XXX,XX @@
&rri rd rn imm
&rrr_esz rd rn rm esz
&rprr_s rd pg rn rm s
+&rprr_esz rd pg rn rm esz

###########################################################################
# Named instruction formats. These are generally used to
@@ -XXX,XX +XXX,XX @@
# Three predicate operand, with governing predicate, flag setting
@pd_pg_pn_pm_s ........ . s:1 .. rm:4 .. pg:4 . rn:4 . rd:4 &rprr_s

+# Two register operand, with governing predicate, vector element size
+@rdn_pg_rm ........ esz:2 ... ... ... pg:3 rm:5 rd:5 \
+ &rprr_esz rn=%reg_movprfx
+@rdm_pg_rn ........ esz:2 ... ... ... pg:3 rn:5 rd:5 \
+ &rprr_esz rm=%reg_movprfx
+
# Basic Load/Store with 9-bit immediate offset
@pd_rn_i9 ........ ........ ...... rn:5 . rd:4 \
&rri imm=%imm9_16_10
@@ -XXX,XX +XXX,XX @@
###########################################################################
# Instruction patterns. Grouped according to the SVE encodingindex.xhtml.

+### SVE Integer Arithmetic - Binary Predicated Group
+
+# SVE bitwise logical vector operations (predicated)
+ORR_zpzz 00000100 .. 011 000 000 ... ..... ..... @rdn_pg_rm
+EOR_zpzz 00000100 .. 011 001 000 ... ..... ..... @rdn_pg_rm
+AND_zpzz 00000100 .. 011 010 000 ... ..... ..... @rdn_pg_rm
+BIC_zpzz 00000100 .. 011 011 000 ... ..... ..... @rdn_pg_rm
+
+# SVE integer add/subtract vectors (predicated)
+ADD_zpzz 00000100 .. 000 000 000 ... ..... ..... @rdn_pg_rm
+SUB_zpzz 00000100 .. 000 001 000 ... ..... ..... @rdn_pg_rm
+SUB_zpzz 00000100 .. 000 011 000 ... ..... ..... @rdm_pg_rn # SUBR
+
+# SVE integer min/max/difference (predicated)
+SMAX_zpzz 00000100 .. 001 000 000 ... ..... ..... @rdn_pg_rm
+UMAX_zpzz 00000100 .. 001 001 000 ... ..... ..... @rdn_pg_rm
+SMIN_zpzz 00000100 .. 001 010 000 ... ..... ..... @rdn_pg_rm
+UMIN_zpzz 00000100 .. 001 011 000 ... ..... ..... @rdn_pg_rm
+SABD_zpzz 00000100 .. 001 100 000 ... ..... ..... @rdn_pg_rm
+UABD_zpzz 00000100 .. 001 101 000 ... ..... ..... @rdn_pg_rm
+
+# SVE integer multiply/divide (predicated)
+MUL_zpzz 00000100 .. 010 000 000 ... ..... ..... @rdn_pg_rm
+SMULH_zpzz 00000100 .. 010 010 000 ... ..... ..... @rdn_pg_rm
+UMULH_zpzz 00000100 .. 010 011 000 ... ..... ..... @rdn_pg_rm
+# Note that divide requires size >= 2; below 2 is unallocated.
+SDIV_zpzz 00000100 .. 010 100 000 ... ..... ..... @rdn_pg_rm
+UDIV_zpzz 00000100 .. 010 101 000 ... ..... ..... @rdn_pg_rm
+SDIV_zpzz 00000100 .. 010 110 000 ... ..... ..... @rdm_pg_rn # SDIVR
+UDIV_zpzz 00000100 .. 010 111 000 ... ..... ..... @rdm_pg_rn # UDIVR
+
### SVE Logical - Unpredicated Group

# SVE bitwise logical operations (unpredicated)
--
2.17.0

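The subtle part of the DO_ZPZZ expander above is the predicate walk: SVE keeps one predicate bit per byte of vector, so a k-byte element consumes k predicate bits and only the low bit of each group is significant. A standalone sketch for 16-bit elements; the vector length and values are illustrative:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        uint16_t n[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        uint16_t m[8] = {10, 10, 10, 10, 10, 10, 10, 10};
        uint16_t d[8];
        memset(d, 0xff, sizeof(d));   /* stand-in for the old contents */

        /* One predicate bit per vector byte: element k of a uint16_t
         * vector is governed by bit 2*k. 0x0005 activates elements 0, 1.
         */
        uint16_t pg = 0x0005;

        for (size_t i = 0; i < sizeof(d); ) {
            if (pg & 1) {
                d[i / 2] = n[i / 2] + m[i / 2];
            }
            i += sizeof(uint16_t);    /* advance one element ... */
            pg >>= sizeof(uint16_t);  /* ... and two predicate bits */
        }
        for (int k = 0; k < 8; k++) {
            printf("%04x ", d[k]);    /* 000b 000c ffff ffff ... */
        }
        return 0;
    }
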
Deleted patch
1
From: Richard Henderson <richard.henderson@linaro.org>
2
1
3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20180516223007.10256-11-richard.henderson@linaro.org
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
---
8
target/arm/helper-sve.h | 25 ++++
9
target/arm/sve_helper.c | 264 +++++++++++++++++++++++++++++++++++++
10
target/arm/translate-sve.c | 130 ++++++++++++++++++
11
target/arm/sve.decode | 26 ++++
12
4 files changed, 445 insertions(+)
13
14
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-sve.h
17
+++ b/target/arm/helper-sve.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(sve_uminv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
19
DEF_HELPER_FLAGS_3(sve_uminv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
20
DEF_HELPER_FLAGS_3(sve_uminv_d, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
21
22
+DEF_HELPER_FLAGS_3(sve_clr_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
23
+DEF_HELPER_FLAGS_3(sve_clr_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
24
+DEF_HELPER_FLAGS_3(sve_clr_s, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
25
+DEF_HELPER_FLAGS_3(sve_clr_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
26
+
27
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
28
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
29
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
30
+DEF_HELPER_FLAGS_4(sve_asr_zpzi_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
31
+
32
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
33
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
34
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
35
+DEF_HELPER_FLAGS_4(sve_lsr_zpzi_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
36
+
37
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
38
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
39
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
40
+DEF_HELPER_FLAGS_4(sve_lsl_zpzi_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
41
+
42
+DEF_HELPER_FLAGS_4(sve_asrd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
43
+DEF_HELPER_FLAGS_4(sve_asrd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
44
+DEF_HELPER_FLAGS_4(sve_asrd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
45
+DEF_HELPER_FLAGS_4(sve_asrd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
46
+
47
DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
48
DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
49
DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
50
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
51
index XXXXXXX..XXXXXXX 100644
52
--- a/target/arm/sve_helper.c
53
+++ b/target/arm/sve_helper.c
54
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_predtest)(void *vd, void *vg, uint32_t words)
55
return flags;
56
}
57
58
+/* Expand active predicate bits to bytes, for byte elements.
59
+ * for (i = 0; i < 256; ++i) {
60
+ * unsigned long m = 0;
61
+ * for (j = 0; j < 8; j++) {
62
+ * if ((i >> j) & 1) {
63
+ * m |= 0xfful << (j << 3);
64
+ * }
65
+ * }
66
+ * printf("0x%016lx,\n", m);
67
+ * }
68
+ */
69
+static inline uint64_t expand_pred_b(uint8_t byte)
70
+{
71
+ static const uint64_t word[256] = {
72
+ 0x0000000000000000, 0x00000000000000ff, 0x000000000000ff00,
73
+ 0x000000000000ffff, 0x0000000000ff0000, 0x0000000000ff00ff,
74
+ 0x0000000000ffff00, 0x0000000000ffffff, 0x00000000ff000000,
75
+ 0x00000000ff0000ff, 0x00000000ff00ff00, 0x00000000ff00ffff,
76
+ 0x00000000ffff0000, 0x00000000ffff00ff, 0x00000000ffffff00,
77
+ 0x00000000ffffffff, 0x000000ff00000000, 0x000000ff000000ff,
78
+ 0x000000ff0000ff00, 0x000000ff0000ffff, 0x000000ff00ff0000,
79
+ 0x000000ff00ff00ff, 0x000000ff00ffff00, 0x000000ff00ffffff,
80
+ 0x000000ffff000000, 0x000000ffff0000ff, 0x000000ffff00ff00,
81
+ 0x000000ffff00ffff, 0x000000ffffff0000, 0x000000ffffff00ff,
82
+ 0x000000ffffffff00, 0x000000ffffffffff, 0x0000ff0000000000,
83
+ 0x0000ff00000000ff, 0x0000ff000000ff00, 0x0000ff000000ffff,
84
+ 0x0000ff0000ff0000, 0x0000ff0000ff00ff, 0x0000ff0000ffff00,
85
+ 0x0000ff0000ffffff, 0x0000ff00ff000000, 0x0000ff00ff0000ff,
86
+ 0x0000ff00ff00ff00, 0x0000ff00ff00ffff, 0x0000ff00ffff0000,
87
+ 0x0000ff00ffff00ff, 0x0000ff00ffffff00, 0x0000ff00ffffffff,
88
+ 0x0000ffff00000000, 0x0000ffff000000ff, 0x0000ffff0000ff00,
89
+ 0x0000ffff0000ffff, 0x0000ffff00ff0000, 0x0000ffff00ff00ff,
90
+ 0x0000ffff00ffff00, 0x0000ffff00ffffff, 0x0000ffffff000000,
91
+ 0x0000ffffff0000ff, 0x0000ffffff00ff00, 0x0000ffffff00ffff,
92
+ 0x0000ffffffff0000, 0x0000ffffffff00ff, 0x0000ffffffffff00,
93
+ 0x0000ffffffffffff, 0x00ff000000000000, 0x00ff0000000000ff,
94
+ 0x00ff00000000ff00, 0x00ff00000000ffff, 0x00ff000000ff0000,
95
+ 0x00ff000000ff00ff, 0x00ff000000ffff00, 0x00ff000000ffffff,
96
+ 0x00ff0000ff000000, 0x00ff0000ff0000ff, 0x00ff0000ff00ff00,
97
+ 0x00ff0000ff00ffff, 0x00ff0000ffff0000, 0x00ff0000ffff00ff,
98
+ 0x00ff0000ffffff00, 0x00ff0000ffffffff, 0x00ff00ff00000000,
99
+ 0x00ff00ff000000ff, 0x00ff00ff0000ff00, 0x00ff00ff0000ffff,
100
+ 0x00ff00ff00ff0000, 0x00ff00ff00ff00ff, 0x00ff00ff00ffff00,
101
+ 0x00ff00ff00ffffff, 0x00ff00ffff000000, 0x00ff00ffff0000ff,
102
+ 0x00ff00ffff00ff00, 0x00ff00ffff00ffff, 0x00ff00ffffff0000,
103
+ 0x00ff00ffffff00ff, 0x00ff00ffffffff00, 0x00ff00ffffffffff,
104
+ 0x00ffff0000000000, 0x00ffff00000000ff, 0x00ffff000000ff00,
105
+ 0x00ffff000000ffff, 0x00ffff0000ff0000, 0x00ffff0000ff00ff,
106
+ 0x00ffff0000ffff00, 0x00ffff0000ffffff, 0x00ffff00ff000000,
107
+ 0x00ffff00ff0000ff, 0x00ffff00ff00ff00, 0x00ffff00ff00ffff,
108
+ 0x00ffff00ffff0000, 0x00ffff00ffff00ff, 0x00ffff00ffffff00,
109
+ 0x00ffff00ffffffff, 0x00ffffff00000000, 0x00ffffff000000ff,
110
+ 0x00ffffff0000ff00, 0x00ffffff0000ffff, 0x00ffffff00ff0000,
111
+ 0x00ffffff00ff00ff, 0x00ffffff00ffff00, 0x00ffffff00ffffff,
112
+ 0x00ffffffff000000, 0x00ffffffff0000ff, 0x00ffffffff00ff00,
113
+ 0x00ffffffff00ffff, 0x00ffffffffff0000, 0x00ffffffffff00ff,
114
+ 0x00ffffffffffff00, 0x00ffffffffffffff, 0xff00000000000000,
115
+ 0xff000000000000ff, 0xff0000000000ff00, 0xff0000000000ffff,
116
+ 0xff00000000ff0000, 0xff00000000ff00ff, 0xff00000000ffff00,
117
+ 0xff00000000ffffff, 0xff000000ff000000, 0xff000000ff0000ff,
118
+ 0xff000000ff00ff00, 0xff000000ff00ffff, 0xff000000ffff0000,
119
+ 0xff000000ffff00ff, 0xff000000ffffff00, 0xff000000ffffffff,
120
+ 0xff0000ff00000000, 0xff0000ff000000ff, 0xff0000ff0000ff00,
121
+ 0xff0000ff0000ffff, 0xff0000ff00ff0000, 0xff0000ff00ff00ff,
122
+ 0xff0000ff00ffff00, 0xff0000ff00ffffff, 0xff0000ffff000000,
123
+ 0xff0000ffff0000ff, 0xff0000ffff00ff00, 0xff0000ffff00ffff,
124
+ 0xff0000ffffff0000, 0xff0000ffffff00ff, 0xff0000ffffffff00,
125
+ 0xff0000ffffffffff, 0xff00ff0000000000, 0xff00ff00000000ff,
126
+ 0xff00ff000000ff00, 0xff00ff000000ffff, 0xff00ff0000ff0000,
127
+ 0xff00ff0000ff00ff, 0xff00ff0000ffff00, 0xff00ff0000ffffff,
128
+ 0xff00ff00ff000000, 0xff00ff00ff0000ff, 0xff00ff00ff00ff00,
129
+ 0xff00ff00ff00ffff, 0xff00ff00ffff0000, 0xff00ff00ffff00ff,
130
+ 0xff00ff00ffffff00, 0xff00ff00ffffffff, 0xff00ffff00000000,
131
+ 0xff00ffff000000ff, 0xff00ffff0000ff00, 0xff00ffff0000ffff,
132
+ 0xff00ffff00ff0000, 0xff00ffff00ff00ff, 0xff00ffff00ffff00,
133
+ 0xff00ffff00ffffff, 0xff00ffffff000000, 0xff00ffffff0000ff,
134
+ 0xff00ffffff00ff00, 0xff00ffffff00ffff, 0xff00ffffffff0000,
135
+ 0xff00ffffffff00ff, 0xff00ffffffffff00, 0xff00ffffffffffff,
136
+ 0xffff000000000000, 0xffff0000000000ff, 0xffff00000000ff00,
137
+ 0xffff00000000ffff, 0xffff000000ff0000, 0xffff000000ff00ff,
138
+ 0xffff000000ffff00, 0xffff000000ffffff, 0xffff0000ff000000,
139
+ 0xffff0000ff0000ff, 0xffff0000ff00ff00, 0xffff0000ff00ffff,
140
+ 0xffff0000ffff0000, 0xffff0000ffff00ff, 0xffff0000ffffff00,
141
+ 0xffff0000ffffffff, 0xffff00ff00000000, 0xffff00ff000000ff,
142
+ 0xffff00ff0000ff00, 0xffff00ff0000ffff, 0xffff00ff00ff0000,
143
+ 0xffff00ff00ff00ff, 0xffff00ff00ffff00, 0xffff00ff00ffffff,
144
+ 0xffff00ffff000000, 0xffff00ffff0000ff, 0xffff00ffff00ff00,
145
+ 0xffff00ffff00ffff, 0xffff00ffffff0000, 0xffff00ffffff00ff,
146
+ 0xffff00ffffffff00, 0xffff00ffffffffff, 0xffffff0000000000,
147
+ 0xffffff00000000ff, 0xffffff000000ff00, 0xffffff000000ffff,
148
+ 0xffffff0000ff0000, 0xffffff0000ff00ff, 0xffffff0000ffff00,
149
+ 0xffffff0000ffffff, 0xffffff00ff000000, 0xffffff00ff0000ff,
150
+ 0xffffff00ff00ff00, 0xffffff00ff00ffff, 0xffffff00ffff0000,
151
+ 0xffffff00ffff00ff, 0xffffff00ffffff00, 0xffffff00ffffffff,
152
+ 0xffffffff00000000, 0xffffffff000000ff, 0xffffffff0000ff00,
153
+ 0xffffffff0000ffff, 0xffffffff00ff0000, 0xffffffff00ff00ff,
154
+ 0xffffffff00ffff00, 0xffffffff00ffffff, 0xffffffffff000000,
155
+ 0xffffffffff0000ff, 0xffffffffff00ff00, 0xffffffffff00ffff,
156
+ 0xffffffffffff0000, 0xffffffffffff00ff, 0xffffffffffffff00,
157
+ 0xffffffffffffffff,
158
+ };
159
+ return word[byte];
160
+}
161
+
162
+/* Similarly for half-word elements.
163
+ * for (i = 0; i < 256; ++i) {
164
+ * unsigned long m = 0;
165
+ * if (i & 0xaa) {
166
+ * continue;
167
+ * }
168
+ * for (j = 0; j < 8; j += 2) {
169
+ * if ((i >> j) & 1) {
170
+ * m |= 0xfffful << (j << 3);
171
+ * }
172
+ * }
173
+ * printf("[0x%x] = 0x%016lx,\n", i, m);
174
+ * }
175
+ */
176
+static inline uint64_t expand_pred_h(uint8_t byte)
177
+{
178
+ static const uint64_t word[] = {
179
+ [0x01] = 0x000000000000ffff, [0x04] = 0x00000000ffff0000,
180
+ [0x05] = 0x00000000ffffffff, [0x10] = 0x0000ffff00000000,
181
+ [0x11] = 0x0000ffff0000ffff, [0x14] = 0x0000ffffffff0000,
182
+ [0x15] = 0x0000ffffffffffff, [0x40] = 0xffff000000000000,
183
+ [0x41] = 0xffff00000000ffff, [0x44] = 0xffff0000ffff0000,
184
+ [0x45] = 0xffff0000ffffffff, [0x50] = 0xffffffff00000000,
185
+ [0x51] = 0xffffffff0000ffff, [0x54] = 0xffffffffffff0000,
186
+ [0x55] = 0xffffffffffffffff,
187
+ };
188
+ return word[byte & 0x55];
189
+}
190
+
191
+/* Similarly for single word elements. */
192
+static inline uint64_t expand_pred_s(uint8_t byte)
193
+{
194
+ static const uint64_t word[] = {
195
+ [0x01] = 0x00000000ffffffffull,
196
+ [0x10] = 0xffffffff00000000ull,
197
+ [0x11] = 0xffffffffffffffffull,
198
+ };
199
+ return word[byte & 0x11];
200
+}
201
+
202
#define LOGICAL_PPPP(NAME, FUNC) \
203
void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
204
{ \
205
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sve_pnext)(void *vd, void *vg, uint32_t pred_desc)
206
207
return flags;
208
}
209
+
210
+/* Store zero into every active element of Zd. We will use this for two
211
+ * and three-operand predicated instructions for which logic dictates a
212
+ * zero result. In particular, logical shift by element size, which is
213
+ * otherwise undefined on the host.
214
+ *
215
+ * For element sizes smaller than uint64_t, we use tables to expand
216
+ * the N bits of the controlling predicate to a byte mask, and clear
217
+ * those bytes.
218
+ */
219
+void HELPER(sve_clr_b)(void *vd, void *vg, uint32_t desc)
220
+{
221
+ intptr_t i, opr_sz = simd_oprsz(desc) / 8;
222
+ uint64_t *d = vd;
223
+ uint8_t *pg = vg;
224
+ for (i = 0; i < opr_sz; i += 1) {
225
+ d[i] &= ~expand_pred_b(pg[H1(i)]);
226
+ }
227
+}
228
+
229
+void HELPER(sve_clr_h)(void *vd, void *vg, uint32_t desc)
230
+{
231
+ intptr_t i, opr_sz = simd_oprsz(desc) / 8;
232
+ uint64_t *d = vd;
233
+ uint8_t *pg = vg;
234
+ for (i = 0; i < opr_sz; i += 1) {
235
+ d[i] &= ~expand_pred_h(pg[H1(i)]);
236
+ }
237
+}
238
+
239
+void HELPER(sve_clr_s)(void *vd, void *vg, uint32_t desc)
240
+{
241
+ intptr_t i, opr_sz = simd_oprsz(desc) / 8;
242
+ uint64_t *d = vd;
243
+ uint8_t *pg = vg;
244
+ for (i = 0; i < opr_sz; i += 1) {
245
+ d[i] &= ~expand_pred_s(pg[H1(i)]);
246
+ }
247
+}
248
+
249
+void HELPER(sve_clr_d)(void *vd, void *vg, uint32_t desc)
250
+{
251
+ intptr_t i, opr_sz = simd_oprsz(desc) / 8;
252
+ uint64_t *d = vd;
253
+ uint8_t *pg = vg;
254
+ for (i = 0; i < opr_sz; i += 1) {
255
+ if (pg[H1(i)] & 1) {
256
+ d[i] = 0;
257
+ }
258
+ }
259
+}
260
+
261
+/* Three-operand expander, immediate operand, controlled by a predicate.
262
+ */
263
+#define DO_ZPZI(NAME, TYPE, H, OP) \
264
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc) \
265
+{ \
266
+ intptr_t i, opr_sz = simd_oprsz(desc); \
267
+ TYPE imm = simd_data(desc); \
268
+ for (i = 0; i < opr_sz; ) { \
269
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
270
+ do { \
271
+ if (pg & 1) { \
272
+ TYPE nn = *(TYPE *)(vn + H(i)); \
273
+ *(TYPE *)(vd + H(i)) = OP(nn, imm); \
274
+ } \
275
+ i += sizeof(TYPE), pg >>= sizeof(TYPE); \
276
+ } while (i & 15); \
277
+ } \
278
+}
279
+
280
+/* Similarly, specialized for 64-bit operands. */
281
+#define DO_ZPZI_D(NAME, TYPE, OP) \
282
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc) \
283
+{ \
284
+ intptr_t i, opr_sz = simd_oprsz(desc) / 8; \
285
+ TYPE *d = vd, *n = vn; \
286
+ TYPE imm = simd_data(desc); \
287
+ uint8_t *pg = vg; \
288
+ for (i = 0; i < opr_sz; i += 1) { \
289
+ if (pg[H1(i)] & 1) { \
290
+ TYPE nn = n[i]; \
291
+ d[i] = OP(nn, imm); \
292
+ } \
293
+ } \
294
+}
295
+
296
+#define DO_SHR(N, M) (N >> M)
297
+#define DO_SHL(N, M) (N << M)
298
+
299
+/* Arithmetic shift right for division. This rounds negative numbers
300
+ toward zero as per signed division. Therefore before shifting,
301
+ when N is negative, add 2**M-1. */
302
+#define DO_ASRD(N, M) ((N + (N < 0 ? ((__typeof(N))1 << M) - 1 : 0)) >> M)
303
+
304
+DO_ZPZI(sve_asr_zpzi_b, int8_t, H1, DO_SHR)
305
+DO_ZPZI(sve_asr_zpzi_h, int16_t, H1_2, DO_SHR)
306
+DO_ZPZI(sve_asr_zpzi_s, int32_t, H1_4, DO_SHR)
307
+DO_ZPZI_D(sve_asr_zpzi_d, int64_t, DO_SHR)
308
+
309
+DO_ZPZI(sve_lsr_zpzi_b, uint8_t, H1, DO_SHR)
310
+DO_ZPZI(sve_lsr_zpzi_h, uint16_t, H1_2, DO_SHR)
311
+DO_ZPZI(sve_lsr_zpzi_s, uint32_t, H1_4, DO_SHR)
312
+DO_ZPZI_D(sve_lsr_zpzi_d, uint64_t, DO_SHR)
313
+
314
+DO_ZPZI(sve_lsl_zpzi_b, uint8_t, H1, DO_SHL)
315
+DO_ZPZI(sve_lsl_zpzi_h, uint16_t, H1_2, DO_SHL)
316
+DO_ZPZI(sve_lsl_zpzi_s, uint32_t, H1_4, DO_SHL)
317
+DO_ZPZI_D(sve_lsl_zpzi_d, uint64_t, DO_SHL)
318
+
319
+DO_ZPZI(sve_asrd_b, int8_t, H1, DO_ASRD)
320
+DO_ZPZI(sve_asrd_h, int16_t, H1_2, DO_ASRD)
321
+DO_ZPZI(sve_asrd_s, int32_t, H1_4, DO_ASRD)
322
+DO_ZPZI_D(sve_asrd_d, int64_t, DO_ASRD)
323
+
324
+#undef DO_SHR
325
+#undef DO_SHL
326
+#undef DO_ASRD
327
+#undef DO_ZPZI
328
+#undef DO_ZPZI_D
329
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
330
index XXXXXXX..XXXXXXX 100644
331
--- a/target/arm/translate-sve.c
332
+++ b/target/arm/translate-sve.c
333
@@ -XXX,XX +XXX,XX @@
334
#include "trace-tcg.h"
335
#include "translate-a64.h"
336
337
+/*
338
+ * Helpers for extracting complex instruction fields.
339
+ */
340
+
341
+/* See e.g. ASR (immediate, predicated).
342
+ * Returns -1 for unallocated encoding; diagnose later.
343
+ */
344
+static int tszimm_esz(int x)
+{
+    x >>= 3;  /* discard imm3 */
+    return 31 - clz32(x);
+}
+
+static int tszimm_shr(int x)
+{
+    return (16 << tszimm_esz(x)) - x;
+}
+
+/* See e.g. LSL (immediate, predicated).  */
+static int tszimm_shl(int x)
+{
+    return x - (8 << tszimm_esz(x));
+}
+
 /*
  * Include the generated decoder.
  */
@@ -XXX,XX +XXX,XX @@ static bool trans_SADDV(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
 
 #undef DO_VPZ
 
+/*
+ *** SVE Shift by Immediate - Predicated Group
+ */
+
+/* Store zero into every active element of Zd.  We will use this for two
+ * and three-operand predicated instructions for which logic dictates a
+ * zero result.
+ */
+static bool do_clr_zp(DisasContext *s, int rd, int pg, int esz)
+{
+    static gen_helper_gvec_2 * const fns[4] = {
+        gen_helper_sve_clr_b, gen_helper_sve_clr_h,
+        gen_helper_sve_clr_s, gen_helper_sve_clr_d,
+    };
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_2_ool(vec_full_reg_offset(s, rd),
+                           pred_full_reg_offset(s, pg),
+                           vsz, vsz, 0, fns[esz]);
+    }
+    return true;
+}
+
+static bool do_zpzi_ool(DisasContext *s, arg_rpri_esz *a,
+                        gen_helper_gvec_3 *fn)
+{
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           pred_full_reg_offset(s, a->pg),
+                           vsz, vsz, a->imm, fn);
+    }
+    return true;
+}
+
+static bool trans_ASR_zpzi(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_asr_zpzi_b, gen_helper_sve_asr_zpzi_h,
+        gen_helper_sve_asr_zpzi_s, gen_helper_sve_asr_zpzi_d,
+    };
+    if (a->esz < 0) {
+        /* Invalid tsz encoding -- see tszimm_esz. */
+        return false;
+    }
+    /* Shift by element size is architecturally valid.  For
+       arithmetic right-shift, it's the same as by one less. */
+    a->imm = MIN(a->imm, (8 << a->esz) - 1);
+    return do_zpzi_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_LSR_zpzi(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_lsr_zpzi_b, gen_helper_sve_lsr_zpzi_h,
+        gen_helper_sve_lsr_zpzi_s, gen_helper_sve_lsr_zpzi_d,
+    };
+    if (a->esz < 0) {
+        return false;
+    }
+    /* Shift by element size is architecturally valid.
+       For logical shifts, it is a zeroing operation. */
+    if (a->imm >= (8 << a->esz)) {
+        return do_clr_zp(s, a->rd, a->pg, a->esz);
+    } else {
+        return do_zpzi_ool(s, a, fns[a->esz]);
+    }
+}
+
+static bool trans_LSL_zpzi(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_lsl_zpzi_b, gen_helper_sve_lsl_zpzi_h,
+        gen_helper_sve_lsl_zpzi_s, gen_helper_sve_lsl_zpzi_d,
+    };
+    if (a->esz < 0) {
+        return false;
+    }
+    /* Shift by element size is architecturally valid.
+       For logical shifts, it is a zeroing operation. */
+    if (a->imm >= (8 << a->esz)) {
+        return do_clr_zp(s, a->rd, a->pg, a->esz);
+    } else {
+        return do_zpzi_ool(s, a, fns[a->esz]);
+    }
+}
+
+static bool trans_ASRD(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_asrd_b, gen_helper_sve_asrd_h,
+        gen_helper_sve_asrd_s, gen_helper_sve_asrd_d,
+    };
+    if (a->esz < 0) {
+        return false;
+    }
+    /* Shift by element size is architecturally valid.  For arithmetic
+       right shift for division, it is a zeroing operation. */
+    if (a->imm >= (8 << a->esz)) {
+        return do_clr_zp(s, a->rd, a->pg, a->esz);
+    } else {
+        return do_zpzi_ool(s, a, fns[a->esz]);
+    }
+}
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@
 ###########################################################################
 # Named fields.  These are primarily for disjoint fields.
 
+%imm6_22_5      22:1 5:5
 %imm9_16_10     16:s6 10:3
 
+# A combination of tsz:imm3 -- extract esize.
+%tszimm_esz     22:2 5:5 !function=tszimm_esz
+# A combination of tsz:imm3 -- extract (2 * esize) - (tsz:imm3)
+%tszimm_shr     22:2 5:5 !function=tszimm_shr
+# A combination of tsz:imm3 -- extract (tsz:imm3) - esize
+%tszimm_shl     22:2 5:5 !function=tszimm_shl
+
 # Either a copy of rd (at bit 0), or a different source
 # as propagated via the MOVPRFX instruction.
 %reg_movprfx    0:5
@@ -XXX,XX +XXX,XX @@
 &rpr_esz        rd pg rn esz
 &rprr_s         rd pg rn rm s
 &rprr_esz       rd pg rn rm esz
+&rpri_esz       rd pg rn imm esz
 
 ###########################################################################
 # Named instruction formats.  These are generally used to
@@ -XXX,XX +XXX,XX @@
 # One register operand, with governing predicate, vector element size
 @rd_pg_rn       ........ esz:2 ... ... ... pg:3 rn:5 rd:5       &rpr_esz
 
+# Two register operand, one immediate operand, with predicate,
+#   element size encoded as TSZHL.  User must fill in imm.
+@rdn_pg_tszimm  ........ .. ... ... ... pg:3 ..... rd:5 \
+               &rpri_esz rn=%reg_movprfx esz=%tszimm_esz
+
 # Basic Load/Store with 9-bit immediate offset
 @pd_rn_i9       ........ ........ ...... rn:5 . rd:4 \
                &rri imm=%imm9_16_10
@@ -XXX,XX +XXX,XX @@ UMAXV           00000100 .. 001 001 001 ... ..... .....         @rd_pg_rn
 SMINV           00000100 .. 001 010 001 ... ..... .....         @rd_pg_rn
 UMINV           00000100 .. 001 011 001 ... ..... .....         @rd_pg_rn
 
+### SVE Shift by Immediate - Predicated Group
+
+# SVE bitwise shift by immediate (predicated)
+ASR_zpzi        00000100 .. 000 000 100 ... .. ... ..... \
+               @rdn_pg_tszimm imm=%tszimm_shr
+LSR_zpzi        00000100 .. 000 001 100 ... .. ... ..... \
+               @rdn_pg_tszimm imm=%tszimm_shr
+LSL_zpzi        00000100 .. 000 011 100 ... .. ... ..... \
+               @rdn_pg_tszimm imm=%tszimm_shl
+ASRD            00000100 .. 000 100 100 ... .. ... ..... \
+               @rdn_pg_tszimm imm=%tszimm_shr
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
--
2.17.0
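
The tsz:imm3 field used above packs both the element size and the shift count into one immediate: the position of the most significant set bit above imm3 selects the element size, and the remaining low bits encode the shift. As a rough standalone sketch of the decode (not part of the series; the clz32 below is a stand-in for QEMU's host-utils version, and main() exists only for the demonstration):

#include <stdint.h>
#include <stdio.h>

static int clz32(uint32_t x)    /* stand-in for qemu/host-utils.h */
{
    return x ? __builtin_clz(x) : 32;
}

static int tszimm_esz(int x) { x >>= 3; return 31 - clz32(x); }
static int tszimm_shr(int x) { return (16 << tszimm_esz(x)) - x; }
static int tszimm_shl(int x) { return x - (8 << tszimm_esz(x)); }

int main(void)
{
    int x = 0x0c;   /* tsz:imm3 = 0b0001100 */
    /* After discarding imm3, tsz = 1, so esz = 0 (byte elements);
     * right-shift amount = 16 - 12 = 4, left-shift amount = 12 - 8 = 4. */
    printf("esz=%d shr=%d shl=%d\n",
           tszimm_esz(x), tszimm_shr(x), tszimm_shl(x));
    /* An all-zero tsz has no set bit and decodes to esz = -1,
     * which the trans_*_zpzi functions above reject. */
    printf("esz(0)=%d\n", tszimm_esz(0));
    return 0;
}
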
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    | 27 +++++++++++++++++++++++++++
 target/arm/sve_helper.c    | 25 +++++++++++++++++++++++++
 target/arm/translate-sve.c |  4 ++++
 target/arm/sve.decode      |  8 ++++++++
 4 files changed, 64 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_udiv_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_udiv_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzz_d, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(sve_orv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_D(sve_sdiv_zpzz_d, int64_t, DO_DIV)
 DO_ZPZZ(sve_udiv_zpzz_s, uint32_t, H1_4, DO_DIV)
 DO_ZPZZ_D(sve_udiv_zpzz_d, uint64_t, DO_DIV)
 
+/* Note that all bits of the shift are significant
+   and not modulo the element size.  */
+#define DO_ASR(N, M)  (N >> MIN(M, sizeof(N) * 8 - 1))
+#define DO_LSR(N, M)  (M < sizeof(N) * 8 ? N >> M : 0)
+#define DO_LSL(N, M)  (M < sizeof(N) * 8 ? N << M : 0)
+
+DO_ZPZZ(sve_asr_zpzz_b, int8_t, H1, DO_ASR)
+DO_ZPZZ(sve_lsr_zpzz_b, uint8_t, H1, DO_LSR)
+DO_ZPZZ(sve_lsl_zpzz_b, uint8_t, H1, DO_LSL)
+
+DO_ZPZZ(sve_asr_zpzz_h, int16_t, H1_2, DO_ASR)
+DO_ZPZZ(sve_lsr_zpzz_h, uint16_t, H1_2, DO_LSR)
+DO_ZPZZ(sve_lsl_zpzz_h, uint16_t, H1_2, DO_LSL)
+
+DO_ZPZZ(sve_asr_zpzz_s, int32_t, H1_4, DO_ASR)
+DO_ZPZZ(sve_lsr_zpzz_s, uint32_t, H1_4, DO_LSR)
+DO_ZPZZ(sve_lsl_zpzz_s, uint32_t, H1_4, DO_LSL)
+
+DO_ZPZZ_D(sve_asr_zpzz_d, int64_t, DO_ASR)
+DO_ZPZZ_D(sve_lsr_zpzz_d, uint64_t, DO_LSR)
+DO_ZPZZ_D(sve_lsl_zpzz_d, uint64_t, DO_LSL)
+
 #undef DO_ZPZZ
 #undef DO_ZPZZ_D
 
@@ -XXX,XX +XXX,XX @@ DO_VPZ_D(sve_uminv_d, uint64_t, uint64_t, -1, DO_MIN)
 #undef DO_ABD
 #undef DO_MUL
 #undef DO_DIV
+#undef DO_ASR
+#undef DO_LSR
+#undef DO_LSL
 
 /* Similar to the ARM LastActiveElement pseudocode function, except the
    result is multiplied by the element size.  This includes the not found
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZZ(MUL, mul)
 DO_ZPZZ(SMULH, smulh)
 DO_ZPZZ(UMULH, umulh)
 
+DO_ZPZZ(ASR, asr)
+DO_ZPZZ(LSR, lsr)
+DO_ZPZZ(LSL, lsl)
+
 static bool trans_SDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 {
     static gen_helper_gvec_4 * const fns[4] = {
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ LSL_zpzi        00000100 .. 000 011 100 ... .. ... ..... \
 ASRD            00000100 .. 000 100 100 ... .. ... ..... \
                @rdn_pg_tszimm imm=%tszimm_shr
 
+# SVE bitwise shift by vector (predicated)
+ASR_zpzz        00000100 .. 010 000 100 ... ..... .....         @rdn_pg_rm
+LSR_zpzz        00000100 .. 010 001 100 ... ..... .....         @rdn_pg_rm
+LSL_zpzz        00000100 .. 010 011 100 ... ..... .....         @rdn_pg_rm
+ASR_zpzz        00000100 .. 010 100 100 ... ..... .....         @rdm_pg_rn # ASRR
+LSR_zpzz        00000100 .. 010 101 100 ... ..... .....         @rdm_pg_rn # LSRR
+LSL_zpzz        00000100 .. 010 111 100 ... ..... .....         @rdm_pg_rn # LSLR
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
--
2.17.0
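
One detail of these helpers worth calling out: as the comment in sve_helper.c says, every bit of the per-element shift count is significant, so an out-of-range count clamps (ASR) or zeroes the result (LSR/LSL) rather than wrapping modulo the element size. A minimal sketch of that behaviour, reusing copies of the three macros (the surrounding program is illustration only, not QEMU code):

#include <stdint.h>
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Copies of the patch's macros; N and M are ordinary variables here. */
#define DO_ASR(N, M)  (N >> MIN(M, sizeof(N) * 8 - 1))
#define DO_LSR(N, M)  (M < sizeof(N) * 8 ? N >> M : 0)
#define DO_LSL(N, M)  (M < sizeof(N) * 8 ? N << M : 0)

int main(void)
{
    int8_t   n = -96;     /* 0xa0 */
    uint8_t  u = 0xa0;
    uint64_t m = 200;     /* far larger than the 8-bit element */

    /* ASR clamps the count to 7, leaving only copies of the sign bit
     * (prints -1); LSR and LSL go straight to zero. */
    printf("asr=%d lsr=%u lsl=%u\n",
           (int)DO_ASR(n, m), (unsigned)DO_LSR(u, m), (unsigned)DO_LSL(u, m));
    return 0;
}
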
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    | 21 +++++++++++++++++++++
 target/arm/sve_helper.c    | 35 +++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 24 ++++++++++++++++++++++++
 target/arm/sve.decode      |  6 ++++++
 4 files changed, 86 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_lsl_zpzz_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(sve_lsl_zpzz_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(sve_asr_zpzw_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzw_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_asr_zpzw_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsr_zpzw_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzw_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsr_zpzw_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sve_lsl_zpzw_b, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzw_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_lsl_zpzw_s, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(sve_orv_b, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_h, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_orv_s, TCG_CALL_NO_RWG, i64, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZZ_D(sve_lsl_zpzz_d, uint64_t, DO_LSL)
 #undef DO_ZPZZ
 #undef DO_ZPZZ_D
 
+/* Three-operand expander, controlled by a predicate, in which the
+ * third operand is "wide".  That is, for D = N op M, the same 64-bit
+ * value of M is used with all of the narrower values of N.
+ */
+#define DO_ZPZW(NAME, TYPE, TYPEW, H, OP)                                 \
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc) \
+{                                                                         \
+    intptr_t i, opr_sz = simd_oprsz(desc);                                \
+    for (i = 0; i < opr_sz; ) {                                           \
+        uint8_t pg = *(uint8_t *)(vg + H1(i >> 3));                       \
+        TYPEW mm = *(TYPEW *)(vm + i);                                    \
+        do {                                                              \
+            if (pg & 1) {                                                 \
+                TYPE nn = *(TYPE *)(vn + H(i));                           \
+                *(TYPE *)(vd + H(i)) = OP(nn, mm);                        \
+            }                                                             \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);                       \
+        } while (i & 7);                                                  \
+    }                                                                     \
+}
+
+DO_ZPZW(sve_asr_zpzw_b, int8_t, uint64_t, H1, DO_ASR)
+DO_ZPZW(sve_lsr_zpzw_b, uint8_t, uint64_t, H1, DO_LSR)
+DO_ZPZW(sve_lsl_zpzw_b, uint8_t, uint64_t, H1, DO_LSL)
+
+DO_ZPZW(sve_asr_zpzw_h, int16_t, uint64_t, H1_2, DO_ASR)
+DO_ZPZW(sve_lsr_zpzw_h, uint16_t, uint64_t, H1_2, DO_LSR)
+DO_ZPZW(sve_lsl_zpzw_h, uint16_t, uint64_t, H1_2, DO_LSL)
+
+DO_ZPZW(sve_asr_zpzw_s, int32_t, uint64_t, H1_4, DO_ASR)
+DO_ZPZW(sve_lsr_zpzw_s, uint32_t, uint64_t, H1_4, DO_LSR)
+DO_ZPZW(sve_lsl_zpzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
+
+#undef DO_ZPZW
+
 /* Two-operand reduction expander, controlled by a predicate.
  * The difference between TYPERED and TYPERET has to do with
  * sign-extension.  E.g. for SMAX, TYPERED must be signed,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_ASRD(DisasContext *s, arg_rpri_esz *a, uint32_t insn)
     }
 }
 
+/*
+ *** SVE Bitwise Shift - Predicated Group
+ */
+
+#define DO_ZPZW(NAME, name)                                               \
+static bool trans_##NAME##_zpzw(DisasContext *s, arg_rprr_esz *a,         \
+                                uint32_t insn)                            \
+{                                                                         \
+    static gen_helper_gvec_4 * const fns[3] = {                           \
+        gen_helper_sve_##name##_zpzw_b, gen_helper_sve_##name##_zpzw_h,   \
+        gen_helper_sve_##name##_zpzw_s,                                   \
+    };                                                                    \
+    if (a->esz < 0 || a->esz >= 3) {                                      \
+        return false;                                                     \
+    }                                                                     \
+    return do_zpzz_ool(s, a, fns[a->esz]);                                \
+}
+
+DO_ZPZW(ASR, asr)
+DO_ZPZW(LSR, lsr)
+DO_ZPZW(LSL, lsl)
+
+#undef DO_ZPZW
+
 /*
  *** SVE Predicate Logical Operations Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ ASR_zpzz        00000100 .. 010 100 100 ... ..... ..... @rdm_pg_rn # ASRR
 LSR_zpzz        00000100 .. 010 101 100 ... ..... .....         @rdm_pg_rn # LSRR
 LSL_zpzz        00000100 .. 010 111 100 ... ..... .....         @rdm_pg_rn # LSLR
 
+# SVE bitwise shift by wide elements (predicated)
+# Note these require size != 3.
+ASR_zpzw        00000100 .. 011 000 100 ... ..... .....         @rdn_pg_rm
+LSR_zpzw        00000100 .. 011 001 100 ... ..... .....         @rdn_pg_rm
+LSL_zpzw        00000100 .. 011 011 100 ... ..... .....         @rdn_pg_rm
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
--
2.17.0
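
The grouping in DO_ZPZW follows from the predicate walk: the inner do/while consumes one 8-byte slot at a time, so each 64-bit element of Zm is fetched once and applied to every narrower element of Zn that shares its slot. A standalone sketch of just that pairing, assuming byte elements, an all-true predicate, and a little-endian host (so the H1() swizzle drops out); it is an illustration, not QEMU code:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t  d[16], n[16];
    uint64_t m[2] = { 1, 4 };   /* one shift count per 64-bit group */
    int i;

    for (i = 0; i < 16; i++) {
        n[i] = 0x80;
    }
    /* Every byte in a group is shifted by that group's wide element,
     * with the same "all bits significant" rule as DO_LSR. */
    for (i = 0; i < 16; i++) {
        uint64_t mm = m[i / 8];
        d[i] = mm < 8 ? n[i] >> mm : 0;
    }
    printf("d[0]=0x%02x d[8]=0x%02x\n", d[0], d[8]);   /* 0x40, 0x08 */
    return 0;
}
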
From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180516223007.10256-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    |  60 ++++++++++++++++++
 target/arm/sve_helper.c    | 127 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 113 +++++++++++++++++++++++++++++++++
 target/arm/sve.decode      |  23 +++++++
 4 files changed, 323 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(sve_asrd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_asrd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_asrd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_cls_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cls_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_clz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_clz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_clz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_clz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnt_zpz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_cnot_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnot_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnot_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_cnot_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fabs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fabs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fabs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_fneg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fneg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_fneg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_not_zpz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_not_zpz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_not_zpz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_not_zpz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_sxtb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_sxtb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_sxtb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_uxtb_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxtb_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxtb_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_sxth_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_sxth_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_uxth_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxth_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_sxtw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uxtw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_abs_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_abs_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_abs_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_abs_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_neg_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_neg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_neg_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_neg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ZPZW(sve_lsl_zpzw_s, uint32_t, uint64_t, H1_4, DO_LSL)
 
 #undef DO_ZPZW
 
+/* Fully general two-operand expander, controlled by a predicate.
+ */
+#define DO_ZPZ(NAME, TYPE, H, OP)                                   \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)      \
+{                                                                   \
+    intptr_t i, opr_sz = simd_oprsz(desc);                          \
+    for (i = 0; i < opr_sz; ) {                                     \
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));             \
+        do {                                                        \
+            if (pg & 1) {                                           \
+                TYPE nn = *(TYPE *)(vn + H(i));                     \
+                *(TYPE *)(vd + H(i)) = OP(nn);                      \
+            }                                                       \
+            i += sizeof(TYPE), pg >>= sizeof(TYPE);                 \
+        } while (i & 15);                                           \
+    }                                                               \
+}
+
+/* Similarly, specialized for 64-bit operands.  */
+#define DO_ZPZ_D(NAME, TYPE, OP)                                    \
+void HELPER(NAME)(void *vd, void *vn, void *vg, uint32_t desc)      \
+{                                                                   \
+    intptr_t i, opr_sz = simd_oprsz(desc) / 8;                      \
+    TYPE *d = vd, *n = vn;                                          \
+    uint8_t *pg = vg;                                               \
+    for (i = 0; i < opr_sz; i += 1) {                               \
+        if (pg[H1(i)] & 1) {                                        \
+            TYPE nn = n[i];                                         \
+            d[i] = OP(nn);                                          \
+        }                                                           \
+    }                                                               \
+}
+
+#define DO_CLS_B(N)   (clrsb32(N) - 24)
+#define DO_CLS_H(N)   (clrsb32(N) - 16)
+
+DO_ZPZ(sve_cls_b, int8_t, H1, DO_CLS_B)
+DO_ZPZ(sve_cls_h, int16_t, H1_2, DO_CLS_H)
+DO_ZPZ(sve_cls_s, int32_t, H1_4, clrsb32)
+DO_ZPZ_D(sve_cls_d, int64_t, clrsb64)
+
+#define DO_CLZ_B(N)   (clz32(N) - 24)
+#define DO_CLZ_H(N)   (clz32(N) - 16)
+
+DO_ZPZ(sve_clz_b, uint8_t, H1, DO_CLZ_B)
+DO_ZPZ(sve_clz_h, uint16_t, H1_2, DO_CLZ_H)
+DO_ZPZ(sve_clz_s, uint32_t, H1_4, clz32)
+DO_ZPZ_D(sve_clz_d, uint64_t, clz64)
+
+DO_ZPZ(sve_cnt_zpz_b, uint8_t, H1, ctpop8)
+DO_ZPZ(sve_cnt_zpz_h, uint16_t, H1_2, ctpop16)
+DO_ZPZ(sve_cnt_zpz_s, uint32_t, H1_4, ctpop32)
+DO_ZPZ_D(sve_cnt_zpz_d, uint64_t, ctpop64)
+
+#define DO_CNOT(N)    (N == 0)
+
+DO_ZPZ(sve_cnot_b, uint8_t, H1, DO_CNOT)
+DO_ZPZ(sve_cnot_h, uint16_t, H1_2, DO_CNOT)
+DO_ZPZ(sve_cnot_s, uint32_t, H1_4, DO_CNOT)
+DO_ZPZ_D(sve_cnot_d, uint64_t, DO_CNOT)
+
+#define DO_FABS(N)    (N & ((__typeof(N))-1 >> 1))
+
+DO_ZPZ(sve_fabs_h, uint16_t, H1_2, DO_FABS)
+DO_ZPZ(sve_fabs_s, uint32_t, H1_4, DO_FABS)
+DO_ZPZ_D(sve_fabs_d, uint64_t, DO_FABS)
+
+#define DO_FNEG(N)    (N ^ ~((__typeof(N))-1 >> 1))
+
+DO_ZPZ(sve_fneg_h, uint16_t, H1_2, DO_FNEG)
+DO_ZPZ(sve_fneg_s, uint32_t, H1_4, DO_FNEG)
+DO_ZPZ_D(sve_fneg_d, uint64_t, DO_FNEG)
+
+#define DO_NOT(N)     (~N)
+
+DO_ZPZ(sve_not_zpz_b, uint8_t, H1, DO_NOT)
+DO_ZPZ(sve_not_zpz_h, uint16_t, H1_2, DO_NOT)
+DO_ZPZ(sve_not_zpz_s, uint32_t, H1_4, DO_NOT)
+DO_ZPZ_D(sve_not_zpz_d, uint64_t, DO_NOT)
+
+#define DO_SXTB(N)    ((int8_t)N)
+#define DO_SXTH(N)    ((int16_t)N)
+#define DO_SXTS(N)    ((int32_t)N)
+#define DO_UXTB(N)    ((uint8_t)N)
+#define DO_UXTH(N)    ((uint16_t)N)
+#define DO_UXTS(N)    ((uint32_t)N)
+
+DO_ZPZ(sve_sxtb_h, uint16_t, H1_2, DO_SXTB)
+DO_ZPZ(sve_sxtb_s, uint32_t, H1_4, DO_SXTB)
+DO_ZPZ(sve_sxth_s, uint32_t, H1_4, DO_SXTH)
+DO_ZPZ_D(sve_sxtb_d, uint64_t, DO_SXTB)
+DO_ZPZ_D(sve_sxth_d, uint64_t, DO_SXTH)
+DO_ZPZ_D(sve_sxtw_d, uint64_t, DO_SXTS)
+
+DO_ZPZ(sve_uxtb_h, uint16_t, H1_2, DO_UXTB)
+DO_ZPZ(sve_uxtb_s, uint32_t, H1_4, DO_UXTB)
+DO_ZPZ(sve_uxth_s, uint32_t, H1_4, DO_UXTH)
+DO_ZPZ_D(sve_uxtb_d, uint64_t, DO_UXTB)
+DO_ZPZ_D(sve_uxth_d, uint64_t, DO_UXTH)
+DO_ZPZ_D(sve_uxtw_d, uint64_t, DO_UXTS)
+
+#define DO_ABS(N)     (N < 0 ? -N : N)
+
+DO_ZPZ(sve_abs_b, int8_t, H1, DO_ABS)
+DO_ZPZ(sve_abs_h, int16_t, H1_2, DO_ABS)
+DO_ZPZ(sve_abs_s, int32_t, H1_4, DO_ABS)
+DO_ZPZ_D(sve_abs_d, int64_t, DO_ABS)
+
+#define DO_NEG(N)     (-N)
+
+DO_ZPZ(sve_neg_b, uint8_t, H1, DO_NEG)
+DO_ZPZ(sve_neg_h, uint16_t, H1_2, DO_NEG)
+DO_ZPZ(sve_neg_s, uint32_t, H1_4, DO_NEG)
+DO_ZPZ_D(sve_neg_d, uint64_t, DO_NEG)
+
+#undef DO_CLS_B
+#undef DO_CLS_H
+#undef DO_CLZ_B
+#undef DO_CLZ_H
+#undef DO_CNOT
+#undef DO_FABS
+#undef DO_FNEG
+#undef DO_ABS
+#undef DO_NEG
+#undef DO_ZPZ
+#undef DO_ZPZ_D
+
 /* Two-operand reduction expander, controlled by a predicate.
  * The difference between TYPERED and TYPERET has to do with
  * sign-extension.  E.g. for SMAX, TYPERED must be signed,
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_UDIV_zpzz(DisasContext *s, arg_rprr_esz *a, uint32_t insn)
 
 #undef DO_ZPZZ
 
+/*
+ *** SVE Integer Arithmetic - Unary Predicated Group
+ */
+
+static bool do_zpz_ool(DisasContext *s, arg_rpr_esz *a, gen_helper_gvec_3 *fn)
+{
+    if (fn == NULL) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        unsigned vsz = vec_full_reg_size(s);
+        tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           pred_full_reg_offset(s, a->pg),
+                           vsz, vsz, 0, fn);
+    }
+    return true;
+}
+
+#define DO_ZPZ(NAME, name)                                          \
+static bool trans_##NAME(DisasContext *s, arg_rpr_esz *a, uint32_t insn) \
+{                                                                   \
+    static gen_helper_gvec_3 * const fns[4] = {                     \
+        gen_helper_sve_##name##_b, gen_helper_sve_##name##_h,       \
+        gen_helper_sve_##name##_s, gen_helper_sve_##name##_d,       \
+    };                                                              \
+    return do_zpz_ool(s, a, fns[a->esz]);                           \
+}
+
+DO_ZPZ(CLS, cls)
+DO_ZPZ(CLZ, clz)
+DO_ZPZ(CNT_zpz, cnt_zpz)
+DO_ZPZ(CNOT, cnot)
+DO_ZPZ(NOT_zpz, not_zpz)
+DO_ZPZ(ABS, abs)
+DO_ZPZ(NEG, neg)
+
+static bool trans_FABS(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_fabs_h,
+        gen_helper_sve_fabs_s,
+        gen_helper_sve_fabs_d
+    };
+    return do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_FNEG(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_fneg_h,
+        gen_helper_sve_fneg_s,
+        gen_helper_sve_fneg_d
+    };
+    return do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_SXTB(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_sxtb_h,
+        gen_helper_sve_sxtb_s,
+        gen_helper_sve_sxtb_d
+    };
+    return do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_UXTB(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL,
+        gen_helper_sve_uxtb_h,
+        gen_helper_sve_uxtb_s,
+        gen_helper_sve_uxtb_d
+    };
+    return do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_SXTH(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL, NULL,
+        gen_helper_sve_sxth_s,
+        gen_helper_sve_sxth_d
+    };
+    return do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_UXTH(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        NULL, NULL,
+        gen_helper_sve_uxth_s,
+        gen_helper_sve_uxth_d
+    };
+    return do_zpz_ool(s, a, fns[a->esz]);
+}
+
+static bool trans_SXTW(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ool(s, a, a->esz == 3 ? gen_helper_sve_sxtw_d : NULL);
+}
+
+static bool trans_UXTW(DisasContext *s, arg_rpr_esz *a, uint32_t insn)
+{
+    return do_zpz_ool(s, a, a->esz == 3 ? gen_helper_sve_uxtw_d : NULL);
+}
+
+#undef DO_ZPZ
+
 /*
  *** SVE Integer Reduction Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ ASR_zpzw        00000100 .. 011 000 100 ... ..... ..... @rdn_pg_rm
 LSR_zpzw        00000100 .. 011 001 100 ... ..... .....         @rdn_pg_rm
 LSL_zpzw        00000100 .. 011 011 100 ... ..... .....         @rdn_pg_rm
 
+### SVE Integer Arithmetic - Unary Predicated Group
+
+# SVE unary bit operations (predicated)
+# Note esz != 0 for FABS and FNEG.
+CLS             00000100 .. 011 000 101 ... ..... .....         @rd_pg_rn
+CLZ             00000100 .. 011 001 101 ... ..... .....         @rd_pg_rn
+CNT_zpz         00000100 .. 011 010 101 ... ..... .....         @rd_pg_rn
+CNOT            00000100 .. 011 011 101 ... ..... .....         @rd_pg_rn
+NOT_zpz         00000100 .. 011 110 101 ... ..... .....         @rd_pg_rn
+FABS            00000100 .. 011 100 101 ... ..... .....         @rd_pg_rn
+FNEG            00000100 .. 011 101 101 ... ..... .....         @rd_pg_rn
+
+# SVE integer unary operations (predicated)
+# Note esz > original size for extensions.
+ABS             00000100 .. 010 110 101 ... ..... .....         @rd_pg_rn
+NEG             00000100 .. 010 111 101 ... ..... .....         @rd_pg_rn
+SXTB            00000100 .. 010 000 101 ... ..... .....         @rd_pg_rn
+UXTB            00000100 .. 010 001 101 ... ..... .....         @rd_pg_rn
+SXTH            00000100 .. 010 010 101 ... ..... .....         @rd_pg_rn
+UXTH            00000100 .. 010 011 101 ... ..... .....         @rd_pg_rn
+SXTW            00000100 .. 010 100 101 ... ..... .....         @rd_pg_rn
+UXTW            00000100 .. 010 101 101 ... ..... .....         @rd_pg_rn
+
 ### SVE Logical - Unpredicated Group
 
 # SVE bitwise logical operations (unpredicated)
--
2.17.0
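
The FABS and FNEG helpers in this patch never touch the FPU: ((__typeof(N))-1 >> 1) is an unsigned all-ones value shifted right once, i.e. a mask of everything but the sign bit, so FABS is a single AND and FNEG a single XOR on the raw IEEE bit pattern. A small sketch with concrete 32-bit values (illustration only, not QEMU code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t mask = (uint32_t)-1 >> 1;       /* 0x7fffffff */
    uint32_t minus_two = 0xc0000000;         /* bit pattern of -2.0f */

    uint32_t fabs_bits = minus_two & mask;   /* clear the sign bit */
    uint32_t fneg_bits = minus_two ^ ~mask;  /* flip the sign bit */

    /* Both print 0x40000000, the bit pattern of 2.0f. */
    printf("fabs=0x%08x fneg=0x%08x\n", fabs_bits, fneg_bits);
    return 0;
}
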