Most of this is the Neon decodetree patches, followed by Edgar's versal cleanups.

thanks
-- PMM

The following changes since commit 2ef486e76d64436be90f7359a3071fb2a56ce835:

  Merge remote-tracking branch 'remotes/marcel/tags/rdma-pull-request' into staging (2020-05-03 14:12:56 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200504

for you to fetch changes up to 9aefc6cf9b73f66062d2f914a0136756e7a28211:

  target/arm: Move gen_ function typedefs to translate.h (2020-05-04 12:59:26 +0100)

----------------------------------------------------------------
target-arm queue:
 * Start of conversion of Neon insns to decodetree
 * versal board: support SD and RTC
 * Implement ARMv8.2-TTS2UXN
 * Make VQDMULL undefined when U=1
 * Some minor code cleanups

----------------------------------------------------------------
Edgar E. Iglesias (11):
      hw/arm: versal: Remove inclusion of arm_gicv3_common.h
      hw/arm: versal: Move misplaced comment
      hw/arm: versal-virt: Fix typo xlnx-ve -> xlnx-versal
      hw/arm: versal: Embed the UARTs into the SoC type
      hw/arm: versal: Embed the GEMs into the SoC type
      hw/arm: versal: Embed the ADMAs into the SoC type
      hw/arm: versal: Embed the APUs into the SoC type
      hw/arm: versal: Add support for SD
      hw/arm: versal: Add support for the RTC
      hw/arm: versal-virt: Add support for SD
      hw/arm: versal-virt: Add support for the RTC

Fredrik Strupe (1):
      target/arm: Make VQDMULL undefined when U=1

Peter Maydell (25):
      target/arm: Don't use a TLB for ARMMMUIdx_Stage2
      target/arm: Use enum constant in get_phys_addr_lpae() call
      target/arm: Add new 's1_is_el0' argument to get_phys_addr_lpae()
      target/arm: Implement ARMv8.2-TTS2UXN
      target/arm: Use correct variable for setting 'max' cpu's ID_AA64DFR0
      target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
      target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
      target/arm: Add stubs for AArch32 Neon decodetree
      target/arm: Convert VCMLA (vector) to decodetree
      target/arm: Convert VCADD (vector) to decodetree
      target/arm: Convert V[US]DOT (vector) to decodetree
      target/arm: Convert VFM[AS]L (vector) to decodetree
      target/arm: Convert VCMLA (scalar) to decodetree
      target/arm: Convert V[US]DOT (scalar) to decodetree
      target/arm: Convert VFM[AS]L (scalar) to decodetree
      target/arm: Convert Neon load/store multiple structures to decodetree
      target/arm: Convert Neon 'load single structure to all lanes' to decodetree
      target/arm: Convert Neon 'load/store single structure' to decodetree
      target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
      target/arm: Convert Neon 3-reg-same logic ops to decodetree
      target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
      target/arm: Convert Neon 3-reg-same comparisons to decodetree
      target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
      target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
      target/arm: Move gen_ function typedefs to translate.h

Philippe Mathieu-Daudé (2):
      hw/arm/mps2-tz: Use TYPE_IOTKIT instead of hardcoded string
      target/arm: Use uint64_t for midr field in CPU state struct

 include/hw/arm/xlnx-versal.h | 31 +-
 target/arm/cpu-param.h | 2 +-
 target/arm/cpu.h | 38 ++-
 target/arm/translate-a64.h | 9 -
 target/arm/translate.h | 26 ++
 target/arm/neon-dp.decode | 86 +++++
 target/arm/neon-ls.decode | 52 +++
 target/arm/neon-shared.decode | 66 ++++
 hw/arm/mps2-tz.c | 2 +-
 hw/arm/xlnx-versal-virt.c | 74 ++++-
 hw/arm/xlnx-versal.c | 115 +++++--
 target/arm/cpu.c | 3 +-
 target/arm/cpu64.c | 8 +-
 target/arm/helper.c | 183 ++++------
 target/arm/translate-a64.c | 17 -
 target/arm/translate-neon.inc.c | 714 +++++++++++++++++++++++++++++++++++++++
 target/arm/translate-vfp.inc.c | 6 -
 target/arm/translate.c | 716 +++-------------------------------------
 target/arm/Makefile.objs | 18 +
 19 files changed, 1302 insertions(+), 864 deletions(-)
 create mode 100644 target/arm/neon-dp.decode
 create mode 100644 target/arm/neon-ls.decode
 create mode 100644 target/arm/neon-shared.decode
 create mode 100644 target/arm/translate-neon.inc.c

Mostly my decodetree stuff, but also some patches for various
smaller bugs/features from others.

thanks
-- PMM

The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:

  Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616

for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:

  hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)

----------------------------------------------------------------
target-arm queue:
 * hw: arm: Set vendor property for IMX SDHCI emulations
 * sd: sdhci: Implement basic vendor specific register support
 * hw/net/imx_fec: Convert debug fprintf() to trace events
 * target/arm/cpu: adjust virtual time for all KVM arm cpus
 * Implement configurable descriptor size in ftgmac100
 * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
 * target/arm: More Neon decodetree conversion work

----------------------------------------------------------------
Erik Smit (1):
      Implement configurable descriptor size in ftgmac100

Guenter Roeck (2):
      sd: sdhci: Implement basic vendor specific register support
      hw: arm: Set vendor property for IMX SDHCI emulations

Jean-Christophe Dubois (2):
      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
      hw/net/imx_fec: Convert debug fprintf() to trace events

Peter Maydell (17):
      target/arm: Fix missing temp frees in do_vshll_2sh
      target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
      target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
      target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
      target/arm: Convert Neon 3-reg-diff long multiplies
      target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
      target/arm: Convert Neon 3-reg-diff polynomial VMULL
      target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
      target/arm: Add missing TCG temp free in do_2shift_env_64()
      target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
      target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
      target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
      target/arm: Convert Neon VEXT to decodetree
      target/arm: Convert Neon VTBL, VTBX to decodetree
      target/arm: Convert Neon VDUP (scalar) to decodetree

fangying (1):
      target/arm/cpu: adjust virtual time for all KVM arm cpus

 hw/sd/sdhci-internal.h | 5 +
 include/hw/sd/sdhci.h | 5 +
 target/arm/translate.h | 1 +
 target/arm/neon-dp.decode | 130 +++++
 hw/arm/fsl-imx25.c | 6 +
 hw/arm/fsl-imx6.c | 6 +
 hw/arm/fsl-imx6ul.c | 2 +
 hw/arm/fsl-imx7.c | 2 +
 hw/misc/imx6ul_ccm.c | 76 ++-
 hw/net/ftgmac100.c | 26 +-
 hw/net/imx_fec.c | 106 ++--
 hw/sd/sdhci.c | 18 +-
 target/arm/cpu.c | 6 +-
 target/arm/cpu64.c | 1 -
 target/arm/kvm.c | 21 +-
 target/arm/translate-neon.inc.c | 1148 ++++++++++++++++++++++++++++++++++++++-
 target/arm/translate.c | 684 +----------------------
 hw/net/trace-events | 18 +
 18 files changed, 1495 insertions(+), 766 deletions(-)

Deleted patch
From: Fredrik Strupe <fredrik@strupe.net>

According to the Arm ARM, VQDMULL is only valid when U=0, while having
U=1 is unallocated.

Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
Fixes: 695272dcb976 ("target-arm: Handle UNDEF cases for Neon 3-regs-different-widths")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
{0, 0, 0, 0}, /* VMLSL */
{0, 0, 0, 9}, /* VQDMLSL */
{0, 0, 0, 0}, /* Integer VMULL */
- {0, 0, 0, 1}, /* VQDMULL */
+ {0, 0, 0, 9}, /* VQDMULL */
{0, 0, 0, 0xa}, /* Polynomial VMULL */
{0, 0, 0, 7}, /* Reserved: always UNDEF */
};
--
2.20.1

We were accidentally permitting decode of Thumb Neon insns even if
the CPU didn't have the FEATURE_NEON bit set, because the feature
check was being done before the call to disas_neon_data_insn() and
disas_neon_ls_insn() in the Arm decoder but was omitted from the
Thumb decoder. Push the feature bit check down into the called
functions so it is done for both Arm and Thumb encodings.

The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.
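
(For illustration only: the ownership rule being fixed is the usual one, where
a helper that merely reads a temp does not free it, so the caller that
allocated it must. The plain-C sketch below is an analogy, not TCG code;
widen_to_i64() and the variable names are made up.)

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

/* The helper only reads its argument; it does not take ownership. */
static int64_t widen_to_i64(const int32_t *src)
{
    return (int64_t)*src;
}

int main(void)
{
    int32_t *rm0 = malloc(sizeof(*rm0)); /* caller allocates (cf. tcg_temp_new_i32) */

    if (!rm0) {
        return 1;
    }
    *rm0 = -5;

    printf("%lld\n", (long long)widen_to_i64(rm0));
    free(rm0);                           /* so the caller must free (cf. tcg_temp_free_i32) */
    return 0;
}
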
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
6
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
11
Message-id: 20200430181003.21682-3-peter.maydell@linaro.org
12
---
7
---
13
target/arm/translate.c | 16 ++++++++--------
8
target/arm/translate-neon.inc.c | 2 ++
14
1 file changed, 8 insertions(+), 8 deletions(-)
9
1 file changed, 2 insertions(+)
15
10
16
diff --git a/target/arm/translate.c b/target/arm/translate.c
11
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
17
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/translate.c
13
--- a/target/arm/translate-neon.inc.c
19
+++ b/target/arm/translate.c
14
+++ b/target/arm/translate-neon.inc.c
20
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
15
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
21
TCGv_i32 tmp2;
16
tmp = tcg_temp_new_i64();
22
TCGv_i64 tmp64;
17
23
18
widenfn(tmp, rm0);
24
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
19
+ tcg_temp_free_i32(rm0);
25
+ return 1;
20
if (a->shift != 0) {
26
+ }
21
tcg_gen_shli_i64(tmp, tmp, a->shift);
27
+
22
tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
28
/* FIXME: this access check should not take precedence over UNDEF
23
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
29
* for invalid encodings; we will generate incorrect syndrome information
24
neon_store_reg64(tmp, a->vd);
30
* for attempts to execute invalid vfp/neon encodings with FP disabled.
25
31
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
26
widenfn(tmp, rm1);
32
TCGv_ptr ptr1, ptr2, ptr3;
27
+ tcg_temp_free_i32(rm1);
33
TCGv_i64 tmp64;
28
if (a->shift != 0) {
34
29
tcg_gen_shli_i64(tmp, tmp, a->shift);
35
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
30
tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
36
+ return 1;
37
+ }
38
+
39
/* FIXME: this access check should not take precedence over UNDEF
40
* for invalid encodings; we will generate incorrect syndrome information
41
* for attempts to execute invalid vfp/neon encodings with FP disabled.
42
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
43
44
if (((insn >> 25) & 7) == 1) {
45
/* NEON Data processing. */
46
- if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
47
- goto illegal_op;
48
- }
49
-
50
if (disas_neon_data_insn(s, insn)) {
51
goto illegal_op;
52
}
53
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
54
}
55
if ((insn & 0x0f100000) == 0x04000000) {
56
/* NEON load/store. */
57
- if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
58
- goto illegal_op;
59
- }
60
-
61
if (disas_neon_ls_insn(s, insn)) {
62
goto illegal_op;
63
}
64
--
31
--
65
2.20.1
32
2.20.1
66
33
67
34
Convert the V[US]DOT (vector) insns to decodetree.

Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.

As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.
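
(Illustration, not QEMU code: a standalone sketch of what "pre-widening"
means for one size. vaddl_s16() is a made-up name modelling VADDL.S16 on a
4-lane vector: each input lane is widened to twice its size before the add,
so the stored result is double-width and cannot wrap at the narrow width.)

#include <stdint.h>
#include <stdio.h>

/* Model of VADDL.S16 on a 4-lane vector: sign-extend each 16-bit input
 * lane to 32 bits, add at 32 bits, store a double-width result. */
static void vaddl_s16(int32_t dst[4], const int16_t n[4], const int16_t m[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i] = (int32_t)n[i] + (int32_t)m[i];
    }
}

int main(void)
{
    int16_t n[4] = { 30000, -30000, 1, 2 };
    int16_t m[4] = { 30000, -30000, 3, 4 };
    int32_t d[4];

    vaddl_s16(d, n, m);
    for (int i = 0; i < 4; i++) {
        /* 60000 and -60000 survive because the add was done at 32 bits */
        printf("%ld\n", (long)d[i]);
    }
    return 0;
}
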
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20200430181003.21682-7-peter.maydell@linaro.org
6
---
13
---
7
target/arm/neon-shared.decode | 4 ++++
14
target/arm/neon-dp.decode | 43 +++++++++++++
8
target/arm/translate-neon.inc.c | 32 ++++++++++++++++++++++++++++++++
15
target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
9
target/arm/translate.c | 9 +--------
16
target/arm/translate.c | 16 ++---
10
3 files changed, 37 insertions(+), 8 deletions(-)
17
3 files changed, 151 insertions(+), 12 deletions(-)
11
18
12
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
19
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
13
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/neon-shared.decode
21
--- a/target/arm/neon-dp.decode
15
+++ b/target/arm/neon-shared.decode
22
+++ b/target/arm/neon-dp.decode
16
@@ -XXX,XX +XXX,XX @@ VCMLA 1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
23
@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh 1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
17
24
# So we have a single decode line and check the cmode/op in the
18
VCADD 1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
25
# trans function.
19
vm=%vm_dp vn=%vn_dp vd=%vd_dp
26
Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
20
+
27
+
21
+# VUDOT and VSDOT
28
+######################################################################
22
+VDOT 1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
29
+# Within the "two registers, or three registers of different lengths"
23
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
30
+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
31
+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
32
+# or they are a size field for the three-reg-different-lengths and
33
+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
34
+# is slightly awkward for decodetree: we handle it with this
35
+# non-exclusive group which contains within it two exclusive groups:
36
+# one for the size=0b11 patterns, and one for the size-not-0b11
37
+# patterns. This allows us to check that none of the insns within
38
+# each subgroup accidentally overlap each other. Note that all the
39
+# trans functions for the size-not-0b11 patterns must check and
40
+# return false for size==3.
41
+######################################################################
42
+{
43
+ # 0b11 subgroup will go here
44
+
45
+ # Subgroup for size != 0b11
46
+ [
47
+ ##################################################################
48
+ # 3-reg-different-length grouping:
49
+ # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
50
+ ##################################################################
51
+
52
+ &3diff vm vn vd size
53
+
54
+ @3diff .... ... . . . size:2 .... .... .... . . . . .... \
55
+ &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
56
+
57
+ VADDL_S_3d 1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
58
+ VADDL_U_3d 1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
59
+
60
+ VADDW_S_3d 1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
61
+ VADDW_U_3d 1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
62
+
63
+ VSUBL_S_3d 1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
64
+ VSUBL_U_3d 1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
65
+
66
+ VSUBW_S_3d 1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
67
+ VSUBW_U_3d 1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
68
+ ]
69
+}
24
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
70
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
25
index XXXXXXX..XXXXXXX 100644
71
index XXXXXXX..XXXXXXX 100644
26
--- a/target/arm/translate-neon.inc.c
72
--- a/target/arm/translate-neon.inc.c
27
+++ b/target/arm/translate-neon.inc.c
73
+++ b/target/arm/translate-neon.inc.c
28
@@ -XXX,XX +XXX,XX @@ static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
74
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
29
tcg_temp_free_ptr(fpst);
75
}
30
return true;
76
return do_1reg_imm(s, a, fn);
31
}
77
}
32
+
78
+
33
+static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
79
+static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
80
+ NeonGenWidenFn *widenfn,
81
+ NeonGenTwo64OpFn *opfn,
82
+ bool src1_wide)
34
+{
83
+{
35
+ int opr_sz;
84
+ /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
36
+ gen_helper_gvec_3 *fn_gvec;
85
+ TCGv_i64 rn0_64, rn1_64, rm_64;
37
+
86
+ TCGv_i32 rm;
38
+ if (!dc_isar_feature(aa32_dp, s)) {
87
+
88
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
39
+ return false;
89
+ return false;
40
+ }
90
+ }
41
+
91
+
42
+ /* UNDEF accesses to D16-D31 if they don't exist. */
92
+ /* UNDEF accesses to D16-D31 if they don't exist. */
43
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
93
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
44
+ ((a->vd | a->vn | a->vm) & 0x10)) {
94
+ ((a->vd | a->vn | a->vm) & 0x10)) {
45
+ return false;
95
+ return false;
46
+ }
96
+ }
47
+
97
+
48
+ if ((a->vn | a->vm | a->vd) & a->q) {
98
+ if (!widenfn || !opfn) {
99
+ /* size == 3 case, which is an entirely different insn group */
100
+ return false;
101
+ }
102
+
103
+ if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
49
+ return false;
104
+ return false;
50
+ }
105
+ }
51
+
106
+
52
+ if (!vfp_access_check(s)) {
107
+ if (!vfp_access_check(s)) {
53
+ return true;
108
+ return true;
54
+ }
109
+ }
55
+
110
+
56
+ opr_sz = (1 + a->q) * 8;
111
+ rn0_64 = tcg_temp_new_i64();
57
+ fn_gvec = a->u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
112
+ rn1_64 = tcg_temp_new_i64();
58
+ tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
113
+ rm_64 = tcg_temp_new_i64();
59
+ vfp_reg_offset(1, a->vn),
114
+
60
+ vfp_reg_offset(1, a->vm),
115
+ if (src1_wide) {
61
+ opr_sz, opr_sz, 0, fn_gvec);
116
+ neon_load_reg64(rn0_64, a->vn);
117
+ } else {
118
+ TCGv_i32 tmp = neon_load_reg(a->vn, 0);
119
+ widenfn(rn0_64, tmp);
120
+ tcg_temp_free_i32(tmp);
121
+ }
122
+ rm = neon_load_reg(a->vm, 0);
123
+
124
+ widenfn(rm_64, rm);
125
+ tcg_temp_free_i32(rm);
126
+ opfn(rn0_64, rn0_64, rm_64);
127
+
128
+ /*
129
+ * Load second pass inputs before storing the first pass result, to
130
+ * avoid incorrect results if a narrow input overlaps with the result.
131
+ */
132
+ if (src1_wide) {
133
+ neon_load_reg64(rn1_64, a->vn + 1);
134
+ } else {
135
+ TCGv_i32 tmp = neon_load_reg(a->vn, 1);
136
+ widenfn(rn1_64, tmp);
137
+ tcg_temp_free_i32(tmp);
138
+ }
139
+ rm = neon_load_reg(a->vm, 1);
140
+
141
+ neon_store_reg64(rn0_64, a->vd);
142
+
143
+ widenfn(rm_64, rm);
144
+ tcg_temp_free_i32(rm);
145
+ opfn(rn1_64, rn1_64, rm_64);
146
+ neon_store_reg64(rn1_64, a->vd + 1);
147
+
148
+ tcg_temp_free_i64(rn0_64);
149
+ tcg_temp_free_i64(rn1_64);
150
+ tcg_temp_free_i64(rm_64);
151
+
62
+ return true;
152
+ return true;
63
+}
153
+}
154
+
155
+#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE) \
156
+ static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a) \
157
+ { \
158
+ static NeonGenWidenFn * const widenfn[] = { \
159
+ gen_helper_neon_widen_##S##8, \
160
+ gen_helper_neon_widen_##S##16, \
161
+ tcg_gen_##EXT##_i32_i64, \
162
+ NULL, \
163
+ }; \
164
+ static NeonGenTwo64OpFn * const addfn[] = { \
165
+ gen_helper_neon_##OP##l_u16, \
166
+ gen_helper_neon_##OP##l_u32, \
167
+ tcg_gen_##OP##_i64, \
168
+ NULL, \
169
+ }; \
170
+ return do_prewiden_3d(s, a, widenfn[a->size], \
171
+ addfn[a->size], SRC1WIDE); \
172
+ }
173
+
174
+DO_PREWIDEN(VADDL_S, s, ext, add, false)
175
+DO_PREWIDEN(VADDL_U, u, extu, add, false)
176
+DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
177
+DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
178
+DO_PREWIDEN(VADDW_S, s, ext, add, true)
179
+DO_PREWIDEN(VADDW_U, u, extu, add, true)
180
+DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
181
+DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
64
diff --git a/target/arm/translate.c b/target/arm/translate.c
182
diff --git a/target/arm/translate.c b/target/arm/translate.c
65
index XXXXXXX..XXXXXXX 100644
183
index XXXXXXX..XXXXXXX 100644
66
--- a/target/arm/translate.c
184
--- a/target/arm/translate.c
67
+++ b/target/arm/translate.c
185
+++ b/target/arm/translate.c
68
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
186
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
69
bool is_long = false, q = extract32(insn, 6, 1);
187
/* Three registers of different lengths. */
70
bool ptr_is_env = false;
188
int src1_wide;
71
189
int src2_wide;
72
- if ((insn & 0xfeb00f00) == 0xfc200d00) {
190
- int prewiden;
73
- /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
191
/* undefreq: bit 0 : UNDEF if size == 0
74
- bool u = extract32(insn, 4, 1);
192
* bit 1 : UNDEF if size == 1
75
- if (!dc_isar_feature(aa32_dp, s)) {
193
* bit 2 : UNDEF if size == 2
76
- return 1;
194
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
77
- }
195
int undefreq;
78
- fn_gvec = u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
196
/* prewiden, src1_wide, src2_wide, undefreq */
79
- } else if ((insn & 0xff300f10) == 0xfc200810) {
197
static const int neon_3reg_wide[16][4] = {
80
+ if ((insn & 0xff300f10) == 0xfc200810) {
198
- {1, 0, 0, 0}, /* VADDL */
81
/* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
199
- {1, 1, 0, 0}, /* VADDW */
82
int is_s = extract32(insn, 23, 1);
200
- {1, 0, 0, 0}, /* VSUBL */
83
if (!dc_isar_feature(aa32_fhm, s)) {
201
- {1, 1, 0, 0}, /* VSUBW */
202
+ {0, 0, 0, 7}, /* VADDL: handled by decodetree */
203
+ {0, 0, 0, 7}, /* VADDW: handled by decodetree */
204
+ {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
205
+ {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
206
{0, 1, 1, 0}, /* VADDHN */
207
{0, 0, 0, 0}, /* VABAL */
208
{0, 1, 1, 0}, /* VSUBHN */
209
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
210
{0, 0, 0, 7}, /* Reserved: always UNDEF */
211
};
212
213
- prewiden = neon_3reg_wide[op][0];
214
src1_wide = neon_3reg_wide[op][1];
215
src2_wide = neon_3reg_wide[op][2];
216
undefreq = neon_3reg_wide[op][3];
217
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
218
} else {
219
tmp = neon_load_reg(rn, pass);
220
}
221
- if (prewiden) {
222
- gen_neon_widen(cpu_V0, tmp, size, u);
223
- }
224
}
225
if (src2_wide) {
226
neon_load_reg64(cpu_V1, rm + pass);
227
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
228
} else {
229
tmp2 = neon_load_reg(rm, pass);
230
}
231
- if (prewiden) {
232
- gen_neon_widen(cpu_V1, tmp2, size, u);
233
- }
234
}
235
switch (op) {
236
case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
84
--
237
--
85
2.20.1
238
2.20.1
86
239
87
240
Convert the VCMLA (vector) insns in the 3same extension group to
decodetree.

Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.
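
(Illustration, not QEMU code: a standalone sketch of the narrow-to-high-half
operation for one element size. addhn_i32()/raddhn_i32() are made-up names
modelling VADDHN/VRADDHN with 32-bit operands: the add is done at full width
and only the high 16 bits of each sum are kept, with the rounding variant
adding 1 << 15 first, matching the 1 << 31 constant the patch uses for the
64-bit case.)

#include <stdint.h>
#include <stdio.h>

/* Model of VADDHN/VRADDHN with 32-bit operands: add at full width, keep
 * only the high 16 bits of each sum. */
static uint32_t addhn_i32(uint32_t n, uint32_t m)
{
    return (n + m) >> 16;
}

static uint32_t raddhn_i32(uint32_t n, uint32_t m)
{
    return (n + m + (1u << 15)) >> 16; /* round to nearest before narrowing */
}

int main(void)
{
    printf("%u\n", addhn_i32(0x12348000u, 0x00010000u));  /* 0x1235 */
    printf("%u\n", raddhn_i32(0x12348000u, 0x00010000u)); /* 0x1236 */
    return 0;
}
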
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200430181003.21682-5-peter.maydell@linaro.org
7
---
7
---
8
target/arm/neon-shared.decode | 11 ++++++++++
8
target/arm/neon-dp.decode | 6 +++
9
target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
9
target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
10
target/arm/translate.c | 11 +---------
10
target/arm/translate.c | 91 ++++-----------------------------
11
3 files changed, 49 insertions(+), 10 deletions(-)
11
3 files changed, 104 insertions(+), 80 deletions(-)
12
12
13
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
13
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
14
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/neon-shared.decode
15
--- a/target/arm/neon-dp.decode
16
+++ b/target/arm/neon-shared.decode
16
+++ b/target/arm/neon-dp.decode
17
@@ -XXX,XX +XXX,XX @@
17
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
18
# More specifically, this covers:
18
19
# 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
19
VSUBW_S_3d 1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
20
# 3same ext: 0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
20
VSUBW_U_3d 1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
21
+
21
+
22
+# VFP/Neon register fields; same as vfp.decode
22
+ VADDHN_3d 1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
23
+%vm_dp 5:1 0:4
23
+ VRADDHN_3d 1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
24
+%vm_sp 0:4 5:1
24
+
25
+%vn_dp 7:1 16:4
25
+ VSUBHN_3d 1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
26
+%vn_sp 16:4 7:1
26
+ VRSUBHN_3d 1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
27
+%vd_dp 22:1 12:4
27
]
28
+%vd_sp 12:4 22:1
28
}
29
+
30
+VCMLA 1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
31
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
32
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
29
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
33
index XXXXXXX..XXXXXXX 100644
30
index XXXXXXX..XXXXXXX 100644
34
--- a/target/arm/translate-neon.inc.c
31
--- a/target/arm/translate-neon.inc.c
35
+++ b/target/arm/translate-neon.inc.c
32
+++ b/target/arm/translate-neon.inc.c
36
@@ -XXX,XX +XXX,XX @@
33
@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
37
#include "decode-neon-dp.inc.c"
34
DO_PREWIDEN(VADDW_U, u, extu, add, true)
38
#include "decode-neon-ls.inc.c"
35
DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
39
#include "decode-neon-shared.inc.c"
36
DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
40
+
37
+
41
+static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
38
+static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
39
+ NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
42
+{
40
+{
43
+ int opr_sz;
41
+ /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
44
+ TCGv_ptr fpst;
42
+ TCGv_i64 rn_64, rm_64;
45
+ gen_helper_gvec_3_ptr *fn_gvec_ptr;
43
+ TCGv_i32 rd0, rd1;
46
+
44
+
47
+ if (!dc_isar_feature(aa32_vcma, s)
45
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
48
+ || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
49
+ return false;
46
+ return false;
50
+ }
47
+ }
51
+
48
+
52
+ /* UNDEF accesses to D16-D31 if they don't exist. */
49
+ /* UNDEF accesses to D16-D31 if they don't exist. */
53
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
50
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
54
+ ((a->vd | a->vn | a->vm) & 0x10)) {
51
+ ((a->vd | a->vn | a->vm) & 0x10)) {
55
+ return false;
52
+ return false;
56
+ }
53
+ }
57
+
54
+
58
+ if ((a->vn | a->vm | a->vd) & a->q) {
55
+ if (!opfn || !narrowfn) {
56
+ /* size == 3 case, which is an entirely different insn group */
57
+ return false;
58
+ }
59
+
60
+ if ((a->vn | a->vm) & 1) {
59
+ return false;
61
+ return false;
60
+ }
62
+ }
61
+
63
+
62
+ if (!vfp_access_check(s)) {
64
+ if (!vfp_access_check(s)) {
63
+ return true;
65
+ return true;
64
+ }
66
+ }
65
+
67
+
66
+ opr_sz = (1 + a->q) * 8;
68
+ rn_64 = tcg_temp_new_i64();
67
+ fpst = get_fpstatus_ptr(1);
69
+ rm_64 = tcg_temp_new_i64();
68
+ fn_gvec_ptr = a->size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
70
+ rd0 = tcg_temp_new_i32();
69
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
71
+ rd1 = tcg_temp_new_i32();
70
+ vfp_reg_offset(1, a->vn),
72
+
71
+ vfp_reg_offset(1, a->vm),
73
+ neon_load_reg64(rn_64, a->vn);
72
+ fpst, opr_sz, opr_sz, a->rot,
74
+ neon_load_reg64(rm_64, a->vm);
73
+ fn_gvec_ptr);
75
+
74
+ tcg_temp_free_ptr(fpst);
76
+ opfn(rn_64, rn_64, rm_64);
77
+
78
+ narrowfn(rd0, rn_64);
79
+
80
+ neon_load_reg64(rn_64, a->vn + 1);
81
+ neon_load_reg64(rm_64, a->vm + 1);
82
+
83
+ opfn(rn_64, rn_64, rm_64);
84
+
85
+ narrowfn(rd1, rn_64);
86
+
87
+ neon_store_reg(a->vd, 0, rd0);
88
+ neon_store_reg(a->vd, 1, rd1);
89
+
90
+ tcg_temp_free_i64(rn_64);
91
+ tcg_temp_free_i64(rm_64);
92
+
75
+ return true;
93
+ return true;
76
+}
94
+}
95
+
96
+#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP) \
97
+ static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a) \
98
+ { \
99
+ static NeonGenTwo64OpFn * const addfn[] = { \
100
+ gen_helper_neon_##OP##l_u16, \
101
+ gen_helper_neon_##OP##l_u32, \
102
+ tcg_gen_##OP##_i64, \
103
+ NULL, \
104
+ }; \
105
+ static NeonGenNarrowFn * const narrowfn[] = { \
106
+ gen_helper_neon_##NARROWTYPE##_high_u8, \
107
+ gen_helper_neon_##NARROWTYPE##_high_u16, \
108
+ EXTOP, \
109
+ NULL, \
110
+ }; \
111
+ return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]); \
112
+ }
113
+
114
+static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
115
+{
116
+ tcg_gen_addi_i64(rn, rn, 1u << 31);
117
+ tcg_gen_extrh_i64_i32(rd, rn);
118
+}
119
+
120
+DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
121
+DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
122
+DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
123
+DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
77
diff --git a/target/arm/translate.c b/target/arm/translate.c
124
diff --git a/target/arm/translate.c b/target/arm/translate.c
78
index XXXXXXX..XXXXXXX 100644
125
index XXXXXXX..XXXXXXX 100644
79
--- a/target/arm/translate.c
126
--- a/target/arm/translate.c
80
+++ b/target/arm/translate.c
127
+++ b/target/arm/translate.c
81
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
128
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
82
bool is_long = false, q = extract32(insn, 6, 1);
129
}
83
bool ptr_is_env = false;
130
}
84
131
85
- if ((insn & 0xfe200f10) == 0xfc200800) {
132
-static inline void gen_neon_subl(int size)
86
- /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
133
-{
87
- int size = extract32(insn, 20, 1);
134
- switch (size) {
88
- data = extract32(insn, 23, 2); /* rot */
135
- case 0: gen_helper_neon_subl_u16(CPU_V001); break;
89
- if (!dc_isar_feature(aa32_vcma, s)
136
- case 1: gen_helper_neon_subl_u32(CPU_V001); break;
90
- || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
137
- case 2: tcg_gen_sub_i64(CPU_V001); break;
91
- return 1;
138
- default: abort();
92
- }
139
- }
93
- fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
140
-}
94
- } else if ((insn & 0xfea00f10) == 0xfc800800) {
141
-
95
+ if ((insn & 0xfea00f10) == 0xfc800800) {
142
static inline void gen_neon_negl(TCGv_i64 var, int size)
96
/* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
143
{
97
int size = extract32(insn, 20, 1);
144
switch (size) {
98
data = extract32(insn, 24, 1); /* rot */
145
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
146
op = (insn >> 8) & 0xf;
147
if ((insn & (1 << 6)) == 0) {
148
/* Three registers of different lengths. */
149
- int src1_wide;
150
- int src2_wide;
151
/* undefreq: bit 0 : UNDEF if size == 0
152
* bit 1 : UNDEF if size == 1
153
* bit 2 : UNDEF if size == 2
154
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
155
{0, 0, 0, 7}, /* VADDW: handled by decodetree */
156
{0, 0, 0, 7}, /* VSUBL: handled by decodetree */
157
{0, 0, 0, 7}, /* VSUBW: handled by decodetree */
158
- {0, 1, 1, 0}, /* VADDHN */
159
+ {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
160
{0, 0, 0, 0}, /* VABAL */
161
- {0, 1, 1, 0}, /* VSUBHN */
162
+ {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
163
{0, 0, 0, 0}, /* VABDL */
164
{0, 0, 0, 0}, /* VMLAL */
165
{0, 0, 0, 9}, /* VQDMLAL */
166
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
167
{0, 0, 0, 7}, /* Reserved: always UNDEF */
168
};
169
170
- src1_wide = neon_3reg_wide[op][1];
171
- src2_wide = neon_3reg_wide[op][2];
172
undefreq = neon_3reg_wide[op][3];
173
174
if ((undefreq & (1 << size)) ||
175
((undefreq & 8) && u)) {
176
return 1;
177
}
178
- if ((src1_wide && (rn & 1)) ||
179
- (src2_wide && (rm & 1)) ||
180
- (!src2_wide && (rd & 1))) {
181
+ if (rd & 1) {
182
return 1;
183
}
184
185
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
186
/* Avoid overlapping operands. Wide source operands are
187
always aligned so will never overlap with wide
188
destinations in problematic ways. */
189
- if (rd == rm && !src2_wide) {
190
+ if (rd == rm) {
191
tmp = neon_load_reg(rm, 1);
192
neon_store_scratch(2, tmp);
193
- } else if (rd == rn && !src1_wide) {
194
+ } else if (rd == rn) {
195
tmp = neon_load_reg(rn, 1);
196
neon_store_scratch(2, tmp);
197
}
198
tmp3 = NULL;
199
for (pass = 0; pass < 2; pass++) {
200
- if (src1_wide) {
201
- neon_load_reg64(cpu_V0, rn + pass);
202
- tmp = NULL;
203
+ if (pass == 1 && rd == rn) {
204
+ tmp = neon_load_scratch(2);
205
} else {
206
- if (pass == 1 && rd == rn) {
207
- tmp = neon_load_scratch(2);
208
- } else {
209
- tmp = neon_load_reg(rn, pass);
210
- }
211
+ tmp = neon_load_reg(rn, pass);
212
}
213
- if (src2_wide) {
214
- neon_load_reg64(cpu_V1, rm + pass);
215
- tmp2 = NULL;
216
+ if (pass == 1 && rd == rm) {
217
+ tmp2 = neon_load_scratch(2);
218
} else {
219
- if (pass == 1 && rd == rm) {
220
- tmp2 = neon_load_scratch(2);
221
- } else {
222
- tmp2 = neon_load_reg(rm, pass);
223
- }
224
+ tmp2 = neon_load_reg(rm, pass);
225
}
226
switch (op) {
227
- case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
228
- gen_neon_addl(size);
229
- break;
230
- case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
231
- gen_neon_subl(size);
232
- break;
233
case 5: case 7: /* VABAL, VABDL */
234
switch ((size << 1) | u) {
235
case 0:
236
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
237
abort();
238
}
239
neon_store_reg64(cpu_V0, rd + pass);
240
- } else if (op == 4 || op == 6) {
241
- /* Narrowing operation. */
242
- tmp = tcg_temp_new_i32();
243
- if (!u) {
244
- switch (size) {
245
- case 0:
246
- gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
247
- break;
248
- case 1:
249
- gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
250
- break;
251
- case 2:
252
- tcg_gen_extrh_i64_i32(tmp, cpu_V0);
253
- break;
254
- default: abort();
255
- }
256
- } else {
257
- switch (size) {
258
- case 0:
259
- gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
260
- break;
261
- case 1:
262
- gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
263
- break;
264
- case 2:
265
- tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
266
- tcg_gen_extrh_i64_i32(tmp, cpu_V0);
267
- break;
268
- default: abort();
269
- }
270
- }
271
- if (pass == 0) {
272
- tmp3 = tmp;
273
- } else {
274
- neon_store_reg(rd, 0, tmp3);
275
- neon_store_reg(rd, 1, tmp);
276
- }
277
} else {
278
/* Write back the result. */
279
neon_store_reg64(cpu_V0, rd + pass);
99
--
280
--
100
2.20.1
281
2.20.1
101
282
102
283
Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.

Note that we don't need the neon_3r_sizes[op] check here because all
size values are OK for VADD and VSUB; we'll add this when we convert
the first insn that has size restrictions.

For this we need one of the GVecGen*Fn typedefs currently in
translate-a64.h; move them all to translate.h as a block so they
are visible to the 32-bit decoder.

Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.

Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double width
result and then a possible accumulation of that double width
result into the destination.
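
(Illustration, not QEMU code: a standalone sketch of the two-step structure
described above, using VABAL.S8 on a single lane. vabal_s8_lane() is a
made-up name: the widened absolute difference is the double-width operation
result, and adding it into acc is the optional accumulation; VABDL would
simply return abs_diff.)

#include <stdint.h>
#include <stdio.h>

/* Model of one VABAL.S8 lane: absolute difference of two 8-bit inputs,
 * widened to 16 bits, accumulated into the 16-bit destination lane. */
static int16_t vabal_s8_lane(int16_t acc, int8_t n, int8_t m)
{
    int16_t diff = (int16_t)n - (int16_t)m;
    int16_t abs_diff = diff < 0 ? -diff : diff;

    return acc + abs_diff;
}

int main(void)
{
    int16_t acc = 100;

    acc = vabal_s8_lane(acc, -120, 120); /* |(-120) - 120| = 240 */
    printf("%d\n", acc);                 /* 340 */
    return 0;
}
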
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
13
Message-id: 20200430181003.21682-15-peter.maydell@linaro.org
14
---
9
---
15
target/arm/translate-a64.h | 9 --------
10
target/arm/translate.h | 1 +
16
target/arm/translate.h | 9 ++++++++
11
target/arm/neon-dp.decode | 6 ++
17
target/arm/neon-dp.decode | 17 +++++++++++++++
12
target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
18
target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
13
target/arm/translate.c | 31 +-------
19
target/arm/translate.c | 14 ++++--------
14
4 files changed, 142 insertions(+), 28 deletions(-)
20
5 files changed, 68 insertions(+), 19 deletions(-)
15
21
22
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
23
index XXXXXXX..XXXXXXX 100644
24
--- a/target/arm/translate-a64.h
25
+++ b/target/arm/translate-a64.h
26
@@ -XXX,XX +XXX,XX @@ static inline int vec_full_reg_size(DisasContext *s)
27
28
bool disas_sve(DisasContext *, uint32_t);
29
30
-/* Note that the gvec expanders operate on offsets + sizes. */
31
-typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
32
-typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
33
- uint32_t, uint32_t);
34
-typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
35
- uint32_t, uint32_t, uint32_t);
36
-typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
37
- uint32_t, uint32_t, uint32_t);
38
-
39
#endif /* TARGET_ARM_TRANSLATE_A64_H */
40
diff --git a/target/arm/translate.h b/target/arm/translate.h
16
diff --git a/target/arm/translate.h b/target/arm/translate.h
41
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/translate.h
18
--- a/target/arm/translate.h
43
+++ b/target/arm/translate.h
19
+++ b/target/arm/translate.h
44
@@ -XXX,XX +XXX,XX @@ void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
20
@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
45
#define dc_isar_feature(name, ctx) \
21
typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
46
({ DisasContext *ctx_ = (ctx); isar_feature_##name(ctx_->isar); })
22
typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
47
23
typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
48
+/* Note that the gvec expanders operate on offsets + sizes. */
24
+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
49
+typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
25
typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
50
+typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
26
typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
51
+ uint32_t, uint32_t);
27
typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
52
+typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
53
+ uint32_t, uint32_t, uint32_t);
54
+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
55
+ uint32_t, uint32_t, uint32_t);
56
+
57
#endif /* TARGET_ARM_TRANSLATE_H */
58
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
28
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
59
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
60
--- a/target/arm/neon-dp.decode
30
--- a/target/arm/neon-dp.decode
61
+++ b/target/arm/neon-dp.decode
31
+++ b/target/arm/neon-dp.decode
62
@@ -XXX,XX +XXX,XX @@
32
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
63
#
33
VADDHN_3d 1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
64
# This file is processed by scripts/decodetree.py
34
VRADDHN_3d 1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
65
#
35
66
+# VFP/Neon register fields; same as vfp.decode
36
+ VABAL_S_3d 1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
67
+%vm_dp 5:1 0:4
37
+ VABAL_U_3d 1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
68
+%vn_dp 7:1 16:4
38
+
69
+%vd_dp 22:1 12:4
39
VSUBHN_3d 1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
70
40
VRSUBHN_3d 1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
71
# Encodings for Neon data processing instructions where the T32 encoding
41
+
72
# is a simple transformation of the A32 encoding.
42
+ VABDL_S_3d 1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
73
@@ -XXX,XX +XXX,XX @@
43
+ VABDL_U_3d 1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
74
# 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
44
]
75
# This file works on the A32 encoding only; calling code for T32 has to
45
}
76
# transform the insn into the A32 version first.
77
+
78
+######################################################################
79
+# 3-reg-same grouping:
80
+# 1111 001 U 0 D sz:2 Vn:4 Vd:4 opc:4 N Q M op Vm:4
81
+######################################################################
82
+
83
+&3same vm vn vd q size
84
+
85
+@3same .... ... . . . size:2 .... .... .... . q:1 . . .... \
86
+ &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
87
+
88
+VADD_3s 1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
89
+VSUB_3s 1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
90
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
46
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
91
index XXXXXXX..XXXXXXX 100644
47
index XXXXXXX..XXXXXXX 100644
92
--- a/target/arm/translate-neon.inc.c
48
--- a/target/arm/translate-neon.inc.c
93
+++ b/target/arm/translate-neon.inc.c
49
+++ b/target/arm/translate-neon.inc.c
94
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
50
@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
95
51
DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
96
return true;
52
DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
97
}
53
DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
98
+
54
+
99
+static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
55
+static bool do_long_3d(DisasContext *s, arg_3diff *a,
100
+{
56
+ NeonGenTwoOpWidenFn *opfn,
101
+ int vec_size = a->q ? 16 : 8;
57
+ NeonGenTwo64OpFn *accfn)
102
+ int rd_ofs = neon_reg_offset(a->vd, 0);
58
+{
103
+ int rn_ofs = neon_reg_offset(a->vn, 0);
59
+ /*
104
+ int rm_ofs = neon_reg_offset(a->vm, 0);
60
+ * 3-regs different lengths, long operations.
61
+ * These perform an operation on two inputs that returns a double-width
62
+ * result, and then possibly perform an accumulation operation of
63
+ * that result into the double-width destination.
64
+ */
65
+ TCGv_i64 rd0, rd1, tmp;
66
+ TCGv_i32 rn, rm;
105
+
67
+
106
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
68
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
107
+ return false;
69
+ return false;
108
+ }
70
+ }
109
+
71
+
110
+ /* UNDEF accesses to D16-D31 if they don't exist. */
72
+ /* UNDEF accesses to D16-D31 if they don't exist. */
111
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
73
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
112
+ ((a->vd | a->vn | a->vm) & 0x10)) {
74
+ ((a->vd | a->vn | a->vm) & 0x10)) {
113
+ return false;
75
+ return false;
114
+ }
76
+ }
115
+
77
+
116
+ if ((a->vn | a->vm | a->vd) & a->q) {
78
+ if (!opfn) {
79
+ /* size == 3 case, which is an entirely different insn group */
80
+ return false;
81
+ }
82
+
83
+ if (a->vd & 1) {
117
+ return false;
84
+ return false;
118
+ }
85
+ }
119
+
86
+
120
+ if (!vfp_access_check(s)) {
87
+ if (!vfp_access_check(s)) {
121
+ return true;
88
+ return true;
122
+ }
89
+ }
123
+
90
+
124
+ fn(a->size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
91
+ rd0 = tcg_temp_new_i64();
92
+ rd1 = tcg_temp_new_i64();
93
+
94
+ rn = neon_load_reg(a->vn, 0);
95
+ rm = neon_load_reg(a->vm, 0);
96
+ opfn(rd0, rn, rm);
97
+ tcg_temp_free_i32(rn);
98
+ tcg_temp_free_i32(rm);
99
+
100
+ rn = neon_load_reg(a->vn, 1);
101
+ rm = neon_load_reg(a->vm, 1);
102
+ opfn(rd1, rn, rm);
103
+ tcg_temp_free_i32(rn);
104
+ tcg_temp_free_i32(rm);
105
+
106
+ /* Don't store results until after all loads: they might overlap */
107
+ if (accfn) {
108
+ tmp = tcg_temp_new_i64();
109
+ neon_load_reg64(tmp, a->vd);
110
+ accfn(tmp, tmp, rd0);
111
+ neon_store_reg64(tmp, a->vd);
112
+ neon_load_reg64(tmp, a->vd + 1);
113
+ accfn(tmp, tmp, rd1);
114
+ neon_store_reg64(tmp, a->vd + 1);
115
+ tcg_temp_free_i64(tmp);
116
+ } else {
117
+ neon_store_reg64(rd0, a->vd);
118
+ neon_store_reg64(rd1, a->vd + 1);
119
+ }
120
+
121
+ tcg_temp_free_i64(rd0);
122
+ tcg_temp_free_i64(rd1);
123
+
125
+ return true;
124
+ return true;
126
+}
125
+}
127
+
126
+
128
+#define DO_3SAME(INSN, FUNC) \
127
+static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
129
+ static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a) \
128
+{
130
+ { \
129
+ static NeonGenTwoOpWidenFn * const opfn[] = {
131
+ return do_3same(s, a, FUNC); \
130
+ gen_helper_neon_abdl_s16,
132
+ }
131
+ gen_helper_neon_abdl_s32,
133
+
132
+ gen_helper_neon_abdl_s64,
134
+DO_3SAME(VADD, tcg_gen_gvec_add)
133
+ NULL,
135
+DO_3SAME(VSUB, tcg_gen_gvec_sub)
134
+ };
135
+
136
+ return do_long_3d(s, a, opfn[a->size], NULL);
137
+}
138
+
139
+static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
140
+{
141
+ static NeonGenTwoOpWidenFn * const opfn[] = {
142
+ gen_helper_neon_abdl_u16,
143
+ gen_helper_neon_abdl_u32,
144
+ gen_helper_neon_abdl_u64,
145
+ NULL,
146
+ };
147
+
148
+ return do_long_3d(s, a, opfn[a->size], NULL);
149
+}
150
+
151
+static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
152
+{
153
+ static NeonGenTwoOpWidenFn * const opfn[] = {
154
+ gen_helper_neon_abdl_s16,
155
+ gen_helper_neon_abdl_s32,
156
+ gen_helper_neon_abdl_s64,
157
+ NULL,
158
+ };
159
+ static NeonGenTwo64OpFn * const addfn[] = {
160
+ gen_helper_neon_addl_u16,
161
+ gen_helper_neon_addl_u32,
162
+ tcg_gen_add_i64,
163
+ NULL,
164
+ };
165
+
166
+ return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
167
+}
168
+
169
+static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
170
+{
171
+ static NeonGenTwoOpWidenFn * const opfn[] = {
172
+ gen_helper_neon_abdl_u16,
173
+ gen_helper_neon_abdl_u32,
174
+ gen_helper_neon_abdl_u64,
175
+ NULL,
176
+ };
177
+ static NeonGenTwo64OpFn * const addfn[] = {
178
+ gen_helper_neon_addl_u16,
179
+ gen_helper_neon_addl_u32,
180
+ tcg_gen_add_i64,
181
+ NULL,
182
+ };
183
+
184
+ return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
185
+}
136
diff --git a/target/arm/translate.c b/target/arm/translate.c
186
diff --git a/target/arm/translate.c b/target/arm/translate.c
137
index XXXXXXX..XXXXXXX 100644
187
index XXXXXXX..XXXXXXX 100644
138
--- a/target/arm/translate.c
188
--- a/target/arm/translate.c
139
+++ b/target/arm/translate.c
189
+++ b/target/arm/translate.c
140
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
190
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
141
}
191
{0, 0, 0, 7}, /* VSUBL: handled by decodetree */
142
return 0;
192
{0, 0, 0, 7}, /* VSUBW: handled by decodetree */
143
193
{0, 0, 0, 7}, /* VADDHN: handled by decodetree */
144
- case NEON_3R_VADD_VSUB:
194
- {0, 0, 0, 0}, /* VABAL */
145
- if (u) {
195
+ {0, 0, 0, 7}, /* VABAL */
146
- tcg_gen_gvec_sub(size, rd_ofs, rn_ofs, rm_ofs,
196
{0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
147
- vec_size, vec_size);
197
- {0, 0, 0, 0}, /* VABDL */
148
- } else {
198
+ {0, 0, 0, 7}, /* VABDL */
149
- tcg_gen_gvec_add(size, rd_ofs, rn_ofs, rm_ofs,
199
{0, 0, 0, 0}, /* VMLAL */
150
- vec_size, vec_size);
200
{0, 0, 0, 9}, /* VQDMLAL */
151
- }
201
{0, 0, 0, 0}, /* VMLSL */
152
- return 0;
153
-
154
case NEON_3R_VQADD:
155
tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
156
rn_ofs, rm_ofs, vec_size, vec_size,
157
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
202
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
158
tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
203
tmp2 = neon_load_reg(rm, pass);
159
u ? &ushl_op[size] : &sshl_op[size]);
204
}
160
return 0;
205
switch (op) {
161
+
206
- case 5: case 7: /* VABAL, VABDL */
162
+ case NEON_3R_VADD_VSUB:
207
- switch ((size << 1) | u) {
163
+ /* Already handled by decodetree */
208
- case 0:
164
+ return 1;
209
- gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
165
}
210
- break;
166
211
- case 1:
167
if (size == 3) {
212
- gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
213
- break;
214
- case 2:
215
- gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
216
- break;
217
- case 3:
218
- gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
219
- break;
220
- case 4:
221
- gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
222
- break;
223
- case 5:
224
- gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
225
- break;
226
- default: abort();
227
- }
228
- tcg_temp_free_i32(tmp2);
229
- tcg_temp_free_i32(tmp);
230
- break;
231
case 8: case 9: case 10: case 11: case 12: case 13:
232
/* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
233
gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
234
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
235
case 10: /* VMLSL */
236
gen_neon_negl(cpu_V0, size);
237
/* Fall through */
238
- case 5: case 8: /* VABAL, VMLAL */
239
+ case 8: /* VABAL, VMLAL */
240
gen_neon_addl(size);
241
break;
242
case 9: case 11: /* VQDMLAL, VQDMLSL */
168
--
243
--
169
2.20.1
244
2.20.1
170
245
171
246
Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
3-reg-same grouping to decodetree.

Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.

Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.
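
(Illustration, not QEMU code: a standalone sketch of why accumulating VMLSL
with a direct subtraction is equivalent to the old negate-then-add, shown
for the 32x32->64 size. vmlsl_sub()/vmlsl_negate_add() are made-up names.)

#include <stdint.h>
#include <stdio.h>

/* One VMLSL lane at the 32x32->64 size: subtracting the product from the
 * accumulator gives the same result as negating the product and adding it,
 * so the accumulate step can just be a subtraction. */
static int64_t vmlsl_sub(int64_t acc, int32_t n, int32_t m)
{
    return acc - (int64_t)n * m;
}

static int64_t vmlsl_negate_add(int64_t acc, int32_t n, int32_t m)
{
    return acc + -((int64_t)n * m);
}

int main(void)
{
    int64_t acc = 1000;

    printf("%lld\n", (long long)vmlsl_sub(acc, 123456, -789));
    printf("%lld\n", (long long)vmlsl_negate_add(acc, 123456, -789)); /* identical */
    return 0;
}
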
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200430181003.21682-20-peter.maydell@linaro.org
7
---
9
---
8
target/arm/neon-dp.decode | 9 +++++++
10
target/arm/neon-dp.decode | 9 +++++
9
target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
11
target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
10
target/arm/translate.c | 28 +++------------------
12
target/arm/translate.c | 21 +++-------
11
3 files changed, 56 insertions(+), 25 deletions(-)
13
3 files changed, 86 insertions(+), 15 deletions(-)
12
14
13
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
15
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
14
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/neon-dp.decode
17
--- a/target/arm/neon-dp.decode
16
+++ b/target/arm/neon-dp.decode
18
+++ b/target/arm/neon-dp.decode
17
@@ -XXX,XX +XXX,XX @@ VCGT_U_3s 1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
19
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
18
VCGE_S_3s 1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
20
19
VCGE_U_3s 1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
21
VABDL_S_3d 1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
20
22
VABDL_U_3d 1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
21
+VSHL_S_3s 1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same
22
+VSHL_U_3s 1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same
23
+
23
+
24
VMAX_S_3s 1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
24
+ VMLAL_S_3d 1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
25
VMAX_U_3s 1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
25
+ VMLAL_U_3d 1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
26
VMIN_S_3s 1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
27
@@ -XXX,XX +XXX,XX @@ VSUB_3s 1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
28
29
VTST_3s 1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
30
VCEQ_3s 1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
31
+
26
+
32
+VMLA_3s 1111 001 0 0 . .. .... .... 1001 . . . 0 .... @3same
27
+ VMLSL_S_3d 1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
33
+VMLS_3s 1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
28
+ VMLSL_U_3d 1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
34
+
29
+
35
+VMUL_3s 1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
30
+ VMULL_S_3d 1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
36
+VMUL_p_3s 1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
31
+ VMULL_U_3d 1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
32
]
33
}
37
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
34
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
38
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
39
--- a/target/arm/translate-neon.inc.c
36
--- a/target/arm/translate-neon.inc.c
40
+++ b/target/arm/translate-neon.inc.c
37
+++ b/target/arm/translate-neon.inc.c
41
@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
38
@@ -XXX,XX +XXX,XX @@ static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
42
DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
39
43
DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
40
return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
44
DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
41
}
45
+DO_3SAME_NO_SZ_3(VMUL, tcg_gen_gvec_mul)
46
47
#define DO_3SAME_CMP(INSN, COND) \
48
static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
49
@@ -XXX,XX +XXX,XX @@ DO_3SAME_GVEC4(VQADD_S, sqadd_op)
50
DO_3SAME_GVEC4(VQADD_U, uqadd_op)
51
DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
52
DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
53
+
42
+
54
+static void gen_VMUL_p_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
43
+static void gen_mull_s32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
55
+ uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
56
+{
44
+{
57
+ tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz,
45
+ TCGv_i32 lo = tcg_temp_new_i32();
58
+ 0, gen_helper_gvec_pmul_b);
46
+ TCGv_i32 hi = tcg_temp_new_i32();
47
+
48
+ tcg_gen_muls2_i32(lo, hi, rn, rm);
49
+ tcg_gen_concat_i32_i64(rd, lo, hi);
50
+
51
+ tcg_temp_free_i32(lo);
52
+ tcg_temp_free_i32(hi);
59
+}
53
+}
60
+
54
+
61
+static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
55
+static void gen_mull_u32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
62
+{
56
+{
63
+ if (a->size != 0) {
57
+ TCGv_i32 lo = tcg_temp_new_i32();
64
+ return false;
58
+ TCGv_i32 hi = tcg_temp_new_i32();
65
+ }
59
+
66
+ return do_3same(s, a, gen_VMUL_p_3s);
60
+ tcg_gen_mulu2_i32(lo, hi, rn, rm);
61
+ tcg_gen_concat_i32_i64(rd, lo, hi);
62
+
63
+ tcg_temp_free_i32(lo);
64
+ tcg_temp_free_i32(hi);
67
+}
65
+}
68
+
66
+
69
+#define DO_3SAME_GVEC3_NO_SZ_3(INSN, OPARRAY) \
67
+static bool trans_VMULL_S_3d(DisasContext *s, arg_3diff *a)
70
+ static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
68
+{
71
+ uint32_t rn_ofs, uint32_t rm_ofs, \
69
+ static NeonGenTwoOpWidenFn * const opfn[] = {
72
+ uint32_t oprsz, uint32_t maxsz) \
70
+ gen_helper_neon_mull_s8,
71
+ gen_helper_neon_mull_s16,
72
+ gen_mull_s32,
73
+ NULL,
74
+ };
75
+
76
+ return do_long_3d(s, a, opfn[a->size], NULL);
77
+}
78
+
79
+static bool trans_VMULL_U_3d(DisasContext *s, arg_3diff *a)
80
+{
81
+ static NeonGenTwoOpWidenFn * const opfn[] = {
82
+ gen_helper_neon_mull_u8,
83
+ gen_helper_neon_mull_u16,
84
+ gen_mull_u32,
85
+ NULL,
86
+ };
87
+
88
+ return do_long_3d(s, a, opfn[a->size], NULL);
89
+}
90
+
91
+#define DO_VMLAL(INSN,MULL,ACC) \
92
+ static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a) \
73
+ { \
93
+ { \
74
+ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, \
94
+ static NeonGenTwoOpWidenFn * const opfn[] = { \
75
+ oprsz, maxsz, &OPARRAY[vece]); \
95
+ gen_helper_neon_##MULL##8, \
76
+ } \
96
+ gen_helper_neon_##MULL##16, \
77
+ DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
97
+ gen_##MULL##32, \
98
+ NULL, \
99
+ }; \
100
+ static NeonGenTwo64OpFn * const accfn[] = { \
101
+ gen_helper_neon_##ACC##l_u16, \
102
+ gen_helper_neon_##ACC##l_u32, \
103
+ tcg_gen_##ACC##_i64, \
104
+ NULL, \
105
+ }; \
106
+ return do_long_3d(s, a, opfn[a->size], accfn[a->size]); \
107
+ }
78
+
108
+
79
+
109
+DO_VMLAL(VMLAL_S,mull_s,add)
80
+DO_3SAME_GVEC3_NO_SZ_3(VMLA, mla_op)
110
+DO_VMLAL(VMLAL_U,mull_u,add)
81
+DO_3SAME_GVEC3_NO_SZ_3(VMLS, mls_op)
111
+DO_VMLAL(VMLSL_S,mull_s,sub)
82
+
112
+DO_VMLAL(VMLSL_U,mull_u,sub)
83
+#define DO_3SAME_GVEC3_SHIFT(INSN, OPARRAY) \
84
+ static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
85
+ uint32_t rn_ofs, uint32_t rm_ofs, \
86
+ uint32_t oprsz, uint32_t maxsz) \
87
+ { \
88
+ /* Note the operation is vshl vd,vm,vn */ \
89
+ tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, \
90
+ oprsz, maxsz, &OPARRAY[vece]); \
91
+ } \
92
+ DO_3SAME(INSN, gen_##INSN##_3s)
93
+
94
+DO_3SAME_GVEC3_SHIFT(VSHL_S, sshl_op)
95
+DO_3SAME_GVEC3_SHIFT(VSHL_U, ushl_op)
96
diff --git a/target/arm/translate.c b/target/arm/translate.c
113
diff --git a/target/arm/translate.c b/target/arm/translate.c
97
index XXXXXXX..XXXXXXX 100644
114
index XXXXXXX..XXXXXXX 100644
98
--- a/target/arm/translate.c
115
--- a/target/arm/translate.c
99
+++ b/target/arm/translate.c
116
+++ b/target/arm/translate.c
100
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
117
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
101
}
118
{0, 0, 0, 7}, /* VABAL */
102
return 1;
119
{0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
103
120
{0, 0, 0, 7}, /* VABDL */
104
- case NEON_3R_VMUL: /* VMUL */
121
- {0, 0, 0, 0}, /* VMLAL */
105
- if (u) {
122
+ {0, 0, 0, 7}, /* VMLAL */
106
- /* Polynomial case allows only P8. */
123
{0, 0, 0, 9}, /* VQDMLAL */
107
- if (size != 0) {
124
- {0, 0, 0, 0}, /* VMLSL */
108
- return 1;
125
+ {0, 0, 0, 7}, /* VMLSL */
109
- }
126
{0, 0, 0, 9}, /* VQDMLSL */
110
- tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
127
- {0, 0, 0, 0}, /* Integer VMULL */
111
- 0, gen_helper_gvec_pmul_b);
128
+ {0, 0, 0, 7}, /* Integer VMULL */
112
- } else {
129
{0, 0, 0, 9}, /* VQDMULL */
113
- tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs,
130
{0, 0, 0, 0xa}, /* Polynomial VMULL */
114
- vec_size, vec_size);
131
{0, 0, 0, 7}, /* Reserved: always UNDEF */
115
- }
116
- return 0;
117
-
118
- case NEON_3R_VML: /* VMLA, VMLS */
119
- tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
120
- u ? &mls_op[size] : &mla_op[size]);
121
- return 0;
122
-
123
- case NEON_3R_VSHL:
124
- /* Note the operation is vshl vd,vm,vn */
125
- tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
126
- u ? &ushl_op[size] : &sshl_op[size]);
127
- return 0;
128
-
129
case NEON_3R_VADD_VSUB:
130
case NEON_3R_LOGIC:
131
case NEON_3R_VMAX:
132
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
132
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
133
case NEON_3R_VCGE:
133
tmp2 = neon_load_reg(rm, pass);
134
case NEON_3R_VQADD:
134
}
135
case NEON_3R_VQSUB:
135
switch (op) {
136
+ case NEON_3R_VMUL:
136
- case 8: case 9: case 10: case 11: case 12: case 13:
137
+ case NEON_3R_VML:
137
- /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
138
+ case NEON_3R_VSHL:
138
+ case 9: case 11: case 13:
139
/* Already handled by decodetree */
139
+ /* VQDMLAL, VQDMLSL, VQDMULL */
140
return 1;
140
gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
141
}
141
break;
142
default: /* 15 is RESERVED: caught earlier */
143
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
144
/* VQDMULL */
145
gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
146
neon_store_reg64(cpu_V0, rd + pass);
147
- } else if (op == 5 || (op >= 8 && op <= 11)) {
148
+ } else {
149
/* Accumulate. */
150
neon_load_reg64(cpu_V1, rd + pass);
151
switch (op) {
152
- case 10: /* VMLSL */
153
- gen_neon_negl(cpu_V0, size);
154
- /* Fall through */
155
- case 8: /* VABAL, VMLAL */
156
- gen_neon_addl(size);
157
- break;
158
case 9: case 11: /* VQDMLAL, VQDMLSL */
159
gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
160
if (op == 11) {
161
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
162
abort();
163
}
164
neon_store_reg64(cpu_V0, rd + pass);
165
- } else {
166
- /* Write back the result. */
167
- neon_store_reg64(cpu_V0, rd + pass);
168
}
169
}
170
} else {
--
2.20.1

diff view generated by jsdifflib
Convert the Neon VQADD/VQSUB insns in the 3-reg-same grouping
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-19-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode | 6 ++++++
 target/arm/translate-neon.inc.c | 15 +++++++++++++++
 target/arm/translate.c | 14 ++------------
 3 files changed, 23 insertions(+), 12 deletions(-)

Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.

These are the last insns in the group which use the pass-over-each
elements loop, so we can delete that code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode | 6 +++
 target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
 target/arm/translate.c | 59 ++----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

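For readers unfamiliar with these operations, the per-lane behaviour of a
saturating doubling long multiply-accumulate can be sketched in plain C
along the following lines (an illustrative sketch only, not QEMU code; the
function names are invented):

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative only: one int16 lane of a VQDMLAL-style operation. */
    static int32_t sat_s32(int64_t v, int *qc)
    {
        if (v > INT32_MAX) { *qc = 1; return INT32_MAX; }
        if (v < INT32_MIN) { *qc = 1; return INT32_MIN; }
        return (int32_t)v;
    }

    static int32_t vqdmlal_lane(int32_t acc, int16_t n, int16_t m, int *qc)
    {
        /* doubling long multiply, saturated to the 32-bit result width */
        int32_t prod = sat_s32(2 * (int64_t)n * (int64_t)m, qc);
        /* saturating accumulate into the 32-bit destination lane */
        return sat_s32((int64_t)acc + prod, qc);
    }

    int main(void)
    {
        int qc = 0;
        printf("%d (qc=%d)\n", (int)vqdmlal_lane(100, -32768, -32768, &qc), qc);
        return 0;
    }
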
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
16
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
14
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/neon-dp.decode
18
--- a/target/arm/neon-dp.decode
16
+++ b/target/arm/neon-dp.decode
19
+++ b/target/arm/neon-dp.decode
17
@@ -XXX,XX +XXX,XX @@
20
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
18
@3same .... ... . . . size:2 .... .... .... . q:1 . . .... \
21
VMLAL_S_3d 1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
19
&3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
22
VMLAL_U_3d 1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
20
23
21
+VQADD_S_3s 1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
24
+ VQDMLAL_3d 1111 001 0 1 . .. .... .... 1001 . 0 . 0 .... @3diff
22
+VQADD_U_3s 1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
25
+
23
+
26
VMLSL_S_3d 1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
24
@3same_logic .... ... . . . .. .... .... .... . q:1 .. .... \
27
VMLSL_U_3d 1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
25
&3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
28
26
29
+ VQDMLSL_3d 1111 001 0 1 . .. .... .... 1011 . 0 . 0 .... @3diff
27
@@ -XXX,XX +XXX,XX @@ VBSL_3s 1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
30
+
28
VBIT_3s 1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
31
VMULL_S_3d 1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
29
VBIF_3s 1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
32
VMULL_U_3d 1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
30
33
+
31
+VQSUB_S_3s 1111 001 0 0 . .. .... .... 0010 . . . 1 .... @3same
34
+ VQDMULL_3d 1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
32
+VQSUB_U_3s 1111 001 1 0 . .. .... .... 0010 . . . 1 .... @3same
35
]
33
+
36
}
34
VCGT_S_3s 1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
35
VCGT_U_3s 1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
36
VCGE_S_3s 1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
37
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
37
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
38
index XXXXXXX..XXXXXXX 100644
38
index XXXXXXX..XXXXXXX 100644
39
--- a/target/arm/translate-neon.inc.c
39
--- a/target/arm/translate-neon.inc.c
40
+++ b/target/arm/translate-neon.inc.c
40
+++ b/target/arm/translate-neon.inc.c
41
@@ -XXX,XX +XXX,XX @@ static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
41
@@ -XXX,XX +XXX,XX @@ DO_VMLAL(VMLAL_S,mull_s,add)
42
tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
42
DO_VMLAL(VMLAL_U,mull_u,add)
43
}
43
DO_VMLAL(VMLSL_S,mull_s,sub)
44
DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
44
DO_VMLAL(VMLSL_U,mull_u,sub)
45
+
45
+
46
+#define DO_3SAME_GVEC4(INSN, OPARRAY) \
46
+static void gen_VQDMULL_16(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
47
+ static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
47
+{
48
+ uint32_t rn_ofs, uint32_t rm_ofs, \
48
+ gen_helper_neon_mull_s16(rd, rn, rm);
49
+ uint32_t oprsz, uint32_t maxsz) \
49
+ gen_helper_neon_addl_saturate_s32(rd, cpu_env, rd, rd);
50
+ { \
50
+}
51
+ tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc), \
51
+
52
+ rn_ofs, rm_ofs, oprsz, maxsz, &OPARRAY[vece]); \
52
+static void gen_VQDMULL_32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
53
+ } \
53
+{
54
+ DO_3SAME(INSN, gen_##INSN##_3s)
54
+ gen_mull_s32(rd, rn, rm);
55
+
55
+ gen_helper_neon_addl_saturate_s64(rd, cpu_env, rd, rd);
56
+DO_3SAME_GVEC4(VQADD_S, sqadd_op)
56
+}
57
+DO_3SAME_GVEC4(VQADD_U, uqadd_op)
57
+
58
+DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
58
+static bool trans_VQDMULL_3d(DisasContext *s, arg_3diff *a)
59
+DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
59
+{
60
+ static NeonGenTwoOpWidenFn * const opfn[] = {
61
+ NULL,
62
+ gen_VQDMULL_16,
63
+ gen_VQDMULL_32,
64
+ NULL,
65
+ };
66
+
67
+ return do_long_3d(s, a, opfn[a->size], NULL);
68
+}
69
+
70
+static void gen_VQDMLAL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
71
+{
72
+ gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
73
+}
74
+
75
+static void gen_VQDMLAL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
76
+{
77
+ gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
78
+}
79
+
80
+static bool trans_VQDMLAL_3d(DisasContext *s, arg_3diff *a)
81
+{
82
+ static NeonGenTwoOpWidenFn * const opfn[] = {
83
+ NULL,
84
+ gen_VQDMULL_16,
85
+ gen_VQDMULL_32,
86
+ NULL,
87
+ };
88
+ static NeonGenTwo64OpFn * const accfn[] = {
89
+ NULL,
90
+ gen_VQDMLAL_acc_16,
91
+ gen_VQDMLAL_acc_32,
92
+ NULL,
93
+ };
94
+
95
+ return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
96
+}
97
+
98
+static void gen_VQDMLSL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
99
+{
100
+ gen_helper_neon_negl_u32(rm, rm);
101
+ gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
102
+}
103
+
104
+static void gen_VQDMLSL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
105
+{
106
+ tcg_gen_neg_i64(rm, rm);
107
+ gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
108
+}
109
+
110
+static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
111
+{
112
+ static NeonGenTwoOpWidenFn * const opfn[] = {
113
+ NULL,
114
+ gen_VQDMULL_16,
115
+ gen_VQDMULL_32,
116
+ NULL,
117
+ };
118
+ static NeonGenTwo64OpFn * const accfn[] = {
119
+ NULL,
120
+ gen_VQDMLSL_acc_16,
121
+ gen_VQDMLSL_acc_32,
122
+ NULL,
123
+ };
124
+
125
+ return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
126
+}
60
diff --git a/target/arm/translate.c b/target/arm/translate.c
127
diff --git a/target/arm/translate.c b/target/arm/translate.c
61
index XXXXXXX..XXXXXXX 100644
128
index XXXXXXX..XXXXXXX 100644
62
--- a/target/arm/translate.c
129
--- a/target/arm/translate.c
63
+++ b/target/arm/translate.c
130
+++ b/target/arm/translate.c
64
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
131
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
65
}
132
{0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
66
return 1;
133
{0, 0, 0, 7}, /* VABDL */
67
134
{0, 0, 0, 7}, /* VMLAL */
68
- case NEON_3R_VQADD:
135
- {0, 0, 0, 9}, /* VQDMLAL */
69
- tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
136
+ {0, 0, 0, 7}, /* VQDMLAL */
70
- rn_ofs, rm_ofs, vec_size, vec_size,
137
{0, 0, 0, 7}, /* VMLSL */
71
- (u ? uqadd_op : sqadd_op) + size);
138
- {0, 0, 0, 9}, /* VQDMLSL */
72
- return 0;
139
+ {0, 0, 0, 7}, /* VQDMLSL */
140
{0, 0, 0, 7}, /* Integer VMULL */
141
- {0, 0, 0, 9}, /* VQDMULL */
142
+ {0, 0, 0, 7}, /* VQDMULL */
143
{0, 0, 0, 0xa}, /* Polynomial VMULL */
144
{0, 0, 0, 7}, /* Reserved: always UNDEF */
145
};
146
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
147
}
148
return 0;
149
}
73
-
150
-
74
- case NEON_3R_VQSUB:
151
- /* Avoid overlapping operands. Wide source operands are
75
- tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
152
- always aligned so will never overlap with wide
76
- rn_ofs, rm_ofs, vec_size, vec_size,
153
- destinations in problematic ways. */
77
- (u ? uqsub_op : sqsub_op) + size);
154
- if (rd == rm) {
78
- return 0;
155
- tmp = neon_load_reg(rm, 1);
79
-
156
- neon_store_scratch(2, tmp);
80
case NEON_3R_VMUL: /* VMUL */
157
- } else if (rd == rn) {
81
if (u) {
158
- tmp = neon_load_reg(rn, 1);
82
/* Polynomial case allows only P8. */
159
- neon_store_scratch(2, tmp);
83
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
160
- }
84
case NEON_3R_VTST_VCEQ:
161
- tmp3 = NULL;
85
case NEON_3R_VCGT:
162
- for (pass = 0; pass < 2; pass++) {
86
case NEON_3R_VCGE:
163
- if (pass == 1 && rd == rn) {
87
+ case NEON_3R_VQADD:
164
- tmp = neon_load_scratch(2);
88
+ case NEON_3R_VQSUB:
165
- } else {
89
/* Already handled by decodetree */
166
- tmp = neon_load_reg(rn, pass);
90
return 1;
167
- }
91
}
168
- if (pass == 1 && rd == rm) {
169
- tmp2 = neon_load_scratch(2);
170
- } else {
171
- tmp2 = neon_load_reg(rm, pass);
172
- }
173
- switch (op) {
174
- case 9: case 11: case 13:
175
- /* VQDMLAL, VQDMLSL, VQDMULL */
176
- gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
177
- break;
178
- default: /* 15 is RESERVED: caught earlier */
179
- abort();
180
- }
181
- if (op == 13) {
182
- /* VQDMULL */
183
- gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
184
- neon_store_reg64(cpu_V0, rd + pass);
185
- } else {
186
- /* Accumulate. */
187
- neon_load_reg64(cpu_V1, rd + pass);
188
- switch (op) {
189
- case 9: case 11: /* VQDMLAL, VQDMLSL */
190
- gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
191
- if (op == 11) {
192
- gen_neon_negl(cpu_V0, size);
193
- }
194
- gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
195
- break;
196
- default:
197
- abort();
198
- }
199
- neon_store_reg64(cpu_V0, rd + pass);
200
- }
201
- }
202
+ abort(); /* all others handled by decodetree */
203
} else {
204
/* Two registers and a scalar. NB that for ops of this form
205
* the ARM ARM labels bit 24 as Q, but it is in our variable
92
--
206
--
93
2.20.1
207
2.20.1
94
208
95
209
diff view generated by jsdifflib
Convert the Neon "load/store single structure to one lane" insns to
decodetree.

As this is the last set of insns in the neon load/store group,
we can remove the whole disas_neon_ls_insn() function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-14-peter.maydell@linaro.org
---
 target/arm/neon-ls.decode | 11 +++
 target/arm/translate-neon.inc.c | 89 +++++++++++++++++++
 target/arm/translate.c | 147 --------------------------------
 3 files changed, 100 insertions(+), 147 deletions(-)

Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode | 2 ++
 target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
 target/arm/translate.c | 60 ++-------------------------
 3 files changed, 48 insertions(+), 57 deletions(-)

16
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
12
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
17
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/neon-ls.decode
14
--- a/target/arm/neon-dp.decode
19
+++ b/target/arm/neon-ls.decode
15
+++ b/target/arm/neon-dp.decode
20
@@ -XXX,XX +XXX,XX @@ VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
16
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
21
17
VMULL_U_3d 1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
22
VLD_all_lanes 1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
18
23
vd=%vd_dp
19
VQDMULL_3d 1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
24
+
20
+
25
+# Neon load/store single structure to one lane
21
+ VMULL_P_3d 1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
26
+%imm1_5_p1 5:1 !function=plus1
22
]
27
+%imm1_6_p1 6:1 !function=plus1
23
}
28
+
29
+VLDST_single 1111 0100 1 . l:1 0 rn:4 .... 00 n:2 reg_idx:3 align:1 rm:4 \
30
+ vd=%vd_dp size=0 stride=1
31
+VLDST_single 1111 0100 1 . l:1 0 rn:4 .... 01 n:2 reg_idx:2 align:2 rm:4 \
32
+ vd=%vd_dp size=1 stride=%imm1_5_p1
33
+VLDST_single 1111 0100 1 . l:1 0 rn:4 .... 10 n:2 reg_idx:1 align:3 rm:4 \
34
+ vd=%vd_dp size=2 stride=%imm1_6_p1
35
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
24
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
36
index XXXXXXX..XXXXXXX 100644
25
index XXXXXXX..XXXXXXX 100644
37
--- a/target/arm/translate-neon.inc.c
26
--- a/target/arm/translate-neon.inc.c
38
+++ b/target/arm/translate-neon.inc.c
27
+++ b/target/arm/translate-neon.inc.c
39
@@ -XXX,XX +XXX,XX @@
28
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
40
* It might be possible to convert it to a standalone .c file eventually.
29
41
*/
30
return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
42
43
+static inline int plus1(DisasContext *s, int x)
44
+{
45
+ return x + 1;
46
+}
47
+
48
/* Include the generated Neon decoder */
49
#include "decode-neon-dp.inc.c"
50
#include "decode-neon-ls.inc.c"
51
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
52
53
return true;
54
}
31
}
55
+
32
+
56
+static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
33
+static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
57
+{
34
+{
58
+ /* Neon load/store single structure to one lane */
35
+ gen_helper_gvec_3 *fn_gvec;
59
+ int reg;
60
+ int nregs = a->n + 1;
61
+ int vd = a->vd;
62
+ TCGv_i32 addr, tmp;
63
+
36
+
64
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
37
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
65
+ return false;
38
+ return false;
66
+ }
39
+ }
67
+
40
+
68
+ /* UNDEF accesses to D16-D31 if they don't exist */
41
+ /* UNDEF accesses to D16-D31 if they don't exist. */
69
+ if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
42
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
43
+ ((a->vd | a->vn | a->vm) & 0x10)) {
70
+ return false;
44
+ return false;
71
+ }
45
+ }
72
+
46
+
73
+ /* Catch the UNDEF cases. This is unavoidably a bit messy. */
47
+ if (a->vd & 1) {
74
+ switch (nregs) {
48
+ return false;
75
+ case 1:
49
+ }
76
+ if (((a->align & (1 << a->size)) != 0) ||
50
+
77
+ (a->size == 2 && ((a->align & 3) == 1 || (a->align & 3) == 2))) {
51
+ switch (a->size) {
52
+ case 0:
53
+ fn_gvec = gen_helper_neon_pmull_h;
54
+ break;
55
+ case 2:
56
+ if (!dc_isar_feature(aa32_pmull, s)) {
78
+ return false;
57
+ return false;
79
+ }
58
+ }
80
+ break;
59
+ fn_gvec = gen_helper_gvec_pmull_q;
81
+ case 3:
82
+ if ((a->align & 1) != 0) {
83
+ return false;
84
+ }
85
+ /* fall through */
86
+ case 2:
87
+ if (a->size == 2 && (a->align & 2) != 0) {
88
+ return false;
89
+ }
90
+ break;
91
+ case 4:
92
+ if ((a->size == 2) && ((a->align & 3) == 3)) {
93
+ return false;
94
+ }
95
+ break;
60
+ break;
96
+ default:
61
+ default:
97
+ abort();
98
+ }
99
+ if ((vd + a->stride * (nregs - 1)) > 31) {
100
+ /*
101
+ * Attempts to write off the end of the register file are
102
+ * UNPREDICTABLE; we choose to UNDEF because otherwise we would
103
+ * access off the end of the array that holds the register data.
104
+ */
105
+ return false;
62
+ return false;
106
+ }
63
+ }
107
+
64
+
108
+ if (!vfp_access_check(s)) {
65
+ if (!vfp_access_check(s)) {
109
+ return true;
66
+ return true;
110
+ }
67
+ }
111
+
68
+
112
+ tmp = tcg_temp_new_i32();
69
+ tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
113
+ addr = tcg_temp_new_i32();
70
+ neon_reg_offset(a->vn, 0),
114
+ load_reg_var(s, addr, a->rn);
71
+ neon_reg_offset(a->vm, 0),
115
+ /*
72
+ 16, 16, 0, fn_gvec);
116
+ * TODO: if we implemented alignment exceptions, we should check
117
+ * addr against the alignment encoded in a->align here.
118
+ */
119
+ for (reg = 0; reg < nregs; reg++) {
120
+ if (a->l) {
121
+ gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
122
+ s->be_data | a->size);
123
+ neon_store_element(vd, a->reg_idx, a->size, tmp);
124
+ } else { /* Store */
125
+ neon_load_element(tmp, vd, a->reg_idx, a->size);
126
+ gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
127
+ s->be_data | a->size);
128
+ }
129
+ vd += a->stride;
130
+ tcg_gen_addi_i32(addr, addr, 1 << a->size);
131
+ }
132
+ tcg_temp_free_i32(addr);
133
+ tcg_temp_free_i32(tmp);
134
+
135
+ gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << a->size) * nregs);
136
+
137
+ return true;
73
+ return true;
138
+}
74
+}
139
diff --git a/target/arm/translate.c b/target/arm/translate.c
75
diff --git a/target/arm/translate.c b/target/arm/translate.c
140
index XXXXXXX..XXXXXXX 100644
76
index XXXXXXX..XXXXXXX 100644
141
--- a/target/arm/translate.c
77
--- a/target/arm/translate.c
142
+++ b/target/arm/translate.c
78
+++ b/target/arm/translate.c
143
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
79
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
144
tcg_temp_free_i32(rd);
80
{
145
}
81
int op;
146
82
int q;
83
- int rd, rn, rm, rd_ofs, rn_ofs, rm_ofs;
84
+ int rd, rn, rm, rd_ofs, rm_ofs;
85
int size;
86
int pass;
87
int u;
88
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
89
size = (insn >> 20) & 3;
90
vec_size = q ? 16 : 8;
91
rd_ofs = neon_reg_offset(rd, 0);
92
- rn_ofs = neon_reg_offset(rn, 0);
93
rm_ofs = neon_reg_offset(rm, 0);
94
95
if ((insn & (1 << 23)) == 0) {
96
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
97
if (size != 3) {
98
op = (insn >> 8) & 0xf;
99
if ((insn & (1 << 6)) == 0) {
100
- /* Three registers of different lengths. */
101
- /* undefreq: bit 0 : UNDEF if size == 0
102
- * bit 1 : UNDEF if size == 1
103
- * bit 2 : UNDEF if size == 2
104
- * bit 3 : UNDEF if U == 1
105
- * Note that [2:0] set implies 'always UNDEF'
106
- */
107
- int undefreq;
108
- /* prewiden, src1_wide, src2_wide, undefreq */
109
- static const int neon_3reg_wide[16][4] = {
110
- {0, 0, 0, 7}, /* VADDL: handled by decodetree */
111
- {0, 0, 0, 7}, /* VADDW: handled by decodetree */
112
- {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
113
- {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
114
- {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
115
- {0, 0, 0, 7}, /* VABAL */
116
- {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
117
- {0, 0, 0, 7}, /* VABDL */
118
- {0, 0, 0, 7}, /* VMLAL */
119
- {0, 0, 0, 7}, /* VQDMLAL */
120
- {0, 0, 0, 7}, /* VMLSL */
121
- {0, 0, 0, 7}, /* VQDMLSL */
122
- {0, 0, 0, 7}, /* Integer VMULL */
123
- {0, 0, 0, 7}, /* VQDMULL */
124
- {0, 0, 0, 0xa}, /* Polynomial VMULL */
125
- {0, 0, 0, 7}, /* Reserved: always UNDEF */
126
- };
147
-
127
-
148
-/* Translate a NEON load/store element instruction. Return nonzero if the
128
- undefreq = neon_3reg_wide[op][3];
149
- instruction is invalid. */
150
-static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
151
-{
152
- int rd, rn, rm;
153
- int nregs;
154
- int stride;
155
- int size;
156
- int reg;
157
- int load;
158
- TCGv_i32 addr;
159
- TCGv_i32 tmp;
160
-
129
-
161
- if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
130
- if ((undefreq & (1 << size)) ||
162
- return 1;
131
- ((undefreq & 8) && u)) {
163
- }
164
-
165
- /* FIXME: this access check should not take precedence over UNDEF
166
- * for invalid encodings; we will generate incorrect syndrome information
167
- * for attempts to execute invalid vfp/neon encodings with FP disabled.
168
- */
169
- if (s->fp_excp_el) {
170
- gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
171
- syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
172
- return 0;
173
- }
174
-
175
- if (!s->vfp_enabled)
176
- return 1;
177
- VFP_DREG_D(rd, insn);
178
- rn = (insn >> 16) & 0xf;
179
- rm = insn & 0xf;
180
- load = (insn & (1 << 21)) != 0;
181
- if ((insn & (1 << 23)) == 0) {
182
- /* Load store all elements -- handled already by decodetree */
183
- return 1;
184
- } else {
185
- size = (insn >> 10) & 3;
186
- if (size == 3) {
187
- /* Load single element to all lanes -- handled by decodetree */
188
- return 1;
189
- } else {
190
- /* Single element. */
191
- int idx = (insn >> 4) & 0xf;
192
- int reg_idx;
193
- switch (size) {
194
- case 0:
195
- reg_idx = (insn >> 5) & 7;
196
- stride = 1;
197
- break;
198
- case 1:
199
- reg_idx = (insn >> 6) & 3;
200
- stride = (insn & (1 << 5)) ? 2 : 1;
201
- break;
202
- case 2:
203
- reg_idx = (insn >> 7) & 1;
204
- stride = (insn & (1 << 6)) ? 2 : 1;
205
- break;
206
- default:
207
- abort();
208
- }
209
- nregs = ((insn >> 8) & 3) + 1;
210
- /* Catch the UNDEF cases. This is unavoidably a bit messy. */
211
- switch (nregs) {
212
- case 1:
213
- if (((idx & (1 << size)) != 0) ||
214
- (size == 2 && ((idx & 3) == 1 || (idx & 3) == 2))) {
215
- return 1;
132
- return 1;
216
- }
133
- }
217
- break;
134
- if (rd & 1) {
218
- case 3:
219
- if ((idx & 1) != 0) {
220
- return 1;
135
- return 1;
221
- }
136
- }
222
- /* fall through */
137
-
223
- case 2:
138
- /* Handle polynomial VMULL in a single pass. */
224
- if (size == 2 && (idx & 2) != 0) {
139
- if (op == 14) {
225
- return 1;
140
- if (size == 0) {
141
- /* VMULL.P8 */
142
- tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
143
- 0, gen_helper_neon_pmull_h);
144
- } else {
145
- /* VMULL.P64 */
146
- if (!dc_isar_feature(aa32_pmull, s)) {
147
- return 1;
148
- }
149
- tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
150
- 0, gen_helper_gvec_pmull_q);
151
- }
152
- return 0;
226
- }
153
- }
227
- break;
154
- abort(); /* all others handled by decodetree */
228
- case 4:
155
+ /* Three registers of different lengths: handled by decodetree */
229
- if ((size == 2) && ((idx & 3) == 3)) {
156
+ return 1;
230
- return 1;
157
} else {
231
- }
158
/* Two registers and a scalar. NB that for ops of this form
232
- break;
159
* the ARM ARM labels bit 24 as Q, but it is in our variable
233
- default:
234
- abort();
235
- }
236
- if ((rd + stride * (nregs - 1)) > 31) {
237
- /* Attempts to write off the end of the register file
238
- * are UNPREDICTABLE; we choose to UNDEF because otherwise
239
- * the neon_load_reg() would write off the end of the array.
240
- */
241
- return 1;
242
- }
243
- tmp = tcg_temp_new_i32();
244
- addr = tcg_temp_new_i32();
245
- load_reg_var(s, addr, rn);
246
- for (reg = 0; reg < nregs; reg++) {
247
- if (load) {
248
- gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
249
- s->be_data | size);
250
- neon_store_element(rd, reg_idx, size, tmp);
251
- } else { /* Store */
252
- neon_load_element(tmp, rd, reg_idx, size);
253
- gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
254
- s->be_data | size);
255
- }
256
- rd += stride;
257
- tcg_gen_addi_i32(addr, addr, 1 << size);
258
- }
259
- tcg_temp_free_i32(addr);
260
- tcg_temp_free_i32(tmp);
261
- stride = nregs * (1 << size);
262
- }
263
- }
264
- if (rm != 15) {
265
- TCGv_i32 base;
266
-
267
- base = load_reg(s, rn);
268
- if (rm == 13) {
269
- tcg_gen_addi_i32(base, base, stride);
270
- } else {
271
- TCGv_i32 index;
272
- index = load_reg(s, rm);
273
- tcg_gen_add_i32(base, base, index);
274
- tcg_temp_free_i32(index);
275
- }
276
- store_reg(s, rn, base);
277
- }
278
- return 0;
279
-}
280
-
281
static inline void gen_neon_narrow(int size, TCGv_i32 dest, TCGv_i64 src)
282
{
283
switch (size) {
284
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
285
}
286
return;
287
}
288
- if ((insn & 0x0f100000) == 0x04000000) {
289
- /* NEON load/store. */
290
- if (disas_neon_ls_insn(s, insn)) {
291
- goto illegal_op;
292
- }
293
- return;
294
- }
295
if ((insn & 0x0e000f00) == 0x0c000100) {
296
if (arm_dc_feature(s, ARM_FEATURE_IWMMXT)) {
297
/* iWMMXt register transfer. */
298
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
299
}
300
break;
301
case 12:
302
- if ((insn & 0x01100000) == 0x01000000) {
303
- if (disas_neon_ls_insn(s, insn)) {
304
- goto illegal_op;
305
- }
306
- break;
307
- }
308
goto illegal_op;
309
default:
310
illegal_op:
311
--
160
--
312
2.20.1
161
2.20.1
313
162
314
163
diff view generated by jsdifflib
Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
Note that for the logic ops the 'size' field forms part of their
decode and the actual operations are always bitwise.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-16-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode | 12 +++++++++++
 target/arm/translate-neon.inc.c | 19 +++++++++++++++++
 target/arm/translate.c | 38 +--------------------------------
 3 files changed, 32 insertions(+), 37 deletions(-)

Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-neon.inc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

14
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/neon-dp.decode
17
+++ b/target/arm/neon-dp.decode
18
@@ -XXX,XX +XXX,XX @@
19
@3same .... ... . . . size:2 .... .... .... . q:1 . . .... \
20
&3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
21
22
+@3same_logic .... ... . . . .. .... .... .... . q:1 .. .... \
23
+ &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
24
+
25
+VAND_3s 1111 001 0 0 . 00 .... .... 0001 ... 1 .... @3same_logic
26
+VBIC_3s 1111 001 0 0 . 01 .... .... 0001 ... 1 .... @3same_logic
27
+VORR_3s 1111 001 0 0 . 10 .... .... 0001 ... 1 .... @3same_logic
28
+VORN_3s 1111 001 0 0 . 11 .... .... 0001 ... 1 .... @3same_logic
29
+VEOR_3s 1111 001 1 0 . 00 .... .... 0001 ... 1 .... @3same_logic
30
+VBSL_3s 1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
31
+VBIT_3s 1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
32
+VBIF_3s 1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
33
+
34
VADD_3s 1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
35
VSUB_3s 1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
36
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
10
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
37
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
38
--- a/target/arm/translate-neon.inc.c
12
--- a/target/arm/translate-neon.inc.c
39
+++ b/target/arm/translate-neon.inc.c
13
+++ b/target/arm/translate-neon.inc.c
40
@@ -XXX,XX +XXX,XX @@ static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
14
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
41
15
42
DO_3SAME(VADD, tcg_gen_gvec_add)
16
static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
43
DO_3SAME(VSUB, tcg_gen_gvec_sub)
17
{
44
+DO_3SAME(VAND, tcg_gen_gvec_and)
18
- NeonGenWidenFn *widenfn[] = {
45
+DO_3SAME(VBIC, tcg_gen_gvec_andc)
19
+ static NeonGenWidenFn * const widenfn[] = {
46
+DO_3SAME(VORR, tcg_gen_gvec_or)
20
gen_helper_neon_widen_s8,
47
+DO_3SAME(VORN, tcg_gen_gvec_orc)
21
gen_helper_neon_widen_s16,
48
+DO_3SAME(VEOR, tcg_gen_gvec_xor)
22
tcg_gen_ext_i32_i64,
49
+
23
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
50
+/* These insns are all gvec_bitsel but with the inputs in various orders. */
24
51
+#define DO_3SAME_BITSEL(INSN, O1, O2, O3) \
25
static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
52
+ static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
26
{
53
+ uint32_t rn_ofs, uint32_t rm_ofs, \
27
- NeonGenWidenFn *widenfn[] = {
54
+ uint32_t oprsz, uint32_t maxsz) \
28
+ static NeonGenWidenFn * const widenfn[] = {
55
+ { \
29
gen_helper_neon_widen_u8,
56
+ tcg_gen_gvec_bitsel(vece, rd_ofs, O1, O2, O3, oprsz, maxsz); \
30
gen_helper_neon_widen_u16,
57
+ } \
31
tcg_gen_extu_i32_i64,
58
+ DO_3SAME(INSN, gen_##INSN##_3s)
59
+
60
+DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
61
+DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
62
+DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
63
diff --git a/target/arm/translate.c b/target/arm/translate.c
64
index XXXXXXX..XXXXXXX 100644
65
--- a/target/arm/translate.c
66
+++ b/target/arm/translate.c
67
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
68
}
69
return 1;
70
71
- case NEON_3R_LOGIC: /* Logic ops. */
72
- switch ((u << 2) | size) {
73
- case 0: /* VAND */
74
- tcg_gen_gvec_and(0, rd_ofs, rn_ofs, rm_ofs,
75
- vec_size, vec_size);
76
- break;
77
- case 1: /* VBIC */
78
- tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
79
- vec_size, vec_size);
80
- break;
81
- case 2: /* VORR */
82
- tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
83
- vec_size, vec_size);
84
- break;
85
- case 3: /* VORN */
86
- tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,
87
- vec_size, vec_size);
88
- break;
89
- case 4: /* VEOR */
90
- tcg_gen_gvec_xor(0, rd_ofs, rn_ofs, rm_ofs,
91
- vec_size, vec_size);
92
- break;
93
- case 5: /* VBSL */
94
- tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
95
- vec_size, vec_size);
96
- break;
97
- case 6: /* VBIT */
98
- tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
99
- vec_size, vec_size);
100
- break;
101
- case 7: /* VBIF */
102
- tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
103
- vec_size, vec_size);
104
- break;
105
- }
106
- return 0;
107
-
108
case NEON_3R_VQADD:
109
tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
110
rn_ofs, rm_ofs, vec_size, vec_size,
111
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
112
return 0;
113
114
case NEON_3R_VADD_VSUB:
115
+ case NEON_3R_LOGIC:
116
/* Already handled by decodetree */
117
return 1;
118
}
119
--
32
--
120
2.20.1
33
2.20.1
121
34
122
35
diff view generated by jsdifflib
Convert the V[US]DOT (scalar) insns in the 2reg-scalar-ext group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-10-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode | 3 +++
 target/arm/translate-neon.inc.c | 35 +++++++++++++++++++++++++++++++++
 target/arm/translate.c | 13 +-----------
 3 files changed, 39 insertions(+), 12 deletions(-)

In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
temporary in do_2shift_env_64(); free it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-neon.inc.c | 1 +
 1 file changed, 1 insertion(+)

13
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/neon-shared.decode
16
+++ b/target/arm/neon-shared.decode
17
@@ -XXX,XX +XXX,XX @@ VCMLA_scalar 1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
18
vn=%vn_dp vd=%vd_dp size=0
19
VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
20
vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
21
+
22
+VDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
23
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
24
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
10
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
25
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
26
--- a/target/arm/translate-neon.inc.c
12
--- a/target/arm/translate-neon.inc.c
27
+++ b/target/arm/translate-neon.inc.c
13
+++ b/target/arm/translate-neon.inc.c
28
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
14
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
29
tcg_temp_free_ptr(fpst);
15
neon_load_reg64(tmp, a->vm + pass);
16
fn(tmp, cpu_env, tmp, constimm);
17
neon_store_reg64(tmp, a->vd + pass);
18
+ tcg_temp_free_i64(tmp);
19
}
20
tcg_temp_free_i64(constimm);
30
return true;
21
return true;
31
}
32
+
33
+static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
34
+{
35
+ gen_helper_gvec_3 *fn_gvec;
36
+ int opr_sz;
37
+ TCGv_ptr fpst;
38
+
39
+ if (!dc_isar_feature(aa32_dp, s)) {
40
+ return false;
41
+ }
42
+
43
+ /* UNDEF accesses to D16-D31 if they don't exist. */
44
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
45
+ ((a->vd | a->vn) & 0x10)) {
46
+ return false;
47
+ }
48
+
49
+ if ((a->vd | a->vn) & a->q) {
50
+ return false;
51
+ }
52
+
53
+ if (!vfp_access_check(s)) {
54
+ return true;
55
+ }
56
+
57
+ fn_gvec = a->u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
58
+ opr_sz = (1 + a->q) * 8;
59
+ fpst = get_fpstatus_ptr(1);
60
+ tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
61
+ vfp_reg_offset(1, a->vn),
62
+ vfp_reg_offset(1, a->rm),
63
+ opr_sz, opr_sz, a->index, fn_gvec);
64
+ tcg_temp_free_ptr(fpst);
65
+ return true;
66
+}
67
diff --git a/target/arm/translate.c b/target/arm/translate.c
68
index XXXXXXX..XXXXXXX 100644
69
--- a/target/arm/translate.c
70
+++ b/target/arm/translate.c
71
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
72
bool is_long = false, q = extract32(insn, 6, 1);
73
bool ptr_is_env = false;
74
75
- if ((insn & 0xffb00f00) == 0xfe200d00) {
76
- /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
77
- int u = extract32(insn, 4, 1);
78
-
79
- if (!dc_isar_feature(aa32_dp, s)) {
80
- return 1;
81
- }
82
- fn_gvec = u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
83
- /* rm is just Vm, and index is M. */
84
- data = extract32(insn, 5, 1); /* index */
85
- rm = extract32(insn, 0, 4);
86
- } else if ((insn & 0xffa00f10) == 0xfe000810) {
87
+ if ((insn & 0xffa00f10) == 0xfe000810) {
88
/* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
89
int is_s = extract32(insn, 20, 1);
90
int vm20 = extract32(insn, 0, 3);
91
--
22
--
92
2.20.1
23
2.20.1
93
24
94
25
diff view generated by jsdifflib
Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
to decodetree. These are the last ones in the group so we can remove
all the legacy decode for the group.

Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
into disas_coproc_insn() by mistake.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-11-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode | 7 +++
 target/arm/translate-neon.inc.c | 32 ++++++++++
 target/arm/translate.c | 107 +-------------------------------
 3 files changed, 40 insertions(+), 106 deletions(-)

Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree. These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.

The refactoring removes some of the oddities of the old decoder:
 * operands to the operation and accumulation were often
   reversed (taking advantage of the fact that most of these ops
   are commutative); the new code follows the pseudocode order
 * the Q bit in the insn was in a local variable 'u'; in the
   new code it is decoded into a->q

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode | 15 ++++
 target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
 target/arm/translate.c | 77 ++----------------
 3 files changed, 154 insertions(+), 71 deletions(-)

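The per-element pattern described above (multiply each element by the
scalar, then optionally accumulate the 32-bit result into the destination)
corresponds roughly to the following plain-C sketch (illustrative only, not
the QEMU implementation; the function name is invented):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /*
     * Illustrative only: a VMLA-by-scalar style loop over 32-bit lanes.
     * Unsigned arithmetic is used because both the 32x32->32 multiply and
     * the accumulate wrap modulo 2^32, which gives the same low 32 bits
     * whether the lanes are viewed as signed or unsigned.
     */
    static void vmla_scalar_u32(uint32_t *vd, const uint32_t *vn,
                                uint32_t scalar, size_t lanes)
    {
        for (size_t i = 0; i < lanes; i++) {
            vd[i] += vn[i] * scalar;  /* multiply by the scalar, then accumulate */
        }
    }

    int main(void)
    {
        uint32_t d[2] = { 1, 2 }, n[2] = { 10, 20 };
        vmla_scalar_u32(d, n, 3, 2);
        printf("%u %u\n", (unsigned)d[0], (unsigned)d[1]);  /* 31 62 */
        return 0;
    }
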
19
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
21
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
20
index XXXXXXX..XXXXXXX 100644
22
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/neon-shared.decode
23
--- a/target/arm/neon-dp.decode
22
+++ b/target/arm/neon-shared.decode
24
+++ b/target/arm/neon-dp.decode
23
@@ -XXX,XX +XXX,XX @@ VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
25
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
24
26
VQDMULL_3d 1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
25
VDOT_scalar 1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
27
26
vm=%vm_dp vn=%vn_dp vd=%vd_dp
28
VMULL_P_3d 1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
27
+
29
+
28
+%vfml_scalar_q0_rm 0:3 5:1
30
+ ##################################################################
29
+%vfml_scalar_q1_index 5:1 3:1
31
+ # 2-regs-plus-scalar grouping:
30
+VFML_scalar 1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
32
+ # 1111 001 Q 1 D sz!=11 Vn:4 Vd:4 opc:4 N 1 M 0 Vm:4
31
+ rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
33
+ ##################################################################
32
+VFML_scalar 1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
34
+ &2scalar vm vn vd size q
33
+ index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
35
+
36
+ @2scalar .... ... q:1 . . size:2 .... .... .... . . . . .... \
37
+ &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
38
+
39
+ VMLA_2sc 1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
40
+
41
+ VMLS_2sc 1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
42
+
43
+ VMUL_2sc 1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
44
]
45
}
34
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
46
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
35
index XXXXXXX..XXXXXXX 100644
47
index XXXXXXX..XXXXXXX 100644
36
--- a/target/arm/translate-neon.inc.c
48
--- a/target/arm/translate-neon.inc.c
37
+++ b/target/arm/translate-neon.inc.c
49
+++ b/target/arm/translate-neon.inc.c
38
@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
50
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
39
tcg_temp_free_ptr(fpst);
51
16, 16, 0, fn_gvec);
40
return true;
52
return true;
41
}
53
}
42
+
54
+
43
+static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
55
+static void gen_neon_dup_low16(TCGv_i32 var)
44
+{
56
+{
45
+ int opr_sz;
57
+ TCGv_i32 tmp = tcg_temp_new_i32();
46
+
58
+ tcg_gen_ext16u_i32(var, var);
47
+ if (!dc_isar_feature(aa32_fhm, s)) {
59
+ tcg_gen_shli_i32(tmp, var, 16);
60
+ tcg_gen_or_i32(var, var, tmp);
61
+ tcg_temp_free_i32(tmp);
62
+}
63
+
64
+static void gen_neon_dup_high16(TCGv_i32 var)
65
+{
66
+ TCGv_i32 tmp = tcg_temp_new_i32();
67
+ tcg_gen_andi_i32(var, var, 0xffff0000);
68
+ tcg_gen_shri_i32(tmp, var, 16);
69
+ tcg_gen_or_i32(var, var, tmp);
70
+ tcg_temp_free_i32(tmp);
71
+}
72
+
73
+static inline TCGv_i32 neon_get_scalar(int size, int reg)
74
+{
75
+ TCGv_i32 tmp;
76
+ if (size == 1) {
77
+ tmp = neon_load_reg(reg & 7, reg >> 4);
78
+ if (reg & 8) {
79
+ gen_neon_dup_high16(tmp);
80
+ } else {
81
+ gen_neon_dup_low16(tmp);
82
+ }
83
+ } else {
84
+ tmp = neon_load_reg(reg & 15, reg >> 4);
85
+ }
86
+ return tmp;
87
+}
88
+
89
+static bool do_2scalar(DisasContext *s, arg_2scalar *a,
90
+ NeonGenTwoOpFn *opfn, NeonGenTwoOpFn *accfn)
91
+{
92
+ /*
93
+ * Two registers and a scalar: perform an operation between
94
+ * the input elements and the scalar, and then possibly
95
+ * perform an accumulation operation of that result into the
96
+ * destination.
97
+ */
98
+ TCGv_i32 scalar;
99
+ int pass;
100
+
101
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
48
+ return false;
102
+ return false;
49
+ }
103
+ }
50
+
104
+
51
+ /* UNDEF accesses to D16-D31 if they don't exist. */
105
+ /* UNDEF accesses to D16-D31 if they don't exist. */
52
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
106
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
53
+ ((a->vd & 0x10) || (a->q && (a->vn & 0x10)))) {
107
+ ((a->vd | a->vn | a->vm) & 0x10)) {
54
+ return false;
108
+ return false;
55
+ }
109
+ }
56
+
110
+
57
+ if (a->vd & a->q) {
111
+ if (!opfn) {
112
+ /* Bad size (including size == 3, which is a different insn group) */
113
+ return false;
114
+ }
115
+
116
+ if (a->q && ((a->vd | a->vn) & 1)) {
58
+ return false;
117
+ return false;
59
+ }
118
+ }
60
+
119
+
61
+ if (!vfp_access_check(s)) {
120
+ if (!vfp_access_check(s)) {
62
+ return true;
121
+ return true;
63
+ }
122
+ }
64
+
123
+
65
+ opr_sz = (1 + a->q) * 8;
124
+ scalar = neon_get_scalar(a->size, a->vm);
66
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
125
+
67
+ vfp_reg_offset(a->q, a->vn),
126
+ for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
68
+ vfp_reg_offset(a->q, a->rm),
127
+ TCGv_i32 tmp = neon_load_reg(a->vn, pass);
69
+ cpu_env, opr_sz, opr_sz,
128
+ opfn(tmp, tmp, scalar);
70
+ (a->index << 2) | a->s, /* is_2 == 0 */
129
+ if (accfn) {
71
+ gen_helper_gvec_fmlal_idx_a32);
130
+ TCGv_i32 rd = neon_load_reg(a->vd, pass);
131
+ accfn(tmp, rd, tmp);
132
+ tcg_temp_free_i32(rd);
133
+ }
134
+ neon_store_reg(a->vd, pass, tmp);
135
+ }
136
+ tcg_temp_free_i32(scalar);
72
+ return true;
137
+ return true;
138
+}
139
+
140
+static bool trans_VMUL_2sc(DisasContext *s, arg_2scalar *a)
141
+{
142
+ static NeonGenTwoOpFn * const opfn[] = {
143
+ NULL,
144
+ gen_helper_neon_mul_u16,
145
+ tcg_gen_mul_i32,
146
+ NULL,
147
+ };
148
+
149
+ return do_2scalar(s, a, opfn[a->size], NULL);
150
+}
151
+
152
+static bool trans_VMLA_2sc(DisasContext *s, arg_2scalar *a)
153
+{
154
+ static NeonGenTwoOpFn * const opfn[] = {
155
+ NULL,
156
+ gen_helper_neon_mul_u16,
157
+ tcg_gen_mul_i32,
158
+ NULL,
159
+ };
160
+ static NeonGenTwoOpFn * const accfn[] = {
161
+ NULL,
162
+ gen_helper_neon_add_u16,
163
+ tcg_gen_add_i32,
164
+ NULL,
165
+ };
166
+
167
+ return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
168
+}
169
+
170
+static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
171
+{
172
+ static NeonGenTwoOpFn * const opfn[] = {
173
+ NULL,
174
+ gen_helper_neon_mul_u16,
175
+ tcg_gen_mul_i32,
176
+ NULL,
177
+ };
178
+ static NeonGenTwoOpFn * const accfn[] = {
179
+ NULL,
180
+ gen_helper_neon_sub_u16,
181
+ tcg_gen_sub_i32,
182
+ NULL,
183
+ };
184
+
185
+ return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
73
+}
186
+}
74
diff --git a/target/arm/translate.c b/target/arm/translate.c
187
diff --git a/target/arm/translate.c b/target/arm/translate.c
75
index XXXXXXX..XXXXXXX 100644
188
index XXXXXXX..XXXXXXX 100644
76
--- a/target/arm/translate.c
189
--- a/target/arm/translate.c
77
+++ b/target/arm/translate.c
190
+++ b/target/arm/translate.c
78
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
191
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
192
#define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16, 7)
193
#define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn, 0, 5)
194
195
-static void gen_neon_dup_low16(TCGv_i32 var)
196
-{
197
- TCGv_i32 tmp = tcg_temp_new_i32();
198
- tcg_gen_ext16u_i32(var, var);
199
- tcg_gen_shli_i32(tmp, var, 16);
200
- tcg_gen_or_i32(var, var, tmp);
201
- tcg_temp_free_i32(tmp);
202
-}
203
-
204
-static void gen_neon_dup_high16(TCGv_i32 var)
205
-{
206
- TCGv_i32 tmp = tcg_temp_new_i32();
207
- tcg_gen_andi_i32(var, var, 0xffff0000);
208
- tcg_gen_shri_i32(tmp, var, 16);
209
- tcg_gen_or_i32(var, var, tmp);
210
- tcg_temp_free_i32(tmp);
211
-}
212
-
213
static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
214
{
215
#ifndef CONFIG_USER_ONLY
216
@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
217
218
#define CPU_V001 cpu_V0, cpu_V0, cpu_V1
219
220
-static inline void gen_neon_add(int size, TCGv_i32 t0, TCGv_i32 t1)
221
-{
222
- switch (size) {
223
- case 0: gen_helper_neon_add_u8(t0, t0, t1); break;
224
- case 1: gen_helper_neon_add_u16(t0, t0, t1); break;
225
- case 2: tcg_gen_add_i32(t0, t0, t1); break;
226
- default: abort();
227
- }
228
-}
229
-
230
-static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
231
-{
232
- switch (size) {
233
- case 0: gen_helper_neon_sub_u8(t0, t1, t0); break;
234
- case 1: gen_helper_neon_sub_u16(t0, t1, t0); break;
235
- case 2: tcg_gen_sub_i32(t0, t1, t0); break;
236
- default: return;
237
- }
238
-}
239
-
240
static TCGv_i32 neon_load_scratch(int scratch)
241
{
242
TCGv_i32 tmp = tcg_temp_new_i32();
243
@@ -XXX,XX +XXX,XX @@ static void neon_store_scratch(int scratch, TCGv_i32 var)
244
tcg_temp_free_i32(var);
79
}
245
}
80
246
81
#define VFP_REG_SHR(x, n) (((n) > 0) ? (x) >> (n) : (x) << -(n))
247
-static inline TCGv_i32 neon_get_scalar(int size, int reg)
82
-#define VFP_SREG(insn, bigbit, smallbit) \
248
-{
83
- ((VFP_REG_SHR(insn, bigbit - 1) & 0x1e) | (((insn) >> (smallbit)) & 1))
249
- TCGv_i32 tmp;
84
#define VFP_DREG(reg, insn, bigbit, smallbit) do { \
250
- if (size == 1) {
85
if (dc_isar_feature(aa32_simd_r32, s)) { \
251
- tmp = neon_load_reg(reg & 7, reg >> 4);
86
reg = (((insn) >> (bigbit)) & 0x0f) \
252
- if (reg & 8) {
87
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
253
- gen_neon_dup_high16(tmp);
88
reg = ((insn) >> (bigbit)) & 0x0f; \
89
}} while (0)
90
91
-#define VFP_SREG_D(insn) VFP_SREG(insn, 12, 22)
92
#define VFP_DREG_D(reg, insn) VFP_DREG(reg, insn, 12, 22)
93
-#define VFP_SREG_N(insn) VFP_SREG(insn, 16, 7)
94
#define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16, 7)
95
-#define VFP_SREG_M(insn) VFP_SREG(insn, 0, 5)
96
#define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn, 0, 5)
97
98
static void gen_neon_dup_low16(TCGv_i32 var)
99
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
100
return 0;
101
}
102
103
-/* Advanced SIMD two registers and a scalar extension.
104
- * 31 24 23 22 20 16 12 11 10 9 8 3 0
105
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
106
- * | 1 1 1 1 1 1 1 0 | o1 | D | o2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
107
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
108
- *
109
- */
110
-
111
-static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
112
-{
113
- gen_helper_gvec_3 *fn_gvec = NULL;
114
- gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
115
- int rd, rn, rm, opr_sz, data;
116
- int off_rn, off_rm;
117
- bool is_long = false, q = extract32(insn, 6, 1);
118
- bool ptr_is_env = false;
119
-
120
- if ((insn & 0xffa00f10) == 0xfe000810) {
121
- /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
122
- int is_s = extract32(insn, 20, 1);
123
- int vm20 = extract32(insn, 0, 3);
124
- int vm3 = extract32(insn, 3, 1);
125
- int m = extract32(insn, 5, 1);
126
- int index;
127
-
128
- if (!dc_isar_feature(aa32_fhm, s)) {
129
- return 1;
130
- }
131
- if (q) {
132
- rm = vm20;
133
- index = m * 2 + vm3;
134
- } else {
254
- } else {
135
- rm = vm20 * 2 + m;
255
- gen_neon_dup_low16(tmp);
136
- index = vm3;
137
- }
138
- is_long = true;
139
- data = (index << 2) | is_s; /* is_2 == 0 */
140
- fn_gvec_ptr = gen_helper_gvec_fmlal_idx_a32;
141
- ptr_is_env = true;
142
- } else {
143
- return 1;
144
- }
145
-
146
- VFP_DREG_D(rd, insn);
147
- if (rd & q) {
148
- return 1;
149
- }
150
- if (q || !is_long) {
151
- VFP_DREG_N(rn, insn);
152
- if (rn & q & !is_long) {
153
- return 1;
154
- }
155
- off_rn = vfp_reg_offset(1, rn);
156
- off_rm = vfp_reg_offset(1, rm);
157
- } else {
158
- rn = VFP_SREG_N(insn);
159
- off_rn = vfp_reg_offset(0, rn);
160
- off_rm = vfp_reg_offset(0, rm);
161
- }
162
- if (s->fp_excp_el) {
163
- gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
164
- syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
165
- return 0;
166
- }
167
- if (!s->vfp_enabled) {
168
- return 1;
169
- }
170
-
171
- opr_sz = (1 + q) * 8;
172
- if (fn_gvec_ptr) {
173
- TCGv_ptr ptr;
174
- if (ptr_is_env) {
175
- ptr = cpu_env;
176
- } else {
177
- ptr = get_fpstatus_ptr(1);
178
- }
179
- tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
180
- opr_sz, opr_sz, data, fn_gvec_ptr);
181
- if (!ptr_is_env) {
182
- tcg_temp_free_ptr(ptr);
183
- }
256
- }
184
- } else {
257
- } else {
185
- tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
258
- tmp = neon_load_reg(reg & 15, reg >> 4);
186
- opr_sz, opr_sz, data, fn_gvec);
187
- }
259
- }
188
- return 0;
260
- return tmp;
189
-}
261
-}
190
-
262
-
191
static int disas_coproc_insn(DisasContext *s, uint32_t insn)
263
static int gen_neon_unzip(int rd, int rm, int size, int q)
192
{
264
{
193
int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
265
TCGv_ptr pd, pm;
194
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
266
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
267
return 1;
268
}
269
switch (op) {
270
+ case 0: /* Integer VMLA scalar */
271
+ case 4: /* Integer VMLS scalar */
272
+ case 8: /* Integer VMUL scalar */
273
+ return 1; /* handled by decodetree */
274
+
275
case 1: /* Float VMLA scalar */
276
case 5: /* Floating point VMLS scalar */
277
case 9: /* Floating point VMUL scalar */
278
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
279
return 1;
195
}
280
}
196
}
281
/* fall through */
197
}
282
- case 0: /* Integer VMLA scalar */
198
- } else if ((insn & 0x0f000a00) == 0x0e000800
283
- case 4: /* Integer VMLS scalar */
199
- && arm_dc_feature(s, ARM_FEATURE_V8)) {
284
- case 8: /* Integer VMUL scalar */
200
- if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
285
case 12: /* VQDMULH scalar */
201
- goto illegal_op;
286
case 13: /* VQRDMULH scalar */
202
- }
287
if (u && ((rd | rn) & 1)) {
203
- return;
288
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
204
}
289
} else {
205
goto illegal_op;
290
gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
206
}
291
}
207
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
292
- } else if (op & 1) {
208
}
293
+ } else {
209
break;
294
TCGv_ptr fpstatus = get_fpstatus_ptr(1);
210
}
295
gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
211
- if ((insn & 0xff000a00) == 0xfe000800
296
tcg_temp_free_ptr(fpstatus);
212
- && arm_dc_feature(s, ARM_FEATURE_V8)) {
297
- } else {
213
- /* The Thumb2 and ARM encodings are identical. */
298
- switch (size) {
214
- if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
299
- case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break;
215
- goto illegal_op;
300
- case 1: gen_helper_neon_mul_u16(tmp, tmp, tmp2); break;
216
- }
301
- case 2: tcg_gen_mul_i32(tmp, tmp, tmp2); break;
217
- } else if (((insn >> 24) & 3) == 3) {
302
- default: abort();
218
+ if (((insn >> 24) & 3) == 3) {
303
- }
219
/* Translate into the equivalent ARM encoding. */
304
}
220
insn = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
305
tcg_temp_free_i32(tmp2);
221
if (disas_neon_data_insn(s, insn)) {
306
if (op < 8) {
307
/* Accumulate. */
308
tmp2 = neon_load_reg(rd, pass);
309
switch (op) {
310
- case 0:
311
- gen_neon_add(size, tmp, tmp2);
312
- break;
313
case 1:
314
{
315
TCGv_ptr fpstatus = get_fpstatus_ptr(1);
316
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
317
tcg_temp_free_ptr(fpstatus);
318
break;
319
}
320
- case 4:
321
- gen_neon_rsb(size, tmp, tmp2);
322
- break;
323
case 5:
324
{
325
TCGv_ptr fpstatus = get_fpstatus_ptr(1);
222
--
326
--
223
2.20.1
327
2.20.1
224
328
225
329
diff view generated by jsdifflib
Convert the Neon 3-reg-same VMAX and VMIN insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-17-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode | 5 +++++
 target/arm/translate-neon.inc.c | 14 ++++++++++++++
 target/arm/translate.c | 21 ++-------------------
 3 files changed, 21 insertions(+), 19 deletions(-)

Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
As noted in the comment on the WRAP_FP_FN macro, we could have
had a do_2scalar_fp() function, but for 3 insns it seemed
simpler to just do the wrapping to get hold of the fpstatus ptr.
(These are the only fp insns in the group.)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

 target/arm/neon-dp.decode | 3 ++
 target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
 target/arm/translate.c | 37 ++-----------------
 3 files changed, 71 insertions(+), 34 deletions(-)

16
12
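As a quick aside for readers following the conversion (not part of either
patch): the WRAP_FP_FN macro added in translate-neon.inc.c below expands,
for the VMUL case, to roughly the following wrapper, which is how the
fpstatus pointer is hidden behind a plain NeonGenTwoOpFn:

    static void gen_VMUL_F_mul(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm)
    {
        /* Expansion of WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls) */
        TCGv_ptr fpstatus = get_fpstatus_ptr(1);
        gen_helper_vfp_muls(rd, rn, rm, fpstatus);
        tcg_temp_free_ptr(fpstatus);
    }

This is what lets the existing do_2scalar() helper be reused unchanged for
the float variants.
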
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
17
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
13
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/neon-dp.decode
19
--- a/target/arm/neon-dp.decode
15
+++ b/target/arm/neon-dp.decode
20
+++ b/target/arm/neon-dp.decode
16
@@ -XXX,XX +XXX,XX @@ VBSL_3s 1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
21
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
17
VBIT_3s 1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
22
&2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
18
VBIF_3s 1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
23
19
24
VMLA_2sc 1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
20
+VMAX_S_3s 1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
25
+ VMLA_F_2sc 1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
21
+VMAX_U_3s 1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
26
22
+VMIN_S_3s 1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
27
VMLS_2sc 1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
23
+VMIN_U_3s 1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
28
+ VMLS_F_2sc 1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
24
+
29
25
VADD_3s 1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
30
VMUL_2sc 1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
26
VSUB_3s 1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
31
+ VMUL_F_2sc 1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
32
]
33
}
27
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
34
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
28
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/translate-neon.inc.c
36
--- a/target/arm/translate-neon.inc.c
30
+++ b/target/arm/translate-neon.inc.c
37
+++ b/target/arm/translate-neon.inc.c
31
@@ -XXX,XX +XXX,XX @@ DO_3SAME(VEOR, tcg_gen_gvec_xor)
38
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
32
DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
39
33
DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
40
return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
34
DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
41
}
35
+
42
+
36
+#define DO_3SAME_NO_SZ_3(INSN, FUNC) \
43
+/*
37
+ static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a) \
44
+ * Rather than have a float-specific version of do_2scalar just for
38
+ { \
45
+ * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
39
+ if (a->size == 3) { \
46
+ * a NeonGenTwoOpFn.
40
+ return false; \
47
+ */
41
+ } \
48
+#define WRAP_FP_FN(WRAPNAME, FUNC) \
42
+ return do_3same(s, a, FUNC); \
49
+ static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
50
+ { \
51
+ TCGv_ptr fpstatus = get_fpstatus_ptr(1); \
52
+ FUNC(rd, rn, rm, fpstatus); \
53
+ tcg_temp_free_ptr(fpstatus); \
43
+ }
54
+ }
44
+
55
+
45
+DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
56
+WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
46
+DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
57
+WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
47
+DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
58
+WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
48
+DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
59
+
60
+static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
61
+{
62
+ static NeonGenTwoOpFn * const opfn[] = {
63
+ NULL,
64
+ NULL, /* TODO: fp16 support */
65
+ gen_VMUL_F_mul,
66
+ NULL,
67
+ };
68
+
69
+ return do_2scalar(s, a, opfn[a->size], NULL);
70
+}
71
+
72
+static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
73
+{
74
+ static NeonGenTwoOpFn * const opfn[] = {
75
+ NULL,
76
+ NULL, /* TODO: fp16 support */
77
+ gen_VMUL_F_mul,
78
+ NULL,
79
+ };
80
+ static NeonGenTwoOpFn * const accfn[] = {
81
+ NULL,
82
+ NULL, /* TODO: fp16 support */
83
+ gen_VMUL_F_add,
84
+ NULL,
85
+ };
86
+
87
+ return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
88
+}
89
+
90
+static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
91
+{
92
+ static NeonGenTwoOpFn * const opfn[] = {
93
+ NULL,
94
+ NULL, /* TODO: fp16 support */
95
+ gen_VMUL_F_mul,
96
+ NULL,
97
+ };
98
+ static NeonGenTwoOpFn * const accfn[] = {
99
+ NULL,
100
+ NULL, /* TODO: fp16 support */
101
+ gen_VMUL_F_sub,
102
+ NULL,
103
+ };
104
+
105
+ return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
106
+}
49
diff --git a/target/arm/translate.c b/target/arm/translate.c
107
diff --git a/target/arm/translate.c b/target/arm/translate.c
50
index XXXXXXX..XXXXXXX 100644
108
index XXXXXXX..XXXXXXX 100644
51
--- a/target/arm/translate.c
109
--- a/target/arm/translate.c
52
+++ b/target/arm/translate.c
110
+++ b/target/arm/translate.c
53
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
111
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
54
rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
112
case 0: /* Integer VMLA scalar */
55
return 0;
113
case 4: /* Integer VMLS scalar */
56
114
case 8: /* Integer VMUL scalar */
57
- case NEON_3R_VMAX:
115
- return 1; /* handled by decodetree */
58
- if (u) {
59
- tcg_gen_gvec_umax(size, rd_ofs, rn_ofs, rm_ofs,
60
- vec_size, vec_size);
61
- } else {
62
- tcg_gen_gvec_smax(size, rd_ofs, rn_ofs, rm_ofs,
63
- vec_size, vec_size);
64
- }
65
- return 0;
66
- case NEON_3R_VMIN:
67
- if (u) {
68
- tcg_gen_gvec_umin(size, rd_ofs, rn_ofs, rm_ofs,
69
- vec_size, vec_size);
70
- } else {
71
- tcg_gen_gvec_smin(size, rd_ofs, rn_ofs, rm_ofs,
72
- vec_size, vec_size);
73
- }
74
- return 0;
75
-
116
-
76
case NEON_3R_VSHL:
117
case 1: /* Float VMLA scalar */
77
/* Note the operation is vshl vd,vm,vn */
118
case 5: /* Floating point VMLS scalar */
78
tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
119
case 9: /* Floating point VMUL scalar */
120
- if (size == 1) {
121
- return 1;
122
- }
123
- /* fall through */
124
+ return 1; /* handled by decodetree */
125
+
126
case 12: /* VQDMULH scalar */
127
case 13: /* VQRDMULH scalar */
128
if (u && ((rd | rn) & 1)) {
79
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
129
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
80
130
} else {
81
case NEON_3R_VADD_VSUB:
131
gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
82
case NEON_3R_LOGIC:
132
}
83
+ case NEON_3R_VMAX:
133
- } else if (op == 13) {
84
+ case NEON_3R_VMIN:
134
+ } else {
85
/* Already handled by decodetree */
135
if (size == 1) {
86
return 1;
136
gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
87
}
137
} else {
138
gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
139
}
140
- } else {
141
- TCGv_ptr fpstatus = get_fpstatus_ptr(1);
142
- gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
143
- tcg_temp_free_ptr(fpstatus);
144
}
145
tcg_temp_free_i32(tmp2);
146
- if (op < 8) {
147
- /* Accumulate. */
148
- tmp2 = neon_load_reg(rd, pass);
149
- switch (op) {
150
- case 1:
151
- {
152
- TCGv_ptr fpstatus = get_fpstatus_ptr(1);
153
- gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
154
- tcg_temp_free_ptr(fpstatus);
155
- break;
156
- }
157
- case 5:
158
- {
159
- TCGv_ptr fpstatus = get_fpstatus_ptr(1);
160
- gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
161
- tcg_temp_free_ptr(fpstatus);
162
- break;
163
- }
164
- default:
165
- abort();
166
- }
167
- tcg_temp_free_i32(tmp2);
168
- }
169
neon_store_reg(rd, pass, tmp);
170
}
171
break;
88
--
172
--
89
2.20.1
173
2.20.1
90
174
91
175
1
Convert the Neon comparison ops in the 3-reg-same grouping
to decodetree.

Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
to decodetree.
3
3
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200430181003.21682-18-peter.maydell@linaro.org
7
---
6
---
8
target/arm/neon-dp.decode | 8 ++++++++
7
target/arm/neon-dp.decode | 3 +++
9
target/arm/translate-neon.inc.c | 22 ++++++++++++++++++++++
8
target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
10
target/arm/translate.c | 23 +++--------------------
9
target/arm/translate.c | 42 ++-------------------------------
11
3 files changed, 33 insertions(+), 20 deletions(-)
10
3 files changed, 34 insertions(+), 40 deletions(-)
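
An illustrative aside (not part of either patch): the two macro layers used
for the comparison ops mean that, for example,
DO_3SAME_CMP(VCGT_S, TCG_COND_GT) in translate-neon.inc.c expands to roughly:

    static void gen_VCGT_S_3s(unsigned vece, uint32_t rd_ofs,
                              uint32_t rn_ofs, uint32_t rm_ofs,
                              uint32_t oprsz, uint32_t maxsz)
    {
        /* Vector compare, setting each element to all-ones or all-zeroes */
        tcg_gen_gvec_cmp(TCG_COND_GT, vece, rd_ofs, rn_ofs, rm_ofs,
                         oprsz, maxsz);
    }

    static bool trans_VCGT_S_3s(DisasContext *s, arg_3same *a)
    {
        if (a->size == 3) {
            return false; /* size 3 belongs to a different insn group */
        }
        return do_3same(s, a, gen_VCGT_S_3s);
    }
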
12
11
13
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
12
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
14
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/neon-dp.decode
14
--- a/target/arm/neon-dp.decode
16
+++ b/target/arm/neon-dp.decode
15
+++ b/target/arm/neon-dp.decode
17
@@ -XXX,XX +XXX,XX @@ VBSL_3s 1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
16
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
18
VBIT_3s 1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
17
19
VBIF_3s 1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
18
VMUL_2sc 1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
20
19
VMUL_F_2sc 1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
21
+VCGT_S_3s 1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
22
+VCGT_U_3s 1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
23
+VCGE_S_3s 1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
24
+VCGE_U_3s 1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
25
+
20
+
26
VMAX_S_3s 1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
21
+ VQDMULH_2sc 1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
27
VMAX_U_3s 1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
22
+ VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
28
VMIN_S_3s 1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
23
]
29
@@ -XXX,XX +XXX,XX @@ VMIN_U_3s 1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
24
}
30
31
VADD_3s 1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
32
VSUB_3s 1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
33
+
34
+VTST_3s 1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
35
+VCEQ_3s 1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
36
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
25
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
37
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
38
--- a/target/arm/translate-neon.inc.c
27
--- a/target/arm/translate-neon.inc.c
39
+++ b/target/arm/translate-neon.inc.c
28
+++ b/target/arm/translate-neon.inc.c
40
@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
29
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
41
DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
30
42
DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
31
return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
43
DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
32
}
44
+
33
+
45
+#define DO_3SAME_CMP(INSN, COND) \
34
+WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
46
+ static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
35
+WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
47
+ uint32_t rn_ofs, uint32_t rm_ofs, \
36
+WRAP_ENV_FN(gen_VQRDMULH_16, gen_helper_neon_qrdmulh_s16)
48
+ uint32_t oprsz, uint32_t maxsz) \
37
+WRAP_ENV_FN(gen_VQRDMULH_32, gen_helper_neon_qrdmulh_s32)
49
+ { \
50
+ tcg_gen_gvec_cmp(COND, vece, rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz); \
51
+ } \
52
+ DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
53
+
38
+
54
+DO_3SAME_CMP(VCGT_S, TCG_COND_GT)
39
+static bool trans_VQDMULH_2sc(DisasContext *s, arg_2scalar *a)
55
+DO_3SAME_CMP(VCGT_U, TCG_COND_GTU)
40
+{
56
+DO_3SAME_CMP(VCGE_S, TCG_COND_GE)
41
+ static NeonGenTwoOpFn * const opfn[] = {
57
+DO_3SAME_CMP(VCGE_U, TCG_COND_GEU)
42
+ NULL,
58
+DO_3SAME_CMP(VCEQ, TCG_COND_EQ)
43
+ gen_VQDMULH_16,
44
+ gen_VQDMULH_32,
45
+ NULL,
46
+ };
59
+
47
+
60
+static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
48
+ return do_2scalar(s, a, opfn[a->size], NULL);
61
+ uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
49
+}
50
+
51
+static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
62
+{
52
+{
63
+ tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
53
+ static NeonGenTwoOpFn * const opfn[] = {
54
+ NULL,
55
+ gen_VQRDMULH_16,
56
+ gen_VQRDMULH_32,
57
+ NULL,
58
+ };
59
+
60
+ return do_2scalar(s, a, opfn[a->size], NULL);
64
+}
61
+}
65
+DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
66
diff --git a/target/arm/translate.c b/target/arm/translate.c
62
diff --git a/target/arm/translate.c b/target/arm/translate.c
67
index XXXXXXX..XXXXXXX 100644
63
index XXXXXXX..XXXXXXX 100644
68
--- a/target/arm/translate.c
64
--- a/target/arm/translate.c
69
+++ b/target/arm/translate.c
65
+++ b/target/arm/translate.c
66
@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
67
68
#define CPU_V001 cpu_V0, cpu_V0, cpu_V1
69
70
-static TCGv_i32 neon_load_scratch(int scratch)
71
-{
72
- TCGv_i32 tmp = tcg_temp_new_i32();
73
- tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
74
- return tmp;
75
-}
76
-
77
-static void neon_store_scratch(int scratch, TCGv_i32 var)
78
-{
79
- tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
80
- tcg_temp_free_i32(var);
81
-}
82
-
83
static int gen_neon_unzip(int rd, int rm, int size, int q)
84
{
85
TCGv_ptr pd, pm;
70
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
86
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
71
u ? &mls_op[size] : &mla_op[size]);
87
case 1: /* Float VMLA scalar */
72
return 0;
88
case 5: /* Floating point VMLS scalar */
73
89
case 9: /* Floating point VMUL scalar */
74
- case NEON_3R_VTST_VCEQ:
90
- return 1; /* handled by decodetree */
75
- if (u) { /* VCEQ */
76
- tcg_gen_gvec_cmp(TCG_COND_EQ, size, rd_ofs, rn_ofs, rm_ofs,
77
- vec_size, vec_size);
78
- } else { /* VTST */
79
- tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
80
- vec_size, vec_size, &cmtst_op[size]);
81
- }
82
- return 0;
83
-
91
-
84
- case NEON_3R_VCGT:
92
case 12: /* VQDMULH scalar */
85
- tcg_gen_gvec_cmp(u ? TCG_COND_GTU : TCG_COND_GT, size,
93
case 13: /* VQRDMULH scalar */
86
- rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
94
- if (u && ((rd | rn) & 1)) {
87
- return 0;
95
- return 1;
88
-
96
- }
89
- case NEON_3R_VCGE:
97
- tmp = neon_get_scalar(size, rm);
90
- tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size,
98
- neon_store_scratch(0, tmp);
91
- rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
99
- for (pass = 0; pass < (u ? 4 : 2); pass++) {
92
- return 0;
100
- tmp = neon_load_scratch(0);
93
-
101
- tmp2 = neon_load_reg(rn, pass);
94
case NEON_3R_VSHL:
102
- if (op == 12) {
95
/* Note the operation is vshl vd,vm,vn */
103
- if (size == 1) {
96
tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
104
- gen_helper_neon_qdmulh_s16(tmp, cpu_env, tmp, tmp2);
97
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
105
- } else {
98
case NEON_3R_LOGIC:
106
- gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
99
case NEON_3R_VMAX:
107
- }
100
case NEON_3R_VMIN:
108
- } else {
101
+ case NEON_3R_VTST_VCEQ:
109
- if (size == 1) {
102
+ case NEON_3R_VCGT:
110
- gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
103
+ case NEON_3R_VCGE:
111
- } else {
104
/* Already handled by decodetree */
112
- gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
105
return 1;
113
- }
106
}
114
- }
115
- tcg_temp_free_i32(tmp2);
116
- neon_store_reg(rd, pass, tmp);
117
- }
118
- break;
119
+ return 1; /* handled by decodetree */
120
+
121
case 3: /* VQDMLAL scalar */
122
case 7: /* VQDMLSL scalar */
123
case 11: /* VQDMULL scalar */
107
--
124
--
108
2.20.1
125
2.20.1
109
126
110
127
1
Convert VCMLA (scalar) in the 2reg-scalar-ext group to decodetree.

Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
group to decodetree.
2
3
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20200430181003.21682-9-peter.maydell@linaro.org
6
---
6
---
7
target/arm/neon-shared.decode | 5 +++++
7
target/arm/neon-dp.decode | 3 ++
8
target/arm/translate-neon.inc.c | 40 +++++++++++++++++++++++++++++++++
8
target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
9
target/arm/translate.c | 26 +--------------------
9
target/arm/translate.c | 38 +----------------
10
3 files changed, 46 insertions(+), 25 deletions(-)
10
3 files changed, 79 insertions(+), 36 deletions(-)
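
For reference (an aside, not part of either patch): the arg_2scalar struct
these trans functions receive is generated by decodetree from the
&2scalar/@2scalar declarations in neon-dp.decode, so it is approximately
the following; the exact field order and layout are up to the generator:

    typedef struct {
        int vm;
        int vn;
        int vd;
        int q;
        int size;
    } arg_2scalar;

which is why a->size can be used directly to index the opfn[]/accfn[]
tables and a->q to pick between the 64-bit and 128-bit forms.
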
11
11
12
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
12
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
13
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/neon-shared.decode
14
--- a/target/arm/neon-dp.decode
15
+++ b/target/arm/neon-shared.decode
15
+++ b/target/arm/neon-dp.decode
16
@@ -XXX,XX +XXX,XX @@ VFML 1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
16
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
17
vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
17
18
VFML 1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
18
VQDMULH_2sc 1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
19
vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
19
VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
20
+
20
+
21
+VCMLA_scalar 1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
21
+ VQRDMLAH_2sc 1111 001 . 1 . .. .... .... 1110 . 1 . 0 .... @2scalar
22
+ vn=%vn_dp vd=%vd_dp size=0
22
+ VQRDMLSH_2sc 1111 001 . 1 . .. .... .... 1111 . 1 . 0 .... @2scalar
23
+VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
23
]
24
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
24
}
25
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
25
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
26
index XXXXXXX..XXXXXXX 100644
26
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/translate-neon.inc.c
27
--- a/target/arm/translate-neon.inc.c
28
+++ b/target/arm/translate-neon.inc.c
28
+++ b/target/arm/translate-neon.inc.c
29
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML(DisasContext *s, arg_VFML *a)
29
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
30
gen_helper_gvec_fmlal_a32);
30
31
return true;
31
return do_2scalar(s, a, opfn[a->size], NULL);
32
}
32
}
33
+
33
+
34
+static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
34
+static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
35
+ NeonGenThreeOpEnvFn *opfn)
35
+{
36
+{
36
+ gen_helper_gvec_3_ptr *fn_gvec_ptr;
37
+ /*
37
+ int opr_sz;
38
+ * VQRDMLAH/VQRDMLSH: this is like do_2scalar, but the opfn
38
+ TCGv_ptr fpst;
39
+ * performs a kind of fused op-then-accumulate using a helper
40
+ * function that takes all of rd, rn and the scalar at once.
41
+ */
42
+ TCGv_i32 scalar;
43
+ int pass;
39
+
44
+
40
+ if (!dc_isar_feature(aa32_vcma, s)) {
45
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
41
+ return false;
46
+ return false;
42
+ }
47
+ }
43
+ if (a->size == 0 && !dc_isar_feature(aa32_fp16_arith, s)) {
48
+
49
+ if (!dc_isar_feature(aa32_rdm, s)) {
44
+ return false;
50
+ return false;
45
+ }
51
+ }
46
+
52
+
47
+ /* UNDEF accesses to D16-D31 if they don't exist. */
53
+ /* UNDEF accesses to D16-D31 if they don't exist. */
48
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
54
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
49
+ ((a->vd | a->vn | a->vm) & 0x10)) {
55
+ ((a->vd | a->vn | a->vm) & 0x10)) {
50
+ return false;
56
+ return false;
51
+ }
57
+ }
52
+
58
+
53
+ if ((a->vd | a->vn) & a->q) {
59
+ if (!opfn) {
60
+ /* Bad size (including size == 3, which is a different insn group) */
61
+ return false;
62
+ }
63
+
64
+ if (a->q && ((a->vd | a->vn) & 1)) {
54
+ return false;
65
+ return false;
55
+ }
66
+ }
56
+
67
+
57
+ if (!vfp_access_check(s)) {
68
+ if (!vfp_access_check(s)) {
58
+ return true;
69
+ return true;
59
+ }
70
+ }
60
+
71
+
61
+ fn_gvec_ptr = (a->size ? gen_helper_gvec_fcmlas_idx
72
+ scalar = neon_get_scalar(a->size, a->vm);
62
+ : gen_helper_gvec_fcmlah_idx);
73
+
63
+ opr_sz = (1 + a->q) * 8;
74
+ for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
64
+ fpst = get_fpstatus_ptr(1);
75
+ TCGv_i32 rn = neon_load_reg(a->vn, pass);
65
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
76
+ TCGv_i32 rd = neon_load_reg(a->vd, pass);
66
+ vfp_reg_offset(1, a->vn),
77
+ opfn(rd, cpu_env, rn, scalar, rd);
67
+ vfp_reg_offset(1, a->vm),
78
+ tcg_temp_free_i32(rn);
68
+ fpst, opr_sz, opr_sz,
79
+ neon_store_reg(a->vd, pass, rd);
69
+ (a->index << 2) | a->rot, fn_gvec_ptr);
80
+ }
70
+ tcg_temp_free_ptr(fpst);
81
+ tcg_temp_free_i32(scalar);
82
+
71
+ return true;
83
+ return true;
84
+}
85
+
86
+static bool trans_VQRDMLAH_2sc(DisasContext *s, arg_2scalar *a)
87
+{
88
+ static NeonGenThreeOpEnvFn *opfn[] = {
89
+ NULL,
90
+ gen_helper_neon_qrdmlah_s16,
91
+ gen_helper_neon_qrdmlah_s32,
92
+ NULL,
93
+ };
94
+ return do_vqrdmlah_2sc(s, a, opfn[a->size]);
95
+}
96
+
97
+static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
98
+{
99
+ static NeonGenThreeOpEnvFn *opfn[] = {
100
+ NULL,
101
+ gen_helper_neon_qrdmlsh_s16,
102
+ gen_helper_neon_qrdmlsh_s32,
103
+ NULL,
104
+ };
105
+ return do_vqrdmlah_2sc(s, a, opfn[a->size]);
72
+}
106
+}
73
diff --git a/target/arm/translate.c b/target/arm/translate.c
107
diff --git a/target/arm/translate.c b/target/arm/translate.c
74
index XXXXXXX..XXXXXXX 100644
108
index XXXXXXX..XXXXXXX 100644
75
--- a/target/arm/translate.c
109
--- a/target/arm/translate.c
76
+++ b/target/arm/translate.c
110
+++ b/target/arm/translate.c
77
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
111
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
78
bool is_long = false, q = extract32(insn, 6, 1);
112
case 9: /* Floating point VMUL scalar */
79
bool ptr_is_env = false;
113
case 12: /* VQDMULH scalar */
80
114
case 13: /* VQRDMULH scalar */
81
- if ((insn & 0xff000f10) == 0xfe000800) {
115
+ case 14: /* VQRDMLAH scalar */
82
- /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... */
116
+ case 15: /* VQRDMLSH scalar */
83
- int rot = extract32(insn, 20, 2);
117
return 1; /* handled by decodetree */
84
- int size = extract32(insn, 23, 1);
118
85
- int index;
119
case 3: /* VQDMLAL scalar */
120
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
121
neon_store_reg64(cpu_V0, rd + pass);
122
}
123
break;
124
- case 14: /* VQRDMLAH scalar */
125
- case 15: /* VQRDMLSH scalar */
126
- {
127
- NeonGenThreeOpEnvFn *fn;
86
-
128
-
87
- if (!dc_isar_feature(aa32_vcma, s)) {
129
- if (!dc_isar_feature(aa32_rdm, s)) {
88
- return 1;
130
- return 1;
89
- }
131
- }
90
- if (size == 0) {
132
- if (u && ((rd | rn) & 1)) {
91
- if (!dc_isar_feature(aa32_fp16_arith, s)) {
133
- return 1;
92
- return 1;
134
- }
93
- }
135
- if (op == 14) {
94
- /* For fp16, rm is just Vm, and index is M. */
136
- if (size == 1) {
95
- rm = extract32(insn, 0, 4);
137
- fn = gen_helper_neon_qrdmlah_s16;
96
- index = extract32(insn, 5, 1);
138
- } else {
97
- } else {
139
- fn = gen_helper_neon_qrdmlah_s32;
98
- /* For fp32, rm is the usual M:Vm, and index is 0. */
140
- }
99
- VFP_DREG_M(rm, insn);
141
- } else {
100
- index = 0;
142
- if (size == 1) {
101
- }
143
- fn = gen_helper_neon_qrdmlsh_s16;
102
- data = (index << 2) | rot;
144
- } else {
103
- fn_gvec_ptr = (size ? gen_helper_gvec_fcmlas_idx
145
- fn = gen_helper_neon_qrdmlsh_s32;
104
- : gen_helper_gvec_fcmlah_idx);
146
- }
105
- } else if ((insn & 0xffb00f00) == 0xfe200d00) {
147
- }
106
+ if ((insn & 0xffb00f00) == 0xfe200d00) {
148
-
107
/* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
149
- tmp2 = neon_get_scalar(size, rm);
108
int u = extract32(insn, 4, 1);
150
- for (pass = 0; pass < (u ? 4 : 2); pass++) {
109
151
- tmp = neon_load_reg(rn, pass);
152
- tmp3 = neon_load_reg(rd, pass);
153
- fn(tmp, cpu_env, tmp, tmp2, tmp3);
154
- tcg_temp_free_i32(tmp3);
155
- neon_store_reg(rd, pass, tmp);
156
- }
157
- tcg_temp_free_i32(tmp2);
158
- }
159
- break;
160
default:
161
g_assert_not_reached();
162
}
110
--
163
--
111
2.20.1
164
2.20.1
112
165
113
166
1
Convert the Neon "load single structure to all lanes" insns to
1
Convert the Neon 2-reg-scalar long multiplies to decodetree.
2
decodetree.
2
These are the last instructions in the group.
3
3
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200430181003.21682-13-peter.maydell@linaro.org
7
---
6
---
8
target/arm/neon-ls.decode | 5 +++
7
target/arm/neon-dp.decode | 18 ++++
9
target/arm/translate-neon.inc.c | 73 +++++++++++++++++++++++++++++++++
8
target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
10
target/arm/translate.c | 55 +------------------------
9
target/arm/translate.c | 182 ++------------------------------
11
3 files changed, 80 insertions(+), 53 deletions(-)
10
3 files changed, 187 insertions(+), 176 deletions(-)
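
An expansion example (aside, not part of either patch): with the
DO_VMLAL_2SC macro added below, DO_VMLAL_2SC(VMLAL_S, mull_s, add)
becomes roughly:

    static bool trans_VMLAL_S_2sc(DisasContext *s, arg_2scalar *a)
    {
        /* Widening multiply of the input elements by the scalar ... */
        static NeonGenTwoOpWidenFn * const opfn[] = {
            NULL,
            gen_helper_neon_mull_s16,
            gen_mull_s32,
            NULL,
        };
        /* ... followed by a 64-bit accumulate into the destination */
        static NeonGenTwo64OpFn * const accfn[] = {
            NULL,
            gen_helper_neon_addl_u32,
            tcg_gen_add_i64,
            NULL,
        };
        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
    }
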
12
11
13
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
12
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
14
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/neon-ls.decode
14
--- a/target/arm/neon-dp.decode
16
+++ b/target/arm/neon-ls.decode
15
+++ b/target/arm/neon-dp.decode
17
@@ -XXX,XX +XXX,XX @@
16
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
18
17
19
VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
18
@2scalar .... ... q:1 . . size:2 .... .... .... . . . . .... \
20
vd=%vd_dp
19
&2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
21
+
20
+ # For the 'long' ops the Q bit is part of insn decode
22
+# Neon load single element to all lanes
21
+ @2scalar_q0 .... ... . . . size:2 .... .... .... . . . . .... \
23
+
22
+ &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
24
+VLD_all_lanes 1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
23
25
+ vd=%vd_dp
24
VMLA_2sc 1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
25
VMLA_F_2sc 1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
26
27
+ VMLAL_S_2sc 1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
28
+ VMLAL_U_2sc 1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
29
+
30
+ VQDMLAL_2sc 1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
31
+
32
VMLS_2sc 1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
33
VMLS_F_2sc 1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
34
35
+ VMLSL_S_2sc 1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
36
+ VMLSL_U_2sc 1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
37
+
38
+ VQDMLSL_2sc 1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
39
+
40
VMUL_2sc 1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
41
VMUL_F_2sc 1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
42
43
+ VMULL_S_2sc 1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
44
+ VMULL_U_2sc 1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
45
+
46
+ VQDMULL_2sc 1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
47
+
48
VQDMULH_2sc 1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
49
VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
50
26
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
51
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
27
index XXXXXXX..XXXXXXX 100644
52
index XXXXXXX..XXXXXXX 100644
28
--- a/target/arm/translate-neon.inc.c
53
--- a/target/arm/translate-neon.inc.c
29
+++ b/target/arm/translate-neon.inc.c
54
+++ b/target/arm/translate-neon.inc.c
30
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
55
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
31
gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
56
};
32
return true;
57
return do_vqrdmlah_2sc(s, a, opfn[a->size]);
33
}
58
}
34
+
59
+
35
+static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
60
+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
36
+{
61
+ NeonGenTwoOpWidenFn *opfn,
37
+ /* Neon load single structure to all lanes */
62
+ NeonGenTwo64OpFn *accfn)
38
+ int reg, stride, vec_size;
63
+{
39
+ int vd = a->vd;
64
+ /*
40
+ int size = a->size;
65
+ * Two registers and a scalar, long operations: perform an
41
+ int nregs = a->n + 1;
66
+ * operation on the input elements and the scalar which produces
42
+ TCGv_i32 addr, tmp;
67
+ * a double-width result, and then possibly perform an accumulation
68
+ * operation of that result into the destination.
69
+ */
70
+ TCGv_i32 scalar, rn;
71
+ TCGv_i64 rn0_64, rn1_64;
43
+
72
+
44
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
73
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
45
+ return false;
74
+ return false;
46
+ }
75
+ }
47
+
76
+
48
+ /* UNDEF accesses to D16-D31 if they don't exist */
77
+ /* UNDEF accesses to D16-D31 if they don't exist. */
49
+ if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
78
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
79
+ ((a->vd | a->vn | a->vm) & 0x10)) {
50
+ return false;
80
+ return false;
51
+ }
81
+ }
52
+
82
+
53
+ if (size == 3) {
83
+ if (!opfn) {
54
+ if (nregs != 4 || a->a == 0) {
84
+ /* Bad size (including size == 3, which is a different insn group) */
55
+ return false;
56
+ }
57
+ /* For VLD4 size == 3 a == 1 means 32 bits at 16 byte alignment */
58
+ size = 2;
59
+ }
60
+ if (nregs == 1 && a->a == 1 && size == 0) {
61
+ return false;
85
+ return false;
62
+ }
86
+ }
63
+ if (nregs == 3 && a->a == 1) {
87
+
88
+ if (a->vd & 1) {
64
+ return false;
89
+ return false;
65
+ }
90
+ }
66
+
91
+
67
+ if (!vfp_access_check(s)) {
92
+ if (!vfp_access_check(s)) {
68
+ return true;
93
+ return true;
69
+ }
94
+ }
70
+
95
+
71
+ /*
96
+ scalar = neon_get_scalar(a->size, a->vm);
72
+ * VLD1 to all lanes: T bit indicates how many Dregs to write.
97
+
73
+ * VLD2/3/4 to all lanes: T bit indicates register stride.
98
+ /* Load all inputs before writing any outputs, in case of overlap */
74
+ */
99
+ rn = neon_load_reg(a->vn, 0);
75
+ stride = a->t ? 2 : 1;
100
+ rn0_64 = tcg_temp_new_i64();
76
+ vec_size = nregs == 1 ? stride * 8 : 8;
101
+ opfn(rn0_64, rn, scalar);
77
+
102
+ tcg_temp_free_i32(rn);
78
+ tmp = tcg_temp_new_i32();
103
+
79
+ addr = tcg_temp_new_i32();
104
+ rn = neon_load_reg(a->vn, 1);
80
+ load_reg_var(s, addr, a->rn);
105
+ rn1_64 = tcg_temp_new_i64();
81
+ for (reg = 0; reg < nregs; reg++) {
106
+ opfn(rn1_64, rn, scalar);
82
+ gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
107
+ tcg_temp_free_i32(rn);
83
+ s->be_data | size);
108
+ tcg_temp_free_i32(scalar);
84
+ if ((vd & 1) && vec_size == 16) {
109
+
85
+ /*
110
+ if (accfn) {
86
+ * We cannot write 16 bytes at once because the
111
+ TCGv_i64 t64 = tcg_temp_new_i64();
87
+ * destination is unaligned.
112
+ neon_load_reg64(t64, a->vd);
88
+ */
113
+ accfn(t64, t64, rn0_64);
89
+ tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
114
+ neon_store_reg64(t64, a->vd);
90
+ 8, 8, tmp);
115
+ neon_load_reg64(t64, a->vd + 1);
91
+ tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
116
+ accfn(t64, t64, rn1_64);
92
+ neon_reg_offset(vd, 0), 8, 8);
117
+ neon_store_reg64(t64, a->vd + 1);
93
+ } else {
118
+ tcg_temp_free_i64(t64);
94
+ tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
119
+ } else {
95
+ vec_size, vec_size, tmp);
120
+ neon_store_reg64(rn0_64, a->vd);
96
+ }
121
+ neon_store_reg64(rn1_64, a->vd + 1);
97
+ tcg_gen_addi_i32(addr, addr, 1 << size);
122
+ }
98
+ vd += stride;
123
+ tcg_temp_free_i64(rn0_64);
99
+ }
124
+ tcg_temp_free_i64(rn1_64);
100
+ tcg_temp_free_i32(tmp);
101
+ tcg_temp_free_i32(addr);
102
+
103
+ gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << size) * nregs);
104
+
105
+ return true;
125
+ return true;
126
+}
127
+
128
+static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
129
+{
130
+ static NeonGenTwoOpWidenFn * const opfn[] = {
131
+ NULL,
132
+ gen_helper_neon_mull_s16,
133
+ gen_mull_s32,
134
+ NULL,
135
+ };
136
+
137
+ return do_2scalar_long(s, a, opfn[a->size], NULL);
138
+}
139
+
140
+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
141
+{
142
+ static NeonGenTwoOpWidenFn * const opfn[] = {
143
+ NULL,
144
+ gen_helper_neon_mull_u16,
145
+ gen_mull_u32,
146
+ NULL,
147
+ };
148
+
149
+ return do_2scalar_long(s, a, opfn[a->size], NULL);
150
+}
151
+
152
+#define DO_VMLAL_2SC(INSN, MULL, ACC) \
153
+ static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a) \
154
+ { \
155
+ static NeonGenTwoOpWidenFn * const opfn[] = { \
156
+ NULL, \
157
+ gen_helper_neon_##MULL##16, \
158
+ gen_##MULL##32, \
159
+ NULL, \
160
+ }; \
161
+ static NeonGenTwo64OpFn * const accfn[] = { \
162
+ NULL, \
163
+ gen_helper_neon_##ACC##l_u32, \
164
+ tcg_gen_##ACC##_i64, \
165
+ NULL, \
166
+ }; \
167
+ return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]); \
168
+ }
169
+
170
+DO_VMLAL_2SC(VMLAL_S, mull_s, add)
171
+DO_VMLAL_2SC(VMLAL_U, mull_u, add)
172
+DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
173
+DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
174
+
175
+static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
176
+{
177
+ static NeonGenTwoOpWidenFn * const opfn[] = {
178
+ NULL,
179
+ gen_VQDMULL_16,
180
+ gen_VQDMULL_32,
181
+ NULL,
182
+ };
183
+
184
+ return do_2scalar_long(s, a, opfn[a->size], NULL);
185
+}
186
+
187
+static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
188
+{
189
+ static NeonGenTwoOpWidenFn * const opfn[] = {
190
+ NULL,
191
+ gen_VQDMULL_16,
192
+ gen_VQDMULL_32,
193
+ NULL,
194
+ };
195
+ static NeonGenTwo64OpFn * const accfn[] = {
196
+ NULL,
197
+ gen_VQDMLAL_acc_16,
198
+ gen_VQDMLAL_acc_32,
199
+ NULL,
200
+ };
201
+
202
+ return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
203
+}
204
+
205
+static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
206
+{
207
+ static NeonGenTwoOpWidenFn * const opfn[] = {
208
+ NULL,
209
+ gen_VQDMULL_16,
210
+ gen_VQDMULL_32,
211
+ NULL,
212
+ };
213
+ static NeonGenTwo64OpFn * const accfn[] = {
214
+ NULL,
215
+ gen_VQDMLSL_acc_16,
216
+ gen_VQDMLSL_acc_32,
217
+ NULL,
218
+ };
219
+
220
+ return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
106
+}
221
+}
107
diff --git a/target/arm/translate.c b/target/arm/translate.c
222
diff --git a/target/arm/translate.c b/target/arm/translate.c
108
index XXXXXXX..XXXXXXX 100644
223
index XXXXXXX..XXXXXXX 100644
109
--- a/target/arm/translate.c
224
--- a/target/arm/translate.c
110
+++ b/target/arm/translate.c
225
+++ b/target/arm/translate.c
111
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
226
@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
112
int size;
227
tcg_gen_ext16s_i32(dest, var);
113
int reg;
228
}
114
int load;
229
115
- int vec_size;
230
-/* 32x32->64 multiply. Marks inputs as dead. */
116
TCGv_i32 addr;
231
-static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
117
TCGv_i32 tmp;
232
-{
118
233
- TCGv_i32 lo = tcg_temp_new_i32();
119
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
234
- TCGv_i32 hi = tcg_temp_new_i32();
120
} else {
235
- TCGv_i64 ret;
121
size = (insn >> 10) & 3;
236
-
122
if (size == 3) {
237
- tcg_gen_mulu2_i32(lo, hi, a, b);
123
- /* Load single element to all lanes. */
238
- tcg_temp_free_i32(a);
124
- int a = (insn >> 4) & 1;
239
- tcg_temp_free_i32(b);
125
- if (!load) {
240
-
241
- ret = tcg_temp_new_i64();
242
- tcg_gen_concat_i32_i64(ret, lo, hi);
243
- tcg_temp_free_i32(lo);
244
- tcg_temp_free_i32(hi);
245
-
246
- return ret;
247
-}
248
-
249
-static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
250
-{
251
- TCGv_i32 lo = tcg_temp_new_i32();
252
- TCGv_i32 hi = tcg_temp_new_i32();
253
- TCGv_i64 ret;
254
-
255
- tcg_gen_muls2_i32(lo, hi, a, b);
256
- tcg_temp_free_i32(a);
257
- tcg_temp_free_i32(b);
258
-
259
- ret = tcg_temp_new_i64();
260
- tcg_gen_concat_i32_i64(ret, lo, hi);
261
- tcg_temp_free_i32(lo);
262
- tcg_temp_free_i32(hi);
263
-
264
- return ret;
265
-}
266
-
267
/* Swap low and high halfwords. */
268
static void gen_swap_half(TCGv_i32 var)
269
{
270
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
271
}
272
}
273
274
-static inline void gen_neon_negl(TCGv_i64 var, int size)
275
-{
276
- switch (size) {
277
- case 0: gen_helper_neon_negl_u16(var, var); break;
278
- case 1: gen_helper_neon_negl_u32(var, var); break;
279
- case 2:
280
- tcg_gen_neg_i64(var, var);
281
- break;
282
- default: abort();
283
- }
284
-}
285
-
286
-static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
287
-{
288
- switch (size) {
289
- case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
290
- case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
291
- default: abort();
292
- }
293
-}
294
-
295
-static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
296
- int size, int u)
297
-{
298
- TCGv_i64 tmp;
299
-
300
- switch ((size << 1) | u) {
301
- case 0: gen_helper_neon_mull_s8(dest, a, b); break;
302
- case 1: gen_helper_neon_mull_u8(dest, a, b); break;
303
- case 2: gen_helper_neon_mull_s16(dest, a, b); break;
304
- case 3: gen_helper_neon_mull_u16(dest, a, b); break;
305
- case 4:
306
- tmp = gen_muls_i64_i32(a, b);
307
- tcg_gen_mov_i64(dest, tmp);
308
- tcg_temp_free_i64(tmp);
309
- break;
310
- case 5:
311
- tmp = gen_mulu_i64_i32(a, b);
312
- tcg_gen_mov_i64(dest, tmp);
313
- tcg_temp_free_i64(tmp);
314
- break;
315
- default: abort();
316
- }
317
-
318
- /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
319
- Don't forget to clean them now. */
320
- if (size < 2) {
321
- tcg_temp_free_i32(a);
322
- tcg_temp_free_i32(b);
323
- }
324
-}
325
-
326
static void gen_neon_narrow_op(int op, int u, int size,
327
TCGv_i32 dest, TCGv_i64 src)
328
{
329
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
330
int u;
331
int vec_size;
332
uint32_t imm;
333
- TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
334
+ TCGv_i32 tmp, tmp2, tmp3, tmp5;
335
TCGv_ptr ptr1;
336
TCGv_i64 tmp64;
337
338
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
339
return 1;
340
} else { /* (insn & 0x00800010 == 0x00800000) */
341
if (size != 3) {
342
- op = (insn >> 8) & 0xf;
343
- if ((insn & (1 << 6)) == 0) {
344
- /* Three registers of different lengths: handled by decodetree */
126
- return 1;
345
- return 1;
127
- }
346
- } else {
128
- size = (insn >> 6) & 3;
347
- /* Two registers and a scalar. NB that for ops of this form
129
- nregs = ((insn >> 8) & 3) + 1;
348
- * the ARM ARM labels bit 24 as Q, but it is in our variable
130
-
349
- * 'u', not 'q'.
131
- if (size == 3) {
350
- */
132
- if (nregs != 4 || a == 0) {
351
- if (size == 0) {
133
- return 1;
352
- return 1;
134
- }
353
- }
135
- /* For VLD4 size==3 a == 1 means 32 bits at 16 byte alignment */
354
- switch (op) {
136
- size = 2;
355
- case 0: /* Integer VMLA scalar */
356
- case 4: /* Integer VMLS scalar */
357
- case 8: /* Integer VMUL scalar */
358
- case 1: /* Float VMLA scalar */
359
- case 5: /* Floating point VMLS scalar */
360
- case 9: /* Floating point VMUL scalar */
361
- case 12: /* VQDMULH scalar */
362
- case 13: /* VQRDMULH scalar */
363
- case 14: /* VQRDMLAH scalar */
364
- case 15: /* VQRDMLSH scalar */
365
- return 1; /* handled by decodetree */
366
-
367
- case 3: /* VQDMLAL scalar */
368
- case 7: /* VQDMLSL scalar */
369
- case 11: /* VQDMULL scalar */
370
- if (u == 1) {
371
- return 1;
372
- }
373
- /* fall through */
374
- case 2: /* VMLAL sclar */
375
- case 6: /* VMLSL scalar */
376
- case 10: /* VMULL scalar */
377
- if (rd & 1) {
378
- return 1;
379
- }
380
- tmp2 = neon_get_scalar(size, rm);
381
- /* We need a copy of tmp2 because gen_neon_mull
382
- * deletes it during pass 0. */
383
- tmp4 = tcg_temp_new_i32();
384
- tcg_gen_mov_i32(tmp4, tmp2);
385
- tmp3 = neon_load_reg(rn, 1);
386
-
387
- for (pass = 0; pass < 2; pass++) {
388
- if (pass == 0) {
389
- tmp = neon_load_reg(rn, 0);
390
- } else {
391
- tmp = tmp3;
392
- tmp2 = tmp4;
393
- }
394
- gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
395
- if (op != 11) {
396
- neon_load_reg64(cpu_V1, rd + pass);
397
- }
398
- switch (op) {
399
- case 6:
400
- gen_neon_negl(cpu_V0, size);
401
- /* Fall through */
402
- case 2:
403
- gen_neon_addl(size);
404
- break;
405
- case 3: case 7:
406
- gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
407
- if (op == 7) {
408
- gen_neon_negl(cpu_V0, size);
409
- }
410
- gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
411
- break;
412
- case 10:
413
- /* no-op */
414
- break;
415
- case 11:
416
- gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
417
- break;
418
- default:
419
- abort();
420
- }
421
- neon_store_reg64(cpu_V0, rd + pass);
422
- }
423
- break;
424
- default:
425
- g_assert_not_reached();
426
- }
137
- }
427
- }
138
- if (nregs == 1 && a == 1 && size == 0) {
428
+ /*
139
- return 1;
429
+ * Three registers of different lengths, or two registers and
140
- }
430
+ * a scalar: handled by decodetree
141
- if (nregs == 3 && a == 1) {
431
+ */
142
- return 1;
143
- }
144
- addr = tcg_temp_new_i32();
145
- load_reg_var(s, addr, rn);
146
-
147
- /* VLD1 to all lanes: bit 5 indicates how many Dregs to write.
148
- * VLD2/3/4 to all lanes: bit 5 indicates register stride.
149
- */
150
- stride = (insn & (1 << 5)) ? 2 : 1;
151
- vec_size = nregs == 1 ? stride * 8 : 8;
152
-
153
- tmp = tcg_temp_new_i32();
154
- for (reg = 0; reg < nregs; reg++) {
155
- gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
156
- s->be_data | size);
157
- if ((rd & 1) && vec_size == 16) {
158
- /* We cannot write 16 bytes at once because the
159
- * destination is unaligned.
160
- */
161
- tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
162
- 8, 8, tmp);
163
- tcg_gen_gvec_mov(0, neon_reg_offset(rd + 1, 0),
164
- neon_reg_offset(rd, 0), 8, 8);
165
- } else {
166
- tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
167
- vec_size, vec_size, tmp);
168
- }
169
- tcg_gen_addi_i32(addr, addr, 1 << size);
170
- rd += stride;
171
- }
172
- tcg_temp_free_i32(tmp);
173
- tcg_temp_free_i32(addr);
174
- stride = (1 << size) * nregs;
175
+ /* Load single element to all lanes -- handled by decodetree */
176
+ return 1;
432
+ return 1;
177
} else {
433
} else { /* size == 3 */
178
/* Single element. */
434
if (!u) {
179
int idx = (insn >> 4) & 0xf;
435
/* Extract. */
180
--
436
--
181
2.20.1
437
2.20.1
182
438
183
439
1
Convert the VCADD (vector) insns to decodetree.

Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation, which used the fixed temporaries cpu_V0 and cpu_V1
and did the extraction by hand with shift and logic ops, we use
the TCG extract2 insn.
5
6
We don't need to special-case the 0 or 8 immediates any more, as the
7
optimizer is smart enough to throw away the dead code.
2
8
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20200430181003.21682-6-peter.maydell@linaro.org
6
---
11
---
7
target/arm/neon-shared.decode | 3 +++
12
target/arm/neon-dp.decode | 8 +++-
8
target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
13
target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
9
target/arm/translate.c | 11 +---------
14
target/arm/translate.c | 58 +------------------------
10
3 files changed, 41 insertions(+), 10 deletions(-)
15
3 files changed, 85 insertions(+), 57 deletions(-)
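
As a scalar sketch of what the new VEXT code computes (an aside, not part
of either patch; the helper name is hypothetical), the single
tcg_gen_extract2_i64() call in the 64-bit case is equivalent to:

    static inline uint64_t vext_64(uint64_t rn, uint64_t rm, int imm)
    {
        /* Low 64 bits of the 128-bit value <Vm:Vn> shifted right by
         * imm bytes; valid for 0 <= imm <= 7.
         */
        return imm == 0 ? rn : (rn >> (imm * 8)) | (rm << (64 - imm * 8));
    }

For example imm == 3 puts bytes 3..7 of Vn in the low part of the result
and bytes 0..2 of Vm in the high part, matching the VEXT byte-extraction
semantics.
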
11
16
12
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
17
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
13
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/neon-shared.decode
19
--- a/target/arm/neon-dp.decode
15
+++ b/target/arm/neon-shared.decode
20
+++ b/target/arm/neon-dp.decode
16
@@ -XXX,XX +XXX,XX @@
21
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
17
22
# return false for size==3.
18
VCMLA 1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
23
######################################################################
19
vm=%vm_dp vn=%vn_dp vd=%vd_dp
24
{
20
+
25
- # 0b11 subgroup will go here
21
+VCADD 1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
26
+ [
22
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
27
+ ##################################################################
28
+ # Miscellaneous size=0b11 insns
29
+ ##################################################################
30
+ VEXT 1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
31
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
32
+ ]
33
34
# Subgroup for size != 0b11
35
[
23
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
36
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
24
index XXXXXXX..XXXXXXX 100644
37
index XXXXXXX..XXXXXXX 100644
25
--- a/target/arm/translate-neon.inc.c
38
--- a/target/arm/translate-neon.inc.c
26
+++ b/target/arm/translate-neon.inc.c
39
+++ b/target/arm/translate-neon.inc.c
27
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
40
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
28
tcg_temp_free_ptr(fpst);
41
29
return true;
42
return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
30
}
43
}
31
+
44
+
32
+static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
45
+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
33
+{
46
+{
34
+ int opr_sz;
47
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
35
+ TCGv_ptr fpst;
36
+ gen_helper_gvec_3_ptr *fn_gvec_ptr;
37
+
38
+ if (!dc_isar_feature(aa32_vcma, s)
39
+ || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
40
+ return false;
48
+ return false;
41
+ }
49
+ }
42
+
50
+
43
+ /* UNDEF accesses to D16-D31 if they don't exist. */
51
+ /* UNDEF accesses to D16-D31 if they don't exist. */
44
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
52
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
...
...
48
+
56
+
49
+ if ((a->vn | a->vm | a->vd) & a->q) {
57
+ if ((a->vn | a->vm | a->vd) & a->q) {
50
+ return false;
58
+ return false;
51
+ }
59
+ }
52
+
60
+
61
+ if (a->imm > 7 && !a->q) {
62
+ return false;
63
+ }
64
+
53
+ if (!vfp_access_check(s)) {
65
+ if (!vfp_access_check(s)) {
54
+ return true;
66
+ return true;
55
+ }
67
+ }
56
+
68
+
57
+ opr_sz = (1 + a->q) * 8;
69
+ if (!a->q) {
58
+ fpst = get_fpstatus_ptr(1);
70
+ /* Extract 64 bits from <Vm:Vn> */
59
+ fn_gvec_ptr = a->size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
71
+ TCGv_i64 left, right, dest;
60
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
72
+
61
+ vfp_reg_offset(1, a->vn),
73
+ left = tcg_temp_new_i64();
62
+ vfp_reg_offset(1, a->vm),
74
+ right = tcg_temp_new_i64();
63
+ fpst, opr_sz, opr_sz, a->rot,
75
+ dest = tcg_temp_new_i64();
64
+ fn_gvec_ptr);
76
+
65
+ tcg_temp_free_ptr(fpst);
77
+ neon_load_reg64(right, a->vn);
78
+ neon_load_reg64(left, a->vm);
79
+ tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
80
+ neon_store_reg64(dest, a->vd);
81
+
82
+ tcg_temp_free_i64(left);
83
+ tcg_temp_free_i64(right);
84
+ tcg_temp_free_i64(dest);
85
+ } else {
86
+ /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
87
+ TCGv_i64 left, middle, right, destleft, destright;
88
+
89
+ left = tcg_temp_new_i64();
90
+ middle = tcg_temp_new_i64();
91
+ right = tcg_temp_new_i64();
92
+ destleft = tcg_temp_new_i64();
93
+ destright = tcg_temp_new_i64();
94
+
95
+ if (a->imm < 8) {
96
+ neon_load_reg64(right, a->vn);
97
+ neon_load_reg64(middle, a->vn + 1);
98
+ tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
99
+ neon_load_reg64(left, a->vm);
100
+ tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
101
+ } else {
102
+ neon_load_reg64(right, a->vn + 1);
103
+ neon_load_reg64(middle, a->vm);
104
+ tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
105
+ neon_load_reg64(left, a->vm + 1);
106
+ tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
107
+ }
108
+
109
+ neon_store_reg64(destright, a->vd);
110
+ neon_store_reg64(destleft, a->vd + 1);
111
+
112
+ tcg_temp_free_i64(destright);
113
+ tcg_temp_free_i64(destleft);
114
+ tcg_temp_free_i64(right);
115
+ tcg_temp_free_i64(middle);
116
+ tcg_temp_free_i64(left);
117
+ }
66
+ return true;
118
+ return true;
67
+}
119
+}
68
diff --git a/target/arm/translate.c b/target/arm/translate.c
120
diff --git a/target/arm/translate.c b/target/arm/translate.c
69
index XXXXXXX..XXXXXXX 100644
121
index XXXXXXX..XXXXXXX 100644
70
--- a/target/arm/translate.c
122
--- a/target/arm/translate.c
71
+++ b/target/arm/translate.c
123
+++ b/target/arm/translate.c
72
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
124
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
73
bool is_long = false, q = extract32(insn, 6, 1);
125
int pass;
74
bool ptr_is_env = false;
126
int u;
75
127
int vec_size;
76
- if ((insn & 0xfea00f10) == 0xfc800800) {
128
- uint32_t imm;
77
- /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
129
TCGv_i32 tmp, tmp2, tmp3, tmp5;
78
- int size = extract32(insn, 20, 1);
130
TCGv_ptr ptr1;
79
- data = extract32(insn, 24, 1); /* rot */
131
- TCGv_i64 tmp64;
80
- if (!dc_isar_feature(aa32_vcma, s)
132
81
- || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
133
if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
82
- return 1;
134
return 1;
83
- }
135
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
84
- fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
136
return 1;
85
- } else if ((insn & 0xfeb00f00) == 0xfc200d00) {
137
} else { /* size == 3 */
86
+ if ((insn & 0xfeb00f00) == 0xfc200d00) {
138
if (!u) {
87
/* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
139
- /* Extract. */
88
bool u = extract32(insn, 4, 1);
140
- imm = (insn >> 8) & 0xf;
89
if (!dc_isar_feature(aa32_dp, s)) {
141
-
142
- if (imm > 7 && !q)
143
- return 1;
144
-
145
- if (q && ((rd | rn | rm) & 1)) {
146
- return 1;
147
- }
148
-
149
- if (imm == 0) {
150
- neon_load_reg64(cpu_V0, rn);
151
- if (q) {
152
- neon_load_reg64(cpu_V1, rn + 1);
153
- }
154
- } else if (imm == 8) {
155
- neon_load_reg64(cpu_V0, rn + 1);
156
- if (q) {
157
- neon_load_reg64(cpu_V1, rm);
158
- }
159
- } else if (q) {
160
- tmp64 = tcg_temp_new_i64();
161
- if (imm < 8) {
162
- neon_load_reg64(cpu_V0, rn);
163
- neon_load_reg64(tmp64, rn + 1);
164
- } else {
165
- neon_load_reg64(cpu_V0, rn + 1);
166
- neon_load_reg64(tmp64, rm);
167
- }
168
- tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
169
- tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
170
- tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
171
- if (imm < 8) {
172
- neon_load_reg64(cpu_V1, rm);
173
- } else {
174
- neon_load_reg64(cpu_V1, rm + 1);
175
- imm -= 8;
176
- }
177
- tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
178
- tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
179
- tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
180
- tcg_temp_free_i64(tmp64);
181
- } else {
182
- /* BUGFIX */
183
- neon_load_reg64(cpu_V0, rn);
184
- tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
185
- neon_load_reg64(cpu_V1, rm);
186
- tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
187
- tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
188
- }
189
- neon_store_reg64(cpu_V0, rd);
190
- if (q) {
191
- neon_store_reg64(cpu_V1, rd + 1);
192
- }
193
+ /* Extract: handled by decodetree */
194
+ return 1;
195
} else if ((insn & (1 << 11)) == 0) {
196
/* Two register misc. */
197
op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
90
--
198
--
91
2.20.1
199
2.20.1
92
200
93
201
1
Convert the Neon "load/store multiple structures" insns to decodetree.
1
Convert the Neon VTBL, VTBX instructions to decodetree. The actual
2
implementation of the insn is copied across to the new trans function
3
unchanged except for renaming 'tmp5' to 'tmp4'.
2
4
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20200430181003.21682-12-peter.maydell@linaro.org
6
---
7
---
7
target/arm/neon-ls.decode | 7 ++
8
target/arm/neon-dp.decode | 3 ++
8
target/arm/translate-neon.inc.c | 124 ++++++++++++++++++++++++++++++++
9
target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
9
target/arm/translate.c | 91 +----------------------
10
target/arm/translate.c | 41 +++---------------------
10
3 files changed, 133 insertions(+), 89 deletions(-)
11
3 files changed, 63 insertions(+), 37 deletions(-)
11
12
12
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
13
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
13
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/neon-ls.decode
15
--- a/target/arm/neon-dp.decode
15
+++ b/target/arm/neon-ls.decode
16
+++ b/target/arm/neon-dp.decode
16
@@ -XXX,XX +XXX,XX @@
17
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
17
# 0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
18
##################################################################
18
# This file works on the A32 encoding only; calling code for T32 has to
19
VEXT 1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
19
# transform the insn into the A32 version first.
20
vm=%vm_dp vn=%vn_dp vd=%vd_dp
20
+
21
+
21
+%vd_dp 22:1 12:4
22
+ VTBL 1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
22
+
23
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp
23
+# Neon load/store multiple structures
24
]
24
+
25
25
+VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
26
# Subgroup for size != 0b11
26
+ vd=%vd_dp
27
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
27
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
28
index XXXXXXX..XXXXXXX 100644
28
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/translate-neon.inc.c
29
--- a/target/arm/translate-neon.inc.c
30
+++ b/target/arm/translate-neon.inc.c
30
+++ b/target/arm/translate-neon.inc.c
31
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
31
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
32
gen_helper_gvec_fmlal_idx_a32);
32
}
33
return true;
33
return true;
34
}
34
}
35
+
35
+
36
+static struct {
36
+static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
37
+ int nregs;
38
+ int interleave;
39
+ int spacing;
40
+} const neon_ls_element_type[11] = {
41
+ {1, 4, 1},
42
+ {1, 4, 2},
43
+ {4, 1, 1},
44
+ {2, 2, 2},
45
+ {1, 3, 1},
46
+ {1, 3, 2},
47
+ {3, 1, 1},
48
+ {1, 1, 1},
49
+ {1, 2, 1},
50
+ {1, 2, 2},
51
+ {2, 1, 1}
52
+};
53
+
54
+static void gen_neon_ldst_base_update(DisasContext *s, int rm, int rn,
55
+ int stride)
56
+{
37
+{
57
+ if (rm != 15) {
38
+ int n;
58
+ TCGv_i32 base;
39
+ TCGv_i32 tmp, tmp2, tmp3, tmp4;
59
+
40
+ TCGv_ptr ptr1;
60
+ base = load_reg(s, rn);
61
+ if (rm == 13) {
62
+ tcg_gen_addi_i32(base, base, stride);
63
+ } else {
64
+ TCGv_i32 index;
65
+ index = load_reg(s, rm);
66
+ tcg_gen_add_i32(base, base, index);
67
+ tcg_temp_free_i32(index);
68
+ }
69
+ store_reg(s, rn, base);
70
+ }
71
+}
72
+
73
+static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
74
+{
75
+ /* Neon load/store multiple structures */
76
+ int nregs, interleave, spacing, reg, n;
77
+ MemOp endian = s->be_data;
78
+ int mmu_idx = get_mem_index(s);
79
+ int size = a->size;
80
+ TCGv_i64 tmp64;
81
+ TCGv_i32 addr, tmp;
82
+
41
+
83
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
42
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
84
+ return false;
43
+ return false;
85
+ }
44
+ }
86
+
45
+
87
+ /* UNDEF accesses to D16-D31 if they don't exist */
46
+ /* UNDEF accesses to D16-D31 if they don't exist. */
88
+ if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
47
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
89
+ return false;
48
+ ((a->vd | a->vn | a->vm) & 0x10)) {
90
+ }
91
+ if (a->itype > 10) {
92
+ return false;
93
+ }
94
+ /* Catch UNDEF cases for bad values of align field */
95
+ switch (a->itype & 0xc) {
96
+ case 4:
97
+ if (a->align >= 2) {
98
+ return false;
99
+ }
100
+ break;
101
+ case 8:
102
+ if (a->align == 3) {
103
+ return false;
104
+ }
105
+ break;
106
+ default:
107
+ break;
108
+ }
109
+ nregs = neon_ls_element_type[a->itype].nregs;
110
+ interleave = neon_ls_element_type[a->itype].interleave;
111
+ spacing = neon_ls_element_type[a->itype].spacing;
112
+ if (size == 3 && (interleave | spacing) != 1) {
113
+ return false;
49
+ return false;
114
+ }
50
+ }
115
+
51
+
116
+ if (!vfp_access_check(s)) {
52
+ if (!vfp_access_check(s)) {
117
+ return true;
53
+ return true;
118
+ }
54
+ }
119
+
55
+
120
+ /* For our purposes, bytes are always little-endian. */
56
+ n = a->len + 1;
121
+ if (size == 0) {
57
+ if ((a->vn + n) > 32) {
122
+ endian = MO_LE;
58
+ /*
59
+ * This is UNPREDICTABLE; we choose to UNDEF to avoid the
60
+ * helper function running off the end of the register file.
61
+ */
62
+ return false;
123
+ }
63
+ }
124
+ /*
64
+ n <<= 3;
125
+ * Consecutive little-endian elements from a single register
65
+ if (a->op) {
126
+ * can be promoted to a larger little-endian operation.
66
+ tmp = neon_load_reg(a->vd, 0);
127
+ */
67
+ } else {
128
+ if (interleave == 1 && endian == MO_LE) {
68
+ tmp = tcg_temp_new_i32();
129
+ size = 3;
69
+ tcg_gen_movi_i32(tmp, 0);
130
+ }
70
+ }
131
+ tmp64 = tcg_temp_new_i64();
71
+ tmp2 = neon_load_reg(a->vm, 0);
132
+ addr = tcg_temp_new_i32();
72
+ ptr1 = vfp_reg_ptr(true, a->vn);
133
+ tmp = tcg_const_i32(1 << size);
73
+ tmp4 = tcg_const_i32(n);
134
+ load_reg_var(s, addr, a->rn);
74
+ gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
135
+ for (reg = 0; reg < nregs; reg++) {
75
+ tcg_temp_free_i32(tmp);
136
+ for (n = 0; n < 8 >> size; n++) {
76
+ if (a->op) {
137
+ int xs;
77
+ tmp = neon_load_reg(a->vd, 1);
138
+ for (xs = 0; xs < interleave; xs++) {
78
+ } else {
139
+ int tt = a->vd + reg + spacing * xs;
79
+ tmp = tcg_temp_new_i32();
140
+
80
+ tcg_gen_movi_i32(tmp, 0);
141
+ if (a->l) {
142
+ gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
143
+ neon_store_element64(tt, n, size, tmp64);
144
+ } else {
145
+ neon_load_element64(tmp64, tt, n, size);
146
+ gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
147
+ }
148
+ tcg_gen_add_i32(addr, addr, tmp);
149
+ }
150
+ }
151
+ }
81
+ }
152
+ tcg_temp_free_i32(addr);
82
+ tmp3 = neon_load_reg(a->vm, 1);
83
+ gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
84
+ tcg_temp_free_i32(tmp4);
85
+ tcg_temp_free_ptr(ptr1);
86
+ neon_store_reg(a->vd, 0, tmp2);
87
+ neon_store_reg(a->vd, 1, tmp3);
153
+ tcg_temp_free_i32(tmp);
88
+ tcg_temp_free_i32(tmp);
154
+ tcg_temp_free_i64(tmp64);
155
+
156
+ gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
157
+ return true;
89
+ return true;
158
+}
90
+}
159
diff --git a/target/arm/translate.c b/target/arm/translate.c
91
diff --git a/target/arm/translate.c b/target/arm/translate.c
160
index XXXXXXX..XXXXXXX 100644
92
index XXXXXXX..XXXXXXX 100644
161
--- a/target/arm/translate.c
93
--- a/target/arm/translate.c
162
+++ b/target/arm/translate.c
94
+++ b/target/arm/translate.c
163
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
95
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
164
}
165
166
167
-static struct {
168
- int nregs;
169
- int interleave;
170
- int spacing;
171
-} const neon_ls_element_type[11] = {
172
- {1, 4, 1},
173
- {1, 4, 2},
174
- {4, 1, 1},
175
- {2, 2, 2},
176
- {1, 3, 1},
177
- {1, 3, 2},
178
- {3, 1, 1},
179
- {1, 1, 1},
180
- {1, 2, 1},
181
- {1, 2, 2},
182
- {2, 1, 1}
183
-};
184
-
185
/* Translate a NEON load/store element instruction. Return nonzero if the
186
instruction is invalid. */
187
static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
188
{
96
{
189
int rd, rn, rm;
97
int op;
190
- int op;
98
int q;
191
int nregs;
99
- int rd, rn, rm, rd_ofs, rm_ofs;
192
- int interleave;
100
+ int rd, rm, rd_ofs, rm_ofs;
193
- int spacing;
194
int stride;
195
int size;
101
int size;
196
int reg;
102
int pass;
197
int load;
103
int u;
198
- int n;
199
int vec_size;
104
int vec_size;
200
- int mmu_idx;
105
- TCGv_i32 tmp, tmp2, tmp3, tmp5;
201
- MemOp endian;
106
- TCGv_ptr ptr1;
202
TCGv_i32 addr;
107
+ TCGv_i32 tmp, tmp2, tmp3;
203
TCGv_i32 tmp;
204
- TCGv_i32 tmp2;
205
- TCGv_i64 tmp64;
206
108
207
if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
109
if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
208
return 1;
110
return 1;
209
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
111
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
210
rn = (insn >> 16) & 0xf;
112
q = (insn & (1 << 6)) != 0;
211
rm = insn & 0xf;
113
u = (insn >> 24) & 1;
212
load = (insn & (1 << 21)) != 0;
114
VFP_DREG_D(rd, insn);
213
- endian = s->be_data;
115
- VFP_DREG_N(rn, insn);
214
- mmu_idx = get_mem_index(s);
116
VFP_DREG_M(rm, insn);
215
if ((insn & (1 << 23)) == 0) {
117
size = (insn >> 20) & 3;
216
- /* Load store all elements. */
118
vec_size = q ? 16 : 8;
217
- op = (insn >> 8) & 0xf;
119
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
218
- size = (insn >> 6) & 3;
120
break;
219
- if (op > 10)
121
}
220
- return 1;
122
} else if ((insn & (1 << 10)) == 0) {
221
- /* Catch UNDEF cases for bad values of align field */
123
- /* VTBL, VTBX. */
222
- switch (op & 0xc) {
124
- int n = ((insn >> 8) & 3) + 1;
223
- case 4:
125
- if ((rn + n) > 32) {
224
- if (((insn >> 5) & 1) == 1) {
126
- /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
225
- return 1;
127
- * helper function running off the end of the register file.
226
- }
128
- */
227
- break;
129
- return 1;
228
- case 8:
229
- if (((insn >> 4) & 3) == 3) {
230
- return 1;
231
- }
232
- break;
233
- default:
234
- break;
235
- }
236
- nregs = neon_ls_element_type[op].nregs;
237
- interleave = neon_ls_element_type[op].interleave;
238
- spacing = neon_ls_element_type[op].spacing;
239
- if (size == 3 && (interleave | spacing) != 1) {
240
- return 1;
241
- }
242
- /* For our purposes, bytes are always little-endian. */
243
- if (size == 0) {
244
- endian = MO_LE;
245
- }
246
- /* Consecutive little-endian elements from a single register
247
- * can be promoted to a larger little-endian operation.
248
- */
249
- if (interleave == 1 && endian == MO_LE) {
250
- size = 3;
251
- }
252
- tmp64 = tcg_temp_new_i64();
253
- addr = tcg_temp_new_i32();
254
- tmp2 = tcg_const_i32(1 << size);
255
- load_reg_var(s, addr, rn);
256
- for (reg = 0; reg < nregs; reg++) {
257
- for (n = 0; n < 8 >> size; n++) {
258
- int xs;
259
- for (xs = 0; xs < interleave; xs++) {
260
- int tt = rd + reg + spacing * xs;
261
-
262
- if (load) {
263
- gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
264
- neon_store_element64(tt, n, size, tmp64);
265
- } else {
266
- neon_load_element64(tmp64, tt, n, size);
267
- gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
268
- }
269
- tcg_gen_add_i32(addr, addr, tmp2);
270
- }
130
- }
271
- }
131
- n <<= 3;
272
- }
132
- if (insn & (1 << 6)) {
273
- tcg_temp_free_i32(addr);
133
- tmp = neon_load_reg(rd, 0);
274
- tcg_temp_free_i32(tmp2);
134
- } else {
275
- tcg_temp_free_i64(tmp64);
135
- tmp = tcg_temp_new_i32();
276
- stride = nregs * interleave * 8;
136
- tcg_gen_movi_i32(tmp, 0);
277
+ /* Load store all elements -- handled already by decodetree */
137
- }
278
+ return 1;
138
- tmp2 = neon_load_reg(rm, 0);
279
} else {
139
- ptr1 = vfp_reg_ptr(true, rn);
280
size = (insn >> 10) & 3;
140
- tmp5 = tcg_const_i32(n);
281
if (size == 3) {
141
- gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
142
- tcg_temp_free_i32(tmp);
143
- if (insn & (1 << 6)) {
144
- tmp = neon_load_reg(rd, 1);
145
- } else {
146
- tmp = tcg_temp_new_i32();
147
- tcg_gen_movi_i32(tmp, 0);
148
- }
149
- tmp3 = neon_load_reg(rm, 1);
150
- gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
151
- tcg_temp_free_i32(tmp5);
152
- tcg_temp_free_ptr(ptr1);
153
- neon_store_reg(rd, 0, tmp2);
154
- neon_store_reg(rd, 1, tmp3);
155
- tcg_temp_free_i32(tmp);
156
+ /* VTBL, VTBX: handled by decodetree */
157
+ return 1;
158
} else if ((insn & 0x380) == 0) {
159
/* VDUP */
160
int element;
282
--
161
--
283
2.20.1
162
2.20.1
284
163
285
164
diff view generated by jsdifflib
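
A footnote on the trans_VLDST_multiple() conversion above: each element
access advances the address by 1 << size bytes, the loops visit
nregs * (8 >> size) * interleave elements, and the writeback stride handed
to gen_neon_ldst_base_update() is nregs * interleave * 8 bytes, i.e. the
same total however the access size was promoted. The standalone sketch
below (plain C, not QEMU code) just recomputes those numbers; the table
values are copied from the diff, the driver around them is illustrative
only.

#include <stdio.h>

static const struct {
    int nregs;
    int interleave;
    int spacing;
} neon_ls_element_type[11] = {
    {1, 4, 1}, {1, 4, 2}, {4, 1, 1}, {2, 2, 2}, {1, 3, 1}, {1, 3, 2},
    {3, 1, 1}, {1, 1, 1}, {1, 2, 1}, {1, 2, 2}, {2, 1, 1},
};

int main(void)
{
    for (int itype = 0; itype <= 10; itype++) {
        int nregs = neon_ls_element_type[itype].nregs;
        int interleave = neon_ls_element_type[itype].interleave;

        for (int size = 0; size <= 2; size++) {
            /* bytes touched by the element loops in the trans function */
            int touched = nregs * (8 >> size) * interleave * (1 << size);
            /* writeback stride used for the base register update */
            int stride = nregs * interleave * 8;

            printf("itype=%2d size=%d touched=%3d stride=%3d\n",
                   itype, size, touched, stride);
        }
    }
    return 0;
}
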
1
Convert the VFM[AS]L (vector) insns to decodetree. This is the last
1
Convert the Neon VDUP (scalar) insn to decodetree. (Note that we
2
insn in the legacy decoder for the 3same_ext group, so we can
2
can't call this just "VDUP" as we used that already in vfp.decode for
3
delete the legacy decoder function for the group entirely.
3
the "VDUP (general purpose register" insn.)
4
5
Note that in disas_thumb2_insn() the parts of this encoding space
6
where the decodetree decoder returns false will correctly be directed
7
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
8
into disas_coproc_insn() by mistake.
9
4
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
Message-id: 20200430181003.21682-8-peter.maydell@linaro.org
13
---
7
---
14
target/arm/neon-shared.decode | 6 +++
8
target/arm/neon-dp.decode | 7 +++++++
15
target/arm/translate-neon.inc.c | 31 +++++++++++
9
target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
16
target/arm/translate.c | 92 +--------------------------------
10
target/arm/translate.c | 25 +------------------------
17
3 files changed, 38 insertions(+), 91 deletions(-)
11
3 files changed, 34 insertions(+), 24 deletions(-)
18
12
19
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
13
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
20
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/neon-shared.decode
15
--- a/target/arm/neon-dp.decode
22
+++ b/target/arm/neon-shared.decode
16
+++ b/target/arm/neon-dp.decode
23
@@ -XXX,XX +XXX,XX @@ VCADD 1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
17
@@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
24
# VUDOT and VSDOT
18
25
VDOT 1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
19
VTBL 1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
26
vm=%vm_dp vn=%vn_dp vd=%vd_dp
20
vm=%vm_dp vn=%vn_dp vd=%vd_dp
27
+
21
+
28
+# VFM[AS]L
22
+ VDUP_scalar 1111 001 1 1 . 11 index:3 1 .... 11 000 q:1 . 0 .... \
29
+VFML 1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
23
+ vm=%vm_dp vd=%vd_dp size=0
30
+ vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
24
+ VDUP_scalar 1111 001 1 1 . 11 index:2 10 .... 11 000 q:1 . 0 .... \
31
+VFML 1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
25
+ vm=%vm_dp vd=%vd_dp size=1
32
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
26
+ VDUP_scalar 1111 001 1 1 . 11 index:1 100 .... 11 000 q:1 . 0 .... \
27
+ vm=%vm_dp vd=%vd_dp size=2
28
]
29
30
# Subgroup for size != 0b11
33
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
31
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
34
index XXXXXXX..XXXXXXX 100644
32
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/translate-neon.inc.c
33
--- a/target/arm/translate-neon.inc.c
36
+++ b/target/arm/translate-neon.inc.c
34
+++ b/target/arm/translate-neon.inc.c
37
@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
35
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
38
opr_sz, opr_sz, 0, fn_gvec);
36
tcg_temp_free_i32(tmp);
39
return true;
37
return true;
40
}
38
}
41
+
39
+
42
+static bool trans_VFML(DisasContext *s, arg_VFML *a)
40
+static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
43
+{
41
+{
44
+ int opr_sz;
42
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
45
+
46
+ if (!dc_isar_feature(aa32_fhm, s)) {
47
+ return false;
43
+ return false;
48
+ }
44
+ }
49
+
45
+
50
+ /* UNDEF accesses to D16-D31 if they don't exist. */
46
+ /* UNDEF accesses to D16-D31 if they don't exist. */
51
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
47
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
52
+ (a->vd & 0x10)) {
48
+ ((a->vd | a->vm) & 0x10)) {
53
+ return false;
49
+ return false;
54
+ }
50
+ }
55
+
51
+
56
+ if (a->vd & a->q) {
52
+ if (a->vd & a->q) {
57
+ return false;
53
+ return false;
58
+ }
54
+ }
59
+
55
+
60
+ if (!vfp_access_check(s)) {
56
+ if (!vfp_access_check(s)) {
61
+ return true;
57
+ return true;
62
+ }
58
+ }
63
+
59
+
64
+ opr_sz = (1 + a->q) * 8;
60
+ tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
65
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
61
+ neon_element_offset(a->vm, a->index, a->size),
66
+ vfp_reg_offset(a->q, a->vn),
62
+ a->q ? 16 : 8, a->q ? 16 : 8);
67
+ vfp_reg_offset(a->q, a->vm),
68
+ cpu_env, opr_sz, opr_sz, a->s, /* is_2 == 0 */
69
+ gen_helper_gvec_fmlal_a32);
70
+ return true;
63
+ return true;
71
+}
64
+}
72
diff --git a/target/arm/translate.c b/target/arm/translate.c
65
diff --git a/target/arm/translate.c b/target/arm/translate.c
73
index XXXXXXX..XXXXXXX 100644
66
index XXXXXXX..XXXXXXX 100644
74
--- a/target/arm/translate.c
67
--- a/target/arm/translate.c
75
+++ b/target/arm/translate.c
68
+++ b/target/arm/translate.c
76
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
69
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
77
return 0;
70
}
78
}
71
break;
79
72
}
80
-/* Advanced SIMD three registers of the same length extension.
73
- } else if ((insn & (1 << 10)) == 0) {
81
- * 31 25 23 22 20 16 12 11 10 9 8 3 0
74
- /* VTBL, VTBX: handled by decodetree */
82
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
75
- return 1;
83
- * | 1 1 1 1 1 1 0 | op1 | D | op2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
76
- } else if ((insn & 0x380) == 0) {
84
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
77
- /* VDUP */
85
- */
78
- int element;
86
-static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
79
- MemOp size;
87
-{
88
- gen_helper_gvec_3 *fn_gvec = NULL;
89
- gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
90
- int rd, rn, rm, opr_sz;
91
- int data = 0;
92
- int off_rn, off_rm;
93
- bool is_long = false, q = extract32(insn, 6, 1);
94
- bool ptr_is_env = false;
95
-
80
-
96
- if ((insn & 0xff300f10) == 0xfc200810) {
81
- if ((insn & (7 << 16)) == 0 || (q && (rd & 1))) {
97
- /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
82
- return 1;
98
- int is_s = extract32(insn, 23, 1);
83
- }
99
- if (!dc_isar_feature(aa32_fhm, s)) {
84
- if (insn & (1 << 16)) {
100
- return 1;
85
- size = MO_8;
101
- }
86
- element = (insn >> 17) & 7;
102
- is_long = true;
87
- } else if (insn & (1 << 17)) {
103
- data = is_s; /* is_2 == 0 */
88
- size = MO_16;
104
- fn_gvec_ptr = gen_helper_gvec_fmlal_a32;
89
- element = (insn >> 18) & 3;
105
- ptr_is_env = true;
90
- } else {
106
- } else {
91
- size = MO_32;
107
- return 1;
92
- element = (insn >> 19) & 1;
108
- }
93
- }
109
-
94
- tcg_gen_gvec_dup_mem(size, neon_reg_offset(rd, 0),
110
- VFP_DREG_D(rd, insn);
95
- neon_element_offset(rm, element, size),
111
- if (rd & q) {
96
- q ? 16 : 8, q ? 16 : 8);
112
- return 1;
97
} else {
113
- }
98
+ /* VTBL, VTBX, VDUP: handled by decodetree */
114
- if (q || !is_long) {
99
return 1;
115
- VFP_DREG_N(rn, insn);
116
- VFP_DREG_M(rm, insn);
117
- if ((rn | rm) & q & !is_long) {
118
- return 1;
119
- }
120
- off_rn = vfp_reg_offset(1, rn);
121
- off_rm = vfp_reg_offset(1, rm);
122
- } else {
123
- rn = VFP_SREG_N(insn);
124
- rm = VFP_SREG_M(insn);
125
- off_rn = vfp_reg_offset(0, rn);
126
- off_rm = vfp_reg_offset(0, rm);
127
- }
128
-
129
- if (s->fp_excp_el) {
130
- gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
131
- syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
132
- return 0;
133
- }
134
- if (!s->vfp_enabled) {
135
- return 1;
136
- }
137
-
138
- opr_sz = (1 + q) * 8;
139
- if (fn_gvec_ptr) {
140
- TCGv_ptr ptr;
141
- if (ptr_is_env) {
142
- ptr = cpu_env;
143
- } else {
144
- ptr = get_fpstatus_ptr(1);
145
- }
146
- tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
147
- opr_sz, opr_sz, data, fn_gvec_ptr);
148
- if (!ptr_is_env) {
149
- tcg_temp_free_ptr(ptr);
150
- }
151
- } else {
152
- tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
153
- opr_sz, opr_sz, data, fn_gvec);
154
- }
155
- return 0;
156
-}
157
-
158
/* Advanced SIMD two registers and a scalar extension.
159
* 31 24 23 22 20 16 12 11 10 9 8 3 0
160
* +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
161
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
162
}
163
}
164
}
100
}
165
- } else if ((insn & 0x0e000a00) == 0x0c000800
166
- && arm_dc_feature(s, ARM_FEATURE_V8)) {
167
- if (disas_neon_insn_3same_ext(s, insn)) {
168
- goto illegal_op;
169
- }
170
- return;
171
} else if ((insn & 0x0f000a00) == 0x0e000800
172
&& arm_dc_feature(s, ARM_FEATURE_V8)) {
173
if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
174
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
175
}
176
break;
177
}
101
}
178
- if ((insn & 0xfe000a00) == 0xfc000800
179
+ if ((insn & 0xff000a00) == 0xfe000800
180
&& arm_dc_feature(s, ARM_FEATURE_V8)) {
181
/* The Thumb2 and ARM encodings are identical. */
182
- if (disas_neon_insn_3same_ext(s, insn)) {
183
- goto illegal_op;
184
- }
185
- } else if ((insn & 0xff000a00) == 0xfe000800
186
- && arm_dc_feature(s, ARM_FEATURE_V8)) {
187
- /* The Thumb2 and ARM encodings are identical. */
188
if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
189
goto illegal_op;
190
}
191
--
102
--
192
2.20.1
103
2.20.1
193
104
194
105
diff view generated by jsdifflib
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
1
From: Jean-Christophe Dubois <jcd@tribudubois.net>
2
2
3
Embed the UARTs into the SoC type.
3
Some bits of the CCM registers are non-writable.
4
4
5
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
5
This was left undone in the initial commit (all bits of registers were
6
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
writable).
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
8
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
8
This patch adds the required code to protect the non-writable bits.
9
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
9
10
Message-id: 20200427181649.26851-5-edgar.iglesias@gmail.com
10
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
11
Message-id: 20200608133508.550046-1-jcd@tribudubois.net
12
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
---
14
---
13
include/hw/arm/xlnx-versal.h | 3 ++-
15
hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
14
hw/arm/xlnx-versal.c | 12 ++++++------
16
1 file changed, 63 insertions(+), 13 deletions(-)
15
2 files changed, 8 insertions(+), 7 deletions(-)
16
17
17
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
18
diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
18
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
19
--- a/include/hw/arm/xlnx-versal.h
20
--- a/hw/misc/imx6ul_ccm.c
20
+++ b/include/hw/arm/xlnx-versal.h
21
+++ b/hw/misc/imx6ul_ccm.c
21
@@ -XXX,XX +XXX,XX @@
22
@@ -XXX,XX +XXX,XX @@
22
#include "hw/sysbus.h"
23
23
#include "hw/arm/boot.h"
24
#include "trace.h"
24
#include "hw/intc/arm_gicv3.h"
25
25
+#include "hw/char/pl011.h"
26
+static const uint32_t ccm_mask[CCM_MAX] = {
26
27
+ [CCM_CCR] = 0xf01fef80,
27
#define TYPE_XLNX_VERSAL "xlnx-versal"
28
+ [CCM_CCDR] = 0xfffeffff,
28
#define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
29
+ [CCM_CSR] = 0xffffffff,
29
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
30
+ [CCM_CCSR] = 0xfffffef2,
30
MemoryRegion mr_ocm;
31
+ [CCM_CACRR] = 0xfffffff8,
31
32
+ [CCM_CBCDR] = 0xc1f8e000,
32
struct {
33
+ [CCM_CBCMR] = 0xfc03cfff,
33
- SysBusDevice *uart[XLNX_VERSAL_NR_UARTS];
34
+ [CCM_CSCMR1] = 0x80700000,
34
+ PL011State uart[XLNX_VERSAL_NR_UARTS];
35
+ [CCM_CSCMR2] = 0xe01ff003,
35
SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
36
+ [CCM_CSCDR1] = 0xfe00c780,
36
SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
37
+ [CCM_CS1CDR] = 0xfe00fe00,
37
} iou;
38
+ [CCM_CS2CDR] = 0xf8007000,
38
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
39
+ [CCM_CDCDR] = 0xf00fffff,
39
index XXXXXXX..XXXXXXX 100644
40
+ [CCM_CHSCCDR] = 0xfffc01ff,
40
--- a/hw/arm/xlnx-versal.c
41
+ [CCM_CSCDR2] = 0xfe0001ff,
41
+++ b/hw/arm/xlnx-versal.c
42
+ [CCM_CSCDR3] = 0xffffc1ff,
42
@@ -XXX,XX +XXX,XX @@
43
+ [CCM_CDHIPR] = 0xffffffff,
43
#include "kvm_arm.h"
44
+ [CCM_CTOR] = 0x00000000,
44
#include "hw/misc/unimp.h"
45
+ [CCM_CLPCR] = 0xf39ff01c,
45
#include "hw/arm/xlnx-versal.h"
46
+ [CCM_CISR] = 0xfb85ffbe,
46
-#include "hw/char/pl011.h"
47
+ [CCM_CIMR] = 0xfb85ffbf,
47
48
+ [CCM_CCOSR] = 0xfe00fe00,
48
#define XLNX_VERSAL_ACPU_TYPE ARM_CPU_TYPE_NAME("cortex-a72")
49
+ [CCM_CGPR] = 0xfffc3fea,
49
#define GEM_REVISION 0x40070106
50
+ [CCM_CCGR0] = 0x00000000,
50
@@ -XXX,XX +XXX,XX @@ static void versal_create_uarts(Versal *s, qemu_irq *pic)
51
+ [CCM_CCGR1] = 0x00000000,
51
DeviceState *dev;
52
+ [CCM_CCGR2] = 0x00000000,
52
MemoryRegion *mr;
53
+ [CCM_CCGR3] = 0x00000000,
53
54
+ [CCM_CCGR4] = 0x00000000,
54
- dev = qdev_create(NULL, TYPE_PL011);
55
+ [CCM_CCGR5] = 0x00000000,
55
- s->lpd.iou.uart[i] = SYS_BUS_DEVICE(dev);
56
+ [CCM_CCGR6] = 0x00000000,
56
+ sysbus_init_child_obj(OBJECT(s), name,
57
+ [CCM_CMEOR] = 0xafffff1f,
57
+ &s->lpd.iou.uart[i], sizeof(s->lpd.iou.uart[i]),
58
+};
58
+ TYPE_PL011);
59
+
59
+ dev = DEVICE(&s->lpd.iou.uart[i]);
60
+static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
60
qdev_prop_set_chr(dev, "chardev", serial_hd(i));
61
+ [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
61
- object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
62
+ [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
62
qdev_init_nofail(dev);
63
+ [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
63
64
+ [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
64
- mr = sysbus_mmio_get_region(s->lpd.iou.uart[i], 0);
65
+ [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
65
+ mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
66
+ [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
66
memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
67
+ [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
67
68
+ [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
68
- sysbus_connect_irq(s->lpd.iou.uart[i], 0, pic[irqs[i]]);
69
+ [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
69
+ sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
70
+ [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
70
g_free(name);
71
+ [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
72
+ [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
73
+ [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
74
+ [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
75
+ [CCM_ANALOG_PFD_480] = 0x40404040,
76
+ [CCM_ANALOG_PFD_528] = 0x40404040,
77
+ [PMU_MISC0] = 0x01fe8306,
78
+ [PMU_MISC1] = 0x07fcede0,
79
+ [PMU_MISC2] = 0x005f5f5f,
80
+};
81
+
82
static const char *imx6ul_ccm_reg_name(uint32_t reg)
83
{
84
static char unknown[20];
85
@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,
86
87
trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);
88
89
- /*
90
- * We will do a better implementation later. In particular some bits
91
- * cannot be written to.
92
- */
93
- s->ccm[index] = (uint32_t)value;
94
+ s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
95
+ ((uint32_t)value & ~ccm_mask[index]);
96
}
97
98
static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
99
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
100
* the REG_NAME register. So we change the value of the
101
* REG_NAME register, setting bits passed in the value.
102
*/
103
- s->analog[index - 1] |= value;
104
+ s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
105
break;
106
case CCM_ANALOG_PLL_ARM_CLR:
107
case CCM_ANALOG_PLL_USB1_CLR:
108
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
109
* the REG_NAME register. So we change the value of the
110
* REG_NAME register, unsetting bits passed in the value.
111
*/
112
- s->analog[index - 2] &= ~value;
113
+ s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
114
break;
115
case CCM_ANALOG_PLL_ARM_TOG:
116
case CCM_ANALOG_PLL_USB1_TOG:
117
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
118
* the REG_NAME register. So we change the value of the
119
* REG_NAME register, toggling bits passed in the value.
120
*/
121
- s->analog[index - 3] ^= value;
122
+ s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
123
break;
124
default:
125
- /*
126
- * We will do a better implementation later. In particular some bits
127
- * cannot be written to.
128
- */
129
- s->analog[index] = value;
130
+ s->analog[index] = (s->analog[index] & analog_mask[index]) |
131
+ (value & ~analog_mask[index]);
132
break;
71
}
133
}
72
}
134
}
73
--
135
--
74
2.20.1
136
2.20.1
75
137
76
138
diff view generated by jsdifflib
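
On the imx6ul_ccm change above: the write handlers now use a
read-modify-write mask, where bits that are set in the mask keep their
current (read-only) value and bits that are clear take the value the
guest wrote. A minimal standalone demonstration of that idiom follows
(plain C, not QEMU code; the register contents and mask are made up):

#include <stdint.h>
#include <stdio.h>

/* Keep the bits selected by read_only_mask, accept the rest from 'value'. */
static uint32_t masked_write(uint32_t current, uint32_t value,
                             uint32_t read_only_mask)
{
    return (current & read_only_mask) | (value & ~read_only_mask);
}

int main(void)
{
    uint32_t reg = 0x12345678;      /* current register contents */
    uint32_t mask = 0xffff0000;     /* upper half is read-only */

    reg = masked_write(reg, 0xdeadbeef, mask);
    printf("0x%08x\n", reg);        /* prints 0x1234beef */
    return 0;
}
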
1
From: Philippe Mathieu-Daudé <f4bug@amsat.org>
1
From: Erik Smit <erik.lucas.smit@gmail.com>
2
2
3
By using the TYPE_* definitions for devices, we can:
3
The hardware supports configurable descriptor sizes, configured in the DBLAC
4
- quickly find where devices are used with 'git-grep'
4
register.
5
- easily rename a device (one-line change).
6
5
7
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
6
Most drivers use the default 4 word descriptor, which is currently hardcoded,
8
Message-id: 20200428154650.21991-1-f4bug@amsat.org
7
but Aspeed SDK configures 8 words to store extra data.
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
8
9
Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
10
Reviewed-by: Cédric Le Goater <clg@kaod.org>
11
[PMM: removed unnecessary parens]
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
13
---
12
hw/arm/mps2-tz.c | 2 +-
14
hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
13
1 file changed, 1 insertion(+), 1 deletion(-)
15
1 file changed, 24 insertions(+), 2 deletions(-)
14
16
15
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
17
diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
16
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
17
--- a/hw/arm/mps2-tz.c
19
--- a/hw/net/ftgmac100.c
18
+++ b/hw/arm/mps2-tz.c
20
+++ b/hw/net/ftgmac100.c
19
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
21
@@ -XXX,XX +XXX,XX @@
20
exit(EXIT_FAILURE);
22
#define FTGMAC100_APTC_TXPOLL_CNT(x) (((x) >> 8) & 0xf)
23
#define FTGMAC100_APTC_TXPOLL_TIME_SEL (1 << 12)
24
25
+/*
26
+ * DMA burst length and arbitration control register
27
+ */
28
+#define FTGMAC100_DBLAC_RXBURST_SIZE(x) (((x) >> 8) & 0x3)
29
+#define FTGMAC100_DBLAC_TXBURST_SIZE(x) (((x) >> 10) & 0x3)
30
+#define FTGMAC100_DBLAC_RXDES_SIZE(x) ((((x) >> 12) & 0xf) * 8)
31
+#define FTGMAC100_DBLAC_TXDES_SIZE(x) ((((x) >> 16) & 0xf) * 8)
32
+#define FTGMAC100_DBLAC_IFG_CNT(x) (((x) >> 20) & 0x7)
33
+#define FTGMAC100_DBLAC_IFG_INC (1 << 23)
34
+
35
/*
36
* PHY control register
37
*/
38
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
39
if (bd.des0 & s->txdes0_edotr) {
40
addr = tx_ring;
41
} else {
42
- addr += sizeof(FTGMAC100Desc);
43
+ addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
44
}
21
}
45
}
22
46
23
- sysbus_init_child_obj(OBJECT(machine), "iotkit", &mms->iotkit,
47
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
24
+ sysbus_init_child_obj(OBJECT(machine), TYPE_IOTKIT, &mms->iotkit,
48
s->phydata = value & 0xffff;
25
sizeof(mms->iotkit), mmc->armsse_type);
49
break;
26
iotkitdev = DEVICE(&mms->iotkit);
50
case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
27
object_property_set_link(OBJECT(&mms->iotkit), OBJECT(system_memory),
51
+ if (FTGMAC100_DBLAC_TXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
52
+ qemu_log_mask(LOG_GUEST_ERROR,
53
+ "%s: transmit descriptor too small : %d bytes\n",
54
+ __func__, FTGMAC100_DBLAC_TXDES_SIZE(s->dblac));
55
+ break;
56
+ }
57
+ if (FTGMAC100_DBLAC_RXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
58
+ qemu_log_mask(LOG_GUEST_ERROR,
59
+ "%s: receive descriptor too small : %d bytes\n",
60
+ __func__, FTGMAC100_DBLAC_RXDES_SIZE(s->dblac));
61
+ break;
62
+ }
63
s->dblac = value;
64
break;
65
case FTGMAC100_REVR: /* Feature Register */
66
@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
67
if (bd.des0 & s->rxdes0_edorr) {
68
addr = s->rx_ring;
69
} else {
70
- addr += sizeof(FTGMAC100Desc);
71
+ addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
72
}
73
}
74
s->rx_descriptor = addr;
28
--
75
--
29
2.20.1
76
2.20.1
30
77
31
78
diff view generated by jsdifflib
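
A worked example of the DBLAC field macros added in the ftgmac100 patch
above: TXDES_SIZE takes bits [19:16] and RXDES_SIZE bits [15:12], each
scaled by 8 bytes, so a hypothetical DBLAC value of 0x00022000 describes
16-byte descriptors, i.e. the default four-word layout the commit message
mentions; the new write handler rejects anything smaller than
sizeof(FTGMAC100Desc). The register value below is made up purely for the
arithmetic (plain C, not QEMU code).

#include <stdint.h>
#include <stdio.h>

#define FTGMAC100_DBLAC_RXDES_SIZE(x)   ((((x) >> 12) & 0xf) * 8)
#define FTGMAC100_DBLAC_TXDES_SIZE(x)   ((((x) >> 16) & 0xf) * 8)

int main(void)
{
    uint32_t dblac = 0x00022000; /* hypothetical: bits [19:16] = 2, [15:12] = 2 */

    printf("txdes = %u bytes\n", (unsigned)FTGMAC100_DBLAC_TXDES_SIZE(dblac));
    printf("rxdes = %u bytes\n", (unsigned)FTGMAC100_DBLAC_RXDES_SIZE(dblac));
    return 0;
}
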
Deleted patch
1
We define ARMMMUIdx_Stage2 as being an MMU index which uses a QEMU
2
TLB. However we never actually use the TLB -- all stage 2 lookups
3
are done by direct calls to get_phys_addr_lpae() followed by a
4
physical address load via address_space_ld*().
5
1
6
Remove Stage2 from the list of ARM MMU indexes which correspond to
7
real core MMU indexes, and instead put it in the set of "NOTLB" ARM
8
MMU indexes.
9
10
This allows us to drop NB_MMU_MODES to 11. It also means we can
11
safely add support for the ARMv8.2-TTS2UXN extension, which adds
12
permission bits to the stage 2 descriptors which define execute
13
permission separately for EL0 and EL1; supporting that while keeping
14
Stage2 in a QEMU TLB would require us to use separate TLBs for
15
"Stage2 for an EL0 access" and "Stage2 for an EL1 access", which is a
16
lot of extra complication given we aren't even using the QEMU TLB.
17
18
In the process of updating the comment on our MMU index use,
19
fix a couple of other minor errors:
20
* NS EL2 EL2&0 was missing from the list in the comment
21
* some text hadn't been updated from when we bumped NB_MMU_MODES
22
above 8
23
24
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
25
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
26
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
27
Message-id: 20200330210400.11724-2-peter.maydell@linaro.org
28
---
29
target/arm/cpu-param.h | 2 +-
30
target/arm/cpu.h | 21 +++++---
31
target/arm/helper.c | 112 ++++-------------------------------------
32
3 files changed, 27 insertions(+), 108 deletions(-)
33
34
diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
35
index XXXXXXX..XXXXXXX 100644
36
--- a/target/arm/cpu-param.h
37
+++ b/target/arm/cpu-param.h
38
@@ -XXX,XX +XXX,XX @@
39
# define TARGET_PAGE_BITS_MIN 10
40
#endif
41
42
-#define NB_MMU_MODES 12
43
+#define NB_MMU_MODES 11
44
45
#endif
46
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
47
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/cpu.h
49
+++ b/target/arm/cpu.h
50
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
51
* handling via the TLB. The only way to do a stage 1 translation without
52
* the immediate stage 2 translation is via the ATS or AT system insns,
53
* which can be slow-pathed and always do a page table walk.
54
+ * The only use of stage 2 translations is either as part of an s1+2
55
+ * lookup or when loading the descriptors during a stage 1 page table walk,
56
+ * and in both those cases we don't use the TLB.
57
* 4. we can also safely fold together the "32 bit EL3" and "64 bit EL3"
58
* translation regimes, because they map reasonably well to each other
59
* and they can't both be active at the same time.
60
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
61
* NS EL1 EL1&0 stage 1+2 (aka NS PL1)
62
* NS EL1 EL1&0 stage 1+2 +PAN
63
* NS EL0 EL2&0
64
+ * NS EL2 EL2&0
65
* NS EL2 EL2&0 +PAN
66
* NS EL2 (aka NS PL2)
67
* S EL0 EL1&0 (aka S PL0)
68
* S EL1 EL1&0 (not used if EL3 is 32 bit)
69
* S EL1 EL1&0 +PAN
70
* S EL3 (aka S PL1)
71
- * NS EL1&0 stage 2
72
*
73
- * for a total of 12 different mmu_idx.
74
+ * for a total of 11 different mmu_idx.
75
*
76
* R profile CPUs have an MPU, but can use the same set of MMU indexes
77
* as A profile. They only need to distinguish NS EL0 and NS EL1 (and
78
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
79
* are not quite the same -- different CPU types (most notably M profile
80
* vs A/R profile) would like to use MMU indexes with different semantics,
81
* but since we don't ever need to use all of those in a single CPU we
82
- * can avoid setting NB_MMU_MODES to more than 8. The lower bits of
83
+ * can avoid having to set NB_MMU_MODES to "total number of A profile MMU
84
+ * modes + total number of M profile MMU modes". The lower bits of
85
* ARMMMUIdx are the core TLB mmu index, and the higher bits are always
86
* the same for any particular CPU.
87
* Variables of type ARMMUIdx are always full values, and the core
88
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
89
ARMMMUIdx_SE10_1_PAN = 9 | ARM_MMU_IDX_A,
90
ARMMMUIdx_SE3 = 10 | ARM_MMU_IDX_A,
91
92
- ARMMMUIdx_Stage2 = 11 | ARM_MMU_IDX_A,
93
-
94
/*
95
* These are not allocated TLBs and are used only for AT system
96
* instructions or for the first stage of an S12 page table walk.
97
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
98
ARMMMUIdx_Stage1_E0 = 0 | ARM_MMU_IDX_NOTLB,
99
ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
100
ARMMMUIdx_Stage1_E1_PAN = 2 | ARM_MMU_IDX_NOTLB,
101
+ /*
102
+ * Not allocated a TLB: used only for second stage of an S12 page
103
+ * table walk, or for descriptor loads during first stage of an S1
104
+ * page table walk. Note that if we ever want to have a TLB for this
105
+ * then various TLB flush insns which currently are no-ops or flush
106
+ * only stage 1 MMU indexes will need to change to flush stage 2.
107
+ */
108
+ ARMMMUIdx_Stage2 = 3 | ARM_MMU_IDX_NOTLB,
109
110
/*
111
* M-profile.
112
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
113
TO_CORE_BIT(SE10_1),
114
TO_CORE_BIT(SE10_1_PAN),
115
TO_CORE_BIT(SE3),
116
- TO_CORE_BIT(Stage2),
117
118
TO_CORE_BIT(MUser),
119
TO_CORE_BIT(MPriv),
120
diff --git a/target/arm/helper.c b/target/arm/helper.c
121
index XXXXXXX..XXXXXXX 100644
122
--- a/target/arm/helper.c
123
+++ b/target/arm/helper.c
124
@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri,
125
tlb_flush_by_mmuidx(cs,
126
ARMMMUIdxBit_E10_1 |
127
ARMMMUIdxBit_E10_1_PAN |
128
- ARMMMUIdxBit_E10_0 |
129
- ARMMMUIdxBit_Stage2);
130
+ ARMMMUIdxBit_E10_0);
131
}
132
133
static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
134
@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
135
tlb_flush_by_mmuidx_all_cpus_synced(cs,
136
ARMMMUIdxBit_E10_1 |
137
ARMMMUIdxBit_E10_1_PAN |
138
- ARMMMUIdxBit_E10_0 |
139
- ARMMMUIdxBit_Stage2);
140
+ ARMMMUIdxBit_E10_0);
141
}
142
143
-static void tlbiipas2_write(CPUARMState *env, const ARMCPRegInfo *ri,
144
- uint64_t value)
145
-{
146
- /* Invalidate by IPA. This has to invalidate any structures that
147
- * contain only stage 2 translation information, but does not need
148
- * to apply to structures that contain combined stage 1 and stage 2
149
- * translation information.
150
- * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
151
- */
152
- CPUState *cs = env_cpu(env);
153
- uint64_t pageaddr;
154
-
155
- if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
156
- return;
157
- }
158
-
159
- pageaddr = sextract64(value << 12, 0, 40);
160
-
161
- tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
162
-}
163
-
164
-static void tlbiipas2_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
165
- uint64_t value)
166
-{
167
- CPUState *cs = env_cpu(env);
168
- uint64_t pageaddr;
169
-
170
- if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
171
- return;
172
- }
173
-
174
- pageaddr = sextract64(value << 12, 0, 40);
175
-
176
- tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
177
- ARMMMUIdxBit_Stage2);
178
-}
179
180
static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
181
uint64_t value)
182
@@ -XXX,XX +XXX,XX @@ static void vttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
183
tlb_flush_by_mmuidx(cs,
184
ARMMMUIdxBit_E10_1 |
185
ARMMMUIdxBit_E10_1_PAN |
186
- ARMMMUIdxBit_E10_0 |
187
- ARMMMUIdxBit_Stage2);
188
+ ARMMMUIdxBit_E10_0);
189
raw_write(env, ri, value);
190
}
191
}
192
@@ -XXX,XX +XXX,XX @@ static int alle1_tlbmask(CPUARMState *env)
193
return ARMMMUIdxBit_SE10_1 |
194
ARMMMUIdxBit_SE10_1_PAN |
195
ARMMMUIdxBit_SE10_0;
196
- } else if (arm_feature(env, ARM_FEATURE_EL2)) {
197
- return ARMMMUIdxBit_E10_1 |
198
- ARMMMUIdxBit_E10_1_PAN |
199
- ARMMMUIdxBit_E10_0 |
200
- ARMMMUIdxBit_Stage2;
201
} else {
202
return ARMMMUIdxBit_E10_1 |
203
ARMMMUIdxBit_E10_1_PAN |
204
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
205
ARMMMUIdxBit_SE3);
206
}
207
208
-static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri,
209
- uint64_t value)
210
-{
211
- /* Invalidate by IPA. This has to invalidate any structures that
212
- * contain only stage 2 translation information, but does not need
213
- * to apply to structures that contain combined stage 1 and stage 2
214
- * translation information.
215
- * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
216
- */
217
- ARMCPU *cpu = env_archcpu(env);
218
- CPUState *cs = CPU(cpu);
219
- uint64_t pageaddr;
220
-
221
- if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
222
- return;
223
- }
224
-
225
- pageaddr = sextract64(value << 12, 0, 48);
226
-
227
- tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
228
-}
229
-
230
-static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
231
- uint64_t value)
232
-{
233
- CPUState *cs = env_cpu(env);
234
- uint64_t pageaddr;
235
-
236
- if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
237
- return;
238
- }
239
-
240
- pageaddr = sextract64(value << 12, 0, 48);
241
-
242
- tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
243
- ARMMMUIdxBit_Stage2);
244
-}
245
-
246
static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
247
bool isread)
248
{
249
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
250
.writefn = tlbi_aa64_vae1_write },
251
{ .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
252
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
253
- .access = PL2_W, .type = ARM_CP_NO_RAW,
254
- .writefn = tlbi_aa64_ipas2e1is_write },
255
+ .access = PL2_W, .type = ARM_CP_NOP },
256
{ .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64,
257
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
258
- .access = PL2_W, .type = ARM_CP_NO_RAW,
259
- .writefn = tlbi_aa64_ipas2e1is_write },
260
+ .access = PL2_W, .type = ARM_CP_NOP },
261
{ .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
262
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
263
.access = PL2_W, .type = ARM_CP_NO_RAW,
264
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
265
.writefn = tlbi_aa64_alle1is_write },
266
{ .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64,
267
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
268
- .access = PL2_W, .type = ARM_CP_NO_RAW,
269
- .writefn = tlbi_aa64_ipas2e1_write },
270
+ .access = PL2_W, .type = ARM_CP_NOP },
271
{ .name = "TLBI_IPAS2LE1", .state = ARM_CP_STATE_AA64,
272
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
273
- .access = PL2_W, .type = ARM_CP_NO_RAW,
274
- .writefn = tlbi_aa64_ipas2e1_write },
275
+ .access = PL2_W, .type = ARM_CP_NOP },
276
{ .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
277
.opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
278
.access = PL2_W, .type = ARM_CP_NO_RAW,
279
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
280
.writefn = tlbimva_hyp_is_write },
281
{ .name = "TLBIIPAS2",
282
.cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
283
- .type = ARM_CP_NO_RAW, .access = PL2_W,
284
- .writefn = tlbiipas2_write },
285
+ .type = ARM_CP_NOP, .access = PL2_W },
286
{ .name = "TLBIIPAS2IS",
287
.cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
288
- .type = ARM_CP_NO_RAW, .access = PL2_W,
289
- .writefn = tlbiipas2_is_write },
290
+ .type = ARM_CP_NOP, .access = PL2_W },
291
{ .name = "TLBIIPAS2L",
292
.cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
293
- .type = ARM_CP_NO_RAW, .access = PL2_W,
294
- .writefn = tlbiipas2_write },
295
+ .type = ARM_CP_NOP, .access = PL2_W },
296
{ .name = "TLBIIPAS2LIS",
297
.cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
298
- .type = ARM_CP_NO_RAW, .access = PL2_W,
299
- .writefn = tlbiipas2_is_write },
300
+ .type = ARM_CP_NOP, .access = PL2_W },
301
/* 32 bit cache operations */
302
{ .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
303
.type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
304
--
305
2.20.1
306
307
diff view generated by jsdifflib
Deleted patch
1
The access_type argument to get_phys_addr_lpae() is an MMUAccessType;
2
use the enum constant MMU_DATA_LOAD rather than a literal 0 when we
3
call it in S1_ptw_translate().
4
1
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20200330210400.11724-3-peter.maydell@linaro.org
9
---
10
target/arm/helper.c | 5 +++--
11
1 file changed, 3 insertions(+), 2 deletions(-)
12
13
diff --git a/target/arm/helper.c b/target/arm/helper.c
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper.c
16
+++ b/target/arm/helper.c
17
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
18
pcacheattrs = &cacheattrs;
19
}
20
21
- ret = get_phys_addr_lpae(env, addr, 0, ARMMMUIdx_Stage2, &s2pa,
22
- &txattrs, &s2prot, &s2size, fi, pcacheattrs);
23
+ ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
24
+ &s2pa, &txattrs, &s2prot, &s2size, fi,
25
+ pcacheattrs);
26
if (ret) {
27
assert(fi->type != ARMFault_None);
28
fi->s2addr = addr;
29
--
30
2.20.1
31
32
diff view generated by jsdifflib
Deleted patch
1
For ARMv8.2-TTS2UXN, the stage 2 page table walk wants to know
2
whether the stage 1 access is for EL0 or not, because whether
3
exec permission is given can depend on whether this is an EL0
4
or EL1 access. Add a new argument to get_phys_addr_lpae() so
5
the call sites can pass this information in.
6
1
7
Since get_phys_addr_lpae() doesn't already have a doc comment,
8
add one so we have a place to put the documentation of the
9
semantics of the new s1_is_el0 argument.
10
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
13
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
14
Message-id: 20200330210400.11724-4-peter.maydell@linaro.org
15
---
16
target/arm/helper.c | 29 ++++++++++++++++++++++++++++-
17
1 file changed, 28 insertions(+), 1 deletion(-)
18
19
diff --git a/target/arm/helper.c b/target/arm/helper.c
20
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper.c
22
+++ b/target/arm/helper.c
23
@@ -XXX,XX +XXX,XX @@
24
25
static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
26
MMUAccessType access_type, ARMMMUIdx mmu_idx,
27
+ bool s1_is_el0,
28
hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
29
target_ulong *page_size_ptr,
30
ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs);
31
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
32
}
33
34
ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
35
+ false,
36
&s2pa, &txattrs, &s2prot, &s2size, fi,
37
pcacheattrs);
38
if (ret) {
39
@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
40
};
41
}
42
43
+/**
44
+ * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
45
+ *
46
+ * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
47
+ * prot and page_size may not be filled in, and the populated fsr value provides
48
+ * information on why the translation aborted, in the format of a long-format
49
+ * DFSR/IFSR fault register, with the following caveats:
50
+ * * the WnR bit is never set (the caller must do this).
51
+ *
52
+ * @env: CPUARMState
53
+ * @address: virtual address to get physical address for
54
+ * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
55
+ * @mmu_idx: MMU index indicating required translation regime
56
+ * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page table
57
+ * walk), must be true if this is stage 2 of a stage 1+2 walk for an
58
+ * EL0 access). If @mmu_idx is anything else, @s1_is_el0 is ignored.
59
+ * @phys_ptr: set to the physical address corresponding to the virtual address
60
+ * @attrs: set to the memory transaction attributes to use
61
+ * @prot: set to the permissions for the page containing phys_ptr
62
+ * @page_size_ptr: set to the size of the page containing phys_ptr
63
+ * @fi: set to fault info if the translation fails
64
+ * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
65
+ */
66
static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
67
MMUAccessType access_type, ARMMMUIdx mmu_idx,
68
+ bool s1_is_el0,
69
hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
70
target_ulong *page_size_ptr,
71
ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
72
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
73
74
/* S1 is done. Now do S2 translation. */
75
ret = get_phys_addr_lpae(env, ipa, access_type, ARMMMUIdx_Stage2,
76
+ mmu_idx == ARMMMUIdx_E10_0,
77
phys_ptr, attrs, &s2_prot,
78
page_size, fi,
79
cacheattrs != NULL ? &cacheattrs2 : NULL);
80
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
81
}
82
83
if (regime_using_lpae_format(env, mmu_idx)) {
84
- return get_phys_addr_lpae(env, address, access_type, mmu_idx,
85
+ return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
86
phys_ptr, attrs, prot, page_size,
87
fi, cacheattrs);
88
} else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
89
--
90
2.20.1
91
92
diff view generated by jsdifflib
1
The ARMv8.2-TTS2UXN feature extends the XN field in stage 2
1
From: fangying <fangying1@huawei.com>
2
translation table descriptors from just bit [54] to bits [54:53],
3
allowing stage 2 to control execution permissions separately for EL0
4
and EL1. Implement the new semantics of the XN field and enable
5
the feature for our 'max' CPU.
6
2
3
Virtual time adjustment was implemented for virt-5.0 machine type,
4
but the cpu property was enabled only for host-passthrough and max
5
cpu model. Let's add it for any KVM arm cpu which has the generic
6
timer feature enabled.
7
8
Signed-off-by: Ying Fang <fangying1@huawei.com>
9
Reviewed-by: Andrew Jones <drjones@redhat.com>
10
Message-id: 20200608121243.2076-1-fangying1@huawei.com
11
[PMM: minor commit message tweak, removed inaccurate
12
suggested-by tag]
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20200330210400.11724-5-peter.maydell@linaro.org
11
---
14
---
12
target/arm/cpu.h | 15 +++++++++++++++
15
target/arm/cpu.c | 6 ++++--
13
target/arm/cpu.c | 1 +
16
target/arm/cpu64.c | 1 -
14
target/arm/cpu64.c | 2 ++
17
target/arm/kvm.c | 21 +++++++++++----------
15
target/arm/helper.c | 37 +++++++++++++++++++++++++++++++------
18
3 files changed, 15 insertions(+), 13 deletions(-)
16
4 files changed, 49 insertions(+), 6 deletions(-)
17
19
18
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
19
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/cpu.h
21
+++ b/target/arm/cpu.h
22
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_ccidx(const ARMISARegisters *id)
23
return FIELD_EX32(id->id_mmfr4, ID_MMFR4, CCIDX) != 0;
24
}
25
26
+static inline bool isar_feature_aa32_tts2uxn(const ARMISARegisters *id)
27
+{
28
+ return FIELD_EX32(id->id_mmfr4, ID_MMFR4, XNX) != 0;
29
+}
30
+
31
/*
32
* 64-bit feature tests via id registers.
33
*/
34
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
35
return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
36
}
37
38
+static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
39
+{
40
+ return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
41
+}
42
+
43
/*
44
* Feature tests for "does this exist in either 32-bit or 64-bit?"
45
*/
46
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_ccidx(const ARMISARegisters *id)
47
return isar_feature_aa64_ccidx(id) || isar_feature_aa32_ccidx(id);
48
}
49
50
+static inline bool isar_feature_any_tts2uxn(const ARMISARegisters *id)
51
+{
52
+ return isar_feature_aa64_tts2uxn(id) || isar_feature_aa32_tts2uxn(id);
53
+}
54
+
55
/*
56
* Forward to the above feature tests given an ARMCPU pointer.
57
*/
58
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
20
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
59
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
60
--- a/target/arm/cpu.c
22
--- a/target/arm/cpu.c
61
+++ b/target/arm/cpu.c
23
+++ b/target/arm/cpu.c
24
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
25
if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
26
qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
27
}
28
+
29
+ if (kvm_enabled()) {
30
+ kvm_arm_add_vcpu_properties(obj);
31
+ }
32
}
33
34
static void arm_cpu_finalizefn(Object *obj)
62
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
35
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
63
t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
36
64
t = FIELD_DP32(t, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
37
if (kvm_enabled()) {
65
t = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
38
kvm_arm_set_cpu_features_from_host(cpu);
66
+ t = FIELD_DP32(t, ID_MMFR4, XNX, 1); /* TTS2UXN */
39
- kvm_arm_add_vcpu_properties(obj);
67
cpu->isar.id_mmfr4 = t;
40
} else {
68
}
41
cortex_a15_initfn(obj);
69
#endif
42
43
@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
44
if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
45
aarch64_add_sve_properties(obj);
46
}
47
- kvm_arm_add_vcpu_properties(obj);
48
arm_cpu_post_init(obj);
49
}
50
70
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
51
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
71
index XXXXXXX..XXXXXXX 100644
52
index XXXXXXX..XXXXXXX 100644
72
--- a/target/arm/cpu64.c
53
--- a/target/arm/cpu64.c
73
+++ b/target/arm/cpu64.c
54
+++ b/target/arm/cpu64.c
74
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
55
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
75
t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);
56
76
t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 2); /* ATS1E1 */
57
if (kvm_enabled()) {
77
t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* VMID16 */
58
kvm_arm_set_cpu_features_from_host(cpu);
78
+ t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1); /* TTS2UXN */
59
- kvm_arm_add_vcpu_properties(obj);
79
cpu->isar.id_aa64mmfr1 = t;
60
} else {
80
61
uint64_t t;
81
t = cpu->isar.id_aa64mmfr2;
62
uint32_t u;
82
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
63
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
83
u = FIELD_DP32(u, ID_MMFR4, HPDS, 1); /* AA32HPD */
84
u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
85
u = FIELD_DP32(u, ID_MMFR4, CNP, 1); /* TTCNP */
86
+ u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
87
cpu->isar.id_mmfr4 = u;
88
89
u = cpu->isar.id_aa64dfr0;
90
diff --git a/target/arm/helper.c b/target/arm/helper.c
91
index XXXXXXX..XXXXXXX 100644
64
index XXXXXXX..XXXXXXX 100644
92
--- a/target/arm/helper.c
65
--- a/target/arm/kvm.c
93
+++ b/target/arm/helper.c
66
+++ b/target/arm/kvm.c
94
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
67
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
95
*
68
/* KVM VCPU properties should be prefixed with "kvm-". */
96
* @env: CPUARMState
69
void kvm_arm_add_vcpu_properties(Object *obj)
97
* @s2ap: The 2-bit stage2 access permissions (S2AP)
98
- * @xn: XN (execute-never) bit
99
+ * @xn: XN (execute-never) bits
100
+ * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
101
*/
102
-static int get_S2prot(CPUARMState *env, int s2ap, int xn)
103
+static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
104
{
70
{
105
int prot = 0;
71
- if (!kvm_enabled()) {
106
72
- return;
107
@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn)
73
- }
108
if (s2ap & 2) {
74
+ ARMCPU *cpu = ARM_CPU(obj);
109
prot |= PAGE_WRITE;
75
+ CPUARMState *env = &cpu->env;
110
}
76
111
- if (!xn) {
77
- ARM_CPU(obj)->kvm_adjvtime = true;
112
- if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
78
- object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
113
+
79
- kvm_no_adjvtime_set);
114
+ if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
80
- object_property_set_description(obj, "kvm-no-adjvtime",
115
+ switch (xn) {
81
- "Set on to disable the adjustment of "
116
+ case 0:
82
- "the virtual counter. VM stopped time "
117
prot |= PAGE_EXEC;
83
- "will be counted.");
118
+ break;
84
+ if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
119
+ case 1:
85
+ cpu->kvm_adjvtime = true;
120
+ if (s1_is_el0) {
86
+ object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
121
+ prot |= PAGE_EXEC;
87
+ kvm_no_adjvtime_set);
122
+ }
88
+ object_property_set_description(obj, "kvm-no-adjvtime",
123
+ break;
89
+ "Set on to disable the adjustment of "
124
+ case 2:
90
+ "the virtual counter. VM stopped time "
125
+ break;
91
+ "will be counted.");
126
+ case 3:
92
+ }
127
+ if (!s1_is_el0) {
93
}
128
+ prot |= PAGE_EXEC;
94
129
+ }
95
bool kvm_arm_pmu_supported(CPUState *cpu)
130
+ break;
131
+ default:
132
+ g_assert_not_reached();
133
+ }
134
+ } else {
135
+ if (!extract32(xn, 1, 1)) {
136
+ if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
137
+ prot |= PAGE_EXEC;
138
+ }
139
}
140
}
141
return prot;
142
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
143
}
144
145
ap = extract32(attrs, 4, 2);
146
- xn = extract32(attrs, 12, 1);
147
148
if (mmu_idx == ARMMMUIdx_Stage2) {
149
ns = true;
150
- *prot = get_S2prot(env, ap, xn);
151
+ xn = extract32(attrs, 11, 2);
152
+ *prot = get_S2prot(env, ap, xn, s1_is_el0);
153
} else {
154
ns = extract32(attrs, 3, 1);
155
+ xn = extract32(attrs, 12, 1);
156
pxn = extract32(attrs, 11, 1);
157
*prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
158
}
159
--
96
--
160
2.20.1
97
2.20.1
161
98
162
99
diff view generated by jsdifflib
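
On the TTS2UXN patch above: the new two-bit xn value extracted in
get_S2prot() grants execute permission as follows -- 0 allows execution
for both EL0 and EL1 accesses, 1 only when the stage 1 access is from
EL0, 2 never, and 3 only when it is not from EL0. The standalone sketch
below (plain C, not the QEMU helper itself) just mirrors that switch for
illustration.

#include <stdbool.h>
#include <stdio.h>

/*
 * Mirror of the xn switch in get_S2prot(): return true if this two-bit
 * stage 2 XN encoding permits execution, given whether the stage 1
 * access being translated came from EL0.
 */
static bool s2_xn_allows_exec(int xn, bool s1_is_el0)
{
    switch (xn) {
    case 0:
        return true;
    case 1:
        return s1_is_el0;
    case 2:
        return false;
    case 3:
        return !s1_is_el0;
    default:
        return false;
    }
}

int main(void)
{
    for (int xn = 0; xn < 4; xn++) {
        printf("xn=%d: EL0 exec %-3s EL1 exec %s\n", xn,
               s2_xn_allows_exec(xn, true) ? "yes" : "no",
               s2_xn_allows_exec(xn, false) ? "yes" : "no");
    }
    return 0;
}
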
Deleted patch
In aarch64_max_initfn() we update both 32-bit and 64-bit ID
registers. The intended pattern is that for 64-bit ID registers we
use FIELD_DP64 and the uint64_t 't' register, while 32-bit ID
registers use FIELD_DP32 and the uint32_t 'u' register. For
ID_AA64DFR0 we accidentally used 'u', meaning that the top 32 bits of
this 64-bit ID register would end up always zero. Luckily at the
moment that's what they should be anyway, so this bug has no visible
effects.

Use the right-sized variable.

Fixes: 3bec78447a958d481991
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200423110915.10527-1-peter.maydell@linaro.org
---
 target/arm/cpu64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
     u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
     cpu->isar.id_mmfr4 = u;
 
-    u = cpu->isar.id_aa64dfr0;
-    u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
-    cpu->isar.id_aa64dfr0 = u;
+    t = cpu->isar.id_aa64dfr0;
+    t = FIELD_DP64(t, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
+    cpu->isar.id_aa64dfr0 = t;
 
     u = cpu->isar.id_dfr0;
     u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
--
2.20.1
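The bug described above is a general C truncation hazard; a minimal standalone sketch (illustrative only, not QEMU code) shows why the size of the temporary matters:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    /* Pretend a future architecture revision sets bits above bit 31. */
    uint64_t id_aa64dfr0 = 0x0000000100000005ULL;

    uint32_t u = id_aa64dfr0;   /* wrong-sized temporary: top 32 bits lost */
    uint64_t t = id_aa64dfr0;   /* right-sized temporary keeps them */

    assert(t == id_aa64dfr0);
    assert(u != id_aa64dfr0);   /* silently truncated to 0x00000005 */
    return 0;
}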
Deleted patch
1
From: Philippe Mathieu-Daudé <f4bug@amsat.org>
2
1
3
MIDR_EL1 is a 64-bit system register with the top 32-bit being RES0.
4
Represent it in QEMU's ARMCPU struct with a uint64_t, not a
5
uint32_t.
6
7
This fixes an error when compiling with -Werror=conversion
8
because we were manipulating the register value using a
9
local uint64_t variable:
10
11
target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
12
target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value [-Werror=conversion]
13
628 | cpu->midr = t;
14
| ^
15
16
and future-proofs us against a possible future architecture
17
change using some of the top 32 bits.
18
19
Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
20
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
21
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
22
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
23
Message-id: 20200428172634.29707-1-f4bug@amsat.org
24
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
25
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
26
---
27
target/arm/cpu.h | 2 +-
28
target/arm/cpu.c | 2 +-
29
2 files changed, 2 insertions(+), 2 deletions(-)
30
31
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
32
index XXXXXXX..XXXXXXX 100644
33
--- a/target/arm/cpu.h
34
+++ b/target/arm/cpu.h
35
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
36
uint64_t id_aa64dfr0;
37
uint64_t id_aa64dfr1;
38
} isar;
39
- uint32_t midr;
40
+ uint64_t midr;
41
uint32_t revidr;
42
uint32_t reset_fpsid;
43
uint32_t ctr;
44
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
45
index XXXXXXX..XXXXXXX 100644
46
--- a/target/arm/cpu.c
47
+++ b/target/arm/cpu.c
48
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
49
static Property arm_cpu_properties[] = {
50
DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
51
DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
52
- DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
53
+ DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
54
DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
55
mp_affinity, ARM64_AFFINITY_INVALID),
56
DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
57
--
58
2.20.1
59
60
Deleted patch
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Remove the inclusion of arm_gicv3_common.h; it is already
included via xlnx-versal.h.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-2-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/boot.h"
 #include "kvm_arm.h"
 #include "hw/misc/unimp.h"
-#include "hw/intc/arm_gicv3_common.h"
 #include "hw/arm/xlnx-versal.h"
 #include "hw/char/pl011.h"
 
--
2.20.1
Deleted patch
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
1
3
Move misplaced comment.
4
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
8
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
9
Message-id: 20200427181649.26851-3-edgar.iglesias@gmail.com
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
12
hw/arm/xlnx-versal.c | 2 +-
13
1 file changed, 1 insertion(+), 1 deletion(-)
14
15
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
16
index XXXXXXX..XXXXXXX 100644
17
--- a/hw/arm/xlnx-versal.c
18
+++ b/hw/arm/xlnx-versal.c
19
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
20
21
obj = object_new(XLNX_VERSAL_ACPU_TYPE);
22
if (!obj) {
23
- /* Secondary CPUs start in PSCI powered-down state */
24
error_report("Unable to create apu.cpu[%d] of type %s",
25
i, XLNX_VERSAL_ACPU_TYPE);
26
exit(EXIT_FAILURE);
27
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
28
object_property_set_int(obj, s->cfg.psci_conduit,
29
"psci-conduit", &error_abort);
30
if (i) {
31
+ /* Secondary CPUs start in PSCI powered-down state */
32
object_property_set_bool(obj, true,
33
"start-powered-off", &error_abort);
34
}
35
--
36
2.20.1
37
38
Deleted patch
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
1
3
Fix typo xlnx-ve -> xlnx-versal.
4
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
8
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
9
Message-id: 20200427181649.26851-4-edgar.iglesias@gmail.com
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
12
hw/arm/xlnx-versal-virt.c | 2 +-
13
1 file changed, 1 insertion(+), 1 deletion(-)
14
15
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
16
index XXXXXXX..XXXXXXX 100644
17
--- a/hw/arm/xlnx-versal-virt.c
18
+++ b/hw/arm/xlnx-versal-virt.c
19
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
20
psci_conduit = QEMU_PSCI_CONDUIT_SMC;
21
}
22
23
- sysbus_init_child_obj(OBJECT(machine), "xlnx-ve", &s->soc,
24
+ sysbus_init_child_obj(OBJECT(machine), "xlnx-versal", &s->soc,
25
sizeof(s->soc), TYPE_XLNX_VERSAL);
26
object_property_set_link(OBJECT(&s->soc), OBJECT(machine->ram),
27
"ddr", &error_abort);
28
--
29
2.20.1
30
31
Deleted patch
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
1
3
Embed the GEMs into the SoC type.
4
5
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
8
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
9
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
10
Message-id: 20200427181649.26851-6-edgar.iglesias@gmail.com
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
---
13
include/hw/arm/xlnx-versal.h | 3 ++-
14
hw/arm/xlnx-versal.c | 15 ++++++++-------
15
2 files changed, 10 insertions(+), 8 deletions(-)
16
17
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
18
index XXXXXXX..XXXXXXX 100644
19
--- a/include/hw/arm/xlnx-versal.h
20
+++ b/include/hw/arm/xlnx-versal.h
21
@@ -XXX,XX +XXX,XX @@
22
#include "hw/arm/boot.h"
23
#include "hw/intc/arm_gicv3.h"
24
#include "hw/char/pl011.h"
25
+#include "hw/net/cadence_gem.h"
26
27
#define TYPE_XLNX_VERSAL "xlnx-versal"
28
#define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
29
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
30
31
struct {
32
PL011State uart[XLNX_VERSAL_NR_UARTS];
33
- SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
34
+ CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
35
SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
36
} iou;
37
} lpd;
38
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/hw/arm/xlnx-versal.c
41
+++ b/hw/arm/xlnx-versal.c
42
@@ -XXX,XX +XXX,XX @@ static void versal_create_gems(Versal *s, qemu_irq *pic)
43
DeviceState *dev;
44
MemoryRegion *mr;
45
46
- dev = qdev_create(NULL, "cadence_gem");
47
- s->lpd.iou.gem[i] = SYS_BUS_DEVICE(dev);
48
- object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
49
+ sysbus_init_child_obj(OBJECT(s), name,
50
+ &s->lpd.iou.gem[i], sizeof(s->lpd.iou.gem[i]),
51
+ TYPE_CADENCE_GEM);
52
+ dev = DEVICE(&s->lpd.iou.gem[i]);
53
if (nd->used) {
54
qemu_check_nic_model(nd, "cadence_gem");
55
qdev_set_nic_properties(dev, nd);
56
}
57
- object_property_set_int(OBJECT(s->lpd.iou.gem[i]),
58
+ object_property_set_int(OBJECT(dev),
59
2, "num-priority-queues",
60
&error_abort);
61
- object_property_set_link(OBJECT(s->lpd.iou.gem[i]),
62
+ object_property_set_link(OBJECT(dev),
63
OBJECT(&s->mr_ps), "dma",
64
&error_abort);
65
qdev_init_nofail(dev);
66
67
- mr = sysbus_mmio_get_region(s->lpd.iou.gem[i], 0);
68
+ mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
69
memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
70
71
- sysbus_connect_irq(s->lpd.iou.gem[i], 0, pic[irqs[i]]);
72
+ sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
73
g_free(name);
74
}
75
}
76
--
77
2.20.1
78
79
Deleted patch
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
1
3
Embed the ADMAs into the SoC type.
4
5
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
8
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
9
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
10
Message-id: 20200427181649.26851-7-edgar.iglesias@gmail.com
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
---
13
include/hw/arm/xlnx-versal.h | 3 ++-
14
hw/arm/xlnx-versal.c | 14 +++++++-------
15
2 files changed, 9 insertions(+), 8 deletions(-)
16
17
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
18
index XXXXXXX..XXXXXXX 100644
19
--- a/include/hw/arm/xlnx-versal.h
20
+++ b/include/hw/arm/xlnx-versal.h
21
@@ -XXX,XX +XXX,XX @@
22
#include "hw/arm/boot.h"
23
#include "hw/intc/arm_gicv3.h"
24
#include "hw/char/pl011.h"
25
+#include "hw/dma/xlnx-zdma.h"
26
#include "hw/net/cadence_gem.h"
27
28
#define TYPE_XLNX_VERSAL "xlnx-versal"
29
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
30
struct {
31
PL011State uart[XLNX_VERSAL_NR_UARTS];
32
CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
33
- SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
34
+ XlnxZDMA adma[XLNX_VERSAL_NR_ADMAS];
35
} iou;
36
} lpd;
37
38
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
39
index XXXXXXX..XXXXXXX 100644
40
--- a/hw/arm/xlnx-versal.c
41
+++ b/hw/arm/xlnx-versal.c
42
@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
43
DeviceState *dev;
44
MemoryRegion *mr;
45
46
- dev = qdev_create(NULL, "xlnx.zdma");
47
- s->lpd.iou.adma[i] = SYS_BUS_DEVICE(dev);
48
- object_property_set_int(OBJECT(s->lpd.iou.adma[i]), 128, "bus-width",
49
- &error_abort);
50
- object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
51
+ sysbus_init_child_obj(OBJECT(s), name,
52
+ &s->lpd.iou.adma[i], sizeof(s->lpd.iou.adma[i]),
53
+ TYPE_XLNX_ZDMA);
54
+ dev = DEVICE(&s->lpd.iou.adma[i]);
55
+ object_property_set_int(OBJECT(dev), 128, "bus-width", &error_abort);
56
qdev_init_nofail(dev);
57
58
- mr = sysbus_mmio_get_region(s->lpd.iou.adma[i], 0);
59
+ mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
60
memory_region_add_subregion(&s->mr_ps,
61
MM_ADMA_CH0 + i * MM_ADMA_CH0_SIZE, mr);
62
63
- sysbus_connect_irq(s->lpd.iou.adma[i], 0, pic[VERSAL_ADMA_IRQ_0 + i]);
64
+ sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[VERSAL_ADMA_IRQ_0 + i]);
65
g_free(name);
66
}
67
}
68
--
69
2.20.1
70
71
Deleted patch
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
1
3
Embed the APUs into the SoC type.
4
5
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
8
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
9
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
10
Message-id: 20200427181649.26851-8-edgar.iglesias@gmail.com
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
---
13
include/hw/arm/xlnx-versal.h | 2 +-
14
hw/arm/xlnx-versal-virt.c | 4 ++--
15
hw/arm/xlnx-versal.c | 19 +++++--------------
16
3 files changed, 8 insertions(+), 17 deletions(-)
17
18
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
19
index XXXXXXX..XXXXXXX 100644
20
--- a/include/hw/arm/xlnx-versal.h
21
+++ b/include/hw/arm/xlnx-versal.h
22
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
23
struct {
24
struct {
25
MemoryRegion mr;
26
- ARMCPU *cpu[XLNX_VERSAL_NR_ACPUS];
27
+ ARMCPU cpu[XLNX_VERSAL_NR_ACPUS];
28
GICv3State gic;
29
} apu;
30
} fpd;
31
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
32
index XXXXXXX..XXXXXXX 100644
33
--- a/hw/arm/xlnx-versal-virt.c
34
+++ b/hw/arm/xlnx-versal-virt.c
35
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
36
s->binfo.get_dtb = versal_virt_get_dtb;
37
s->binfo.modify_dtb = versal_virt_modify_dtb;
38
if (machine->kernel_filename) {
39
- arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
40
+ arm_load_kernel(&s->soc.fpd.apu.cpu[0], machine, &s->binfo);
41
} else {
42
- AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
43
+ AddressSpace *as = arm_boot_address_space(&s->soc.fpd.apu.cpu[0],
44
&s->binfo);
45
/* Some boot-loaders (e.g u-boot) don't like blobs at address 0 (NULL).
46
* Offset things by 4K. */
47
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
48
index XXXXXXX..XXXXXXX 100644
49
--- a/hw/arm/xlnx-versal.c
50
+++ b/hw/arm/xlnx-versal.c
51
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
52
53
for (i = 0; i < ARRAY_SIZE(s->fpd.apu.cpu); i++) {
54
Object *obj;
55
- char *name;
56
-
57
- obj = object_new(XLNX_VERSAL_ACPU_TYPE);
58
- if (!obj) {
59
- error_report("Unable to create apu.cpu[%d] of type %s",
60
- i, XLNX_VERSAL_ACPU_TYPE);
61
- exit(EXIT_FAILURE);
62
- }
63
-
64
- name = g_strdup_printf("apu-cpu[%d]", i);
65
- object_property_add_child(OBJECT(s), name, obj, &error_fatal);
66
- g_free(name);
67
68
+ object_initialize_child(OBJECT(s), "apu-cpu[*]",
69
+ &s->fpd.apu.cpu[i], sizeof(s->fpd.apu.cpu[i]),
70
+ XLNX_VERSAL_ACPU_TYPE, &error_abort, NULL);
71
+ obj = OBJECT(&s->fpd.apu.cpu[i]);
72
object_property_set_int(obj, s->cfg.psci_conduit,
73
"psci-conduit", &error_abort);
74
if (i) {
75
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
76
object_property_set_link(obj, OBJECT(&s->fpd.apu.mr), "memory",
77
&error_abort);
78
object_property_set_bool(obj, true, "realized", &error_fatal);
79
- s->fpd.apu.cpu[i] = ARM_CPU(obj);
80
}
81
}
82
83
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_gic(Versal *s, qemu_irq *pic)
84
}
85
86
for (i = 0; i < nr_apu_cpus; i++) {
87
- DeviceState *cpudev = DEVICE(s->fpd.apu.cpu[i]);
88
+ DeviceState *cpudev = DEVICE(&s->fpd.apu.cpu[i]);
89
int ppibase = XLNX_VERSAL_NR_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS;
90
qemu_irq maint_irq;
91
int ti;
92
--
93
2.20.1
94
95
Deleted patch
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
1
3
Add support for SD.
4
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
8
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
9
Message-id: 20200427181649.26851-9-edgar.iglesias@gmail.com
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
12
include/hw/arm/xlnx-versal.h | 12 ++++++++++++
13
hw/arm/xlnx-versal.c | 31 +++++++++++++++++++++++++++++++
14
2 files changed, 43 insertions(+)
15
16
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
17
index XXXXXXX..XXXXXXX 100644
18
--- a/include/hw/arm/xlnx-versal.h
19
+++ b/include/hw/arm/xlnx-versal.h
20
@@ -XXX,XX +XXX,XX @@
21
22
#include "hw/sysbus.h"
23
#include "hw/arm/boot.h"
24
+#include "hw/sd/sdhci.h"
25
#include "hw/intc/arm_gicv3.h"
26
#include "hw/char/pl011.h"
27
#include "hw/dma/xlnx-zdma.h"
28
@@ -XXX,XX +XXX,XX @@
29
#define XLNX_VERSAL_NR_UARTS 2
30
#define XLNX_VERSAL_NR_GEMS 2
31
#define XLNX_VERSAL_NR_ADMAS 8
32
+#define XLNX_VERSAL_NR_SDS 2
33
#define XLNX_VERSAL_NR_IRQS 192
34
35
typedef struct Versal {
36
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
37
} iou;
38
} lpd;
39
40
+ /* The Platform Management Controller subsystem. */
41
+ struct {
42
+ struct {
43
+ SDHCIState sd[XLNX_VERSAL_NR_SDS];
44
+ } iou;
45
+ } pmc;
46
+
47
struct {
48
MemoryRegion *mr_ddr;
49
uint32_t psci_conduit;
50
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
51
#define VERSAL_GEM1_IRQ_0 58
52
#define VERSAL_GEM1_WAKE_IRQ_0 59
53
#define VERSAL_ADMA_IRQ_0 60
54
+#define VERSAL_SD0_IRQ_0 126
55
56
/* Architecturally reserved IRQs suitable for virtualization. */
57
#define VERSAL_RSVD_IRQ_FIRST 111
58
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
59
#define MM_FPD_CRF 0xfd1a0000U
60
#define MM_FPD_CRF_SIZE 0x140000
61
62
+#define MM_PMC_SD0 0xf1040000U
63
+#define MM_PMC_SD0_SIZE 0x10000
64
#define MM_PMC_CRP 0xf1260000U
65
#define MM_PMC_CRP_SIZE 0x10000
66
#endif
67
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
68
index XXXXXXX..XXXXXXX 100644
69
--- a/hw/arm/xlnx-versal.c
70
+++ b/hw/arm/xlnx-versal.c
71
@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
72
}
73
}
74
75
+#define SDHCI_CAPABILITIES 0x280737ec6481 /* Same as on ZynqMP. */
76
+static void versal_create_sds(Versal *s, qemu_irq *pic)
77
+{
78
+ int i;
79
+
80
+ for (i = 0; i < ARRAY_SIZE(s->pmc.iou.sd); i++) {
81
+ DeviceState *dev;
82
+ MemoryRegion *mr;
83
+
84
+ sysbus_init_child_obj(OBJECT(s), "sd[*]",
85
+ &s->pmc.iou.sd[i], sizeof(s->pmc.iou.sd[i]),
86
+ TYPE_SYSBUS_SDHCI);
87
+ dev = DEVICE(&s->pmc.iou.sd[i]);
88
+
89
+ object_property_set_uint(OBJECT(dev),
90
+ 3, "sd-spec-version", &error_fatal);
91
+ object_property_set_uint(OBJECT(dev), SDHCI_CAPABILITIES, "capareg",
92
+ &error_fatal);
93
+ object_property_set_uint(OBJECT(dev), UHS_I, "uhs", &error_fatal);
94
+ qdev_init_nofail(dev);
95
+
96
+ mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
97
+ memory_region_add_subregion(&s->mr_ps,
98
+ MM_PMC_SD0 + i * MM_PMC_SD0_SIZE, mr);
99
+
100
+ sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0,
101
+ pic[VERSAL_SD0_IRQ_0 + i * 2]);
102
+ }
103
+}
104
+
105
/* This takes the board allocated linear DDR memory and creates aliases
106
* for each split DDR range/aperture on the Versal address map.
107
*/
108
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
109
versal_create_uarts(s, pic);
110
versal_create_gems(s, pic);
111
versal_create_admas(s, pic);
112
+ versal_create_sds(s, pic);
113
versal_map_ddr(s);
114
versal_unimp(s);
115
116
--
117
2.20.1
118
119
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
1
From: Jean-Christophe Dubois <jcd@tribudubois.net>
2
2
3
hw/arm: versal: Add support for the RTC.
3
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
4
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
4
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
7
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
5
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
8
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
6
[PMD: Fixed 32-bit format string using PRIx32/PRIx64]
9
Message-id: 20200427181649.26851-10-edgar.iglesias@gmail.com
7
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
9
---
12
include/hw/arm/xlnx-versal.h | 8 ++++++++
10
hw/net/imx_fec.c | 106 +++++++++++++++++++-------------------------
13
hw/arm/xlnx-versal.c | 21 +++++++++++++++++++++
11
hw/net/trace-events | 18 ++++++++
14
2 files changed, 29 insertions(+)
12
2 files changed, 63 insertions(+), 61 deletions(-)
15
13
16
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
14
diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
17
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
18
--- a/include/hw/arm/xlnx-versal.h
16
--- a/hw/net/imx_fec.c
19
+++ b/include/hw/arm/xlnx-versal.h
17
+++ b/hw/net/imx_fec.c
20
@@ -XXX,XX +XXX,XX @@
18
@@ -XXX,XX +XXX,XX @@
21
#include "hw/char/pl011.h"
19
#include "qemu/module.h"
22
#include "hw/dma/xlnx-zdma.h"
20
#include "net/checksum.h"
23
#include "hw/net/cadence_gem.h"
21
#include "net/eth.h"
24
+#include "hw/rtc/xlnx-zynqmp-rtc.h"
22
+#include "trace.h"
25
23
26
#define TYPE_XLNX_VERSAL "xlnx-versal"
24
/* For crc32 */
27
#define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
25
#include <zlib.h>
28
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
26
29
struct {
27
-#ifndef DEBUG_IMX_FEC
30
SDHCIState sd[XLNX_VERSAL_NR_SDS];
28
-#define DEBUG_IMX_FEC 0
31
} iou;
29
-#endif
32
+
30
-
33
+ XlnxZynqMPRTC rtc;
31
-#define FEC_PRINTF(fmt, args...) \
34
} pmc;
32
- do { \
35
33
- if (DEBUG_IMX_FEC) { \
36
struct {
34
- fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
37
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
35
- __func__, ##args); \
38
#define VERSAL_GEM1_IRQ_0 58
36
- } \
39
#define VERSAL_GEM1_WAKE_IRQ_0 59
37
- } while (0)
40
#define VERSAL_ADMA_IRQ_0 60
38
-
41
+#define VERSAL_RTC_APB_ERR_IRQ 121
39
-#ifndef DEBUG_IMX_PHY
42
#define VERSAL_SD0_IRQ_0 126
40
-#define DEBUG_IMX_PHY 0
43
+#define VERSAL_RTC_ALARM_IRQ 142
41
-#endif
44
+#define VERSAL_RTC_SECONDS_IRQ 143
42
-
45
43
-#define PHY_PRINTF(fmt, args...) \
46
/* Architecturally reserved IRQs suitable for virtualization. */
44
- do { \
47
#define VERSAL_RSVD_IRQ_FIRST 111
45
- if (DEBUG_IMX_PHY) { \
48
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
46
- fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
49
#define MM_PMC_SD0_SIZE 0x10000
47
- __func__, ##args); \
50
#define MM_PMC_CRP 0xf1260000U
48
- } \
51
#define MM_PMC_CRP_SIZE 0x10000
49
- } while (0)
52
+#define MM_PMC_RTC 0xf12a0000
50
-
53
+#define MM_PMC_RTC_SIZE 0x10000
51
#define IMX_MAX_DESC 1024
54
#endif
52
55
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
53
static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
54
@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
55
* For now we don't handle any GPIO/interrupt line, so the OS will
56
* have to poll for the PHY status.
57
*/
58
-static void phy_update_irq(IMXFECState *s)
59
+static void imx_phy_update_irq(IMXFECState *s)
60
{
61
imx_eth_update(s);
62
}
63
64
-static void phy_update_link(IMXFECState *s)
65
+static void imx_phy_update_link(IMXFECState *s)
66
{
67
/* Autonegotiation status mirrors link status. */
68
if (qemu_get_queue(s->nic)->link_down) {
69
- PHY_PRINTF("link is down\n");
70
+ trace_imx_phy_update_link("down");
71
s->phy_status &= ~0x0024;
72
s->phy_int |= PHY_INT_DOWN;
73
} else {
74
- PHY_PRINTF("link is up\n");
75
+ trace_imx_phy_update_link("up");
76
s->phy_status |= 0x0024;
77
s->phy_int |= PHY_INT_ENERGYON;
78
s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
79
}
80
- phy_update_irq(s);
81
+ imx_phy_update_irq(s);
82
}
83
84
static void imx_eth_set_link(NetClientState *nc)
85
{
86
- phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
87
+ imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
88
}
89
90
-static void phy_reset(IMXFECState *s)
91
+static void imx_phy_reset(IMXFECState *s)
92
{
93
+ trace_imx_phy_reset();
94
+
95
s->phy_status = 0x7809;
96
s->phy_control = 0x3000;
97
s->phy_advertise = 0x01e1;
98
s->phy_int_mask = 0;
99
s->phy_int = 0;
100
- phy_update_link(s);
101
+ imx_phy_update_link(s);
102
}
103
104
-static uint32_t do_phy_read(IMXFECState *s, int reg)
105
+static uint32_t imx_phy_read(IMXFECState *s, int reg)
106
{
107
uint32_t val;
108
109
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
110
case 29: /* Interrupt source. */
111
val = s->phy_int;
112
s->phy_int = 0;
113
- phy_update_irq(s);
114
+ imx_phy_update_irq(s);
115
break;
116
case 30: /* Interrupt mask */
117
val = s->phy_int_mask;
118
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
119
break;
120
}
121
122
- PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
123
+ trace_imx_phy_read(val, reg);
124
125
return val;
126
}
127
128
-static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
129
+static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
130
{
131
- PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
132
+ trace_imx_phy_write(val, reg);
133
134
if (reg > 31) {
135
/* we only advertise one phy */
136
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
137
switch (reg) {
138
case 0: /* Basic Control */
139
if (val & 0x8000) {
140
- phy_reset(s);
141
+ imx_phy_reset(s);
142
} else {
143
s->phy_control = val & 0x7980;
144
/* Complete autonegotiation immediately. */
145
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
146
break;
147
case 30: /* Interrupt mask */
148
s->phy_int_mask = val & 0xff;
149
- phy_update_irq(s);
150
+ imx_phy_update_irq(s);
151
break;
152
case 17:
153
case 18:
154
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
155
static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
156
{
157
dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
158
+
159
+ trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
160
}
161
162
static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
163
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
164
static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
165
{
166
dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
167
+
168
+ trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
169
+ bd->option, bd->status);
170
}
171
172
static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
173
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
174
int len;
175
176
imx_fec_read_bd(&bd, addr);
177
- FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
178
- addr, bd.flags, bd.length, bd.data);
179
if ((bd.flags & ENET_BD_R) == 0) {
180
+
181
/* Run out of descriptors to transmit. */
182
- FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
183
+ trace_imx_eth_tx_bd_busy();
184
+
185
break;
186
}
187
len = bd.length;
188
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
189
int len;
190
191
imx_enet_read_bd(&bd, addr);
192
- FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
193
- "status %04x\n", addr, bd.flags, bd.length, bd.data,
194
- bd.option, bd.status);
195
if ((bd.flags & ENET_BD_R) == 0) {
196
/* Run out of descriptors to transmit. */
197
+
198
+ trace_imx_eth_tx_bd_busy();
199
+
200
break;
201
}
202
len = bd.length;
203
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
204
s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;
205
206
if (!s->regs[ENET_RDAR]) {
207
- FEC_PRINTF("RX buffer full\n");
208
+ trace_imx_eth_rx_bd_full();
209
} else if (flush) {
210
qemu_flush_queued_packets(qemu_get_queue(s->nic));
211
}
212
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
213
memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));
214
215
/* We also reset the PHY */
216
- phy_reset(s);
217
+ imx_phy_reset(s);
218
}
219
220
static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
221
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
222
break;
223
}
224
225
- FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
226
- value);
227
+ trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);
228
229
return value;
230
}
231
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
232
const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
233
uint32_t index = offset >> 2;
234
235
- FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
236
- (uint32_t)value);
237
+ trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);
238
239
switch (index) {
240
case ENET_EIR:
241
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
242
if (extract32(value, 29, 1)) {
243
/* This is a read operation */
244
s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
245
- do_phy_read(s,
246
+ imx_phy_read(s,
247
extract32(value,
248
18, 10)));
249
} else {
250
/* This a write operation */
251
- do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
252
+ imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
253
}
254
/* raise the interrupt as the PHY operation is done */
255
s->regs[ENET_EIR] |= ENET_INT_MII;
256
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
257
{
258
IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));
259
260
- FEC_PRINTF("\n");
261
-
262
return !!s->regs[ENET_RDAR];
263
}
264
265
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
266
unsigned int buf_len;
267
size_t size = len;
268
269
- FEC_PRINTF("len %d\n", (int)size);
270
+ trace_imx_fec_receive(size);
271
272
if (!s->regs[ENET_RDAR]) {
273
qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
274
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
275
bd.length = buf_len;
276
size -= buf_len;
277
278
- FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
279
+ trace_imx_fec_receive_len(addr, bd.length);
280
281
/* The last 4 bytes are the CRC. */
282
if (size < 4) {
283
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
284
if (size == 0) {
285
/* Last buffer in frame. */
286
bd.flags |= flags | ENET_BD_L;
287
- FEC_PRINTF("rx frame flags %04x\n", bd.flags);
288
+
289
+ trace_imx_fec_receive_last(bd.flags);
290
+
291
s->regs[ENET_EIR] |= ENET_INT_RXF;
292
} else {
293
s->regs[ENET_EIR] |= ENET_INT_RXB;
294
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
295
size_t size = len;
296
bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;
297
298
- FEC_PRINTF("len %d\n", (int)size);
299
+ trace_imx_enet_receive(size);
300
301
if (!s->regs[ENET_RDAR]) {
302
qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
303
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
304
bd.length = buf_len;
305
size -= buf_len;
306
307
- FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
308
+ trace_imx_enet_receive_len(addr, bd.length);
309
310
/* The last 4 bytes are the CRC. */
311
if (size < 4) {
312
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
313
if (size == 0) {
314
/* Last buffer in frame. */
315
bd.flags |= flags | ENET_BD_L;
316
- FEC_PRINTF("rx frame flags %04x\n", bd.flags);
317
+
318
+ trace_imx_enet_receive_last(bd.flags);
319
+
320
/* Indicate that we've updated the last buffer descriptor. */
321
bd.last_buffer = ENET_BD_BDU;
322
if (bd.option & ENET_BD_RX_INT) {
323
diff --git a/hw/net/trace-events b/hw/net/trace-events
56
index XXXXXXX..XXXXXXX 100644
324
index XXXXXXX..XXXXXXX 100644
57
--- a/hw/arm/xlnx-versal.c
325
--- a/hw/net/trace-events
58
+++ b/hw/arm/xlnx-versal.c
326
+++ b/hw/net/trace-events
59
@@ -XXX,XX +XXX,XX @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
327
@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
60
}
328
i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
61
}
329
i82596_set_multicast(uint16_t count) "Added %d multicast entries"
62
330
i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
63
+static void versal_create_rtc(Versal *s, qemu_irq *pic)
331
+
64
+{
332
+# imx_fec.c
65
+ SysBusDevice *sbd;
333
+imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
66
+ MemoryRegion *mr;
334
+imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
67
+
335
+imx_phy_update_link(const char *s) "%s"
68
+ sysbus_init_child_obj(OBJECT(s), "rtc", &s->pmc.rtc, sizeof(s->pmc.rtc),
336
+imx_phy_reset(void) ""
69
+ TYPE_XLNX_ZYNQMP_RTC);
337
+imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
70
+ sbd = SYS_BUS_DEVICE(&s->pmc.rtc);
338
+imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
71
+ qdev_init_nofail(DEVICE(sbd));
339
+imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
72
+
340
+imx_eth_rx_bd_full(void) "RX buffer is full"
73
+ mr = sysbus_mmio_get_region(sbd, 0);
341
+imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
74
+ memory_region_add_subregion(&s->mr_ps, MM_PMC_RTC, mr);
342
+imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
75
+
343
+imx_fec_receive(size_t size) "len %zu"
76
+ /*
344
+imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
77
+ * TODO: Connect the ALARM and SECONDS interrupts once our RTC model
345
+imx_fec_receive_last(int last) "rx frame flags 0x%04x"
78
+ * supports them.
346
+imx_enet_receive(size_t size) "len %zu"
79
+ */
347
+imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
80
+ sysbus_connect_irq(sbd, 1, pic[VERSAL_RTC_APB_ERR_IRQ]);
348
+imx_enet_receive_last(int last) "rx frame flags 0x%04x"
81
+}
82
+
83
/* This takes the board allocated linear DDR memory and creates aliases
84
* for each split DDR range/aperture on the Versal address map.
85
*/
86
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
87
versal_create_gems(s, pic);
88
versal_create_admas(s, pic);
89
versal_create_sds(s, pic);
90
+ versal_create_rtc(s, pic);
91
versal_map_ddr(s);
92
versal_unimp(s);
93
94
--
349
--
95
2.20.1
350
2.20.1
96
351
97
352
Deleted patch
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
2
1
3
Add support for SD.
4
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
8
Message-id: 20200427181649.26851-11-edgar.iglesias@gmail.com
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
---
11
hw/arm/xlnx-versal-virt.c | 46 +++++++++++++++++++++++++++++++++++++++
12
1 file changed, 46 insertions(+)
13
14
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/hw/arm/xlnx-versal-virt.c
17
+++ b/hw/arm/xlnx-versal-virt.c
18
@@ -XXX,XX +XXX,XX @@
19
#include "hw/arm/sysbus-fdt.h"
20
#include "hw/arm/fdt.h"
21
#include "cpu.h"
22
+#include "hw/qdev-properties.h"
23
#include "hw/arm/xlnx-versal.h"
24
25
#define TYPE_XLNX_VERSAL_VIRT_MACHINE MACHINE_TYPE_NAME("xlnx-versal-virt")
26
@@ -XXX,XX +XXX,XX @@ static void fdt_add_zdma_nodes(VersalVirt *s)
27
}
28
}
29
30
+static void fdt_add_sd_nodes(VersalVirt *s)
31
+{
32
+ const char clocknames[] = "clk_xin\0clk_ahb";
33
+ const char compat[] = "arasan,sdhci-8.9a";
34
+ int i;
35
+
36
+ for (i = ARRAY_SIZE(s->soc.pmc.iou.sd) - 1; i >= 0; i--) {
37
+ uint64_t addr = MM_PMC_SD0 + MM_PMC_SD0_SIZE * i;
38
+ char *name = g_strdup_printf("/sdhci@%" PRIx64, addr);
39
+
40
+ qemu_fdt_add_subnode(s->fdt, name);
41
+
42
+ qemu_fdt_setprop_cells(s->fdt, name, "clocks",
43
+ s->phandle.clk_25Mhz, s->phandle.clk_25Mhz);
44
+ qemu_fdt_setprop(s->fdt, name, "clock-names",
45
+ clocknames, sizeof(clocknames));
46
+ qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
47
+ GIC_FDT_IRQ_TYPE_SPI, VERSAL_SD0_IRQ_0 + i * 2,
48
+ GIC_FDT_IRQ_FLAGS_LEVEL_HI);
49
+ qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
50
+ 2, addr, 2, MM_PMC_SD0_SIZE);
51
+ qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
52
+ g_free(name);
53
+ }
54
+}
55
+
56
static void fdt_nop_memory_nodes(void *fdt, Error **errp)
57
{
58
Error *err = NULL;
59
@@ -XXX,XX +XXX,XX @@ static void create_virtio_regions(VersalVirt *s)
60
}
61
}
62
63
+static void sd_plugin_card(SDHCIState *sd, DriveInfo *di)
64
+{
65
+ BlockBackend *blk = di ? blk_by_legacy_dinfo(di) : NULL;
66
+ DeviceState *card;
67
+
68
+ card = qdev_create(qdev_get_child_bus(DEVICE(sd), "sd-bus"), TYPE_SD_CARD);
69
+ object_property_add_child(OBJECT(sd), "card[*]", OBJECT(card),
70
+ &error_fatal);
71
+ qdev_prop_set_drive(card, "drive", blk, &error_fatal);
72
+ object_property_set_bool(OBJECT(card), true, "realized", &error_fatal);
73
+}
74
+
75
static void versal_virt_init(MachineState *machine)
76
{
77
VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(machine);
78
int psci_conduit = QEMU_PSCI_CONDUIT_DISABLED;
79
+ int i;
80
81
/*
82
* If the user provides an Operating System to be loaded, we expect them
83
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
84
fdt_add_gic_nodes(s);
85
fdt_add_timer_nodes(s);
86
fdt_add_zdma_nodes(s);
87
+ fdt_add_sd_nodes(s);
88
fdt_add_cpu_nodes(s, psci_conduit);
89
fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
90
fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
91
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
92
memory_region_add_subregion_overlap(get_system_memory(),
93
0, &s->soc.fpd.apu.mr, 0);
94
95
+ /* Plugin SD cards. */
96
+ for (i = 0; i < ARRAY_SIZE(s->soc.pmc.iou.sd); i++) {
97
+ sd_plugin_card(&s->soc.pmc.iou.sd[i], drive_get_next(IF_SD));
98
+ }
99
+
100
s->binfo.ram_size = machine->ram_size;
101
s->binfo.loader_start = 0x0;
102
s->binfo.get_dtb = versal_virt_get_dtb;
103
--
104
2.20.1
105
106
1
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
1
From: Guenter Roeck <linux@roeck-us.net>
2
2
3
Add support for the RTC.
3
The Linux kernel's IMX code now uses vendor specific commands.
4
This results in endless warnings when booting the Linux kernel.
4
5
5
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
6
sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
6
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
7
    card clock still not gate off in 100us!.
7
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
8
8
Message-id: 20200427181649.26851-12-edgar.iglesias@gmail.com
9
Implement support for the vendor specific command implemented in IMX hardware
10
to be able to avoid this warning.
11
12
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
13
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
14
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
15
Message-id: 20200603145258.195920-2-linux@roeck-us.net
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
---
17
---
11
hw/arm/xlnx-versal-virt.c | 22 ++++++++++++++++++++++
18
hw/sd/sdhci-internal.h | 5 +++++
12
1 file changed, 22 insertions(+)
19
include/hw/sd/sdhci.h | 5 +++++
20
hw/sd/sdhci.c | 18 +++++++++++++++++-
21
3 files changed, 27 insertions(+), 1 deletion(-)
13
22
14
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
23
diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
15
index XXXXXXX..XXXXXXX 100644
24
index XXXXXXX..XXXXXXX 100644
16
--- a/hw/arm/xlnx-versal-virt.c
25
--- a/hw/sd/sdhci-internal.h
17
+++ b/hw/arm/xlnx-versal-virt.c
26
+++ b/hw/sd/sdhci-internal.h
18
@@ -XXX,XX +XXX,XX @@ static void fdt_add_sd_nodes(VersalVirt *s)
27
@@ -XXX,XX +XXX,XX @@
19
}
28
#define SDHC_CMD_INHIBIT 0x00000001
20
}
29
#define SDHC_DATA_INHIBIT 0x00000002
21
30
#define SDHC_DAT_LINE_ACTIVE 0x00000004
22
+static void fdt_add_rtc_node(VersalVirt *s)
31
+#define SDHC_IMX_CLOCK_GATE_OFF 0x00000080
23
+{
32
#define SDHC_DOING_WRITE 0x00000100
24
+ const char compat[] = "xlnx,zynqmp-rtc";
33
#define SDHC_DOING_READ 0x00000200
25
+ const char interrupt_names[] = "alarm\0sec";
34
#define SDHC_SPACE_AVAILABLE 0x00000400
26
+ char *name = g_strdup_printf("/rtc@%x", MM_PMC_RTC);
35
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
36
37
38
#define ESDHC_MIX_CTRL 0x48
27
+
39
+
28
+ qemu_fdt_add_subnode(s->fdt, name);
40
#define ESDHC_VENDOR_SPEC 0xc0
41
+#define ESDHC_IMX_FRC_SDCLK_ON (1 << 8)
29
+
42
+
30
+ qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
43
#define ESDHC_DLL_CTRL 0x60
31
+ GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_ALARM_IRQ,
44
32
+ GIC_FDT_IRQ_FLAGS_LEVEL_HI,
45
#define ESDHC_TUNING_CTRL 0xcc
33
+ GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_SECONDS_IRQ,
46
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
34
+ GIC_FDT_IRQ_FLAGS_LEVEL_HI);
47
#define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
35
+ qemu_fdt_setprop(s->fdt, name, "interrupt-names",
48
DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
36
+ interrupt_names, sizeof(interrupt_names));
49
DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
37
+ qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
50
+ DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
38
+ 2, MM_PMC_RTC, 2, MM_PMC_RTC_SIZE);
51
\
39
+ qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
52
/* Capabilities registers provide information on supported
40
+ g_free(name);
53
* features of this specific host controller implementation */ \
41
+}
54
diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
55
index XXXXXXX..XXXXXXX 100644
56
--- a/include/hw/sd/sdhci.h
57
+++ b/include/hw/sd/sdhci.h
58
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
59
uint16_t acmd12errsts; /* Auto CMD12 error status register */
60
uint16_t hostctl2; /* Host Control 2 */
61
uint64_t admasysaddr; /* ADMA System Address Register */
62
+ uint16_t vendor_spec; /* Vendor specific register */
63
64
/* Read-only registers */
65
uint64_t capareg; /* Capabilities Register */
66
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
67
uint32_t quirks;
68
uint8_t sd_spec_version;
69
uint8_t uhs_mode;
70
+ uint8_t vendor; /* For vendor specific functionality */
71
} SDHCIState;
72
73
+#define SDHCI_VENDOR_NONE 0
74
+#define SDHCI_VENDOR_IMX 1
42
+
75
+
43
static void fdt_nop_memory_nodes(void *fdt, Error **errp)
76
/*
44
{
77
* Controller does not provide transfer-complete interrupt when not
45
Error *err = NULL;
78
* busy.
46
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
79
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
47
fdt_add_timer_nodes(s);
80
index XXXXXXX..XXXXXXX 100644
48
fdt_add_zdma_nodes(s);
81
--- a/hw/sd/sdhci.c
49
fdt_add_sd_nodes(s);
82
+++ b/hw/sd/sdhci.c
50
+ fdt_add_rtc_node(s);
83
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
51
fdt_add_cpu_nodes(s, psci_conduit);
84
}
52
fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
85
break;
53
fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
86
87
+ case ESDHC_VENDOR_SPEC:
88
+ ret = s->vendor_spec;
89
+ break;
90
case ESDHC_DLL_CTRL:
91
case ESDHC_TUNE_CTRL_STATUS:
92
case ESDHC_UNDOCUMENTED_REG27:
93
case ESDHC_TUNING_CTRL:
94
- case ESDHC_VENDOR_SPEC:
95
case ESDHC_MIX_CTRL:
96
case ESDHC_WTMK_LVL:
97
ret = 0;
98
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
99
case ESDHC_UNDOCUMENTED_REG27:
100
case ESDHC_TUNING_CTRL:
101
case ESDHC_WTMK_LVL:
102
+ break;
103
+
104
case ESDHC_VENDOR_SPEC:
105
+ s->vendor_spec = value;
106
+ switch (s->vendor) {
107
+ case SDHCI_VENDOR_IMX:
108
+ if (value & ESDHC_IMX_FRC_SDCLK_ON) {
109
+ s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
110
+ } else {
111
+ s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
112
+ }
113
+ break;
114
+ default:
115
+ break;
116
+ }
117
break;
118
119
case SDHC_HOSTCTL:
54
--
120
--
55
2.20.1
121
2.20.1
56
122
57
123
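To make the motivation above concrete, here is a toy, self-contained sketch of the guest-visible handshake that the new ESDHC_VENDOR_SPEC handling models; the usdhc_read()/usdhc_write() helpers and the register model are stand-ins invented for illustration, not QEMU or Linux code.

#include <assert.h>
#include <stdint.h>

#define SDHC_PRNSTS             0x24u
#define ESDHC_VENDOR_SPEC       0xc0u
#define ESDHC_IMX_FRC_SDCLK_ON  (1u << 8)
#define SDHC_IMX_CLOCK_GATE_OFF (1u << 7)

/* Toy model of only the controller state the change above touches. */
static uint32_t prnsts;
static uint32_t vendor_spec = ESDHC_IMX_FRC_SDCLK_ON;

static uint32_t usdhc_read(uint32_t offset)
{
    /* Anything other than PRNSTS reads back the vendor register here. */
    return offset == SDHC_PRNSTS ? prnsts : vendor_spec;
}

static void usdhc_write(uint32_t offset, uint32_t value)
{
    if (offset == ESDHC_VENDOR_SPEC) {
        vendor_spec = value;
        if (value & ESDHC_IMX_FRC_SDCLK_ON) {
            prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
        } else {
            prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
        }
    }
}

int main(void)
{
    /* The guest clears the "force SD clock on" bit ... */
    usdhc_write(ESDHC_VENDOR_SPEC,
                usdhc_read(ESDHC_VENDOR_SPEC) & ~ESDHC_IMX_FRC_SDCLK_ON);
    /*
     * ... and then expects PRNSTS to report the clock as gated off.
     * Without the emulation change, this bit never appears and Linux
     * prints the warning quoted in the commit message.
     */
    assert(usdhc_read(SDHC_PRNSTS) & SDHC_IMX_CLOCK_GATE_OFF);
    return 0;
}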
Deleted patch
Somewhere along the line we accidentally added a duplicate
"using D16-D31 when they don't exist" check to do_vfm_dp()
(probably an artifact of a patchseries rebase). Remove it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200430181003.21682-2-peter.maydell@linaro.org
---
 target/arm/translate-vfp.inc.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
         return false;
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vd | a->vn | a->vm) & 0x10)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
--
2.20.1
1
Add the infrastructure for building and invoking a decodetree decoder
1
From: Guenter Roeck <linux@roeck-us.net>
2
for the AArch32 Neon encodings. At the moment the new decoder covers
3
nothing, so we always fall back to the existing hand-written decode.
4
2
5
We follow the same pattern we did for the VFP decodetree conversion
3
Set vendor property to IMX to enable IMX specific functionality
6
(commit 78e138bc1f672c145ef6ace74617d and following): code that deals
4
in sdhci code.
7
with Neon will be moving gradually out to translate-neon.vfp.inc,
8
which we #include into translate.c.
9
5
10
In order to share the decode files between A32 and T32, we
6
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
11
split Neon into 3 parts:
7
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
12
* data-processing
8
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
13
* load-store
9
Message-id: 20200603145258.195920-3-linux@roeck-us.net
14
* 'shared' encodings
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
---
12
hw/arm/fsl-imx25.c | 6 ++++++
13
hw/arm/fsl-imx6.c | 6 ++++++
14
hw/arm/fsl-imx6ul.c | 2 ++
15
hw/arm/fsl-imx7.c | 2 ++
16
4 files changed, 16 insertions(+)
15
17
16
The first two groups of instructions have similar but not identical
18
diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
17
A32 and T32 encodings, so we need to manually transform the T32
18
encoding into the A32 one before calling the decoder; the third group
19
covers the Neon instructions which are identical in A32 and T32.
20
21
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
22
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
23
Message-id: 20200430181003.21682-4-peter.maydell@linaro.org
24
---
25
target/arm/neon-dp.decode | 29 ++++++++++++++++++++++++++
26
target/arm/neon-ls.decode | 29 ++++++++++++++++++++++++++
27
target/arm/neon-shared.decode | 27 +++++++++++++++++++++++++
28
target/arm/translate-neon.inc.c | 32 +++++++++++++++++++++++++++++
29
target/arm/translate.c | 36 +++++++++++++++++++++++++++++++--
30
target/arm/Makefile.objs | 18 +++++++++++++++++
31
6 files changed, 169 insertions(+), 2 deletions(-)
32
create mode 100644 target/arm/neon-dp.decode
33
create mode 100644 target/arm/neon-ls.decode
34
create mode 100644 target/arm/neon-shared.decode
35
create mode 100644 target/arm/translate-neon.inc.c
36
37
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
38
new file mode 100644
39
index XXXXXXX..XXXXXXX
40
--- /dev/null
41
+++ b/target/arm/neon-dp.decode
42
@@ -XXX,XX +XXX,XX @@
43
+# AArch32 Neon data-processing instruction descriptions
44
+#
45
+# Copyright (c) 2020 Linaro, Ltd
46
+#
47
+# This library is free software; you can redistribute it and/or
48
+# modify it under the terms of the GNU Lesser General Public
49
+# License as published by the Free Software Foundation; either
50
+# version 2 of the License, or (at your option) any later version.
51
+#
52
+# This library is distributed in the hope that it will be useful,
53
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
54
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
55
+# Lesser General Public License for more details.
56
+#
57
+# You should have received a copy of the GNU Lesser General Public
58
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
59
+
60
+#
61
+# This file is processed by scripts/decodetree.py
62
+#
63
+
64
+# Encodings for Neon data processing instructions where the T32 encoding
65
+# is a simple transformation of the A32 encoding.
66
+# More specifically, this file covers instructions where the A32 encoding is
67
+# 0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
68
+# and the T32 encoding is
69
+# 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
70
+# This file works on the A32 encoding only; calling code for T32 has to
71
+# transform the insn into the A32 version first.
72
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
73
new file mode 100644
74
index XXXXXXX..XXXXXXX
75
--- /dev/null
76
+++ b/target/arm/neon-ls.decode
77
@@ -XXX,XX +XXX,XX @@
78
+# AArch32 Neon load/store instruction descriptions
79
+#
80
+# Copyright (c) 2020 Linaro, Ltd
81
+#
82
+# This library is free software; you can redistribute it and/or
83
+# modify it under the terms of the GNU Lesser General Public
84
+# License as published by the Free Software Foundation; either
85
+# version 2 of the License, or (at your option) any later version.
86
+#
87
+# This library is distributed in the hope that it will be useful,
88
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
89
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
90
+# Lesser General Public License for more details.
91
+#
92
+# You should have received a copy of the GNU Lesser General Public
93
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
94
+
95
+#
96
+# This file is processed by scripts/decodetree.py
97
+#
98
+
99
+# Encodings for Neon load/store instructions where the T32 encoding
100
+# is a simple transformation of the A32 encoding.
101
+# More specifically, this file covers instructions where the A32 encoding is
102
+# 0b1111_0100_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
103
+# and the T32 encoding is
104
+# 0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
105
+# This file works on the A32 encoding only; calling code for T32 has to
106
+# transform the insn into the A32 version first.
107
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
108
new file mode 100644
109
index XXXXXXX..XXXXXXX
110
--- /dev/null
111
+++ b/target/arm/neon-shared.decode
112
@@ -XXX,XX +XXX,XX @@
113
+# AArch32 Neon instruction descriptions
114
+#
115
+# Copyright (c) 2020 Linaro, Ltd
116
+#
117
+# This library is free software; you can redistribute it and/or
118
+# modify it under the terms of the GNU Lesser General Public
119
+# License as published by the Free Software Foundation; either
120
+# version 2 of the License, or (at your option) any later version.
121
+#
122
+# This library is distributed in the hope that it will be useful,
123
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
124
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
125
+# Lesser General Public License for more details.
126
+#
127
+# You should have received a copy of the GNU Lesser General Public
128
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
129
+
130
+#
131
+# This file is processed by scripts/decodetree.py
132
+#
133
+
134
+# Encodings for Neon instructions whose encoding is the same for
135
+# both A32 and T32.
136
+
137
+# More specifically, this covers:
138
+# 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
139
+# 3same ext: 0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
140
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
141
new file mode 100644
142
index XXXXXXX..XXXXXXX
143
--- /dev/null
144
+++ b/target/arm/translate-neon.inc.c
145
@@ -XXX,XX +XXX,XX @@
146
+/*
147
+ * ARM translation: AArch32 Neon instructions
148
+ *
149
+ * Copyright (c) 2003 Fabrice Bellard
150
+ * Copyright (c) 2005-2007 CodeSourcery
151
+ * Copyright (c) 2007 OpenedHand, Ltd.
152
+ * Copyright (c) 2020 Linaro, Ltd.
153
+ *
154
+ * This library is free software; you can redistribute it and/or
155
+ * modify it under the terms of the GNU Lesser General Public
156
+ * License as published by the Free Software Foundation; either
157
+ * version 2 of the License, or (at your option) any later version.
158
+ *
159
+ * This library is distributed in the hope that it will be useful,
160
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This file is intended to be included from translate.c; it uses
+ * some macros and definitions provided by that file.
+ * It might be possible to convert it to a standalone .c file eventually.
+ */
+
+/* Include the generated Neon decoder */
+#include "decode-neon-dp.inc.c"
+#include "decode-neon-ls.inc.c"
+#include "decode-neon-shared.inc.c"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)

#define ARM_CP_RW_BIT (1 << 20)

-/* Include the VFP decoder */
+/* Include the VFP and Neon decoders */
#include "translate-vfp.inc.c"
+#include "translate-neon.inc.c"

static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
{
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
/* Unconditional instructions. */
/* TODO: Perhaps merge these into one decodetree output file. */
if (disas_a32_uncond(s, insn) ||
- disas_vfp_uncond(s, insn)) {
+ disas_vfp_uncond(s, insn) ||
+ disas_neon_dp(s, insn) ||
+ disas_neon_ls(s, insn) ||
+ disas_neon_shared(s, insn)) {
return;
}
/* fall back to legacy decoder */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
ARCH(6T2);
}

+ if ((insn & 0xef000000) == 0xef000000) {
+ /*
+ * T32 encodings 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+ * transform into
+ * A32 encodings 0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+ */
+ uint32_t a32_insn = (insn & 0xe2ffffff) |
+ ((insn & (1 << 28)) >> 4) | (1 << 28);
+
+ if (disas_neon_dp(s, a32_insn)) {
+ return;
+ }
+ }
+
+ if ((insn & 0xff100000) == 0xf9000000) {
+ /*
+ * T32 encodings 0b1111_1001_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+ * transform into
+ * A32 encodings 0b1111_0100_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+ */
+ uint32_t a32_insn = (insn & 0x00ffffff) | 0xf4000000;
+
+ if (disas_neon_ls(s, a32_insn)) {
+ return;
+ }
+ }
+
/*
* TODO: Perhaps merge these into one decodetree output file.
* Note disas_vfp is written for a32 with cond field in the
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
*/
if (disas_t32(s, insn) ||
disas_vfp_uncond(s, insn) ||
+ disas_neon_shared(s, insn) ||
((insn >> 28) == 0xe && disas_vfp(s, insn))) {
return;
}
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
     $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
     "GEN", $(TARGET_DIR)$@)

+target/arm/decode-neon-shared.inc.c: $(SRC_PATH)/target/arm/neon-shared.decode $(DECODETREE)
+    $(call quiet-command,\
+     $(PYTHON) $(DECODETREE) --static-decode disas_neon_shared -o $@ $<,\
+     "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-dp.inc.c: $(SRC_PATH)/target/arm/neon-dp.decode $(DECODETREE)
+    $(call quiet-command,\
+     $(PYTHON) $(DECODETREE) --static-decode disas_neon_dp -o $@ $<,\
+     "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-ls.inc.c: $(SRC_PATH)/target/arm/neon-ls.decode $(DECODETREE)
+    $(call quiet-command,\
+     $(PYTHON) $(DECODETREE) --static-decode disas_neon_ls -o $@ $<,\
+     "GEN", $(TARGET_DIR)$@)
+
target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
    $(call quiet-command,\
     $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
@@ -XXX,XX +XXX,XX @@ target/arm/decode-t16.inc.c: $(SRC_PATH)/target/arm/t16.decode $(DECODETREE)
     "GEN", $(TARGET_DIR)$@)

target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+target/arm/translate.o: target/arm/decode-neon-shared.inc.c
+target/arm/translate.o: target/arm/decode-neon-dp.inc.c
+target/arm/translate.o: target/arm/decode-neon-ls.inc.c
target/arm/translate.o: target/arm/decode-vfp.inc.c
target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
target/arm/translate.o: target/arm/decode-a32.inc.c
--
2.20.1

index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx25.c
+++ b/hw/arm/fsl-imx25.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
&err);
object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
"capareg", &err);
+ object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+ "vendor", &err);
+ if (err) {
+ error_propagate(errp, err);
+ return;
+ }
object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
if (err) {
error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6.c
+++ b/hw/arm/fsl-imx6.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
&err);
object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
"capareg", &err);
+ object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+ "vendor", &err);
+ if (err) {
+ error_propagate(errp, err);
+ return;
+ }
object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
if (err) {
error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6ul.c
+++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
FSL_IMX6UL_USDHC2_IRQ,
};

+ object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+ "vendor", &error_abort);
object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
&error_abort);

diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx7.c
+++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
FSL_IMX7_USDHC3_IRQ,
};

+ object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+ "vendor", &error_abort);
object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
&error_abort);

--
2.20.1
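For anyone following the bit-juggling in the disas_thumb2_insn() hunk of the
target/arm/translate.c change above: the masks and shifts below are copied
from that hunk, but the small standalone program itself is illustration only,
not QEMU code (the function names and test values are invented for the
example). It applies the same T32-to-A32 rewrites in isolation so they can be
checked on their own; the point of the rewrite is that the generated
disas_neon_dp() and disas_neon_ls() decoders only ever have to understand the
A32 form of the encodings.

  /*
   * Standalone illustration (not QEMU code) of the T32 -> A32 Neon
   * encoding rewrites used by the Thumb decoder path above.
   */
  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  /* 0b111p_1111_... (T32 Neon dp) -> 0b1111_001p_... (A32 Neon dp) */
  static uint32_t t32_neon_dp_to_a32(uint32_t insn)
  {
      return (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
  }

  /* 0b1111_1001_ppp0_... (T32 Neon ld/st) -> 0b1111_0100_ppp0_... (A32) */
  static uint32_t t32_neon_ls_to_a32(uint32_t insn)
  {
      return (insn & 0x00ffffff) | 0xf4000000;
  }

  int main(void)
  {
      /* the 'p' bit moves from bit 28 (T32) down to bit 24 (A32) */
      assert(t32_neon_dp_to_a32(0xef000000) == 0xf2000000);
      assert(t32_neon_dp_to_a32(0xff000000) == 0xf3000000);
      /* the load/store rewrite only replaces the top byte 0xf9 with 0xf4 */
      assert(t32_neon_ls_to_a32(0xf9200000) == 0xf4200000);
      printf("T32 -> A32 Neon encoding rewrites check out\n");
      return 0;
  }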
diff view generated by jsdifflib
Deleted patch
We're going to want at least some of the NeonGen* typedefs
for the refactored 32-bit Neon decoder, so move them all
to translate.h since it makes more sense to keep them in
one group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-23-peter.maydell@linaro.org
---
target/arm/translate.h | 17 +++++++++++++++++
target/arm/translate-a64.c | 17 -----------------
2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
uint32_t, uint32_t, uint32_t);

+/* Function prototype for gen_ functions for calling Neon helpers */
+typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
+typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
+typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
+typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
+typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
+typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
+typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
+
#endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ typedef struct AArch64DecodeTable {
AArch64DecodeFn *disas_fn;
} AArch64DecodeTable;

-/* Function prototype for gen_ functions for calling Neon helpers */
-typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
-typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
-typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
-typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
-typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
-typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
-typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
-typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
-typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
-typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
-typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
-
/* initialize TCG globals. */
void a64_translate_init(void)
{
--
2.20.1

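A note on how these typedefs get used, since the commit message only says the
32-bit decoder will want them too: they are function-pointer types for gen_
helpers, so a translator can pick the right helper per element size out of a
small table. The sketch below is purely illustrative and self-contained; the
TCGv_i32 type and the helper names are stand-ins invented for the example, not
QEMU's real definitions.

  /* Pattern sketch with stand-in types; not code from the QEMU tree. */
  typedef struct TCGv_i32_d *TCGv_i32;                        /* stand-in type */
  typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);  /* as in the patch */

  /* placeholder gen_ helpers matching the NeonGenTwoOpFn signature */
  static void gen_add_u8(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)  { /* emit TCG ops */ }
  static void gen_add_u16(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) { /* emit TCG ops */ }

  /* a translator can then dispatch on element size via a table of pointers */
  static void gen_neon_add(int size, TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
  {
      static NeonGenTwoOpFn * const fns[] = { gen_add_u8, gen_add_u16 };
      fns[size](d, a, b);
  }

Keeping one set of these names in translate.h lets the A64 translator and the
new A32/T32 Neon translator share them instead of growing two diverging copies.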
diff view generated by jsdifflib