1
Just my fp16 work, plus some small stuff for the sbsa-ref board;
1
Some arm patches; my to-review queue is by no means empty, but
2
but my rule of thumb is to send a pullreq once I get over about
2
this is a big enough set of patches to be getting on with...
3
30 patches...
4
3
5
-- PMM
4
-- PMM
6
5
7
The following changes since commit 2f4c51c0f384d7888a04b4815861e6d5fd244d75:
6
The following changes since commit cb9c6a8e5ad6a1f0ce164d352e3102df46986e22:
8
7
9
Merge remote-tracking branch 'remotes/kraxel/tags/usb-20200831-pull-request' into staging (2020-08-31 19:39:13 +0100)
8
.gitlab-ci.d/windows: Work-around timeout and OpenGL problems of the MSYS2 jobs (2023-01-04 18:58:33 +0000)
10
9
11
are available in the Git repository at:
10
are available in the Git repository at:
12
11
13
https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200901
12
https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230105
14
13
15
for you to fetch changes up to 3f462bf0f6ea6382dd1502d4eb1fcd33c8e774f5:
14
for you to fetch changes up to 93c9678de9dc7d2e68f9e8477da072bac30ef132:
16
15
17
hw/arm/sbsa-ref : Add embedded controller in secure memory (2020-09-01 14:01:34 +0100)
16
hw/net: Fix read of uninitialized memory in imx_fec. (2023-01-05 15:33:00 +0000)
18
17
19
----------------------------------------------------------------
18
----------------------------------------------------------------
20
target-arm queue:
19
target-arm queue:
21
* Implement fp16 support for AArch32 VFP and Neon
20
* Implement AArch32 ARMv8-R support
22
* hw/arm/sbsa-ref: add "reg" property to DT cpu nodes
21
* Add Cortex-R52 CPU
23
* hw/arm/sbsa-ref : Add embedded controller in secure memory
22
* fix handling of HLT semihosting in system mode
23
* hw/timer/ixm_epit: cleanup and fix bug in compare handling
24
* target/arm: Coding style fixes
25
* target/arm: Clean up includes
26
* nseries: minor code cleanups
27
* target/arm: align exposed ID registers with Linux
28
* hw/arm/smmu-common: remove unnecessary inlines
29
* i.MX7D: Handle GPT timers
30
* i.MX7D: Connect IRQs to GPIO devices
31
* i.MX6UL: Add a specific GPT timer instance
32
* hw/net: Fix read of uninitialized memory in imx_fec
24
33
25
----------------------------------------------------------------
34
----------------------------------------------------------------
26
Graeme Gregory (2):
35
Alex Bennée (1):
27
hw/misc/sbsa_ec : Add an embedded controller for sbsa-ref
36
target/arm: fix handling of HLT semihosting in system mode
28
hw/arm/sbsa-ref : Add embedded controller in secure memory
29
37
30
Leif Lindholm (1):
38
Axel Heider (8):
31
hw/arm/sbsa-ref: add "reg" property to DT cpu nodes
39
hw/timer/imx_epit: improve comments
40
hw/timer/imx_epit: cleanup CR defines
41
hw/timer/imx_epit: define SR_OCIF
42
hw/timer/imx_epit: update interrupt state on CR write access
43
hw/timer/imx_epit: hard reset initializes CR with 0
44
hw/timer/imx_epit: factor out register write handlers
45
hw/timer/imx_epit: remove explicit fields cnt and freq
46
hw/timer/imx_epit: fix compare timer handling
32
47
33
Peter Maydell (44):
48
Claudio Fontana (1):
34
target/arm: Remove local definitions of float constants
49
target/arm: cleanup cpu includes
35
target/arm: Use correct ID register check for aa32_fp16_arith
36
target/arm: Implement VFP fp16 for VFP_BINOP operations
37
target/arm: Implement VFP fp16 VMLA, VMLS, VNMLS, VNMLA, VNMUL
38
target/arm: Macroify trans functions for VFMA, VFMS, VFNMA, VFNMS
39
target/arm: Implement VFP fp16 for fused-multiply-add
40
target/arm: Macroify uses of do_vfp_2op_sp() and do_vfp_2op_dp()
41
target/arm: Implement VFP fp16 for VABS, VNEG, VSQRT
42
target/arm: Implement VFP fp16 for VMOV immediate
43
target/arm: Implement VFP fp16 VCMP
44
target/arm: Implement VFP fp16 VLDR and VSTR
45
target/arm: Implement VFP fp16 VCVT between float and integer
46
target/arm: Make VFP_CONV_FIX macros take separate float type and float size
47
target/arm: Use macros instead of open-coding fp16 conversion helpers
48
target/arm: Implement VFP fp16 VCVT between float and fixed-point
49
target/arm: Implement VFP vp16 VCVT-with-specified-rounding-mode
50
target/arm: Implement VFP fp16 VSEL
51
target/arm: Implement VFP fp16 VRINT*
52
target/arm: Implement new VFP fp16 insn VINS
53
target/arm: Implement new VFP fp16 insn VMOVX
54
target/arm: Implement VFP fp16 VMOV between gp and halfprec registers
55
target/arm: Implement FP16 for Neon VADD, VSUB, VABD, VMUL
56
target/arm: Implement fp16 for Neon VRECPE, VRSQRTE using gvec
57
target/arm: Implement fp16 for Neon VABS, VNEG of floats
58
target/arm: Implement fp16 for VCEQ, VCGE, VCGT comparisons
59
target/arm: Implement fp16 for VACGE, VACGT
60
target/arm: Implement fp16 for Neon VMAX, VMIN
61
target/arm: Implement fp16 for Neon VMAXNM, VMINNM
62
target/arm: Implement fp16 for Neon VMLA, VMLS operations
63
target/arm: Implement fp16 for Neon VFMA, VMFS
64
target/arm: Implement fp16 for Neon fp compare-vs-0
65
target/arm: Implement fp16 for Neon VRECPS
66
target/arm: Implement fp16 for Neon VRSQRTS
67
target/arm: Implement fp16 for Neon pairwise fp ops
68
target/arm: Implement fp16 for Neon float-integer VCVT
69
target/arm: Convert Neon VCVT fixed-point to gvec
70
target/arm: Implement fp16 for Neon VCVT fixed-point
71
target/arm: Implement fp16 for Neon VCVT with rounding modes
72
target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode
73
target/arm: Implement fp16 for Neon VRINTX
74
target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed operations
75
target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations
76
target/arm: Implement fp16 for Neon VMUL, VMLA, VMLS
77
target/arm: Enable FP16 in '-cpu max'
78
50
79
target/arm/cpu.h | 7 +-
51
Fabiano Rosas (5):
80
target/arm/helper.h | 133 ++++++-
52
target/arm: Fix checkpatch comment style warnings in helper.c
81
target/arm/neon-dp.decode | 8 +-
53
target/arm: Fix checkpatch space errors in helper.c
82
target/arm/vfp-uncond.decode | 27 +-
54
target/arm: Fix checkpatch brace errors in helper.c
83
target/arm/vfp.decode | 34 +-
55
target/arm: Remove unused includes from m_helper.c
84
hw/arm/sbsa-ref.c | 43 ++-
56
target/arm: Remove unused includes from helper.c
85
hw/misc/sbsa_ec.c | 98 +++++
86
target/arm/cpu.c | 3 +-
87
target/arm/cpu64.c | 10 +-
88
target/arm/helper-a64.c | 11 -
89
target/arm/translate-sve.c | 4 -
90
target/arm/vec_helper.c | 431 ++++++++++++++++++++-
91
target/arm/vfp_helper.c | 244 +++++-------
92
hw/misc/meson.build | 2 +
93
target/arm/translate-neon.c.inc | 755 +++++++++++++------------------------
94
target/arm/translate-vfp.c.inc | 810 ++++++++++++++++++++++++++++++++++++----
95
16 files changed, 1819 insertions(+), 801 deletions(-)
96
create mode 100644 hw/misc/sbsa_ec.c
97
57
58
Jean-Christophe Dubois (4):
59
i.MX7D: Connect GPT timers to IRQ
60
i.MX7D: Compute clock frequency for the fixed frequency clocks.
61
i.MX6UL: Add a specific GPT timer instance for the i.MX6UL
62
i.MX7D: Connect IRQs to GPIO devices.
63
64
Peter Maydell (1):
65
target/arm:Set lg_page_size to 0 if either S1 or S2 asks for it
66
67
Philippe Mathieu-Daudé (5):
68
hw/input/tsc2xxx: Constify set_transform()'s MouseTransformInfo arg
69
hw/arm/nseries: Constify various read-only arrays
70
hw/arm/nseries: Silent -Wmissing-field-initializers warning
71
hw/arm/smmu-common: Reduce smmu_inv_notifiers_mr() scope
72
hw/arm/smmu-common: Avoid using inlined functions with external linkage
73
74
Stephen Longfield (1):
75
hw/net: Fix read of uninitialized memory in imx_fec.
76
77
Tobias Röhmel (7):
78
target/arm: Don't add all MIDR aliases for cores that implement PMSA
79
target/arm: Make RVBAR available for all ARMv8 CPUs
80
target/arm: Make stage_2_format for cache attributes optional
81
target/arm: Enable TTBCR_EAE for ARMv8-R AArch32
82
target/arm: Add PMSAv8r registers
83
target/arm: Add PMSAv8r functionality
84
target/arm: Add ARM Cortex-R52 CPU
85
86
Zhuojia Shen (1):
87
target/arm: align exposed ID registers with Linux
88
89
include/hw/arm/fsl-imx7.h | 20 +
90
include/hw/arm/smmu-common.h | 3 -
91
include/hw/input/tsc2xxx.h | 4 +-
92
include/hw/timer/imx_epit.h | 8 +-
93
include/hw/timer/imx_gpt.h | 1 +
94
target/arm/cpu.h | 6 +
95
target/arm/internals.h | 4 +
96
hw/arm/fsl-imx6ul.c | 2 +-
97
hw/arm/fsl-imx7.c | 41 +-
98
hw/arm/nseries.c | 28 +-
99
hw/arm/smmu-common.c | 15 +-
100
hw/input/tsc2005.c | 2 +-
101
hw/input/tsc210x.c | 3 +-
102
hw/misc/imx6ul_ccm.c | 6 -
103
hw/misc/imx7_ccm.c | 49 ++-
104
hw/net/imx_fec.c | 8 +-
105
hw/timer/imx_epit.c | 376 +++++++++-------
106
hw/timer/imx_gpt.c | 25 ++
107
target/arm/cpu.c | 35 +-
108
target/arm/cpu64.c | 6 -
109
target/arm/cpu_tcg.c | 42 ++
110
target/arm/debug_helper.c | 3 +
111
target/arm/helper.c | 871 +++++++++++++++++++++++++++++---------
112
target/arm/m_helper.c | 16 -
113
target/arm/machine.c | 28 ++
114
target/arm/ptw.c | 152 +++++--
115
target/arm/tlb_helper.c | 4 +
116
target/arm/translate.c | 2 +-
117
tests/tcg/aarch64/sysregs.c | 24 +-
118
tests/tcg/aarch64/Makefile.target | 7 +-
119
30 files changed, 1330 insertions(+), 461 deletions(-)
120
diff view generated by jsdifflib
Deleted patch
1
In several places the target/arm code defines local float constants
2
for 2, 3 and 1.5, which are also provided by include/fpu/softfloat.h.
3
Remove the unnecessary local duplicate versions.
4
1
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20200828183354.27913-2-peter.maydell@linaro.org
8
---
9
target/arm/helper-a64.c | 11 -----------
10
target/arm/translate-sve.c | 4 ----
11
target/arm/vfp_helper.c | 4 ----
12
3 files changed, 19 deletions(-)
13
14
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper-a64.c
17
+++ b/target/arm/helper-a64.c
18
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, void *fpstp)
19
* versions, these do a fully fused multiply-add or
20
* multiply-add-and-halve.
21
*/
22
-#define float16_two make_float16(0x4000)
23
-#define float16_three make_float16(0x4200)
24
-#define float16_one_point_five make_float16(0x3e00)
25
-
26
-#define float32_two make_float32(0x40000000)
27
-#define float32_three make_float32(0x40400000)
28
-#define float32_one_point_five make_float32(0x3fc00000)
29
-
30
-#define float64_two make_float64(0x4000000000000000ULL)
31
-#define float64_three make_float64(0x4008000000000000ULL)
32
-#define float64_one_point_five make_float64(0x3FF8000000000000ULL)
33
34
uint32_t HELPER(recpsf_f16)(uint32_t a, uint32_t b, void *fpstp)
35
{
36
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
37
index XXXXXXX..XXXXXXX 100644
38
--- a/target/arm/translate-sve.c
39
+++ b/target/arm/translate-sve.c
40
@@ -XXX,XX +XXX,XX @@ static bool trans_##NAME##_zpzi(DisasContext *s, arg_rpri_esz *a) \
41
return true; \
42
}
43
44
-#define float16_two make_float16(0x4000)
45
-#define float32_two make_float32(0x40000000)
46
-#define float64_two make_float64(0x4000000000000000ULL)
47
-
48
DO_FP_IMM(FADD, fadds, half, one)
49
DO_FP_IMM(FSUB, fsubs, half, one)
50
DO_FP_IMM(FMUL, fmuls, half, two)
51
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
52
index XXXXXXX..XXXXXXX 100644
53
--- a/target/arm/vfp_helper.c
54
+++ b/target/arm/vfp_helper.c
55
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
56
return r;
57
}
58
59
-#define float32_two make_float32(0x40000000)
60
-#define float32_three make_float32(0x40400000)
61
-#define float32_one_point_five make_float32(0x3fc00000)
62
-
63
float32 HELPER(recps_f32)(CPUARMState *env, float32 a, float32 b)
64
{
65
float_status *s = &env->vfp.standard_fp_status;
66
--
67
2.20.1
68
69
diff view generated by jsdifflib
1
Convert the Neon VCVT float<->fixed-point insns to a
1
In get_phys_addr_twostage() we set the lg_page_size of the result to
2
gvec style, in preparation for adding fp16 support.
2
the maximum of the stage 1 and stage 2 page sizes. This works for
3
the case where we do want to create a TLB entry, because we know the
4
common TLB code only creates entries of the TARGET_PAGE_SIZE and
5
asking for a size larger than that only means that invalidations
6
invalidate the whole larger area. However, if lg_page_size is
7
smaller than TARGET_PAGE_SIZE this effectively means "don't create a
8
TLB entry"; in this case if either S1 or S2 said "this covers less
9
than a page and can't go in a TLB" then the final result also should
10
be marked that way. Set the resulting page size to 0 if either
11
stage asked for a less-than-a-page entry, and expand the comment
12
to explain what's going on.
13
14
This has no effect for VMSA because currently the VMSA lookup always
15
returns results that cover at least TARGET_PAGE_SIZE; however when we
16
add v8R support it will reuse this code path, and for v8R the S1 and
17
S2 results can be smaller than TARGET_PAGE_SIZE.
3
18
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
19
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
20
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-38-peter.maydell@linaro.org
21
Message-id: 20221212142708.610090-1-peter.maydell@linaro.org
7
---
22
---
8
target/arm/helper.h | 5 +++++
23
target/arm/ptw.c | 16 +++++++++++++---
9
target/arm/vec_helper.c | 20 +++++++++++++++++++
24
1 file changed, 13 insertions(+), 3 deletions(-)
10
target/arm/translate-neon.c.inc | 35 +++++++++++++++++----------------
11
3 files changed, 43 insertions(+), 17 deletions(-)
12
25
13
diff --git a/target/arm/helper.h b/target/arm/helper.h
26
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
14
index XXXXXXX..XXXXXXX 100644
27
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper.h
28
--- a/target/arm/ptw.c
16
+++ b/target/arm/helper.h
29
+++ b/target/arm/ptw.c
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_tosizs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
30
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
18
DEF_HELPER_FLAGS_4(gvec_touszh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
19
DEF_HELPER_FLAGS_4(gvec_touizs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
20
21
+DEF_HELPER_FLAGS_4(gvec_vcvt_sf, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
22
+DEF_HELPER_FLAGS_4(gvec_vcvt_uf, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
23
+DEF_HELPER_FLAGS_4(gvec_vcvt_fs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
24
+DEF_HELPER_FLAGS_4(gvec_vcvt_fu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
25
+
26
DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
27
DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
28
DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
29
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
30
index XXXXXXX..XXXXXXX 100644
31
--- a/target/arm/vec_helper.c
32
+++ b/target/arm/vec_helper.c
33
@@ -XXX,XX +XXX,XX @@ DO_NEON_PAIRWISE(neon_pmax, max)
34
DO_NEON_PAIRWISE(neon_pmin, min)
35
36
#undef DO_NEON_PAIRWISE
37
+
38
+#define DO_VCVT_FIXED(NAME, FUNC, TYPE) \
39
+ void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc) \
40
+ { \
41
+ intptr_t i, oprsz = simd_oprsz(desc); \
42
+ int shift = simd_data(desc); \
43
+ TYPE *d = vd, *n = vn; \
44
+ float_status *fpst = stat; \
45
+ for (i = 0; i < oprsz / sizeof(TYPE); i++) { \
46
+ d[i] = FUNC(n[i], shift, fpst); \
47
+ } \
48
+ clear_tail(d, oprsz, simd_maxsz(desc)); \
49
+ }
50
+
51
+DO_VCVT_FIXED(gvec_vcvt_sf, helper_vfp_sltos, uint32_t)
52
+DO_VCVT_FIXED(gvec_vcvt_uf, helper_vfp_ultos, uint32_t)
53
+DO_VCVT_FIXED(gvec_vcvt_fs, helper_vfp_tosls_round_to_zero, uint32_t)
54
+DO_VCVT_FIXED(gvec_vcvt_fu, helper_vfp_touls_round_to_zero, uint32_t)
55
+
56
+#undef DO_VCVT_FIXED
57
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
58
index XXXXXXX..XXXXXXX 100644
59
--- a/target/arm/translate-neon.c.inc
60
+++ b/target/arm/translate-neon.c.inc
61
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
62
}
63
64
static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
65
- NeonGenTwoSingleOpFn *fn)
66
+ gen_helper_gvec_2_ptr *fn)
67
{
68
/* FP operations in 2-reg-and-shift group */
69
- TCGv_i32 tmp, shiftv;
70
- TCGv_ptr fpstatus;
71
- int pass;
72
+ int vec_size = a->q ? 16 : 8;
73
+ int rd_ofs = neon_reg_offset(a->vd, 0);
74
+ int rm_ofs = neon_reg_offset(a->vm, 0);
75
+ TCGv_ptr fpst;
76
77
if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
78
return false;
79
}
31
}
80
32
81
+ if (a->size != 0) {
33
/*
82
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
34
- * Use the maximum of the S1 & S2 page size, so that invalidation
83
+ return false;
35
- * of pages > TARGET_PAGE_SIZE works correctly.
84
+ }
36
+ * If either S1 or S2 returned a result smaller than TARGET_PAGE_SIZE,
85
+ }
37
+ * this means "don't put this in the TLB"; in this case, return a
86
+
38
+ * result with lg_page_size == 0 to achieve that. Otherwise,
87
/* UNDEF accesses to D16-D31 if they don't exist. */
39
+ * use the maximum of the S1 & S2 page size, so that invalidation
88
if (!dc_isar_feature(aa32_simd_r32, s) &&
40
+ * of pages > TARGET_PAGE_SIZE works correctly. (This works even though
89
((a->vd | a->vm) & 0x10)) {
41
+ * we know the combined result permissions etc only cover the minimum
90
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
42
+ * of the S1 and S2 page size, because we know that the common TLB code
91
return true;
43
+ * never actually creates TLB entries bigger than TARGET_PAGE_SIZE,
44
+ * and passing a larger page size value only affects invalidations.)
45
*/
46
- if (result->f.lg_page_size < s1_lgpgsz) {
47
+ if (result->f.lg_page_size < TARGET_PAGE_BITS ||
48
+ s1_lgpgsz < TARGET_PAGE_BITS) {
49
+ result->f.lg_page_size = 0;
50
+ } else if (result->f.lg_page_size < s1_lgpgsz) {
51
result->f.lg_page_size = s1_lgpgsz;
92
}
52
}
93
53
94
- fpstatus = fpstatus_ptr(FPST_STD);
95
- shiftv = tcg_const_i32(a->shift);
96
- for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
97
- tmp = neon_load_reg(a->vm, pass);
98
- fn(tmp, tmp, shiftv, fpstatus);
99
- neon_store_reg(a->vd, pass, tmp);
100
- }
101
- tcg_temp_free_ptr(fpstatus);
102
- tcg_temp_free_i32(shiftv);
103
+ fpst = fpstatus_ptr(a->size ? FPST_STD_F16 : FPST_STD);
104
+ tcg_gen_gvec_2_ptr(rd_ofs, rm_ofs, fpst, vec_size, vec_size, a->shift, fn);
105
+ tcg_temp_free_ptr(fpst);
106
return true;
107
}
108
109
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
110
return do_fp_2sh(s, a, FUNC); \
111
}
112
113
-DO_FP_2SH(VCVT_SF, gen_helper_vfp_sltos)
114
-DO_FP_2SH(VCVT_UF, gen_helper_vfp_ultos)
115
-DO_FP_2SH(VCVT_FS, gen_helper_vfp_tosls_round_to_zero)
116
-DO_FP_2SH(VCVT_FU, gen_helper_vfp_touls_round_to_zero)
117
+DO_FP_2SH(VCVT_SF, gen_helper_gvec_vcvt_sf)
118
+DO_FP_2SH(VCVT_UF, gen_helper_gvec_vcvt_uf)
119
+DO_FP_2SH(VCVT_FS, gen_helper_gvec_vcvt_fs)
120
+DO_FP_2SH(VCVT_FU, gen_helper_gvec_vcvt_fu)
121
122
static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
123
{
124
--
54
--
125
2.20.1
55
2.25.1
126
127
diff view generated by jsdifflib
1
Implement the fp16 versions of the VFP VCVT instruction forms
1
From: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
2
which convert between floating point and integer with a specified
3
rounding mode.
4
2
3
Cores with PMSA have the MPUIR register which has the
4
same encoding as the MIDR alias with opc2=4. So we only
5
add that alias if we are not realizing a core that
6
implements PMSA.
7
8
Signed-off-by: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
10
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
11
Message-id: 20221206102504.165775-2-tobias.roehmel@rwth-aachen.de
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20200828183354.27913-17-peter.maydell@linaro.org
8
---
13
---
9
target/arm/vfp-uncond.decode | 6 ++++--
14
target/arm/helper.c | 13 +++++++++----
10
target/arm/translate-vfp.c.inc | 32 ++++++++++++++++++++++++--------
15
1 file changed, 9 insertions(+), 4 deletions(-)
11
2 files changed, 28 insertions(+), 10 deletions(-)
12
16
13
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
17
diff --git a/target/arm/helper.c b/target/arm/helper.c
14
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/vfp-uncond.decode
19
--- a/target/arm/helper.c
16
+++ b/target/arm/vfp-uncond.decode
20
+++ b/target/arm/helper.c
17
@@ -XXX,XX +XXX,XX @@ VRINT 1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
21
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
18
vm=%vm_dp vd=%vd_dp dp=1
22
.access = PL1_R, .type = ARM_CP_NO_RAW, .resetvalue = cpu->midr,
19
23
.fieldoffset = offsetof(CPUARMState, cp15.c0_cpuid),
20
# VCVT float to int with specified rounding mode; Vd is always single-precision
24
.readfn = midr_read },
21
+VCVT 1111 1110 1.11 11 rm:2 .... 1001 op:1 1.0 .... \
25
- /* crn = 0 op1 = 0 crm = 0 op2 = 4,7 : AArch32 aliases of MIDR */
22
+ vm=%vm_sp vd=%vd_sp sz=1
26
- { .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST,
23
VCVT 1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
27
- .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4,
24
- vm=%vm_sp vd=%vd_sp dp=0
28
- .access = PL1_R, .resetvalue = cpu->midr },
25
+ vm=%vm_sp vd=%vd_sp sz=2
29
+ /* crn = 0 op1 = 0 crm = 0 op2 = 7 : AArch32 aliases of MIDR */
26
VCVT 1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
30
{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST,
27
- vm=%vm_dp vd=%vd_sp dp=1
31
.cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 7,
28
+ vm=%vm_dp vd=%vd_sp sz=3
32
.access = PL1_R, .resetvalue = cpu->midr },
29
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
33
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
30
index XXXXXXX..XXXXXXX 100644
34
.accessfn = access_aa64_tid1,
31
--- a/target/arm/translate-vfp.c.inc
35
.type = ARM_CP_CONST, .resetvalue = cpu->revidr },
32
+++ b/target/arm/translate-vfp.c.inc
36
};
33
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
37
+ ARMCPRegInfo id_v8_midr_alias_cp_reginfo = {
34
static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
38
+ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST,
35
{
39
+ .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4,
36
uint32_t rd, rm;
40
+ .access = PL1_R, .resetvalue = cpu->midr
37
- bool dp = a->dp;
41
+ };
38
+ int sz = a->sz;
42
ARMCPRegInfo id_cp_reginfo[] = {
39
TCGv_ptr fpst;
43
/* These are common to v8 and pre-v8 */
40
TCGv_i32 tcg_rmode, tcg_shift;
44
{ .name = "CTR",
41
int rounding = fp_decode_rm[a->rm];
45
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
42
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
46
}
43
return false;
47
if (arm_feature(env, ARM_FEATURE_V8)) {
44
}
48
define_arm_cp_regs(cpu, id_v8_midr_cp_reginfo);
45
49
+ if (!arm_feature(env, ARM_FEATURE_PMSA)) {
46
- if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
50
+ define_one_arm_cp_reg(cpu, &id_v8_midr_alias_cp_reginfo);
47
+ if (sz == 3 && !dc_isar_feature(aa32_fpdp_v2, s)) {
48
+ return false;
49
+ }
50
+
51
+ if (sz == 1 && !dc_isar_feature(aa32_fp16_arith, s)) {
52
return false;
53
}
54
55
/* UNDEF accesses to D16-D31 if they don't exist */
56
- if (dp && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
57
+ if (sz == 3 && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
58
return false;
59
}
60
61
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
62
return true;
63
}
64
65
- fpst = fpstatus_ptr(FPST_FPCR);
66
+ if (sz == 1) {
67
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
68
+ } else {
69
+ fpst = fpstatus_ptr(FPST_FPCR);
70
+ }
71
72
tcg_shift = tcg_const_i32(0);
73
74
tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
75
gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
76
77
- if (dp) {
78
+ if (sz == 3) {
79
TCGv_i64 tcg_double, tcg_res;
80
TCGv_i32 tcg_tmp;
81
tcg_double = tcg_temp_new_i64();
82
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
83
tcg_single = tcg_temp_new_i32();
84
tcg_res = tcg_temp_new_i32();
85
neon_load_reg32(tcg_single, rm);
86
- if (is_signed) {
87
- gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
88
+ if (sz == 1) {
89
+ if (is_signed) {
90
+ gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
91
+ } else {
92
+ gen_helper_vfp_toulh(tcg_res, tcg_single, tcg_shift, fpst);
93
+ }
51
+ }
94
} else {
52
} else {
95
- gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
53
define_arm_cp_regs(cpu, id_pre_v8_midr_cp_reginfo);
96
+ if (is_signed) {
97
+ gen_helper_vfp_tosls(tcg_res, tcg_single, tcg_shift, fpst);
98
+ } else {
99
+ gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
100
+ }
101
}
54
}
102
neon_store_reg32(tcg_res, rd);
103
tcg_temp_free_i32(tcg_res);
104
--
55
--
105
2.20.1
56
2.25.1
106
57
107
58
diff view generated by jsdifflib
1
Convert the Neon floating-point VMUL, VMLA and VMLS to use gvec,
1
From: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
2
and use this to implement fp16 support.
3
2
3
RVBAR shadows RVBAR_ELx where x is the highest exception
4
level if the highest EL is not EL3. This patch also allows
5
ARMv8 CPUs to change the reset address with
6
the rvbar property.
7
8
Signed-off-by: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
10
Message-id: 20221206102504.165775-3-tobias.roehmel@rwth-aachen.de
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-45-peter.maydell@linaro.org
7
---
12
---
8
target/arm/translate-neon.c.inc | 114 ++++++++++++++++----------------
13
target/arm/cpu.c | 6 +++++-
9
1 file changed, 57 insertions(+), 57 deletions(-)
14
target/arm/helper.c | 21 ++++++++++++++-------
15
2 files changed, 19 insertions(+), 8 deletions(-)
10
16
11
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
17
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
12
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate-neon.c.inc
19
--- a/target/arm/cpu.c
14
+++ b/target/arm/translate-neon.c.inc
20
+++ b/target/arm/cpu.c
15
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
21
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj)
16
return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
22
env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
17
}
23
CPACR, CP11, 3);
18
24
#endif
19
-/*
25
+ if (arm_feature(env, ARM_FEATURE_V8)) {
20
- * Rather than have a float-specific version of do_2scalar just for
26
+ env->cp15.rvbar = cpu->rvbar_prop;
21
- * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
27
+ env->regs[15] = cpu->rvbar_prop;
22
- * a NeonGenTwoOpFn.
28
+ }
23
- */
24
-#define WRAP_FP_FN(WRAPNAME, FUNC) \
25
- static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
26
- { \
27
- TCGv_ptr fpstatus = fpstatus_ptr(FPST_STD); \
28
- FUNC(rd, rn, rm, fpstatus); \
29
- tcg_temp_free_ptr(fpstatus); \
30
+static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
31
+ gen_helper_gvec_3_ptr *fn)
32
+{
33
+ /* Two registers and a scalar, using gvec */
34
+ int vec_size = a->q ? 16 : 8;
35
+ int rd_ofs = neon_reg_offset(a->vd, 0);
36
+ int rn_ofs = neon_reg_offset(a->vn, 0);
37
+ int rm_ofs;
38
+ int idx;
39
+ TCGv_ptr fpstatus;
40
+
41
+ if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
42
+ return false;
43
}
29
}
44
30
45
-WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
31
#if defined(CONFIG_USER_ONLY)
46
-WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
32
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
47
-WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
33
qdev_property_add_static(DEVICE(obj), &arm_cpu_reset_hivecs_property);
48
+ /* UNDEF accesses to D16-D31 if they don't exist. */
34
}
49
+ if (!dc_isar_feature(aa32_simd_r32, s) &&
35
50
+ ((a->vd | a->vn | a->vm) & 0x10)) {
36
- if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
51
+ return false;
37
+ if (arm_feature(&cpu->env, ARM_FEATURE_V8)) {
52
+ }
38
object_property_add_uint64_ptr(obj, "rvbar",
53
39
&cpu->rvbar_prop,
54
-static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
40
OBJ_PROP_FLAG_READWRITE);
55
-{
41
diff --git a/target/arm/helper.c b/target/arm/helper.c
56
- static NeonGenTwoOpFn * const opfn[] = {
42
index XXXXXXX..XXXXXXX 100644
57
- NULL,
43
--- a/target/arm/helper.c
58
- NULL, /* TODO: fp16 support */
44
+++ b/target/arm/helper.c
59
- gen_VMUL_F_mul,
45
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
60
- NULL,
46
if (!arm_feature(env, ARM_FEATURE_EL3) &&
61
- };
47
!arm_feature(env, ARM_FEATURE_EL2)) {
62
+ if (!fn) {
48
ARMCPRegInfo rvbar = {
63
+ /* Bad size (including size == 3, which is a different insn group) */
49
- .name = "RVBAR_EL1", .state = ARM_CP_STATE_AA64,
64
+ return false;
50
+ .name = "RVBAR_EL1", .state = ARM_CP_STATE_BOTH,
65
+ }
51
.opc0 = 3, .opc1 = 0, .crn = 12, .crm = 0, .opc2 = 1,
66
52
.access = PL1_R,
67
- return do_2scalar(s, a, opfn[a->size], NULL);
53
.fieldoffset = offsetof(CPUARMState, cp15.rvbar),
68
+ if (a->q && ((a->vd | a->vn) & 1)) {
54
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
69
+ return false;
55
}
70
+ }
56
/* RVBAR_EL2 is only implemented if EL2 is the highest EL */
71
+
57
if (!arm_feature(env, ARM_FEATURE_EL3)) {
72
+ if (!vfp_access_check(s)) {
58
- ARMCPRegInfo rvbar = {
73
+ return true;
59
- .name = "RVBAR_EL2", .state = ARM_CP_STATE_AA64,
74
+ }
60
- .opc0 = 3, .opc1 = 4, .crn = 12, .crm = 0, .opc2 = 1,
75
+
61
- .access = PL2_R,
76
+ /* a->vm is M:Vm, which encodes both register and index */
62
- .fieldoffset = offsetof(CPUARMState, cp15.rvbar),
77
+ idx = extract32(a->vm, a->size + 2, 2);
63
+ ARMCPRegInfo rvbar[] = {
78
+ a->vm = extract32(a->vm, 0, a->size + 2);
64
+ {
79
+ rm_ofs = neon_reg_offset(a->vm, 0);
65
+ .name = "RVBAR_EL2", .state = ARM_CP_STATE_AA64,
80
+
66
+ .opc0 = 3, .opc1 = 4, .crn = 12, .crm = 0, .opc2 = 1,
81
+ fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
67
+ .access = PL2_R,
82
+ tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
68
+ .fieldoffset = offsetof(CPUARMState, cp15.rvbar),
83
+ vec_size, vec_size, idx, fn);
69
+ },
84
+ tcg_temp_free_ptr(fpstatus);
70
+ { .name = "RVBAR", .type = ARM_CP_ALIAS,
85
+ return true;
71
+ .cp = 15, .opc1 = 0, .crn = 12, .crm = 0, .opc2 = 1,
86
}
72
+ .access = PL2_R,
87
73
+ .fieldoffset = offsetof(CPUARMState, cp15.rvbar),
88
-static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
74
+ },
89
-{
75
};
90
- static NeonGenTwoOpFn * const opfn[] = {
76
- define_one_arm_cp_reg(cpu, &rvbar);
91
- NULL,
77
+ define_arm_cp_regs(cpu, rvbar);
92
- NULL, /* TODO: fp16 support */
78
}
93
- gen_VMUL_F_mul,
79
}
94
- NULL,
80
95
- };
96
- static NeonGenTwoOpFn * const accfn[] = {
97
- NULL,
98
- NULL, /* TODO: fp16 support */
99
- gen_VMUL_F_add,
100
- NULL,
101
- };
102
+#define DO_VMUL_F_2sc(NAME, FUNC) \
103
+ static bool trans_##NAME##_F_2sc(DisasContext *s, arg_2scalar *a) \
104
+ { \
105
+ static gen_helper_gvec_3_ptr * const opfn[] = { \
106
+ NULL, \
107
+ gen_helper_##FUNC##_h, \
108
+ gen_helper_##FUNC##_s, \
109
+ NULL, \
110
+ }; \
111
+ if (a->size == MO_16 && !dc_isar_feature(aa32_fp16_arith, s)) { \
112
+ return false; \
113
+ } \
114
+ return do_2scalar_fp_vec(s, a, opfn[a->size]); \
115
+ }
116
117
- return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
118
-}
119
-
120
-static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
121
-{
122
- static NeonGenTwoOpFn * const opfn[] = {
123
- NULL,
124
- NULL, /* TODO: fp16 support */
125
- gen_VMUL_F_mul,
126
- NULL,
127
- };
128
- static NeonGenTwoOpFn * const accfn[] = {
129
- NULL,
130
- NULL, /* TODO: fp16 support */
131
- gen_VMUL_F_sub,
132
- NULL,
133
- };
134
-
135
- return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
136
-}
137
+DO_VMUL_F_2sc(VMUL, gvec_fmul_idx)
138
+DO_VMUL_F_2sc(VMLA, gvec_fmla_nf_idx)
139
+DO_VMUL_F_2sc(VMLS, gvec_fmls_nf_idx)
140
141
WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
142
WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
143
--
81
--
144
2.20.1
82
2.25.1
145
83
146
84
diff view generated by jsdifflib
1
Macroify the uses of do_vfp_2op_sp() and do_vfp_2op_dp(); this will
1
From: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
2
make it easier to add the halfprec support.
3
2
3
The v8R PMSAv8 has a two-stage MPU translation process, but, unlike
4
VMSAv8, the stage 2 attributes are in the same format as the stage 1
5
attributes (8-bit MAIR format). Rather than converting the MAIR
6
format to the format used for VMSA stage 2 (bits [5:2] of a VMSA
7
stage 2 descriptor) and then converting back to do the attribute
8
combination, allow combined_attrs_nofwb() to accept s2 attributes
9
that are already in the MAIR format.
10
11
We move the assert() to combined_attrs_fwb(), because that function
12
really does require a VMSA stage 2 attribute format. (We will never
13
get there for v8R, because PMSAv8 does not implement FEAT_S2FWB.)
14
15
Signed-off-by: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
16
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
17
Message-id: 20221206102504.165775-4-tobias.roehmel@rwth-aachen.de
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
18
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-8-peter.maydell@linaro.org
7
---
19
---
8
target/arm/translate-vfp.c.inc | 49 ++++++++++------------------------
20
target/arm/ptw.c | 10 ++++++++--
9
1 file changed, 14 insertions(+), 35 deletions(-)
21
1 file changed, 8 insertions(+), 2 deletions(-)
10
22
11
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
23
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
12
index XXXXXXX..XXXXXXX 100644
24
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate-vfp.c.inc
25
--- a/target/arm/ptw.c
14
+++ b/target/arm/translate-vfp.c.inc
26
+++ b/target/arm/ptw.c
15
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
27
@@ -XXX,XX +XXX,XX @@ static uint8_t combined_attrs_nofwb(uint64_t hcr,
16
return true;
28
{
17
}
29
uint8_t s1lo, s2lo, s1hi, s2hi, s2_mair_attrs, ret_attrs;
18
30
19
-static bool trans_VMOV_reg_sp(DisasContext *s, arg_VMOV_reg_sp *a)
31
- s2_mair_attrs = convert_stage2_attrs(hcr, s2.attrs);
20
-{
32
+ if (s2.is_s2_format) {
21
- return do_vfp_2op_sp(s, tcg_gen_mov_i32, a->vd, a->vm);
33
+ s2_mair_attrs = convert_stage2_attrs(hcr, s2.attrs);
22
-}
34
+ } else {
23
+#define DO_VFP_2OP(INSN, PREC, FN) \
35
+ s2_mair_attrs = s2.attrs;
24
+ static bool trans_##INSN##_##PREC(DisasContext *s, \
25
+ arg_##INSN##_##PREC *a) \
26
+ { \
27
+ return do_vfp_2op_##PREC(s, FN, a->vd, a->vm); \
28
+ }
36
+ }
29
37
30
-static bool trans_VMOV_reg_dp(DisasContext *s, arg_VMOV_reg_dp *a)
38
s1lo = extract32(s1.attrs, 0, 4);
31
-{
39
s2lo = extract32(s2_mair_attrs, 0, 4);
32
- return do_vfp_2op_dp(s, tcg_gen_mov_i64, a->vd, a->vm);
40
@@ -XXX,XX +XXX,XX @@ static uint8_t force_cacheattr_nibble_wb(uint8_t attr)
33
-}
41
*/
34
+DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
42
static uint8_t combined_attrs_fwb(ARMCacheAttrs s1, ARMCacheAttrs s2)
35
+DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
36
37
-static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
38
-{
39
- return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
40
-}
41
+DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
42
+DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
43
44
-static bool trans_VABS_dp(DisasContext *s, arg_VABS_dp *a)
45
-{
46
- return do_vfp_2op_dp(s, gen_helper_vfp_absd, a->vd, a->vm);
47
-}
48
-
49
-static bool trans_VNEG_sp(DisasContext *s, arg_VNEG_sp *a)
50
-{
51
- return do_vfp_2op_sp(s, gen_helper_vfp_negs, a->vd, a->vm);
52
-}
53
-
54
-static bool trans_VNEG_dp(DisasContext *s, arg_VNEG_dp *a)
55
-{
56
- return do_vfp_2op_dp(s, gen_helper_vfp_negd, a->vd, a->vm);
57
-}
58
+DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
59
+DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
60
61
static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
62
{
43
{
63
gen_helper_vfp_sqrts(vd, vm, cpu_env);
44
+ assert(s2.is_s2_format && !s1.is_s2_format);
64
}
45
+
65
46
switch (s2.attrs) {
66
-static bool trans_VSQRT_sp(DisasContext *s, arg_VSQRT_sp *a)
47
case 7:
67
-{
48
/* Use stage 1 attributes */
68
- return do_vfp_2op_sp(s, gen_VSQRT_sp, a->vd, a->vm);
49
@@ -XXX,XX +XXX,XX @@ static ARMCacheAttrs combine_cacheattrs(uint64_t hcr,
69
-}
50
ARMCacheAttrs ret;
70
-
51
bool tagged = false;
71
static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
52
72
{
53
- assert(s2.is_s2_format && !s1.is_s2_format);
73
gen_helper_vfp_sqrtd(vd, vm, cpu_env);
54
+ assert(!s1.is_s2_format);
74
}
55
ret.is_s2_format = false;
75
56
76
-static bool trans_VSQRT_dp(DisasContext *s, arg_VSQRT_dp *a)
57
if (s1.attrs == 0xf0) {
77
-{
78
- return do_vfp_2op_dp(s, gen_VSQRT_dp, a->vd, a->vm);
79
-}
80
+DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
81
+DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
82
83
static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
84
{
85
--
58
--
86
2.20.1
59
2.25.1
87
60
88
61
diff view generated by jsdifflib
1
Implement the fp16 version of the VFP VRINT* insns.
1
From: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
2
2
3
ARMv8-R AArch32 CPUs behave as if TTBCR.EAE is always 1 even
4
tough they don't have the TTBCR register.
5
See ARM Architecture Reference Manual Supplement - ARMv8, for the ARMv8-R
6
AArch32 architecture profile Version:A.c section C1.2.
7
8
Signed-off-by: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
9
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
10
Message-id: 20221206102504.165775-5-tobias.roehmel@rwth-aachen.de
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20200828183354.27913-19-peter.maydell@linaro.org
6
---
12
---
7
target/arm/helper.h | 2 +
13
target/arm/internals.h | 4 ++++
8
target/arm/vfp-uncond.decode | 6 ++-
14
target/arm/debug_helper.c | 3 +++
9
target/arm/vfp.decode | 3 ++
15
target/arm/tlb_helper.c | 4 ++++
10
target/arm/vfp_helper.c | 21 ++++++++
16
3 files changed, 11 insertions(+)
11
target/arm/translate-vfp.c.inc | 98 +++++++++++++++++++++++++++++++---
12
5 files changed, 122 insertions(+), 8 deletions(-)
13
17
14
diff --git a/target/arm/helper.h b/target/arm/helper.h
18
diff --git a/target/arm/internals.h b/target/arm/internals.h
15
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper.h
20
--- a/target/arm/internals.h
17
+++ b/target/arm/helper.h
21
+++ b/target/arm/internals.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(shr_cc, i32, env, i32, i32)
22
@@ -XXX,XX +XXX,XX @@ unsigned int arm_pamax(ARMCPU *cpu);
19
DEF_HELPER_3(sar_cc, i32, env, i32, i32)
23
static inline bool extended_addresses_enabled(CPUARMState *env)
20
DEF_HELPER_3(ror_cc, i32, env, i32, i32)
24
{
21
25
uint64_t tcr = env->cp15.tcr_el[arm_is_secure(env) ? 3 : 1];
22
+DEF_HELPER_FLAGS_2(rinth_exact, TCG_CALL_NO_RWG, f16, f16, ptr)
26
+ if (arm_feature(env, ARM_FEATURE_PMSA) &&
23
DEF_HELPER_FLAGS_2(rints_exact, TCG_CALL_NO_RWG, f32, f32, ptr)
27
+ arm_feature(env, ARM_FEATURE_V8)) {
24
DEF_HELPER_FLAGS_2(rintd_exact, TCG_CALL_NO_RWG, f64, f64, ptr)
28
+ return true;
25
+DEF_HELPER_FLAGS_2(rinth, TCG_CALL_NO_RWG, f16, f16, ptr)
29
+ }
26
DEF_HELPER_FLAGS_2(rints, TCG_CALL_NO_RWG, f32, f32, ptr)
30
return arm_el_is_aa64(env, 1) ||
27
DEF_HELPER_FLAGS_2(rintd, TCG_CALL_NO_RWG, f64, f64, ptr)
31
(arm_feature(env, ARM_FEATURE_LPAE) && (tcr & TTBCR_EAE));
28
32
}
29
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
33
diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
30
index XXXXXXX..XXXXXXX 100644
34
index XXXXXXX..XXXXXXX 100644
31
--- a/target/arm/vfp-uncond.decode
35
--- a/target/arm/debug_helper.c
32
+++ b/target/arm/vfp-uncond.decode
36
+++ b/target/arm/debug_helper.c
33
@@ -XXX,XX +XXX,XX @@ VMINNM_sp 1111 1110 1.00 .... .... 1010 .1.0 .... @vfp_dnm_s
37
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_debug_exception_fsr(CPUARMState *env)
34
VMAXNM_dp 1111 1110 1.00 .... .... 1011 .0.0 .... @vfp_dnm_d
38
35
VMINNM_dp 1111 1110 1.00 .... .... 1011 .1.0 .... @vfp_dnm_d
39
if (target_el == 2 || arm_el_is_aa64(env, target_el)) {
36
40
using_lpae = true;
37
+VRINT 1111 1110 1.11 10 rm:2 .... 1001 01.0 .... \
41
+ } else if (arm_feature(env, ARM_FEATURE_PMSA) &&
38
+ vm=%vm_sp vd=%vd_sp sz=1
42
+ arm_feature(env, ARM_FEATURE_V8)) {
39
VRINT 1111 1110 1.11 10 rm:2 .... 1010 01.0 .... \
43
+ using_lpae = true;
40
- vm=%vm_sp vd=%vd_sp dp=0
44
} else {
41
+ vm=%vm_sp vd=%vd_sp sz=2
45
if (arm_feature(env, ARM_FEATURE_LPAE) &&
42
VRINT 1111 1110 1.11 10 rm:2 .... 1011 01.0 .... \
46
(env->cp15.tcr_el[target_el] & TTBCR_EAE)) {
43
- vm=%vm_dp vd=%vd_dp dp=1
47
diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
44
+ vm=%vm_dp vd=%vd_dp sz=3
45
46
# VCVT float to int with specified rounding mode; Vd is always single-precision
47
VCVT 1111 1110 1.11 11 rm:2 .... 1001 op:1 1.0 .... \
48
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
49
index XXXXXXX..XXXXXXX 100644
48
index XXXXXXX..XXXXXXX 100644
50
--- a/target/arm/vfp.decode
49
--- a/target/arm/tlb_helper.c
51
+++ b/target/arm/vfp.decode
50
+++ b/target/arm/tlb_helper.c
52
@@ -XXX,XX +XXX,XX @@ VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
51
@@ -XXX,XX +XXX,XX @@ bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx)
53
VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
52
if (el == 2 || arm_el_is_aa64(env, el)) {
54
vd=%vd_sp vm=%vm_dp
55
56
+VRINTR_hp ---- 1110 1.11 0110 .... 1001 01.0 .... @vfp_dm_ss
57
VRINTR_sp ---- 1110 1.11 0110 .... 1010 01.0 .... @vfp_dm_ss
58
VRINTR_dp ---- 1110 1.11 0110 .... 1011 01.0 .... @vfp_dm_dd
59
60
+VRINTZ_hp ---- 1110 1.11 0110 .... 1001 11.0 .... @vfp_dm_ss
61
VRINTZ_sp ---- 1110 1.11 0110 .... 1010 11.0 .... @vfp_dm_ss
62
VRINTZ_dp ---- 1110 1.11 0110 .... 1011 11.0 .... @vfp_dm_dd
63
64
+VRINTX_hp ---- 1110 1.11 0111 .... 1001 01.0 .... @vfp_dm_ss
65
VRINTX_sp ---- 1110 1.11 0111 .... 1010 01.0 .... @vfp_dm_ss
66
VRINTX_dp ---- 1110 1.11 0111 .... 1011 01.0 .... @vfp_dm_dd
67
68
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
69
index XXXXXXX..XXXXXXX 100644
70
--- a/target/arm/vfp_helper.c
71
+++ b/target/arm/vfp_helper.c
72
@@ -XXX,XX +XXX,XX @@ float64 VFP_HELPER(muladd, d)(float64 a, float64 b, float64 c, void *fpstp)
73
}
74
75
/* ARMv8 round to integral */
76
+dh_ctype_f16 HELPER(rinth_exact)(dh_ctype_f16 x, void *fp_status)
77
+{
78
+ return float16_round_to_int(x, fp_status);
79
+}
80
+
81
float32 HELPER(rints_exact)(float32 x, void *fp_status)
82
{
83
return float32_round_to_int(x, fp_status);
84
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rintd_exact)(float64 x, void *fp_status)
85
return float64_round_to_int(x, fp_status);
86
}
87
88
+dh_ctype_f16 HELPER(rinth)(dh_ctype_f16 x, void *fp_status)
89
+{
90
+ int old_flags = get_float_exception_flags(fp_status), new_flags;
91
+ float16 ret;
92
+
93
+ ret = float16_round_to_int(x, fp_status);
94
+
95
+ /* Suppress any inexact exceptions the conversion produced */
96
+ if (!(old_flags & float_flag_inexact)) {
97
+ new_flags = get_float_exception_flags(fp_status);
98
+ set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
99
+ }
100
+
101
+ return ret;
102
+}
103
+
104
float32 HELPER(rints)(float32 x, void *fp_status)
105
{
106
int old_flags = get_float_exception_flags(fp_status), new_flags;
107
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
108
index XXXXXXX..XXXXXXX 100644
109
--- a/target/arm/translate-vfp.c.inc
110
+++ b/target/arm/translate-vfp.c.inc
111
@@ -XXX,XX +XXX,XX @@ static const uint8_t fp_decode_rm[] = {
112
static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
113
{
114
uint32_t rd, rm;
115
- bool dp = a->dp;
116
+ int sz = a->sz;
117
TCGv_ptr fpst;
118
TCGv_i32 tcg_rmode;
119
int rounding = fp_decode_rm[a->rm];
120
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
121
return false;
122
}
123
124
- if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
125
+ if (sz == 3 && !dc_isar_feature(aa32_fpdp_v2, s)) {
126
+ return false;
127
+ }
128
+
129
+ if (sz == 1 && !dc_isar_feature(aa32_fp16_arith, s)) {
130
return false;
131
}
132
133
/* UNDEF accesses to D16-D31 if they don't exist */
134
- if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
135
+ if (sz == 3 && !dc_isar_feature(aa32_simd_r32, s) &&
136
((a->vm | a->vd) & 0x10)) {
137
return false;
138
}
139
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
140
return true;
53
return true;
141
}
54
}
142
55
+ if (arm_feature(env, ARM_FEATURE_PMSA) &&
143
- fpst = fpstatus_ptr(FPST_FPCR);
56
+ arm_feature(env, ARM_FEATURE_V8)) {
144
+ if (sz == 1) {
145
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
146
+ } else {
147
+ fpst = fpstatus_ptr(FPST_FPCR);
148
+ }
149
150
tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rounding));
151
gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
152
153
- if (dp) {
154
+ if (sz == 3) {
155
TCGv_i64 tcg_op;
156
TCGv_i64 tcg_res;
157
tcg_op = tcg_temp_new_i64();
158
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
159
tcg_op = tcg_temp_new_i32();
160
tcg_res = tcg_temp_new_i32();
161
neon_load_reg32(tcg_op, rm);
162
- gen_helper_rints(tcg_res, tcg_op, fpst);
163
+ if (sz == 1) {
164
+ gen_helper_rinth(tcg_res, tcg_op, fpst);
165
+ } else {
166
+ gen_helper_rints(tcg_res, tcg_op, fpst);
167
+ }
168
neon_store_reg32(tcg_res, rd);
169
tcg_temp_free_i32(tcg_op);
170
tcg_temp_free_i32(tcg_res);
171
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
172
return true;
173
}
174
175
+static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
176
+{
177
+ TCGv_ptr fpst;
178
+ TCGv_i32 tmp;
179
+
180
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
181
+ return false;
182
+ }
183
+
184
+ if (!vfp_access_check(s)) {
185
+ return true;
57
+ return true;
186
+ }
58
+ }
187
+
59
if (arm_feature(env, ARM_FEATURE_LPAE)
188
+ tmp = tcg_temp_new_i32();
60
&& (regime_tcr(env, mmu_idx) & TTBCR_EAE)) {
189
+ neon_load_reg32(tmp, a->vm);
61
return true;
190
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
191
+ gen_helper_rinth(tmp, tmp, fpst);
192
+ neon_store_reg32(tmp, a->vd);
193
+ tcg_temp_free_ptr(fpst);
194
+ tcg_temp_free_i32(tmp);
195
+ return true;
196
+}
197
+
198
static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
199
{
200
TCGv_ptr fpst;
201
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
202
return true;
203
}
204
205
+static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
206
+{
207
+ TCGv_ptr fpst;
208
+ TCGv_i32 tmp;
209
+ TCGv_i32 tcg_rmode;
210
+
211
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
212
+ return false;
213
+ }
214
+
215
+ if (!vfp_access_check(s)) {
216
+ return true;
217
+ }
218
+
219
+ tmp = tcg_temp_new_i32();
220
+ neon_load_reg32(tmp, a->vm);
221
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
222
+ tcg_rmode = tcg_const_i32(float_round_to_zero);
223
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
224
+ gen_helper_rinth(tmp, tmp, fpst);
225
+ gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
226
+ neon_store_reg32(tmp, a->vd);
227
+ tcg_temp_free_ptr(fpst);
228
+ tcg_temp_free_i32(tcg_rmode);
229
+ tcg_temp_free_i32(tmp);
230
+ return true;
231
+}
232
+
233
static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
234
{
235
TCGv_ptr fpst;
236
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
237
return true;
238
}
239
240
+static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
241
+{
242
+ TCGv_ptr fpst;
243
+ TCGv_i32 tmp;
244
+
245
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
246
+ return false;
247
+ }
248
+
249
+ if (!vfp_access_check(s)) {
250
+ return true;
251
+ }
252
+
253
+ tmp = tcg_temp_new_i32();
254
+ neon_load_reg32(tmp, a->vm);
255
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
256
+ gen_helper_rinth_exact(tmp, tmp, fpst);
257
+ neon_store_reg32(tmp, a->vd);
258
+ tcg_temp_free_ptr(fpst);
259
+ tcg_temp_free_i32(tmp);
260
+ return true;
261
+}
262
+
263
static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
264
{
265
TCGv_ptr fpst;
266
--
62
--
267
2.20.1
63
2.25.1
268
64
269
65
diff view generated by jsdifflib
1
The aa32_fp16_arith feature check function currently looks at the
1
From: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
2
AArch64 ID_AA64PFR0 register. This is (as the comment notes) not
3
correct. The bogus check was put in mostly to allow testing of the
4
fp16 variants of the VCMLA instructions and it was something of
5
a mistake that we allowed them to exist in master.
6
2
7
Switch the feature check function to testing VMFR1.FPHP, which is
3
Signed-off-by: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
8
what it ought to be.
4
Message-id: 20221206102504.165775-6-tobias.roehmel@rwth-aachen.de
9
10
This will remove emulation of the VCMLA and VCADD insns from
11
AArch32 code running on an AArch64 '-cpu max' using system emulation.
12
(They were never enabled for aarch32 linux-user and system-emulation.)
13
Since we weren't advertising their existence via the AArch32 ID
14
register, well-behaved guests wouldn't have been using them anyway.
15
16
Once we have implemented all the AArch32 support for the FP16 extension
17
we will advertise it in the MVFR1 ID register field, which will reenable
18
these insns along with all the others.
19
20
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
21
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
22
Message-id: 20200828183354.27913-3-peter.maydell@linaro.org
23
---
6
---
24
target/arm/cpu.h | 7 +------
7
target/arm/cpu.h | 6 +
25
1 file changed, 1 insertion(+), 6 deletions(-)
8
target/arm/cpu.c | 28 +++-
9
target/arm/helper.c | 302 +++++++++++++++++++++++++++++++++++++++++++
10
target/arm/machine.c | 28 ++++
11
4 files changed, 360 insertions(+), 4 deletions(-)
26
12
27
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
13
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
28
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/cpu.h
15
--- a/target/arm/cpu.h
30
+++ b/target/arm/cpu.h
16
+++ b/target/arm/cpu.h
31
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id)
17
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
32
18
};
33
static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
19
uint64_t sctlr_el[4];
34
{
20
};
35
- /*
21
+ uint64_t vsctlr; /* Virtualization System control register. */
36
- * This is a placeholder for use by VCMA until the rest of
22
uint64_t cpacr_el1; /* Architectural feature access control register */
37
- * the ARMv8.2-FP16 extension is implemented for aa32 mode.
23
uint64_t cptr_el[4]; /* ARMv8 feature trap registers */
38
- * At which point we can properly set and check MVFR1.FPHP.
24
uint32_t c1_xscaleauxcr; /* XScale auxiliary control register. */
39
- */
25
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
40
- return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
26
*/
41
+ return FIELD_EX32(id->mvfr1, MVFR1, FPHP) >= 3;
27
uint32_t *rbar[M_REG_NUM_BANKS];
28
uint32_t *rlar[M_REG_NUM_BANKS];
29
+ uint32_t *hprbar;
30
+ uint32_t *hprlar;
31
uint32_t mair0[M_REG_NUM_BANKS];
32
uint32_t mair1[M_REG_NUM_BANKS];
33
+ uint32_t hprselr;
34
} pmsav8;
35
36
/* v8M SAU */
37
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
38
bool has_mpu;
39
/* PMSAv7 MPU number of supported regions */
40
uint32_t pmsav7_dregion;
41
+ /* PMSAv8 MPU number of supported hyp regions */
42
+ uint32_t pmsav8r_hdregion;
43
/* v8M SAU number of supported regions */
44
uint32_t sau_sregion;
45
46
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
47
index XXXXXXX..XXXXXXX 100644
48
--- a/target/arm/cpu.c
49
+++ b/target/arm/cpu.c
50
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj)
51
sizeof(*env->pmsav7.dracr) * cpu->pmsav7_dregion);
52
}
53
}
54
+
55
+ if (cpu->pmsav8r_hdregion > 0) {
56
+ memset(env->pmsav8.hprbar, 0,
57
+ sizeof(*env->pmsav8.hprbar) * cpu->pmsav8r_hdregion);
58
+ memset(env->pmsav8.hprlar, 0,
59
+ sizeof(*env->pmsav8.hprlar) * cpu->pmsav8r_hdregion);
60
+ }
61
+
62
env->pmsav7.rnr[M_REG_NS] = 0;
63
env->pmsav7.rnr[M_REG_S] = 0;
64
env->pmsav8.mair0[M_REG_NS] = 0;
65
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
66
/* MPU can be configured out of a PMSA CPU either by setting has-mpu
67
* to false or by setting pmsav7-dregion to 0.
68
*/
69
- if (!cpu->has_mpu) {
70
- cpu->pmsav7_dregion = 0;
71
- }
72
- if (cpu->pmsav7_dregion == 0) {
73
+ if (!cpu->has_mpu || cpu->pmsav7_dregion == 0) {
74
cpu->has_mpu = false;
75
+ cpu->pmsav7_dregion = 0;
76
+ cpu->pmsav8r_hdregion = 0;
77
}
78
79
if (arm_feature(env, ARM_FEATURE_PMSA) &&
80
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
81
env->pmsav7.dracr = g_new0(uint32_t, nr);
82
}
83
}
84
+
85
+ if (cpu->pmsav8r_hdregion > 0xff) {
86
+ error_setg(errp, "PMSAv8 MPU EL2 #regions invalid %" PRIu32,
87
+ cpu->pmsav8r_hdregion);
88
+ return;
89
+ }
90
+
91
+ if (cpu->pmsav8r_hdregion) {
92
+ env->pmsav8.hprbar = g_new0(uint32_t,
93
+ cpu->pmsav8r_hdregion);
94
+ env->pmsav8.hprlar = g_new0(uint32_t,
95
+ cpu->pmsav8r_hdregion);
96
+ }
97
}
98
99
if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
100
diff --git a/target/arm/helper.c b/target/arm/helper.c
101
index XXXXXXX..XXXXXXX 100644
102
--- a/target/arm/helper.c
103
+++ b/target/arm/helper.c
104
@@ -XXX,XX +XXX,XX @@ static void pmsav7_rgnr_write(CPUARMState *env, const ARMCPRegInfo *ri,
105
raw_write(env, ri, value);
42
}
106
}
43
107
44
static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id)
108
+static void prbar_write(CPUARMState *env, const ARMCPRegInfo *ri,
109
+ uint64_t value)
110
+{
111
+ ARMCPU *cpu = env_archcpu(env);
112
+
113
+ tlb_flush(CPU(cpu)); /* Mappings may have changed - purge! */
114
+ env->pmsav8.rbar[M_REG_NS][env->pmsav7.rnr[M_REG_NS]] = value;
115
+}
116
+
117
+static uint64_t prbar_read(CPUARMState *env, const ARMCPRegInfo *ri)
118
+{
119
+ return env->pmsav8.rbar[M_REG_NS][env->pmsav7.rnr[M_REG_NS]];
120
+}
121
+
122
+static void prlar_write(CPUARMState *env, const ARMCPRegInfo *ri,
123
+ uint64_t value)
124
+{
125
+ ARMCPU *cpu = env_archcpu(env);
126
+
127
+ tlb_flush(CPU(cpu)); /* Mappings may have changed - purge! */
128
+ env->pmsav8.rlar[M_REG_NS][env->pmsav7.rnr[M_REG_NS]] = value;
129
+}
130
+
131
+static uint64_t prlar_read(CPUARMState *env, const ARMCPRegInfo *ri)
132
+{
133
+ return env->pmsav8.rlar[M_REG_NS][env->pmsav7.rnr[M_REG_NS]];
134
+}
135
+
136
+static void prselr_write(CPUARMState *env, const ARMCPRegInfo *ri,
137
+ uint64_t value)
138
+{
139
+ ARMCPU *cpu = env_archcpu(env);
140
+
141
+ /*
142
+ * Ignore writes that would select not implemented region.
143
+ * This is architecturally UNPREDICTABLE.
144
+ */
145
+ if (value >= cpu->pmsav7_dregion) {
146
+ return;
147
+ }
148
+
149
+ env->pmsav7.rnr[M_REG_NS] = value;
150
+}
151
+
152
+static void hprbar_write(CPUARMState *env, const ARMCPRegInfo *ri,
153
+ uint64_t value)
154
+{
155
+ ARMCPU *cpu = env_archcpu(env);
156
+
157
+ tlb_flush(CPU(cpu)); /* Mappings may have changed - purge! */
158
+ env->pmsav8.hprbar[env->pmsav8.hprselr] = value;
159
+}
160
+
161
+static uint64_t hprbar_read(CPUARMState *env, const ARMCPRegInfo *ri)
162
+{
163
+ return env->pmsav8.hprbar[env->pmsav8.hprselr];
164
+}
165
+
166
+static void hprlar_write(CPUARMState *env, const ARMCPRegInfo *ri,
167
+ uint64_t value)
168
+{
169
+ ARMCPU *cpu = env_archcpu(env);
170
+
171
+ tlb_flush(CPU(cpu)); /* Mappings may have changed - purge! */
172
+ env->pmsav8.hprlar[env->pmsav8.hprselr] = value;
173
+}
174
+
175
+static uint64_t hprlar_read(CPUARMState *env, const ARMCPRegInfo *ri)
176
+{
177
+ return env->pmsav8.hprlar[env->pmsav8.hprselr];
178
+}
179
+
180
+static void hprenr_write(CPUARMState *env, const ARMCPRegInfo *ri,
181
+ uint64_t value)
182
+{
183
+ uint32_t n;
184
+ uint32_t bit;
185
+ ARMCPU *cpu = env_archcpu(env);
186
+
187
+ /* Ignore writes to unimplemented regions */
188
+ int rmax = MIN(cpu->pmsav8r_hdregion, 32);
189
+ value &= MAKE_64BIT_MASK(0, rmax);
190
+
191
+ tlb_flush(CPU(cpu)); /* Mappings may have changed - purge! */
192
+
193
+ /* Register alias is only valid for first 32 indexes */
194
+ for (n = 0; n < rmax; ++n) {
195
+ bit = extract32(value, n, 1);
196
+ env->pmsav8.hprlar[n] = deposit32(
197
+ env->pmsav8.hprlar[n], 0, 1, bit);
198
+ }
199
+}
200
+
201
+static uint64_t hprenr_read(CPUARMState *env, const ARMCPRegInfo *ri)
202
+{
203
+ uint32_t n;
204
+ uint32_t result = 0x0;
205
+ ARMCPU *cpu = env_archcpu(env);
206
+
207
+ /* Register alias is only valid for first 32 indexes */
208
+ for (n = 0; n < MIN(cpu->pmsav8r_hdregion, 32); ++n) {
209
+ if (env->pmsav8.hprlar[n] & 0x1) {
210
+ result |= (0x1 << n);
211
+ }
212
+ }
213
+ return result;
214
+}
215
+
216
+static void hprselr_write(CPUARMState *env, const ARMCPRegInfo *ri,
217
+ uint64_t value)
218
+{
219
+ ARMCPU *cpu = env_archcpu(env);
220
+
221
+ /*
222
+ * Ignore writes that would select not implemented region.
223
+ * This is architecturally UNPREDICTABLE.
224
+ */
225
+ if (value >= cpu->pmsav8r_hdregion) {
226
+ return;
227
+ }
228
+
229
+ env->pmsav8.hprselr = value;
230
+}
231
+
232
+static void pmsav8r_regn_write(CPUARMState *env, const ARMCPRegInfo *ri,
233
+ uint64_t value)
234
+{
235
+ ARMCPU *cpu = env_archcpu(env);
236
+ uint8_t index = (extract32(ri->opc0, 0, 1) << 4) |
237
+ (extract32(ri->crm, 0, 3) << 1) | extract32(ri->opc2, 2, 1);
238
+
239
+ tlb_flush(CPU(cpu)); /* Mappings may have changed - purge! */
240
+
241
+ if (ri->opc1 & 4) {
242
+ if (index >= cpu->pmsav8r_hdregion) {
243
+ return;
244
+ }
245
+ if (ri->opc2 & 0x1) {
246
+ env->pmsav8.hprlar[index] = value;
247
+ } else {
248
+ env->pmsav8.hprbar[index] = value;
249
+ }
250
+ } else {
251
+ if (index >= cpu->pmsav7_dregion) {
252
+ return;
253
+ }
254
+ if (ri->opc2 & 0x1) {
255
+ env->pmsav8.rlar[M_REG_NS][index] = value;
256
+ } else {
257
+ env->pmsav8.rbar[M_REG_NS][index] = value;
258
+ }
259
+ }
260
+}
261
+
262
+static uint64_t pmsav8r_regn_read(CPUARMState *env, const ARMCPRegInfo *ri)
263
+{
264
+ ARMCPU *cpu = env_archcpu(env);
265
+ uint8_t index = (extract32(ri->opc0, 0, 1) << 4) |
266
+ (extract32(ri->crm, 0, 3) << 1) | extract32(ri->opc2, 2, 1);
267
+
268
+ if (ri->opc1 & 4) {
269
+ if (index >= cpu->pmsav8r_hdregion) {
270
+ return 0x0;
271
+ }
272
+ if (ri->opc2 & 0x1) {
273
+ return env->pmsav8.hprlar[index];
274
+ } else {
275
+ return env->pmsav8.hprbar[index];
276
+ }
277
+ } else {
278
+ if (index >= cpu->pmsav7_dregion) {
279
+ return 0x0;
280
+ }
281
+ if (ri->opc2 & 0x1) {
282
+ return env->pmsav8.rlar[M_REG_NS][index];
283
+ } else {
284
+ return env->pmsav8.rbar[M_REG_NS][index];
285
+ }
286
+ }
287
+}
288
+
289
+static const ARMCPRegInfo pmsav8r_cp_reginfo[] = {
290
+ { .name = "PRBAR",
291
+ .cp = 15, .opc1 = 0, .crn = 6, .crm = 3, .opc2 = 0,
292
+ .access = PL1_RW, .type = ARM_CP_NO_RAW,
293
+ .accessfn = access_tvm_trvm,
294
+ .readfn = prbar_read, .writefn = prbar_write },
295
+ { .name = "PRLAR",
296
+ .cp = 15, .opc1 = 0, .crn = 6, .crm = 3, .opc2 = 1,
297
+ .access = PL1_RW, .type = ARM_CP_NO_RAW,
298
+ .accessfn = access_tvm_trvm,
299
+ .readfn = prlar_read, .writefn = prlar_write },
300
+ { .name = "PRSELR", .resetvalue = 0,
301
+ .cp = 15, .opc1 = 0, .crn = 6, .crm = 2, .opc2 = 1,
302
+ .access = PL1_RW, .accessfn = access_tvm_trvm,
303
+ .writefn = prselr_write,
304
+ .fieldoffset = offsetof(CPUARMState, pmsav7.rnr[M_REG_NS]) },
305
+ { .name = "HPRBAR", .resetvalue = 0,
306
+ .cp = 15, .opc1 = 4, .crn = 6, .crm = 3, .opc2 = 0,
307
+ .access = PL2_RW, .type = ARM_CP_NO_RAW,
308
+ .readfn = hprbar_read, .writefn = hprbar_write },
309
+ { .name = "HPRLAR",
310
+ .cp = 15, .opc1 = 4, .crn = 6, .crm = 3, .opc2 = 1,
311
+ .access = PL2_RW, .type = ARM_CP_NO_RAW,
312
+ .readfn = hprlar_read, .writefn = hprlar_write },
313
+ { .name = "HPRSELR", .resetvalue = 0,
314
+ .cp = 15, .opc1 = 4, .crn = 6, .crm = 2, .opc2 = 1,
315
+ .access = PL2_RW,
316
+ .writefn = hprselr_write,
317
+ .fieldoffset = offsetof(CPUARMState, pmsav8.hprselr) },
318
+ { .name = "HPRENR",
319
+ .cp = 15, .opc1 = 4, .crn = 6, .crm = 1, .opc2 = 1,
320
+ .access = PL2_RW, .type = ARM_CP_NO_RAW,
321
+ .readfn = hprenr_read, .writefn = hprenr_write },
322
+};
323
+
324
static const ARMCPRegInfo pmsav7_cp_reginfo[] = {
325
/* Reset for all these registers is handled in arm_cpu_reset(),
326
* because the PMSAv7 is also used by M-profile CPUs, which do
327
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
328
.access = PL1_R, .type = ARM_CP_CONST,
329
.resetvalue = cpu->pmsav7_dregion << 8
330
};
331
+ /* HMPUIR is specific to PMSA V8 */
332
+ ARMCPRegInfo id_hmpuir_reginfo = {
333
+ .name = "HMPUIR",
334
+ .cp = 15, .opc1 = 4, .crn = 0, .crm = 0, .opc2 = 4,
335
+ .access = PL2_R, .type = ARM_CP_CONST,
336
+ .resetvalue = cpu->pmsav8r_hdregion
337
+ };
338
static const ARMCPRegInfo crn0_wi_reginfo = {
339
.name = "CRN0_WI", .cp = 15, .crn = 0, .crm = CP_ANY,
340
.opc1 = CP_ANY, .opc2 = CP_ANY, .access = PL1_W,
341
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
342
define_arm_cp_regs(cpu, id_cp_reginfo);
343
if (!arm_feature(env, ARM_FEATURE_PMSA)) {
344
define_one_arm_cp_reg(cpu, &id_tlbtr_reginfo);
345
+ } else if (arm_feature(env, ARM_FEATURE_PMSA) &&
346
+ arm_feature(env, ARM_FEATURE_V8)) {
347
+ uint32_t i = 0;
348
+ char *tmp_string;
349
+
350
+ define_one_arm_cp_reg(cpu, &id_mpuir_reginfo);
351
+ define_one_arm_cp_reg(cpu, &id_hmpuir_reginfo);
352
+ define_arm_cp_regs(cpu, pmsav8r_cp_reginfo);
353
+
354
+ /* Register alias is only valid for first 32 indexes */
355
+ for (i = 0; i < MIN(cpu->pmsav7_dregion, 32); ++i) {
356
+ uint8_t crm = 0b1000 | extract32(i, 1, 3);
357
+ uint8_t opc1 = extract32(i, 4, 1);
358
+ uint8_t opc2 = extract32(i, 0, 1) << 2;
359
+
360
+ tmp_string = g_strdup_printf("PRBAR%u", i);
361
+ ARMCPRegInfo tmp_prbarn_reginfo = {
362
+ .name = tmp_string, .type = ARM_CP_ALIAS | ARM_CP_NO_RAW,
363
+ .cp = 15, .opc1 = opc1, .crn = 6, .crm = crm, .opc2 = opc2,
364
+ .access = PL1_RW, .resetvalue = 0,
365
+ .accessfn = access_tvm_trvm,
366
+ .writefn = pmsav8r_regn_write, .readfn = pmsav8r_regn_read
367
+ };
368
+ define_one_arm_cp_reg(cpu, &tmp_prbarn_reginfo);
369
+ g_free(tmp_string);
370
+
371
+ opc2 = extract32(i, 0, 1) << 2 | 0x1;
372
+ tmp_string = g_strdup_printf("PRLAR%u", i);
373
+ ARMCPRegInfo tmp_prlarn_reginfo = {
374
+ .name = tmp_string, .type = ARM_CP_ALIAS | ARM_CP_NO_RAW,
375
+ .cp = 15, .opc1 = opc1, .crn = 6, .crm = crm, .opc2 = opc2,
376
+ .access = PL1_RW, .resetvalue = 0,
377
+ .accessfn = access_tvm_trvm,
378
+ .writefn = pmsav8r_regn_write, .readfn = pmsav8r_regn_read
379
+ };
380
+ define_one_arm_cp_reg(cpu, &tmp_prlarn_reginfo);
381
+ g_free(tmp_string);
382
+ }
383
+
384
+ /* Register alias is only valid for first 32 indexes */
385
+ for (i = 0; i < MIN(cpu->pmsav8r_hdregion, 32); ++i) {
386
+ uint8_t crm = 0b1000 | extract32(i, 1, 3);
387
+ uint8_t opc1 = 0b100 | extract32(i, 4, 1);
388
+ uint8_t opc2 = extract32(i, 0, 1) << 2;
389
+
390
+ tmp_string = g_strdup_printf("HPRBAR%u", i);
391
+ ARMCPRegInfo tmp_hprbarn_reginfo = {
392
+ .name = tmp_string,
393
+ .type = ARM_CP_NO_RAW,
394
+ .cp = 15, .opc1 = opc1, .crn = 6, .crm = crm, .opc2 = opc2,
395
+ .access = PL2_RW, .resetvalue = 0,
396
+ .writefn = pmsav8r_regn_write, .readfn = pmsav8r_regn_read
397
+ };
398
+ define_one_arm_cp_reg(cpu, &tmp_hprbarn_reginfo);
399
+ g_free(tmp_string);
400
+
401
+ opc2 = extract32(i, 0, 1) << 2 | 0x1;
402
+ tmp_string = g_strdup_printf("HPRLAR%u", i);
403
+ ARMCPRegInfo tmp_hprlarn_reginfo = {
404
+ .name = tmp_string,
405
+ .type = ARM_CP_NO_RAW,
406
+ .cp = 15, .opc1 = opc1, .crn = 6, .crm = crm, .opc2 = opc2,
407
+ .access = PL2_RW, .resetvalue = 0,
408
+ .writefn = pmsav8r_regn_write, .readfn = pmsav8r_regn_read
409
+ };
410
+ define_one_arm_cp_reg(cpu, &tmp_hprlarn_reginfo);
411
+ g_free(tmp_string);
412
+ }
413
} else if (arm_feature(env, ARM_FEATURE_V7)) {
414
define_one_arm_cp_reg(cpu, &id_mpuir_reginfo);
415
}
416
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
417
sctlr.type |= ARM_CP_SUPPRESS_TB_END;
418
}
419
define_one_arm_cp_reg(cpu, &sctlr);
420
+
421
+ if (arm_feature(env, ARM_FEATURE_PMSA) &&
422
+ arm_feature(env, ARM_FEATURE_V8)) {
423
+ ARMCPRegInfo vsctlr = {
424
+ .name = "VSCTLR", .state = ARM_CP_STATE_AA32,
425
+ .cp = 15, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 0,
426
+ .access = PL2_RW, .resetvalue = 0x0,
427
+ .fieldoffset = offsetoflow32(CPUARMState, cp15.vsctlr),
428
+ };
429
+ define_one_arm_cp_reg(cpu, &vsctlr);
430
+ }
431
}
432
433
if (cpu_isar_feature(aa64_lor, cpu)) {
434
diff --git a/target/arm/machine.c b/target/arm/machine.c
435
index XXXXXXX..XXXXXXX 100644
436
--- a/target/arm/machine.c
437
+++ b/target/arm/machine.c
438
@@ -XXX,XX +XXX,XX @@ static bool pmsav8_needed(void *opaque)
439
arm_feature(env, ARM_FEATURE_V8);
440
}
441
442
+static bool pmsav8r_needed(void *opaque)
443
+{
444
+ ARMCPU *cpu = opaque;
445
+ CPUARMState *env = &cpu->env;
446
+
447
+ return arm_feature(env, ARM_FEATURE_PMSA) &&
448
+ arm_feature(env, ARM_FEATURE_V8) &&
449
+ !arm_feature(env, ARM_FEATURE_M);
450
+}
451
+
452
+static const VMStateDescription vmstate_pmsav8r = {
453
+ .name = "cpu/pmsav8/pmsav8r",
454
+ .version_id = 1,
455
+ .minimum_version_id = 1,
456
+ .needed = pmsav8r_needed,
457
+ .fields = (VMStateField[]) {
458
+ VMSTATE_VARRAY_UINT32(env.pmsav8.hprbar, ARMCPU,
459
+ pmsav8r_hdregion, 0, vmstate_info_uint32, uint32_t),
460
+ VMSTATE_VARRAY_UINT32(env.pmsav8.hprlar, ARMCPU,
461
+ pmsav8r_hdregion, 0, vmstate_info_uint32, uint32_t),
462
+ VMSTATE_END_OF_LIST()
463
+ },
464
+};
465
+
466
static const VMStateDescription vmstate_pmsav8 = {
467
.name = "cpu/pmsav8",
468
.version_id = 1,
469
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_pmsav8 = {
470
VMSTATE_UINT32(env.pmsav8.mair0[M_REG_NS], ARMCPU),
471
VMSTATE_UINT32(env.pmsav8.mair1[M_REG_NS], ARMCPU),
472
VMSTATE_END_OF_LIST()
473
+ },
474
+ .subsections = (const VMStateDescription * []) {
475
+ &vmstate_pmsav8r,
476
+ NULL
477
}
478
};
479
45
--
480
--
46
2.20.1
481
2.25.1
47
482
48
483
diff view generated by jsdifflib
1
Convert the Neon VRSQRTS insn to using a gvec helper,
1
From: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
2
and use this to implement the fp16 case.
2
3
3
Add PMSAv8r translation.
4
As with VRECPS, we adjust the phrasing of the new implementation
4
5
slightly so that the fp32 version parallels the fp16 one.
5
Signed-off-by: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
6
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Message-id: 20221206102504.165775-7-tobias.roehmel@rwth-aachen.de
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20200828183354.27913-35-peter.maydell@linaro.org
10
---
9
---
11
target/arm/helper.h | 4 +++-
10
target/arm/ptw.c | 126 ++++++++++++++++++++++++++++++++++++++---------
12
target/arm/vec_helper.c | 30 ++++++++++++++++++++++++++++++
11
1 file changed, 104 insertions(+), 22 deletions(-)
13
target/arm/vfp_helper.c | 15 ---------------
12
14
target/arm/translate-neon.c.inc | 21 +--------------------
13
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
15
4 files changed, 34 insertions(+), 36 deletions(-)
16
17
diff --git a/target/arm/helper.h b/target/arm/helper.h
18
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/helper.h
15
--- a/target/arm/ptw.c
20
+++ b/target/arm/helper.h
16
+++ b/target/arm/ptw.c
21
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
17
@@ -XXX,XX +XXX,XX @@ static bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx,
22
DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
18
23
DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, ptr)
19
if (arm_feature(env, ARM_FEATURE_M)) {
24
20
return env->v7m.mpu_ctrl[is_secure] & R_V7M_MPU_CTRL_PRIVDEFENA_MASK;
25
-DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
21
- } else {
26
DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
22
- return regime_sctlr(env, mmu_idx) & SCTLR_BR;
27
DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
23
}
28
DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, ptr)
24
+
29
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i3
25
+ if (mmu_idx == ARMMMUIdx_Stage2) {
30
DEF_HELPER_FLAGS_5(gvec_recps_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
26
+ return false;
31
DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
27
+ }
32
28
+
33
+DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
29
+ return regime_sctlr(env, mmu_idx) & SCTLR_BR;
34
+DEF_HELPER_FLAGS_5(gvec_rsqrts_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
35
+
36
DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
37
DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
38
39
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
40
index XXXXXXX..XXXXXXX 100644
41
--- a/target/arm/vec_helper.c
42
+++ b/target/arm/vec_helper.c
43
@@ -XXX,XX +XXX,XX @@ static float32 float32_recps_nf(float32 op1, float32 op2, float_status *stat)
44
return float32_sub(float32_two, float32_mul(op1, op2, stat), stat);
45
}
30
}
46
31
47
+/* Reciprocal square-root step. AArch32 non-fused semantics. */
32
static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
48
+static float16 float16_rsqrts_nf(float16 op1, float16 op2, float_status *stat)
33
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
34
return !(result->f.prot & (1 << access_type));
35
}
36
37
+static uint32_t *regime_rbar(CPUARMState *env, ARMMMUIdx mmu_idx,
38
+ uint32_t secure)
49
+{
39
+{
50
+ op1 = float16_squash_input_denormal(op1, stat);
40
+ if (regime_el(env, mmu_idx) == 2) {
51
+ op2 = float16_squash_input_denormal(op2, stat);
41
+ return env->pmsav8.hprbar;
52
+
42
+ } else {
53
+ if ((float16_is_infinity(op1) && float16_is_zero(op2)) ||
43
+ return env->pmsav8.rbar[secure];
54
+ (float16_is_infinity(op2) && float16_is_zero(op1))) {
44
+ }
55
+ return float16_one_point_five;
56
+ }
57
+ op1 = float16_sub(float16_three, float16_mul(op1, op2, stat), stat);
58
+ return float16_div(op1, float16_two, stat);
59
+}
45
+}
60
+
46
+
61
+static float32 float32_rsqrts_nf(float32 op1, float32 op2, float_status *stat)
47
+static uint32_t *regime_rlar(CPUARMState *env, ARMMMUIdx mmu_idx,
48
+ uint32_t secure)
62
+{
49
+{
63
+ op1 = float32_squash_input_denormal(op1, stat);
50
+ if (regime_el(env, mmu_idx) == 2) {
64
+ op2 = float32_squash_input_denormal(op2, stat);
51
+ return env->pmsav8.hprlar;
65
+
52
+ } else {
66
+ if ((float32_is_infinity(op1) && float32_is_zero(op2)) ||
53
+ return env->pmsav8.rlar[secure];
67
+ (float32_is_infinity(op2) && float32_is_zero(op1))) {
54
+ }
68
+ return float32_one_point_five;
69
+ }
70
+ op1 = float32_sub(float32_three, float32_mul(op1, op2, stat), stat);
71
+ return float32_div(op1, float32_two, stat);
72
+}
55
+}
73
+
56
+
74
#define DO_3OP(NAME, FUNC, TYPE) \
57
bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
75
void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
58
MMUAccessType access_type, ARMMMUIdx mmu_idx,
76
{ \
59
bool secure, GetPhysAddrResult *result,
77
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fminnum_s, float32_minnum, float32)
60
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
78
DO_3OP(gvec_recps_nf_h, float16_recps_nf, float16)
61
bool hit = false;
79
DO_3OP(gvec_recps_nf_s, float32_recps_nf, float32)
62
uint32_t addr_page_base = address & TARGET_PAGE_MASK;
80
63
uint32_t addr_page_limit = addr_page_base + (TARGET_PAGE_SIZE - 1);
81
+DO_3OP(gvec_rsqrts_nf_h, float16_rsqrts_nf, float16)
64
+ int region_counter;
82
+DO_3OP(gvec_rsqrts_nf_s, float32_rsqrts_nf, float32)
65
+
83
+
66
+ if (regime_el(env, mmu_idx) == 2) {
84
#ifdef TARGET_AARCH64
67
+ region_counter = cpu->pmsav8r_hdregion;
85
68
+ } else {
86
DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
69
+ region_counter = cpu->pmsav7_dregion;
87
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
70
+ }
88
index XXXXXXX..XXXXXXX 100644
71
89
--- a/target/arm/vfp_helper.c
72
result->f.lg_page_size = TARGET_PAGE_BITS;
90
+++ b/target/arm/vfp_helper.c
73
result->f.phys_addr = address;
91
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
74
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
92
return r;
75
*mregion = -1;
76
}
77
78
+ if (mmu_idx == ARMMMUIdx_Stage2) {
79
+ fi->stage2 = true;
80
+ }
81
+
82
/*
83
* Unlike the ARM ARM pseudocode, we don't need to check whether this
84
* was an exception vector read from the vector table (which is always
85
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
86
hit = true;
87
}
88
89
- for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
90
+ uint32_t bitmask;
91
+ if (arm_feature(env, ARM_FEATURE_M)) {
92
+ bitmask = 0x1f;
93
+ } else {
94
+ bitmask = 0x3f;
95
+ fi->level = 0;
96
+ }
97
+
98
+ for (n = region_counter - 1; n >= 0; n--) {
99
/* region search */
100
/*
101
- * Note that the base address is bits [31:5] from the register
102
- * with bits [4:0] all zeroes, but the limit address is bits
103
- * [31:5] from the register with bits [4:0] all ones.
104
+ * Note that the base address is bits [31:x] from the register
105
+ * with bits [x-1:0] all zeroes, but the limit address is bits
106
+ * [31:x] from the register with bits [x:0] all ones. Where x is
107
+ * 5 for Cortex-M and 6 for Cortex-R
108
*/
109
- uint32_t base = env->pmsav8.rbar[secure][n] & ~0x1f;
110
- uint32_t limit = env->pmsav8.rlar[secure][n] | 0x1f;
111
+ uint32_t base = regime_rbar(env, mmu_idx, secure)[n] & ~bitmask;
112
+ uint32_t limit = regime_rlar(env, mmu_idx, secure)[n] | bitmask;
113
114
- if (!(env->pmsav8.rlar[secure][n] & 0x1)) {
115
+ if (!(regime_rlar(env, mmu_idx, secure)[n] & 0x1)) {
116
/* Region disabled */
117
continue;
118
}
119
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
120
* PMSAv7 where highest-numbered-region wins)
121
*/
122
fi->type = ARMFault_Permission;
123
- fi->level = 1;
124
+ if (arm_feature(env, ARM_FEATURE_M)) {
125
+ fi->level = 1;
126
+ }
127
return true;
128
}
129
130
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
131
}
132
133
if (!hit) {
134
- /* background fault */
135
- fi->type = ARMFault_Background;
136
+ if (arm_feature(env, ARM_FEATURE_M)) {
137
+ fi->type = ARMFault_Background;
138
+ } else {
139
+ fi->type = ARMFault_Permission;
140
+ }
141
return true;
142
}
143
144
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
145
/* hit using the background region */
146
get_phys_addr_pmsav7_default(env, mmu_idx, address, &result->f.prot);
147
} else {
148
- uint32_t ap = extract32(env->pmsav8.rbar[secure][matchregion], 1, 2);
149
- uint32_t xn = extract32(env->pmsav8.rbar[secure][matchregion], 0, 1);
150
+ uint32_t matched_rbar = regime_rbar(env, mmu_idx, secure)[matchregion];
151
+ uint32_t matched_rlar = regime_rlar(env, mmu_idx, secure)[matchregion];
152
+ uint32_t ap = extract32(matched_rbar, 1, 2);
153
+ uint32_t xn = extract32(matched_rbar, 0, 1);
154
bool pxn = false;
155
156
if (arm_feature(env, ARM_FEATURE_V8_1M)) {
157
- pxn = extract32(env->pmsav8.rlar[secure][matchregion], 4, 1);
158
+ pxn = extract32(matched_rlar, 4, 1);
159
}
160
161
if (m_is_system_region(env, address)) {
162
@@ -XXX,XX +XXX,XX @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
163
xn = 1;
164
}
165
166
- result->f.prot = simple_ap_to_rw_prot(env, mmu_idx, ap);
167
+ if (regime_el(env, mmu_idx) == 2) {
168
+ result->f.prot = simple_ap_to_rw_prot_is_user(ap,
169
+ mmu_idx != ARMMMUIdx_E2);
170
+ } else {
171
+ result->f.prot = simple_ap_to_rw_prot(env, mmu_idx, ap);
172
+ }
173
+
174
+ if (!arm_feature(env, ARM_FEATURE_M)) {
175
+ uint8_t attrindx = extract32(matched_rlar, 1, 3);
176
+ uint64_t mair = env->cp15.mair_el[regime_el(env, mmu_idx)];
177
+ uint8_t sh = extract32(matched_rlar, 3, 2);
178
+
179
+ if (regime_sctlr(env, mmu_idx) & SCTLR_WXN &&
180
+ result->f.prot & PAGE_WRITE && mmu_idx != ARMMMUIdx_Stage2) {
181
+ xn = 0x1;
182
+ }
183
+
184
+ if ((regime_el(env, mmu_idx) == 1) &&
185
+ regime_sctlr(env, mmu_idx) & SCTLR_UWXN && ap == 0x1) {
186
+ pxn = 0x1;
187
+ }
188
+
189
+ result->cacheattrs.is_s2_format = false;
190
+ result->cacheattrs.attrs = extract64(mair, attrindx * 8, 8);
191
+ result->cacheattrs.shareability = sh;
192
+ }
193
+
194
if (result->f.prot && !xn && !(pxn && !is_user)) {
195
result->f.prot |= PAGE_EXEC;
196
}
197
- /*
198
- * We don't need to look the attribute up in the MAIR0/MAIR1
199
- * registers because that only tells us about cacheability.
200
- */
201
+
202
if (mregion) {
203
*mregion = matchregion;
204
}
205
}
206
207
fi->type = ARMFault_Permission;
208
- fi->level = 1;
209
+ if (arm_feature(env, ARM_FEATURE_M)) {
210
+ fi->level = 1;
211
+ }
212
return !(result->f.prot & (1 << access_type));
93
}
213
}
94
214
95
-float32 HELPER(rsqrts_f32)(CPUARMState *env, float32 a, float32 b)
215
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
96
-{
216
cacheattrs1 = result->cacheattrs;
97
- float_status *s = &env->vfp.standard_fp_status;
217
memset(result, 0, sizeof(*result));
98
- float32 product;
218
99
- if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
219
- ret = get_phys_addr_lpae(env, ptw, ipa, access_type, is_el0, result, fi);
100
- (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
220
+ if (arm_feature(env, ARM_FEATURE_PMSA)) {
101
- if (!(float32_is_zero(a) || float32_is_zero(b))) {
221
+ ret = get_phys_addr_pmsav8(env, ipa, access_type,
102
- float_raise(float_flag_input_denormal, s);
222
+ ptw->in_mmu_idx, is_secure, result, fi);
103
- }
223
+ } else {
104
- return float32_one_point_five;
224
+ ret = get_phys_addr_lpae(env, ptw, ipa, access_type,
105
- }
225
+ is_el0, result, fi);
106
- product = float32_mul(a, b, s);
226
+ }
107
- return float32_div(float32_sub(float32_three, product, s), float32_two, s);
227
fi->s2addr = ipa;
108
-}
228
109
-
229
/* Combine the S1 and S2 perms. */
110
/* NEON helpers. */
111
112
/* Constants 256 and 512 are used in some helpers; we avoid relying on
113
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
114
index XXXXXXX..XXXXXXX 100644
115
--- a/target/arm/translate-neon.c.inc
116
+++ b/target/arm/translate-neon.c.inc
117
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
118
DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h)
119
DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h)
120
DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h)
121
+DO_3S_FP_GVEC(VRSQRTS, gen_helper_gvec_rsqrts_nf_s, gen_helper_gvec_rsqrts_nf_h)
122
123
WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
124
WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
125
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
126
return do_3same(s, a, gen_VMINNM_fp32_3s);
127
}
128
129
-WRAP_ENV_FN(gen_VRSQRTS_tramp, gen_helper_rsqrts_f32)
130
-
131
-static void gen_VRSQRTS_fp_3s(unsigned vece, uint32_t rd_ofs,
132
- uint32_t rn_ofs, uint32_t rm_ofs,
133
- uint32_t oprsz, uint32_t maxsz)
134
-{
135
- static const GVecGen3 ops = { .fni4 = gen_VRSQRTS_tramp };
136
- tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops);
137
-}
138
-
139
-static bool trans_VRSQRTS_fp_3s(DisasContext *s, arg_3same *a)
140
-{
141
- if (a->size != 0) {
142
- /* TODO fp16 support */
143
- return false;
144
- }
145
-
146
- return do_3same(s, a, gen_VRSQRTS_fp_3s);
147
-}
148
-
149
static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
150
{
151
/* FP operations handled pairwise 32 bits at a time */
152
--
230
--
153
2.20.1
231
2.25.1
154
232
155
233
diff view generated by jsdifflib
1
Rewrite Neon VABS/VNEG of floats to use gvec logical AND and XOR, so
1
From: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
2
that we can implement the fp16 version of the insns.
3
2
3
All constants are taken from the ARM Cortex-R52 Processor TRM Revision: r1p3
4
5
Signed-off-by: Tobias Röhmel <tobias.roehmel@rwth-aachen.de>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Message-id: 20221206102504.165775-8-tobias.roehmel@rwth-aachen.de
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-26-peter.maydell@linaro.org
7
---
9
---
8
target/arm/translate-neon.c.inc | 34 +++++++++++++++++++++++++++------
10
target/arm/cpu_tcg.c | 42 ++++++++++++++++++++++++++++++++++++++++++
9
1 file changed, 28 insertions(+), 6 deletions(-)
11
1 file changed, 42 insertions(+)
10
12
11
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
13
diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
12
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/translate-neon.c.inc
15
--- a/target/arm/cpu_tcg.c
14
+++ b/target/arm/translate-neon.c.inc
16
+++ b/target/arm/cpu_tcg.c
15
@@ -XXX,XX +XXX,XX @@ static bool trans_VCNT(DisasContext *s, arg_2misc *a)
17
@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
16
return do_2misc(s, a, gen_helper_neon_cnt_u8);
18
define_arm_cp_regs(cpu, cortexr5_cp_reginfo);
17
}
19
}
18
20
19
+static void gen_VABS_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
21
+static void cortex_r52_initfn(Object *obj)
20
+ uint32_t oprsz, uint32_t maxsz)
21
+{
22
+{
22
+ tcg_gen_gvec_andi(vece, rd_ofs, rm_ofs,
23
+ ARMCPU *cpu = ARM_CPU(obj);
23
+ vece == MO_16 ? 0x7fff : 0x7fffffff,
24
+
24
+ oprsz, maxsz);
25
+ set_feature(&cpu->env, ARM_FEATURE_V8);
26
+ set_feature(&cpu->env, ARM_FEATURE_EL2);
27
+ set_feature(&cpu->env, ARM_FEATURE_PMSA);
28
+ set_feature(&cpu->env, ARM_FEATURE_NEON);
29
+ set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER);
30
+ cpu->midr = 0x411fd133; /* r1p3 */
31
+ cpu->revidr = 0x00000000;
32
+ cpu->reset_fpsid = 0x41034023;
33
+ cpu->isar.mvfr0 = 0x10110222;
34
+ cpu->isar.mvfr1 = 0x12111111;
35
+ cpu->isar.mvfr2 = 0x00000043;
36
+ cpu->ctr = 0x8144c004;
37
+ cpu->reset_sctlr = 0x30c50838;
38
+ cpu->isar.id_pfr0 = 0x00000131;
39
+ cpu->isar.id_pfr1 = 0x10111001;
40
+ cpu->isar.id_dfr0 = 0x03010006;
41
+ cpu->id_afr0 = 0x00000000;
42
+ cpu->isar.id_mmfr0 = 0x00211040;
43
+ cpu->isar.id_mmfr1 = 0x40000000;
44
+ cpu->isar.id_mmfr2 = 0x01200000;
45
+ cpu->isar.id_mmfr3 = 0xf0102211;
46
+ cpu->isar.id_mmfr4 = 0x00000010;
47
+ cpu->isar.id_isar0 = 0x02101110;
48
+ cpu->isar.id_isar1 = 0x13112111;
49
+ cpu->isar.id_isar2 = 0x21232142;
50
+ cpu->isar.id_isar3 = 0x01112131;
51
+ cpu->isar.id_isar4 = 0x00010142;
52
+ cpu->isar.id_isar5 = 0x00010001;
53
+ cpu->isar.dbgdidr = 0x77168000;
54
+ cpu->clidr = (1 << 27) | (1 << 24) | 0x3;
55
+ cpu->ccsidr[0] = 0x700fe01a; /* 32KB L1 dcache */
56
+ cpu->ccsidr[1] = 0x201fe00a; /* 32KB L1 icache */
57
+
58
+ cpu->pmsav7_dregion = 16;
59
+ cpu->pmsav8r_hdregion = 16;
25
+}
60
+}
26
+
61
+
27
static bool trans_VABS_F(DisasContext *s, arg_2misc *a)
62
static void cortex_r5f_initfn(Object *obj)
28
{
63
{
29
- if (a->size != 2) {
64
ARMCPU *cpu = ARM_CPU(obj);
30
+ if (a->size == MO_16) {
65
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_tcg_cpus[] = {
31
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
66
.class_init = arm_v7m_class_init },
32
+ return false;
67
{ .name = "cortex-r5", .initfn = cortex_r5_initfn },
33
+ }
68
{ .name = "cortex-r5f", .initfn = cortex_r5f_initfn },
34
+ } else if (a->size != MO_32) {
69
+ { .name = "cortex-r52", .initfn = cortex_r52_initfn },
35
return false;
70
{ .name = "ti925t", .initfn = ti925t_initfn },
36
}
71
{ .name = "sa1100", .initfn = sa1100_initfn },
37
- /* TODO: FP16 : size == 1 */
72
{ .name = "sa1110", .initfn = sa1110_initfn },
38
- return do_2misc(s, a, gen_helper_vfp_abss);
39
+ return do_2misc_vec(s, a, gen_VABS_F);
40
+}
41
+
42
+static void gen_VNEG_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
43
+ uint32_t oprsz, uint32_t maxsz)
44
+{
45
+ tcg_gen_gvec_xori(vece, rd_ofs, rm_ofs,
46
+ vece == MO_16 ? 0x8000 : 0x80000000,
47
+ oprsz, maxsz);
48
}
49
50
static bool trans_VNEG_F(DisasContext *s, arg_2misc *a)
51
{
52
- if (a->size != 2) {
53
+ if (a->size == MO_16) {
54
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
55
+ return false;
56
+ }
57
+ } else if (a->size != MO_32) {
58
return false;
59
}
60
- /* TODO: FP16 : size == 1 */
61
- return do_2misc(s, a, gen_helper_vfp_negs);
62
+ return do_2misc_vec(s, a, gen_VNEG_F);
63
}
64
65
static bool trans_VRECPE(DisasContext *s, arg_2misc *a)
66
--
73
--
67
2.20.1
74
2.25.1
68
75
69
76
diff view generated by jsdifflib
1
Add gvec helpers for doing Neon-style indexed non-fused fp
1
From: Alex Bennée <alex.bennee@linaro.org>
2
multiply-and-accumulate operations.
3
2
3
The check semihosting_enabled() wants to know if the guest is
4
currently in user mode. Unlike the other cases the test was inverted
5
causing us to block semihosting calls in non-EL0 modes.
6
7
Cc: qemu-stable@nongnu.org
8
Fixes: 19b26317e9 (target/arm: Honour -semihosting-config userspace=on)
9
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
10
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Message-id: 20200828183354.27913-44-peter.maydell@linaro.org
6
---
12
---
7
target/arm/helper.h | 10 ++++++++++
13
target/arm/translate.c | 2 +-
8
target/arm/vec_helper.c | 27 ++++++++++++++++++++++-----
14
1 file changed, 1 insertion(+), 1 deletion(-)
9
2 files changed, 32 insertions(+), 5 deletions(-)
10
15
11
diff --git a/target/arm/helper.h b/target/arm/helper.h
16
diff --git a/target/arm/translate.c b/target/arm/translate.c
12
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/helper.h
18
--- a/target/arm/translate.c
14
+++ b/target/arm/helper.h
19
+++ b/target/arm/translate.c
15
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmul_idx_s, TCG_CALL_NO_RWG,
20
@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
16
DEF_HELPER_FLAGS_5(gvec_fmul_idx_d, TCG_CALL_NO_RWG,
21
* semihosting, to provide some semblance of security
17
void, ptr, ptr, ptr, ptr, i32)
22
* (and for consistency with our 32-bit semihosting).
18
23
*/
19
+DEF_HELPER_FLAGS_5(gvec_fmla_nf_idx_h, TCG_CALL_NO_RWG,
24
- if (semihosting_enabled(s->current_el != 0) &&
20
+ void, ptr, ptr, ptr, ptr, i32)
25
+ if (semihosting_enabled(s->current_el == 0) &&
21
+DEF_HELPER_FLAGS_5(gvec_fmla_nf_idx_s, TCG_CALL_NO_RWG,
26
(imm == (s->thumb ? 0x3c : 0xf000))) {
22
+ void, ptr, ptr, ptr, ptr, i32)
27
gen_exception_internal_insn(s, EXCP_SEMIHOST);
23
+
28
return;
24
+DEF_HELPER_FLAGS_5(gvec_fmls_nf_idx_h, TCG_CALL_NO_RWG,
25
+ void, ptr, ptr, ptr, ptr, i32)
26
+DEF_HELPER_FLAGS_5(gvec_fmls_nf_idx_s, TCG_CALL_NO_RWG,
27
+ void, ptr, ptr, ptr, ptr, i32)
28
+
29
DEF_HELPER_FLAGS_6(gvec_fmla_idx_h, TCG_CALL_NO_RWG,
30
void, ptr, ptr, ptr, ptr, ptr, i32)
31
DEF_HELPER_FLAGS_6(gvec_fmla_idx_s, TCG_CALL_NO_RWG,
32
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
33
index XXXXXXX..XXXXXXX 100644
34
--- a/target/arm/vec_helper.c
35
+++ b/target/arm/vec_helper.c
36
@@ -XXX,XX +XXX,XX @@ DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -, )
37
38
#undef DO_MLA_IDX
39
40
-#define DO_FMUL_IDX(NAME, TYPE, H) \
41
+#define DO_FMUL_IDX(NAME, ADD, TYPE, H) \
42
void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
43
{ \
44
intptr_t i, j, oprsz = simd_oprsz(desc); \
45
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
46
for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \
47
TYPE mm = m[H(i + idx)]; \
48
for (j = 0; j < segment; j++) { \
49
- d[i + j] = TYPE##_mul(n[i + j], mm, stat); \
50
+ d[i + j] = TYPE##_##ADD(d[i + j], \
51
+ TYPE##_mul(n[i + j], mm, stat), stat); \
52
} \
53
} \
54
clear_tail(d, oprsz, simd_maxsz(desc)); \
55
}
56
57
-DO_FMUL_IDX(gvec_fmul_idx_h, float16, H2)
58
-DO_FMUL_IDX(gvec_fmul_idx_s, float32, H4)
59
-DO_FMUL_IDX(gvec_fmul_idx_d, float64, )
60
+#define float16_nop(N, M, S) (M)
61
+#define float32_nop(N, M, S) (M)
62
+#define float64_nop(N, M, S) (M)
63
64
+DO_FMUL_IDX(gvec_fmul_idx_h, nop, float16, H2)
65
+DO_FMUL_IDX(gvec_fmul_idx_s, nop, float32, H4)
66
+DO_FMUL_IDX(gvec_fmul_idx_d, nop, float64, )
67
+
68
+/*
69
+ * Non-fused multiply-accumulate operations, for Neon. NB that unlike
70
+ * the fused ops below they assume accumulate both from and into Vd.
71
+ */
72
+DO_FMUL_IDX(gvec_fmla_nf_idx_h, add, float16, H2)
73
+DO_FMUL_IDX(gvec_fmla_nf_idx_s, add, float32, H4)
74
+DO_FMUL_IDX(gvec_fmls_nf_idx_h, sub, float16, H2)
75
+DO_FMUL_IDX(gvec_fmls_nf_idx_s, sub, float32, H4)
76
+
77
+#undef float16_nop
78
+#undef float32_nop
79
+#undef float64_nop
80
#undef DO_FMUL_IDX
81
82
#define DO_FMLA_IDX(NAME, TYPE, H) \
83
--
29
--
84
2.20.1
30
2.25.1
85
31
86
32
diff view generated by jsdifflib
1
Convert the Neon pairwise fp ops to use a single gvic-style
1
From: Axel Heider <axel.heider@hensoldt.net>
2
helper to do the full operation instead of one helper call
3
for each 32-bit part. This allows us to use the same
4
framework to implement the fp16.
5
2
3
Fix typos, add background information
4
5
Signed-off-by: Axel Heider <axel.heider@hensoldt.net>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20200828183354.27913-36-peter.maydell@linaro.org
9
---
8
---
10
target/arm/helper.h | 7 +++++
9
hw/timer/imx_epit.c | 20 ++++++++++++++++----
11
target/arm/vec_helper.c | 45 +++++++++++++++++++++++++++++++++
10
1 file changed, 16 insertions(+), 4 deletions(-)
12
target/arm/translate-neon.c.inc | 42 ++++++++++++------------------
13
3 files changed, 68 insertions(+), 26 deletions(-)
14
11
15
diff --git a/target/arm/helper.h b/target/arm/helper.h
12
diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
16
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/helper.h
14
--- a/hw/timer/imx_epit.c
18
+++ b/target/arm/helper.h
15
+++ b/hw/timer/imx_epit.c
19
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fcmlas_idx, TCG_CALL_NO_RWG,
16
@@ -XXX,XX +XXX,XX @@ static void imx_epit_set_freq(IMXEPITState *s)
20
DEF_HELPER_FLAGS_5(gvec_fcmlad, TCG_CALL_NO_RWG,
17
}
21
void, ptr, ptr, ptr, ptr, i32)
22
23
+DEF_HELPER_FLAGS_5(neon_paddh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
24
+DEF_HELPER_FLAGS_5(neon_pmaxh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
25
+DEF_HELPER_FLAGS_5(neon_pminh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
26
+DEF_HELPER_FLAGS_5(neon_padds, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
27
+DEF_HELPER_FLAGS_5(neon_pmaxs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
28
+DEF_HELPER_FLAGS_5(neon_pmins, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
29
+
30
DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
31
DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
32
DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
33
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
34
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/vec_helper.c
36
+++ b/target/arm/vec_helper.c
37
@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_s, uint32_t)
38
DO_ABA(gvec_uaba_d, uint64_t)
39
40
#undef DO_ABA
41
+
42
+#define DO_NEON_PAIRWISE(NAME, OP) \
43
+ void HELPER(NAME##s)(void *vd, void *vn, void *vm, \
44
+ void *stat, uint32_t oprsz) \
45
+ { \
46
+ float_status *fpst = stat; \
47
+ float32 *d = vd; \
48
+ float32 *n = vn; \
49
+ float32 *m = vm; \
50
+ float32 r0, r1; \
51
+ \
52
+ /* Read all inputs before writing outputs in case vm == vd */ \
53
+ r0 = float32_##OP(n[H4(0)], n[H4(1)], fpst); \
54
+ r1 = float32_##OP(m[H4(0)], m[H4(1)], fpst); \
55
+ \
56
+ d[H4(0)] = r0; \
57
+ d[H4(1)] = r1; \
58
+ } \
59
+ \
60
+ void HELPER(NAME##h)(void *vd, void *vn, void *vm, \
61
+ void *stat, uint32_t oprsz) \
62
+ { \
63
+ float_status *fpst = stat; \
64
+ float16 *d = vd; \
65
+ float16 *n = vn; \
66
+ float16 *m = vm; \
67
+ float16 r0, r1, r2, r3; \
68
+ \
69
+ /* Read all inputs before writing outputs in case vm == vd */ \
70
+ r0 = float16_##OP(n[H2(0)], n[H2(1)], fpst); \
71
+ r1 = float16_##OP(n[H2(2)], n[H2(3)], fpst); \
72
+ r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst); \
73
+ r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst); \
74
+ \
75
+ d[H4(0)] = r0; \
76
+ d[H4(1)] = r1; \
77
+ d[H4(2)] = r2; \
78
+ d[H4(3)] = r3; \
79
+ }
80
+
81
+DO_NEON_PAIRWISE(neon_padd, add)
82
+DO_NEON_PAIRWISE(neon_pmax, max)
83
+DO_NEON_PAIRWISE(neon_pmin, min)
84
+
85
+#undef DO_NEON_PAIRWISE
86
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
87
index XXXXXXX..XXXXXXX 100644
88
--- a/target/arm/translate-neon.c.inc
89
+++ b/target/arm/translate-neon.c.inc
90
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
91
return do_3same(s, a, gen_VMINNM_fp32_3s);
92
}
18
}
93
19
94
-static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
20
+/*
95
+static bool do_3same_fp_pair(DisasContext *s, arg_3same *a,
21
+ * This is called both on hardware (device) reset and software reset.
96
+ gen_helper_gvec_3_ptr *fn)
22
+ */
23
static void imx_epit_reset(DeviceState *dev)
97
{
24
{
98
- /* FP operations handled pairwise 32 bits at a time */
25
IMXEPITState *s = IMX_EPIT(dev);
99
- TCGv_i32 tmp, tmp2, tmp3;
100
+ /* FP pairwise operations */
101
TCGv_ptr fpstatus;
102
103
if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
104
@@ -XXX,XX +XXX,XX @@ static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
105
106
assert(a->q == 0); /* enforced by decode patterns */
107
26
108
- /*
27
- /*
109
- * Note that we have to be careful not to clobber the source operands
28
- * Soft reset doesn't touch some bits; hard reset clears them
110
- * in the "vm == vd" case by storing the result of the first pass too
111
- * early. Since Q is 0 there are always just two passes, so instead
112
- * of a complicated loop over each pass we just unroll.
113
- */
29
- */
114
- fpstatus = fpstatus_ptr(FPST_STD);
30
+ /* Soft reset doesn't touch some bits; hard reset clears them */
115
- tmp = neon_load_reg(a->vn, 0);
31
s->cr &= (CR_EN|CR_ENMOD|CR_STOPEN|CR_DOZEN|CR_WAITEN|CR_DBGEN);
116
- tmp2 = neon_load_reg(a->vn, 1);
32
s->sr = 0;
117
- fn(tmp, tmp, tmp2, fpstatus);
33
s->lr = EPIT_TIMER_MAX;
118
- tcg_temp_free_i32(tmp2);
34
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value,
119
35
ptimer_transaction_begin(s->timer_cmp);
120
- tmp3 = neon_load_reg(a->vm, 0);
36
ptimer_transaction_begin(s->timer_reload);
121
- tmp2 = neon_load_reg(a->vm, 1);
37
122
- fn(tmp3, tmp3, tmp2, fpstatus);
38
+ /* Update the frequency. Has been done already in case of a reset. */
123
- tcg_temp_free_i32(tmp2);
39
if (!(s->cr & CR_SWR)) {
124
+ fpstatus = fpstatus_ptr(a->size != 0 ? FPST_STD_F16 : FPST_STD);
40
imx_epit_set_freq(s);
125
+ tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
41
}
126
+ vfp_reg_offset(1, a->vn),
42
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value,
127
+ vfp_reg_offset(1, a->vm),
43
break;
128
+ fpstatus, 8, 8, 0, fn);
44
129
tcg_temp_free_ptr(fpstatus);
45
case 1: /* SR - ACK*/
130
46
- /* writing 1 to OCIF clear the OCIF bit */
131
- neon_store_reg(a->vd, 0, tmp);
47
+ /* writing 1 to OCIF clears the OCIF bit */
132
- neon_store_reg(a->vd, 1, tmp3);
48
if (value & 0x01) {
133
return true;
49
s->sr = 0;
50
imx_epit_update_int(s);
51
@@ -XXX,XX +XXX,XX @@ static void imx_epit_realize(DeviceState *dev, Error **errp)
52
0x00001000);
53
sysbus_init_mmio(sbd, &s->iomem);
54
55
+ /*
56
+ * The reload timer keeps running when the peripheral is enabled. It is a
57
+ * kind of wall clock that does not generate any interrupts. The callback
58
+ * needs to be provided, but it does nothing as the ptimer already supports
59
+ * all necessary reloading functionality.
60
+ */
61
s->timer_reload = ptimer_init(imx_epit_reload, s, PTIMER_POLICY_LEGACY);
62
63
+ /*
64
+ * The compare timer is running only when the peripheral configuration is
65
+ * in a state that will generate compare interrupts.
66
+ */
67
s->timer_cmp = ptimer_init(imx_epit_cmp, s, PTIMER_POLICY_LEGACY);
134
}
68
}
135
69
136
@@ -XXX,XX +XXX,XX @@ static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
137
static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
138
{ \
139
if (a->size != 0) { \
140
- /* TODO fp16 support */ \
141
- return false; \
142
+ if (!dc_isar_feature(aa32_fp16_arith, s)) { \
143
+ return false; \
144
+ } \
145
+ return do_3same_fp_pair(s, a, FUNC##h); \
146
} \
147
- return do_3same_fp_pair(s, a, FUNC); \
148
+ return do_3same_fp_pair(s, a, FUNC##s); \
149
}
150
151
-DO_3S_FP_PAIR(VPADD, gen_helper_vfp_adds)
152
-DO_3S_FP_PAIR(VPMAX, gen_helper_vfp_maxs)
153
-DO_3S_FP_PAIR(VPMIN, gen_helper_vfp_mins)
154
+DO_3S_FP_PAIR(VPADD, gen_helper_neon_padd)
155
+DO_3S_FP_PAIR(VPMAX, gen_helper_neon_pmax)
156
+DO_3S_FP_PAIR(VPMIN, gen_helper_neon_pmin)
157
158
static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
159
{
160
--
70
--
161
2.20.1
71
2.25.1
162
163
diff view generated by jsdifflib
1
From: Graeme Gregory <graeme@nuviainc.com>
1
From: Axel Heider <axel.heider@hensoldt.net>
2
2
3
A difference between sbsa platform and the virt platform is PSCI is
3
remove unused defines, add needed defines
4
handled by ARM-TF in the sbsa platform. This means that the PSCI code
5
there needs to communicate some of the platform power changes down
6
to the qemu code for things like shutdown/reset control.
7
4
8
Space has been left to extend the EC if we find other use cases in
5
Signed-off-by: Axel Heider <axel.heider@hensoldt.net>
9
future where ARM-TF and qemu need to communicate.
10
11
Signed-off-by: Graeme Gregory <graeme@nuviainc.com>
12
Reviewed-by: Leif Lindholm <leif@nuviainc.com>
13
Tested-by: Leif Lindholm <leif@nuviainc.com>
14
Message-id: 20200826141952.136164-2-graeme@nuviainc.com
15
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17
---
8
---
18
hw/misc/sbsa_ec.c | 98 +++++++++++++++++++++++++++++++++++++++++++++
9
include/hw/timer/imx_epit.h | 4 ++--
19
hw/misc/meson.build | 2 +
10
hw/timer/imx_epit.c | 4 ++--
20
2 files changed, 100 insertions(+)
11
2 files changed, 4 insertions(+), 4 deletions(-)
21
create mode 100644 hw/misc/sbsa_ec.c
22
12
23
diff --git a/hw/misc/sbsa_ec.c b/hw/misc/sbsa_ec.c
13
diff --git a/include/hw/timer/imx_epit.h b/include/hw/timer/imx_epit.h
24
new file mode 100644
14
index XXXXXXX..XXXXXXX 100644
25
index XXXXXXX..XXXXXXX
15
--- a/include/hw/timer/imx_epit.h
26
--- /dev/null
16
+++ b/include/hw/timer/imx_epit.h
27
+++ b/hw/misc/sbsa_ec.c
28
@@ -XXX,XX +XXX,XX @@
17
@@ -XXX,XX +XXX,XX @@
29
+/*
18
#define CR_OCIEN (1 << 2)
30
+ * ARM SBSA Reference Platform Embedded Controller
19
#define CR_RLD (1 << 3)
31
+ *
20
#define CR_PRESCALE_SHIFT (4)
32
+ * A device to allow PSCI running in the secure side of sbsa-ref machine
21
-#define CR_PRESCALE_MASK (0xfff)
33
+ * to communicate platform power states to qemu.
22
+#define CR_PRESCALE_BITS (12)
34
+ *
23
#define CR_SWR (1 << 16)
35
+ * Copyright (c) 2020 Nuvia Inc
24
#define CR_IOVW (1 << 17)
36
+ * Written by Graeme Gregory <graeme@nuviainc.com>
25
#define CR_DBGEN (1 << 18)
37
+ *
26
@@ -XXX,XX +XXX,XX @@
38
+ * SPDX-License-Identifer: GPL-2.0-or-later
27
#define CR_DOZEN (1 << 20)
39
+ */
28
#define CR_STOPEN (1 << 21)
40
+
29
#define CR_CLKSRC_SHIFT (24)
41
+#include "qemu/osdep.h"
30
-#define CR_CLKSRC_MASK (0x3 << CR_CLKSRC_SHIFT)
42
+#include "qemu-common.h"
31
+#define CR_CLKSRC_BITS (2)
43
+#include "qemu/log.h"
32
44
+#include "hw/sysbus.h"
33
#define EPIT_TIMER_MAX 0XFFFFFFFFUL
45
+#include "sysemu/runstate.h"
34
46
+
35
diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
47
+typedef struct {
48
+ SysBusDevice parent_obj;
49
+ MemoryRegion iomem;
50
+} SECUREECState;
51
+
52
+#define TYPE_SBSA_EC "sbsa-ec"
53
+#define SECURE_EC(obj) OBJECT_CHECK(SECUREECState, (obj), TYPE_SBSA_EC)
54
+
55
+enum sbsa_ec_powerstates {
56
+ SBSA_EC_CMD_POWEROFF = 0x01,
57
+ SBSA_EC_CMD_REBOOT = 0x02,
58
+};
59
+
60
+static uint64_t sbsa_ec_read(void *opaque, hwaddr offset, unsigned size)
61
+{
62
+ /* No use for this currently */
63
+ qemu_log_mask(LOG_GUEST_ERROR, "sbsa-ec: no readable registers");
64
+ return 0;
65
+}
66
+
67
+static void sbsa_ec_write(void *opaque, hwaddr offset,
68
+ uint64_t value, unsigned size)
69
+{
70
+ if (offset == 0) { /* PSCI machine power command register */
71
+ switch (value) {
72
+ case SBSA_EC_CMD_POWEROFF:
73
+ qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
74
+ break;
75
+ case SBSA_EC_CMD_REBOOT:
76
+ qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
77
+ break;
78
+ default:
79
+ qemu_log_mask(LOG_GUEST_ERROR,
80
+ "sbsa-ec: unknown power command");
81
+ }
82
+ } else {
83
+ qemu_log_mask(LOG_GUEST_ERROR, "sbsa-ec: unknown EC register");
84
+ }
85
+}
86
+
87
+static const MemoryRegionOps sbsa_ec_ops = {
88
+ .read = sbsa_ec_read,
89
+ .write = sbsa_ec_write,
90
+ .endianness = DEVICE_NATIVE_ENDIAN,
91
+ .valid.min_access_size = 4,
92
+ .valid.max_access_size = 4,
93
+};
94
+
95
+static void sbsa_ec_init(Object *obj)
96
+{
97
+ SECUREECState *s = SECURE_EC(obj);
98
+ SysBusDevice *dev = SYS_BUS_DEVICE(obj);
99
+
100
+ memory_region_init_io(&s->iomem, obj, &sbsa_ec_ops, s, "sbsa-ec",
101
+ 0x1000);
102
+ sysbus_init_mmio(dev, &s->iomem);
103
+}
104
+
105
+static void sbsa_ec_class_init(ObjectClass *klass, void *data)
106
+{
107
+ DeviceClass *dc = DEVICE_CLASS(klass);
108
+
109
+ /* No vmstate or reset required: device has no internal state */
110
+ dc->user_creatable = false;
111
+}
112
+
113
+static const TypeInfo sbsa_ec_info = {
114
+ .name = TYPE_SBSA_EC,
115
+ .parent = TYPE_SYS_BUS_DEVICE,
116
+ .instance_size = sizeof(SECUREECState),
117
+ .instance_init = sbsa_ec_init,
118
+ .class_init = sbsa_ec_class_init,
119
+};
120
+
121
+static void sbsa_ec_register_type(void)
122
+{
123
+ type_register_static(&sbsa_ec_info);
124
+}
125
+
126
+type_init(sbsa_ec_register_type);
127
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
128
index XXXXXXX..XXXXXXX 100644
36
index XXXXXXX..XXXXXXX 100644
129
--- a/hw/misc/meson.build
37
--- a/hw/timer/imx_epit.c
130
+++ b/hw/misc/meson.build
38
+++ b/hw/timer/imx_epit.c
131
@@ -XXX,XX +XXX,XX @@ specific_ss.add(when: 'CONFIG_MAC_VIA', if_true: files('mac_via.c'))
39
@@ -XXX,XX +XXX,XX @@ static void imx_epit_set_freq(IMXEPITState *s)
132
40
uint32_t clksrc;
133
specific_ss.add(when: 'CONFIG_MIPS_CPS', if_true: files('mips_cmgcr.c', 'mips_cpc.c'))
41
uint32_t prescaler;
134
specific_ss.add(when: 'CONFIG_MIPS_ITU', if_true: files('mips_itu.c'))
42
135
+
43
- clksrc = extract32(s->cr, CR_CLKSRC_SHIFT, 2);
136
+specific_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa_ec.c'))
44
- prescaler = 1 + extract32(s->cr, CR_PRESCALE_SHIFT, 12);
45
+ clksrc = extract32(s->cr, CR_CLKSRC_SHIFT, CR_CLKSRC_BITS);
46
+ prescaler = 1 + extract32(s->cr, CR_PRESCALE_SHIFT, CR_PRESCALE_BITS);
47
48
s->freq = imx_ccm_get_clock_frequency(s->ccm,
49
imx_epit_clocks[clksrc]) / prescaler;
137
--
50
--
138
2.20.1
51
2.25.1
139
140
diff view generated by jsdifflib
1
From: Graeme Gregory <graeme@nuviainc.com>
1
From: Axel Heider <axel.heider@hensoldt.net>
2
2
3
Add the previously created sbsa-ec device to the sbsa-ref machine in
4
secure memory so the PSCI implementation in ARM-TF can access it, but
5
not expose it to non secure firmware or OS except by via ARM-TF.
6
7
Signed-off-by: Graeme Gregory <graeme@nuviainc.com>
8
Reviewed-by: Leif Lindholm <leif@nuviainc.com>
9
Tested-by: Leif Lindholm <leif@nuviainc.com>
10
Message-id: 20200826141952.136164-3-graeme@nuviainc.com
11
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
---
5
---
14
hw/arm/sbsa-ref.c | 14 ++++++++++++++
6
include/hw/timer/imx_epit.h | 2 ++
15
1 file changed, 14 insertions(+)
7
hw/timer/imx_epit.c | 12 ++++++------
8
2 files changed, 8 insertions(+), 6 deletions(-)
16
9
17
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
10
diff --git a/include/hw/timer/imx_epit.h b/include/hw/timer/imx_epit.h
18
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
19
--- a/hw/arm/sbsa-ref.c
12
--- a/include/hw/timer/imx_epit.h
20
+++ b/hw/arm/sbsa-ref.c
13
+++ b/include/hw/timer/imx_epit.h
21
@@ -XXX,XX +XXX,XX @@ enum {
14
@@ -XXX,XX +XXX,XX @@
22
SBSA_CPUPERIPHS,
15
#define CR_CLKSRC_SHIFT (24)
23
SBSA_GIC_DIST,
16
#define CR_CLKSRC_BITS (2)
24
SBSA_GIC_REDIST,
17
25
+ SBSA_SECURE_EC,
18
+#define SR_OCIF (1 << 0)
26
SBSA_SMMU,
19
+
27
SBSA_UART,
20
#define EPIT_TIMER_MAX 0XFFFFFFFFUL
28
SBSA_RTC,
21
29
@@ -XXX,XX +XXX,XX @@ static const MemMapEntry sbsa_ref_memmap[] = {
22
#define TYPE_IMX_EPIT "imx.epit"
30
[SBSA_CPUPERIPHS] = { 0x40000000, 0x00040000 },
23
diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
31
[SBSA_GIC_DIST] = { 0x40060000, 0x00010000 },
24
index XXXXXXX..XXXXXXX 100644
32
[SBSA_GIC_REDIST] = { 0x40080000, 0x04000000 },
25
--- a/hw/timer/imx_epit.c
33
+ [SBSA_SECURE_EC] = { 0x50000000, 0x00001000 },
26
+++ b/hw/timer/imx_epit.c
34
[SBSA_UART] = { 0x60000000, 0x00001000 },
27
@@ -XXX,XX +XXX,XX @@ static const IMXClk imx_epit_clocks[] = {
35
[SBSA_RTC] = { 0x60010000, 0x00001000 },
28
*/
36
[SBSA_GPIO] = { 0x60020000, 0x00001000 },
29
static void imx_epit_update_int(IMXEPITState *s)
37
@@ -XXX,XX +XXX,XX @@ static void *sbsa_ref_dtb(const struct arm_boot_info *binfo, int *fdt_size)
30
{
38
return board->fdt;
31
- if (s->sr && (s->cr & CR_OCIEN) && (s->cr & CR_EN)) {
32
+ if ((s->sr & SR_OCIF) && (s->cr & CR_OCIEN) && (s->cr & CR_EN)) {
33
qemu_irq_raise(s->irq);
34
} else {
35
qemu_irq_lower(s->irq);
36
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value,
37
break;
38
39
case 1: /* SR - ACK*/
40
- /* writing 1 to OCIF clears the OCIF bit */
41
- if (value & 0x01) {
42
- s->sr = 0;
43
+ /* writing 1 to SR.OCIF clears this bit and turns the interrupt off */
44
+ if (value & SR_OCIF) {
45
+ s->sr = 0; /* SR.OCIF is the only bit in this register anyway */
46
imx_epit_update_int(s);
47
}
48
break;
49
@@ -XXX,XX +XXX,XX @@ static void imx_epit_cmp(void *opaque)
50
IMXEPITState *s = IMX_EPIT(opaque);
51
52
DPRINTF("sr was %d\n", s->sr);
53
-
54
- s->sr = 1;
55
+ /* Set interrupt status bit SR.OCIF and update the interrupt state */
56
+ s->sr |= SR_OCIF;
57
imx_epit_update_int(s);
39
}
58
}
40
59
41
+static void create_secure_ec(MemoryRegion *mem)
42
+{
43
+ hwaddr base = sbsa_ref_memmap[SBSA_SECURE_EC].base;
44
+ DeviceState *dev = qdev_new("sbsa-ec");
45
+ SysBusDevice *s = SYS_BUS_DEVICE(dev);
46
+
47
+ memory_region_add_subregion(mem, base,
48
+ sysbus_mmio_get_region(s, 0));
49
+}
50
+
51
static void sbsa_ref_init(MachineState *machine)
52
{
53
unsigned int smp_cpus = machine->smp.cpus;
54
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
55
56
create_pcie(sms);
57
58
+ create_secure_ec(secure_sysmem);
59
+
60
sms->bootinfo.ram_size = machine->ram_size;
61
sms->bootinfo.nb_cpus = smp_cpus;
62
sms->bootinfo.board_id = -1;
63
--
60
--
64
2.20.1
61
2.25.1
65
66
diff view generated by jsdifflib
1
Convert the neon floating-point vector absolute comparison ops
1
From: Axel Heider <axel.heider@hensoldt.net>
2
VACGE and VACGT over to using a gvec hepler and use this to
3
implement the fp16 case.
4
2
3
The interrupt state can change due to:
4
- reset clears both SR.OCIF and CR.OCIE
5
- write to CR.EN or CR.OCIE
6
7
Signed-off-by: Axel Heider <axel.heider@hensoldt.net>
8
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20200828183354.27913-28-peter.maydell@linaro.org
8
---
10
---
9
target/arm/helper.h | 6 ++++++
11
hw/timer/imx_epit.c | 16 ++++++++++++----
10
target/arm/vec_helper.c | 26 ++++++++++++++++++++++++++
12
1 file changed, 12 insertions(+), 4 deletions(-)
11
target/arm/translate-neon.c.inc | 4 ++--
12
3 files changed, 34 insertions(+), 2 deletions(-)
13
13
14
diff --git a/target/arm/helper.h b/target/arm/helper.h
14
diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
15
index XXXXXXX..XXXXXXX 100644
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper.h
16
--- a/hw/timer/imx_epit.c
17
+++ b/target/arm/helper.h
17
+++ b/hw/timer/imx_epit.c
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fcge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
18
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value,
19
DEF_HELPER_FLAGS_5(gvec_fcgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
19
if (s->cr & CR_SWR) {
20
DEF_HELPER_FLAGS_5(gvec_fcgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
20
/* handle the reset */
21
21
imx_epit_reset(DEVICE(s));
22
+DEF_HELPER_FLAGS_5(gvec_facge_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
22
- /*
23
+DEF_HELPER_FLAGS_5(gvec_facge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
23
- * TODO: could we 'break' here? following operations appear
24
- * to duplicate the work imx_epit_reset() already did.
25
- */
26
}
27
28
+ /*
29
+ * The interrupt state can change due to:
30
+ * - reset clears both SR.OCIF and CR.OCIE
31
+ * - write to CR.EN or CR.OCIE
32
+ */
33
+ imx_epit_update_int(s);
24
+
34
+
25
+DEF_HELPER_FLAGS_5(gvec_facgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
35
+ /*
26
+DEF_HELPER_FLAGS_5(gvec_facgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
36
+ * TODO: could we 'break' here for reset? following operations appear
37
+ * to duplicate the work imx_epit_reset() already did.
38
+ */
27
+
39
+
28
DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
40
ptimer_transaction_begin(s->timer_cmp);
29
void, ptr, ptr, ptr, ptr, i32)
41
ptimer_transaction_begin(s->timer_reload);
30
DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
31
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
32
index XXXXXXX..XXXXXXX 100644
33
--- a/target/arm/vec_helper.c
34
+++ b/target/arm/vec_helper.c
35
@@ -XXX,XX +XXX,XX @@ static uint32_t float32_cgt(float32 op1, float32 op2, float_status *stat)
36
return -float32_lt(op2, op1, stat);
37
}
38
39
+static uint16_t float16_acge(float16 op1, float16 op2, float_status *stat)
40
+{
41
+ return -float16_le(float16_abs(op2), float16_abs(op1), stat);
42
+}
43
+
44
+static uint32_t float32_acge(float32 op1, float32 op2, float_status *stat)
45
+{
46
+ return -float32_le(float32_abs(op2), float32_abs(op1), stat);
47
+}
48
+
49
+static uint16_t float16_acgt(float16 op1, float16 op2, float_status *stat)
50
+{
51
+ return -float16_lt(float16_abs(op2), float16_abs(op1), stat);
52
+}
53
+
54
+static uint32_t float32_acgt(float32 op1, float32 op2, float_status *stat)
55
+{
56
+ return -float32_lt(float32_abs(op2), float32_abs(op1), stat);
57
+}
58
+
59
#define DO_2OP(NAME, FUNC, TYPE) \
60
void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc) \
61
{ \
62
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fcge_s, float32_cge, float32)
63
DO_3OP(gvec_fcgt_h, float16_cgt, float16)
64
DO_3OP(gvec_fcgt_s, float32_cgt, float32)
65
66
+DO_3OP(gvec_facge_h, float16_acge, float16)
67
+DO_3OP(gvec_facge_s, float32_acge, float32)
68
+
69
+DO_3OP(gvec_facgt_h, float16_acgt, float16)
70
+DO_3OP(gvec_facgt_s, float32_acgt, float32)
71
+
72
#ifdef TARGET_AARCH64
73
74
DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
75
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
76
index XXXXXXX..XXXXXXX 100644
77
--- a/target/arm/translate-neon.c.inc
78
+++ b/target/arm/translate-neon.c.inc
79
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s, gen_helper_gvec_fmul_h)
80
DO_3S_FP_GVEC(VCEQ, gen_helper_gvec_fceq_s, gen_helper_gvec_fceq_h)
81
DO_3S_FP_GVEC(VCGE, gen_helper_gvec_fcge_s, gen_helper_gvec_fcge_h)
82
DO_3S_FP_GVEC(VCGT, gen_helper_gvec_fcgt_s, gen_helper_gvec_fcgt_h)
83
+DO_3S_FP_GVEC(VACGE, gen_helper_gvec_facge_s, gen_helper_gvec_facge_h)
84
+DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h)
85
86
/*
87
* For all the functions using this macro, size == 1 means fp16,
88
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VCGT, gen_helper_gvec_fcgt_s, gen_helper_gvec_fcgt_h)
89
return do_3same_fp(s, a, FUNC, READS_VD); \
90
}
91
92
-DO_3S_FP(VACGE, gen_helper_neon_acge_f32, false)
93
-DO_3S_FP(VACGT, gen_helper_neon_acgt_f32, false)
94
DO_3S_FP(VMAX, gen_helper_vfp_maxs, false)
95
DO_3S_FP(VMIN, gen_helper_vfp_mins, false)
96
42
97
--
43
--
98
2.20.1
44
2.25.1
99
100
diff view generated by jsdifflib
1
Convert the Neon float-integer VCVT insns to gvec, and use this
1
From: Axel Heider <axel.heider@hensoldt.net>
2
to implement fp16 support for them.
3
2
4
Note that unlike the VFP int<->fp16 VCVT insns we converted
3
Signed-off-by: Axel Heider <axel.heider@hensoldt.net>
5
earlier and which convert to/from a 32-bit integer, these
4
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Neon insns convert to/from 16-bit integers. So we can use
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
the existing vfp conversion helpers for the f32<->u32/i32
6
---
8
case but need to provide our own for f16<->u16/i16.
7
hw/timer/imx_epit.c | 20 ++++++++++++++------
8
1 file changed, 14 insertions(+), 6 deletions(-)
9
9
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
Message-id: 20200828183354.27913-37-peter.maydell@linaro.org
13
---
14
target/arm/helper.h | 9 +++++++++
15
target/arm/vec_helper.c | 29 +++++++++++++++++++++++++++++
16
target/arm/translate-neon.c.inc | 15 ++++-----------
17
3 files changed, 42 insertions(+), 11 deletions(-)
18
19
diff --git a/target/arm/helper.h b/target/arm/helper.h
20
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper.h
12
--- a/hw/timer/imx_epit.c
22
+++ b/target/arm/helper.h
13
+++ b/hw/timer/imx_epit.c
23
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(neon_padds, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
14
@@ -XXX,XX +XXX,XX @@ static void imx_epit_set_freq(IMXEPITState *s)
24
DEF_HELPER_FLAGS_5(neon_pmaxs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
15
/*
25
DEF_HELPER_FLAGS_5(neon_pmins, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
16
* This is called both on hardware (device) reset and software reset.
26
17
*/
27
+DEF_HELPER_FLAGS_4(gvec_sstoh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
18
-static void imx_epit_reset(DeviceState *dev)
28
+DEF_HELPER_FLAGS_4(gvec_sitos, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
19
+static void imx_epit_reset(IMXEPITState *s, bool is_hard_reset)
29
+DEF_HELPER_FLAGS_4(gvec_ustoh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
20
{
30
+DEF_HELPER_FLAGS_4(gvec_uitos, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
21
- IMXEPITState *s = IMX_EPIT(dev);
31
+DEF_HELPER_FLAGS_4(gvec_tosszh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
22
-
32
+DEF_HELPER_FLAGS_4(gvec_tosizs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
23
/* Soft reset doesn't touch some bits; hard reset clears them */
33
+DEF_HELPER_FLAGS_4(gvec_touszh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
24
- s->cr &= (CR_EN|CR_ENMOD|CR_STOPEN|CR_DOZEN|CR_WAITEN|CR_DBGEN);
34
+DEF_HELPER_FLAGS_4(gvec_touizs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
25
+ if (is_hard_reset) {
35
+
26
+ s->cr = 0;
36
DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
27
+ } else {
37
DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
28
+ s->cr &= (CR_EN|CR_ENMOD|CR_STOPEN|CR_DOZEN|CR_WAITEN|CR_DBGEN);
38
DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
29
+ }
39
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
30
s->sr = 0;
40
index XXXXXXX..XXXXXXX 100644
31
s->lr = EPIT_TIMER_MAX;
41
--- a/target/arm/vec_helper.c
32
s->cmp = 0;
42
+++ b/target/arm/vec_helper.c
33
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value,
43
@@ -XXX,XX +XXX,XX @@ static uint32_t float32_acgt(float32 op1, float32 op2, float_status *stat)
34
s->cr = value & 0x03ffffff;
44
return -float32_lt(float32_abs(op2), float32_abs(op1), stat);
35
if (s->cr & CR_SWR) {
36
/* handle the reset */
37
- imx_epit_reset(DEVICE(s));
38
+ imx_epit_reset(s, false);
39
}
40
41
/*
42
@@ -XXX,XX +XXX,XX @@ static void imx_epit_realize(DeviceState *dev, Error **errp)
43
s->timer_cmp = ptimer_init(imx_epit_cmp, s, PTIMER_POLICY_LEGACY);
45
}
44
}
46
45
47
+static int16_t vfp_tosszh(float16 x, void *fpstp)
46
+static void imx_epit_dev_reset(DeviceState *dev)
48
+{
47
+{
49
+ float_status *fpst = fpstp;
48
+ IMXEPITState *s = IMX_EPIT(dev);
50
+ if (float16_is_any_nan(x)) {
49
+ imx_epit_reset(s, true);
51
+ float_raise(float_flag_invalid, fpst);
52
+ return 0;
53
+ }
54
+ return float16_to_int16_round_to_zero(x, fpst);
55
+}
50
+}
56
+
51
+
57
+static uint16_t vfp_touszh(float16 x, void *fpstp)
52
static void imx_epit_class_init(ObjectClass *klass, void *data)
58
+{
53
{
59
+ float_status *fpst = fpstp;
54
DeviceClass *dc = DEVICE_CLASS(klass);
60
+ if (float16_is_any_nan(x)) {
55
61
+ float_raise(float_flag_invalid, fpst);
56
dc->realize = imx_epit_realize;
62
+ return 0;
57
- dc->reset = imx_epit_reset;
63
+ }
58
+ dc->reset = imx_epit_dev_reset;
64
+ return float16_to_uint16_round_to_zero(x, fpst);
59
dc->vmsd = &vmstate_imx_timer_epit;
65
+}
60
dc->desc = "i.MX periodic timer";
66
+
67
#define DO_2OP(NAME, FUNC, TYPE) \
68
void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc) \
69
{ \
70
@@ -XXX,XX +XXX,XX @@ DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
71
DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
72
DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
73
74
+DO_2OP(gvec_sitos, helper_vfp_sitos, int32_t)
75
+DO_2OP(gvec_uitos, helper_vfp_uitos, uint32_t)
76
+DO_2OP(gvec_tosizs, helper_vfp_tosizs, float32)
77
+DO_2OP(gvec_touizs, helper_vfp_touizs, float32)
78
+DO_2OP(gvec_sstoh, int16_to_float16, int16_t)
79
+DO_2OP(gvec_ustoh, uint16_to_float16, uint16_t)
80
+DO_2OP(gvec_tosszh, vfp_tosszh, float16)
81
+DO_2OP(gvec_touszh, vfp_touszh, float16)
82
+
83
#define WRAP_CMP0_FWD(FN, CMPOP, TYPE) \
84
static TYPE TYPE##_##FN##0(TYPE op, float_status *stat) \
85
{ \
86
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
87
index XXXXXXX..XXXXXXX 100644
88
--- a/target/arm/translate-neon.c.inc
89
+++ b/target/arm/translate-neon.c.inc
90
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_fp(DisasContext *s, arg_2misc *a,
91
return true;
92
}
61
}
93
94
-#define DO_2MISC_FP(INSN, FUNC) \
95
- static bool trans_##INSN(DisasContext *s, arg_2misc *a) \
96
- { \
97
- return do_2misc_fp(s, a, FUNC); \
98
- }
99
-
100
-DO_2MISC_FP(VCVT_FS, gen_helper_vfp_sitos)
101
-DO_2MISC_FP(VCVT_FU, gen_helper_vfp_uitos)
102
-DO_2MISC_FP(VCVT_SF, gen_helper_vfp_tosizs)
103
-DO_2MISC_FP(VCVT_UF, gen_helper_vfp_touizs)
104
-
105
#define DO_2MISC_FP_VEC(INSN, HFUNC, SFUNC) \
106
static void gen_##INSN(unsigned vece, uint32_t rd_ofs, \
107
uint32_t rm_ofs, \
108
@@ -XXX,XX +XXX,XX @@ DO_2MISC_FP_VEC(VCGE0_F, gen_helper_gvec_fcge0_h, gen_helper_gvec_fcge0_s)
109
DO_2MISC_FP_VEC(VCEQ0_F, gen_helper_gvec_fceq0_h, gen_helper_gvec_fceq0_s)
110
DO_2MISC_FP_VEC(VCLT0_F, gen_helper_gvec_fclt0_h, gen_helper_gvec_fclt0_s)
111
DO_2MISC_FP_VEC(VCLE0_F, gen_helper_gvec_fcle0_h, gen_helper_gvec_fcle0_s)
112
+DO_2MISC_FP_VEC(VCVT_FS, gen_helper_gvec_sstoh, gen_helper_gvec_sitos)
113
+DO_2MISC_FP_VEC(VCVT_FU, gen_helper_gvec_ustoh, gen_helper_gvec_uitos)
114
+DO_2MISC_FP_VEC(VCVT_SF, gen_helper_gvec_tosszh, gen_helper_gvec_tosizs)
115
+DO_2MISC_FP_VEC(VCVT_UF, gen_helper_gvec_touszh, gen_helper_gvec_touizs)
116
117
static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
118
{
119
--
62
--
120
2.20.1
63
2.25.1
121
122
diff view generated by jsdifflib
1
Implement the fp16 versions of the VFP VCVT instruction forms which
1
From: Axel Heider <axel.heider@hensoldt.net>
2
convert between floating point and integer.
3
2
3
Signed-off-by: Axel Heider <axel.heider@hensoldt.net>
4
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-13-peter.maydell@linaro.org
7
---
6
---
8
target/arm/vfp.decode | 4 +++
7
hw/timer/imx_epit.c | 215 ++++++++++++++++++++++++--------------------
9
target/arm/translate-vfp.c.inc | 65 ++++++++++++++++++++++++++++++++++
8
1 file changed, 117 insertions(+), 98 deletions(-)
10
2 files changed, 69 insertions(+)
11
9
12
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
10
diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
13
index XXXXXXX..XXXXXXX 100644
11
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/vfp.decode
12
--- a/hw/timer/imx_epit.c
15
+++ b/target/arm/vfp.decode
13
+++ b/hw/timer/imx_epit.c
16
@@ -XXX,XX +XXX,XX @@ VCVT_sp ---- 1110 1.11 0111 .... 1010 11.0 .... @vfp_dm_ds
14
@@ -XXX,XX +XXX,XX @@ static void imx_epit_reload_compare_timer(IMXEPITState *s)
17
VCVT_dp ---- 1110 1.11 0111 .... 1011 11.0 .... @vfp_dm_sd
15
}
18
19
# VCVT from integer to floating point: Vm always single; Vd depends on size
20
+VCVT_int_hp ---- 1110 1.11 1000 .... 1001 s:1 1.0 .... \
21
+ vd=%vd_sp vm=%vm_sp
22
VCVT_int_sp ---- 1110 1.11 1000 .... 1010 s:1 1.0 .... \
23
vd=%vd_sp vm=%vm_sp
24
VCVT_int_dp ---- 1110 1.11 1000 .... 1011 s:1 1.0 .... \
25
@@ -XXX,XX +XXX,XX @@ VCVT_fix_dp ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
26
vd=%vd_dp imm=%vm_sp opc=%vcvt_fix_op
27
28
# VCVT float to integer (VCVT and VCVTR): Vd always single; Vd depends on size
29
+VCVT_hp_int ---- 1110 1.11 110 s:1 .... 1001 rz:1 1.0 .... \
30
+ vd=%vd_sp vm=%vm_sp
31
VCVT_sp_int ---- 1110 1.11 110 s:1 .... 1010 rz:1 1.0 .... \
32
vd=%vd_sp vm=%vm_sp
33
VCVT_dp_int ---- 1110 1.11 110 s:1 .... 1011 rz:1 1.0 .... \
34
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
35
index XXXXXXX..XXXXXXX 100644
36
--- a/target/arm/translate-vfp.c.inc
37
+++ b/target/arm/translate-vfp.c.inc
38
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
39
return true;
40
}
16
}
41
17
42
+static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
18
+static void imx_epit_write_cr(IMXEPITState *s, uint32_t value)
43
+{
19
+{
44
+ TCGv_i32 vm;
20
+ uint32_t oldcr = s->cr;
45
+ TCGv_ptr fpst;
21
+
46
+
22
+ s->cr = value & 0x03ffffff;
47
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
23
+
48
+ return false;
24
+ if (s->cr & CR_SWR) {
49
+ }
25
+ /* handle the reset */
50
+
26
+ imx_epit_reset(s, false);
51
+ if (!vfp_access_check(s)) {
27
+ }
52
+ return true;
28
+
53
+ }
29
+ /*
54
+
30
+ * The interrupt state can change due to:
55
+ vm = tcg_temp_new_i32();
31
+ * - reset clears both SR.OCIF and CR.OCIE
56
+ neon_load_reg32(vm, a->vm);
32
+ * - write to CR.EN or CR.OCIE
57
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
33
+ */
58
+ if (a->s) {
34
+ imx_epit_update_int(s);
59
+ /* i32 -> f16 */
35
+
60
+ gen_helper_vfp_sitoh(vm, vm, fpst);
36
+ /*
61
+ } else {
37
+ * TODO: could we 'break' here for reset? following operations appear
62
+ /* u32 -> f16 */
38
+ * to duplicate the work imx_epit_reset() already did.
63
+ gen_helper_vfp_uitoh(vm, vm, fpst);
39
+ */
64
+ }
40
+
65
+ neon_store_reg32(vm, a->vd);
41
+ ptimer_transaction_begin(s->timer_cmp);
66
+ tcg_temp_free_i32(vm);
42
+ ptimer_transaction_begin(s->timer_reload);
67
+ tcg_temp_free_ptr(fpst);
43
+
68
+ return true;
44
+ /* Update the frequency. Has been done already in case of a reset. */
69
+}
45
+ if (!(s->cr & CR_SWR)) {
70
+
46
+ imx_epit_set_freq(s);
71
static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
47
+ }
72
{
48
+
73
TCGv_i32 vm;
49
+ if (s->freq && (s->cr & CR_EN) && !(oldcr & CR_EN)) {
74
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
50
+ if (s->cr & CR_ENMOD) {
75
return true;
51
+ if (s->cr & CR_RLD) {
76
}
52
+ ptimer_set_limit(s->timer_reload, s->lr, 1);
77
53
+ ptimer_set_limit(s->timer_cmp, s->lr, 1);
78
+static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
54
+ } else {
79
+{
55
+ ptimer_set_limit(s->timer_reload, EPIT_TIMER_MAX, 1);
80
+ TCGv_i32 vm;
56
+ ptimer_set_limit(s->timer_cmp, EPIT_TIMER_MAX, 1);
81
+ TCGv_ptr fpst;
57
+ }
82
+
58
+ }
83
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
59
+
84
+ return false;
60
+ imx_epit_reload_compare_timer(s);
85
+ }
61
+ ptimer_run(s->timer_reload, 0);
86
+
62
+ if (s->cr & CR_OCIEN) {
87
+ if (!vfp_access_check(s)) {
63
+ ptimer_run(s->timer_cmp, 0);
88
+ return true;
89
+ }
90
+
91
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
92
+ vm = tcg_temp_new_i32();
93
+ neon_load_reg32(vm, a->vm);
94
+
95
+ if (a->s) {
96
+ if (a->rz) {
97
+ gen_helper_vfp_tosizh(vm, vm, fpst);
98
+ } else {
64
+ } else {
99
+ gen_helper_vfp_tosih(vm, vm, fpst);
65
+ ptimer_stop(s->timer_cmp);
66
+ }
67
+ } else if (!(s->cr & CR_EN)) {
68
+ /* stop both timers */
69
+ ptimer_stop(s->timer_reload);
70
+ ptimer_stop(s->timer_cmp);
71
+ } else if (s->cr & CR_OCIEN) {
72
+ if (!(oldcr & CR_OCIEN)) {
73
+ imx_epit_reload_compare_timer(s);
74
+ ptimer_run(s->timer_cmp, 0);
100
+ }
75
+ }
101
+ } else {
76
+ } else {
102
+ if (a->rz) {
77
+ ptimer_stop(s->timer_cmp);
103
+ gen_helper_vfp_touizh(vm, vm, fpst);
78
+ }
104
+ } else {
79
+
105
+ gen_helper_vfp_touih(vm, vm, fpst);
80
+ ptimer_transaction_commit(s->timer_cmp);
106
+ }
81
+ ptimer_transaction_commit(s->timer_reload);
107
+ }
82
+}
108
+ neon_store_reg32(vm, a->vd);
83
+
109
+ tcg_temp_free_i32(vm);
84
+static void imx_epit_write_sr(IMXEPITState *s, uint32_t value)
110
+ tcg_temp_free_ptr(fpst);
85
+{
111
+ return true;
86
+ /* writing 1 to SR.OCIF clears this bit and turns the interrupt off */
112
+}
87
+ if (value & SR_OCIF) {
113
+
88
+ s->sr = 0; /* SR.OCIF is the only bit in this register anyway */
114
static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
89
+ imx_epit_update_int(s);
90
+ }
91
+}
92
+
93
+static void imx_epit_write_lr(IMXEPITState *s, uint32_t value)
94
+{
95
+ s->lr = value;
96
+
97
+ ptimer_transaction_begin(s->timer_cmp);
98
+ ptimer_transaction_begin(s->timer_reload);
99
+ if (s->cr & CR_RLD) {
100
+ /* Also set the limit if the LRD bit is set */
101
+ /* If IOVW bit is set then set the timer value */
102
+ ptimer_set_limit(s->timer_reload, s->lr, s->cr & CR_IOVW);
103
+ ptimer_set_limit(s->timer_cmp, s->lr, 0);
104
+ } else if (s->cr & CR_IOVW) {
105
+ /* If IOVW bit is set then set the timer value */
106
+ ptimer_set_count(s->timer_reload, s->lr);
107
+ }
108
+ /*
109
+ * Commit the change to s->timer_reload, so it can propagate. Otherwise
110
+ * the timer interrupt may not fire properly. The commit must happen
111
+ * before calling imx_epit_reload_compare_timer(), which reads
112
+ * s->timer_reload internally again.
113
+ */
114
+ ptimer_transaction_commit(s->timer_reload);
115
+ imx_epit_reload_compare_timer(s);
116
+ ptimer_transaction_commit(s->timer_cmp);
117
+}
118
+
119
+static void imx_epit_write_cmp(IMXEPITState *s, uint32_t value)
120
+{
121
+ s->cmp = value;
122
+
123
+ ptimer_transaction_begin(s->timer_cmp);
124
+ imx_epit_reload_compare_timer(s);
125
+ ptimer_transaction_commit(s->timer_cmp);
126
+}
127
+
128
static void imx_epit_write(void *opaque, hwaddr offset, uint64_t value,
129
unsigned size)
115
{
130
{
116
TCGv_i32 vm;
131
IMXEPITState *s = IMX_EPIT(opaque);
132
- uint64_t oldcr;
133
134
DPRINTF("(%s, value = 0x%08x)\n", imx_epit_reg_name(offset >> 2),
135
(uint32_t)value);
136
137
switch (offset >> 2) {
138
case 0: /* CR */
139
-
140
- oldcr = s->cr;
141
- s->cr = value & 0x03ffffff;
142
- if (s->cr & CR_SWR) {
143
- /* handle the reset */
144
- imx_epit_reset(s, false);
145
- }
146
-
147
- /*
148
- * The interrupt state can change due to:
149
- * - reset clears both SR.OCIF and CR.OCIE
150
- * - write to CR.EN or CR.OCIE
151
- */
152
- imx_epit_update_int(s);
153
-
154
- /*
155
- * TODO: could we 'break' here for reset? following operations appear
156
- * to duplicate the work imx_epit_reset() already did.
157
- */
158
-
159
- ptimer_transaction_begin(s->timer_cmp);
160
- ptimer_transaction_begin(s->timer_reload);
161
-
162
- /* Update the frequency. Has been done already in case of a reset. */
163
- if (!(s->cr & CR_SWR)) {
164
- imx_epit_set_freq(s);
165
- }
166
-
167
- if (s->freq && (s->cr & CR_EN) && !(oldcr & CR_EN)) {
168
- if (s->cr & CR_ENMOD) {
169
- if (s->cr & CR_RLD) {
170
- ptimer_set_limit(s->timer_reload, s->lr, 1);
171
- ptimer_set_limit(s->timer_cmp, s->lr, 1);
172
- } else {
173
- ptimer_set_limit(s->timer_reload, EPIT_TIMER_MAX, 1);
174
- ptimer_set_limit(s->timer_cmp, EPIT_TIMER_MAX, 1);
175
- }
176
- }
177
-
178
- imx_epit_reload_compare_timer(s);
179
- ptimer_run(s->timer_reload, 0);
180
- if (s->cr & CR_OCIEN) {
181
- ptimer_run(s->timer_cmp, 0);
182
- } else {
183
- ptimer_stop(s->timer_cmp);
184
- }
185
- } else if (!(s->cr & CR_EN)) {
186
- /* stop both timers */
187
- ptimer_stop(s->timer_reload);
188
- ptimer_stop(s->timer_cmp);
189
- } else if (s->cr & CR_OCIEN) {
190
- if (!(oldcr & CR_OCIEN)) {
191
- imx_epit_reload_compare_timer(s);
192
- ptimer_run(s->timer_cmp, 0);
193
- }
194
- } else {
195
- ptimer_stop(s->timer_cmp);
196
- }
197
-
198
- ptimer_transaction_commit(s->timer_cmp);
199
- ptimer_transaction_commit(s->timer_reload);
200
+ imx_epit_write_cr(s, (uint32_t)value);
201
break;
202
203
- case 1: /* SR - ACK*/
204
- /* writing 1 to SR.OCIF clears this bit and turns the interrupt off */
205
- if (value & SR_OCIF) {
206
- s->sr = 0; /* SR.OCIF is the only bit in this register anyway */
207
- imx_epit_update_int(s);
208
- }
209
+ case 1: /* SR */
210
+ imx_epit_write_sr(s, (uint32_t)value);
211
break;
212
213
- case 2: /* LR - set ticks */
214
- s->lr = value;
215
-
216
- ptimer_transaction_begin(s->timer_cmp);
217
- ptimer_transaction_begin(s->timer_reload);
218
- if (s->cr & CR_RLD) {
219
- /* Also set the limit if the LRD bit is set */
220
- /* If IOVW bit is set then set the timer value */
221
- ptimer_set_limit(s->timer_reload, s->lr, s->cr & CR_IOVW);
222
- ptimer_set_limit(s->timer_cmp, s->lr, 0);
223
- } else if (s->cr & CR_IOVW) {
224
- /* If IOVW bit is set then set the timer value */
225
- ptimer_set_count(s->timer_reload, s->lr);
226
- }
227
- /*
228
- * Commit the change to s->timer_reload, so it can propagate. Otherwise
229
- * the timer interrupt may not fire properly. The commit must happen
230
- * before calling imx_epit_reload_compare_timer(), which reads
231
- * s->timer_reload internally again.
232
- */
233
- ptimer_transaction_commit(s->timer_reload);
234
- imx_epit_reload_compare_timer(s);
235
- ptimer_transaction_commit(s->timer_cmp);
236
+ case 2: /* LR */
237
+ imx_epit_write_lr(s, (uint32_t)value);
238
break;
239
240
case 3: /* CMP */
241
- s->cmp = value;
242
-
243
- ptimer_transaction_begin(s->timer_cmp);
244
- imx_epit_reload_compare_timer(s);
245
- ptimer_transaction_commit(s->timer_cmp);
246
-
247
+ imx_epit_write_cmp(s, (uint32_t)value);
248
break;
249
250
default:
251
qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Bad register at offset 0x%"
252
HWADDR_PRIx "\n", TYPE_IMX_EPIT, __func__, offset);
253
-
254
break;
255
}
256
}
257
+
258
static void imx_epit_cmp(void *opaque)
259
{
260
IMXEPITState *s = IMX_EPIT(opaque);
117
--
261
--
118
2.20.1
262
2.25.1
119
120
diff view generated by jsdifflib
1
Implement VFP fp16 support for fused multiply-add insns
1
From: Axel Heider <axel.heider@hensoldt.net>
2
VFNMA, VFNMS, VFMA, VFMS.
3
2
3
The CNT register is a read-only register. There is no need to
4
store it's value, it can be calculated on demand.
5
The calculated frequency is needed temporarily only.
6
7
Note that this is a migration compatibility break for all boards
8
types that use the EPIT peripheral.
9
10
Signed-off-by: Axel Heider <axel.heider@hensoldt.net>
11
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-7-peter.maydell@linaro.org
7
---
13
---
8
target/arm/helper.h | 1 +
14
include/hw/timer/imx_epit.h | 2 -
9
target/arm/vfp.decode | 5 +++
15
hw/timer/imx_epit.c | 73 ++++++++++++++-----------------------
10
target/arm/vfp_helper.c | 7 ++++
16
2 files changed, 28 insertions(+), 47 deletions(-)
11
target/arm/translate-vfp.c.inc | 64 ++++++++++++++++++++++++++++++++++
12
4 files changed, 77 insertions(+)
13
17
14
diff --git a/target/arm/helper.h b/target/arm/helper.h
18
diff --git a/include/hw/timer/imx_epit.h b/include/hw/timer/imx_epit.h
15
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper.h
20
--- a/include/hw/timer/imx_epit.h
17
+++ b/target/arm/helper.h
21
+++ b/include/hw/timer/imx_epit.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(vfp_fcvt_f64_to_f16, TCG_CALL_NO_RWG, f16, f64, ptr, i32)
22
@@ -XXX,XX +XXX,XX @@ struct IMXEPITState {
19
23
uint32_t sr;
20
DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
24
uint32_t lr;
21
DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
25
uint32_t cmp;
22
+DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, ptr)
26
- uint32_t cnt;
23
27
24
DEF_HELPER_3(recps_f32, f32, env, f32, f32)
28
- uint32_t freq;
25
DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
29
qemu_irq irq;
26
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
30
};
31
32
diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
27
index XXXXXXX..XXXXXXX 100644
33
index XXXXXXX..XXXXXXX 100644
28
--- a/target/arm/vfp.decode
34
--- a/hw/timer/imx_epit.c
29
+++ b/target/arm/vfp.decode
35
+++ b/hw/timer/imx_epit.c
30
@@ -XXX,XX +XXX,XX @@ VDIV_hp ---- 1110 1.00 .... .... 1001 .0.0 .... @vfp_dnm_s
36
@@ -XXX,XX +XXX,XX @@ static void imx_epit_update_int(IMXEPITState *s)
31
VDIV_sp ---- 1110 1.00 .... .... 1010 .0.0 .... @vfp_dnm_s
37
}
32
VDIV_dp ---- 1110 1.00 .... .... 1011 .0.0 .... @vfp_dnm_d
33
34
+VFMA_hp ---- 1110 1.10 .... .... 1001 .0. 0 .... @vfp_dnm_s
35
+VFMS_hp ---- 1110 1.10 .... .... 1001 .1. 0 .... @vfp_dnm_s
36
+VFNMA_hp ---- 1110 1.01 .... .... 1001 .0. 0 .... @vfp_dnm_s
37
+VFNMS_hp ---- 1110 1.01 .... .... 1001 .1. 0 .... @vfp_dnm_s
38
+
39
VFMA_sp ---- 1110 1.10 .... .... 1010 .0. 0 .... @vfp_dnm_s
40
VFMS_sp ---- 1110 1.10 .... .... 1010 .1. 0 .... @vfp_dnm_s
41
VFNMA_sp ---- 1110 1.01 .... .... 1010 .0. 0 .... @vfp_dnm_s
42
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
43
index XXXXXXX..XXXXXXX 100644
44
--- a/target/arm/vfp_helper.c
45
+++ b/target/arm/vfp_helper.c
46
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_u32)(uint32_t a)
47
}
38
}
48
39
49
/* VFPv4 fused multiply-accumulate */
40
-/*
50
+dh_ctype_f16 VFP_HELPER(muladd, h)(dh_ctype_f16 a, dh_ctype_f16 b,
41
- * Must be called from within a ptimer_transaction_begin/commit block
51
+ dh_ctype_f16 c, void *fpstp)
42
- * for both s->timer_cmp and s->timer_reload.
52
+{
43
- */
53
+ float_status *fpst = fpstp;
44
-static void imx_epit_set_freq(IMXEPITState *s)
54
+ return float16_muladd(a, b, c, 0, fpst);
45
+static uint32_t imx_epit_get_freq(IMXEPITState *s)
55
+}
56
+
57
float32 VFP_HELPER(muladd, s)(float32 a, float32 b, float32 c, void *fpstp)
58
{
46
{
59
float_status *fpst = fpstp;
47
- uint32_t clksrc;
60
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
48
- uint32_t prescaler;
61
index XXXXXXX..XXXXXXX 100644
49
-
62
--- a/target/arm/translate-vfp.c.inc
50
- clksrc = extract32(s->cr, CR_CLKSRC_SHIFT, CR_CLKSRC_BITS);
63
+++ b/target/arm/translate-vfp.c.inc
51
- prescaler = 1 + extract32(s->cr, CR_PRESCALE_SHIFT, CR_PRESCALE_BITS);
64
@@ -XXX,XX +XXX,XX @@ static bool trans_VMAXNM_dp(DisasContext *s, arg_VMAXNM_dp *a)
52
-
65
a->vd, a->vn, a->vm, false);
53
- s->freq = imx_ccm_get_clock_frequency(s->ccm,
54
- imx_epit_clocks[clksrc]) / prescaler;
55
-
56
- DPRINTF("Setting ptimer frequency to %u\n", s->freq);
57
-
58
- if (s->freq) {
59
- ptimer_set_freq(s->timer_reload, s->freq);
60
- ptimer_set_freq(s->timer_cmp, s->freq);
61
- }
62
+ uint32_t clksrc = extract32(s->cr, CR_CLKSRC_SHIFT, CR_CLKSRC_BITS);
63
+ uint32_t prescaler = 1 + extract32(s->cr, CR_PRESCALE_SHIFT, CR_PRESCALE_BITS);
64
+ uint32_t f_in = imx_ccm_get_clock_frequency(s->ccm, imx_epit_clocks[clksrc]);
65
+ uint32_t freq = f_in / prescaler;
66
+ DPRINTF("ptimer frequency is %u\n", freq);
67
+ return freq;
66
}
68
}
67
69
68
+static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
70
/*
69
+{
71
@@ -XXX,XX +XXX,XX @@ static void imx_epit_reset(IMXEPITState *s, bool is_hard_reset)
70
+ /*
72
s->sr = 0;
71
+ * VFNMA : fd = muladd(-fd, fn, fm)
73
s->lr = EPIT_TIMER_MAX;
72
+ * VFNMS : fd = muladd(-fd, -fn, fm)
74
s->cmp = 0;
73
+ * VFMA : fd = muladd( fd, fn, fm)
75
- s->cnt = 0;
74
+ * VFMS : fd = muladd( fd, -fn, fm)
76
ptimer_transaction_begin(s->timer_cmp);
75
+ *
77
ptimer_transaction_begin(s->timer_reload);
76
+ * These are fused multiply-add, and must be done as one floating
78
- /* stop both timers */
77
+ * point operation with no rounding between the multiplication and
78
+ * addition steps. NB that doing the negations here as separate
79
+ * steps is correct : an input NaN should come out with its sign
80
+ * bit flipped if it is a negated-input.
81
+ */
82
+ TCGv_ptr fpst;
83
+ TCGv_i32 vn, vm, vd;
84
+
79
+
85
+ /*
80
+ /*
86
+ * Present in VFPv4 only, and only with the FP16 extension.
81
+ * The reset switches off the input clock, so even if the CR.EN is still
87
+ * Note that we can't rely on the SIMDFMAC check alone, because
82
+ * set, the timers are no longer running.
88
+ * in a Neon-no-VFP core that ID register field will be non-zero.
89
+ */
83
+ */
90
+ if (!dc_isar_feature(aa32_fp16_arith, s) ||
84
+ assert(imx_epit_get_freq(s) == 0);
91
+ !dc_isar_feature(aa32_simdfmac, s) ||
85
ptimer_stop(s->timer_cmp);
92
+ !dc_isar_feature(aa32_fpsp_v2, s)) {
86
ptimer_stop(s->timer_reload);
93
+ return false;
87
- /* compute new frequency */
94
+ }
88
- imx_epit_set_freq(s);
95
+
89
/* init both timers to EPIT_TIMER_MAX */
96
+ if (s->vec_len != 0 || s->vec_stride != 0) {
90
ptimer_set_limit(s->timer_cmp, EPIT_TIMER_MAX, 1);
97
+ return false;
91
ptimer_set_limit(s->timer_reload, EPIT_TIMER_MAX, 1);
98
+ }
92
- if (s->freq && (s->cr & CR_EN)) {
99
+
93
- /* if the timer is still enabled, restart it */
100
+ if (!vfp_access_check(s)) {
94
- ptimer_run(s->timer_reload, 0);
101
+ return true;
95
- }
102
+ }
96
ptimer_transaction_commit(s->timer_cmp);
103
+
97
ptimer_transaction_commit(s->timer_reload);
104
+ vn = tcg_temp_new_i32();
98
}
105
+ vm = tcg_temp_new_i32();
99
106
+ vd = tcg_temp_new_i32();
100
-static uint32_t imx_epit_update_count(IMXEPITState *s)
107
+
101
-{
108
+ neon_load_reg32(vn, a->vn);
102
- s->cnt = ptimer_get_count(s->timer_reload);
109
+ neon_load_reg32(vm, a->vm);
103
-
110
+ if (neg_n) {
104
- return s->cnt;
111
+ /* VFNMS, VFMS */
105
-}
112
+ gen_helper_vfp_negh(vn, vn);
106
-
113
+ }
107
static uint64_t imx_epit_read(void *opaque, hwaddr offset, unsigned size)
114
+ neon_load_reg32(vd, a->vd);
115
+ if (neg_d) {
116
+ /* VFNMA, VFNMS */
117
+ gen_helper_vfp_negh(vd, vd);
118
+ }
119
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
120
+ gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
121
+ neon_store_reg32(vd, a->vd);
122
+
123
+ tcg_temp_free_ptr(fpst);
124
+ tcg_temp_free_i32(vn);
125
+ tcg_temp_free_i32(vm);
126
+ tcg_temp_free_i32(vd);
127
+
128
+ return true;
129
+}
130
+
131
static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
132
{
108
{
133
/*
109
IMXEPITState *s = IMX_EPIT(opaque);
134
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
110
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_epit_read(void *opaque, hwaddr offset, unsigned size)
135
MAKE_ONE_VFM_TRANS_FN(VFNMA, PREC, false, true) \
111
break;
136
MAKE_ONE_VFM_TRANS_FN(VFNMS, PREC, true, true)
112
137
113
case 4: /* CNT */
138
+MAKE_VFM_TRANS_FNS(hp)
114
- imx_epit_update_count(s);
139
MAKE_VFM_TRANS_FNS(sp)
115
- reg_value = s->cnt;
140
MAKE_VFM_TRANS_FNS(dp)
116
+ reg_value = ptimer_get_count(s->timer_reload);
141
117
break;
118
119
default:
120
@@ -XXX,XX +XXX,XX @@ static void imx_epit_reload_compare_timer(IMXEPITState *s)
121
{
122
if ((s->cr & (CR_EN | CR_OCIEN)) == (CR_EN | CR_OCIEN)) {
123
/* if the compare feature is on and timers are running */
124
- uint32_t tmp = imx_epit_update_count(s);
125
+ uint32_t tmp = ptimer_get_count(s->timer_reload);
126
uint64_t next;
127
if (tmp > s->cmp) {
128
/* It'll fire in this round of the timer */
129
@@ -XXX,XX +XXX,XX @@ static void imx_epit_reload_compare_timer(IMXEPITState *s)
130
131
static void imx_epit_write_cr(IMXEPITState *s, uint32_t value)
132
{
133
+ uint32_t freq = 0;
134
uint32_t oldcr = s->cr;
135
136
s->cr = value & 0x03ffffff;
137
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write_cr(IMXEPITState *s, uint32_t value)
138
ptimer_transaction_begin(s->timer_cmp);
139
ptimer_transaction_begin(s->timer_reload);
140
141
- /* Update the frequency. Has been done already in case of a reset. */
142
+ /*
143
+ * Update the frequency. In case of a reset the input clock was
144
+ * switched off, so this can be skipped.
145
+ */
146
if (!(s->cr & CR_SWR)) {
147
- imx_epit_set_freq(s);
148
+ freq = imx_epit_get_freq(s);
149
+ if (freq) {
150
+ ptimer_set_freq(s->timer_reload, freq);
151
+ ptimer_set_freq(s->timer_cmp, freq);
152
+ }
153
}
154
155
- if (s->freq && (s->cr & CR_EN) && !(oldcr & CR_EN)) {
156
+ if (freq && (s->cr & CR_EN) && !(oldcr & CR_EN)) {
157
if (s->cr & CR_ENMOD) {
158
if (s->cr & CR_RLD) {
159
ptimer_set_limit(s->timer_reload, s->lr, 1);
160
@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps imx_epit_ops = {
161
162
static const VMStateDescription vmstate_imx_timer_epit = {
163
.name = TYPE_IMX_EPIT,
164
- .version_id = 2,
165
- .minimum_version_id = 2,
166
+ .version_id = 3,
167
+ .minimum_version_id = 3,
168
.fields = (VMStateField[]) {
169
VMSTATE_UINT32(cr, IMXEPITState),
170
VMSTATE_UINT32(sr, IMXEPITState),
171
VMSTATE_UINT32(lr, IMXEPITState),
172
VMSTATE_UINT32(cmp, IMXEPITState),
173
- VMSTATE_UINT32(cnt, IMXEPITState),
174
- VMSTATE_UINT32(freq, IMXEPITState),
175
VMSTATE_PTIMER(timer_reload, IMXEPITState),
176
VMSTATE_PTIMER(timer_cmp, IMXEPITState),
177
VMSTATE_END_OF_LIST()
142
--
178
--
143
2.20.1
179
2.25.1
144
145
diff view generated by jsdifflib
1
Convert the Neon VRINT-with-specified-rounding-mode insns to gvec,
1
From: Axel Heider <axel.heider@hensoldt.net>
2
and use this to implement the fp16 versions.
3
2
3
- fix #1263 for CR writes
4
- rework compare time handling
5
- The compare timer has to run even if CR.OCIEN is not set,
6
as SR.OCIF must be updated.
7
- The compare timer fires exactly once when the
8
compare value is less than the current value, but the
9
reload values is less than the compare value.
10
- The compare timer will never fire if the reload value is
11
less than the compare value. Disable it in this case.
12
13
Signed-off-by: Axel Heider <axel.heider@hensoldt.net>
14
[PMM: fixed minor style nits]
15
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-41-peter.maydell@linaro.org
7
---
17
---
8
target/arm/helper.h | 4 +-
18
hw/timer/imx_epit.c | 192 ++++++++++++++++++++++++++------------------
9
target/arm/vec_helper.c | 21 +++++++++++
19
1 file changed, 116 insertions(+), 76 deletions(-)
10
target/arm/vfp_helper.c | 17 ---------
11
target/arm/translate-neon.c.inc | 67 +++------------------------------
12
4 files changed, 30 insertions(+), 79 deletions(-)
13
20
14
diff --git a/target/arm/helper.h b/target/arm/helper.h
21
diff --git a/hw/timer/imx_epit.c b/hw/timer/imx_epit.c
15
index XXXXXXX..XXXXXXX 100644
22
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper.h
23
--- a/hw/timer/imx_epit.c
17
+++ b/target/arm/helper.h
24
+++ b/hw/timer/imx_epit.c
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
25
@@ -XXX,XX +XXX,XX @@
19
DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
26
* Originally written by Hans Jiang
20
27
* Updated by Peter Chubb
21
DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
28
* Updated by Jean-Christophe Dubois <jcd@tribudubois.net>
22
-DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
29
+ * Updated by Axel Heider
23
30
*
24
DEF_HELPER_FLAGS_3(vfp_fcvt_f16_to_f32, TCG_CALL_NO_RWG, f32, f16, ptr, i32)
31
* This code is licensed under GPL version 2 or later. See
25
DEF_HELPER_FLAGS_3(vfp_fcvt_f32_to_f16, TCG_CALL_NO_RWG, f16, f32, ptr, i32)
32
* the COPYING file in the top-level directory.
26
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
33
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_epit_read(void *opaque, hwaddr offset, unsigned size)
27
DEF_HELPER_FLAGS_4(gvec_vcvt_rm_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
34
return reg_value;
28
DEF_HELPER_FLAGS_4(gvec_vcvt_rm_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
35
}
29
36
30
+DEF_HELPER_FLAGS_4(gvec_vrint_rm_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
37
-/* Must be called from ptimer_transaction_begin/commit block for s->timer_cmp */
31
+DEF_HELPER_FLAGS_4(gvec_vrint_rm_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
38
-static void imx_epit_reload_compare_timer(IMXEPITState *s)
32
+
39
+/*
33
DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
40
+ * Must be called from a ptimer_transaction_begin/commit block for
34
DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
41
+ * s->timer_cmp, but outside of a transaction block of s->timer_reload,
35
DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
42
+ * so the proper counter value is read.
36
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
43
+ */
37
index XXXXXXX..XXXXXXX 100644
44
+static void imx_epit_update_compare_timer(IMXEPITState *s)
38
--- a/target/arm/vec_helper.c
45
{
39
+++ b/target/arm/vec_helper.c
46
- if ((s->cr & (CR_EN | CR_OCIEN)) == (CR_EN | CR_OCIEN)) {
40
@@ -XXX,XX +XXX,XX @@ DO_VCVT_RMODE(gvec_vcvt_rm_sh, helper_vfp_toshh, uint16_t)
47
- /* if the compare feature is on and timers are running */
41
DO_VCVT_RMODE(gvec_vcvt_rm_uh, helper_vfp_touhh, uint16_t)
48
- uint32_t tmp = ptimer_get_count(s->timer_reload);
42
49
- uint64_t next;
43
#undef DO_VCVT_RMODE
50
- if (tmp > s->cmp) {
44
+
51
- /* It'll fire in this round of the timer */
45
+#define DO_VRINT_RMODE(NAME, FUNC, TYPE) \
52
- next = tmp - s->cmp;
46
+ void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc) \
53
- } else { /* catch it next time around */
47
+ { \
54
- next = tmp - s->cmp + ((s->cr & CR_RLD) ? EPIT_TIMER_MAX : s->lr);
48
+ float_status *fpst = stat; \
55
+ uint64_t counter = 0;
49
+ intptr_t i, oprsz = simd_oprsz(desc); \
56
+ bool is_oneshot = false;
50
+ uint32_t rmode = simd_data(desc); \
57
+ /*
51
+ uint32_t prev_rmode = get_float_rounding_mode(fpst); \
58
+ * The compare timer only has to run if the timer peripheral is active
52
+ TYPE *d = vd, *n = vn; \
59
+ * and there is an input clock, Otherwise it can be switched off.
53
+ set_float_rounding_mode(rmode, fpst); \
60
+ */
54
+ for (i = 0; i < oprsz / sizeof(TYPE); i++) { \
61
+ bool is_active = (s->cr & CR_EN) && imx_epit_get_freq(s);
55
+ d[i] = FUNC(n[i], fpst); \
62
+ if (is_active) {
56
+ } \
63
+ /*
57
+ set_float_rounding_mode(prev_rmode, fpst); \
64
+ * Calculate next timeout for compare timer. Reading the reload
58
+ clear_tail(d, oprsz, simd_maxsz(desc)); \
65
+ * counter returns proper results only if pending transactions
66
+ * on it are committed here. Otherwise stale values are be read.
67
+ */
68
+ counter = ptimer_get_count(s->timer_reload);
69
+ uint64_t limit = ptimer_get_limit(s->timer_cmp);
70
+ /*
71
+ * The compare timer is a periodic timer if the limit is at least
72
+ * the compare value. Otherwise it may fire at most once in the
73
+ * current round.
74
+ */
75
+ bool is_oneshot = (limit >= s->cmp);
76
+ if (counter >= s->cmp) {
77
+ /* The compare timer fires in the current round. */
78
+ counter -= s->cmp;
79
+ } else if (!is_oneshot) {
80
+ /*
81
+ * The compare timer fires after a reload, as it is below the
82
+ * compare value already in this round. Note that the counter
83
+ * value calculated below can be above the 32-bit limit, which
84
+ * is legal here because the compare timer is an internal
85
+ * helper ptimer only.
86
+ */
87
+ counter += limit - s->cmp;
88
+ } else {
89
+ /*
90
+ * The compare timer won't fire in this round, and the limit is
91
+ * set to a value below the compare value. This practically means
92
+ * it will never fire, so it can be switched off.
93
+ */
94
+ is_active = false;
95
}
96
- ptimer_set_count(s->timer_cmp, next);
97
}
98
+
99
+ /*
100
+ * Set the compare timer and let it run, or stop it. This is agnostic
101
+ * of CR.OCIEN bit, as this bit affects interrupt generation only. The
102
+ * compare timer needs to run even if no interrupts are to be generated,
103
+ * because the SR.OCIF bit must be updated also.
104
+ * Note that the timer might already be stopped or be running with
105
+ * counter values. However, finding out when an update is needed and
106
+ * when not is not trivial. It's much easier applying the setting again,
107
+ * as this does not harm either and the overhead is negligible.
108
+ */
109
+ if (is_active) {
110
+ ptimer_set_count(s->timer_cmp, counter);
111
+ ptimer_run(s->timer_cmp, is_oneshot ? 1 : 0);
112
+ } else {
113
+ ptimer_stop(s->timer_cmp);
59
+ }
114
+ }
60
+
115
+
61
+DO_VRINT_RMODE(gvec_vrint_rm_h, helper_rinth, uint16_t)
116
}
62
+DO_VRINT_RMODE(gvec_vrint_rm_s, helper_rints, uint32_t)
117
63
+
118
static void imx_epit_write_cr(IMXEPITState *s, uint32_t value)
64
+#undef DO_VRINT_RMODE
119
{
65
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
120
- uint32_t freq = 0;
66
index XXXXXXX..XXXXXXX 100644
121
uint32_t oldcr = s->cr;
67
--- a/target/arm/vfp_helper.c
122
68
+++ b/target/arm/vfp_helper.c
123
s->cr = value & 0x03ffffff;
69
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(set_rmode)(uint32_t rmode, void *fpstp)
124
70
return prev_rmode;
125
if (s->cr & CR_SWR) {
71
}
126
- /* handle the reset */
72
127
+ /*
73
-/* Set the current fp rounding mode in the standard fp status and return
128
+ * Reset clears CR.SWR again. It does not touch CR.EN, but the timers
74
- * the old one. This is for NEON instructions that need to change the
129
+ * are still stopped because the input clock is disabled.
75
- * rounding mode but wish to use the standard FPSCR values for everything
130
+ */
76
- * else. Always set the rounding mode back to the correct value after
131
imx_epit_reset(s, false);
77
- * modifying it.
132
+ } else {
78
- * The argument is a softfloat float_round_ value.
133
+ uint32_t freq;
79
- */
134
+ uint32_t toggled_cr_bits = oldcr ^ s->cr;
80
-uint32_t HELPER(set_neon_rmode)(uint32_t rmode, CPUARMState *env)
135
+ /* re-initialize the limits if CR.RLD has changed */
81
-{
136
+ bool set_limit = toggled_cr_bits & CR_RLD;
82
- float_status *fp_status = &env->vfp.standard_fp_status;
137
+ /* set the counter if the timer got just enabled and CR.ENMOD is set */
83
-
138
+ bool is_switched_on = (toggled_cr_bits & s->cr) & CR_EN;
84
- uint32_t prev_rmode = get_float_rounding_mode(fp_status);
139
+ bool set_counter = is_switched_on && (s->cr & CR_ENMOD);
85
- set_float_rounding_mode(rmode, fp_status);
140
+
86
-
141
+ ptimer_transaction_begin(s->timer_cmp);
87
- return prev_rmode;
142
+ ptimer_transaction_begin(s->timer_reload);
88
-}
143
+ freq = imx_epit_get_freq(s);
89
-
144
+ if (freq) {
90
/* Half precision conversions. */
145
+ ptimer_set_freq(s->timer_reload, freq);
91
float32 HELPER(vfp_fcvt_f16_to_f32)(uint32_t a, void *fpstp, uint32_t ahp_mode)
146
+ ptimer_set_freq(s->timer_cmp, freq);
92
{
147
+ }
93
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
148
+
94
index XXXXXXX..XXXXXXX 100644
149
+ if (set_limit || set_counter) {
95
--- a/target/arm/translate-neon.c.inc
150
+ uint64_t limit = (s->cr & CR_RLD) ? s->lr : EPIT_TIMER_MAX;
96
+++ b/target/arm/translate-neon.c.inc
151
+ ptimer_set_limit(s->timer_reload, limit, set_counter ? 1 : 0);
97
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
152
+ if (set_limit) {
98
return do_2misc_fp(s, a, gen_helper_rints_exact);
153
+ ptimer_set_limit(s->timer_cmp, limit, 0);
99
}
154
+ }
100
155
+ }
101
-static bool do_vrint(DisasContext *s, arg_2misc *a, int rmode)
156
+ /*
102
-{
157
+ * If there is an input clock and the peripheral is enabled, then
158
+ * ensure the wall clock timer is ticking. Otherwise stop the timers.
159
+ * The compare timer will be updated later.
160
+ */
161
+ if (freq && (s->cr & CR_EN)) {
162
+ ptimer_run(s->timer_reload, 0);
163
+ } else {
164
+ ptimer_stop(s->timer_reload);
165
+ }
166
+ /* Commit changes to reload timer, so they can propagate. */
167
+ ptimer_transaction_commit(s->timer_reload);
168
+ /* Update compare timer based on the committed reload timer value. */
169
+ imx_epit_update_compare_timer(s);
170
+ ptimer_transaction_commit(s->timer_cmp);
171
}
172
173
/*
174
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write_cr(IMXEPITState *s, uint32_t value)
175
* - write to CR.EN or CR.OCIE
176
*/
177
imx_epit_update_int(s);
178
-
103
- /*
179
- /*
104
- * Handle a VRINT* operation by iterating 32 bits at a time,
180
- * TODO: could we 'break' here for reset? following operations appear
105
- * with a specified rounding mode in operation.
181
- * to duplicate the work imx_epit_reset() already did.
106
- */
182
- */
107
- int pass;
183
-
108
- TCGv_ptr fpst;
184
- ptimer_transaction_begin(s->timer_cmp);
109
- TCGv_i32 tcg_rmode;
185
- ptimer_transaction_begin(s->timer_reload);
110
-
186
-
111
- if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
187
- /*
112
- !arm_dc_feature(s, ARM_FEATURE_V8)) {
188
- * Update the frequency. In case of a reset the input clock was
113
- return false;
189
- * switched off, so this can be skipped.
190
- */
191
- if (!(s->cr & CR_SWR)) {
192
- freq = imx_epit_get_freq(s);
193
- if (freq) {
194
- ptimer_set_freq(s->timer_reload, freq);
195
- ptimer_set_freq(s->timer_cmp, freq);
196
- }
114
- }
197
- }
115
-
198
-
116
- /* UNDEF accesses to D16-D31 if they don't exist. */
199
- if (freq && (s->cr & CR_EN) && !(oldcr & CR_EN)) {
117
- if (!dc_isar_feature(aa32_simd_r32, s) &&
200
- if (s->cr & CR_ENMOD) {
118
- ((a->vd | a->vm) & 0x10)) {
201
- if (s->cr & CR_RLD) {
119
- return false;
202
- ptimer_set_limit(s->timer_reload, s->lr, 1);
203
- ptimer_set_limit(s->timer_cmp, s->lr, 1);
204
- } else {
205
- ptimer_set_limit(s->timer_reload, EPIT_TIMER_MAX, 1);
206
- ptimer_set_limit(s->timer_cmp, EPIT_TIMER_MAX, 1);
207
- }
208
- }
209
-
210
- imx_epit_reload_compare_timer(s);
211
- ptimer_run(s->timer_reload, 0);
212
- if (s->cr & CR_OCIEN) {
213
- ptimer_run(s->timer_cmp, 0);
214
- } else {
215
- ptimer_stop(s->timer_cmp);
216
- }
217
- } else if (!(s->cr & CR_EN)) {
218
- /* stop both timers */
219
- ptimer_stop(s->timer_reload);
220
- ptimer_stop(s->timer_cmp);
221
- } else if (s->cr & CR_OCIEN) {
222
- if (!(oldcr & CR_OCIEN)) {
223
- imx_epit_reload_compare_timer(s);
224
- ptimer_run(s->timer_cmp, 0);
225
- }
226
- } else {
227
- ptimer_stop(s->timer_cmp);
120
- }
228
- }
121
-
229
-
122
- if (a->size != 2) {
230
- ptimer_transaction_commit(s->timer_cmp);
123
- /* TODO: FP16 will be the size == 1 case */
231
- ptimer_transaction_commit(s->timer_reload);
124
- return false;
232
}
125
- }
233
126
-
234
static void imx_epit_write_sr(IMXEPITState *s, uint32_t value)
127
- if ((a->vd | a->vm) & a->q) {
235
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write_lr(IMXEPITState *s, uint32_t value)
128
- return false;
236
/* If IOVW bit is set then set the timer value */
129
- }
237
ptimer_set_count(s->timer_reload, s->lr);
130
-
238
}
131
- if (!vfp_access_check(s)) {
239
- /*
132
- return true;
240
- * Commit the change to s->timer_reload, so it can propagate. Otherwise
133
- }
241
- * the timer interrupt may not fire properly. The commit must happen
134
-
242
- * before calling imx_epit_reload_compare_timer(), which reads
135
- fpst = fpstatus_ptr(FPST_STD);
243
- * s->timer_reload internally again.
136
- tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rmode));
244
- */
137
- gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
245
+ /* Commit the changes to s->timer_reload, so they can propagate. */
138
- for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
246
ptimer_transaction_commit(s->timer_reload);
139
- TCGv_i32 tmp = neon_load_reg(a->vm, pass);
247
- imx_epit_reload_compare_timer(s);
140
- gen_helper_rints(tmp, tmp, fpst);
248
+ /* Update the compare timer based on the committed reload timer value. */
141
- neon_store_reg(a->vd, pass, tmp);
249
+ imx_epit_update_compare_timer(s);
142
- }
250
ptimer_transaction_commit(s->timer_cmp);
143
- gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
251
}
144
- tcg_temp_free_i32(tcg_rmode);
252
145
- tcg_temp_free_ptr(fpst);
253
@@ -XXX,XX +XXX,XX @@ static void imx_epit_write_cmp(IMXEPITState *s, uint32_t value)
146
-
254
{
147
- return true;
255
s->cmp = value;
148
-}
256
149
-
257
+ /* Update the compare timer based on the committed reload timer value. */
150
-#define DO_VRINT(INSN, RMODE) \
258
ptimer_transaction_begin(s->timer_cmp);
151
- static bool trans_##INSN(DisasContext *s, arg_2misc *a) \
259
- imx_epit_reload_compare_timer(s);
152
- { \
260
+ imx_epit_update_compare_timer(s);
153
- return do_vrint(s, a, RMODE); \
261
ptimer_transaction_commit(s->timer_cmp);
154
- }
262
}
155
-
263
156
-DO_VRINT(VRINTN, FPROUNDING_TIEEVEN)
264
@@ -XXX,XX +XXX,XX @@ static void imx_epit_cmp(void *opaque)
157
-DO_VRINT(VRINTA, FPROUNDING_TIEAWAY)
265
{
158
-DO_VRINT(VRINTZ, FPROUNDING_ZERO)
266
IMXEPITState *s = IMX_EPIT(opaque);
159
-DO_VRINT(VRINTM, FPROUNDING_NEGINF)
267
160
-DO_VRINT(VRINTP, FPROUNDING_POSINF)
268
+ /* The cmp ptimer can't be running when the peripheral is disabled */
161
-
269
+ assert(s->cr & CR_EN);
162
#define DO_VEC_RMODE(INSN, RMODE, OP) \
270
+
163
static void gen_##INSN(unsigned vece, uint32_t rd_ofs, \
271
DPRINTF("sr was %d\n", s->sr);
164
uint32_t rm_ofs, \
272
/* Set interrupt status bit SR.OCIF and update the interrupt state */
165
@@ -XXX,XX +XXX,XX @@ DO_VEC_RMODE(VCVTPS, FPROUNDING_POSINF, vcvt_rm_s)
273
s->sr |= SR_OCIF;
166
DO_VEC_RMODE(VCVTMU, FPROUNDING_NEGINF, vcvt_rm_u)
167
DO_VEC_RMODE(VCVTMS, FPROUNDING_NEGINF, vcvt_rm_s)
168
169
+DO_VEC_RMODE(VRINTN, FPROUNDING_TIEEVEN, vrint_rm_)
170
+DO_VEC_RMODE(VRINTA, FPROUNDING_TIEAWAY, vrint_rm_)
171
+DO_VEC_RMODE(VRINTZ, FPROUNDING_ZERO, vrint_rm_)
172
+DO_VEC_RMODE(VRINTM, FPROUNDING_NEGINF, vrint_rm_)
173
+DO_VEC_RMODE(VRINTP, FPROUNDING_POSINF, vrint_rm_)
174
+
175
static bool trans_VSWP(DisasContext *s, arg_2misc *a)
176
{
177
TCGv_i64 rm, rd;
178
--
274
--
179
2.20.1
275
2.25.1
180
181
diff view generated by jsdifflib
1
Convert the Neon VCVT with-specified-rounding-mode instructions
1
From: Fabiano Rosas <farosas@suse.de>
2
to gvec, and use this to implement fp16 support for them.
3
2
3
Fix these:
4
5
WARNING: Block comments use a leading /* on a separate line
6
WARNING: Block comments use * on subsequent lines
7
WARNING: Block comments use a trailing */ on a separate line
8
9
Signed-off-by: Fabiano Rosas <farosas@suse.de>
10
Reviewed-by: Claudio Fontana <cfontana@suse.de>
11
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
12
Message-id: 20221213190537.511-2-farosas@suse.de
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-40-peter.maydell@linaro.org
7
---
14
---
8
target/arm/helper.h | 5 ++
15
target/arm/helper.c | 323 +++++++++++++++++++++++++++++---------------
9
target/arm/vec_helper.c | 23 +++++++
16
1 file changed, 215 insertions(+), 108 deletions(-)
10
target/arm/translate-neon.c.inc | 105 ++++++++++++--------------------
11
3 files changed, 66 insertions(+), 67 deletions(-)
12
17
13
diff --git a/target/arm/helper.h b/target/arm/helper.h
18
diff --git a/target/arm/helper.c b/target/arm/helper.c
14
index XXXXXXX..XXXXXXX 100644
19
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper.h
20
--- a/target/arm/helper.c
16
+++ b/target/arm/helper.h
21
+++ b/target/arm/helper.c
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
22
@@ -XXX,XX +XXX,XX @@ uint64_t read_raw_cp_reg(CPUARMState *env, const ARMCPRegInfo *ri)
18
DEF_HELPER_FLAGS_4(gvec_vcvt_hs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
23
static void write_raw_cp_reg(CPUARMState *env, const ARMCPRegInfo *ri,
19
DEF_HELPER_FLAGS_4(gvec_vcvt_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
24
uint64_t v)
20
25
{
21
+DEF_HELPER_FLAGS_4(gvec_vcvt_rm_ss, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
26
- /* Raw write of a coprocessor register (as needed for migration, etc).
22
+DEF_HELPER_FLAGS_4(gvec_vcvt_rm_us, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
27
+ /*
23
+DEF_HELPER_FLAGS_4(gvec_vcvt_rm_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
28
+ * Raw write of a coprocessor register (as needed for migration, etc).
24
+DEF_HELPER_FLAGS_4(gvec_vcvt_rm_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
29
* Note that constant registers are treated as write-ignored; the
25
+
30
* caller should check for success by whether a readback gives the
26
DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
31
* value written.
27
DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
32
@@ -XXX,XX +XXX,XX @@ static void write_raw_cp_reg(CPUARMState *env, const ARMCPRegInfo *ri,
28
DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
33
29
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
34
static bool raw_accessors_invalid(const ARMCPRegInfo *ri)
30
index XXXXXXX..XXXXXXX 100644
35
{
31
--- a/target/arm/vec_helper.c
36
- /* Return true if the regdef would cause an assertion if you called
32
+++ b/target/arm/vec_helper.c
37
+ /*
33
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(gvec_vcvt_hs, helper_vfp_toshh_round_to_zero, uint16_t)
38
+ * Return true if the regdef would cause an assertion if you called
34
DO_VCVT_FIXED(gvec_vcvt_hu, helper_vfp_touhh_round_to_zero, uint16_t)
39
* read_raw_cp_reg() or write_raw_cp_reg() on it (ie if it is a
35
40
* program bug for it not to have the NO_RAW flag).
36
#undef DO_VCVT_FIXED
41
* NB that returning false here doesn't necessarily mean that calling
37
+
42
@@ -XXX,XX +XXX,XX @@ bool write_list_to_cpustate(ARMCPU *cpu)
38
+#define DO_VCVT_RMODE(NAME, FUNC, TYPE) \
43
if (ri->type & ARM_CP_NO_RAW) {
39
+ void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc) \
44
continue;
40
+ { \
45
}
41
+ float_status *fpst = stat; \
46
- /* Write value and confirm it reads back as written
42
+ intptr_t i, oprsz = simd_oprsz(desc); \
47
+ /*
43
+ uint32_t rmode = simd_data(desc); \
48
+ * Write value and confirm it reads back as written
44
+ uint32_t prev_rmode = get_float_rounding_mode(fpst); \
49
* (to catch read-only registers and partially read-only
45
+ TYPE *d = vd, *n = vn; \
50
* registers where the incoming migration value doesn't match)
46
+ set_float_rounding_mode(rmode, fpst); \
51
*/
47
+ for (i = 0; i < oprsz / sizeof(TYPE); i++) { \
52
@@ -XXX,XX +XXX,XX @@ static gint cpreg_key_compare(gconstpointer a, gconstpointer b)
48
+ d[i] = FUNC(n[i], 0, fpst); \
53
49
+ } \
54
void init_cpreg_list(ARMCPU *cpu)
50
+ set_float_rounding_mode(prev_rmode, fpst); \
55
{
51
+ clear_tail(d, oprsz, simd_maxsz(desc)); \
56
- /* Initialise the cpreg_tuples[] array based on the cp_regs hash.
52
+ }
57
+ /*
53
+
58
+ * Initialise the cpreg_tuples[] array based on the cp_regs hash.
54
+DO_VCVT_RMODE(gvec_vcvt_rm_ss, helper_vfp_tosls, uint32_t)
59
* Note that we require cpreg_tuples[] to be sorted by key ID.
55
+DO_VCVT_RMODE(gvec_vcvt_rm_us, helper_vfp_touls, uint32_t)
60
*/
56
+DO_VCVT_RMODE(gvec_vcvt_rm_sh, helper_vfp_toshh, uint16_t)
61
GList *keys;
57
+DO_VCVT_RMODE(gvec_vcvt_rm_uh, helper_vfp_touhh, uint16_t)
62
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_el3_aa32ns(CPUARMState *env,
58
+
63
return CP_ACCESS_OK;
59
+#undef DO_VCVT_RMODE
64
}
60
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
65
61
index XXXXXXX..XXXXXXX 100644
66
-/* Some secure-only AArch32 registers trap to EL3 if used from
62
--- a/target/arm/translate-neon.c.inc
67
+/*
63
+++ b/target/arm/translate-neon.c.inc
68
+ * Some secure-only AArch32 registers trap to EL3 if used from
64
@@ -XXX,XX +XXX,XX @@ DO_VRINT(VRINTZ, FPROUNDING_ZERO)
69
* Secure EL1 (but are just ordinary UNDEF in other non-EL3 contexts).
65
DO_VRINT(VRINTM, FPROUNDING_NEGINF)
70
* Note that an access from Secure EL1 can only happen if EL3 is AArch64.
66
DO_VRINT(VRINTP, FPROUNDING_POSINF)
71
* We assume that the .access field is set to PL1_RW.
67
72
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_trap_aa32s_el1(CPUARMState *env,
68
-static bool do_vcvt(DisasContext *s, arg_2misc *a, int rmode, bool is_signed)
73
return CP_ACCESS_TRAP_UNCATEGORIZED;
69
-{
74
}
70
- /*
75
71
- * Handle a VCVT* operation by iterating 32 bits at a time,
76
-/* Check for traps to performance monitor registers, which are controlled
72
- * with a specified rounding mode in operation.
77
+/*
73
- */
78
+ * Check for traps to performance monitor registers, which are controlled
74
- int pass;
79
* by MDCR_EL2.TPM for EL2 and MDCR_EL3.TPM for EL3.
75
- TCGv_ptr fpst;
80
*/
76
- TCGv_i32 tcg_rmode, tcg_shift;
81
static CPAccessResult access_tpm(CPUARMState *env, const ARMCPRegInfo *ri,
77
-
82
@@ -XXX,XX +XXX,XX @@ static void fcse_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
78
- if (!arm_dc_feature(s, ARM_FEATURE_NEON) ||
83
ARMCPU *cpu = env_archcpu(env);
79
- !arm_dc_feature(s, ARM_FEATURE_V8)) {
84
80
- return false;
85
if (raw_read(env, ri) != value) {
81
+#define DO_VEC_RMODE(INSN, RMODE, OP) \
86
- /* Unlike real hardware the qemu TLB uses virtual addresses,
82
+ static void gen_##INSN(unsigned vece, uint32_t rd_ofs, \
87
+ /*
83
+ uint32_t rm_ofs, \
88
+ * Unlike real hardware the qemu TLB uses virtual addresses,
84
+ uint32_t oprsz, uint32_t maxsz) \
89
* not modified virtual addresses, so this causes a TLB flush.
85
+ { \
90
*/
86
+ static gen_helper_gvec_2_ptr * const fns[4] = { \
91
tlb_flush(CPU(cpu));
87
+ NULL, \
92
@@ -XXX,XX +XXX,XX @@ static void contextidr_write(CPUARMState *env, const ARMCPRegInfo *ri,
88
+ gen_helper_gvec_##OP##h, \
93
89
+ gen_helper_gvec_##OP##s, \
94
if (raw_read(env, ri) != value && !arm_feature(env, ARM_FEATURE_PMSA)
90
+ NULL, \
95
&& !extended_addresses_enabled(env)) {
91
+ }; \
96
- /* For VMSA (when not using the LPAE long descriptor page table
92
+ TCGv_ptr fpst; \
97
+ /*
93
+ fpst = fpstatus_ptr(vece == 1 ? FPST_STD_F16 : FPST_STD); \
98
+ * For VMSA (when not using the LPAE long descriptor page table
94
+ tcg_gen_gvec_2_ptr(rd_ofs, rm_ofs, fpst, oprsz, maxsz, \
99
* format) this register includes the ASID, so do a TLB flush.
95
+ arm_rmode_to_sf(RMODE), fns[vece]); \
100
* For PMSA it is purely a process ID and no action is needed.
96
+ tcg_temp_free_ptr(fpst); \
101
*/
97
+ } \
102
@@ -XXX,XX +XXX,XX @@ static void tlbiipas2is_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
98
+ static bool trans_##INSN(DisasContext *s, arg_2misc *a) \
103
}
99
+ { \
104
100
+ if (!arm_dc_feature(s, ARM_FEATURE_V8)) { \
105
static const ARMCPRegInfo cp_reginfo[] = {
101
+ return false; \
106
- /* Define the secure and non-secure FCSE identifier CP registers
102
+ } \
107
+ /*
103
+ if (a->size == MO_16) { \
108
+ * Define the secure and non-secure FCSE identifier CP registers
104
+ if (!dc_isar_feature(aa32_fp16_arith, s)) { \
109
* separately because there is no secure bank in V8 (no _EL3). This allows
105
+ return false; \
110
* the secure register to be properly reset and migrated. There is also no
106
+ } \
111
* v8 EL1 version of the register so the non-secure instance stands alone.
107
+ } else if (a->size != MO_32) { \
112
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cp_reginfo[] = {
108
+ return false; \
113
.access = PL1_RW, .secure = ARM_CP_SECSTATE_S,
109
+ } \
114
.fieldoffset = offsetof(CPUARMState, cp15.fcseidr_s),
110
+ return do_2misc_vec(s, a, gen_##INSN); \
115
.resetvalue = 0, .writefn = fcse_write, .raw_writefn = raw_write, },
111
}
116
- /* Define the secure and non-secure context identifier CP registers
112
117
+ /*
113
- /* UNDEF accesses to D16-D31 if they don't exist. */
118
+ * Define the secure and non-secure context identifier CP registers
114
- if (!dc_isar_feature(aa32_simd_r32, s) &&
119
* separately because there is no secure bank in V8 (no _EL3). This allows
115
- ((a->vd | a->vm) & 0x10)) {
120
* the secure register to be properly reset and migrated. In the
116
- return false;
121
* non-secure case, the 32-bit register will have reset and migration
117
- }
122
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cp_reginfo[] = {
118
-
123
};
119
- if (a->size != 2) {
124
120
- /* TODO: FP16 will be the size == 1 case */
125
static const ARMCPRegInfo not_v8_cp_reginfo[] = {
121
- return false;
126
- /* NB: Some of these registers exist in v8 but with more precise
122
- }
127
+ /*
123
-
128
+ * NB: Some of these registers exist in v8 but with more precise
124
- if ((a->vd | a->vm) & a->q) {
129
* definitions that don't use CP_ANY wildcards (mostly in v8_cp_reginfo[]).
125
- return false;
130
*/
126
- }
131
/* MMU Domain access control / MPU write buffer control */
127
-
132
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo not_v8_cp_reginfo[] = {
128
- if (!vfp_access_check(s)) {
133
.writefn = dacr_write, .raw_writefn = raw_write,
129
- return true;
134
.bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.dacr_s),
130
- }
135
offsetoflow32(CPUARMState, cp15.dacr_ns) } },
131
-
136
- /* ARMv7 allocates a range of implementation defined TLB LOCKDOWN regs.
132
- fpst = fpstatus_ptr(FPST_STD);
137
+ /*
133
- tcg_shift = tcg_const_i32(0);
138
+ * ARMv7 allocates a range of implementation defined TLB LOCKDOWN regs.
134
- tcg_rmode = tcg_const_i32(arm_rmode_to_sf(rmode));
139
* For v6 and v5, these mappings are overly broad.
135
- gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
140
*/
136
- for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
141
{ .name = "TLB_LOCKDOWN", .cp = 15, .crn = 10, .crm = 0,
137
- TCGv_i32 tmp = neon_load_reg(a->vm, pass);
142
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo not_v8_cp_reginfo[] = {
138
- if (is_signed) {
143
};
139
- gen_helper_vfp_tosls(tmp, tmp, tcg_shift, fpst);
144
140
- } else {
145
static const ARMCPRegInfo not_v6_cp_reginfo[] = {
141
- gen_helper_vfp_touls(tmp, tmp, tcg_shift, fpst);
146
- /* Not all pre-v6 cores implemented this WFI, so this is slightly
142
- }
147
+ /*
143
- neon_store_reg(a->vd, pass, tmp);
148
+ * Not all pre-v6 cores implemented this WFI, so this is slightly
144
- }
149
* over-broad.
145
- gen_helper_set_neon_rmode(tcg_rmode, tcg_rmode, cpu_env);
150
*/
146
- tcg_temp_free_i32(tcg_rmode);
151
{ .name = "WFI_v5", .cp = 15, .crn = 7, .crm = 8, .opc1 = 0, .opc2 = 2,
147
- tcg_temp_free_i32(tcg_shift);
152
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo not_v6_cp_reginfo[] = {
148
- tcg_temp_free_ptr(fpst);
153
};
149
-
154
150
- return true;
155
static const ARMCPRegInfo not_v7_cp_reginfo[] = {
151
-}
156
- /* Standard v6 WFI (also used in some pre-v6 cores); not in v7 (which
152
-
157
+ /*
153
-#define DO_VCVT(INSN, RMODE, SIGNED) \
158
+ * Standard v6 WFI (also used in some pre-v6 cores); not in v7 (which
154
- static bool trans_##INSN(DisasContext *s, arg_2misc *a) \
159
* is UNPREDICTABLE; we choose to NOP as most implementations do).
155
- { \
160
*/
156
- return do_vcvt(s, a, RMODE, SIGNED); \
161
{ .name = "WFI_v6", .cp = 15, .crn = 7, .crm = 0, .opc1 = 0, .opc2 = 4,
157
- }
162
.access = PL1_W, .type = ARM_CP_WFI },
158
-
163
- /* L1 cache lockdown. Not architectural in v6 and earlier but in practice
159
-DO_VCVT(VCVTAU, FPROUNDING_TIEAWAY, false)
164
+ /*
160
-DO_VCVT(VCVTAS, FPROUNDING_TIEAWAY, true)
165
+ * L1 cache lockdown. Not architectural in v6 and earlier but in practice
161
-DO_VCVT(VCVTNU, FPROUNDING_TIEEVEN, false)
166
* implemented in 926, 946, 1026, 1136, 1176 and 11MPCore. StrongARM and
162
-DO_VCVT(VCVTNS, FPROUNDING_TIEEVEN, true)
167
* OMAPCP will override this space.
163
-DO_VCVT(VCVTPU, FPROUNDING_POSINF, false)
168
*/
164
-DO_VCVT(VCVTPS, FPROUNDING_POSINF, true)
169
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo not_v7_cp_reginfo[] = {
165
-DO_VCVT(VCVTMU, FPROUNDING_NEGINF, false)
170
{ .name = "DUMMY", .cp = 15, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = CP_ANY,
166
-DO_VCVT(VCVTMS, FPROUNDING_NEGINF, true)
171
.access = PL1_R, .type = ARM_CP_CONST | ARM_CP_NO_RAW,
167
+DO_VEC_RMODE(VCVTAU, FPROUNDING_TIEAWAY, vcvt_rm_u)
172
.resetvalue = 0 },
168
+DO_VEC_RMODE(VCVTAS, FPROUNDING_TIEAWAY, vcvt_rm_s)
173
- /* We don't implement pre-v7 debug but most CPUs had at least a DBGDIDR;
169
+DO_VEC_RMODE(VCVTNU, FPROUNDING_TIEEVEN, vcvt_rm_u)
174
+ /*
170
+DO_VEC_RMODE(VCVTNS, FPROUNDING_TIEEVEN, vcvt_rm_s)
175
+ * We don't implement pre-v7 debug but most CPUs had at least a DBGDIDR;
171
+DO_VEC_RMODE(VCVTPU, FPROUNDING_POSINF, vcvt_rm_u)
176
* implementing it as RAZ means the "debug architecture version" bits
172
+DO_VEC_RMODE(VCVTPS, FPROUNDING_POSINF, vcvt_rm_s)
177
* will read as a reserved value, which should cause Linux to not try
173
+DO_VEC_RMODE(VCVTMU, FPROUNDING_NEGINF, vcvt_rm_u)
178
* to use the debug hardware.
174
+DO_VEC_RMODE(VCVTMS, FPROUNDING_NEGINF, vcvt_rm_s)
179
*/
175
180
{ .name = "DBGDIDR", .cp = 14, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 0,
176
static bool trans_VSWP(DisasContext *s, arg_2misc *a)
181
.access = PL0_R, .type = ARM_CP_CONST, .resetvalue = 0 },
177
{
182
- /* MMU TLB control. Note that the wildcarding means we cover not just
183
+ /*
184
+ * MMU TLB control. Note that the wildcarding means we cover not just
185
* the unified TLB ops but also the dside/iside/inner-shareable variants.
186
*/
187
{ .name = "TLBIALL", .cp = 15, .crn = 8, .crm = CP_ANY,
188
@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
189
190
/* In ARMv8 most bits of CPACR_EL1 are RES0. */
191
if (!arm_feature(env, ARM_FEATURE_V8)) {
192
- /* ARMv7 defines bits for unimplemented coprocessors as RAZ/WI.
193
+ /*
194
+ * ARMv7 defines bits for unimplemented coprocessors as RAZ/WI.
195
* ASEDIS [31] and D32DIS [30] are both UNK/SBZP without VFP.
196
* TRCDIS [28] is RAZ/WI since we do not implement a trace macrocell.
197
*/
198
@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
199
value |= R_CPACR_ASEDIS_MASK;
200
}
201
202
- /* VFPv3 and upwards with NEON implement 32 double precision
203
+ /*
204
+ * VFPv3 and upwards with NEON implement 32 double precision
205
* registers (D0-D31).
206
*/
207
if (!cpu_isar_feature(aa32_simd_r32, env_archcpu(env))) {
208
@@ -XXX,XX +XXX,XX @@ static uint64_t cpacr_read(CPUARMState *env, const ARMCPRegInfo *ri)
209
210
static void cpacr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
211
{
212
- /* Call cpacr_write() so that we reset with the correct RAO bits set
213
+ /*
214
+ * Call cpacr_write() so that we reset with the correct RAO bits set
215
* for our CPU features.
216
*/
217
cpacr_write(env, ri, 0);
218
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
219
{ .name = "MVA_prefetch",
220
.cp = 15, .crn = 7, .crm = 13, .opc1 = 0, .opc2 = 1,
221
.access = PL1_W, .type = ARM_CP_NOP },
222
- /* We need to break the TB after ISB to execute self-modifying code
223
+ /*
224
+ * We need to break the TB after ISB to execute self-modifying code
225
* correctly and also to take any pending interrupts immediately.
226
* So use arm_cp_write_ignore() function instead of ARM_CP_NOP flag.
227
*/
228
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
229
.bank_fieldoffsets = { offsetof(CPUARMState, cp15.ifar_s),
230
offsetof(CPUARMState, cp15.ifar_ns) },
231
.resetvalue = 0, },
232
- /* Watchpoint Fault Address Register : should actually only be present
233
+ /*
234
+ * Watchpoint Fault Address Register : should actually only be present
235
* for 1136, 1176, 11MPCore.
236
*/
237
{ .name = "WFAR", .cp = 15, .crn = 6, .crm = 0, .opc1 = 0, .opc2 = 1,
238
@@ -XXX,XX +XXX,XX @@ static bool event_supported(uint16_t number)
239
static CPAccessResult pmreg_access(CPUARMState *env, const ARMCPRegInfo *ri,
240
bool isread)
241
{
242
- /* Performance monitor registers user accessibility is controlled
243
+ /*
244
+ * Performance monitor registers user accessibility is controlled
245
* by PMUSERENR. MDCR_EL2.TPM and MDCR_EL3.TPM allow configurable
246
* trapping to EL2 or EL3 for other accesses.
247
*/
248
@@ -XXX,XX +XXX,XX @@ static CPAccessResult pmreg_access_ccntr(CPUARMState *env,
249
(MDCR_HPME | MDCR_HPMD | MDCR_HPMN | MDCR_HCCD | MDCR_HLP)
250
#define MDCR_EL3_PMU_ENABLE_BITS (MDCR_SPME | MDCR_SCCD)
251
252
-/* Returns true if the counter (pass 31 for PMCCNTR) should count events using
253
+/*
254
+ * Returns true if the counter (pass 31 for PMCCNTR) should count events using
255
* the current EL, security state, and register configuration.
256
*/
257
static bool pmu_counter_enabled(CPUARMState *env, uint8_t counter)
258
@@ -XXX,XX +XXX,XX @@ static uint64_t pmccntr_read(CPUARMState *env, const ARMCPRegInfo *ri)
259
static void pmselr_write(CPUARMState *env, const ARMCPRegInfo *ri,
260
uint64_t value)
261
{
262
- /* The value of PMSELR.SEL affects the behavior of PMXEVTYPER and
263
+ /*
264
+ * The value of PMSELR.SEL affects the behavior of PMXEVTYPER and
265
* PMXEVCNTR. We allow [0..31] to be written to PMSELR here; in the
266
* meanwhile, we check PMSELR.SEL when PMXEVTYPER and PMXEVCNTR are
267
* accessed.
268
@@ -XXX,XX +XXX,XX @@ static void pmevtyper_write(CPUARMState *env, const ARMCPRegInfo *ri,
269
env->cp15.c14_pmevtyper[counter] = value & PMXEVTYPER_MASK;
270
pmevcntr_op_finish(env, counter);
271
}
272
- /* Attempts to access PMXEVTYPER are CONSTRAINED UNPREDICTABLE when
273
+ /*
274
+ * Attempts to access PMXEVTYPER are CONSTRAINED UNPREDICTABLE when
275
* PMSELR value is equal to or greater than the number of implemented
276
* counters, but not equal to 0x1f. We opt to behave as a RAZ/WI.
277
*/
278
@@ -XXX,XX +XXX,XX @@ static uint64_t pmevcntr_read(CPUARMState *env, const ARMCPRegInfo *ri,
279
}
280
return ret;
281
} else {
282
- /* We opt to behave as a RAZ/WI when attempts to access PM[X]EVCNTR
283
- * are CONSTRAINED UNPREDICTABLE. */
284
+ /*
285
+ * We opt to behave as a RAZ/WI when attempts to access PM[X]EVCNTR
286
+ * are CONSTRAINED UNPREDICTABLE.
287
+ */
288
return 0;
289
}
290
}
291
@@ -XXX,XX +XXX,XX @@ static void pmintenclr_write(CPUARMState *env, const ARMCPRegInfo *ri,
292
static void vbar_write(CPUARMState *env, const ARMCPRegInfo *ri,
293
uint64_t value)
294
{
295
- /* Note that even though the AArch64 view of this register has bits
296
+ /*
297
+ * Note that even though the AArch64 view of this register has bits
298
* [10:0] all RES0 we can only mask the bottom 5, to comply with the
299
* architectural requirements for bits which are RES0 only in some
300
* contexts. (ARMv8 would permit us to do no masking at all, but ARMv7
301
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
302
if (!arm_feature(env, ARM_FEATURE_EL2)) {
303
valid_mask &= ~SCR_HCE;
304
305
- /* On ARMv7, SMD (or SCD as it is called in v7) is only
306
+ /*
307
+ * On ARMv7, SMD (or SCD as it is called in v7) is only
308
* supported if EL2 exists. The bit is UNK/SBZP when
309
* EL2 is unavailable. In QEMU ARMv7, we force it to always zero
310
* when EL2 is unavailable.
311
@@ -XXX,XX +XXX,XX @@ static uint64_t ccsidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
312
{
313
ARMCPU *cpu = env_archcpu(env);
314
315
- /* Acquire the CSSELR index from the bank corresponding to the CCSIDR
316
+ /*
317
+ * Acquire the CSSELR index from the bank corresponding to the CCSIDR
318
* bank
319
*/
320
uint32_t index = A32_BANKED_REG_GET(env, csselr,
321
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
322
/* the old v6 WFI, UNPREDICTABLE in v7 but we choose to NOP */
323
{ .name = "NOP", .cp = 15, .crn = 7, .crm = 0, .opc1 = 0, .opc2 = 4,
324
.access = PL1_W, .type = ARM_CP_NOP },
325
- /* Performance monitors are implementation defined in v7,
326
+ /*
327
+ * Performance monitors are implementation defined in v7,
328
* but with an ARM recommended set of registers, which we
329
* follow.
330
*
331
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
332
.writefn = csselr_write, .resetvalue = 0,
333
.bank_fieldoffsets = { offsetof(CPUARMState, cp15.csselr_s),
334
offsetof(CPUARMState, cp15.csselr_ns) } },
335
- /* Auxiliary ID register: this actually has an IMPDEF value but for now
336
+ /*
337
+ * Auxiliary ID register: this actually has an IMPDEF value but for now
338
* just RAZ for all cores:
339
*/
340
{ .name = "AIDR", .state = ARM_CP_STATE_BOTH,
341
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
342
.access = PL1_R, .type = ARM_CP_CONST,
343
.accessfn = access_aa64_tid1,
344
.resetvalue = 0 },
345
- /* Auxiliary fault status registers: these also are IMPDEF, and we
346
+ /*
347
+ * Auxiliary fault status registers: these also are IMPDEF, and we
348
* choose to RAZ/WI for all cores.
349
*/
350
{ .name = "AFSR0_EL1", .state = ARM_CP_STATE_BOTH,
351
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
352
.opc0 = 3, .opc1 = 0, .crn = 5, .crm = 1, .opc2 = 1,
353
.access = PL1_RW, .accessfn = access_tvm_trvm,
354
.type = ARM_CP_CONST, .resetvalue = 0 },
355
- /* MAIR can just read-as-written because we don't implement caches
356
+ /*
357
+ * MAIR can just read-as-written because we don't implement caches
358
* and so don't need to care about memory attributes.
359
*/
360
{ .name = "MAIR_EL1", .state = ARM_CP_STATE_AA64,
361
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
362
.opc0 = 3, .opc1 = 6, .crn = 10, .crm = 2, .opc2 = 0,
363
.access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.mair_el[3]),
364
.resetvalue = 0 },
365
- /* For non-long-descriptor page tables these are PRRR and NMRR;
366
+ /*
367
+ * For non-long-descriptor page tables these are PRRR and NMRR;
368
* regardless they still act as reads-as-written for QEMU.
369
*/
370
- /* MAIR0/1 are defined separately from their 64-bit counterpart which
371
+ /*
372
+ * MAIR0/1 are defined separately from their 64-bit counterpart which
373
* allows them to assign the correct fieldoffset based on the endianness
374
* handled in the field definitions.
375
*/
376
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6k_cp_reginfo[] = {
377
static CPAccessResult gt_cntfrq_access(CPUARMState *env, const ARMCPRegInfo *ri,
378
bool isread)
379
{
380
- /* CNTFRQ: not visible from PL0 if both PL0PCTEN and PL0VCTEN are zero.
381
+ /*
382
+ * CNTFRQ: not visible from PL0 if both PL0PCTEN and PL0VCTEN are zero.
383
* Writable only at the highest implemented exception level.
384
*/
385
int el = arm_current_el(env);
386
@@ -XXX,XX +XXX,XX @@ static CPAccessResult gt_stimer_access(CPUARMState *env,
387
const ARMCPRegInfo *ri,
388
bool isread)
389
{
390
- /* The AArch64 register view of the secure physical timer is
391
+ /*
392
+ * The AArch64 register view of the secure physical timer is
393
* always accessible from EL3, and configurably accessible from
394
* Secure EL1.
395
*/
396
@@ -XXX,XX +XXX,XX @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
397
ARMGenericTimer *gt = &cpu->env.cp15.c14_timer[timeridx];
398
399
if (gt->ctl & 1) {
400
- /* Timer enabled: calculate and set current ISTATUS, irq, and
401
+ /*
402
+ * Timer enabled: calculate and set current ISTATUS, irq, and
403
* reset timer to when ISTATUS next has to change
404
*/
405
uint64_t offset = timeridx == GTIMER_VIRT ?
406
@@ -XXX,XX +XXX,XX @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
407
/* Next transition is when we hit cval */
408
nexttick = gt->cval + offset;
409
}
410
- /* Note that the desired next expiry time might be beyond the
411
+ /*
412
+ * Note that the desired next expiry time might be beyond the
413
* signed-64-bit range of a QEMUTimer -- in this case we just
414
* set the timer for as far in the future as possible. When the
415
* timer expires we will reset the timer for any remaining period.
416
@@ -XXX,XX +XXX,XX @@ static void gt_ctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
417
/* Enable toggled */
418
gt_recalc_timer(cpu, timeridx);
419
} else if ((oldval ^ value) & 2) {
420
- /* IMASK toggled: don't need to recalculate,
421
+ /*
422
+ * IMASK toggled: don't need to recalculate,
423
* just set the interrupt line based on ISTATUS
424
*/
425
int irqstate = (oldval & 4) && !(value & 2);
426
@@ -XXX,XX +XXX,XX @@ static void arm_gt_cntfrq_reset(CPUARMState *env, const ARMCPRegInfo *opaque)
427
}
428
429
static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
430
- /* Note that CNTFRQ is purely reads-as-written for the benefit
431
+ /*
432
+ * Note that CNTFRQ is purely reads-as-written for the benefit
433
* of software; writing it doesn't actually change the timer frequency.
434
* Our reset value matches the fixed frequency we implement the timer at.
435
*/
436
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
437
.readfn = gt_virt_redir_cval_read, .raw_readfn = raw_read,
438
.writefn = gt_virt_redir_cval_write, .raw_writefn = raw_write,
439
},
440
- /* Secure timer -- this is actually restricted to only EL3
441
+ /*
442
+ * Secure timer -- this is actually restricted to only EL3
443
* and configurably Secure-EL1 via the accessfn.
444
*/
445
{ .name = "CNTPS_TVAL_EL1", .state = ARM_CP_STATE_AA64,
446
@@ -XXX,XX +XXX,XX @@ static CPAccessResult e2h_access(CPUARMState *env, const ARMCPRegInfo *ri,
447
448
#else
449
450
-/* In user-mode most of the generic timer registers are inaccessible
451
+/*
452
+ * In user-mode most of the generic timer registers are inaccessible
453
* however modern kernels (4.12+) allow access to cntvct_el0
454
*/
455
456
@@ -XXX,XX +XXX,XX @@ static uint64_t gt_virt_cnt_read(CPUARMState *env, const ARMCPRegInfo *ri)
457
{
458
ARMCPU *cpu = env_archcpu(env);
459
460
- /* Currently we have no support for QEMUTimer in linux-user so we
461
+ /*
462
+ * Currently we have no support for QEMUTimer in linux-user so we
463
* can't call gt_get_countervalue(env), instead we directly
464
* call the lower level functions.
465
*/
466
@@ -XXX,XX +XXX,XX @@ static CPAccessResult ats_access(CPUARMState *env, const ARMCPRegInfo *ri,
467
bool isread)
468
{
469
if (ri->opc2 & 4) {
470
- /* The ATS12NSO* operations must trap to EL3 or EL2 if executed in
471
+ /*
472
+ * The ATS12NSO* operations must trap to EL3 or EL2 if executed in
473
* Secure EL1 (which can only happen if EL3 is AArch64).
474
* They are simply UNDEF if executed from NS EL1.
475
* They function normally from EL2 or EL3.
476
@@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
477
}
478
}
479
} else {
480
- /* fsr is a DFSR/IFSR value for the short descriptor
481
+ /*
482
+ * fsr is a DFSR/IFSR value for the short descriptor
483
* translation table format (with WnR always clear).
484
* Convert it to a 32-bit PAR.
485
*/
486
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo pmsav8r_cp_reginfo[] = {
487
};
488
489
static const ARMCPRegInfo pmsav7_cp_reginfo[] = {
490
- /* Reset for all these registers is handled in arm_cpu_reset(),
491
+ /*
492
+ * Reset for all these registers is handled in arm_cpu_reset(),
493
* because the PMSAv7 is also used by M-profile CPUs, which do
494
* not register cpregs but still need the state to be reset.
495
*/
496
@@ -XXX,XX +XXX,XX @@ static void vmsa_ttbcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
497
}
498
499
if (arm_feature(env, ARM_FEATURE_LPAE)) {
500
- /* With LPAE the TTBCR could result in a change of ASID
501
+ /*
502
+ * With LPAE the TTBCR could result in a change of ASID
503
* via the TTBCR.A1 bit, so do a TLB flush.
504
*/
505
tlb_flush(CPU(cpu));
506
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
507
offsetoflow32(CPUARMState, cp15.tcr_el[1])} },
508
};
509
510
-/* Note that unlike TTBCR, writing to TTBCR2 does not require flushing
511
+/*
512
+ * Note that unlike TTBCR, writing to TTBCR2 does not require flushing
513
* qemu tlbs nor adjusting cached masks.
514
*/
515
static const ARMCPRegInfo ttbcr2_reginfo = {
516
@@ -XXX,XX +XXX,XX @@ static void omap_wfi_write(CPUARMState *env, const ARMCPRegInfo *ri,
517
static void omap_cachemaint_write(CPUARMState *env, const ARMCPRegInfo *ri,
518
uint64_t value)
519
{
520
- /* On OMAP there are registers indicating the max/min index of dcache lines
521
+ /*
522
+ * On OMAP there are registers indicating the max/min index of dcache lines
523
* containing a dirty line; cache flush operations have to reset these.
524
*/
525
env->cp15.c15_i_max = 0x000;
526
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo omap_cp_reginfo[] = {
527
.crm = 8, .opc1 = 0, .opc2 = 0, .access = PL1_RW,
528
.type = ARM_CP_NO_RAW,
529
.readfn = arm_cp_read_zero, .writefn = omap_wfi_write, },
530
- /* TODO: Peripheral port remap register:
531
+ /*
532
+ * TODO: Peripheral port remap register:
533
* On OMAP2 mcr p15, 0, rn, c15, c2, 4 sets up the interrupt controller
534
* base address at $rn & ~0xfff and map size of 0x200 << ($rn & 0xfff),
535
* when MMU is off.
536
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo xscale_cp_reginfo[] = {
537
.cp = 15, .crn = 1, .crm = 0, .opc1 = 0, .opc2 = 1, .access = PL1_RW,
538
.fieldoffset = offsetof(CPUARMState, cp15.c1_xscaleauxcr),
539
.resetvalue = 0, },
540
- /* XScale specific cache-lockdown: since we have no cache we NOP these
541
+ /*
542
+ * XScale specific cache-lockdown: since we have no cache we NOP these
543
* and hope the guest does not really rely on cache behaviour.
544
*/
545
{ .name = "XSCALE_LOCK_ICACHE_LINE",
546
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo xscale_cp_reginfo[] = {
547
};
548
549
static const ARMCPRegInfo dummy_c15_cp_reginfo[] = {
550
- /* RAZ/WI the whole crn=15 space, when we don't have a more specific
551
+ /*
552
+ * RAZ/WI the whole crn=15 space, when we don't have a more specific
553
* implementation of this implementation-defined space.
554
* Ideally this should eventually disappear in favour of actually
555
* implementing the correct behaviour for all cores.
556
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cache_block_ops_cp_reginfo[] = {
557
};
558
559
static const ARMCPRegInfo cache_test_clean_cp_reginfo[] = {
560
- /* The cache test-and-clean instructions always return (1 << 30)
561
+ /*
562
+ * The cache test-and-clean instructions always return (1 << 30)
563
* to indicate that there are no dirty cache lines.
564
*/
565
{ .name = "TC_DCACHE", .cp = 15, .crn = 7, .crm = 10, .opc1 = 0, .opc2 = 3,
566
@@ -XXX,XX +XXX,XX @@ static uint64_t mpidr_read_val(CPUARMState *env)
567
568
if (arm_feature(env, ARM_FEATURE_V7MP)) {
569
mpidr |= (1U << 31);
570
- /* Cores which are uniprocessor (non-coherent)
571
+ /*
572
+ * Cores which are uniprocessor (non-coherent)
573
* but still implement the MP extensions set
574
* bit 30. (For instance, Cortex-R5).
575
*/
576
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tocu(CPUARMState *env, const ARMCPRegInfo *ri,
577
return do_cacheop_pou_access(env, HCR_TOCU | HCR_TPU);
578
}
579
580
-/* See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions
581
+/*
582
+ * See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions
583
* Page D4-1736 (DDI0487A.b)
584
*/
585
586
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
587
static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri,
588
uint64_t value)
589
{
590
- /* Invalidate by VA, EL2
591
+ /*
592
+ * Invalidate by VA, EL2
593
* Currently handles both VAE2 and VALE2, since we don't support
594
* flush-last-level-only.
595
*/
596
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri,
597
static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri,
598
uint64_t value)
599
{
600
- /* Invalidate by VA, EL3
601
+ /*
602
+ * Invalidate by VA, EL3
603
* Currently handles both VAE3 and VALE3, since we don't support
604
* flush-last-level-only.
605
*/
606
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
607
static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
608
uint64_t value)
609
{
610
- /* Invalidate by VA, EL1&0 (AArch64 version).
611
+ /*
612
+ * Invalidate by VA, EL1&0 (AArch64 version).
613
* Currently handles all of VAE1, VAAE1, VAALE1 and VALE1,
614
* since we don't support flush-for-specific-ASID-only or
615
* flush-last-level-only.
616
@@ -XXX,XX +XXX,XX @@ static CPAccessResult sp_el0_access(CPUARMState *env, const ARMCPRegInfo *ri,
617
bool isread)
618
{
619
if (!(env->pstate & PSTATE_SP)) {
620
- /* Access to SP_EL0 is undefined if it's being used as
621
+ /*
622
+ * Access to SP_EL0 is undefined if it's being used as
623
* the stack pointer.
624
*/
625
return CP_ACCESS_TRAP_UNCATEGORIZED;
626
@@ -XXX,XX +XXX,XX @@ static void sctlr_write(CPUARMState *env, const ARMCPRegInfo *ri,
627
}
628
629
if (raw_read(env, ri) == value) {
630
- /* Skip the TLB flush if nothing actually changed; Linux likes
631
+ /*
632
+ * Skip the TLB flush if nothing actually changed; Linux likes
633
* to do a lot of pointless SCTLR writes.
634
*/
635
return;
636
@@ -XXX,XX +XXX,XX @@ static void mdcr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
637
}
638
639
static const ARMCPRegInfo v8_cp_reginfo[] = {
640
- /* Minimal set of EL0-visible registers. This will need to be expanded
641
+ /*
642
+ * Minimal set of EL0-visible registers. This will need to be expanded
643
* significantly for system emulation of AArch64 CPUs.
644
*/
645
{ .name = "NZCV", .state = ARM_CP_STATE_AA64,
646
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
647
.opc0 = 3, .opc1 = 0, .crn = 4, .crm = 0, .opc2 = 0,
648
.access = PL1_RW,
649
.fieldoffset = offsetof(CPUARMState, banked_spsr[BANK_SVC]) },
650
- /* We rely on the access checks not allowing the guest to write to the
651
+ /*
652
+ * We rely on the access checks not allowing the guest to write to the
653
* state field when SPSel indicates that it's being used as the stack
654
* pointer.
655
*/
656
@@ -XXX,XX +XXX,XX @@ static void do_hcr_write(CPUARMState *env, uint64_t value, uint64_t valid_mask)
657
if (arm_feature(env, ARM_FEATURE_EL3)) {
658
valid_mask &= ~HCR_HCD;
659
} else if (cpu->psci_conduit != QEMU_PSCI_CONDUIT_SMC) {
660
- /* Architecturally HCR.TSC is RES0 if EL3 is not implemented.
661
+ /*
662
+ * Architecturally HCR.TSC is RES0 if EL3 is not implemented.
663
* However, if we're using the SMC PSCI conduit then QEMU is
664
* effectively acting like EL3 firmware and so the guest at
665
* EL2 should retain the ability to prevent EL1 from being
666
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
667
.access = PL2_W, .type = ARM_CP_NO_RAW | ARM_CP_EL3_NO_EL2_UNDEF,
668
.writefn = tlbi_aa64_vae2is_write },
669
#ifndef CONFIG_USER_ONLY
670
- /* Unlike the other EL2-related AT operations, these must
671
+ /*
672
+ * Unlike the other EL2-related AT operations, these must
673
* UNDEF from EL3 if EL2 is not implemented, which is why we
674
* define them here rather than with the rest of the AT ops.
675
*/
676
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
677
.access = PL2_W, .accessfn = at_s1e2_access,
678
.type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC | ARM_CP_EL3_NO_EL2_UNDEF,
679
.writefn = ats_write64 },
680
- /* The AArch32 ATS1H* operations are CONSTRAINED UNPREDICTABLE
681
+ /*
682
+ * The AArch32 ATS1H* operations are CONSTRAINED UNPREDICTABLE
683
* if EL2 is not implemented; we choose to UNDEF. Behaviour at EL3
684
* with SCR.NS == 0 outside Monitor mode is UNPREDICTABLE; we choose
685
* to behave as if SCR.NS was 1.
686
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
687
.writefn = ats1h_write, .type = ARM_CP_NO_RAW | ARM_CP_RAISES_EXC },
688
{ .name = "CNTHCTL_EL2", .state = ARM_CP_STATE_BOTH,
689
.opc0 = 3, .opc1 = 4, .crn = 14, .crm = 1, .opc2 = 0,
690
- /* ARMv7 requires bit 0 and 1 to reset to 1. ARMv8 defines the
691
+ /*
692
+ * ARMv7 requires bit 0 and 1 to reset to 1. ARMv8 defines the
693
* reset values as IMPDEF. We choose to reset to 3 to comply with
694
* both ARMv7 and ARMv8.
695
*/
696
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_sec_cp_reginfo[] = {
697
static CPAccessResult nsacr_access(CPUARMState *env, const ARMCPRegInfo *ri,
698
bool isread)
699
{
700
- /* The NSACR is RW at EL3, and RO for NS EL1 and NS EL2.
701
+ /*
702
+ * The NSACR is RW at EL3, and RO for NS EL1 and NS EL2.
703
* At Secure EL1 it traps to EL3 or EL2.
704
*/
705
if (arm_current_el(env) == 3) {
706
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
707
}
708
}
709
710
-/* We don't know until after realize whether there's a GICv3
711
+/*
712
+ * We don't know until after realize whether there's a GICv3
713
* attached, and that is what registers the gicv3 sysregs.
714
* So we have to fill in the GIC fields in ID_PFR/ID_PFR1_EL1/ID_AA64PFR0_EL1
715
* at runtime.
716
@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
717
}
718
#endif
719
720
-/* Shared logic between LORID and the rest of the LOR* registers.
721
+/*
722
+ * Shared logic between LORID and the rest of the LOR* registers.
723
* Secure state exclusion has already been dealt with.
724
*/
725
static CPAccessResult access_lor_ns(CPUARMState *env,
726
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
727
728
define_arm_cp_regs(cpu, cp_reginfo);
729
if (!arm_feature(env, ARM_FEATURE_V8)) {
730
- /* Must go early as it is full of wildcards that may be
731
+ /*
732
+ * Must go early as it is full of wildcards that may be
733
* overridden by later definitions.
734
*/
735
define_arm_cp_regs(cpu, not_v8_cp_reginfo);
736
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
737
.access = PL1_R, .type = ARM_CP_CONST,
738
.accessfn = access_aa32_tid3,
739
.resetvalue = cpu->isar.id_pfr0 },
740
- /* ID_PFR1 is not a plain ARM_CP_CONST because we don't know
741
+ /*
742
+ * ID_PFR1 is not a plain ARM_CP_CONST because we don't know
743
* the value of the GIC field until after we define these regs.
744
*/
745
{ .name = "ID_PFR1", .state = ARM_CP_STATE_BOTH,
746
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
747
748
define_arm_cp_regs(cpu, el3_regs);
749
}
750
- /* The behaviour of NSACR is sufficiently various that we don't
751
+ /*
752
+ * The behaviour of NSACR is sufficiently various that we don't
753
* try to describe it in a single reginfo:
754
* if EL3 is 64 bit, then trap to EL3 from S EL1,
755
* reads as constant 0xc00 from NS EL1 and NS EL2
756
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
757
if (cpu_isar_feature(aa32_jazelle, cpu)) {
758
define_arm_cp_regs(cpu, jazelle_regs);
759
}
760
- /* Slightly awkwardly, the OMAP and StrongARM cores need all of
761
+ /*
762
+ * Slightly awkwardly, the OMAP and StrongARM cores need all of
763
* cp15 crn=0 to be writes-ignored, whereas for other cores they should
764
* be read-only (ie write causes UNDEF exception).
765
*/
766
{
767
ARMCPRegInfo id_pre_v8_midr_cp_reginfo[] = {
768
- /* Pre-v8 MIDR space.
769
+ /*
770
+ * Pre-v8 MIDR space.
771
* Note that the MIDR isn't a simple constant register because
772
* of the TI925 behaviour where writes to another register can
773
* cause the MIDR value to change.
774
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
775
if (arm_feature(env, ARM_FEATURE_OMAPCP) ||
776
arm_feature(env, ARM_FEATURE_STRONGARM)) {
777
size_t i;
778
- /* Register the blanket "writes ignored" value first to cover the
779
+ /*
780
+ * Register the blanket "writes ignored" value first to cover the
781
* whole space. Then update the specific ID registers to allow write
782
* access, so that they ignore writes rather than causing them to
783
* UNDEF.
784
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
785
.raw_writefn = raw_write,
786
};
787
if (arm_feature(env, ARM_FEATURE_XSCALE)) {
788
- /* Normally we would always end the TB on an SCTLR write, but Linux
789
+ /*
790
+ * Normally we would always end the TB on an SCTLR write, but Linux
791
* arch/arm/mach-pxa/sleep.S expects two instructions following
792
* an MMU enable to execute from cache. Imitate this behaviour.
793
*/
794
@@ -XXX,XX +XXX,XX @@ static void add_cpreg_to_hashtable(ARMCPU *cpu, const ARMCPRegInfo *r,
795
void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
796
const ARMCPRegInfo *r, void *opaque)
797
{
798
- /* Define implementations of coprocessor registers.
799
+ /*
800
+ * Define implementations of coprocessor registers.
801
* We store these in a hashtable because typically
802
* there are less than 150 registers in a space which
803
* is 16*16*16*8*8 = 262144 in size.
804
@@ -XXX,XX +XXX,XX @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
805
default:
806
g_assert_not_reached();
807
}
808
- /* The AArch64 pseudocode CheckSystemAccess() specifies that op1
809
+ /*
810
+ * The AArch64 pseudocode CheckSystemAccess() specifies that op1
811
* encodes a minimum access level for the register. We roll this
812
* runtime check into our general permission check code, so check
813
* here that the reginfo's specified permissions are strict enough
814
@@ -XXX,XX +XXX,XX @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
815
assert((r->access & ~mask) == 0);
816
}
817
818
- /* Check that the register definition has enough info to handle
819
+ /*
820
+ * Check that the register definition has enough info to handle
821
* reads and writes if they are permitted.
822
*/
823
if (!(r->type & (ARM_CP_SPECIAL_MASK | ARM_CP_CONST))) {
824
@@ -XXX,XX +XXX,XX @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
825
continue;
826
}
827
if (state == ARM_CP_STATE_AA32) {
828
- /* Under AArch32 CP registers can be common
829
+ /*
830
+ * Under AArch32 CP registers can be common
831
* (same for secure and non-secure world) or banked.
832
*/
833
char *name;
834
@@ -XXX,XX +XXX,XX @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
835
g_assert_not_reached();
836
}
837
} else {
838
- /* AArch64 registers get mapped to non-secure instance
839
- * of AArch32 */
840
+ /*
841
+ * AArch64 registers get mapped to non-secure instance
842
+ * of AArch32
843
+ */
844
add_cpreg_to_hashtable(cpu, r, opaque, state,
845
ARM_CP_SECSTATE_NS,
846
crm, opc1, opc2, r->name);
847
@@ -XXX,XX +XXX,XX @@ void arm_cp_reset_ignore(CPUARMState *env, const ARMCPRegInfo *opaque)
848
849
static int bad_mode_switch(CPUARMState *env, int mode, CPSRWriteType write_type)
850
{
851
- /* Return true if it is not valid for us to switch to
852
+ /*
853
+ * Return true if it is not valid for us to switch to
854
* this CPU mode (ie all the UNPREDICTABLE cases in
855
* the ARM ARM CPSRWriteByInstr pseudocode).
856
*/
857
@@ -XXX,XX +XXX,XX @@ static int bad_mode_switch(CPUARMState *env, int mode, CPSRWriteType write_type)
858
case ARM_CPU_MODE_UND:
859
case ARM_CPU_MODE_IRQ:
860
case ARM_CPU_MODE_FIQ:
861
- /* Note that we don't implement the IMPDEF NSACR.RFR which in v7
862
+ /*
863
+ * Note that we don't implement the IMPDEF NSACR.RFR which in v7
864
* allows FIQ mode to be Secure-only. (In v8 this doesn't exist.)
865
*/
866
- /* If HCR.TGE is set then changes from Monitor to NS PL1 via MSR
867
+ /*
868
+ * If HCR.TGE is set then changes from Monitor to NS PL1 via MSR
869
* and CPS are treated as illegal mode changes.
870
*/
871
if (write_type == CPSRWriteByInstr &&
872
@@ -XXX,XX +XXX,XX @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
873
env->GE = (val >> 16) & 0xf;
874
}
875
876
- /* In a V7 implementation that includes the security extensions but does
877
+ /*
878
+ * In a V7 implementation that includes the security extensions but does
879
* not include Virtualization Extensions the SCR.FW and SCR.AW bits control
880
* whether non-secure software is allowed to change the CPSR_F and CPSR_A
881
* bits respectively.
882
@@ -XXX,XX +XXX,XX @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
883
changed_daif = (env->daif ^ val) & mask;
884
885
if (changed_daif & CPSR_A) {
886
- /* Check to see if we are allowed to change the masking of async
887
+ /*
888
+ * Check to see if we are allowed to change the masking of async
889
* abort exceptions from a non-secure state.
890
*/
891
if (!(env->cp15.scr_el3 & SCR_AW)) {
892
@@ -XXX,XX +XXX,XX @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
893
}
894
895
if (changed_daif & CPSR_F) {
896
- /* Check to see if we are allowed to change the masking of FIQ
897
+ /*
898
+ * Check to see if we are allowed to change the masking of FIQ
899
* exceptions from a non-secure state.
900
*/
901
if (!(env->cp15.scr_el3 & SCR_FW)) {
902
@@ -XXX,XX +XXX,XX @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
903
mask &= ~CPSR_F;
904
}
905
906
- /* Check whether non-maskable FIQ (NMFI) support is enabled.
907
+ /*
908
+ * Check whether non-maskable FIQ (NMFI) support is enabled.
909
* If this bit is set software is not allowed to mask
910
* FIQs, but is allowed to set CPSR_F to 0.
911
*/
912
@@ -XXX,XX +XXX,XX @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
913
if (write_type != CPSRWriteRaw &&
914
((env->uncached_cpsr ^ val) & mask & CPSR_M)) {
915
if ((env->uncached_cpsr & CPSR_M) == ARM_CPU_MODE_USR) {
916
- /* Note that we can only get here in USR mode if this is a
917
+ /*
918
+ * Note that we can only get here in USR mode if this is a
919
* gdb stub write; for this case we follow the architectural
920
* behaviour for guest writes in USR mode of ignoring an attempt
921
* to switch mode. (Those are caught by translate.c for writes
922
@@ -XXX,XX +XXX,XX @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
923
*/
924
mask &= ~CPSR_M;
925
} else if (bad_mode_switch(env, val & CPSR_M, write_type)) {
926
- /* Attempt to switch to an invalid mode: this is UNPREDICTABLE in
927
+ /*
928
+ * Attempt to switch to an invalid mode: this is UNPREDICTABLE in
929
* v7, and has defined behaviour in v8:
930
* + leave CPSR.M untouched
931
* + allow changes to the other CPSR fields
932
@@ -XXX,XX +XXX,XX @@ static void switch_mode(CPUARMState *env, int mode)
933
env->regs[14] = env->banked_r14[r14_bank_number(mode)];
934
}
935
936
-/* Physical Interrupt Target EL Lookup Table
937
+/*
938
+ * Physical Interrupt Target EL Lookup Table
939
*
940
* [ From ARM ARM section G1.13.4 (Table G1-15) ]
941
*
942
@@ -XXX,XX +XXX,XX @@ uint32_t arm_phys_excp_target_el(CPUState *cs, uint32_t excp_idx,
943
if (arm_feature(env, ARM_FEATURE_EL3)) {
944
rw = ((env->cp15.scr_el3 & SCR_RW) == SCR_RW);
945
} else {
946
- /* Either EL2 is the highest EL (and so the EL2 register width
947
+ /*
948
+ * Either EL2 is the highest EL (and so the EL2 register width
949
* is given by is64); or there is no EL2 or EL3, in which case
950
* the value of 'rw' does not affect the table lookup anyway.
951
*/
952
@@ -XXX,XX +XXX,XX @@ void aarch64_sync_64_to_32(CPUARMState *env)
953
env->banked_r13[bank_number(ARM_CPU_MODE_UND)] = env->xregs[23];
954
}
955
956
- /* Registers x24-x30 are mapped to r8-r14 in FIQ mode. If we are in FIQ
957
+ /*
958
+ * Registers x24-x30 are mapped to r8-r14 in FIQ mode. If we are in FIQ
959
* mode, then we can copy to r8-r14. Otherwise, we copy to the
960
* FIQ bank for r8-r14.
961
*/
962
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch32(CPUState *cs)
963
/* High vectors. When enabled, base address cannot be remapped. */
964
addr += 0xffff0000;
965
} else {
966
- /* ARM v7 architectures provide a vector base address register to remap
967
+ /*
968
+ * ARM v7 architectures provide a vector base address register to remap
969
* the interrupt vector table.
970
* This register is only followed in non-monitor mode, and is banked.
971
* Note: only bits 31:5 are valid.
972
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
973
aarch64_sve_change_el(env, cur_el, new_el, is_a64(env));
974
975
if (cur_el < new_el) {
976
- /* Entry vector offset depends on whether the implemented EL
977
+ /*
978
+ * Entry vector offset depends on whether the implemented EL
979
* immediately lower than the target level is using AArch32 or AArch64
980
*/
981
bool is_aa64;
982
@@ -XXX,XX +XXX,XX @@ static void handle_semihosting(CPUState *cs)
983
}
984
#endif
985
986
-/* Handle a CPU exception for A and R profile CPUs.
987
+/*
988
+ * Handle a CPU exception for A and R profile CPUs.
989
* Do any appropriate logging, handle PSCI calls, and then hand off
990
* to the AArch64-entry or AArch32-entry function depending on the
991
* target exception level's register width.
992
@@ -XXX,XX +XXX,XX @@ void arm_cpu_do_interrupt(CPUState *cs)
993
}
994
#endif
995
996
- /* Hooks may change global state so BQL should be held, also the
997
+ /*
998
+ * Hooks may change global state so BQL should be held, also the
999
* BQL needs to be held for any modification of
1000
* cs->interrupt_request.
1001
*/
1002
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
1003
};
1004
}
1005
1006
-/* Note that signed overflow is undefined in C. The following routines are
1007
- careful to use unsigned types where modulo arithmetic is required.
1008
- Failure to do so _will_ break on newer gcc. */
1009
+/*
1010
+ * Note that signed overflow is undefined in C. The following routines are
1011
+ * careful to use unsigned types where modulo arithmetic is required.
1012
+ * Failure to do so _will_ break on newer gcc.
1013
+ */
1014
1015
/* Signed saturating arithmetic. */
1016
1017
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
1018
return (a & mask) | (b & ~mask);
1019
}
1020
1021
-/* CRC helpers.
1022
+/*
1023
+ * CRC helpers.
1024
* The upper bytes of val (above the number specified by 'bytes') must have
1025
* been zeroed out by the caller.
1026
*/
1027
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
1028
return crc32c(acc, buf, bytes) ^ 0xffffffff;
1029
}
1030
1031
-/* Return the exception level to which FP-disabled exceptions should
1032
+/*
1033
+ * Return the exception level to which FP-disabled exceptions should
1034
* be taken, or 0 if FP is enabled.
1035
*/
1036
int fp_exception_el(CPUARMState *env, int cur_el)
1037
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
1038
#ifndef CONFIG_USER_ONLY
1039
uint64_t hcr_el2;
1040
1041
- /* CPACR and the CPTR registers don't exist before v6, so FP is
1042
+ /*
1043
+ * CPACR and the CPTR registers don't exist before v6, so FP is
1044
* always accessible
1045
*/
1046
if (!arm_feature(env, ARM_FEATURE_V6)) {
1047
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
1048
1049
hcr_el2 = arm_hcr_el2_eff(env);
1050
1051
- /* The CPACR controls traps to EL1, or PL1 if we're 32 bit:
1052
+ /*
1053
+ * The CPACR controls traps to EL1, or PL1 if we're 32 bit:
1054
* 0, 2 : trap EL0 and EL1/PL1 accesses
1055
* 1 : trap only EL0 accesses
1056
* 3 : trap no accesses
178
--
1057
--
179
2.20.1
1058
2.25.1
180
181
diff view generated by jsdifflib
1
From: Leif Lindholm <leif@nuviainc.com>
1
From: Fabiano Rosas <farosas@suse.de>
2
2
3
The sbsa-ref platform uses a minimal device tree to pass amount of memory
3
Fix the following:
4
as well as number of cpus to the firmware. However, when dumping that
5
minimal dtb (with -M sbsa-virt,dumpdtb=<file>), the resulting blob
6
generates a warning when decompiled by dtc due to lack of reg property.
7
4
8
Add a simple reg property per cpu, representing a 64-bit MPIDR_EL1.
5
ERROR: spaces required around that '|' (ctx:VxV)
6
ERROR: space required before the open parenthesis '('
7
ERROR: spaces required around that '+' (ctx:VxB)
8
ERROR: space prohibited between function name and open parenthesis '('
9
9
10
This also ends up being cleaner than having the firmware calculating its
10
(the last two still have some occurrences in macros which I left
11
own IDs for generating APCI.
11
behind because it might impact readability)
12
12
13
Signed-off-by: Leif Lindholm <leif@nuviainc.com>
13
Signed-off-by: Fabiano Rosas <farosas@suse.de>
14
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
14
Reviewed-by: Claudio Fontana <cfontana@suse.de>
15
Message-id: 20200827124335.30586-1-leif@nuviainc.com
15
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
16
Message-id: 20221213190537.511-3-farosas@suse.de
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
17
---
18
---
18
hw/arm/sbsa-ref.c | 29 +++++++++++++++++++++++------
19
target/arm/helper.c | 42 +++++++++++++++++++++---------------------
19
1 file changed, 23 insertions(+), 6 deletions(-)
20
1 file changed, 21 insertions(+), 21 deletions(-)
20
21
21
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
22
diff --git a/target/arm/helper.c b/target/arm/helper.c
22
index XXXXXXX..XXXXXXX 100644
23
index XXXXXXX..XXXXXXX 100644
23
--- a/hw/arm/sbsa-ref.c
24
--- a/target/arm/helper.c
24
+++ b/hw/arm/sbsa-ref.c
25
+++ b/target/arm/helper.c
25
@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
26
@@ -XXX,XX +XXX,XX @@ static void add_cpreg_to_list(gpointer key, gpointer opaque)
26
[SBSA_EHCI] = 11,
27
uint32_t regidx = (uintptr_t)key;
28
const ARMCPRegInfo *ri = get_arm_cp_reginfo(cpu->cp_regs, regidx);
29
30
- if (!(ri->type & (ARM_CP_NO_RAW|ARM_CP_ALIAS))) {
31
+ if (!(ri->type & (ARM_CP_NO_RAW | ARM_CP_ALIAS))) {
32
cpu->cpreg_indexes[cpu->cpreg_array_len] = cpreg_to_kvm_id(regidx);
33
/* The value array need not be initialized at this point */
34
cpu->cpreg_array_len++;
35
@@ -XXX,XX +XXX,XX @@ static void count_cpreg(gpointer key, gpointer opaque)
36
37
ri = g_hash_table_lookup(cpu->cp_regs, key);
38
39
- if (!(ri->type & (ARM_CP_NO_RAW|ARM_CP_ALIAS))) {
40
+ if (!(ri->type & (ARM_CP_NO_RAW | ARM_CP_ALIAS))) {
41
cpu->cpreg_array_len++;
42
}
43
}
44
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6k_cp_reginfo[] = {
45
.resetfn = arm_cp_reset_ignore },
46
{ .name = "TPIDRRO_EL0", .state = ARM_CP_STATE_AA64,
47
.opc0 = 3, .opc1 = 3, .opc2 = 3, .crn = 13, .crm = 0,
48
- .access = PL0_R|PL1_W,
49
+ .access = PL0_R | PL1_W,
50
.fieldoffset = offsetof(CPUARMState, cp15.tpidrro_el[0]),
51
.resetvalue = 0},
52
{ .name = "TPIDRURO", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 3,
53
- .access = PL0_R|PL1_W,
54
+ .access = PL0_R | PL1_W,
55
.bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.tpidruro_s),
56
offsetoflow32(CPUARMState, cp15.tpidruro_ns) },
57
.resetfn = arm_cp_reset_ignore },
58
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cache_block_ops_cp_reginfo[] = {
59
.resetvalue = 0 },
60
/* The cache ops themselves: these all NOP for QEMU */
61
{ .name = "IICR", .cp = 15, .crm = 5, .opc1 = 0,
62
- .access = PL1_W, .type = ARM_CP_NOP|ARM_CP_64BIT },
63
+ .access = PL1_W, .type = ARM_CP_NOP | ARM_CP_64BIT },
64
{ .name = "IDCR", .cp = 15, .crm = 6, .opc1 = 0,
65
- .access = PL1_W, .type = ARM_CP_NOP|ARM_CP_64BIT },
66
+ .access = PL1_W, .type = ARM_CP_NOP | ARM_CP_64BIT },
67
{ .name = "CDCR", .cp = 15, .crm = 12, .opc1 = 0,
68
- .access = PL0_W, .type = ARM_CP_NOP|ARM_CP_64BIT },
69
+ .access = PL0_W, .type = ARM_CP_NOP | ARM_CP_64BIT },
70
{ .name = "PIR", .cp = 15, .crm = 12, .opc1 = 1,
71
- .access = PL0_W, .type = ARM_CP_NOP|ARM_CP_64BIT },
72
+ .access = PL0_W, .type = ARM_CP_NOP | ARM_CP_64BIT },
73
{ .name = "PDR", .cp = 15, .crm = 12, .opc1 = 2,
74
- .access = PL0_W, .type = ARM_CP_NOP|ARM_CP_64BIT },
75
+ .access = PL0_W, .type = ARM_CP_NOP | ARM_CP_64BIT },
76
{ .name = "CIDCR", .cp = 15, .crm = 14, .opc1 = 0,
77
- .access = PL1_W, .type = ARM_CP_NOP|ARM_CP_64BIT },
78
+ .access = PL1_W, .type = ARM_CP_NOP | ARM_CP_64BIT },
27
};
79
};
28
80
29
+static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
81
static const ARMCPRegInfo cache_test_clean_cp_reginfo[] = {
30
+{
82
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
31
+ uint8_t clustersz = ARM_DEFAULT_CPUS_PER_CLUSTER;
83
ARMCPRegInfo cbar = {
32
+ return arm_cpu_mp_affinity(idx, clustersz);
84
.name = "CBAR",
33
+}
85
.cp = 15, .crn = 15, .crm = 0, .opc1 = 4, .opc2 = 0,
34
+
86
- .access = PL1_R|PL3_W, .resetvalue = cpu->reset_cbar,
35
/*
87
+ .access = PL1_R | PL3_W, .resetvalue = cpu->reset_cbar,
36
* Firmware on this machine only uses ACPI table to load OS, these limited
88
.fieldoffset = offsetof(CPUARMState,
37
* device tree nodes are just to let firmware know the info which varies from
89
cp15.c15_config_base_address)
38
@@ -XXX,XX +XXX,XX @@ static void create_fdt(SBSAMachineState *sms)
90
};
39
g_free(matrix);
91
@@ -XXX,XX +XXX,XX @@ static void switch_mode(CPUARMState *env, int mode)
92
return;
93
94
if (old_mode == ARM_CPU_MODE_FIQ) {
95
- memcpy (env->fiq_regs, env->regs + 8, 5 * sizeof(uint32_t));
96
- memcpy (env->regs + 8, env->usr_regs, 5 * sizeof(uint32_t));
97
+ memcpy(env->fiq_regs, env->regs + 8, 5 * sizeof(uint32_t));
98
+ memcpy(env->regs + 8, env->usr_regs, 5 * sizeof(uint32_t));
99
} else if (mode == ARM_CPU_MODE_FIQ) {
100
- memcpy (env->usr_regs, env->regs + 8, 5 * sizeof(uint32_t));
101
- memcpy (env->regs + 8, env->fiq_regs, 5 * sizeof(uint32_t));
102
+ memcpy(env->usr_regs, env->regs + 8, 5 * sizeof(uint32_t));
103
+ memcpy(env->regs + 8, env->fiq_regs, 5 * sizeof(uint32_t));
40
}
104
}
41
105
42
+ /*
106
i = bank_number(old_mode);
43
+ * From Documentation/devicetree/bindings/arm/cpus.yaml
107
@@ -XXX,XX +XXX,XX @@ static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
44
+ * On ARM v8 64-bit systems this property is required
108
RESULT(sum, n, 16); \
45
+ * and matches the MPIDR_EL1 register affinity bits.
109
if (sum >= 0) \
46
+ *
110
ge |= 3 << (n * 2); \
47
+ * * If cpus node's #address-cells property is set to 2
111
- } while(0)
48
+ *
112
+ } while (0)
49
+ * The first reg cell bits [7:0] must be set to
113
50
+ * bits [39:32] of MPIDR_EL1.
114
#define SARITH8(a, b, n, op) do { \
51
+ *
115
int32_t sum; \
52
+ * The second reg cell bits [23:0] must be set to
116
@@ -XXX,XX +XXX,XX @@ static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
53
+ * bits [23:0] of MPIDR_EL1.
117
RESULT(sum, n, 8); \
54
+ */
118
if (sum >= 0) \
55
qemu_fdt_add_subnode(sms->fdt, "/cpus");
119
ge |= 1 << n; \
56
+ qemu_fdt_setprop_cell(sms->fdt, "/cpus", "#address-cells", 2);
120
- } while(0)
57
+ qemu_fdt_setprop_cell(sms->fdt, "/cpus", "#size-cells", 0x0);
121
+ } while (0)
58
122
59
for (cpu = sms->smp_cpus - 1; cpu >= 0; cpu--) {
123
60
char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
124
#define ADD16(a, b, n) SARITH16(a, b, n, +)
61
ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
125
@@ -XXX,XX +XXX,XX @@ static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
62
CPUState *cs = CPU(armcpu);
126
RESULT(sum, n, 16); \
63
+ uint64_t mpidr = sbsa_ref_cpu_mp_affinity(sms, cpu);
127
if ((sum >> 16) == 1) \
64
128
ge |= 3 << (n * 2); \
65
qemu_fdt_add_subnode(sms->fdt, nodename);
129
- } while(0)
66
+ qemu_fdt_setprop_u64(sms->fdt, nodename, "reg", mpidr);
130
+ } while (0)
67
131
68
if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
132
#define ADD8(a, b, n) do { \
69
qemu_fdt_setprop_cell(sms->fdt, nodename, "numa-node-id",
133
uint32_t sum; \
70
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
134
@@ -XXX,XX +XXX,XX @@ static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
71
arm_load_kernel(ARM_CPU(first_cpu), machine, &sms->bootinfo);
135
RESULT(sum, n, 8); \
72
}
136
if ((sum >> 8) == 1) \
73
137
ge |= 1 << n; \
74
-static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
138
- } while(0)
75
-{
139
+ } while (0)
76
- uint8_t clustersz = ARM_DEFAULT_CPUS_PER_CLUSTER;
140
77
- return arm_cpu_mp_affinity(idx, clustersz);
141
#define SUB16(a, b, n) do { \
78
-}
142
uint32_t sum; \
79
-
143
@@ -XXX,XX +XXX,XX @@ static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
80
static const CPUArchIdList *sbsa_ref_possible_cpu_arch_ids(MachineState *ms)
144
RESULT(sum, n, 16); \
81
{
145
if ((sum >> 16) == 0) \
82
unsigned int max_cpus = ms->smp.max_cpus;
146
ge |= 3 << (n * 2); \
147
- } while(0)
148
+ } while (0)
149
150
#define SUB8(a, b, n) do { \
151
uint32_t sum; \
152
@@ -XXX,XX +XXX,XX @@ static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
153
RESULT(sum, n, 8); \
154
if ((sum >> 8) == 0) \
155
ge |= 1 << n; \
156
- } while(0)
157
+ } while (0)
158
159
#define PFX u
160
#define ARITH_GE
83
--
161
--
84
2.20.1
162
2.25.1
85
86
diff view generated by jsdifflib
1
Implement fp16 versions of the VFP VMLA, VMLS, VNMLS, VNMLA, VNMUL
1
From: Fabiano Rosas <farosas@suse.de>
2
instructions. (These are all the remaining ones which we implement
3
via do_vfp_3op_[hsd]p().)
4
2
3
Fix this:
4
ERROR: braces {} are necessary for all arms of this statement
5
6
Signed-off-by: Fabiano Rosas <farosas@suse.de>
7
Reviewed-by: Claudio Fontana <cfontana@suse.de>
8
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
9
Message-id: 20221213190537.511-4-farosas@suse.de
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20200828183354.27913-5-peter.maydell@linaro.org
8
---
11
---
9
target/arm/helper.h | 1 +
12
target/arm/helper.c | 67 ++++++++++++++++++++++++++++-----------------
10
target/arm/vfp.decode | 5 ++
13
1 file changed, 42 insertions(+), 25 deletions(-)
11
target/arm/vfp_helper.c | 5 ++
12
target/arm/translate-vfp.c.inc | 84 ++++++++++++++++++++++++++++++++++
13
4 files changed, 95 insertions(+)
14
14
15
diff --git a/target/arm/helper.h b/target/arm/helper.h
15
diff --git a/target/arm/helper.c b/target/arm/helper.c
16
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/helper.h
17
--- a/target/arm/helper.c
18
+++ b/target/arm/helper.h
18
+++ b/target/arm/helper.c
19
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_maxnumd, f64, f64, f64, ptr)
19
@@ -XXX,XX +XXX,XX @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask,
20
DEF_HELPER_3(vfp_minnumh, f16, f16, f16, ptr)
20
env->CF = (val >> 29) & 1;
21
DEF_HELPER_3(vfp_minnums, f32, f32, f32, ptr)
21
env->VF = (val << 3) & 0x80000000;
22
DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
22
}
23
+DEF_HELPER_1(vfp_negh, f16, f16)
23
- if (mask & CPSR_Q)
24
DEF_HELPER_1(vfp_negs, f32, f32)
24
+ if (mask & CPSR_Q) {
25
DEF_HELPER_1(vfp_negd, f64, f64)
25
env->QF = ((val & CPSR_Q) != 0);
26
DEF_HELPER_1(vfp_abss, f32, f32)
26
- if (mask & CPSR_T)
27
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
27
+ }
28
index XXXXXXX..XXXXXXX 100644
28
+ if (mask & CPSR_T) {
29
--- a/target/arm/vfp.decode
29
env->thumb = ((val & CPSR_T) != 0);
30
+++ b/target/arm/vfp.decode
30
+ }
31
@@ -XXX,XX +XXX,XX @@ VLDM_VSTM_dp ---- 1101 0.1 l:1 rn:4 .... 1011 imm:8 \
31
if (mask & CPSR_IT_0_1) {
32
vd=%vd_dp p=1 u=0 w=1
32
env->condexec_bits &= ~3;
33
33
env->condexec_bits |= (val >> 25) & 3;
34
# 3-register VFP data-processing; bits [23,21:20,6] identify the operation.
34
@@ -XXX,XX +XXX,XX @@ static void switch_mode(CPUARMState *env, int mode)
35
+VMLA_hp ---- 1110 0.00 .... .... 1001 .0.0 .... @vfp_dnm_s
35
int i;
36
VMLA_sp ---- 1110 0.00 .... .... 1010 .0.0 .... @vfp_dnm_s
36
37
VMLA_dp ---- 1110 0.00 .... .... 1011 .0.0 .... @vfp_dnm_d
37
old_mode = env->uncached_cpsr & CPSR_M;
38
38
- if (mode == old_mode)
39
+VMLS_hp ---- 1110 0.00 .... .... 1001 .1.0 .... @vfp_dnm_s
39
+ if (mode == old_mode) {
40
VMLS_sp ---- 1110 0.00 .... .... 1010 .1.0 .... @vfp_dnm_s
40
return;
41
VMLS_dp ---- 1110 0.00 .... .... 1011 .1.0 .... @vfp_dnm_d
41
+ }
42
42
43
+VNMLS_hp ---- 1110 0.01 .... .... 1001 .0.0 .... @vfp_dnm_s
43
if (old_mode == ARM_CPU_MODE_FIQ) {
44
VNMLS_sp ---- 1110 0.01 .... .... 1010 .0.0 .... @vfp_dnm_s
44
memcpy(env->fiq_regs, env->regs + 8, 5 * sizeof(uint32_t));
45
VNMLS_dp ---- 1110 0.01 .... .... 1011 .0.0 .... @vfp_dnm_d
45
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch32(CPUState *cs)
46
46
new_mode = ARM_CPU_MODE_UND;
47
+VNMLA_hp ---- 1110 0.01 .... .... 1001 .1.0 .... @vfp_dnm_s
47
addr = 0x04;
48
VNMLA_sp ---- 1110 0.01 .... .... 1010 .1.0 .... @vfp_dnm_s
48
mask = CPSR_I;
49
VNMLA_dp ---- 1110 0.01 .... .... 1011 .1.0 .... @vfp_dnm_d
49
- if (env->thumb)
50
50
+ if (env->thumb) {
51
@@ -XXX,XX +XXX,XX @@ VMUL_hp ---- 1110 0.10 .... .... 1001 .0.0 .... @vfp_dnm_s
51
offset = 2;
52
VMUL_sp ---- 1110 0.10 .... .... 1010 .0.0 .... @vfp_dnm_s
52
- else
53
VMUL_dp ---- 1110 0.10 .... .... 1011 .0.0 .... @vfp_dnm_d
53
+ } else {
54
54
offset = 4;
55
+VNMUL_hp ---- 1110 0.10 .... .... 1001 .1.0 .... @vfp_dnm_s
55
+ }
56
VNMUL_sp ---- 1110 0.10 .... .... 1010 .1.0 .... @vfp_dnm_s
56
break;
57
VNMUL_dp ---- 1110 0.10 .... .... 1011 .1.0 .... @vfp_dnm_d
57
case EXCP_SWI:
58
58
new_mode = ARM_CPU_MODE_SVC;
59
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
59
@@ -XXX,XX +XXX,XX @@ static inline uint16_t add16_sat(uint16_t a, uint16_t b)
60
index XXXXXXX..XXXXXXX 100644
60
61
--- a/target/arm/vfp_helper.c
61
res = a + b;
62
+++ b/target/arm/vfp_helper.c
62
if (((res ^ a) & 0x8000) && !((a ^ b) & 0x8000)) {
63
@@ -XXX,XX +XXX,XX @@ VFP_BINOP(minnum)
63
- if (a & 0x8000)
64
VFP_BINOP(maxnum)
64
+ if (a & 0x8000) {
65
#undef VFP_BINOP
65
res = 0x8000;
66
66
- else
67
+dh_ctype_f16 VFP_HELPER(neg, h)(dh_ctype_f16 a)
67
+ } else {
68
+{
68
res = 0x7fff;
69
+ return float16_chs(a);
69
+ }
70
+}
70
}
71
+
71
return res;
72
float32 VFP_HELPER(neg, s)(float32 a)
72
}
73
@@ -XXX,XX +XXX,XX @@ static inline uint8_t add8_sat(uint8_t a, uint8_t b)
74
75
res = a + b;
76
if (((res ^ a) & 0x80) && !((a ^ b) & 0x80)) {
77
- if (a & 0x80)
78
+ if (a & 0x80) {
79
res = 0x80;
80
- else
81
+ } else {
82
res = 0x7f;
83
+ }
84
}
85
return res;
86
}
87
@@ -XXX,XX +XXX,XX @@ static inline uint16_t sub16_sat(uint16_t a, uint16_t b)
88
89
res = a - b;
90
if (((res ^ a) & 0x8000) && ((a ^ b) & 0x8000)) {
91
- if (a & 0x8000)
92
+ if (a & 0x8000) {
93
res = 0x8000;
94
- else
95
+ } else {
96
res = 0x7fff;
97
+ }
98
}
99
return res;
100
}
101
@@ -XXX,XX +XXX,XX @@ static inline uint8_t sub8_sat(uint8_t a, uint8_t b)
102
103
res = a - b;
104
if (((res ^ a) & 0x80) && ((a ^ b) & 0x80)) {
105
- if (a & 0x80)
106
+ if (a & 0x80) {
107
res = 0x80;
108
- else
109
+ } else {
110
res = 0x7f;
111
+ }
112
}
113
return res;
114
}
115
@@ -XXX,XX +XXX,XX @@ static inline uint16_t add16_usat(uint16_t a, uint16_t b)
73
{
116
{
74
return float32_chs(a);
117
uint16_t res;
75
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
118
res = a + b;
76
index XXXXXXX..XXXXXXX 100644
119
- if (res < a)
77
--- a/target/arm/translate-vfp.c.inc
120
+ if (res < a) {
78
+++ b/target/arm/translate-vfp.c.inc
121
res = 0xffff;
79
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
122
+ }
80
return true;
123
return res;
81
}
124
}
82
125
83
+static void gen_VMLA_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
126
static inline uint16_t sub16_usat(uint16_t a, uint16_t b)
84
+{
85
+ /* Note that order of inputs to the add matters for NaNs */
86
+ TCGv_i32 tmp = tcg_temp_new_i32();
87
+
88
+ gen_helper_vfp_mulh(tmp, vn, vm, fpst);
89
+ gen_helper_vfp_addh(vd, vd, tmp, fpst);
90
+ tcg_temp_free_i32(tmp);
91
+}
92
+
93
+static bool trans_VMLA_hp(DisasContext *s, arg_VMLA_sp *a)
94
+{
95
+ return do_vfp_3op_hp(s, gen_VMLA_hp, a->vd, a->vn, a->vm, true);
96
+}
97
+
98
static void gen_VMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
99
{
127
{
100
/* Note that order of inputs to the add matters for NaNs */
128
- if (a > b)
101
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLA_dp(DisasContext *s, arg_VMLA_dp *a)
129
+ if (a > b) {
102
return do_vfp_3op_dp(s, gen_VMLA_dp, a->vd, a->vn, a->vm, true);
130
return a - b;
131
- else
132
+ } else {
133
return 0;
134
+ }
103
}
135
}
104
136
105
+static void gen_VMLS_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
137
static inline uint8_t add8_usat(uint8_t a, uint8_t b)
106
+{
107
+ /*
108
+ * VMLS: vd = vd + -(vn * vm)
109
+ * Note that order of inputs to the add matters for NaNs.
110
+ */
111
+ TCGv_i32 tmp = tcg_temp_new_i32();
112
+
113
+ gen_helper_vfp_mulh(tmp, vn, vm, fpst);
114
+ gen_helper_vfp_negh(tmp, tmp);
115
+ gen_helper_vfp_addh(vd, vd, tmp, fpst);
116
+ tcg_temp_free_i32(tmp);
117
+}
118
+
119
+static bool trans_VMLS_hp(DisasContext *s, arg_VMLS_sp *a)
120
+{
121
+ return do_vfp_3op_hp(s, gen_VMLS_hp, a->vd, a->vn, a->vm, true);
122
+}
123
+
124
static void gen_VMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
125
{
138
{
126
/*
139
uint8_t res;
127
@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_dp(DisasContext *s, arg_VMLS_dp *a)
140
res = a + b;
128
return do_vfp_3op_dp(s, gen_VMLS_dp, a->vd, a->vn, a->vm, true);
141
- if (res < a)
142
+ if (res < a) {
143
res = 0xff;
144
+ }
145
return res;
129
}
146
}
130
147
131
+static void gen_VNMLS_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
148
static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
132
+{
133
+ /*
134
+ * VNMLS: -fd + (fn * fm)
135
+ * Note that it isn't valid to replace (-A + B) with (B - A) or similar
136
+ * plausible looking simplifications because this will give wrong results
137
+ * for NaNs.
138
+ */
139
+ TCGv_i32 tmp = tcg_temp_new_i32();
140
+
141
+ gen_helper_vfp_mulh(tmp, vn, vm, fpst);
142
+ gen_helper_vfp_negh(vd, vd);
143
+ gen_helper_vfp_addh(vd, vd, tmp, fpst);
144
+ tcg_temp_free_i32(tmp);
145
+}
146
+
147
+static bool trans_VNMLS_hp(DisasContext *s, arg_VNMLS_sp *a)
148
+{
149
+ return do_vfp_3op_hp(s, gen_VNMLS_hp, a->vd, a->vn, a->vm, true);
150
+}
151
+
152
static void gen_VNMLS_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
153
{
149
{
154
/*
150
- if (a > b)
155
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLS_dp(DisasContext *s, arg_VNMLS_dp *a)
151
+ if (a > b) {
156
return do_vfp_3op_dp(s, gen_VNMLS_dp, a->vd, a->vn, a->vm, true);
152
return a - b;
153
- else
154
+ } else {
155
return 0;
156
+ }
157
}
157
}
158
158
159
+static void gen_VNMLA_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
159
#define ADD16(a, b, n) RESULT(add16_usat(a, b), n, 16);
160
+{
160
@@ -XXX,XX +XXX,XX @@ static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
161
+ /* VNMLA: -fd + -(fn * fm) */
161
162
+ TCGv_i32 tmp = tcg_temp_new_i32();
162
static inline uint8_t do_usad(uint8_t a, uint8_t b)
163
+
164
+ gen_helper_vfp_mulh(tmp, vn, vm, fpst);
165
+ gen_helper_vfp_negh(tmp, tmp);
166
+ gen_helper_vfp_negh(vd, vd);
167
+ gen_helper_vfp_addh(vd, vd, tmp, fpst);
168
+ tcg_temp_free_i32(tmp);
169
+}
170
+
171
+static bool trans_VNMLA_hp(DisasContext *s, arg_VNMLA_sp *a)
172
+{
173
+ return do_vfp_3op_hp(s, gen_VNMLA_hp, a->vd, a->vn, a->vm, true);
174
+}
175
+
176
static void gen_VNMLA_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
177
{
163
{
178
/* VNMLA: -fd + -(fn * fm) */
164
- if (a > b)
179
@@ -XXX,XX +XXX,XX @@ static bool trans_VMUL_dp(DisasContext *s, arg_VMUL_dp *a)
165
+ if (a > b) {
180
return do_vfp_3op_dp(s, gen_helper_vfp_muld, a->vd, a->vn, a->vm, false);
166
return a - b;
167
- else
168
+ } else {
169
return b - a;
170
+ }
181
}
171
}
182
172
183
+static void gen_VNMUL_hp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
173
/* Unsigned sum of absolute byte differences. */
184
+{
174
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
185
+ /* VNMUL: -(fn * fm) */
175
uint32_t mask;
186
+ gen_helper_vfp_mulh(vd, vn, vm, fpst);
176
187
+ gen_helper_vfp_negh(vd, vd);
177
mask = 0;
188
+}
178
- if (flags & 1)
189
+
179
+ if (flags & 1) {
190
+static bool trans_VNMUL_hp(DisasContext *s, arg_VNMUL_sp *a)
180
mask |= 0xff;
191
+{
181
- if (flags & 2)
192
+ return do_vfp_3op_hp(s, gen_VNMUL_hp, a->vd, a->vn, a->vm, false);
182
+ }
193
+}
183
+ if (flags & 2) {
194
+
184
mask |= 0xff00;
195
static void gen_VNMUL_sp(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm, TCGv_ptr fpst)
185
- if (flags & 4)
196
{
186
+ }
197
/* VNMUL: -(fn * fm) */
187
+ if (flags & 4) {
188
mask |= 0xff0000;
189
- if (flags & 8)
190
+ }
191
+ if (flags & 8) {
192
mask |= 0xff000000;
193
+ }
194
return (a & mask) | (b & ~mask);
195
}
196
198
--
197
--
199
2.20.1
198
2.25.1
200
201
diff view generated by jsdifflib
1
In the gvec helper functions for indexed operations, for AArch32
1
From: Fabiano Rosas <farosas@suse.de>
2
Neon the oprsz (total size of the vector) can be less than 16 bytes
3
if the operation is on a D reg. Since the inner loop in these
4
helpers always goes from 0 to segment, we must clamp it based
5
on oprsz to avoid processing a full 16 byte segment when asked to
6
handle an 8 byte wide vector.
7
2
3
Signed-off-by: Fabiano Rosas <farosas@suse.de>
4
Reviewed-by: Claudio Fontana <cfontana@suse.de>
5
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
6
Message-id: 20221213190537.511-5-farosas@suse.de
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20200828183354.27913-43-peter.maydell@linaro.org
11
---
8
---
12
target/arm/vec_helper.c | 12 ++++++++----
9
target/arm/m_helper.c | 16 ----------------
13
1 file changed, 8 insertions(+), 4 deletions(-)
10
1 file changed, 16 deletions(-)
14
11
15
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
12
diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
16
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
17
--- a/target/arm/vec_helper.c
14
--- a/target/arm/m_helper.c
18
+++ b/target/arm/vec_helper.c
15
+++ b/target/arm/m_helper.c
19
@@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
16
@@ -XXX,XX +XXX,XX @@
20
#define DO_MUL_IDX(NAME, TYPE, H) \
17
*/
21
void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc) \
18
22
{ \
19
#include "qemu/osdep.h"
23
- intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \
20
-#include "qemu/units.h"
24
+ intptr_t i, j, oprsz = simd_oprsz(desc); \
21
-#include "target/arm/idau.h"
25
+ intptr_t segment = MIN(16, oprsz) / sizeof(TYPE); \
22
-#include "trace.h"
26
intptr_t idx = simd_data(desc); \
23
#include "cpu.h"
27
TYPE *d = vd, *n = vn, *m = vm; \
24
#include "internals.h"
28
for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \
25
-#include "exec/gdbstub.h"
29
@@ -XXX,XX +XXX,XX @@ DO_MUL_IDX(gvec_mul_idx_d, uint64_t, )
26
#include "exec/helper-proto.h"
30
#define DO_MLA_IDX(NAME, TYPE, OP, H) \
27
-#include "qemu/host-utils.h"
31
void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc) \
28
#include "qemu/main-loop.h"
32
{ \
29
#include "qemu/bitops.h"
33
- intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \
30
-#include "qemu/crc32c.h"
34
+ intptr_t i, j, oprsz = simd_oprsz(desc); \
31
-#include "qemu/qemu-print.h"
35
+ intptr_t segment = MIN(16, oprsz) / sizeof(TYPE); \
32
#include "qemu/log.h"
36
intptr_t idx = simd_data(desc); \
33
#include "exec/exec-all.h"
37
TYPE *d = vd, *n = vn, *m = vm, *a = va; \
34
-#include <zlib.h> /* For crc32 */
38
for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \
35
-#include "semihosting/semihost.h"
39
@@ -XXX,XX +XXX,XX @@ DO_MLA_IDX(gvec_mls_idx_d, uint64_t, -, )
36
-#include "sysemu/cpus.h"
40
#define DO_FMUL_IDX(NAME, TYPE, H) \
37
-#include "sysemu/kvm.h"
41
void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
38
-#include "qemu/range.h"
42
{ \
39
-#include "qapi/qapi-commands-machine-target.h"
43
- intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \
40
-#include "qapi/error.h"
44
+ intptr_t i, j, oprsz = simd_oprsz(desc); \
41
-#include "qemu/guest-random.h"
45
+ intptr_t segment = MIN(16, oprsz) / sizeof(TYPE); \
42
#ifdef CONFIG_TCG
46
intptr_t idx = simd_data(desc); \
43
-#include "arm_ldst.h"
47
TYPE *d = vd, *n = vn, *m = vm; \
44
#include "exec/cpu_ldst.h"
48
for (i = 0; i < oprsz / sizeof(TYPE); i += segment) { \
45
#include "semihosting/common-semi.h"
49
@@ -XXX,XX +XXX,XX @@ DO_FMUL_IDX(gvec_fmul_idx_d, float64, )
46
#endif
50
void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, \
51
void *stat, uint32_t desc) \
52
{ \
53
- intptr_t i, j, oprsz = simd_oprsz(desc), segment = 16 / sizeof(TYPE); \
54
+ intptr_t i, j, oprsz = simd_oprsz(desc); \
55
+ intptr_t segment = MIN(16, oprsz) / sizeof(TYPE); \
56
TYPE op1_neg = extract32(desc, SIMD_DATA_SHIFT, 1); \
57
intptr_t idx = desc >> (SIMD_DATA_SHIFT + 1); \
58
TYPE *d = vd, *n = vn, *m = vm, *a = va; \
59
--
47
--
60
2.20.1
48
2.25.1
61
62
diff view generated by jsdifflib
1
Convert the Neon VRINTX insn to use gvec, and use this to implement
1
From: Fabiano Rosas <farosas@suse.de>
2
fp16 support for it.
3
2
3
Signed-off-by: Fabiano Rosas <farosas@suse.de>
4
Reviewed-by: Claudio Fontana <cfontana@suse.de>
5
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
6
Message-id: 20221213190537.511-6-farosas@suse.de
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-42-peter.maydell@linaro.org
7
---
8
---
8
target/arm/helper.h | 3 +++
9
target/arm/helper.c | 7 -------
9
target/arm/vec_helper.c | 3 +++
10
1 file changed, 7 deletions(-)
10
target/arm/translate-neon.c.inc | 45 +++------------------------------
11
3 files changed, 9 insertions(+), 42 deletions(-)
12
11
13
diff --git a/target/arm/helper.h b/target/arm/helper.h
12
diff --git a/target/arm/helper.c b/target/arm/helper.c
14
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper.h
14
--- a/target/arm/helper.c
16
+++ b/target/arm/helper.h
15
+++ b/target/arm/helper.c
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_rm_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
16
@@ -XXX,XX +XXX,XX @@
18
DEF_HELPER_FLAGS_4(gvec_vrint_rm_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
17
*/
19
DEF_HELPER_FLAGS_4(gvec_vrint_rm_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
18
20
19
#include "qemu/osdep.h"
21
+DEF_HELPER_FLAGS_4(gvec_vrintx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
20
-#include "qemu/units.h"
22
+DEF_HELPER_FLAGS_4(gvec_vrintx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
21
#include "qemu/log.h"
23
+
22
#include "trace.h"
24
DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
23
#include "cpu.h"
25
DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
24
#include "internals.h"
26
DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
25
#include "exec/helper-proto.h"
27
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
26
-#include "qemu/host-utils.h"
28
index XXXXXXX..XXXXXXX 100644
27
#include "qemu/main-loop.h"
29
--- a/target/arm/vec_helper.c
28
#include "qemu/timer.h"
30
+++ b/target/arm/vec_helper.c
29
#include "qemu/bitops.h"
31
@@ -XXX,XX +XXX,XX @@ DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
30
@@ -XXX,XX +XXX,XX @@
32
DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
31
#include "exec/exec-all.h"
33
DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
32
#include <zlib.h> /* For crc32 */
34
33
#include "hw/irq.h"
35
+DO_2OP(gvec_vrintx_h, float16_round_to_int, float16)
34
-#include "semihosting/semihost.h"
36
+DO_2OP(gvec_vrintx_s, float32_round_to_int, float32)
35
-#include "sysemu/cpus.h"
37
+
36
#include "sysemu/cpu-timers.h"
38
DO_2OP(gvec_sitos, helper_vfp_sitos, int32_t)
37
#include "sysemu/kvm.h"
39
DO_2OP(gvec_uitos, helper_vfp_uitos, uint32_t)
38
-#include "qemu/range.h"
40
DO_2OP(gvec_tosizs, helper_vfp_tosizs, float32)
39
#include "qapi/qapi-commands-machine-target.h"
41
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
40
#include "qapi/error.h"
42
index XXXXXXX..XXXXXXX 100644
41
#include "qemu/guest-random.h"
43
--- a/target/arm/translate-neon.c.inc
42
#ifdef CONFIG_TCG
44
+++ b/target/arm/translate-neon.c.inc
43
-#include "arm_ldst.h"
45
@@ -XXX,XX +XXX,XX @@ static bool trans_VQNEG(DisasContext *s, arg_2misc *a)
44
-#include "exec/cpu_ldst.h"
46
return do_2misc(s, a, fn[a->size]);
45
#include "semihosting/common-semi.h"
47
}
46
#endif
48
47
#include "cpregs.h"
49
-static bool do_2misc_fp(DisasContext *s, arg_2misc *a,
50
- NeonGenOneSingleOpFn *fn)
51
-{
52
- int pass;
53
- TCGv_ptr fpst;
54
-
55
- /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
56
- if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
57
- return false;
58
- }
59
-
60
- /* UNDEF accesses to D16-D31 if they don't exist. */
61
- if (!dc_isar_feature(aa32_simd_r32, s) &&
62
- ((a->vd | a->vm) & 0x10)) {
63
- return false;
64
- }
65
-
66
- if (a->size != 2) {
67
- /* TODO: FP16 will be the size == 1 case */
68
- return false;
69
- }
70
-
71
- if ((a->vd | a->vm) & a->q) {
72
- return false;
73
- }
74
-
75
- if (!vfp_access_check(s)) {
76
- return true;
77
- }
78
-
79
- fpst = fpstatus_ptr(FPST_STD);
80
- for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
81
- TCGv_i32 tmp = neon_load_reg(a->vm, pass);
82
- fn(tmp, tmp, fpst);
83
- neon_store_reg(a->vd, pass, tmp);
84
- }
85
- tcg_temp_free_ptr(fpst);
86
-
87
- return true;
88
-}
89
-
90
#define DO_2MISC_FP_VEC(INSN, HFUNC, SFUNC) \
91
static void gen_##INSN(unsigned vece, uint32_t rd_ofs, \
92
uint32_t rm_ofs, \
93
@@ -XXX,XX +XXX,XX @@ DO_2MISC_FP_VEC(VCVT_FU, gen_helper_gvec_ustoh, gen_helper_gvec_uitos)
94
DO_2MISC_FP_VEC(VCVT_SF, gen_helper_gvec_tosszh, gen_helper_gvec_tosizs)
95
DO_2MISC_FP_VEC(VCVT_UF, gen_helper_gvec_touszh, gen_helper_gvec_touizs)
96
97
+DO_2MISC_FP_VEC(VRINTX_impl, gen_helper_gvec_vrintx_h, gen_helper_gvec_vrintx_s)
98
+
99
static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
100
{
101
if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
102
return false;
103
}
104
- return do_2misc_fp(s, a, gen_helper_rints_exact);
105
+ return trans_VRINTX_impl(s, a);
106
}
107
108
#define DO_VEC_RMODE(INSN, RMODE, OP) \
109
--
48
--
110
2.20.1
49
2.25.1
111
112
diff view generated by jsdifflib
1
Set the MVFR1 ID register FPHP and SIMDHP fields to indicate
1
From: Claudio Fontana <cfontana@suse.de>
2
that our "-cpu max" has v8.2-FP16.
3
2
3
Remove some unused headers.
4
5
Signed-off-by: Claudio Fontana <cfontana@suse.de>
6
Acked-by: Richard Henderson <richard.henderson@linaro.org>
7
Reviewed-by: Claudio Fontana <cfontana@suse.de>
8
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
9
Signed-off-by: Fabiano Rosas <farosas@suse.de>
10
Message-id: 20221213190537.511-7-farosas@suse.de
11
[added back some includes that are still needed at this point]
12
Signed-off-by: Fabiano Rosas <farosas@suse.de>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
13
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-46-peter.maydell@linaro.org
7
---
14
---
8
target/arm/cpu.c | 3 ++-
15
target/arm/cpu.c | 1 -
9
target/arm/cpu64.c | 10 ++++------
16
target/arm/cpu64.c | 6 ------
10
2 files changed, 6 insertions(+), 7 deletions(-)
17
2 files changed, 7 deletions(-)
11
18
12
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
19
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
13
index XXXXXXX..XXXXXXX 100644
20
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/cpu.c
21
--- a/target/arm/cpu.c
15
+++ b/target/arm/cpu.c
22
+++ b/target/arm/cpu.c
16
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
23
@@ -XXX,XX +XXX,XX @@
17
cpu->isar.id_isar6 = t;
24
#include "target/arm/idau.h"
18
25
#include "qemu/module.h"
19
t = cpu->isar.mvfr1;
26
#include "qapi/error.h"
20
- t = FIELD_DP32(t, MVFR1, FPHP, 2); /* v8.0 FP support */
27
-#include "qapi/visitor.h"
21
+ t = FIELD_DP32(t, MVFR1, FPHP, 3); /* v8.2-FP16 */
28
#include "cpu.h"
22
+ t = FIELD_DP32(t, MVFR1, SIMDHP, 2); /* v8.2-FP16 */
29
#ifdef CONFIG_TCG
23
cpu->isar.mvfr1 = t;
30
#include "hw/core/tcg-cpu-ops.h"
24
25
t = cpu->isar.mvfr2;
26
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
31
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
27
index XXXXXXX..XXXXXXX 100644
32
index XXXXXXX..XXXXXXX 100644
28
--- a/target/arm/cpu64.c
33
--- a/target/arm/cpu64.c
29
+++ b/target/arm/cpu64.c
34
+++ b/target/arm/cpu64.c
30
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
35
@@ -XXX,XX +XXX,XX @@
31
u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
36
#include "qemu/osdep.h"
32
cpu->isar.id_dfr0 = u;
37
#include "qapi/error.h"
33
38
#include "cpu.h"
34
- /*
39
-#ifdef CONFIG_TCG
35
- * FIXME: We do not yet support ARMv8.2-fp16 for AArch32 yet,
40
-#include "hw/core/tcg-cpu-ops.h"
36
- * so do not set MVFR1.FPHP. Strictly speaking this is not legal,
41
-#endif /* CONFIG_TCG */
37
- * but it is also not legal to enable SVE without support for FP16,
42
#include "qemu/module.h"
38
- * and enabling SVE in system mode is more useful in the short term.
43
-#if !defined(CONFIG_USER_ONLY)
39
- */
44
-#include "hw/loader.h"
40
+ u = cpu->isar.mvfr1;
45
-#endif
41
+ u = FIELD_DP32(u, MVFR1, FPHP, 3); /* v8.2-FP16 */
46
#include "sysemu/kvm.h"
42
+ u = FIELD_DP32(u, MVFR1, SIMDHP, 2); /* v8.2-FP16 */
47
#include "sysemu/hvf.h"
43
+ cpu->isar.mvfr1 = u;
48
#include "kvm_arm.h"
44
45
#ifdef CONFIG_USER_ONLY
46
/* For usermode -cpu max we can use a larger and more efficient DCZ
47
--
49
--
48
2.20.1
50
2.25.1
49
50
diff view generated by jsdifflib
1
Implement fp16 for the Neon VCVT insns which convert between
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
float and fixed-point.
3
2
3
The pointed MouseTransformInfo structure is accessed read-only.
4
5
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20221220142520.24094-2-philmd@linaro.org
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-39-peter.maydell@linaro.org
7
---
9
---
8
target/arm/helper.h | 5 +++++
10
include/hw/input/tsc2xxx.h | 4 ++--
9
target/arm/neon-dp.decode | 8 +++++++-
11
hw/input/tsc2005.c | 2 +-
10
target/arm/vec_helper.c | 4 ++++
12
hw/input/tsc210x.c | 3 +--
11
target/arm/translate-neon.c.inc | 5 +++++
13
3 files changed, 4 insertions(+), 5 deletions(-)
12
4 files changed, 21 insertions(+), 1 deletion(-)
13
14
14
diff --git a/target/arm/helper.h b/target/arm/helper.h
15
diff --git a/include/hw/input/tsc2xxx.h b/include/hw/input/tsc2xxx.h
15
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper.h
17
--- a/include/hw/input/tsc2xxx.h
17
+++ b/target/arm/helper.h
18
+++ b/include/hw/input/tsc2xxx.h
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_vcvt_uf, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
19
@@ -XXX,XX +XXX,XX @@ uWireSlave *tsc2102_init(qemu_irq pint);
19
DEF_HELPER_FLAGS_4(gvec_vcvt_fs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
20
uWireSlave *tsc2301_init(qemu_irq penirq, qemu_irq kbirq, qemu_irq dav);
20
DEF_HELPER_FLAGS_4(gvec_vcvt_fu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
21
I2SCodec *tsc210x_codec(uWireSlave *chip);
21
22
uint32_t tsc210x_txrx(void *opaque, uint32_t value, int len);
22
+DEF_HELPER_FLAGS_4(gvec_vcvt_sh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
23
-void tsc210x_set_transform(uWireSlave *chip, MouseTransformInfo *info);
23
+DEF_HELPER_FLAGS_4(gvec_vcvt_uh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
24
+void tsc210x_set_transform(uWireSlave *chip, const MouseTransformInfo *info);
24
+DEF_HELPER_FLAGS_4(gvec_vcvt_hs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
25
void tsc210x_key_event(uWireSlave *chip, int key, int down);
25
+DEF_HELPER_FLAGS_4(gvec_vcvt_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
26
26
+
27
/* tsc2005.c */
27
DEF_HELPER_FLAGS_4(gvec_frecpe_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
28
void *tsc2005_init(qemu_irq pintdav);
28
DEF_HELPER_FLAGS_4(gvec_frecpe_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
29
uint32_t tsc2005_txrx(void *opaque, uint32_t value, int len);
29
DEF_HELPER_FLAGS_4(gvec_frecpe_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
30
-void tsc2005_set_transform(void *opaque, MouseTransformInfo *info);
30
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
31
+void tsc2005_set_transform(void *opaque, const MouseTransformInfo *info);
32
33
#endif
34
diff --git a/hw/input/tsc2005.c b/hw/input/tsc2005.c
31
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
32
--- a/target/arm/neon-dp.decode
36
--- a/hw/input/tsc2005.c
33
+++ b/target/arm/neon-dp.decode
37
+++ b/hw/input/tsc2005.c
34
@@ -XXX,XX +XXX,XX @@ VMINNM_fp_3s 1111 001 1 0 . 1 . .... .... 1111 ... 1 .... @3same_fp
38
@@ -XXX,XX +XXX,XX @@ void *tsc2005_init(qemu_irq pintdav)
35
# We use size=0 for fp32 and size=1 for fp16 to match the 3-same encodings.
39
* from the touchscreen. Assuming 12-bit precision was used during
36
@2reg_vcvt .... ... . . . 1 ..... .... .... . q:1 . . .... \
40
* tslib calibration.
37
&2reg_shift vm=%vm_dp vd=%vd_dp size=0 shift=%neon_rshift_i5
41
*/
38
+@2reg_vcvt_f16 .... ... . . . 11 .... .... .... . q:1 . . .... \
42
-void tsc2005_set_transform(void *opaque, MouseTransformInfo *info)
39
+ &2reg_shift vm=%vm_dp vd=%vd_dp size=1 shift=%neon_rshift_i4
43
+void tsc2005_set_transform(void *opaque, const MouseTransformInfo *info)
40
44
{
41
VSHR_S_2sh 1111 001 0 1 . ...... .... 0000 . . . 1 .... @2reg_shr_d
45
TSC2005State *s = (TSC2005State *) opaque;
42
VSHR_S_2sh 1111 001 0 1 . ...... .... 0000 . . . 1 .... @2reg_shr_s
46
43
@@ -XXX,XX +XXX,XX @@ VSHLL_U_2sh 1111 001 1 1 . ...... .... 1010 . 0 . 1 .... @2reg_shll_h
47
diff --git a/hw/input/tsc210x.c b/hw/input/tsc210x.c
44
VSHLL_U_2sh 1111 001 1 1 . ...... .... 1010 . 0 . 1 .... @2reg_shll_b
45
46
# VCVT fixed<->float conversions
47
-# TODO: FP16 fixed<->float conversions are opc==0b1100 and 0b1101
48
+VCVT_SH_2sh 1111 001 0 1 . ...... .... 1100 0 . . 1 .... @2reg_vcvt_f16
49
+VCVT_UH_2sh 1111 001 1 1 . ...... .... 1100 0 . . 1 .... @2reg_vcvt_f16
50
+VCVT_HS_2sh 1111 001 0 1 . ...... .... 1101 0 . . 1 .... @2reg_vcvt_f16
51
+VCVT_HU_2sh 1111 001 1 1 . ...... .... 1101 0 . . 1 .... @2reg_vcvt_f16
52
+
53
VCVT_SF_2sh 1111 001 0 1 . ...... .... 1110 0 . . 1 .... @2reg_vcvt
54
VCVT_UF_2sh 1111 001 1 1 . ...... .... 1110 0 . . 1 .... @2reg_vcvt
55
VCVT_FS_2sh 1111 001 0 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
56
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
57
index XXXXXXX..XXXXXXX 100644
48
index XXXXXXX..XXXXXXX 100644
58
--- a/target/arm/vec_helper.c
49
--- a/hw/input/tsc210x.c
59
+++ b/target/arm/vec_helper.c
50
+++ b/hw/input/tsc210x.c
60
@@ -XXX,XX +XXX,XX @@ DO_VCVT_FIXED(gvec_vcvt_sf, helper_vfp_sltos, uint32_t)
51
@@ -XXX,XX +XXX,XX @@ I2SCodec *tsc210x_codec(uWireSlave *chip)
61
DO_VCVT_FIXED(gvec_vcvt_uf, helper_vfp_ultos, uint32_t)
52
* from the touchscreen. Assuming 12-bit precision was used during
62
DO_VCVT_FIXED(gvec_vcvt_fs, helper_vfp_tosls_round_to_zero, uint32_t)
53
* tslib calibration.
63
DO_VCVT_FIXED(gvec_vcvt_fu, helper_vfp_touls_round_to_zero, uint32_t)
54
*/
64
+DO_VCVT_FIXED(gvec_vcvt_sh, helper_vfp_shtoh, uint16_t)
55
-void tsc210x_set_transform(uWireSlave *chip,
65
+DO_VCVT_FIXED(gvec_vcvt_uh, helper_vfp_uhtoh, uint16_t)
56
- MouseTransformInfo *info)
66
+DO_VCVT_FIXED(gvec_vcvt_hs, helper_vfp_toshh_round_to_zero, uint16_t)
57
+void tsc210x_set_transform(uWireSlave *chip, const MouseTransformInfo *info)
67
+DO_VCVT_FIXED(gvec_vcvt_hu, helper_vfp_touhh_round_to_zero, uint16_t)
68
69
#undef DO_VCVT_FIXED
70
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
71
index XXXXXXX..XXXXXXX 100644
72
--- a/target/arm/translate-neon.c.inc
73
+++ b/target/arm/translate-neon.c.inc
74
@@ -XXX,XX +XXX,XX @@ DO_FP_2SH(VCVT_UF, gen_helper_gvec_vcvt_uf)
75
DO_FP_2SH(VCVT_FS, gen_helper_gvec_vcvt_fs)
76
DO_FP_2SH(VCVT_FU, gen_helper_gvec_vcvt_fu)
77
78
+DO_FP_2SH(VCVT_SH, gen_helper_gvec_vcvt_sh)
79
+DO_FP_2SH(VCVT_UH, gen_helper_gvec_vcvt_uh)
80
+DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_hs)
81
+DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_hu)
82
+
83
static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
84
{
58
{
85
/*
59
TSC210xState *s = (TSC210xState *) chip->opaque;
60
#if 0
86
--
61
--
87
2.20.1
62
2.25.1
88
63
89
64
diff view generated by jsdifflib
1
Convert the Neon VRECPS insn to using a gvec helper, and
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
use this to implement the fp16 case.
3
2
4
The phrasing of the new float32_recps_nf() is slightly different from
3
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
5
the old recps_f32() so that it parallels the f16 version; for f16 we
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
can't assume that flush-to-zero is always enabled.
5
Message-id: 20221220142520.24094-3-philmd@linaro.org
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
---
8
hw/arm/nseries.c | 18 +++++++++---------
9
1 file changed, 9 insertions(+), 9 deletions(-)
7
10
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20200828183354.27913-34-peter.maydell@linaro.org
11
---
12
target/arm/helper.h | 4 +++-
13
target/arm/vec_helper.c | 31 +++++++++++++++++++++++++++++++
14
target/arm/vfp_helper.c | 13 -------------
15
target/arm/translate-neon.c.inc | 21 +--------------------
16
4 files changed, 35 insertions(+), 34 deletions(-)
17
18
diff --git a/target/arm/helper.h b/target/arm/helper.h
19
index XXXXXXX..XXXXXXX 100644
12
index XXXXXXX..XXXXXXX 100644
20
--- a/target/arm/helper.h
13
--- a/hw/arm/nseries.c
21
+++ b/target/arm/helper.h
14
+++ b/hw/arm/nseries.c
22
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(vfp_muladdd, f64, f64, f64, f64, ptr)
15
@@ -XXX,XX +XXX,XX @@ static void n8x0_i2c_setup(struct n800_s *s)
23
DEF_HELPER_4(vfp_muladds, f32, f32, f32, f32, ptr)
24
DEF_HELPER_4(vfp_muladdh, f16, f16, f16, f16, ptr)
25
26
-DEF_HELPER_3(recps_f32, f32, env, f32, f32)
27
DEF_HELPER_3(rsqrts_f32, f32, env, f32, f32)
28
DEF_HELPER_FLAGS_2(recpe_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
29
DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
30
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmaxnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i3
31
DEF_HELPER_FLAGS_5(gvec_fminnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
32
DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
33
34
+DEF_HELPER_FLAGS_5(gvec_recps_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
35
+DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
36
+
37
DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
38
DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
39
40
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/vec_helper.c
43
+++ b/target/arm/vec_helper.c
44
@@ -XXX,XX +XXX,XX @@ static float32 float32_abd(float32 op1, float32 op2, float_status *stat)
45
return float32_abs(float32_sub(op1, op2, stat));
46
}
16
}
47
17
48
+/*
18
/* Touchscreen and keypad controller */
49
+ * Reciprocal step. These are the AArch32 version which uses a
19
-static MouseTransformInfo n800_pointercal = {
50
+ * non-fused multiply-and-subtract.
20
+static const MouseTransformInfo n800_pointercal = {
51
+ */
21
.x = 800,
52
+static float16 float16_recps_nf(float16 op1, float16 op2, float_status *stat)
22
.y = 480,
53
+{
23
.a = { 14560, -68, -3455208, -39, -9621, 35152972, 65536 },
54
+ op1 = float16_squash_input_denormal(op1, stat);
24
};
55
+ op2 = float16_squash_input_denormal(op2, stat);
25
56
+
26
-static MouseTransformInfo n810_pointercal = {
57
+ if ((float16_is_infinity(op1) && float16_is_zero(op2)) ||
27
+static const MouseTransformInfo n810_pointercal = {
58
+ (float16_is_infinity(op2) && float16_is_zero(op1))) {
28
.x = 800,
59
+ return float16_two;
29
.y = 480,
60
+ }
30
.a = { 15041, 148, -4731056, 171, -10238, 35933380, 65536 },
61
+ return float16_sub(float16_two, float16_mul(op1, op2, stat), stat);
31
@@ -XXX,XX +XXX,XX @@ static void n810_key_event(void *opaque, int keycode)
62
+}
32
63
+
33
#define M    0
64
+static float32 float32_recps_nf(float32 op1, float32 op2, float_status *stat)
34
65
+{
35
-static int n810_keys[0x80] = {
66
+ op1 = float32_squash_input_denormal(op1, stat);
36
+static const int n810_keys[0x80] = {
67
+ op2 = float32_squash_input_denormal(op2, stat);
37
[0x01] = 16,    /* Q */
68
+
38
[0x02] = 37,    /* K */
69
+ if ((float32_is_infinity(op1) && float32_is_zero(op2)) ||
39
[0x03] = 24,    /* O */
70
+ (float32_is_infinity(op2) && float32_is_zero(op1))) {
40
@@ -XXX,XX +XXX,XX @@ static void n8x0_usb_setup(struct n800_s *s)
71
+ return float32_two;
41
/* Setup done before the main bootloader starts by some early setup code
72
+ }
42
* - used when we want to run the main bootloader in emulation. This
73
+ return float32_sub(float32_two, float32_mul(op1, op2, stat), stat);
43
* isn't documented. */
74
+}
44
-static uint32_t n800_pinout[104] = {
75
+
45
+static const uint32_t n800_pinout[104] = {
76
#define DO_3OP(NAME, FUNC, TYPE) \
46
0x080f00d8, 0x00d40808, 0x03080808, 0x080800d0,
77
void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
47
0x00dc0808, 0x0b0f0f00, 0x080800b4, 0x00c00808,
78
{ \
48
0x08080808, 0x180800c4, 0x00b80000, 0x08080808,
79
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fmaxnum_s, float32_maxnum, float32)
49
@@ -XXX,XX +XXX,XX @@ static void n8x0_boot_init(void *opaque)
80
DO_3OP(gvec_fminnum_h, float16_minnum, float16)
50
#define OMAP_TAG_CBUS        0x4e03
81
DO_3OP(gvec_fminnum_s, float32_minnum, float32)
51
#define OMAP_TAG_EM_ASIC_BB5    0x4e04
82
52
83
+DO_3OP(gvec_recps_nf_h, float16_recps_nf, float16)
53
-static struct omap_gpiosw_info_s {
84
+DO_3OP(gvec_recps_nf_s, float32_recps_nf, float32)
54
+static const struct omap_gpiosw_info_s {
85
+
55
const char *name;
86
#ifdef TARGET_AARCH64
56
int line;
87
57
int type;
88
DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
58
@@ -XXX,XX +XXX,XX @@ static struct omap_gpiosw_info_s {
89
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
59
{ NULL }
90
index XXXXXXX..XXXXXXX 100644
60
};
91
--- a/target/arm/vfp_helper.c
61
92
+++ b/target/arm/vfp_helper.c
62
-static struct omap_partition_info_s {
93
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
63
+static const struct omap_partition_info_s {
94
return r;
64
uint32_t offset;
95
}
65
uint32_t size;
96
66
int mask;
97
-float32 HELPER(recps_f32)(CPUARMState *env, float32 a, float32 b)
67
@@ -XXX,XX +XXX,XX @@ static struct omap_partition_info_s {
98
-{
68
{ 0, 0, 0, NULL }
99
- float_status *s = &env->vfp.standard_fp_status;
69
};
100
- if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
70
101
- (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
71
-static uint8_t n8x0_bd_addr[6] = { N8X0_BD_ADDR };
102
- if (!(float32_is_zero(a) || float32_is_zero(b))) {
72
+static const uint8_t n8x0_bd_addr[6] = { N8X0_BD_ADDR };
103
- float_raise(float_flag_input_denormal, s);
73
104
- }
74
static int n8x0_atag_setup(void *p, int model)
105
- return float32_two;
106
- }
107
- return float32_sub(float32_two, float32_mul(a, b, s), s);
108
-}
109
-
110
float32 HELPER(rsqrts_f32)(CPUARMState *env, float32 a, float32 b)
111
{
75
{
112
float_status *s = &env->vfp.standard_fp_status;
76
uint8_t *b;
113
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
77
uint16_t *w;
114
index XXXXXXX..XXXXXXX 100644
78
uint32_t *l;
115
--- a/target/arm/translate-neon.c.inc
79
- struct omap_gpiosw_info_s *gpiosw;
116
+++ b/target/arm/translate-neon.c.inc
80
- struct omap_partition_info_s *partition;
117
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h)
81
+ const struct omap_gpiosw_info_s *gpiosw;
118
DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
82
+ const struct omap_partition_info_s *partition;
119
DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h)
83
const char *tag;
120
DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h)
84
121
+DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h)
85
w = p;
122
123
WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
124
WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
125
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
126
return do_3same(s, a, gen_VMINNM_fp32_3s);
127
}
128
129
-WRAP_ENV_FN(gen_VRECPS_tramp, gen_helper_recps_f32)
130
-
131
-static void gen_VRECPS_fp_3s(unsigned vece, uint32_t rd_ofs,
132
- uint32_t rn_ofs, uint32_t rm_ofs,
133
- uint32_t oprsz, uint32_t maxsz)
134
-{
135
- static const GVecGen3 ops = { .fni4 = gen_VRECPS_tramp };
136
- tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &ops);
137
-}
138
-
139
-static bool trans_VRECPS_fp_3s(DisasContext *s, arg_3same *a)
140
-{
141
- if (a->size != 0) {
142
- /* TODO fp16 support */
143
- return false;
144
- }
145
-
146
- return do_3same(s, a, gen_VRECPS_fp_3s);
147
-}
148
-
149
WRAP_ENV_FN(gen_VRSQRTS_tramp, gen_helper_rsqrts_f32)
150
151
static void gen_VRSQRTS_fp_3s(unsigned vece, uint32_t rd_ofs,
152
--
86
--
153
2.20.1
87
2.25.1
154
88
155
89
diff view generated by jsdifflib
1
Convert the neon floating-point vector compare-vs-0 insns VCEQ0,
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
VCGT0, VCLE0, VCGE0 and VCLT0 to use a gvec helper, and use this to
3
implement the fp16 case.
4
2
3
Silent when compiling with -Wextra:
4
5
../hw/arm/nseries.c:1081:12: warning: missing field 'line' initializer [-Wmissing-field-initializers]
6
{ NULL }
7
^
8
9
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
10
Message-id: 20221220142520.24094-4-philmd@linaro.org
11
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20200828183354.27913-33-peter.maydell@linaro.org
8
---
13
---
9
target/arm/helper.h | 15 +++++++++++++++
14
hw/arm/nseries.c | 10 ++++------
10
target/arm/vec_helper.c | 25 +++++++++++++++++++++++++
15
1 file changed, 4 insertions(+), 6 deletions(-)
11
target/arm/translate-neon.c.inc | 33 +++++----------------------------
12
3 files changed, 45 insertions(+), 28 deletions(-)
13
16
14
diff --git a/target/arm/helper.h b/target/arm/helper.h
17
diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
15
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/helper.h
19
--- a/hw/arm/nseries.c
17
+++ b/target/arm/helper.h
20
+++ b/hw/arm/nseries.c
18
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_frsqrte_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
21
@@ -XXX,XX +XXX,XX @@ static const struct omap_gpiosw_info_s {
19
DEF_HELPER_FLAGS_4(gvec_frsqrte_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
22
"headphone", N8X0_HEADPHONE_GPIO,
20
DEF_HELPER_FLAGS_4(gvec_frsqrte_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
23
OMAP_GPIOSW_TYPE_CONNECTION | OMAP_GPIOSW_INVERTED,
21
24
},
22
+DEF_HELPER_FLAGS_4(gvec_fcgt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
25
- { NULL }
23
+DEF_HELPER_FLAGS_4(gvec_fcgt0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
26
+ { /* end of list */ }
24
+
27
}, n810_gpiosw_info[] = {
25
+DEF_HELPER_FLAGS_4(gvec_fcge0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
28
{
26
+DEF_HELPER_FLAGS_4(gvec_fcge0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
29
"gps_reset", N810_GPS_RESET_GPIO,
27
+
30
@@ -XXX,XX +XXX,XX @@ static const struct omap_gpiosw_info_s {
28
+DEF_HELPER_FLAGS_4(gvec_fceq0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
31
"slide", N810_SLIDE_GPIO,
29
+DEF_HELPER_FLAGS_4(gvec_fceq0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
32
OMAP_GPIOSW_TYPE_COVER | OMAP_GPIOSW_INVERTED,
30
+
33
},
31
+DEF_HELPER_FLAGS_4(gvec_fcle0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
34
- { NULL }
32
+DEF_HELPER_FLAGS_4(gvec_fcle0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
35
+ { /* end of list */ }
33
+
36
};
34
+DEF_HELPER_FLAGS_4(gvec_fclt0_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
37
35
+DEF_HELPER_FLAGS_4(gvec_fclt0_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
38
static const struct omap_partition_info_s {
36
+
39
@@ -XXX,XX +XXX,XX @@ static const struct omap_partition_info_s {
37
DEF_HELPER_FLAGS_5(gvec_fadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
40
{ 0x00080000, 0x00200000, 0x0, "kernel" },
38
DEF_HELPER_FLAGS_5(gvec_fadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
41
{ 0x00280000, 0x00200000, 0x3, "initfs" },
39
DEF_HELPER_FLAGS_5(gvec_fadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
42
{ 0x00480000, 0x0fb80000, 0x3, "rootfs" },
40
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/vec_helper.c
43
+++ b/target/arm/vec_helper.c
44
@@ -XXX,XX +XXX,XX @@ DO_2OP(gvec_frsqrte_h, helper_rsqrte_f16, float16)
45
DO_2OP(gvec_frsqrte_s, helper_rsqrte_f32, float32)
46
DO_2OP(gvec_frsqrte_d, helper_rsqrte_f64, float64)
47
48
+#define WRAP_CMP0_FWD(FN, CMPOP, TYPE) \
49
+ static TYPE TYPE##_##FN##0(TYPE op, float_status *stat) \
50
+ { \
51
+ return TYPE##_##CMPOP(op, TYPE##_zero, stat); \
52
+ }
53
+
54
+#define WRAP_CMP0_REV(FN, CMPOP, TYPE) \
55
+ static TYPE TYPE##_##FN##0(TYPE op, float_status *stat) \
56
+ { \
57
+ return TYPE##_##CMPOP(TYPE##_zero, op, stat); \
58
+ }
59
+
60
+#define DO_2OP_CMP0(FN, CMPOP, DIRN) \
61
+ WRAP_CMP0_##DIRN(FN, CMPOP, float16) \
62
+ WRAP_CMP0_##DIRN(FN, CMPOP, float32) \
63
+ DO_2OP(gvec_f##FN##0_h, float16_##FN##0, float16) \
64
+ DO_2OP(gvec_f##FN##0_s, float32_##FN##0, float32)
65
+
66
+DO_2OP_CMP0(cgt, cgt, FWD)
67
+DO_2OP_CMP0(cge, cge, FWD)
68
+DO_2OP_CMP0(ceq, ceq, FWD)
69
+DO_2OP_CMP0(clt, cgt, REV)
70
+DO_2OP_CMP0(cle, cge, REV)
71
+
72
#undef DO_2OP
73
+#undef DO_2OP_CMP0
74
75
/* Floating-point trigonometric starting value.
76
* See the ARM ARM pseudocode function FPTrigSMul.
77
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
78
index XXXXXXX..XXXXXXX 100644
79
--- a/target/arm/translate-neon.c.inc
80
+++ b/target/arm/translate-neon.c.inc
81
@@ -XXX,XX +XXX,XX @@ DO_2MISC_FP(VCVT_UF, gen_helper_vfp_touizs)
82
83
DO_2MISC_FP_VEC(VRECPE_F, gen_helper_gvec_frecpe_h, gen_helper_gvec_frecpe_s)
84
DO_2MISC_FP_VEC(VRSQRTE_F, gen_helper_gvec_frsqrte_h, gen_helper_gvec_frsqrte_s)
85
+DO_2MISC_FP_VEC(VCGT0_F, gen_helper_gvec_fcgt0_h, gen_helper_gvec_fcgt0_s)
86
+DO_2MISC_FP_VEC(VCGE0_F, gen_helper_gvec_fcge0_h, gen_helper_gvec_fcge0_s)
87
+DO_2MISC_FP_VEC(VCEQ0_F, gen_helper_gvec_fceq0_h, gen_helper_gvec_fceq0_s)
88
+DO_2MISC_FP_VEC(VCLT0_F, gen_helper_gvec_fclt0_h, gen_helper_gvec_fclt0_s)
89
+DO_2MISC_FP_VEC(VCLE0_F, gen_helper_gvec_fcle0_h, gen_helper_gvec_fcle0_s)
90
91
static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
92
{
93
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
94
return do_2misc_fp(s, a, gen_helper_rints_exact);
95
}
96
97
-#define WRAP_FP_CMP0_FWD(WRAPNAME, FUNC) \
98
- static void WRAPNAME(TCGv_i32 d, TCGv_i32 m, TCGv_ptr fpst) \
99
- { \
100
- TCGv_i32 zero = tcg_const_i32(0); \
101
- FUNC(d, m, zero, fpst); \
102
- tcg_temp_free_i32(zero); \
103
- }
104
-#define WRAP_FP_CMP0_REV(WRAPNAME, FUNC) \
105
- static void WRAPNAME(TCGv_i32 d, TCGv_i32 m, TCGv_ptr fpst) \
106
- { \
107
- TCGv_i32 zero = tcg_const_i32(0); \
108
- FUNC(d, zero, m, fpst); \
109
- tcg_temp_free_i32(zero); \
110
- }
111
-
43
-
112
-#define DO_FP_CMP0(INSN, FUNC, REV) \
44
- { 0, 0, 0, NULL }
113
- WRAP_FP_CMP0_##REV(gen_##INSN, FUNC) \
45
+ { /* end of list */ }
114
- static bool trans_##INSN(DisasContext *s, arg_2misc *a) \
46
}, n810_part_info[] = {
115
- { \
47
{ 0x00000000, 0x00020000, 0x3, "bootloader" },
116
- return do_2misc_fp(s, a, gen_##INSN); \
48
{ 0x00020000, 0x00060000, 0x0, "config" },
117
- }
49
{ 0x00080000, 0x00220000, 0x0, "kernel" },
50
{ 0x002a0000, 0x00400000, 0x0, "initfs" },
51
{ 0x006a0000, 0x0f960000, 0x0, "rootfs" },
118
-
52
-
119
-DO_FP_CMP0(VCGT0_F, gen_helper_neon_cgt_f32, FWD)
53
- { 0, 0, 0, NULL }
120
-DO_FP_CMP0(VCGE0_F, gen_helper_neon_cge_f32, FWD)
54
+ { /* end of list */ }
121
-DO_FP_CMP0(VCEQ0_F, gen_helper_neon_ceq_f32, FWD)
55
};
122
-DO_FP_CMP0(VCLE0_F, gen_helper_neon_cge_f32, REV)
56
123
-DO_FP_CMP0(VCLT0_F, gen_helper_neon_cgt_f32, REV)
57
static const uint8_t n8x0_bd_addr[6] = { N8X0_BD_ADDR };
124
-
125
static bool do_vrint(DisasContext *s, arg_2misc *a, int rmode)
126
{
127
/*
128
--
58
--
129
2.20.1
59
2.25.1
130
60
131
61
diff view generated by jsdifflib
1
Convert the neon floating-point vector operations VFMA and VFMS
1
From: Zhuojia Shen <chaosdefinition@hotmail.com>
2
to use a gvec helper, and use this to implement the fp16 case.
2
3
3
In CPUID registers exposed to userspace, some registers were missing
4
This is the last use of do_3same_fp() so we can now delete
4
and some fields were not exposed. This patch aligns exposed ID
5
that function.
5
registers and their fields with what the upstream kernel currently
6
6
exposes.
7
8
Specifically, the following new ID registers/fields are exposed to
9
userspace:
10
11
ID_AA64PFR1_EL1.BT: bits 3-0
12
ID_AA64PFR1_EL1.MTE: bits 11-8
13
ID_AA64PFR1_EL1.SME: bits 27-24
14
15
ID_AA64ZFR0_EL1.SVEver: bits 3-0
16
ID_AA64ZFR0_EL1.AES: bits 7-4
17
ID_AA64ZFR0_EL1.BitPerm: bits 19-16
18
ID_AA64ZFR0_EL1.BF16: bits 23-20
19
ID_AA64ZFR0_EL1.SHA3: bits 35-32
20
ID_AA64ZFR0_EL1.SM4: bits 43-40
21
ID_AA64ZFR0_EL1.I8MM: bits 47-44
22
ID_AA64ZFR0_EL1.F32MM: bits 55-52
23
ID_AA64ZFR0_EL1.F64MM: bits 59-56
24
25
ID_AA64SMFR0_EL1.F32F32: bit 32
26
ID_AA64SMFR0_EL1.B16F32: bit 34
27
ID_AA64SMFR0_EL1.F16F32: bit 35
28
ID_AA64SMFR0_EL1.I8I32: bits 39-36
29
ID_AA64SMFR0_EL1.F64F64: bit 48
30
ID_AA64SMFR0_EL1.I16I64: bits 55-52
31
ID_AA64SMFR0_EL1.FA64: bit 63
32
33
ID_AA64MMFR0_EL1.ECV: bits 63-60
34
35
ID_AA64MMFR1_EL1.AFP: bits 47-44
36
37
ID_AA64MMFR2_EL1.AT: bits 35-32
38
39
ID_AA64ISAR0_EL1.RNDR: bits 63-60
40
41
ID_AA64ISAR1_EL1.FRINTTS: bits 35-32
42
ID_AA64ISAR1_EL1.BF16: bits 47-44
43
ID_AA64ISAR1_EL1.DGH: bits 51-48
44
ID_AA64ISAR1_EL1.I8MM: bits 55-52
45
46
ID_AA64ISAR2_EL1.WFxT: bits 3-0
47
ID_AA64ISAR2_EL1.RPRES: bits 7-4
48
ID_AA64ISAR2_EL1.GPA3: bits 11-8
49
ID_AA64ISAR2_EL1.APA3: bits 15-12
50
51
The code is also refactored to use symbolic names for ID register fields
52
for better readability and maintainability.
53
54
The test case in tests/tcg/aarch64/sysregs.c is also updated to match
55
the intended behavior.
56
57
Signed-off-by: Zhuojia Shen <chaosdefinition@hotmail.com>
58
Message-id: DS7PR12MB6309FB585E10772928F14271ACE79@DS7PR12MB6309.namprd12.prod.outlook.com
59
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
60
[PMM: use Sn_n_Cn_Cn_n syntax to work with older assemblers
61
that don't recognize id_aa64isar2_el1 and id_aa64mmfr2_el1]
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
62
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20200828183354.27913-32-peter.maydell@linaro.org
10
---
63
---
11
target/arm/helper.h | 6 +++
64
target/arm/helper.c | 96 +++++++++++++++++++++++++------
12
target/arm/vec_helper.c | 33 +++++++++++-
65
tests/tcg/aarch64/sysregs.c | 24 ++++++--
13
target/arm/translate-neon.c.inc | 92 +--------------------------------
66
tests/tcg/aarch64/Makefile.target | 7 ++-
14
3 files changed, 40 insertions(+), 91 deletions(-)
67
3 files changed, 103 insertions(+), 24 deletions(-)
15
68
16
diff --git a/target/arm/helper.h b/target/arm/helper.h
69
diff --git a/target/arm/helper.c b/target/arm/helper.c
17
index XXXXXXX..XXXXXXX 100644
70
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/helper.h
71
--- a/target/arm/helper.c
19
+++ b/target/arm/helper.h
72
+++ b/target/arm/helper.c
20
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
73
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
21
DEF_HELPER_FLAGS_5(gvec_fmls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
74
#ifdef CONFIG_USER_ONLY
22
DEF_HELPER_FLAGS_5(gvec_fmls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
75
static const ARMCPRegUserSpaceInfo v8_user_idregs[] = {
23
76
{ .name = "ID_AA64PFR0_EL1",
24
+DEF_HELPER_FLAGS_5(gvec_vfma_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
77
- .exported_bits = 0x000f000f00ff0000,
25
+DEF_HELPER_FLAGS_5(gvec_vfma_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
78
- .fixed_bits = 0x0000000000000011 },
79
+ .exported_bits = R_ID_AA64PFR0_FP_MASK |
80
+ R_ID_AA64PFR0_ADVSIMD_MASK |
81
+ R_ID_AA64PFR0_SVE_MASK |
82
+ R_ID_AA64PFR0_DIT_MASK,
83
+ .fixed_bits = (0x1u << R_ID_AA64PFR0_EL0_SHIFT) |
84
+ (0x1u << R_ID_AA64PFR0_EL1_SHIFT) },
85
{ .name = "ID_AA64PFR1_EL1",
86
- .exported_bits = 0x00000000000000f0 },
87
+ .exported_bits = R_ID_AA64PFR1_BT_MASK |
88
+ R_ID_AA64PFR1_SSBS_MASK |
89
+ R_ID_AA64PFR1_MTE_MASK |
90
+ R_ID_AA64PFR1_SME_MASK },
91
{ .name = "ID_AA64PFR*_EL1_RESERVED",
92
- .is_glob = true },
93
- { .name = "ID_AA64ZFR0_EL1" },
94
+ .is_glob = true },
95
+ { .name = "ID_AA64ZFR0_EL1",
96
+ .exported_bits = R_ID_AA64ZFR0_SVEVER_MASK |
97
+ R_ID_AA64ZFR0_AES_MASK |
98
+ R_ID_AA64ZFR0_BITPERM_MASK |
99
+ R_ID_AA64ZFR0_BFLOAT16_MASK |
100
+ R_ID_AA64ZFR0_SHA3_MASK |
101
+ R_ID_AA64ZFR0_SM4_MASK |
102
+ R_ID_AA64ZFR0_I8MM_MASK |
103
+ R_ID_AA64ZFR0_F32MM_MASK |
104
+ R_ID_AA64ZFR0_F64MM_MASK },
105
+ { .name = "ID_AA64SMFR0_EL1",
106
+ .exported_bits = R_ID_AA64SMFR0_F32F32_MASK |
107
+ R_ID_AA64SMFR0_B16F32_MASK |
108
+ R_ID_AA64SMFR0_F16F32_MASK |
109
+ R_ID_AA64SMFR0_I8I32_MASK |
110
+ R_ID_AA64SMFR0_F64F64_MASK |
111
+ R_ID_AA64SMFR0_I16I64_MASK |
112
+ R_ID_AA64SMFR0_FA64_MASK },
113
{ .name = "ID_AA64MMFR0_EL1",
114
- .fixed_bits = 0x00000000ff000000 },
115
- { .name = "ID_AA64MMFR1_EL1" },
116
+ .exported_bits = R_ID_AA64MMFR0_ECV_MASK,
117
+ .fixed_bits = (0xfu << R_ID_AA64MMFR0_TGRAN64_SHIFT) |
118
+ (0xfu << R_ID_AA64MMFR0_TGRAN4_SHIFT) },
119
+ { .name = "ID_AA64MMFR1_EL1",
120
+ .exported_bits = R_ID_AA64MMFR1_AFP_MASK },
121
+ { .name = "ID_AA64MMFR2_EL1",
122
+ .exported_bits = R_ID_AA64MMFR2_AT_MASK },
123
{ .name = "ID_AA64MMFR*_EL1_RESERVED",
124
- .is_glob = true },
125
+ .is_glob = true },
126
{ .name = "ID_AA64DFR0_EL1",
127
- .fixed_bits = 0x0000000000000006 },
128
- { .name = "ID_AA64DFR1_EL1" },
129
+ .fixed_bits = (0x6u << R_ID_AA64DFR0_DEBUGVER_SHIFT) },
130
+ { .name = "ID_AA64DFR1_EL1" },
131
{ .name = "ID_AA64DFR*_EL1_RESERVED",
132
- .is_glob = true },
133
+ .is_glob = true },
134
{ .name = "ID_AA64AFR*",
135
- .is_glob = true },
136
+ .is_glob = true },
137
{ .name = "ID_AA64ISAR0_EL1",
138
- .exported_bits = 0x00fffffff0fffff0 },
139
+ .exported_bits = R_ID_AA64ISAR0_AES_MASK |
140
+ R_ID_AA64ISAR0_SHA1_MASK |
141
+ R_ID_AA64ISAR0_SHA2_MASK |
142
+ R_ID_AA64ISAR0_CRC32_MASK |
143
+ R_ID_AA64ISAR0_ATOMIC_MASK |
144
+ R_ID_AA64ISAR0_RDM_MASK |
145
+ R_ID_AA64ISAR0_SHA3_MASK |
146
+ R_ID_AA64ISAR0_SM3_MASK |
147
+ R_ID_AA64ISAR0_SM4_MASK |
148
+ R_ID_AA64ISAR0_DP_MASK |
149
+ R_ID_AA64ISAR0_FHM_MASK |
150
+ R_ID_AA64ISAR0_TS_MASK |
151
+ R_ID_AA64ISAR0_RNDR_MASK },
152
{ .name = "ID_AA64ISAR1_EL1",
153
- .exported_bits = 0x000000f0ffffffff },
154
+ .exported_bits = R_ID_AA64ISAR1_DPB_MASK |
155
+ R_ID_AA64ISAR1_APA_MASK |
156
+ R_ID_AA64ISAR1_API_MASK |
157
+ R_ID_AA64ISAR1_JSCVT_MASK |
158
+ R_ID_AA64ISAR1_FCMA_MASK |
159
+ R_ID_AA64ISAR1_LRCPC_MASK |
160
+ R_ID_AA64ISAR1_GPA_MASK |
161
+ R_ID_AA64ISAR1_GPI_MASK |
162
+ R_ID_AA64ISAR1_FRINTTS_MASK |
163
+ R_ID_AA64ISAR1_SB_MASK |
164
+ R_ID_AA64ISAR1_BF16_MASK |
165
+ R_ID_AA64ISAR1_DGH_MASK |
166
+ R_ID_AA64ISAR1_I8MM_MASK },
167
+ { .name = "ID_AA64ISAR2_EL1",
168
+ .exported_bits = R_ID_AA64ISAR2_WFXT_MASK |
169
+ R_ID_AA64ISAR2_RPRES_MASK |
170
+ R_ID_AA64ISAR2_GPA3_MASK |
171
+ R_ID_AA64ISAR2_APA3_MASK },
172
{ .name = "ID_AA64ISAR*_EL1_RESERVED",
173
- .is_glob = true },
174
+ .is_glob = true },
175
};
176
modify_arm_cp_regs(v8_idregs, v8_user_idregs);
177
#endif
178
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
179
#ifdef CONFIG_USER_ONLY
180
static const ARMCPRegUserSpaceInfo id_v8_user_midr_cp_reginfo[] = {
181
{ .name = "MIDR_EL1",
182
- .exported_bits = 0x00000000ffffffff },
183
- { .name = "REVIDR_EL1" },
184
+ .exported_bits = R_MIDR_EL1_REVISION_MASK |
185
+ R_MIDR_EL1_PARTNUM_MASK |
186
+ R_MIDR_EL1_ARCHITECTURE_MASK |
187
+ R_MIDR_EL1_VARIANT_MASK |
188
+ R_MIDR_EL1_IMPLEMENTER_MASK },
189
+ { .name = "REVIDR_EL1" },
190
};
191
modify_arm_cp_regs(id_v8_midr_cp_reginfo, id_v8_user_midr_cp_reginfo);
192
#endif
193
diff --git a/tests/tcg/aarch64/sysregs.c b/tests/tcg/aarch64/sysregs.c
194
index XXXXXXX..XXXXXXX 100644
195
--- a/tests/tcg/aarch64/sysregs.c
196
+++ b/tests/tcg/aarch64/sysregs.c
197
@@ -XXX,XX +XXX,XX @@
198
#define HWCAP_CPUID (1 << 11)
199
#endif
200
201
+/*
202
+ * Older assemblers don't recognize newer system register names,
203
+ * but we can still access them by the Sn_n_Cn_Cn_n syntax.
204
+ */
205
+#define SYS_ID_AA64ISAR2_EL1 S3_0_C0_C6_2
206
+#define SYS_ID_AA64MMFR2_EL1 S3_0_C0_C7_2
26
+
207
+
27
+DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
208
int failed_bit_count;
28
+DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
209
29
+
210
/* Read and print system register `id' value */
30
DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
211
@@ -XXX,XX +XXX,XX @@ int main(void)
31
void, ptr, ptr, ptr, ptr, i32)
212
* minimum valid fields - for the purposes of this check allowed
32
DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
213
* to have non-zero values.
33
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
214
*/
215
- get_cpu_reg_check_mask(id_aa64isar0_el1, _m(00ff,ffff,f0ff,fff0));
216
- get_cpu_reg_check_mask(id_aa64isar1_el1, _m(0000,00f0,ffff,ffff));
217
+ get_cpu_reg_check_mask(id_aa64isar0_el1, _m(f0ff,ffff,f0ff,fff0));
218
+ get_cpu_reg_check_mask(id_aa64isar1_el1, _m(00ff,f0ff,ffff,ffff));
219
+ get_cpu_reg_check_mask(SYS_ID_AA64ISAR2_EL1, _m(0000,0000,0000,ffff));
220
/* TGran4 & TGran64 as pegged to -1 */
221
- get_cpu_reg_check_mask(id_aa64mmfr0_el1, _m(0000,0000,ff00,0000));
222
- get_cpu_reg_check_zero(id_aa64mmfr1_el1);
223
+ get_cpu_reg_check_mask(id_aa64mmfr0_el1, _m(f000,0000,ff00,0000));
224
+ get_cpu_reg_check_mask(id_aa64mmfr1_el1, _m(0000,f000,0000,0000));
225
+ get_cpu_reg_check_mask(SYS_ID_AA64MMFR2_EL1, _m(0000,000f,0000,0000));
226
/* EL1/EL0 reported as AA64 only */
227
get_cpu_reg_check_mask(id_aa64pfr0_el1, _m(000f,000f,00ff,0011));
228
- get_cpu_reg_check_mask(id_aa64pfr1_el1, _m(0000,0000,0000,00f0));
229
+ get_cpu_reg_check_mask(id_aa64pfr1_el1, _m(0000,0000,0f00,0fff));
230
/* all hidden, DebugVer fixed to 0x6 (ARMv8 debug architecture) */
231
get_cpu_reg_check_mask(id_aa64dfr0_el1, _m(0000,0000,0000,0006));
232
get_cpu_reg_check_zero(id_aa64dfr1_el1);
233
- get_cpu_reg_check_zero(id_aa64zfr0_el1);
234
+ get_cpu_reg_check_mask(id_aa64zfr0_el1, _m(0ff0,ff0f,00ff,00ff));
235
+#ifdef HAS_ARMV9_SME
236
+ get_cpu_reg_check_mask(id_aa64smfr0_el1, _m(80f1,00fd,0000,0000));
237
+#endif
238
239
get_cpu_reg_check_zero(id_aa64afr0_el1);
240
get_cpu_reg_check_zero(id_aa64afr1_el1);
241
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
34
index XXXXXXX..XXXXXXX 100644
242
index XXXXXXX..XXXXXXX 100644
35
--- a/target/arm/vec_helper.c
243
--- a/tests/tcg/aarch64/Makefile.target
36
+++ b/target/arm/vec_helper.c
244
+++ b/tests/tcg/aarch64/Makefile.target
37
@@ -XXX,XX +XXX,XX @@ static float32 float32_mulsub_nf(float32 dest, float32 op1, float32 op2,
245
@@ -XXX,XX +XXX,XX @@ config-cc.mak: Makefile
38
return float32_sub(dest, float32_mul(op1, op2, stat), stat);
246
     $(call cc-option,-march=armv8.1-a+sve2, CROSS_CC_HAS_SVE2); \
39
}
247
     $(call cc-option,-march=armv8.3-a, CROSS_CC_HAS_ARMV8_3); \
40
248
     $(call cc-option,-mbranch-protection=standard, CROSS_CC_HAS_ARMV8_BTI); \
41
-#define DO_MULADD(NAME, FUNC, TYPE) \
249
-     $(call cc-option,-march=armv8.5-a+memtag, CROSS_CC_HAS_ARMV8_MTE)) 3> config-cc.mak
42
+/* Fused versions; these have the semantics Neon VFMA/VFMS want */
250
+     $(call cc-option,-march=armv8.5-a+memtag, CROSS_CC_HAS_ARMV8_MTE); \
43
+static float16 float16_muladd_f(float16 dest, float16 op1, float16 op2,
251
+     $(call cc-option,-march=armv9-a+sme, CROSS_CC_HAS_ARMV9_SME)) 3> config-cc.mak
44
+ float_status *stat)
252
-include config-cc.mak
45
+{
253
46
+ return float16_muladd(op1, op2, dest, 0, stat);
254
# Pauth Tests
47
+}
255
@@ -XXX,XX +XXX,XX @@ endif
48
+
256
ifneq ($(CROSS_CC_HAS_SVE),)
49
+static float32 float32_muladd_f(float32 dest, float32 op1, float32 op2,
257
# System Registers Tests
50
+ float_status *stat)
258
AARCH64_TESTS += sysregs
51
+{
259
+ifneq ($(CROSS_CC_HAS_ARMV9_SME),)
52
+ return float32_muladd(op1, op2, dest, 0, stat);
260
+sysregs: CFLAGS+=-march=armv9-a+sme -DHAS_ARMV9_SME
53
+}
261
+else
54
+
262
sysregs: CFLAGS+=-march=armv8.1-a+sve
55
+static float16 float16_mulsub_f(float16 dest, float16 op1, float16 op2,
263
+endif
56
+ float_status *stat)
264
57
+{
265
# SVE ioctl test
58
+ return float16_muladd(float16_chs(op1), op2, dest, 0, stat);
266
AARCH64_TESTS += sve-ioctls
59
+}
60
+
61
+static float32 float32_mulsub_f(float32 dest, float32 op1, float32 op2,
62
+ float_status *stat)
63
+{
64
+ return float32_muladd(float32_chs(op1), op2, dest, 0, stat);
65
+}
66
+
67
+#define DO_MULADD(NAME, FUNC, TYPE) \
68
void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
69
{ \
70
intptr_t i, oprsz = simd_oprsz(desc); \
71
@@ -XXX,XX +XXX,XX @@ DO_MULADD(gvec_fmla_s, float32_muladd_nf, float32)
72
DO_MULADD(gvec_fmls_h, float16_mulsub_nf, float16)
73
DO_MULADD(gvec_fmls_s, float32_mulsub_nf, float32)
74
75
+DO_MULADD(gvec_vfma_h, float16_muladd_f, float16)
76
+DO_MULADD(gvec_vfma_s, float32_muladd_f, float32)
77
+
78
+DO_MULADD(gvec_vfms_h, float16_mulsub_f, float16)
79
+DO_MULADD(gvec_vfms_s, float32_mulsub_f, float32)
80
+
81
/* For the indexed ops, SVE applies the index per 128-bit vector segment.
82
* For AdvSIMD, there is of course only one such vector segment.
83
*/
84
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
85
index XXXXXXX..XXXXXXX 100644
86
--- a/target/arm/translate-neon.c.inc
87
+++ b/target/arm/translate-neon.c.inc
88
@@ -XXX,XX +XXX,XX @@ DO_3SAME_PAIR(VPADD, padd_u)
89
DO_3SAME_VQDMULH(VQDMULH, qdmulh)
90
DO_3SAME_VQDMULH(VQRDMULH, qrdmulh)
91
92
-static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn,
93
- bool reads_vd)
94
-{
95
- /*
96
- * FP operations handled elementwise 32 bits at a time.
97
- * If reads_vd is true then the old value of Vd will be
98
- * loaded before calling the callback function. This is
99
- * used for multiply-accumulate type operations.
100
- */
101
- TCGv_i32 tmp, tmp2;
102
- int pass;
103
-
104
- if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
105
- return false;
106
- }
107
-
108
- /* UNDEF accesses to D16-D31 if they don't exist. */
109
- if (!dc_isar_feature(aa32_simd_r32, s) &&
110
- ((a->vd | a->vn | a->vm) & 0x10)) {
111
- return false;
112
- }
113
-
114
- if ((a->vn | a->vm | a->vd) & a->q) {
115
- return false;
116
- }
117
-
118
- if (!vfp_access_check(s)) {
119
- return true;
120
- }
121
-
122
- TCGv_ptr fpstatus = fpstatus_ptr(FPST_STD);
123
- for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
124
- tmp = neon_load_reg(a->vn, pass);
125
- tmp2 = neon_load_reg(a->vm, pass);
126
- if (reads_vd) {
127
- TCGv_i32 tmp_rd = neon_load_reg(a->vd, pass);
128
- fn(tmp_rd, tmp, tmp2, fpstatus);
129
- neon_store_reg(a->vd, pass, tmp_rd);
130
- tcg_temp_free_i32(tmp);
131
- } else {
132
- fn(tmp, tmp, tmp2, fpstatus);
133
- neon_store_reg(a->vd, pass, tmp);
134
- }
135
- tcg_temp_free_i32(tmp2);
136
- }
137
- tcg_temp_free_ptr(fpstatus);
138
- return true;
139
-}
140
-
141
#define WRAP_FP_GVEC(WRAPNAME, FPST, FUNC) \
142
static void WRAPNAME(unsigned vece, uint32_t rd_ofs, \
143
uint32_t rn_ofs, uint32_t rm_ofs, \
144
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMAX, gen_helper_gvec_fmax_s, gen_helper_gvec_fmax_h)
145
DO_3S_FP_GVEC(VMIN, gen_helper_gvec_fmin_s, gen_helper_gvec_fmin_h)
146
DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h)
147
DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
148
+DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, gen_helper_gvec_vfma_h)
149
+DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h)
150
151
WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
152
WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
153
@@ -XXX,XX +XXX,XX @@ static bool trans_VRSQRTS_fp_3s(DisasContext *s, arg_3same *a)
154
return do_3same(s, a, gen_VRSQRTS_fp_3s);
155
}
156
157
-static void gen_VFMA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
158
- TCGv_ptr fpstatus)
159
-{
160
- gen_helper_vfp_muladds(vd, vn, vm, vd, fpstatus);
161
-}
162
-
163
-static bool trans_VFMA_fp_3s(DisasContext *s, arg_3same *a)
164
-{
165
- if (!dc_isar_feature(aa32_simdfmac, s)) {
166
- return false;
167
- }
168
-
169
- if (a->size != 0) {
170
- /* TODO fp16 support */
171
- return false;
172
- }
173
-
174
- return do_3same_fp(s, a, gen_VFMA_fp_3s, true);
175
-}
176
-
177
-static void gen_VFMS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
178
- TCGv_ptr fpstatus)
179
-{
180
- gen_helper_vfp_negs(vn, vn);
181
- gen_helper_vfp_muladds(vd, vn, vm, vd, fpstatus);
182
-}
183
-
184
-static bool trans_VFMS_fp_3s(DisasContext *s, arg_3same *a)
185
-{
186
- if (!dc_isar_feature(aa32_simdfmac, s)) {
187
- return false;
188
- }
189
-
190
- if (a->size != 0) {
191
- /* TODO fp16 support */
192
- return false;
193
- }
194
-
195
- return do_3same_fp(s, a, gen_VFMS_fp_3s, true);
196
-}
197
-
198
static bool do_3same_fp_pair(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn)
199
{
200
/* FP operations handled pairwise 32 bits at a time */
201
--
267
--
202
2.20.1
268
2.25.1
203
204
diff view generated by jsdifflib
1
Implement VFP fp16 for VABS, VNEG and VSQRT. This is all
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
the fp16 insns that use the DO_VFP_2OP macro, because there
3
is no fp16 version of VMOV_reg.
4
2
5
Notes:
3
This function is not used anywhere outside this file,
6
* the gen_helper_vfp_negh already exists as we needed to create
4
so we can make the function "static void".
7
it for the fp16 multiply-add insns
8
* as usual we need to use the f16 version of the fp_status;
9
this is only relevant for VSQRT
10
5
6
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Reviewed-by: Eric Auger <eric.auger@redhat.com>
9
Message-id: 20221216214924.4711-2-philmd@linaro.org
11
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
12
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
13
Message-id: 20200828183354.27913-9-peter.maydell@linaro.org
14
---
11
---
15
target/arm/helper.h | 2 ++
12
include/hw/arm/smmu-common.h | 3 ---
16
target/arm/vfp.decode | 3 +++
13
hw/arm/smmu-common.c | 2 +-
17
target/arm/vfp_helper.c | 10 +++++++++
14
2 files changed, 1 insertion(+), 4 deletions(-)
18
target/arm/translate-vfp.c.inc | 40 ++++++++++++++++++++++++++++++++++
19
4 files changed, 55 insertions(+)
20
15
21
diff --git a/target/arm/helper.h b/target/arm/helper.h
16
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
22
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
23
--- a/target/arm/helper.h
18
--- a/include/hw/arm/smmu-common.h
24
+++ b/target/arm/helper.h
19
+++ b/include/hw/arm/smmu-common.h
25
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
20
@@ -XXX,XX +XXX,XX @@ void smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
26
DEF_HELPER_1(vfp_negh, f16, f16)
21
/* Unmap the range of all the notifiers registered to any IOMMU mr */
27
DEF_HELPER_1(vfp_negs, f32, f32)
22
void smmu_inv_notifiers_all(SMMUState *s);
28
DEF_HELPER_1(vfp_negd, f64, f64)
23
29
+DEF_HELPER_1(vfp_absh, f16, f16)
24
-/* Unmap the range of all the notifiers registered to @mr */
30
DEF_HELPER_1(vfp_abss, f32, f32)
25
-void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr);
31
DEF_HELPER_1(vfp_absd, f64, f64)
26
-
32
+DEF_HELPER_2(vfp_sqrth, f16, f16, env)
27
#endif /* HW_ARM_SMMU_COMMON_H */
33
DEF_HELPER_2(vfp_sqrts, f32, f32, env)
28
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
34
DEF_HELPER_2(vfp_sqrtd, f64, f64, env)
35
DEF_HELPER_3(vfp_cmps, void, f32, f32, env)
36
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
37
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
38
--- a/target/arm/vfp.decode
30
--- a/hw/arm/smmu-common.c
39
+++ b/target/arm/vfp.decode
31
+++ b/hw/arm/smmu-common.c
40
@@ -XXX,XX +XXX,XX @@ VMOV_imm_dp ---- 1110 1.11 .... .... 1011 0000 .... \
32
@@ -XXX,XX +XXX,XX @@ static void smmu_unmap_notifier_range(IOMMUNotifier *n)
41
VMOV_reg_sp ---- 1110 1.11 0000 .... 1010 01.0 .... @vfp_dm_ss
42
VMOV_reg_dp ---- 1110 1.11 0000 .... 1011 01.0 .... @vfp_dm_dd
43
44
+VABS_hp ---- 1110 1.11 0000 .... 1001 11.0 .... @vfp_dm_ss
45
VABS_sp ---- 1110 1.11 0000 .... 1010 11.0 .... @vfp_dm_ss
46
VABS_dp ---- 1110 1.11 0000 .... 1011 11.0 .... @vfp_dm_dd
47
48
+VNEG_hp ---- 1110 1.11 0001 .... 1001 01.0 .... @vfp_dm_ss
49
VNEG_sp ---- 1110 1.11 0001 .... 1010 01.0 .... @vfp_dm_ss
50
VNEG_dp ---- 1110 1.11 0001 .... 1011 01.0 .... @vfp_dm_dd
51
52
+VSQRT_hp ---- 1110 1.11 0001 .... 1001 11.0 .... @vfp_dm_ss
53
VSQRT_sp ---- 1110 1.11 0001 .... 1010 11.0 .... @vfp_dm_ss
54
VSQRT_dp ---- 1110 1.11 0001 .... 1011 11.0 .... @vfp_dm_dd
55
56
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
57
index XXXXXXX..XXXXXXX 100644
58
--- a/target/arm/vfp_helper.c
59
+++ b/target/arm/vfp_helper.c
60
@@ -XXX,XX +XXX,XX @@ float64 VFP_HELPER(neg, d)(float64 a)
61
return float64_chs(a);
62
}
33
}
63
34
64
+dh_ctype_f16 VFP_HELPER(abs, h)(dh_ctype_f16 a)
35
/* Unmap all notifiers attached to @mr */
65
+{
36
-inline void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr)
66
+ return float16_abs(a);
37
+static void smmu_inv_notifiers_mr(IOMMUMemoryRegion *mr)
67
+}
68
+
69
float32 VFP_HELPER(abs, s)(float32 a)
70
{
38
{
71
return float32_abs(a);
39
IOMMUNotifier *n;
72
@@ -XXX,XX +XXX,XX @@ float64 VFP_HELPER(abs, d)(float64 a)
73
return float64_abs(a);
74
}
75
76
+dh_ctype_f16 VFP_HELPER(sqrt, h)(dh_ctype_f16 a, CPUARMState *env)
77
+{
78
+ return float16_sqrt(a, &env->vfp.fp_status_f16);
79
+}
80
+
81
float32 VFP_HELPER(sqrt, s)(float32 a, CPUARMState *env)
82
{
83
return float32_sqrt(a, &env->vfp.fp_status);
84
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
85
index XXXXXXX..XXXXXXX 100644
86
--- a/target/arm/translate-vfp.c.inc
87
+++ b/target/arm/translate-vfp.c.inc
88
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
89
return true;
90
}
91
92
+static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
93
+{
94
+ /*
95
+ * Do a half-precision operation. Functionally this is
96
+ * the same as do_vfp_2op_sp(), except:
97
+ * - it doesn't need the VFP vector handling (fp16 is a
98
+ * v8 feature, and in v8 VFP vectors don't exist)
99
+ * - it does the aa32_fp16_arith feature test
100
+ */
101
+ TCGv_i32 f0;
102
+
103
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
104
+ return false;
105
+ }
106
+
107
+ if (s->vec_len != 0 || s->vec_stride != 0) {
108
+ return false;
109
+ }
110
+
111
+ if (!vfp_access_check(s)) {
112
+ return true;
113
+ }
114
+
115
+ f0 = tcg_temp_new_i32();
116
+ neon_load_reg32(f0, vm);
117
+ fn(f0, f0);
118
+ neon_store_reg32(f0, vd);
119
+ tcg_temp_free_i32(f0);
120
+
121
+ return true;
122
+}
123
+
124
static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
125
{
126
uint32_t delta_m = 0;
127
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
128
DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
129
DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
130
131
+DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh)
132
DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
133
DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
134
135
+DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh)
136
DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
137
DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
138
139
+static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
140
+{
141
+ gen_helper_vfp_sqrth(vd, vm, cpu_env);
142
+}
143
+
144
static void gen_VSQRT_sp(TCGv_i32 vd, TCGv_i32 vm)
145
{
146
gen_helper_vfp_sqrts(vd, vm, cpu_env);
147
@@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
148
gen_helper_vfp_sqrtd(vd, vm, cpu_env);
149
}
150
151
+DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
152
DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
153
DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
154
40
155
--
41
--
156
2.20.1
42
2.25.1
157
43
158
44
diff view generated by jsdifflib
1
Implmeent VFP fp16 support for simple binary-operator VFP insns VADD,
1
From: Philippe Mathieu-Daudé <philmd@linaro.org>
2
VSUB, VMUL, VDIV, VMINNM and VMAXNM:
3
2
4
* make the VFP_BINOP() macro generate float16 helpers as well as
3
When using Clang ("Apple clang version 14.0.0 (clang-1400.0.29.202)")
5
float32 and float64
4
and building with -Wall we get:
6
* implement a do_vfp_3op_hp() function similar to the existing
7
do_vfp_3op_sp()
8
* add decode for the half-precision insn patterns
9
5
10
Note that the VFP_BINOP macro use creates a couple of unused helper
6
hw/arm/smmu-common.c:173:33: warning: static function 'smmu_hash_remove_by_asid_iova' is used in an inline function with external linkage [-Wstatic-in-inline]
11
functions vfp_maxh and vfp_minh, but they're small so it's not worth
7
hw/arm/smmu-common.h:170:1: note: use 'static' to give inline function 'smmu_iotlb_inv_iova' internal linkage
12
splitting the BINOP operations into "needs halfprec" and "no
8
void smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
13
halfprec" groups.
9
^
10
static
14
11
12
None of our code base require / use inlined functions with external
13
linkage. Some places use internal inlining in the hot path. These
14
two functions are certainly not in any hot path and don't justify
15
any inlining, so these are likely oversights rather than intentional.
16
17
Reported-by: Stefan Weil <sw@weilnetz.de>
18
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
19
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
20
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
21
Reviewed-by: Eric Auger <eric.auger@redhat.com>
22
Message-id: 20221216214924.4711-3-philmd@linaro.org
15
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
23
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
17
Message-id: 20200828183354.27913-4-peter.maydell@linaro.org
18
---
24
---
19
target/arm/helper.h | 8 ++++
25
hw/arm/smmu-common.c | 13 ++++++-------
20
target/arm/vfp-uncond.decode | 3 ++
26
1 file changed, 6 insertions(+), 7 deletions(-)
21
target/arm/vfp.decode | 4 ++
22
target/arm/vfp_helper.c | 5 ++
23
target/arm/translate-vfp.c.inc | 86 ++++++++++++++++++++++++++++++++++
24
5 files changed, 106 insertions(+)
25
27
26
diff --git a/target/arm/helper.h b/target/arm/helper.h
28
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
27
index XXXXXXX..XXXXXXX 100644
29
index XXXXXXX..XXXXXXX 100644
28
--- a/target/arm/helper.h
30
--- a/hw/arm/smmu-common.c
29
+++ b/target/arm/helper.h
31
+++ b/hw/arm/smmu-common.c
30
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(probe_access, TCG_CALL_NO_WG, void, env, tl, i32, i32, i32)
32
@@ -XXX,XX +XXX,XX @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
31
DEF_HELPER_1(vfp_get_fpscr, i32, env)
33
g_hash_table_insert(bs->iotlb, key, new);
32
DEF_HELPER_2(vfp_set_fpscr, void, env, i32)
33
34
+DEF_HELPER_3(vfp_addh, f16, f16, f16, ptr)
35
DEF_HELPER_3(vfp_adds, f32, f32, f32, ptr)
36
DEF_HELPER_3(vfp_addd, f64, f64, f64, ptr)
37
+DEF_HELPER_3(vfp_subh, f16, f16, f16, ptr)
38
DEF_HELPER_3(vfp_subs, f32, f32, f32, ptr)
39
DEF_HELPER_3(vfp_subd, f64, f64, f64, ptr)
40
+DEF_HELPER_3(vfp_mulh, f16, f16, f16, ptr)
41
DEF_HELPER_3(vfp_muls, f32, f32, f32, ptr)
42
DEF_HELPER_3(vfp_muld, f64, f64, f64, ptr)
43
+DEF_HELPER_3(vfp_divh, f16, f16, f16, ptr)
44
DEF_HELPER_3(vfp_divs, f32, f32, f32, ptr)
45
DEF_HELPER_3(vfp_divd, f64, f64, f64, ptr)
46
+DEF_HELPER_3(vfp_maxh, f16, f16, f16, ptr)
47
DEF_HELPER_3(vfp_maxs, f32, f32, f32, ptr)
48
DEF_HELPER_3(vfp_maxd, f64, f64, f64, ptr)
49
+DEF_HELPER_3(vfp_minh, f16, f16, f16, ptr)
50
DEF_HELPER_3(vfp_mins, f32, f32, f32, ptr)
51
DEF_HELPER_3(vfp_mind, f64, f64, f64, ptr)
52
+DEF_HELPER_3(vfp_maxnumh, f16, f16, f16, ptr)
53
DEF_HELPER_3(vfp_maxnums, f32, f32, f32, ptr)
54
DEF_HELPER_3(vfp_maxnumd, f64, f64, f64, ptr)
55
+DEF_HELPER_3(vfp_minnumh, f16, f16, f16, ptr)
56
DEF_HELPER_3(vfp_minnums, f32, f32, f32, ptr)
57
DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
58
DEF_HELPER_1(vfp_negs, f32, f32)
59
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
60
index XXXXXXX..XXXXXXX 100644
61
--- a/target/arm/vfp-uncond.decode
62
+++ b/target/arm/vfp-uncond.decode
63
@@ -XXX,XX +XXX,XX @@ VSEL 1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
64
VSEL 1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
65
vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
66
67
+VMAXNM_hp 1111 1110 1.00 .... .... 1001 .0.0 .... @vfp_dnm_s
68
+VMINNM_hp 1111 1110 1.00 .... .... 1001 .1.0 .... @vfp_dnm_s
69
+
70
VMAXNM_sp 1111 1110 1.00 .... .... 1010 .0.0 .... @vfp_dnm_s
71
VMINNM_sp 1111 1110 1.00 .... .... 1010 .1.0 .... @vfp_dnm_s
72
73
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
74
index XXXXXXX..XXXXXXX 100644
75
--- a/target/arm/vfp.decode
76
+++ b/target/arm/vfp.decode
77
@@ -XXX,XX +XXX,XX @@ VNMLS_dp ---- 1110 0.01 .... .... 1011 .0.0 .... @vfp_dnm_d
78
VNMLA_sp ---- 1110 0.01 .... .... 1010 .1.0 .... @vfp_dnm_s
79
VNMLA_dp ---- 1110 0.01 .... .... 1011 .1.0 .... @vfp_dnm_d
80
81
+VMUL_hp ---- 1110 0.10 .... .... 1001 .0.0 .... @vfp_dnm_s
82
VMUL_sp ---- 1110 0.10 .... .... 1010 .0.0 .... @vfp_dnm_s
83
VMUL_dp ---- 1110 0.10 .... .... 1011 .0.0 .... @vfp_dnm_d
84
85
VNMUL_sp ---- 1110 0.10 .... .... 1010 .1.0 .... @vfp_dnm_s
86
VNMUL_dp ---- 1110 0.10 .... .... 1011 .1.0 .... @vfp_dnm_d
87
88
+VADD_hp ---- 1110 0.11 .... .... 1001 .0.0 .... @vfp_dnm_s
89
VADD_sp ---- 1110 0.11 .... .... 1010 .0.0 .... @vfp_dnm_s
90
VADD_dp ---- 1110 0.11 .... .... 1011 .0.0 .... @vfp_dnm_d
91
92
+VSUB_hp ---- 1110 0.11 .... .... 1001 .1.0 .... @vfp_dnm_s
93
VSUB_sp ---- 1110 0.11 .... .... 1010 .1.0 .... @vfp_dnm_s
94
VSUB_dp ---- 1110 0.11 .... .... 1011 .1.0 .... @vfp_dnm_d
95
96
+VDIV_hp ---- 1110 1.00 .... .... 1001 .0.0 .... @vfp_dnm_s
97
VDIV_sp ---- 1110 1.00 .... .... 1010 .0.0 .... @vfp_dnm_s
98
VDIV_dp ---- 1110 1.00 .... .... 1011 .0.0 .... @vfp_dnm_d
99
100
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
101
index XXXXXXX..XXXXXXX 100644
102
--- a/target/arm/vfp_helper.c
103
+++ b/target/arm/vfp_helper.c
104
@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val)
105
#define VFP_HELPER(name, p) HELPER(glue(glue(vfp_,name),p))
106
107
#define VFP_BINOP(name) \
108
+dh_ctype_f16 VFP_HELPER(name, h)(dh_ctype_f16 a, dh_ctype_f16 b, void *fpstp) \
109
+{ \
110
+ float_status *fpst = fpstp; \
111
+ return float16_ ## name(a, b, fpst); \
112
+} \
113
float32 VFP_HELPER(name, s)(float32 a, float32 b, void *fpstp) \
114
{ \
115
float_status *fpst = fpstp; \
116
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
117
index XXXXXXX..XXXXXXX 100644
118
--- a/target/arm/translate-vfp.c.inc
119
+++ b/target/arm/translate-vfp.c.inc
120
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
121
return true;
122
}
34
}
123
35
124
+static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
36
-inline void smmu_iotlb_inv_all(SMMUState *s)
125
+ int vd, int vn, int vm, bool reads_vd)
37
+void smmu_iotlb_inv_all(SMMUState *s)
126
+{
127
+ /*
128
+ * Do a half-precision operation. Functionally this is
129
+ * the same as do_vfp_3op_sp(), except:
130
+ * - it uses the FPST_FPCR_F16
131
+ * - it doesn't need the VFP vector handling (fp16 is a
132
+ * v8 feature, and in v8 VFP vectors don't exist)
133
+ * - it does the aa32_fp16_arith feature test
134
+ */
135
+ TCGv_i32 f0, f1, fd;
136
+ TCGv_ptr fpst;
137
+
138
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
139
+ return false;
140
+ }
141
+
142
+ if (s->vec_len != 0 || s->vec_stride != 0) {
143
+ return false;
144
+ }
145
+
146
+ if (!vfp_access_check(s)) {
147
+ return true;
148
+ }
149
+
150
+ f0 = tcg_temp_new_i32();
151
+ f1 = tcg_temp_new_i32();
152
+ fd = tcg_temp_new_i32();
153
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
154
+
155
+ neon_load_reg32(f0, vn);
156
+ neon_load_reg32(f1, vm);
157
+
158
+ if (reads_vd) {
159
+ neon_load_reg32(fd, vd);
160
+ }
161
+ fn(fd, f0, f1, fpst);
162
+ neon_store_reg32(fd, vd);
163
+
164
+ tcg_temp_free_i32(f0);
165
+ tcg_temp_free_i32(f1);
166
+ tcg_temp_free_i32(fd);
167
+ tcg_temp_free_ptr(fpst);
168
+
169
+ return true;
170
+}
171
+
172
static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
173
int vd, int vn, int vm, bool reads_vd)
174
{
38
{
175
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMLA_dp(DisasContext *s, arg_VNMLA_dp *a)
39
trace_smmu_iotlb_inv_all();
176
return do_vfp_3op_dp(s, gen_VNMLA_dp, a->vd, a->vn, a->vm, true);
40
g_hash_table_remove_all(s->iotlb);
41
@@ -XXX,XX +XXX,XX @@ static gboolean smmu_hash_remove_by_asid_iova(gpointer key, gpointer value,
42
((entry->iova & ~info->mask) == info->iova);
177
}
43
}
178
44
179
+static bool trans_VMUL_hp(DisasContext *s, arg_VMUL_sp *a)
45
-inline void
180
+{
46
-smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
181
+ return do_vfp_3op_hp(s, gen_helper_vfp_mulh, a->vd, a->vn, a->vm, false);
47
- uint8_t tg, uint64_t num_pages, uint8_t ttl)
182
+}
48
+void smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
183
+
49
+ uint8_t tg, uint64_t num_pages, uint8_t ttl)
184
static bool trans_VMUL_sp(DisasContext *s, arg_VMUL_sp *a)
185
{
50
{
186
return do_vfp_3op_sp(s, gen_helper_vfp_muls, a->vd, a->vn, a->vm, false);
51
/* if tg is not set we use 4KB range invalidation */
187
@@ -XXX,XX +XXX,XX @@ static bool trans_VNMUL_dp(DisasContext *s, arg_VNMUL_dp *a)
52
uint8_t granule = tg ? tg * 2 + 10 : 12;
188
return do_vfp_3op_dp(s, gen_VNMUL_dp, a->vd, a->vn, a->vm, false);
53
@@ -XXX,XX +XXX,XX @@ smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova,
54
&info);
189
}
55
}
190
56
191
+static bool trans_VADD_hp(DisasContext *s, arg_VADD_sp *a)
57
-inline void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid)
192
+{
58
+void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid)
193
+ return do_vfp_3op_hp(s, gen_helper_vfp_addh, a->vd, a->vn, a->vm, false);
194
+}
195
+
196
static bool trans_VADD_sp(DisasContext *s, arg_VADD_sp *a)
197
{
59
{
198
return do_vfp_3op_sp(s, gen_helper_vfp_adds, a->vd, a->vn, a->vm, false);
60
trace_smmu_iotlb_inv_asid(asid);
199
@@ -XXX,XX +XXX,XX @@ static bool trans_VADD_dp(DisasContext *s, arg_VADD_dp *a)
61
g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid, &asid);
200
return do_vfp_3op_dp(s, gen_helper_vfp_addd, a->vd, a->vn, a->vm, false);
62
@@ -XXX,XX +XXX,XX @@ error:
201
}
63
*
202
64
* return 0 on success
203
+static bool trans_VSUB_hp(DisasContext *s, arg_VSUB_sp *a)
65
*/
204
+{
66
-inline int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
205
+ return do_vfp_3op_hp(s, gen_helper_vfp_subh, a->vd, a->vn, a->vm, false);
67
- SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
206
+}
68
+int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
207
+
69
+ SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
208
static bool trans_VSUB_sp(DisasContext *s, arg_VSUB_sp *a)
209
{
70
{
210
return do_vfp_3op_sp(s, gen_helper_vfp_subs, a->vd, a->vn, a->vm, false);
71
if (!cfg->aa64) {
211
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUB_dp(DisasContext *s, arg_VSUB_dp *a)
72
/*
212
return do_vfp_3op_dp(s, gen_helper_vfp_subd, a->vd, a->vn, a->vm, false);
213
}
214
215
+static bool trans_VDIV_hp(DisasContext *s, arg_VDIV_sp *a)
216
+{
217
+ return do_vfp_3op_hp(s, gen_helper_vfp_divh, a->vd, a->vn, a->vm, false);
218
+}
219
+
220
static bool trans_VDIV_sp(DisasContext *s, arg_VDIV_sp *a)
221
{
222
return do_vfp_3op_sp(s, gen_helper_vfp_divs, a->vd, a->vn, a->vm, false);
223
@@ -XXX,XX +XXX,XX @@ static bool trans_VDIV_dp(DisasContext *s, arg_VDIV_dp *a)
224
return do_vfp_3op_dp(s, gen_helper_vfp_divd, a->vd, a->vn, a->vm, false);
225
}
226
227
+static bool trans_VMINNM_hp(DisasContext *s, arg_VMINNM_sp *a)
228
+{
229
+ if (!dc_isar_feature(aa32_vminmaxnm, s)) {
230
+ return false;
231
+ }
232
+ return do_vfp_3op_hp(s, gen_helper_vfp_minnumh,
233
+ a->vd, a->vn, a->vm, false);
234
+}
235
+
236
+static bool trans_VMAXNM_hp(DisasContext *s, arg_VMAXNM_sp *a)
237
+{
238
+ if (!dc_isar_feature(aa32_vminmaxnm, s)) {
239
+ return false;
240
+ }
241
+ return do_vfp_3op_hp(s, gen_helper_vfp_maxnumh,
242
+ a->vd, a->vn, a->vm, false);
243
+}
244
+
245
static bool trans_VMINNM_sp(DisasContext *s, arg_VMINNM_sp *a)
246
{
247
if (!dc_isar_feature(aa32_vminmaxnm, s)) {
248
--
73
--
249
2.20.1
74
2.25.1
250
75
251
76
diff view generated by jsdifflib
1
Convert the Neon floating point VMAXNM and VMINNM insns to
1
From: Jean-Christophe Dubois <jcd@tribudubois.net>
2
using a gvec helper and use this to implement the fp16 case.
3
2
3
So far the GPT timers were unable to raise IRQs to the processor.
4
5
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-30-peter.maydell@linaro.org
7
---
8
---
8
target/arm/helper.h | 6 ++++++
9
include/hw/arm/fsl-imx7.h | 5 +++++
9
target/arm/vec_helper.c | 6 ++++++
10
hw/arm/fsl-imx7.c | 10 ++++++++++
10
target/arm/translate-neon.c.inc | 23 +++++++++++++++--------
11
2 files changed, 15 insertions(+)
11
3 files changed, 27 insertions(+), 8 deletions(-)
12
12
13
diff --git a/target/arm/helper.h b/target/arm/helper.h
13
diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
14
index XXXXXXX..XXXXXXX 100644
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper.h
15
--- a/include/hw/arm/fsl-imx7.h
16
+++ b/target/arm/helper.h
16
+++ b/include/hw/arm/fsl-imx7.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmax_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
17
@@ -XXX,XX +XXX,XX @@ enum FslIMX7IRQs {
18
DEF_HELPER_FLAGS_5(gvec_fmin_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
18
FSL_IMX7_USB2_IRQ = 42,
19
DEF_HELPER_FLAGS_5(gvec_fmin_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
19
FSL_IMX7_USB3_IRQ = 40,
20
20
21
+DEF_HELPER_FLAGS_5(gvec_fmaxnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
21
+ FSL_IMX7_GPT1_IRQ = 55,
22
+DEF_HELPER_FLAGS_5(gvec_fmaxnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
22
+ FSL_IMX7_GPT2_IRQ = 54,
23
+ FSL_IMX7_GPT3_IRQ = 53,
24
+ FSL_IMX7_GPT4_IRQ = 52,
23
+
25
+
24
+DEF_HELPER_FLAGS_5(gvec_fminnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
26
FSL_IMX7_WDOG1_IRQ = 78,
25
+DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
27
FSL_IMX7_WDOG2_IRQ = 79,
28
FSL_IMX7_WDOG3_IRQ = 10,
29
diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
30
index XXXXXXX..XXXXXXX 100644
31
--- a/hw/arm/fsl-imx7.c
32
+++ b/hw/arm/fsl-imx7.c
33
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
34
FSL_IMX7_GPT4_ADDR,
35
};
36
37
+ static const int FSL_IMX7_GPTn_IRQ[FSL_IMX7_NUM_GPTS] = {
38
+ FSL_IMX7_GPT1_IRQ,
39
+ FSL_IMX7_GPT2_IRQ,
40
+ FSL_IMX7_GPT3_IRQ,
41
+ FSL_IMX7_GPT4_IRQ,
42
+ };
26
+
43
+
27
DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
44
s->gpt[i].ccm = IMX_CCM(&s->ccm);
28
void, ptr, ptr, ptr, ptr, i32)
45
sysbus_realize(SYS_BUS_DEVICE(&s->gpt[i]), &error_abort);
29
DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
46
sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpt[i]), 0, FSL_IMX7_GPTn_ADDR[i]);
30
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
47
+ sysbus_connect_irq(SYS_BUS_DEVICE(&s->gpt[i]), 0,
31
index XXXXXXX..XXXXXXX 100644
48
+ qdev_get_gpio_in(DEVICE(&s->a7mpcore),
32
--- a/target/arm/vec_helper.c
49
+ FSL_IMX7_GPTn_IRQ[i]));
33
+++ b/target/arm/vec_helper.c
34
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_fmax_s, float32_max, float32)
35
DO_3OP(gvec_fmin_h, float16_min, float16)
36
DO_3OP(gvec_fmin_s, float32_min, float32)
37
38
+DO_3OP(gvec_fmaxnum_h, float16_maxnum, float16)
39
+DO_3OP(gvec_fmaxnum_s, float32_maxnum, float32)
40
+
41
+DO_3OP(gvec_fminnum_h, float16_minnum, float16)
42
+DO_3OP(gvec_fminnum_s, float32_minnum, float32)
43
+
44
#ifdef TARGET_AARCH64
45
46
DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
47
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
48
index XXXXXXX..XXXXXXX 100644
49
--- a/target/arm/translate-neon.c.inc
50
+++ b/target/arm/translate-neon.c.inc
51
@@ -XXX,XX +XXX,XX @@ static void gen_VMLS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
52
DO_3S_FP(VMLA, gen_VMLA_fp_3s, true)
53
DO_3S_FP(VMLS, gen_VMLS_fp_3s, true)
54
55
+WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
56
+WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
57
+WRAP_FP_GVEC(gen_VMINNM_fp32_3s, FPST_STD, gen_helper_gvec_fminnum_s)
58
+WRAP_FP_GVEC(gen_VMINNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fminnum_h)
59
+
60
static bool trans_VMAXNM_fp_3s(DisasContext *s, arg_3same *a)
61
{
62
if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
63
@@ -XXX,XX +XXX,XX @@ static bool trans_VMAXNM_fp_3s(DisasContext *s, arg_3same *a)
64
}
50
}
65
51
66
if (a->size != 0) {
52
for (i = 0; i < FSL_IMX7_NUM_GPIOS; i++) {
67
- /* TODO fp16 support */
68
- return false;
69
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
70
+ return false;
71
+ }
72
+ return do_3same(s, a, gen_VMAXNM_fp16_3s);
73
}
74
-
75
- return do_3same_fp(s, a, gen_helper_vfp_maxnums, false);
76
+ return do_3same(s, a, gen_VMAXNM_fp32_3s);
77
}
78
79
static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
80
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINNM_fp_3s(DisasContext *s, arg_3same *a)
81
}
82
83
if (a->size != 0) {
84
- /* TODO fp16 support */
85
- return false;
86
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
87
+ return false;
88
+ }
89
+ return do_3same(s, a, gen_VMINNM_fp16_3s);
90
}
91
-
92
- return do_3same_fp(s, a, gen_helper_vfp_minnums, false);
93
+ return do_3same(s, a, gen_VMINNM_fp32_3s);
94
}
95
96
WRAP_ENV_FN(gen_VRECPS_tramp, gen_helper_recps_f32)
97
--
53
--
98
2.20.1
54
2.25.1
99
100
diff view generated by jsdifflib
1
Macroify creation of the trans functions for single and double
1
From: Jean-Christophe Dubois <jcd@tribudubois.net>
2
precision VFMA, VFMS, VFNMA, VFNMS. The repetition was OK for
3
two sizes, but we're about to add halfprec and it will get a bit
4
more than seems reasonable.
5
2
3
CCM derived clocks will have to be added later.
4
5
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20200828183354.27913-6-peter.maydell@linaro.org
9
---
8
---
10
target/arm/translate-vfp.c.inc | 50 +++++++++-------------------------
9
hw/misc/imx7_ccm.c | 49 +++++++++++++++++++++++++++++++++++++---------
11
1 file changed, 13 insertions(+), 37 deletions(-)
10
1 file changed, 40 insertions(+), 9 deletions(-)
12
11
13
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
12
diff --git a/hw/misc/imx7_ccm.c b/hw/misc/imx7_ccm.c
14
index XXXXXXX..XXXXXXX 100644
13
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/translate-vfp.c.inc
14
--- a/hw/misc/imx7_ccm.c
16
+++ b/target/arm/translate-vfp.c.inc
15
+++ b/hw/misc/imx7_ccm.c
17
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
16
@@ -XXX,XX +XXX,XX @@
18
return true;
17
#include "hw/misc/imx7_ccm.h"
19
}
18
#include "migration/vmstate.h"
20
19
21
-static bool trans_VFMA_sp(DisasContext *s, arg_VFMA_sp *a)
20
+#include "trace.h"
22
-{
21
+
23
- return do_vfm_sp(s, a, false, false);
22
+#define CKIH_FREQ 24000000 /* 24MHz crystal input */
24
-}
23
+
25
-
24
static void imx7_analog_reset(DeviceState *dev)
26
-static bool trans_VFMS_sp(DisasContext *s, arg_VFMS_sp *a)
25
{
27
-{
26
IMX7AnalogState *s = IMX7_ANALOG(dev);
28
- return do_vfm_sp(s, a, true, false);
27
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_imx7_ccm = {
29
-}
28
static uint32_t imx7_ccm_get_clock_frequency(IMXCCMState *dev, IMXClk clock)
30
-
31
-static bool trans_VFNMA_sp(DisasContext *s, arg_VFNMA_sp *a)
32
-{
33
- return do_vfm_sp(s, a, false, true);
34
-}
35
-
36
-static bool trans_VFNMS_sp(DisasContext *s, arg_VFNMS_sp *a)
37
-{
38
- return do_vfm_sp(s, a, true, true);
39
-}
40
-
41
static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
42
{
29
{
43
/*
30
/*
44
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
31
- * This function is "consumed" by GPT emulation code, however on
45
return true;
32
- * i.MX7 each GPT block can have their own clock root. This means
33
- * that this functions needs somehow to know requester's identity
34
- * and the way to pass it: be it via additional IMXClk constants
35
- * or by adding another argument to this method needs to be
36
- * figured out
37
+ * This function is "consumed" by GPT emulation code. Some clocks
38
+ * have fixed frequencies and we can provide requested frequency
39
+ * easily. However for CCM provided clocks (like IPG) each GPT
40
+ * timer can have its own clock root.
41
+ * This means we need additionnal information when calling this
42
+ * function to know the requester's identity.
43
*/
44
- qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Not implemented\n",
45
- TYPE_IMX7_CCM, __func__);
46
- return 0;
47
+ uint32_t freq = 0;
48
+
49
+ switch (clock) {
50
+ case CLK_NONE:
51
+ break;
52
+ case CLK_32k:
53
+ freq = CKIL_FREQ;
54
+ break;
55
+ case CLK_HIGH:
56
+ freq = CKIH_FREQ;
57
+ break;
58
+ case CLK_IPG:
59
+ case CLK_IPG_HIGH:
60
+ /*
61
+ * For now we don't have a way to figure out the device this
62
+ * function is called for. Until then the IPG derived clocks
63
+ * are left unimplemented.
64
+ */
65
+ qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Clock %d Not implemented\n",
66
+ TYPE_IMX7_CCM, __func__, clock);
67
+ break;
68
+ default:
69
+ qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: unsupported clock %d\n",
70
+ TYPE_IMX7_CCM, __func__, clock);
71
+ break;
72
+ }
73
+
74
+ trace_ccm_clock_freq(clock, freq);
75
+
76
+ return freq;
46
}
77
}
47
78
48
-static bool trans_VFMA_dp(DisasContext *s, arg_VFMA_dp *a)
79
static void imx7_ccm_class_init(ObjectClass *klass, void *data)
49
-{
50
- return do_vfm_dp(s, a, false, false);
51
-}
52
+#define MAKE_ONE_VFM_TRANS_FN(INSN, PREC, NEGN, NEGD) \
53
+ static bool trans_##INSN##_##PREC(DisasContext *s, \
54
+ arg_##INSN##_##PREC *a) \
55
+ { \
56
+ return do_vfm_##PREC(s, a, NEGN, NEGD); \
57
+ }
58
59
-static bool trans_VFMS_dp(DisasContext *s, arg_VFMS_dp *a)
60
-{
61
- return do_vfm_dp(s, a, true, false);
62
-}
63
+#define MAKE_VFM_TRANS_FNS(PREC) \
64
+ MAKE_ONE_VFM_TRANS_FN(VFMA, PREC, false, false) \
65
+ MAKE_ONE_VFM_TRANS_FN(VFMS, PREC, true, false) \
66
+ MAKE_ONE_VFM_TRANS_FN(VFNMA, PREC, false, true) \
67
+ MAKE_ONE_VFM_TRANS_FN(VFNMS, PREC, true, true)
68
69
-static bool trans_VFNMA_dp(DisasContext *s, arg_VFNMA_dp *a)
70
-{
71
- return do_vfm_dp(s, a, false, true);
72
-}
73
-
74
-static bool trans_VFNMS_dp(DisasContext *s, arg_VFNMS_dp *a)
75
-{
76
- return do_vfm_dp(s, a, true, true);
77
-}
78
+MAKE_VFM_TRANS_FNS(sp)
79
+MAKE_VFM_TRANS_FNS(dp)
80
81
static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
82
{
83
--
80
--
84
2.20.1
81
2.25.1
85
86
diff view generated by jsdifflib
Deleted patch
1
Implement VFP fp16 support for the VMOV immediate insn.
2
1
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20200828183354.27913-10-peter.maydell@linaro.org
6
---
7
target/arm/vfp.decode | 2 ++
8
target/arm/translate-vfp.c.inc | 22 ++++++++++++++++++++++
9
2 files changed, 24 insertions(+)
10
11
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
12
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/vfp.decode
14
+++ b/target/arm/vfp.decode
15
@@ -XXX,XX +XXX,XX @@ VFMS_dp ---- 1110 1.10 .... .... 1011 .1.0 .... @vfp_dnm_d
16
VFNMA_dp ---- 1110 1.01 .... .... 1011 .0.0 .... @vfp_dnm_d
17
VFNMS_dp ---- 1110 1.01 .... .... 1011 .1.0 .... @vfp_dnm_d
18
19
+VMOV_imm_hp ---- 1110 1.11 .... .... 1001 0000 .... \
20
+ vd=%vd_sp imm=%vmov_imm
21
VMOV_imm_sp ---- 1110 1.11 .... .... 1010 0000 .... \
22
vd=%vd_sp imm=%vmov_imm
23
VMOV_imm_dp ---- 1110 1.11 .... .... 1011 0000 .... \
24
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
25
index XXXXXXX..XXXXXXX 100644
26
--- a/target/arm/translate-vfp.c.inc
27
+++ b/target/arm/translate-vfp.c.inc
28
@@ -XXX,XX +XXX,XX @@ MAKE_VFM_TRANS_FNS(hp)
29
MAKE_VFM_TRANS_FNS(sp)
30
MAKE_VFM_TRANS_FNS(dp)
31
32
+static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
33
+{
34
+ TCGv_i32 fd;
35
+
36
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
37
+ return false;
38
+ }
39
+
40
+ if (s->vec_len != 0 || s->vec_stride != 0) {
41
+ return false;
42
+ }
43
+
44
+ if (!vfp_access_check(s)) {
45
+ return true;
46
+ }
47
+
48
+ fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
49
+ neon_store_reg32(fd, a->vd);
50
+ tcg_temp_free_i32(fd);
51
+ return true;
52
+}
53
+
54
static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
55
{
56
uint32_t delta_d = 0;
57
--
58
2.20.1
59
60
diff view generated by jsdifflib
Deleted patch
1
Implement fp16 version of VCMP.
2
1
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20200828183354.27913-11-peter.maydell@linaro.org
6
---
7
target/arm/helper.h | 2 ++
8
target/arm/vfp.decode | 2 ++
9
target/arm/vfp_helper.c | 15 +++++++------
10
target/arm/translate-vfp.c.inc | 39 ++++++++++++++++++++++++++++++++++
11
4 files changed, 51 insertions(+), 7 deletions(-)
12
13
diff --git a/target/arm/helper.h b/target/arm/helper.h
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper.h
16
+++ b/target/arm/helper.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_1(vfp_absd, f64, f64)
18
DEF_HELPER_2(vfp_sqrth, f16, f16, env)
19
DEF_HELPER_2(vfp_sqrts, f32, f32, env)
20
DEF_HELPER_2(vfp_sqrtd, f64, f64, env)
21
+DEF_HELPER_3(vfp_cmph, void, f16, f16, env)
22
DEF_HELPER_3(vfp_cmps, void, f32, f32, env)
23
DEF_HELPER_3(vfp_cmpd, void, f64, f64, env)
24
+DEF_HELPER_3(vfp_cmpeh, void, f16, f16, env)
25
DEF_HELPER_3(vfp_cmpes, void, f32, f32, env)
26
DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
27
28
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
29
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/vfp.decode
31
+++ b/target/arm/vfp.decode
32
@@ -XXX,XX +XXX,XX @@ VSQRT_hp ---- 1110 1.11 0001 .... 1001 11.0 .... @vfp_dm_ss
33
VSQRT_sp ---- 1110 1.11 0001 .... 1010 11.0 .... @vfp_dm_ss
34
VSQRT_dp ---- 1110 1.11 0001 .... 1011 11.0 .... @vfp_dm_dd
35
36
+VCMP_hp ---- 1110 1.11 010 z:1 .... 1001 e:1 1.0 .... \
37
+ vd=%vd_sp vm=%vm_sp
38
VCMP_sp ---- 1110 1.11 010 z:1 .... 1010 e:1 1.0 .... \
39
vd=%vd_sp vm=%vm_sp
40
VCMP_dp ---- 1110 1.11 010 z:1 .... 1011 e:1 1.0 .... \
41
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
42
index XXXXXXX..XXXXXXX 100644
43
--- a/target/arm/vfp_helper.c
44
+++ b/target/arm/vfp_helper.c
45
@@ -XXX,XX +XXX,XX @@ static void softfloat_to_vfp_compare(CPUARMState *env, FloatRelation cmp)
46
}
47
48
/* XXX: check quiet/signaling case */
49
-#define DO_VFP_cmp(p, type) \
50
-void VFP_HELPER(cmp, p)(type a, type b, CPUARMState *env) \
51
+#define DO_VFP_cmp(P, FLOATTYPE, ARGTYPE, FPST) \
52
+void VFP_HELPER(cmp, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
53
{ \
54
softfloat_to_vfp_compare(env, \
55
- type ## _compare_quiet(a, b, &env->vfp.fp_status)); \
56
+ FLOATTYPE ## _compare_quiet(a, b, &env->vfp.FPST)); \
57
} \
58
-void VFP_HELPER(cmpe, p)(type a, type b, CPUARMState *env) \
59
+void VFP_HELPER(cmpe, P)(ARGTYPE a, ARGTYPE b, CPUARMState *env) \
60
{ \
61
softfloat_to_vfp_compare(env, \
62
- type ## _compare(a, b, &env->vfp.fp_status)); \
63
+ FLOATTYPE ## _compare(a, b, &env->vfp.FPST)); \
64
}
65
-DO_VFP_cmp(s, float32)
66
-DO_VFP_cmp(d, float64)
67
+DO_VFP_cmp(h, float16, dh_ctype_f16, fp_status_f16)
68
+DO_VFP_cmp(s, float32, float32, fp_status)
69
+DO_VFP_cmp(d, float64, float64, fp_status)
70
#undef DO_VFP_cmp
71
72
/* Integer to float and float to integer conversions */
73
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
74
index XXXXXXX..XXXXXXX 100644
75
--- a/target/arm/translate-vfp.c.inc
76
+++ b/target/arm/translate-vfp.c.inc
77
@@ -XXX,XX +XXX,XX @@ DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
78
DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
79
DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
80
81
+static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
82
+{
83
+ TCGv_i32 vd, vm;
84
+
85
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
86
+ return false;
87
+ }
88
+
89
+ /* Vm/M bits must be zero for the Z variant */
90
+ if (a->z && a->vm != 0) {
91
+ return false;
92
+ }
93
+
94
+ if (!vfp_access_check(s)) {
95
+ return true;
96
+ }
97
+
98
+ vd = tcg_temp_new_i32();
99
+ vm = tcg_temp_new_i32();
100
+
101
+ neon_load_reg32(vd, a->vd);
102
+ if (a->z) {
103
+ tcg_gen_movi_i32(vm, 0);
104
+ } else {
105
+ neon_load_reg32(vm, a->vm);
106
+ }
107
+
108
+ if (a->e) {
109
+ gen_helper_vfp_cmpeh(vd, vm, cpu_env);
110
+ } else {
111
+ gen_helper_vfp_cmph(vd, vm, cpu_env);
112
+ }
113
+
114
+ tcg_temp_free_i32(vd);
115
+ tcg_temp_free_i32(vm);
116
+
117
+ return true;
118
+}
119
+
120
static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
121
{
122
TCGv_i32 vd, vm;
123
--
124
2.20.1
125
126
diff view generated by jsdifflib
Deleted patch
1
Implement the fp16 versions of the VFP VLDR/VSTR (immediate).
2
1
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20200828183354.27913-12-peter.maydell@linaro.org
6
---
7
target/arm/vfp.decode | 3 +--
8
target/arm/translate-vfp.c.inc | 35 ++++++++++++++++++++++++++++++++++
9
2 files changed, 36 insertions(+), 2 deletions(-)
10
11
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
12
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/vfp.decode
14
+++ b/target/arm/vfp.decode
15
@@ -XXX,XX +XXX,XX @@ VMOV_single ---- 1110 000 l:1 .... rt:4 1010 . 001 0000 vn=%vn_sp
16
VMOV_64_sp ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 .... vm=%vm_sp
17
VMOV_64_dp ---- 1100 010 op:1 rt2:4 rt:4 1011 00.1 .... vm=%vm_dp
18
19
-# Note that the half-precision variants of VLDR and VSTR are
20
-# not part of this decodetree at all because they have bits [9:8] == 0b01
21
+VLDR_VSTR_hp ---- 1101 u:1 .0 l:1 rn:4 .... 1001 imm:8 vd=%vd_sp
22
VLDR_VSTR_sp ---- 1101 u:1 .0 l:1 rn:4 .... 1010 imm:8 vd=%vd_sp
23
VLDR_VSTR_dp ---- 1101 u:1 .0 l:1 rn:4 .... 1011 imm:8 vd=%vd_dp
24
25
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
26
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/translate-vfp.c.inc
28
+++ b/target/arm/translate-vfp.c.inc
29
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
30
return true;
31
}
32
33
+static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
34
+{
35
+ uint32_t offset;
36
+ TCGv_i32 addr, tmp;
37
+
38
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
39
+ return false;
40
+ }
41
+
42
+ if (!vfp_access_check(s)) {
43
+ return true;
44
+ }
45
+
46
+ /* imm8 field is offset/2 for fp16, unlike fp32 and fp64 */
47
+ offset = a->imm << 1;
48
+ if (!a->u) {
49
+ offset = -offset;
50
+ }
51
+
52
+ /* For thumb, use of PC is UNPREDICTABLE. */
53
+ addr = add_reg_for_lit(s, a->rn, offset);
54
+ tmp = tcg_temp_new_i32();
55
+ if (a->l) {
56
+ gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
57
+ neon_store_reg32(tmp, a->vd);
58
+ } else {
59
+ neon_load_reg32(tmp, a->vd);
60
+ gen_aa32_st16(s, tmp, addr, get_mem_index(s));
61
+ }
62
+ tcg_temp_free_i32(tmp);
63
+ tcg_temp_free_i32(addr);
64
+
65
+ return true;
66
+}
67
+
68
static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
69
{
70
uint32_t offset;
71
--
72
2.20.1
73
74
diff view generated by jsdifflib
Deleted patch
1
Currently the VFP_CONV_FIX macros take a single fsz argument for the
2
size of the float type, which is used both to select the name of
3
the functions to call (eg float32_is_any_nan()) and also for the
4
type to use for the float inputs and outputs (eg float32).
5
1
6
Separate these into fsz and ftype arguments, so that we can use them
7
for fp16, which uses 'float16' in the function names but is still
8
passing inputs and outputs in a 32-bit sized type.
9
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
Message-id: 20200828183354.27913-14-peter.maydell@linaro.org
13
---
14
target/arm/vfp_helper.c | 46 ++++++++++++++++++++---------------------
15
1 file changed, 23 insertions(+), 23 deletions(-)
16
17
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
18
index XXXXXXX..XXXXXXX 100644
19
--- a/target/arm/vfp_helper.c
20
+++ b/target/arm/vfp_helper.c
21
@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
22
}
23
24
/* VFP3 fixed point conversion. */
25
-#define VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype) \
26
-float##fsz HELPER(vfp_##name##to##p)(uint##isz##_t x, uint32_t shift, \
27
+#define VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype) \
28
+ftype HELPER(vfp_##name##to##p)(uint##isz##_t x, uint32_t shift, \
29
void *fpstp) \
30
{ return itype##_to_##float##fsz##_scalbn(x, -shift, fpstp); }
31
32
-#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, ROUND, suff) \
33
-uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
34
+#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype, ROUND, suff) \
35
+uint##isz##_t HELPER(vfp_to##name##p##suff)(ftype x, uint32_t shift, \
36
void *fpst) \
37
{ \
38
if (unlikely(float##fsz##_is_any_nan(x))) { \
39
@@ -XXX,XX +XXX,XX @@ uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
40
return float##fsz##_to_##itype##_scalbn(x, ROUND, shift, fpst); \
41
}
42
43
-#define VFP_CONV_FIX(name, p, fsz, isz, itype) \
44
-VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype) \
45
-VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, \
46
+#define VFP_CONV_FIX(name, p, fsz, ftype, isz, itype) \
47
+VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype) \
48
+VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype, \
49
float_round_to_zero, _round_to_zero) \
50
-VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, \
51
+VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype, \
52
get_float_rounding_mode(fpst), )
53
54
-#define VFP_CONV_FIX_A64(name, p, fsz, isz, itype) \
55
-VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype) \
56
-VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, \
57
+#define VFP_CONV_FIX_A64(name, p, fsz, ftype, isz, itype) \
58
+VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype) \
59
+VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype, \
60
get_float_rounding_mode(fpst), )
61
62
-VFP_CONV_FIX(sh, d, 64, 64, int16)
63
-VFP_CONV_FIX(sl, d, 64, 64, int32)
64
-VFP_CONV_FIX_A64(sq, d, 64, 64, int64)
65
-VFP_CONV_FIX(uh, d, 64, 64, uint16)
66
-VFP_CONV_FIX(ul, d, 64, 64, uint32)
67
-VFP_CONV_FIX_A64(uq, d, 64, 64, uint64)
68
-VFP_CONV_FIX(sh, s, 32, 32, int16)
69
-VFP_CONV_FIX(sl, s, 32, 32, int32)
70
-VFP_CONV_FIX_A64(sq, s, 32, 64, int64)
71
-VFP_CONV_FIX(uh, s, 32, 32, uint16)
72
-VFP_CONV_FIX(ul, s, 32, 32, uint32)
73
-VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
74
+VFP_CONV_FIX(sh, d, 64, float64, 64, int16)
75
+VFP_CONV_FIX(sl, d, 64, float64, 64, int32)
76
+VFP_CONV_FIX_A64(sq, d, 64, float64, 64, int64)
77
+VFP_CONV_FIX(uh, d, 64, float64, 64, uint16)
78
+VFP_CONV_FIX(ul, d, 64, float64, 64, uint32)
79
+VFP_CONV_FIX_A64(uq, d, 64, float64, 64, uint64)
80
+VFP_CONV_FIX(sh, s, 32, float32, 32, int16)
81
+VFP_CONV_FIX(sl, s, 32, float32, 32, int32)
82
+VFP_CONV_FIX_A64(sq, s, 32, float32, 64, int64)
83
+VFP_CONV_FIX(uh, s, 32, float32, 32, uint16)
84
+VFP_CONV_FIX(ul, s, 32, float32, 32, uint32)
85
+VFP_CONV_FIX_A64(uq, s, 32, float32, 64, uint64)
86
87
#undef VFP_CONV_FIX
88
#undef VFP_CONV_FIX_FLOAT
89
--
90
2.20.1
91
92
diff view generated by jsdifflib
Deleted patch
1
Now the VFP_CONV_FIX macros can handle fp16's distinction between the
2
width of the operation and the width of the type used to pass operands,
3
use the macros rather than the open-coded functions.
4
1
5
This creates an extra six helper functions, all of which we are going
6
to need for the AArch32 VFP fp16 instructions.
7
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20200828183354.27913-15-peter.maydell@linaro.org
11
---
12
target/arm/helper.h | 6 +++
13
target/arm/vfp_helper.c | 86 +++--------------------------------------
14
2 files changed, 12 insertions(+), 80 deletions(-)
15
16
diff --git a/target/arm/helper.h b/target/arm/helper.h
17
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/helper.h
19
+++ b/target/arm/helper.h
20
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(vfp_tosizh, s32, f16, ptr)
21
DEF_HELPER_2(vfp_tosizs, s32, f32, ptr)
22
DEF_HELPER_2(vfp_tosizd, s32, f64, ptr)
23
24
+DEF_HELPER_3(vfp_toshh_round_to_zero, i32, f16, i32, ptr)
25
+DEF_HELPER_3(vfp_toslh_round_to_zero, i32, f16, i32, ptr)
26
+DEF_HELPER_3(vfp_touhh_round_to_zero, i32, f16, i32, ptr)
27
+DEF_HELPER_3(vfp_toulh_round_to_zero, i32, f16, i32, ptr)
28
DEF_HELPER_3(vfp_toshs_round_to_zero, i32, f32, i32, ptr)
29
DEF_HELPER_3(vfp_tosls_round_to_zero, i32, f32, i32, ptr)
30
DEF_HELPER_3(vfp_touhs_round_to_zero, i32, f32, i32, ptr)
31
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_sqtod, f64, i64, i32, ptr)
32
DEF_HELPER_3(vfp_uhtod, f64, i64, i32, ptr)
33
DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
34
DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
35
+DEF_HELPER_3(vfp_shtoh, f16, i32, i32, ptr)
36
+DEF_HELPER_3(vfp_uhtoh, f16, i32, i32, ptr)
37
DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
38
DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
39
DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
40
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/target/arm/vfp_helper.c
43
+++ b/target/arm/vfp_helper.c
44
@@ -XXX,XX +XXX,XX @@ VFP_CONV_FIX_A64(sq, s, 32, float32, 64, int64)
45
VFP_CONV_FIX(uh, s, 32, float32, 32, uint16)
46
VFP_CONV_FIX(ul, s, 32, float32, 32, uint32)
47
VFP_CONV_FIX_A64(uq, s, 32, float32, 64, uint64)
48
+VFP_CONV_FIX(sh, h, 16, dh_ctype_f16, 32, int16)
49
+VFP_CONV_FIX(sl, h, 16, dh_ctype_f16, 32, int32)
50
+VFP_CONV_FIX_A64(sq, h, 16, dh_ctype_f16, 64, int64)
51
+VFP_CONV_FIX(uh, h, 16, dh_ctype_f16, 32, uint16)
52
+VFP_CONV_FIX(ul, h, 16, dh_ctype_f16, 32, uint32)
53
+VFP_CONV_FIX_A64(uq, h, 16, dh_ctype_f16, 64, uint64)
54
55
#undef VFP_CONV_FIX
56
#undef VFP_CONV_FIX_FLOAT
57
#undef VFP_CONV_FLOAT_FIX_ROUND
58
#undef VFP_CONV_FIX_A64
59
60
-uint32_t HELPER(vfp_sltoh)(uint32_t x, uint32_t shift, void *fpst)
61
-{
62
- return int32_to_float16_scalbn(x, -shift, fpst);
63
-}
64
-
65
-uint32_t HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
66
-{
67
- return uint32_to_float16_scalbn(x, -shift, fpst);
68
-}
69
-
70
-uint32_t HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
71
-{
72
- return int64_to_float16_scalbn(x, -shift, fpst);
73
-}
74
-
75
-uint32_t HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
76
-{
77
- return uint64_to_float16_scalbn(x, -shift, fpst);
78
-}
79
-
80
-uint32_t HELPER(vfp_toshh)(uint32_t x, uint32_t shift, void *fpst)
81
-{
82
- if (unlikely(float16_is_any_nan(x))) {
83
- float_raise(float_flag_invalid, fpst);
84
- return 0;
85
- }
86
- return float16_to_int16_scalbn(x, get_float_rounding_mode(fpst),
87
- shift, fpst);
88
-}
89
-
90
-uint32_t HELPER(vfp_touhh)(uint32_t x, uint32_t shift, void *fpst)
91
-{
92
- if (unlikely(float16_is_any_nan(x))) {
93
- float_raise(float_flag_invalid, fpst);
94
- return 0;
95
- }
96
- return float16_to_uint16_scalbn(x, get_float_rounding_mode(fpst),
97
- shift, fpst);
98
-}
99
-
100
-uint32_t HELPER(vfp_toslh)(uint32_t x, uint32_t shift, void *fpst)
101
-{
102
- if (unlikely(float16_is_any_nan(x))) {
103
- float_raise(float_flag_invalid, fpst);
104
- return 0;
105
- }
106
- return float16_to_int32_scalbn(x, get_float_rounding_mode(fpst),
107
- shift, fpst);
108
-}
109
-
110
-uint32_t HELPER(vfp_toulh)(uint32_t x, uint32_t shift, void *fpst)
111
-{
112
- if (unlikely(float16_is_any_nan(x))) {
113
- float_raise(float_flag_invalid, fpst);
114
- return 0;
115
- }
116
- return float16_to_uint32_scalbn(x, get_float_rounding_mode(fpst),
117
- shift, fpst);
118
-}
119
-
120
-uint64_t HELPER(vfp_tosqh)(uint32_t x, uint32_t shift, void *fpst)
121
-{
122
- if (unlikely(float16_is_any_nan(x))) {
123
- float_raise(float_flag_invalid, fpst);
124
- return 0;
125
- }
126
- return float16_to_int64_scalbn(x, get_float_rounding_mode(fpst),
127
- shift, fpst);
128
-}
129
-
130
-uint64_t HELPER(vfp_touqh)(uint32_t x, uint32_t shift, void *fpst)
131
-{
132
- if (unlikely(float16_is_any_nan(x))) {
133
- float_raise(float_flag_invalid, fpst);
134
- return 0;
135
- }
136
- return float16_to_uint64_scalbn(x, get_float_rounding_mode(fpst),
137
- shift, fpst);
138
-}
139
-
140
/* Set the current fp rounding mode and return the old one.
141
* The argument is a softfloat float_round_ value.
142
*/
143
--
144
2.20.1
145
146
diff view generated by jsdifflib
Deleted patch
1
Implement the fp16 versions of the VFP VCVT instruction forms which
2
convert between floating point and fixed-point.
3
1
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-16-peter.maydell@linaro.org
7
---
8
target/arm/vfp.decode | 2 ++
9
target/arm/translate-vfp.c.inc | 59 ++++++++++++++++++++++++++++++++++
10
2 files changed, 61 insertions(+)
11
12
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
13
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/vfp.decode
15
+++ b/target/arm/vfp.decode
16
@@ -XXX,XX +XXX,XX @@ VJCVT ---- 1110 1.11 1001 .... 1011 11.0 .... @vfp_dm_sd
17
# We assemble bits 18 (op), 16 (u) and 7 (sx) into a single opc field
18
# for the convenience of the trans_VCVT_fix functions.
19
%vcvt_fix_op 18:1 16:1 7:1
20
+VCVT_fix_hp ---- 1110 1.11 1.1. .... 1001 .1.0 .... \
21
+ vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
22
VCVT_fix_sp ---- 1110 1.11 1.1. .... 1010 .1.0 .... \
23
vd=%vd_sp imm=%vm_sp opc=%vcvt_fix_op
24
VCVT_fix_dp ---- 1110 1.11 1.1. .... 1011 .1.0 .... \
25
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
26
index XXXXXXX..XXXXXXX 100644
27
--- a/target/arm/translate-vfp.c.inc
28
+++ b/target/arm/translate-vfp.c.inc
29
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
30
return true;
31
}
32
33
+static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
34
+{
35
+ TCGv_i32 vd, shift;
36
+ TCGv_ptr fpst;
37
+ int frac_bits;
38
+
39
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
40
+ return false;
41
+ }
42
+
43
+ if (!vfp_access_check(s)) {
44
+ return true;
45
+ }
46
+
47
+ frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
48
+
49
+ vd = tcg_temp_new_i32();
50
+ neon_load_reg32(vd, a->vd);
51
+
52
+ fpst = fpstatus_ptr(FPST_FPCR_F16);
53
+ shift = tcg_const_i32(frac_bits);
54
+
55
+ /* Switch on op:U:sx bits */
56
+ switch (a->opc) {
57
+ case 0:
58
+ gen_helper_vfp_shtoh(vd, vd, shift, fpst);
59
+ break;
60
+ case 1:
61
+ gen_helper_vfp_sltoh(vd, vd, shift, fpst);
62
+ break;
63
+ case 2:
64
+ gen_helper_vfp_uhtoh(vd, vd, shift, fpst);
65
+ break;
66
+ case 3:
67
+ gen_helper_vfp_ultoh(vd, vd, shift, fpst);
68
+ break;
69
+ case 4:
70
+ gen_helper_vfp_toshh_round_to_zero(vd, vd, shift, fpst);
71
+ break;
72
+ case 5:
73
+ gen_helper_vfp_toslh_round_to_zero(vd, vd, shift, fpst);
74
+ break;
75
+ case 6:
76
+ gen_helper_vfp_touhh_round_to_zero(vd, vd, shift, fpst);
77
+ break;
78
+ case 7:
79
+ gen_helper_vfp_toulh_round_to_zero(vd, vd, shift, fpst);
80
+ break;
81
+ default:
82
+ g_assert_not_reached();
83
+ }
84
+
85
+ neon_store_reg32(vd, a->vd);
86
+ tcg_temp_free_i32(vd);
87
+ tcg_temp_free_i32(shift);
88
+ tcg_temp_free_ptr(fpst);
89
+ return true;
90
+}
91
+
92
static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
93
{
94
TCGv_i32 vd, shift;
95
--
96
2.20.1
97
98
diff view generated by jsdifflib
1
Implement FP16 support for the Neon insns which use the DO_3S_FP_GVEC
1
From: Jean-Christophe Dubois <jcd@tribudubois.net>
2
macro: VADD, VSUB, VABD, VMUL.
3
2
4
For VABD this requires us to implement a new gvec_fabd_h helper
3
The i.MX6UL doesn't support CLK_HIGH ou CLK_HIGH_DIV clock source.
5
using the machinery we have already for the other helpers.
6
4
5
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
6
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
9
Message-id: 20200828183354.27913-24-peter.maydell@linaro.org
10
---
8
---
11
target/arm/helper.h | 1 +
9
include/hw/timer/imx_gpt.h | 1 +
12
target/arm/vec_helper.c | 6 ++++++
10
hw/arm/fsl-imx6ul.c | 2 +-
13
target/arm/translate-neon.c.inc | 36 +++++++++++++++++----------------
11
hw/misc/imx6ul_ccm.c | 6 ------
14
3 files changed, 26 insertions(+), 17 deletions(-)
12
hw/timer/imx_gpt.c | 25 +++++++++++++++++++++++++
13
4 files changed, 27 insertions(+), 7 deletions(-)
15
14
16
diff --git a/target/arm/helper.h b/target/arm/helper.h
15
diff --git a/include/hw/timer/imx_gpt.h b/include/hw/timer/imx_gpt.h
17
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/helper.h
17
--- a/include/hw/timer/imx_gpt.h
19
+++ b/target/arm/helper.h
18
+++ b/include/hw/timer/imx_gpt.h
20
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmul_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
19
@@ -XXX,XX +XXX,XX @@
21
DEF_HELPER_FLAGS_5(gvec_fmul_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
20
#define TYPE_IMX25_GPT "imx25.gpt"
22
DEF_HELPER_FLAGS_5(gvec_fmul_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
21
#define TYPE_IMX31_GPT "imx31.gpt"
23
22
#define TYPE_IMX6_GPT "imx6.gpt"
24
+DEF_HELPER_FLAGS_5(gvec_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
23
+#define TYPE_IMX6UL_GPT "imx6ul.gpt"
25
DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
24
#define TYPE_IMX7_GPT "imx7.gpt"
26
25
27
DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
26
#define TYPE_IMX_GPT TYPE_IMX25_GPT
28
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
27
diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
29
index XXXXXXX..XXXXXXX 100644
28
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/vec_helper.c
29
--- a/hw/arm/fsl-imx6ul.c
31
+++ b/target/arm/vec_helper.c
30
+++ b/hw/arm/fsl-imx6ul.c
32
@@ -XXX,XX +XXX,XX @@ static float64 float64_ftsmul(float64 op1, uint64_t op2, float_status *stat)
31
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_init(Object *obj)
33
return result;
32
*/
33
for (i = 0; i < FSL_IMX6UL_NUM_GPTS; i++) {
34
snprintf(name, NAME_SIZE, "gpt%d", i);
35
- object_initialize_child(obj, name, &s->gpt[i], TYPE_IMX7_GPT);
36
+ object_initialize_child(obj, name, &s->gpt[i], TYPE_IMX6UL_GPT);
37
}
38
39
/*
40
diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/hw/misc/imx6ul_ccm.c
43
+++ b/hw/misc/imx6ul_ccm.c
44
@@ -XXX,XX +XXX,XX @@ static uint32_t imx6ul_ccm_get_clock_frequency(IMXCCMState *dev, IMXClk clock)
45
case CLK_32k:
46
freq = CKIL_FREQ;
47
break;
48
- case CLK_HIGH:
49
- freq = CKIH_FREQ;
50
- break;
51
- case CLK_HIGH_DIV:
52
- freq = CKIH_FREQ / 8;
53
- break;
54
default:
55
qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: unsupported clock %d\n",
56
TYPE_IMX6UL_CCM, __func__, clock);
57
diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c
58
index XXXXXXX..XXXXXXX 100644
59
--- a/hw/timer/imx_gpt.c
60
+++ b/hw/timer/imx_gpt.c
61
@@ -XXX,XX +XXX,XX @@ static const IMXClk imx6_gpt_clocks[] = {
62
CLK_HIGH, /* 111 reference clock */
63
};
64
65
+static const IMXClk imx6ul_gpt_clocks[] = {
66
+ CLK_NONE, /* 000 No clock source */
67
+ CLK_IPG, /* 001 ipg_clk, 532MHz*/
68
+ CLK_IPG_HIGH, /* 010 ipg_clk_highfreq */
69
+ CLK_EXT, /* 011 External clock */
70
+ CLK_32k, /* 100 ipg_clk_32k */
71
+ CLK_NONE, /* 101 not defined */
72
+ CLK_NONE, /* 110 not defined */
73
+ CLK_NONE, /* 111 not defined */
74
+};
75
+
76
static const IMXClk imx7_gpt_clocks[] = {
77
CLK_NONE, /* 000 No clock source */
78
CLK_IPG, /* 001 ipg_clk, 532MHz*/
79
@@ -XXX,XX +XXX,XX @@ static void imx6_gpt_init(Object *obj)
80
s->clocks = imx6_gpt_clocks;
34
}
81
}
35
82
36
+static float16 float16_abd(float16 op1, float16 op2, float_status *stat)
83
+static void imx6ul_gpt_init(Object *obj)
37
+{
84
+{
38
+ return float16_abs(float16_sub(op1, op2, stat));
85
+ IMXGPTState *s = IMX_GPT(obj);
86
+
87
+ s->clocks = imx6ul_gpt_clocks;
39
+}
88
+}
40
+
89
+
41
static float32 float32_abd(float32 op1, float32 op2, float_status *stat)
90
static void imx7_gpt_init(Object *obj)
42
{
91
{
43
return float32_abs(float32_sub(op1, op2, stat));
92
IMXGPTState *s = IMX_GPT(obj);
44
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_ftsmul_h, float16_ftsmul, float16)
93
@@ -XXX,XX +XXX,XX @@ static const TypeInfo imx6_gpt_info = {
45
DO_3OP(gvec_ftsmul_s, float32_ftsmul, float32)
94
.instance_init = imx6_gpt_init,
46
DO_3OP(gvec_ftsmul_d, float64_ftsmul, float64)
95
};
47
96
48
+DO_3OP(gvec_fabd_h, float16_abd, float16)
97
+static const TypeInfo imx6ul_gpt_info = {
49
DO_3OP(gvec_fabd_s, float32_abd, float32)
98
+ .name = TYPE_IMX6UL_GPT,
50
99
+ .parent = TYPE_IMX25_GPT,
51
#ifdef TARGET_AARCH64
100
+ .instance_init = imx6ul_gpt_init,
52
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
101
+};
53
index XXXXXXX..XXXXXXX 100644
102
+
54
--- a/target/arm/translate-neon.c.inc
103
static const TypeInfo imx7_gpt_info = {
55
+++ b/target/arm/translate-neon.c.inc
104
.name = TYPE_IMX7_GPT,
56
@@ -XXX,XX +XXX,XX @@ static bool do_3same_fp(DisasContext *s, arg_3same *a, VFPGen3OpSPFn *fn,
105
.parent = TYPE_IMX25_GPT,
57
return true;
106
@@ -XXX,XX +XXX,XX @@ static void imx_gpt_register_types(void)
107
type_register_static(&imx25_gpt_info);
108
type_register_static(&imx31_gpt_info);
109
type_register_static(&imx6_gpt_info);
110
+ type_register_static(&imx6ul_gpt_info);
111
type_register_static(&imx7_gpt_info);
58
}
112
}
59
113
60
-/*
61
- * For all the functions using this macro, size == 1 means fp16,
62
- * which is an architecture extension we don't implement yet.
63
- */
64
-#define DO_3S_FP_GVEC(INSN,FUNC) \
65
- static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
66
- uint32_t rn_ofs, uint32_t rm_ofs, \
67
- uint32_t oprsz, uint32_t maxsz) \
68
+#define WRAP_FP_GVEC(WRAPNAME, FPST, FUNC) \
69
+ static void WRAPNAME(unsigned vece, uint32_t rd_ofs, \
70
+ uint32_t rn_ofs, uint32_t rm_ofs, \
71
+ uint32_t oprsz, uint32_t maxsz) \
72
{ \
73
- TCGv_ptr fpst = fpstatus_ptr(FPST_STD); \
74
+ TCGv_ptr fpst = fpstatus_ptr(FPST); \
75
tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpst, \
76
oprsz, maxsz, 0, FUNC); \
77
tcg_temp_free_ptr(fpst); \
78
- } \
79
+ }
80
+
81
+#define DO_3S_FP_GVEC(INSN,SFUNC,HFUNC) \
82
+ WRAP_FP_GVEC(gen_##INSN##_fp32_3s, FPST_STD, SFUNC) \
83
+ WRAP_FP_GVEC(gen_##INSN##_fp16_3s, FPST_STD_F16, HFUNC) \
84
static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
85
{ \
86
if (a->size != 0) { \
87
- /* TODO fp16 support */ \
88
- return false; \
89
+ if (!dc_isar_feature(aa32_fp16_arith, s)) { \
90
+ return false; \
91
+ } \
92
+ return do_3same(s, a, gen_##INSN##_fp16_3s); \
93
} \
94
- return do_3same(s, a, gen_##INSN##_3s); \
95
+ return do_3same(s, a, gen_##INSN##_fp32_3s); \
96
}
97
98
99
-DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s)
100
-DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s)
101
-DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s)
102
-DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s)
103
+DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s, gen_helper_gvec_fadd_h)
104
+DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s, gen_helper_gvec_fsub_h)
105
+DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s, gen_helper_gvec_fabd_h)
106
+DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s, gen_helper_gvec_fmul_h)
107
108
/*
109
* For all the functions using this macro, size == 1 means fp16,
110
--
114
--
111
2.20.1
115
2.25.1
112
113
diff view generated by jsdifflib
1
Convert the Neon float-point VMAX and VMIN insns over to using
1
From: Jean-Christophe Dubois <jcd@tribudubois.net>
2
a gvec helper, and use this to implement the fp16 case.
3
2
3
IRQs were not associated to the various GPIO devices inside i.MX7D.
4
This patch brings the i.MX7D on par with i.MX6.
5
6
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
7
Message-id: 20221226101418.415170-1-jcd@tribudubois.net
8
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-29-peter.maydell@linaro.org
7
---
10
---
8
target/arm/helper.h | 6 ++++++
11
include/hw/arm/fsl-imx7.h | 15 +++++++++++++++
9
target/arm/vec_helper.c | 6 ++++++
12
hw/arm/fsl-imx7.c | 31 ++++++++++++++++++++++++++++++-
10
target/arm/translate-neon.c.inc | 5 ++---
13
2 files changed, 45 insertions(+), 1 deletion(-)
11
3 files changed, 14 insertions(+), 3 deletions(-)
12
14
13
diff --git a/target/arm/helper.h b/target/arm/helper.h
15
diff --git a/include/hw/arm/fsl-imx7.h b/include/hw/arm/fsl-imx7.h
14
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper.h
17
--- a/include/hw/arm/fsl-imx7.h
16
+++ b/target/arm/helper.h
18
+++ b/include/hw/arm/fsl-imx7.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_facge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
19
@@ -XXX,XX +XXX,XX @@ enum FslIMX7IRQs {
18
DEF_HELPER_FLAGS_5(gvec_facgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
20
FSL_IMX7_GPT3_IRQ = 53,
19
DEF_HELPER_FLAGS_5(gvec_facgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
21
FSL_IMX7_GPT4_IRQ = 52,
20
22
21
+DEF_HELPER_FLAGS_5(gvec_fmax_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
23
+ FSL_IMX7_GPIO1_LOW_IRQ = 64,
22
+DEF_HELPER_FLAGS_5(gvec_fmax_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
24
+ FSL_IMX7_GPIO1_HIGH_IRQ = 65,
25
+ FSL_IMX7_GPIO2_LOW_IRQ = 66,
26
+ FSL_IMX7_GPIO2_HIGH_IRQ = 67,
27
+ FSL_IMX7_GPIO3_LOW_IRQ = 68,
28
+ FSL_IMX7_GPIO3_HIGH_IRQ = 69,
29
+ FSL_IMX7_GPIO4_LOW_IRQ = 70,
30
+ FSL_IMX7_GPIO4_HIGH_IRQ = 71,
31
+ FSL_IMX7_GPIO5_LOW_IRQ = 72,
32
+ FSL_IMX7_GPIO5_HIGH_IRQ = 73,
33
+ FSL_IMX7_GPIO6_LOW_IRQ = 74,
34
+ FSL_IMX7_GPIO6_HIGH_IRQ = 75,
35
+ FSL_IMX7_GPIO7_LOW_IRQ = 76,
36
+ FSL_IMX7_GPIO7_HIGH_IRQ = 77,
23
+
37
+
24
+DEF_HELPER_FLAGS_5(gvec_fmin_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
38
FSL_IMX7_WDOG1_IRQ = 78,
25
+DEF_HELPER_FLAGS_5(gvec_fmin_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
39
FSL_IMX7_WDOG2_IRQ = 79,
40
FSL_IMX7_WDOG3_IRQ = 10,
41
diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
42
index XXXXXXX..XXXXXXX 100644
43
--- a/hw/arm/fsl-imx7.c
44
+++ b/hw/arm/fsl-imx7.c
45
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
46
FSL_IMX7_GPIO7_ADDR,
47
};
48
49
+ static const int FSL_IMX7_GPIOn_LOW_IRQ[FSL_IMX7_NUM_GPIOS] = {
50
+ FSL_IMX7_GPIO1_LOW_IRQ,
51
+ FSL_IMX7_GPIO2_LOW_IRQ,
52
+ FSL_IMX7_GPIO3_LOW_IRQ,
53
+ FSL_IMX7_GPIO4_LOW_IRQ,
54
+ FSL_IMX7_GPIO5_LOW_IRQ,
55
+ FSL_IMX7_GPIO6_LOW_IRQ,
56
+ FSL_IMX7_GPIO7_LOW_IRQ,
57
+ };
26
+
58
+
27
DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
59
+ static const int FSL_IMX7_GPIOn_HIGH_IRQ[FSL_IMX7_NUM_GPIOS] = {
28
void, ptr, ptr, ptr, ptr, i32)
60
+ FSL_IMX7_GPIO1_HIGH_IRQ,
29
DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
61
+ FSL_IMX7_GPIO2_HIGH_IRQ,
30
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
62
+ FSL_IMX7_GPIO3_HIGH_IRQ,
31
index XXXXXXX..XXXXXXX 100644
63
+ FSL_IMX7_GPIO4_HIGH_IRQ,
32
--- a/target/arm/vec_helper.c
64
+ FSL_IMX7_GPIO5_HIGH_IRQ,
33
+++ b/target/arm/vec_helper.c
65
+ FSL_IMX7_GPIO6_HIGH_IRQ,
34
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_facge_s, float32_acge, float32)
66
+ FSL_IMX7_GPIO7_HIGH_IRQ,
35
DO_3OP(gvec_facgt_h, float16_acgt, float16)
67
+ };
36
DO_3OP(gvec_facgt_s, float32_acgt, float32)
37
38
+DO_3OP(gvec_fmax_h, float16_max, float16)
39
+DO_3OP(gvec_fmax_s, float32_max, float32)
40
+
68
+
41
+DO_3OP(gvec_fmin_h, float16_min, float16)
69
sysbus_realize(SYS_BUS_DEVICE(&s->gpio[i]), &error_abort);
42
+DO_3OP(gvec_fmin_s, float32_min, float32)
70
- sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio[i]), 0, FSL_IMX7_GPIOn_ADDR[i]);
71
+ sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio[i]), 0,
72
+ FSL_IMX7_GPIOn_ADDR[i]);
43
+
73
+
44
#ifdef TARGET_AARCH64
74
+ sysbus_connect_irq(SYS_BUS_DEVICE(&s->gpio[i]), 0,
45
75
+ qdev_get_gpio_in(DEVICE(&s->a7mpcore),
46
DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
76
+ FSL_IMX7_GPIOn_LOW_IRQ[i]));
47
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
77
+
48
index XXXXXXX..XXXXXXX 100644
78
+ sysbus_connect_irq(SYS_BUS_DEVICE(&s->gpio[i]), 1,
49
--- a/target/arm/translate-neon.c.inc
79
+ qdev_get_gpio_in(DEVICE(&s->a7mpcore),
50
+++ b/target/arm/translate-neon.c.inc
80
+ FSL_IMX7_GPIOn_HIGH_IRQ[i]));
51
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VCGE, gen_helper_gvec_fcge_s, gen_helper_gvec_fcge_h)
52
DO_3S_FP_GVEC(VCGT, gen_helper_gvec_fcgt_s, gen_helper_gvec_fcgt_h)
53
DO_3S_FP_GVEC(VACGE, gen_helper_gvec_facge_s, gen_helper_gvec_facge_h)
54
DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h)
55
+DO_3S_FP_GVEC(VMAX, gen_helper_gvec_fmax_s, gen_helper_gvec_fmax_h)
56
+DO_3S_FP_GVEC(VMIN, gen_helper_gvec_fmin_s, gen_helper_gvec_fmin_h)
57
58
/*
59
* For all the functions using this macro, size == 1 means fp16,
60
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h)
61
return do_3same_fp(s, a, FUNC, READS_VD); \
62
}
81
}
63
82
64
-DO_3S_FP(VMAX, gen_helper_vfp_maxs, false)
83
/*
65
-DO_3S_FP(VMIN, gen_helper_vfp_mins, false)
66
-
67
static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
68
TCGv_ptr fpstatus)
69
{
70
--
84
--
71
2.20.1
85
2.25.1
72
73
diff view generated by jsdifflib
1
Implement the fp16 versions of the VFP VSEL instruction.
1
From: Stephen Longfield <slongfield@google.com>
2
2
3
Size is used at lines 1088/1188 for the loop, which reads the last 4
4
bytes from the crc_ptr so it does need to get increased, however it
5
shouldn't be increased before the buffer is passed to CRC computation,
6
or the crc32 function will access uninitialized memory.
7
8
This was pointed out to me by clg@kaod.org during the code review of
9
a similar patch to hw/net/ftgmac100.c
10
11
Change-Id: Ib0464303b191af1e28abeb2f5105eb25aadb5e9b
12
Signed-off-by: Stephen Longfield <slongfield@google.com>
13
Reviewed-by: Patrick Venture <venture@google.com>
14
Message-id: 20221221183202.3788132-1-slongfield@google.com
15
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
3
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
16
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
5
Message-id: 20200828183354.27913-18-peter.maydell@linaro.org
6
---
17
---
7
target/arm/vfp-uncond.decode | 6 ++++--
18
hw/net/imx_fec.c | 8 ++++----
8
target/arm/translate-vfp.c.inc | 16 ++++++++++++----
19
1 file changed, 4 insertions(+), 4 deletions(-)
9
2 files changed, 16 insertions(+), 6 deletions(-)
10
20
11
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
21
diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
12
index XXXXXXX..XXXXXXX 100644
22
index XXXXXXX..XXXXXXX 100644
13
--- a/target/arm/vfp-uncond.decode
23
--- a/hw/net/imx_fec.c
14
+++ b/target/arm/vfp-uncond.decode
24
+++ b/hw/net/imx_fec.c
15
@@ -XXX,XX +XXX,XX @@
25
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
16
@vfp_dnm_s ................................ vm=%vm_sp vn=%vn_sp vd=%vd_sp
26
return 0;
17
@vfp_dnm_d ................................ vm=%vm_dp vn=%vn_dp vd=%vd_dp
18
19
+VSEL 1111 1110 0. cc:2 .... .... 1001 .0.0 .... \
20
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp sz=1
21
VSEL 1111 1110 0. cc:2 .... .... 1010 .0.0 .... \
22
- vm=%vm_sp vn=%vn_sp vd=%vd_sp dp=0
23
+ vm=%vm_sp vn=%vn_sp vd=%vd_sp sz=2
24
VSEL 1111 1110 0. cc:2 .... .... 1011 .0.0 .... \
25
- vm=%vm_dp vn=%vn_dp vd=%vd_dp dp=1
26
+ vm=%vm_dp vn=%vn_dp vd=%vd_dp sz=3
27
28
VMAXNM_hp 1111 1110 1.00 .... .... 1001 .0.0 .... @vfp_dnm_s
29
VMINNM_hp 1111 1110 1.00 .... .... 1001 .1.0 .... @vfp_dnm_s
30
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
31
index XXXXXXX..XXXXXXX 100644
32
--- a/target/arm/translate-vfp.c.inc
33
+++ b/target/arm/translate-vfp.c.inc
34
@@ -XXX,XX +XXX,XX @@ static bool vfp_access_check(DisasContext *s)
35
static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
36
{
37
uint32_t rd, rn, rm;
38
- bool dp = a->dp;
39
+ int sz = a->sz;
40
41
if (!dc_isar_feature(aa32_vsel, s)) {
42
return false;
43
}
27
}
44
28
45
- if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
29
- /* 4 bytes for the CRC. */
46
+ if (sz == 3 && !dc_isar_feature(aa32_fpdp_v2, s)) {
30
- size += 4;
47
+ return false;
31
crc = cpu_to_be32(crc32(~0, buf, size));
48
+ }
32
+ /* Increase size by 4, loop below reads the last 4 bytes from crc_ptr. */
49
+
33
+ size += 4;
50
+ if (sz == 1 && !dc_isar_feature(aa32_fp16_arith, s)) {
34
crc_ptr = (uint8_t *) &crc;
51
return false;
35
36
/* Huge frames are truncated. */
37
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
38
return 0;
52
}
39
}
53
40
54
/* UNDEF accesses to D16-D31 if they don't exist */
41
- /* 4 bytes for the CRC. */
55
- if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
42
- size += 4;
56
+ if (sz == 3 && !dc_isar_feature(aa32_simd_r32, s) &&
43
crc = cpu_to_be32(crc32(~0, buf, size));
57
((a->vm | a->vn | a->vd) & 0x10)) {
44
+ /* Increase size by 4, loop below reads the last 4 bytes from crc_ptr. */
58
return false;
45
+ size += 4;
59
}
46
crc_ptr = (uint8_t *) &crc;
60
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
47
61
return true;
48
if (shift16) {
62
}
63
64
- if (dp) {
65
+ if (sz == 3) {
66
TCGv_i64 frn, frm, dest;
67
TCGv_i64 tmp, zero, zf, nf, vf;
68
69
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
70
tcg_temp_free_i32(tmp);
71
break;
72
}
73
+ /* For fp16 the top half is always zeroes */
74
+ if (sz == 1) {
75
+ tcg_gen_andi_i32(dest, dest, 0xffff);
76
+ }
77
neon_store_reg32(dest, rd);
78
tcg_temp_free_i32(frn);
79
tcg_temp_free_i32(frm);
80
--
49
--
81
2.20.1
50
2.25.1
82
83
diff view generated by jsdifflib
Deleted patch
1
The fp16 extension includes a new instruction VINS, which copies the
2
lower 16 bits of a 32-bit source VFP register into the upper 16 bits
3
of the destination. Implement it.
4
1
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20200828183354.27913-20-peter.maydell@linaro.org
8
---
9
target/arm/vfp-uncond.decode | 3 +++
10
target/arm/translate-vfp.c.inc | 28 ++++++++++++++++++++++++++++
11
2 files changed, 31 insertions(+)
12
13
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/vfp-uncond.decode
16
+++ b/target/arm/vfp-uncond.decode
17
@@ -XXX,XX +XXX,XX @@ VCVT 1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
18
vm=%vm_sp vd=%vd_sp sz=2
19
VCVT 1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
20
vm=%vm_dp vd=%vd_sp sz=3
21
+
22
+VINS 1111 1110 1.11 0000 .... 1010 11 . 0 .... \
23
+ vd=%vd_sp vm=%vm_sp
24
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
25
index XXXXXXX..XXXXXXX 100644
26
--- a/target/arm/translate-vfp.c.inc
27
+++ b/target/arm/translate-vfp.c.inc
28
@@ -XXX,XX +XXX,XX @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
29
30
return false;
31
}
32
+
33
+static bool trans_VINS(DisasContext *s, arg_VINS *a)
34
+{
35
+ TCGv_i32 rd, rm;
36
+
37
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
38
+ return false;
39
+ }
40
+
41
+ if (s->vec_len != 0 || s->vec_stride != 0) {
42
+ return false;
43
+ }
44
+
45
+ if (!vfp_access_check(s)) {
46
+ return true;
47
+ }
48
+
49
+ /* Insert low half of Vm into high half of Vd */
50
+ rm = tcg_temp_new_i32();
51
+ rd = tcg_temp_new_i32();
52
+ neon_load_reg32(rm, a->vm);
53
+ neon_load_reg32(rd, a->vd);
54
+ tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
55
+ neon_store_reg32(rd, a->vd);
56
+ tcg_temp_free_i32(rm);
57
+ tcg_temp_free_i32(rd);
58
+ return true;
59
+}
60
--
61
2.20.1
62
63
diff view generated by jsdifflib
Deleted patch
1
The fp16 extension includes a new instruction VMOVX, which copies the
2
upper 16 bits of a 32-bit source VFP register into the lower 16
3
bits of the destination and zeroes the high half of the destination.
4
Implement it.
5
1
6
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
8
Message-id: 20200828183354.27913-21-peter.maydell@linaro.org
9
---
10
target/arm/vfp-uncond.decode | 3 +++
11
target/arm/translate-vfp.c.inc | 25 +++++++++++++++++++++++++
12
2 files changed, 28 insertions(+)
13
14
diff --git a/target/arm/vfp-uncond.decode b/target/arm/vfp-uncond.decode
15
index XXXXXXX..XXXXXXX 100644
16
--- a/target/arm/vfp-uncond.decode
17
+++ b/target/arm/vfp-uncond.decode
18
@@ -XXX,XX +XXX,XX @@ VCVT 1111 1110 1.11 11 rm:2 .... 1010 op:1 1.0 .... \
19
VCVT 1111 1110 1.11 11 rm:2 .... 1011 op:1 1.0 .... \
20
vm=%vm_dp vd=%vd_sp sz=3
21
22
+VMOVX 1111 1110 1.11 0000 .... 1010 01 . 0 .... \
23
+ vd=%vd_sp vm=%vm_sp
24
+
25
VINS 1111 1110 1.11 0000 .... 1010 11 . 0 .... \
26
vd=%vd_sp vm=%vm_sp
27
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
28
index XXXXXXX..XXXXXXX 100644
29
--- a/target/arm/translate-vfp.c.inc
30
+++ b/target/arm/translate-vfp.c.inc
31
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
32
tcg_temp_free_i32(rd);
33
return true;
34
}
35
+
36
+static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
37
+{
38
+ TCGv_i32 rm;
39
+
40
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
41
+ return false;
42
+ }
43
+
44
+ if (s->vec_len != 0 || s->vec_stride != 0) {
45
+ return false;
46
+ }
47
+
48
+ if (!vfp_access_check(s)) {
49
+ return true;
50
+ }
51
+
52
+ /* Set Vd to high half of Vm */
53
+ rm = tcg_temp_new_i32();
54
+ neon_load_reg32(rm, a->vm);
55
+ tcg_gen_shri_i32(rm, rm, 16);
56
+ neon_store_reg32(rm, a->vd);
57
+ tcg_temp_free_i32(rm);
58
+ return true;
59
+}
60
--
61
2.20.1
62
63
diff view generated by jsdifflib
Deleted patch
1
Implement the VFP fp16 variant of VMOV that transfers a 16-bit
2
value between a general purpose register and a VFP register.
3
1
4
Note that Rt == 15 is UNPREDICTABLE; since this insn is v8 and later
5
only we have no need to replicate the old "updates CPSR.NZCV"
6
behaviour that the singleprec version of this insn does.
7
8
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
10
Message-id: 20200828183354.27913-22-peter.maydell@linaro.org
11
---
12
target/arm/vfp.decode | 1 +
13
target/arm/translate-vfp.c.inc | 34 ++++++++++++++++++++++++++++++++++
14
2 files changed, 35 insertions(+)
15
16
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
17
index XXXXXXX..XXXXXXX 100644
18
--- a/target/arm/vfp.decode
19
+++ b/target/arm/vfp.decode
20
@@ -XXX,XX +XXX,XX @@ VDUP ---- 1110 1 b:1 q:1 0 .... rt:4 1011 . 0 e:1 1 0000 \
21
vn=%vn_dp
22
23
VMSR_VMRS ---- 1110 111 l:1 reg:4 rt:4 1010 0001 0000
24
+VMOV_half ---- 1110 000 l:1 .... rt:4 1001 . 001 0000 vn=%vn_sp
25
VMOV_single ---- 1110 000 l:1 .... rt:4 1010 . 001 0000 vn=%vn_sp
26
27
VMOV_64_sp ---- 1100 010 op:1 rt2:4 rt:4 1010 00.1 .... vm=%vm_sp
28
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
29
index XXXXXXX..XXXXXXX 100644
30
--- a/target/arm/translate-vfp.c.inc
31
+++ b/target/arm/translate-vfp.c.inc
32
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
33
return true;
34
}
35
36
+static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
37
+{
38
+ TCGv_i32 tmp;
39
+
40
+ if (!dc_isar_feature(aa32_fp16_arith, s)) {
41
+ return false;
42
+ }
43
+
44
+ if (a->rt == 15) {
45
+ /* UNPREDICTABLE; we choose to UNDEF */
46
+ return false;
47
+ }
48
+
49
+ if (!vfp_access_check(s)) {
50
+ return true;
51
+ }
52
+
53
+ if (a->l) {
54
+ /* VFP to general purpose register */
55
+ tmp = tcg_temp_new_i32();
56
+ neon_load_reg32(tmp, a->vn);
57
+ tcg_gen_andi_i32(tmp, tmp, 0xffff);
58
+ store_reg(s, a->rt, tmp);
59
+ } else {
60
+ /* general purpose register to VFP */
61
+ tmp = load_reg(s, a->rt);
62
+ tcg_gen_andi_i32(tmp, tmp, 0xffff);
63
+ neon_store_reg32(tmp, a->vn);
64
+ tcg_temp_free_i32(tmp);
65
+ }
66
+
67
+ return true;
68
+}
69
+
70
static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
71
{
72
TCGv_i32 tmp;
73
--
74
2.20.1
75
76
diff view generated by jsdifflib
Deleted patch
1
We already have gvec helpers for floating point VRECPE and
2
VRQSRTE, so convert the Neon decoder to use them and
3
add the fp16 support.
4
1
5
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
7
Message-id: 20200828183354.27913-25-peter.maydell@linaro.org
8
---
9
target/arm/translate-neon.c.inc | 31 +++++++++++++++++++++++++++++--
10
1 file changed, 29 insertions(+), 2 deletions(-)
11
12
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
13
index XXXXXXX..XXXXXXX 100644
14
--- a/target/arm/translate-neon.c.inc
15
+++ b/target/arm/translate-neon.c.inc
16
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_fp(DisasContext *s, arg_2misc *a,
17
return do_2misc_fp(s, a, FUNC); \
18
}
19
20
-DO_2MISC_FP(VRECPE_F, gen_helper_recpe_f32)
21
-DO_2MISC_FP(VRSQRTE_F, gen_helper_rsqrte_f32)
22
DO_2MISC_FP(VCVT_FS, gen_helper_vfp_sitos)
23
DO_2MISC_FP(VCVT_FU, gen_helper_vfp_uitos)
24
DO_2MISC_FP(VCVT_SF, gen_helper_vfp_tosizs)
25
DO_2MISC_FP(VCVT_UF, gen_helper_vfp_touizs)
26
27
+#define DO_2MISC_FP_VEC(INSN, HFUNC, SFUNC) \
28
+ static void gen_##INSN(unsigned vece, uint32_t rd_ofs, \
29
+ uint32_t rm_ofs, \
30
+ uint32_t oprsz, uint32_t maxsz) \
31
+ { \
32
+ static gen_helper_gvec_2_ptr * const fns[4] = { \
33
+ NULL, HFUNC, SFUNC, NULL, \
34
+ }; \
35
+ TCGv_ptr fpst; \
36
+ fpst = fpstatus_ptr(vece == MO_16 ? FPST_STD_F16 : FPST_STD); \
37
+ tcg_gen_gvec_2_ptr(rd_ofs, rm_ofs, fpst, oprsz, maxsz, 0, \
38
+ fns[vece]); \
39
+ tcg_temp_free_ptr(fpst); \
40
+ } \
41
+ static bool trans_##INSN(DisasContext *s, arg_2misc *a) \
42
+ { \
43
+ if (a->size == MO_16) { \
44
+ if (!dc_isar_feature(aa32_fp16_arith, s)) { \
45
+ return false; \
46
+ } \
47
+ } else if (a->size != MO_32) { \
48
+ return false; \
49
+ } \
50
+ return do_2misc_vec(s, a, gen_##INSN); \
51
+ }
52
+
53
+DO_2MISC_FP_VEC(VRECPE_F, gen_helper_gvec_frecpe_h, gen_helper_gvec_frecpe_s)
54
+DO_2MISC_FP_VEC(VRSQRTE_F, gen_helper_gvec_frsqrte_h, gen_helper_gvec_frsqrte_s)
55
+
56
static bool trans_VRINTX(DisasContext *s, arg_2misc *a)
57
{
58
if (!arm_dc_feature(s, ARM_FEATURE_V8)) {
59
--
60
2.20.1
61
62
diff view generated by jsdifflib
Deleted patch
1
Convert the Neon floating-point vector comparison ops VCEQ,
2
VCGE and VCGT over to using a gvec helper and use this to
3
implement the fp16 case.
4
1
5
(We put the float16_ceq() etc functions above the DO_2OP()
6
macro definition because later when we convert the
7
compare-against-zero instructions we'll want their
8
definitions to be visible at that point in the source file.)
9
10
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
11
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
12
Message-id: 20200828183354.27913-27-peter.maydell@linaro.org
13
---
14
target/arm/helper.h | 9 +++++++
15
target/arm/vec_helper.c | 44 +++++++++++++++++++++++++++++++++
16
target/arm/translate-neon.c.inc | 6 ++---
17
3 files changed, 56 insertions(+), 3 deletions(-)
18
19
diff --git a/target/arm/helper.h b/target/arm/helper.h
20
index XXXXXXX..XXXXXXX 100644
21
--- a/target/arm/helper.h
22
+++ b/target/arm/helper.h
23
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmul_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
24
DEF_HELPER_FLAGS_5(gvec_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
25
DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
26
27
+DEF_HELPER_FLAGS_5(gvec_fceq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
28
+DEF_HELPER_FLAGS_5(gvec_fceq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
29
+
30
+DEF_HELPER_FLAGS_5(gvec_fcge_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
31
+DEF_HELPER_FLAGS_5(gvec_fcge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
32
+
33
+DEF_HELPER_FLAGS_5(gvec_fcgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
34
+DEF_HELPER_FLAGS_5(gvec_fcgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
35
+
36
DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
37
void, ptr, ptr, ptr, ptr, i32)
38
DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
39
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
40
index XXXXXXX..XXXXXXX 100644
41
--- a/target/arm/vec_helper.c
42
+++ b/target/arm/vec_helper.c
43
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm,
44
clear_tail(d, opr_sz, simd_maxsz(desc));
45
}
46
47
+/*
48
+ * Floating point comparisons producing an integer result (all 1s or all 0s).
49
+ * Note that EQ doesn't signal InvalidOp for QNaNs but GE and GT do.
50
+ * Softfloat routines return 0/1, which we convert to the 0/-1 Neon requires.
51
+ */
52
+static uint16_t float16_ceq(float16 op1, float16 op2, float_status *stat)
53
+{
54
+ return -float16_eq_quiet(op1, op2, stat);
55
+}
56
+
57
+static uint32_t float32_ceq(float32 op1, float32 op2, float_status *stat)
58
+{
59
+ return -float32_eq_quiet(op1, op2, stat);
60
+}
61
+
62
+static uint16_t float16_cge(float16 op1, float16 op2, float_status *stat)
63
+{
64
+ return -float16_le(op2, op1, stat);
65
+}
66
+
67
+static uint32_t float32_cge(float32 op1, float32 op2, float_status *stat)
68
+{
69
+ return -float32_le(op2, op1, stat);
70
+}
71
+
72
+static uint16_t float16_cgt(float16 op1, float16 op2, float_status *stat)
73
+{
74
+ return -float16_lt(op2, op1, stat);
75
+}
76
+
77
+static uint32_t float32_cgt(float32 op1, float32 op2, float_status *stat)
78
+{
79
+ return -float32_lt(op2, op1, stat);
80
+}
81
+
82
#define DO_2OP(NAME, FUNC, TYPE) \
83
void HELPER(NAME)(void *vd, void *vn, void *stat, uint32_t desc) \
84
{ \
85
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_ftsmul_d, float64_ftsmul, float64)
86
DO_3OP(gvec_fabd_h, float16_abd, float16)
87
DO_3OP(gvec_fabd_s, float32_abd, float32)
88
89
+DO_3OP(gvec_fceq_h, float16_ceq, float16)
90
+DO_3OP(gvec_fceq_s, float32_ceq, float32)
91
+
92
+DO_3OP(gvec_fcge_h, float16_cge, float16)
93
+DO_3OP(gvec_fcge_s, float32_cge, float32)
94
+
95
+DO_3OP(gvec_fcgt_h, float16_cgt, float16)
96
+DO_3OP(gvec_fcgt_s, float32_cgt, float32)
97
+
98
#ifdef TARGET_AARCH64
99
100
DO_3OP(gvec_recps_h, helper_recpsf_f16, float16)
101
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
102
index XXXXXXX..XXXXXXX 100644
103
--- a/target/arm/translate-neon.c.inc
104
+++ b/target/arm/translate-neon.c.inc
105
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VADD, gen_helper_gvec_fadd_s, gen_helper_gvec_fadd_h)
106
DO_3S_FP_GVEC(VSUB, gen_helper_gvec_fsub_s, gen_helper_gvec_fsub_h)
107
DO_3S_FP_GVEC(VABD, gen_helper_gvec_fabd_s, gen_helper_gvec_fabd_h)
108
DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s, gen_helper_gvec_fmul_h)
109
+DO_3S_FP_GVEC(VCEQ, gen_helper_gvec_fceq_s, gen_helper_gvec_fceq_h)
110
+DO_3S_FP_GVEC(VCGE, gen_helper_gvec_fcge_s, gen_helper_gvec_fcge_h)
111
+DO_3S_FP_GVEC(VCGT, gen_helper_gvec_fcgt_s, gen_helper_gvec_fcgt_h)
112
113
/*
114
* For all the functions using this macro, size == 1 means fp16,
115
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VMUL, gen_helper_gvec_fmul_s, gen_helper_gvec_fmul_h)
116
return do_3same_fp(s, a, FUNC, READS_VD); \
117
}
118
119
-DO_3S_FP(VCEQ, gen_helper_neon_ceq_f32, false)
120
-DO_3S_FP(VCGE, gen_helper_neon_cge_f32, false)
121
-DO_3S_FP(VCGT, gen_helper_neon_cgt_f32, false)
122
DO_3S_FP(VACGE, gen_helper_neon_acge_f32, false)
123
DO_3S_FP(VACGT, gen_helper_neon_acgt_f32, false)
124
DO_3S_FP(VMAX, gen_helper_vfp_maxs, false)
125
--
126
2.20.1
127
128
diff view generated by jsdifflib
Deleted patch
1
Convert the Neon floating-point VMLA and VMLS insns over to using a
2
gvec helper, and use this to implement the fp16 case.
3
1
4
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
6
Message-id: 20200828183354.27913-31-peter.maydell@linaro.org
7
---
8
target/arm/helper.h | 6 +++++
9
target/arm/vec_helper.c | 42 +++++++++++++++++++++++++++++++++
10
target/arm/translate-neon.c.inc | 33 ++------------------------
11
3 files changed, 50 insertions(+), 31 deletions(-)
12
13
diff --git a/target/arm/helper.h b/target/arm/helper.h
14
index XXXXXXX..XXXXXXX 100644
15
--- a/target/arm/helper.h
16
+++ b/target/arm/helper.h
17
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_fmaxnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i3
18
DEF_HELPER_FLAGS_5(gvec_fminnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
19
DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
20
21
+DEF_HELPER_FLAGS_5(gvec_fmla_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
22
+DEF_HELPER_FLAGS_5(gvec_fmla_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
23
+
24
+DEF_HELPER_FLAGS_5(gvec_fmls_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
25
+DEF_HELPER_FLAGS_5(gvec_fmls_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
26
+
27
DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
28
void, ptr, ptr, ptr, ptr, i32)
29
DEF_HELPER_FLAGS_5(gvec_ftsmul_s, TCG_CALL_NO_RWG,
30
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
31
index XXXXXXX..XXXXXXX 100644
32
--- a/target/arm/vec_helper.c
33
+++ b/target/arm/vec_helper.c
34
@@ -XXX,XX +XXX,XX @@ DO_3OP(gvec_rsqrts_d, helper_rsqrtsf_f64, float64)
35
#endif
36
#undef DO_3OP
37
38
+/* Non-fused multiply-add (unlike float16_muladd etc, which are fused) */
39
+static float16 float16_muladd_nf(float16 dest, float16 op1, float16 op2,
40
+ float_status *stat)
41
+{
42
+ return float16_add(dest, float16_mul(op1, op2, stat), stat);
43
+}
44
+
45
+static float32 float32_muladd_nf(float32 dest, float32 op1, float32 op2,
46
+ float_status *stat)
47
+{
48
+ return float32_add(dest, float32_mul(op1, op2, stat), stat);
49
+}
50
+
51
+static float16 float16_mulsub_nf(float16 dest, float16 op1, float16 op2,
52
+ float_status *stat)
53
+{
54
+ return float16_sub(dest, float16_mul(op1, op2, stat), stat);
55
+}
56
+
57
+static float32 float32_mulsub_nf(float32 dest, float32 op1, float32 op2,
58
+ float_status *stat)
59
+{
60
+ return float32_sub(dest, float32_mul(op1, op2, stat), stat);
61
+}
62
+
63
+#define DO_MULADD(NAME, FUNC, TYPE) \
64
+void HELPER(NAME)(void *vd, void *vn, void *vm, void *stat, uint32_t desc) \
65
+{ \
66
+ intptr_t i, oprsz = simd_oprsz(desc); \
67
+ TYPE *d = vd, *n = vn, *m = vm; \
68
+ for (i = 0; i < oprsz / sizeof(TYPE); i++) { \
69
+ d[i] = FUNC(d[i], n[i], m[i], stat); \
70
+ } \
71
+ clear_tail(d, oprsz, simd_maxsz(desc)); \
72
+}
73
+
74
+DO_MULADD(gvec_fmla_h, float16_muladd_nf, float16)
75
+DO_MULADD(gvec_fmla_s, float32_muladd_nf, float32)
76
+
77
+DO_MULADD(gvec_fmls_h, float16_mulsub_nf, float16)
78
+DO_MULADD(gvec_fmls_s, float32_mulsub_nf, float32)
79
+
80
/* For the indexed ops, SVE applies the index per 128-bit vector segment.
81
* For AdvSIMD, there is of course only one such vector segment.
82
*/
83
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
84
index XXXXXXX..XXXXXXX 100644
85
--- a/target/arm/translate-neon.c.inc
86
+++ b/target/arm/translate-neon.c.inc
87
@@ -XXX,XX +XXX,XX @@ DO_3S_FP_GVEC(VACGE, gen_helper_gvec_facge_s, gen_helper_gvec_facge_h)
88
DO_3S_FP_GVEC(VACGT, gen_helper_gvec_facgt_s, gen_helper_gvec_facgt_h)
89
DO_3S_FP_GVEC(VMAX, gen_helper_gvec_fmax_s, gen_helper_gvec_fmax_h)
90
DO_3S_FP_GVEC(VMIN, gen_helper_gvec_fmin_s, gen_helper_gvec_fmin_h)
91
-
92
-/*
93
- * For all the functions using this macro, size == 1 means fp16,
94
- * which is an architecture extension we don't implement yet.
95
- */
96
-#define DO_3S_FP(INSN,FUNC,READS_VD) \
97
- static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
98
- { \
99
- if (a->size != 0) { \
100
- /* TODO fp16 support */ \
101
- return false; \
102
- } \
103
- return do_3same_fp(s, a, FUNC, READS_VD); \
104
- }
105
-
106
-static void gen_VMLA_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
107
- TCGv_ptr fpstatus)
108
-{
109
- gen_helper_vfp_muls(vn, vn, vm, fpstatus);
110
- gen_helper_vfp_adds(vd, vd, vn, fpstatus);
111
-}
112
-
113
-static void gen_VMLS_fp_3s(TCGv_i32 vd, TCGv_i32 vn, TCGv_i32 vm,
114
- TCGv_ptr fpstatus)
115
-{
116
- gen_helper_vfp_muls(vn, vn, vm, fpstatus);
117
- gen_helper_vfp_subs(vd, vd, vn, fpstatus);
118
-}
119
-
120
-DO_3S_FP(VMLA, gen_VMLA_fp_3s, true)
121
-DO_3S_FP(VMLS, gen_VMLS_fp_3s, true)
122
+DO_3S_FP_GVEC(VMLA, gen_helper_gvec_fmla_s, gen_helper_gvec_fmla_h)
123
+DO_3S_FP_GVEC(VMLS, gen_helper_gvec_fmls_s, gen_helper_gvec_fmls_h)
124
125
WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
126
WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
127
--
128
2.20.1
129
130
diff view generated by jsdifflib