Add a shared emulation library in target/arm/emulate/ using a
decodetree decoder (a64-ldst.decode) and a callback-based interface
(struct arm_emul_ops) that any hypervisor backend can implement.
The hypervisor cannot emulate ISV=0 data aborts without decoding the
faulting instruction, since the ESR syndrome does not carry the access
size or target register.
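
For example, a backend would wire the library up roughly like this (a
minimal sketch -- the backend_* helper names are illustrative, not part
of this patch):

    static const struct arm_emul_ops backend_ops = {
        .read_gpr    = backend_read_gpr,     /* backend register accessors */
        .write_gpr   = backend_write_gpr,
        .read_fpreg  = backend_read_fpreg,
        .write_fpreg = backend_write_fpreg,
        .read_mem    = backend_read_mem,     /* guest-VA memory access */
        .write_mem   = backend_write_mem,
    };

    /* Sketch of an ISV=0 data abort handler */
    static void backend_handle_isv0_abort(CPUState *cpu, uint64_t pc)
    {
        uint32_t insn = backend_fetch_insn(cpu, pc);   /* read guest text */

        switch (arm_emul_insn(cpu, &backend_ops, insn)) {
        case ARM_EMUL_OK:
            backend_set_pc(cpu, pc + 4);               /* retire the insn */
            break;
        case ARM_EMUL_UNHANDLED:
        case ARM_EMUL_ERR_MEM:
            backend_inject_data_abort(cpu);            /* let the guest see it */
            break;
        }
    }
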
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 293 ++++++++++++
target/arm/emulate/arm_emulate.c | 738 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 55 +++
target/arm/emulate/meson.build | 16 +
target/arm/meson.build | 1 +
5 files changed, 1103 insertions(+)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
new file mode 100644
index 0000000..9a7b697
--- /dev/null
+++ b/target/arm/emulate/a64-ldst.decode
@@ -0,0 +1,293 @@
+# AArch64 load/store instruction patterns for ISV=0 emulation
+#
+# Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+### Argument sets
+
+# Load/store exclusive
+&stxr rn rt rt2 rs sz lasr
+
+# Load/store pair (GPR and SIMD/FP)
+&ldstpair rt2 rt rn imm sz sign w p
+
+# Load/store immediate (unscaled, pre/post-index, unprivileged, unsigned offset)
+# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
+&ldst_imm rt rn imm sz sign w p unpriv ext u
+
+# Load/store register offset
+&ldst rm rn rt sign ext sz opt s
+
+# Atomic memory operations
+&atomic rs rn rt a r sz
+
+# Compare-and-swap
+&cas rs rn rt sz a r
+
+# Load with PAC (LDRAA/LDRAB, FEAT_PAuth)
+%ldra_imm 22:s1 12:9
+&ldra rt rn imm m w
+
+### Format templates
+
+# Exclusives
+@stxr sz:2 ...... ... rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr
+
+# Load/store pair: imm7 is signed, scaled by element size in handler
+@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+
+# Load/store immediate (9-bit signed)
+@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
+@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
+@ldst_imm_post .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=1 w=1
+@ldst_imm_user .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=1 p=0 w=0
+
+# Load/store unsigned offset (12-bit, handler scales by << sz)
+@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+
+# Load/store register offset
+@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
+
+# Atomics
+@atomic sz:2 ... . .. a:1 r:1 . rs:5 . ... .. rn:5 rt:5 &atomic
+
+# Compare-and-swap: sz extracted by pattern (CAS) or set constant (CASP)
+@cas .. ...... . a:1 . rs:5 r:1 ..... rn:5 rt:5 &cas
+
+# Load with PAC
+@ldra .. ... . .. m:1 . . ......... w:1 . rn:5 rt:5 &ldra imm=%ldra_imm
+
+### Load/store exclusive
+
+# STXR / STLXR (sz encodes 8/16/32/64-bit)
+STXR .. 001000 000 ..... . ..... ..... ..... @stxr
+
+# LDXR / LDAXR
+LDXR .. 001000 010 ..... . ..... ..... ..... @stxr
+
+# STXP / STLXP (bit[31]=1, bit[30]=sf → sz=2 for 32-bit, sz=3 for 64-bit)
+STXP 10 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+STXP 11 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
+# LDXP / LDAXP
+LDXP 10 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+LDXP 11 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
+### Compare-and-swap
+
+# CAS / CASA / CASAL / CASL
+CAS sz:2 001000 1 . 1 ..... . 11111 ..... ..... @cas
+
+# CASP / CASPA / CASPAL / CASPL (pair: Rt,Rt+1 and Rs,Rs+1)
+CASP 00 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=2
+CASP 01 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=3
+
+### Load/store pair — non-temporal (STNP/LDNP)
+
+# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
+STP 00 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP 10 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — post-indexed
+
+STP 00 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 00 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 01 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=1 w=1
+STP 10 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP 10 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 00 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP_v 00 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+STP_v 01 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP_v 01 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 10 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+LDP_v 10 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+
+### Load/store pair — signed offset
+
+STP 00 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 01 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=0
+STP 10 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — pre-indexed
+
+STP 00 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 00 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 01 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=1
+STP 10 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP 10 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 00 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP_v 00 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+STP_v 01 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP_v 01 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 10 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+LDP_v 10 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+
+### Load/store pair — STGP (store allocation tag + pair)
+
+STGP 01 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STGP 01 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STGP 01 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+
+### Load/store register — unscaled immediate (LDUR/STUR)
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+
+### Load/store register — post-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+
+### Load/store register — unprivileged
+
+# GPR only (no SIMD/FP unprivileged forms)
+STR_i sz:2 111 0 00 00 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=1
+
+### Load/store register — pre-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+
+### PRFM — unscaled immediate: prefetch is a NOP
+
+NOP 11 111 0 00 10 0 --------- 00 ----- -----
+
+### Load/store register — unsigned offset
+
+# GPR
+STR_i sz:2 111 0 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_i 00 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=0
+LDR_i 01 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=1
+LDR_i 10 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=2
+LDR_i 11 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=3
+LDR_i 00 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=0
+LDR_i 01 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=1
+LDR_i 10 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=2
+LDR_i 00 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=0
+LDR_i 01 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=1
+
+# PRFM — unsigned offset
+NOP 11 111 0 01 10 ------------ ----- -----
+
+# SIMD/FP
+STR_v_i sz:2 111 1 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+
+### Load/store register — register offset
+
+# GPR
+STR sz:2 111 0 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR 00 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=0
+LDR 01 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=1
+LDR 10 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=2
+LDR 11 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=3
+LDR 00 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=0
+LDR 01 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=1
+LDR 10 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=2
+LDR 00 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=0
+LDR 01 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=1
+
+# PRFM — register offset
+NOP 11 111 0 00 10 1 ----- -1- - 10 ----- -----
+
+# SIMD/FP
+STR_v sz:2 111 1 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+
+### Atomic memory operations
+
+LDADD .. 111 0 00 . . 1 ..... 0000 00 ..... ..... @atomic
+LDCLR .. 111 0 00 . . 1 ..... 0001 00 ..... ..... @atomic
+LDEOR .. 111 0 00 . . 1 ..... 0010 00 ..... ..... @atomic
+LDSET .. 111 0 00 . . 1 ..... 0011 00 ..... ..... @atomic
+LDSMAX .. 111 0 00 . . 1 ..... 0100 00 ..... ..... @atomic
+LDSMIN .. 111 0 00 . . 1 ..... 0101 00 ..... ..... @atomic
+LDUMAX .. 111 0 00 . . 1 ..... 0110 00 ..... ..... @atomic
+LDUMIN .. 111 0 00 . . 1 ..... 0111 00 ..... ..... @atomic
+SWP .. 111 0 00 . . 1 ..... 1000 00 ..... ..... @atomic
+
+### Load with PAC (FEAT_PAuth)
+
+# LDRAA (M=0) / LDRAB (M=1), offset (W=0) / pre-indexed (W=1)
+LDRA 11 111 0 00 . . 1 ......... . 1 ..... ..... @ldra
+
+### System instructions — DC cache maintenance
+
+# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
+# On MMIO regions, cache maintenance is a harmless no-op.
+NOP 1101 0101 0000 1 --- 0111 ---- --- -----
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
new file mode 100644
index 0000000..cd8f44d
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.c
@@ -0,0 +1,738 @@
+/*
+ * AArch64 instruction emulation for ISV=0 data aborts
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "arm_emulate.h"
+#include "qemu/bitops.h"
+#include "qemu/error-report.h"
+
+/* Named "DisasContext" as required by the decodetree code generator */
+typedef struct {
+ CPUState *cpu;
+ const struct arm_emul_ops *ops;
+ ArmEmulResult result;
+} DisasContext;
+
+#include "decode-a64-ldst.c.inc"
+
+/* GPR data access (Rt, Rs, Rt2) -- register 31 = XZR */
+
+static uint64_t gpr_read(DisasContext *ctx, int reg)
+{
+ if (reg == 31) {
+ return 0; /* XZR */
+ }
+ return ctx->ops->read_gpr(ctx->cpu, reg);
+}
+
+static void gpr_write(DisasContext *ctx, int reg, uint64_t val)
+{
+ if (reg == 31) {
+ return; /* XZR -- discard */
+ }
+ ctx->ops->write_gpr(ctx->cpu, reg, val);
+}
+
+/* Base register access (Rn) -- register 31 = SP */
+
+static uint64_t base_read(DisasContext *ctx, int rn)
+{
+ return ctx->ops->read_gpr(ctx->cpu, rn);
+}
+
+static void base_write(DisasContext *ctx, int rn, uint64_t val)
+{
+ ctx->ops->write_gpr(ctx->cpu, rn, val);
+}
+
+/* Memory access wrappers */
+
+static int mem_read(DisasContext *ctx, uint64_t va, void *buf, int size)
+{
+ int ret = ctx->ops->read_mem(ctx->cpu, va, buf, size);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+static int mem_write(DisasContext *ctx, uint64_t va, const void *buf, int size)
+{
+ int ret = ctx->ops->write_mem(ctx->cpu, va, buf, size);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+/* Sign/zero extension helpers */
+
+static uint64_t sign_extend(uint64_t val, int from_bits)
+{
+ int shift = 64 - from_bits;
+ return (int64_t)(val << shift) >> shift;
+}
+
+/* Apply sign/zero extension */
+static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
+{
+ int data_bits = 8 << sz;
+
+ if (sign) {
+ val = sign_extend(val, data_bits);
+ if (ext) {
+ /* Sign-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ } else if (ext) {
+ /* Zero-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ return val;
+}
+
+/* Register offset extension (DDI 0487 C6.2.131) */
+
+static uint64_t extend_reg(uint64_t val, int option, int shift)
+{
+ switch (option) {
+ case 0: /* UXTB */
+ val = (uint8_t)val;
+ break;
+ case 1: /* UXTH */
+ val = (uint16_t)val;
+ break;
+ case 2: /* UXTW */
+ val = (uint32_t)val;
+ break;
+ case 3: /* UXTX / LSL */
+ break;
+ case 4: /* SXTB */
+ val = (int64_t)(int8_t)val;
+ break;
+ case 5: /* SXTH */
+ val = (int64_t)(int16_t)val;
+ break;
+ case 6: /* SXTW */
+ val = (int64_t)(int32_t)val;
+ break;
+ case 7: /* SXTX */
+ break;
+ }
+ return val << shift;
+}
+
+/*
+ * Load/store pair: STP, LDP, STNP, LDNP, STGP, LDPSW
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4 or 8 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset; /* post-index: unmodified base */
+ uint8_t buf[16]; /* max 2 x 8 bytes */
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+
+ /* LDPSW: sign-extend 32-bit values to 64-bit (sign=1, sz=2) */
+ if (a->sign) {
+ v1 = sign_extend(v1, 8 * esize);
+ v2 = sign_extend(v2, 8 * esize);
+ }
+
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* STGP: tag operation is a NOP for emulation; data stored via STP */
+static bool trans_STGP(DisasContext *ctx, arg_ldstpair *a)
+{
+ return trans_STP(ctx, a);
+}
+
+/*
+ * SIMD/FP load/store pair: STP_v, LDP_v
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4, 8, or 16 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32]; /* max 2 x 16 bytes */
+
+ ctx->ops->read_fpreg(ctx->cpu, a->rt, buf, esize);
+ ctx->ops->read_fpreg(ctx->cpu, a->rt2, buf + esize, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32];
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ ctx->ops->write_fpreg(ctx->cpu, a->rt, buf, esize);
+ ctx->ops->write_fpreg(ctx->cpu, a->rt2, buf + esize, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
+
+static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/*
+ * Load/store single -- immediate (SIMD/FP)
+ * STR_v_i / LDR_v_i (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ ctx->ops->read_fpreg(ctx->cpu, a->rt, buf, esize);
+ if (mem_write(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ ctx->ops->write_fpreg(ctx->cpu, a->rt, buf, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/*
+ * Load/store single -- register offset (GPR)
+ * STR / LDR (DDI 0487 C3.3.9)
+ */
+
+static bool trans_STR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ mem_write(ctx, va, &val, esize);
+ return true;
+}
+
+static bool trans_LDR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+/*
+ * Load/store single -- register offset (SIMD/FP)
+ * STR_v / LDR_v (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ ctx->ops->read_fpreg(ctx->cpu, a->rt, buf, esize);
+ mem_write(ctx, va, buf, esize);
+ return true;
+}
+
+static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ ctx->ops->write_fpreg(ctx->cpu, a->rt, buf, esize);
+ return true;
+}
+
+/*
+ * Load/store exclusive: STXR, LDXR, STXP, LDXP
+ * (DDI 0487 C3.3.6)
+ *
+ * Exclusive monitors have no meaning on MMIO. STXR always reports
+ * success (Rs=0) and LDXR does not set an exclusive monitor.
+ */
+
+static bool trans_STXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = gpr_read(ctx, a->rt);
+
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ /* Report success -- no exclusive monitor on emulated access */
+ gpr_write(ctx, a->rs, 0);
+ return true;
+}
+
+static bool trans_LDXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+static bool trans_STXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz; /* sz=2->4, sz=3->8 */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rs, 0); /* success */
+ return true;
+}
+
+static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+ return true;
+}
+
+/*
+ * Atomic memory operations (DDI 0487 C3.3.2)
+ *
+ * Non-atomic read-modify-write; sufficient for MMIO.
+ * Acquire/release semantics ignored (sequentially consistent by design).
+ */
+
+typedef uint64_t (*atomic_op_fn)(uint64_t old, uint64_t operand, int bits);
+
+static uint64_t atomic_add(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old + op;
+}
+
+static uint64_t atomic_clr(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old & ~op;
+}
+
+static uint64_t atomic_eor(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old ^ op;
+}
+
+static uint64_t atomic_set(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old | op;
+}
+
+static uint64_t atomic_smax(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a >= b) ? old : op;
+}
+
+static uint64_t atomic_smin(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a <= b) ? old : op;
+}
+
+static uint64_t atomic_umax(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) >= (op & mask)) ? old : op;
+}
+
+static uint64_t atomic_umin(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) <= (op & mask)) ? old : op;
+}
+
+static bool do_atomic(DisasContext *ctx, arg_atomic *a, atomic_op_fn fn)
+{
+ int esize = 1 << a->sz;
+ int bits = 8 * esize;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t operand = gpr_read(ctx, a->rs);
+ uint64_t result = fn(old, operand, bits);
+
+ if (mem_write(ctx, va, &result, esize) != 0) {
+ return true;
+ }
+
+ /* Rt receives the old value (before modification) */
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+static bool trans_LDADD(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_add);
+}
+
+static bool trans_LDCLR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_clr);
+}
+
+static bool trans_LDEOR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_eor);
+}
+
+static bool trans_LDSET(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_set);
+}
+
+static bool trans_LDSMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smax);
+}
+
+static bool trans_LDSMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smin);
+}
+
+static bool trans_LDUMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umax);
+}
+
+static bool trans_LDUMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umin);
+}
+
+static bool trans_SWP(DisasContext *ctx, arg_atomic *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t newval = gpr_read(ctx, a->rs);
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+/* Compare-and-swap: CAS, CASP (DDI 0487 C3.3.1) */
+
+static bool trans_CAS(DisasContext *ctx, arg_cas *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t current = 0;
+
+ if (mem_read(ctx, va, &current, esize) != 0) {
+ return true;
+ }
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t compare = gpr_read(ctx, a->rs) & mask;
+
+ if ((current & mask) == compare) {
+ uint64_t newval = gpr_read(ctx, a->rt) & mask;
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+ }
+
+ /* Rs receives the old memory value (whether or not swap occurred) */
+ gpr_write(ctx, a->rs, current);
+ return true;
+}
+
+/* CASP: compare-and-swap pair (Rs,Rs+1 compared; Rt,Rt+1 stored) */
+static bool trans_CASP(DisasContext *ctx, arg_cas *a)
+{
+ /* CASP requires even register pairs; odd or r31 is UNPREDICTABLE */
+ if ((a->rs & 1) || a->rs >= 31 || (a->rt & 1) || a->rt >= 31) {
+ return false;
+ }
+
+ int esize = 1 << a->sz; /* per-register size */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t cur1 = 0, cur2 = 0;
+
+ memset(buf, 0, sizeof(buf));
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&cur1, buf, esize);
+ memcpy(&cur2, buf + esize, esize);
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t cmp1 = gpr_read(ctx, a->rs) & mask;
+ uint64_t cmp2 = gpr_read(ctx, a->rs + 1) & mask;
+
+ if ((cur1 & mask) == cmp1 && (cur2 & mask) == cmp2) {
+ uint64_t new1 = gpr_read(ctx, a->rt) & mask;
+ uint64_t new2 = gpr_read(ctx, a->rt + 1) & mask;
+ memcpy(buf, &new1, esize);
+ memcpy(buf + esize, &new2, esize);
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ }
+
+ gpr_write(ctx, a->rs, cur1);
+ gpr_write(ctx, a->rs + 1, cur2);
+ return true;
+}
+
+/*
+ * Load with PAC: LDRAA / LDRAB (FEAT_PAuth)
+ * (DDI 0487 C6.2.121)
+ *
+ * Pointer authentication is not emulated -- the base register is used
+ * directly (equivalent to auth always succeeding).
+ */
+
+static bool trans_LDRA(DisasContext *ctx, arg_ldra *a)
+{
+ int64_t offset = (int64_t)a->imm << 3; /* S:imm9, scaled by 8 */
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = base + offset; /* auth not emulated */
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, 8) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, va);
+ }
+ return true;
+}
+
+/* PRFM, DC cache maintenance -- treated as NOP */
+static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
+{
+ (void)ctx;
+ (void)a;
+ return true;
+}
+
+/* Entry point */
+
+ArmEmulResult arm_emul_insn(CPUState *cpu, const struct arm_emul_ops *ops,
+ uint32_t insn)
+{
+ DisasContext ctx = {
+ .cpu = cpu,
+ .ops = ops,
+ .result = ARM_EMUL_OK,
+ };
+
+ if (!decode_a64_ldst(&ctx, insn)) {
+ return ARM_EMUL_UNHANDLED;
+ }
+
+ return ctx.result;
+}
diff --git a/target/arm/emulate/arm_emulate.h b/target/arm/emulate/arm_emulate.h
new file mode 100644
index 0000000..eef8a37
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.h
@@ -0,0 +1,55 @@
+/*
+ * AArch64 instruction emulation library
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ARM_EMULATE_H
+#define ARM_EMULATE_H
+
+#include "qemu/osdep.h"
+
+/*
+ * CPUState is only used as an opaque pointer (via qemu/typedefs.h).
+ * Callers that dereference CPUState include hw/core/cpu.h themselves.
+ */
+
+/**
+ * ArmEmulResult - return status from arm_emul_insn()
+ */
+typedef enum {
+ ARM_EMUL_OK, /* Instruction emulated successfully */
+ ARM_EMUL_UNHANDLED, /* Instruction not recognized by decoder */
+ ARM_EMUL_ERR_MEM, /* Memory access callback failed */
+} ArmEmulResult;
+
+/**
+ * struct arm_emul_ops - hypervisor register/memory callbacks
+ *
+ * GPR reg 31 = SP (the XZR/SP distinction is handled internally).
+ * Memory callbacks use guest virtual addresses.
+ */
+struct arm_emul_ops {
+ uint64_t (*read_gpr)(CPUState *cpu, int reg);
+ void (*write_gpr)(CPUState *cpu, int reg, uint64_t val);
+
+ /* @size: access width in bytes (4, 8, or 16) */
+ void (*read_fpreg)(CPUState *cpu, int reg, void *buf, int size);
+ void (*write_fpreg)(CPUState *cpu, int reg, const void *buf, int size);
+
+ /* Returns 0 on success, non-zero on failure */
+ int (*read_mem)(CPUState *cpu, uint64_t va, void *buf, int size);
+ int (*write_mem)(CPUState *cpu, uint64_t va, const void *buf, int size);
+};
+
+/**
+ * arm_emul_insn - decode and emulate one AArch64 instruction
+ *
+ * Caller must synchronize CPU state and fetch @insn before calling.
+ */
+ArmEmulResult arm_emul_insn(CPUState *cpu, const struct arm_emul_ops *ops,
+ uint32_t insn);
+
+#endif /* ARM_EMULATE_H */
diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
new file mode 100644
index 0000000..29b7879
--- /dev/null
+++ b/target/arm/emulate/meson.build
@@ -0,0 +1,16 @@
+gen_a64_ldst = decodetree.process('a64-ldst.decode',
+ extra_args: ['--static-decode=decode_a64_ldst'])
+
+arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
+ gen_a64_ldst, files('arm_emulate.c')
+])
+
+# Static library for unit testing (links emulation code + decodetree decoder)
+arm_emulate_test_lib = static_library('arm-emulate-test',
+ sources: [files('arm_emulate.c'), gen_a64_ldst],
+ dependencies: [qemuutil],
+ include_directories: include_directories('.'))
+
+arm_emulate_test = declare_dependency(
+ link_with: arm_emulate_test_lib,
+ include_directories: include_directories('.'))
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 6e0e504..a4b2291 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -57,6 +57,7 @@ arm_common_system_ss.add(files(
'vfp_fpscr.c',
))
+subdir('emulate')
subdir('hvf')
subdir('whpx')
--
2.52.0
On Fri, 13 Mar 2026 at 02:19, Lucas Amaral <lucaaamaral@gmail.com> wrote:
>
> Add a shared emulation library in target/arm/emulate/ using a
> decodetree decoder (a64-ldst.decode) and a callback-based interface
> (struct arm_emul_ops) that any hypervisor backend can implement.
>
> The hypervisor cannot emulate ISV=0 data aborts without decoding the
> faulting instruction, since the ESR syndrome does not carry the access
> size or target register.
>
> Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
> ---
> target/arm/emulate/a64-ldst.decode | 293 ++++++++++++
> target/arm/emulate/arm_emulate.c | 738 +++++++++++++++++++++++++++++
> target/arm/emulate/arm_emulate.h | 55 +++
> target/arm/emulate/meson.build | 16 +
> target/arm/meson.build | 1 +
> 5 files changed, 1103 insertions(+)

This is a huge patch, please can you split it into more easily
reviewable chunks? Something like "basic framework", then add the
instructions in multiple patches that each cover one coherent group
of insns.

Are there any places where your decodetree file patterns differ from
the tcg ones? If so, that's fine, but please note them in the relevant
commit messages for convenience of review.

thanks
-- PMM
> On 13. Mar 2026, at 03:18, Lucas Amaral <lucaaamaral@gmail.com> wrote:
>
> Add a shared emulation library in target/arm/emulate/ using a
> decodetree decoder (a64-ldst.decode) and a callback-based interface
> (struct arm_emul_ops) that any hypervisor backend can implement.
>
> The hypervisor cannot emulate ISV=0 data aborts without decoding the
> faulting instruction, since the ESR syndrome does not carry the access
> size or target register.
>
> Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
[…]
> +/**
> + * struct arm_emul_ops - hypervisor register/memory callbacks
> + *
> + * GPR reg 31 = SP (the XZR/SP distinction is handled internally).
> + * Memory callbacks use guest virtual addresses.
> + */
> +struct arm_emul_ops {
> + uint64_t (*read_gpr)(CPUState *cpu, int reg);
> + void (*write_gpr)(CPUState *cpu, int reg, uint64_t val);
> +
> + /* @size: access width in bytes (4, 8, or 16) */
> + void (*read_fpreg)(CPUState *cpu, int reg, void *buf, int size);
> + void (*write_fpreg)(CPUState *cpu, int reg, const void *buf, int size);
Hello,
This can be good to have, but you should provide default implementations
using CPUState in an arm_helpers file, so that they are not duplicated
across each backend, and then do an if (ctx->ops->read_gpr) { use
override } else { use default } in the library.
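
Roughly (just a sketch -- arm_emul_default_read_gpr is a name I am
making up for the shared CPUState-based helper):

    static uint64_t emul_read_gpr(DisasContext *ctx, int reg)
    {
        if (ctx->ops->read_gpr) {
            return ctx->ops->read_gpr(ctx->cpu, reg);      /* backend override */
        }
        return arm_emul_default_read_gpr(ctx->cpu, reg);   /* shared default */
    }
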
> +
> + /* Returns 0 on success, non-zero on failure */
> + int (*read_mem)(CPUState *cpu, uint64_t va, void *buf, int size);
> + int (*write_mem)(CPUState *cpu, uint64_t va, const void *buf, int size);
> +};
A memory access - especially one that will be emulated - can span multiple
(physical) pages under the hood. If everything is mapped you're fine, but
that relies on luck, especially as the AArch64 glibc does unaligned
accesses in memcpy.

On the x86 side of things, I was able to run Windows (NT), Linux and Win9x
without handling such a fault case, but not Haiku (the Hurd needs more
complexity that I don't even handle yet for x86).

And there are memory-to-memory instructions on the way (FEAT_MOPS) where
that is even more likely to happen.

The downside of read_mem/write_mem is that even if you return a fault code,
you don't know which of the two pages (or potentially more, for
memory-to-memory instructions) raised the fault.

Because of that, I made a design change to an mmu_gva_to_gpa callback and
no longer have read/write ops like this (see
x86_write_mem_ex/x86_read_mem_ex in target/i386/emulate/x86_mmu.c).

Maybe you could keep a read_mem/write_mem matching those two on top of
mmu_gva_to_gpa for your unit tests, or run those in a guest context as
kvm-unit-tests does.
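
Roughly (a sketch only -- mmu_gva_to_gpa here is the callback I am
suggesting, read_guest_phys is a placeholder, and a 4K guest page size
is assumed):

    static int emul_read_mem(CPUState *cpu, const struct arm_emul_ops *ops,
                             uint64_t va, void *buf, int size)
    {
        while (size > 0) {
            uint64_t gpa;
            int chunk = 4096 - (int)(va & 4095);   /* bytes left on this page */

            if (chunk > size) {
                chunk = size;
            }
            /* Translate one page at a time so a fault names the exact page */
            if (ops->mmu_gva_to_gpa(cpu, va, &gpa) != 0) {
                return -1;   /* caller can report/inject a fault for this VA */
            }
            read_guest_phys(cpu, gpa, buf, chunk);  /* placeholder phys access */

            va += chunk;
            buf = (char *)buf + chunk;
            size -= chunk;
        }
        return 0;
    }
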
Thank you,
> +
> +/**
> + * arm_emul_insn - decode and emulate one AArch64 instruction
> + *
> + * Caller must synchronize CPU state and fetch @insn before calling.
> + */
> +ArmEmulResult arm_emul_insn(CPUState *cpu, const struct arm_emul_ops *ops,
> + uint32_t insn);
> +
> +#endif /* ARM_EMULATE_H */
> diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
> new file mode 100644
> index 0000000..29b7879
> --- /dev/null
> +++ b/target/arm/emulate/meson.build
> @@ -0,0 +1,16 @@
> +gen_a64_ldst = decodetree.process('a64-ldst.decode',
> + extra_args: ['--static-decode=decode_a64_ldst'])
> +
> +arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
> + gen_a64_ldst, files('arm_emulate.c')
> +])
> +
> +# Static library for unit testing (links emulation code + decodetree decoder)
> +arm_emulate_test_lib = static_library('arm-emulate-test',
> + sources: [files('arm_emulate.c'), gen_a64_ldst],
> + dependencies: [qemuutil],
> + include_directories: include_directories('.'))
> +
> +arm_emulate_test = declare_dependency(
> + link_with: arm_emulate_test_lib,
> + include_directories: include_directories('.'))
> diff --git a/target/arm/meson.build b/target/arm/meson.build
> index 6e0e504..a4b2291 100644
> --- a/target/arm/meson.build
> +++ b/target/arm/meson.build
> @@ -57,6 +57,7 @@ arm_common_system_ss.add(files(
> 'vfp_fpscr.c',
> ))
>
> +subdir('emulate')
> subdir('hvf')
> subdir('whpx')
>
> --
> 2.52.0
>
>