When a guest triggers a data abort with ISV=0 (e.g. STP, LDP, SIMD/FP
load/store, writeback addressing, atomics, exclusives), the ESR syndrome
does not carry the access size or target register, so the hypervisor
cannot emulate MMIO without decoding the faulting instruction.
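For reference, a minimal sketch of the relevant syndrome fields (data-abort
ISS encoding per Arm DDI 0487; only the fields mentioned above are shown):

```c
#include <stdbool.h>
#include <stdint.h>

/* Data-abort ISS layout in ESR_ELx, per Arm DDI 0487 */
#define ISS_ISV       (1u << 24)            /* Instruction Syndrome Valid */
#define ISS_SAS(iss)  (((iss) >> 22) & 0x3) /* access size: 0=byte .. 3=dword */
#define ISS_SRT(iss)  (((iss) >> 16) & 0x1f) /* target register number */

/*
 * With ISV=0, SAS/SRT carry no information: the hypervisor must fetch
 * and decode the faulting instruction itself to learn size and register.
 */
static bool syndrome_describes_access(uint32_t iss)
{
    return (iss & ISS_ISV) != 0;
}
```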
v1 handled this inside HVF with a hand-written decoder. Based on review
feedback from Mohamed Mediouni and Peter Maydell, v2 restructures the
implementation as:
- A shared emulation library in target/arm/emulate/ with a decodetree
decoder (a64-ldst.decode), usable by any hypervisor backend.
- A callback-based interface (struct arm_emul_ops) that abstracts
register and memory access, keeping the library hypervisor-agnostic.
- HVF and WHPX backends wired as the first two consumers.
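A rough sketch of what such a callback interface looks like (the actual v2
struct is not reproduced here; names and signatures are illustrative only):

```c
#include <stdint.h>

/* Hypothetical shape of the hypervisor-agnostic callback interface */
struct arm_emul_ops {
    uint64_t (*gpr_read)(void *opaque, int reg);
    void     (*gpr_write)(void *opaque, int reg, uint64_t val);
    int      (*mem_read)(void *opaque, uint64_t va, void *buf, int size);
    int      (*mem_write)(void *opaque, uint64_t va, const void *buf, int size);
};

/* Example backend callback: back the GPRs with a plain array */
static uint64_t array_gpr_read(void *opaque, int reg)
{
    return ((uint64_t *)opaque)[reg];
}
```

Each backend (HVF, WHPX) fills in the ops with its own vCPU accessors, so
the decoder core never touches hypervisor-specific APIs.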
Instruction classes handled (DDI 0487):
- Load/store pair: STP, LDP, STNP, LDNP, STGP, LDPSW (C3.3.14-16)
- SIMD/FP load/store pair and single (C3.3.10, C3.3.14-16)
- All immediate addressing: unscaled, post/pre-index, unsigned offset
- Register offset addressing with extend (C3.3.9)
- Exclusives: STXR, LDXR, STXP, LDXP (C3.3.6)
- Atomics: LDADD, LDCLR, LDEOR, LDSET, LDSMAX/MIN, LDUMAX/MIN, SWP
- Compare-and-swap: CAS, CASP (C3.3.1)
- LDRAA/LDRAB with FEAT_PAuth (C6.2.121)
- PRFM, DC maintenance (as NOPs)
Intentionally omitted (not observed in ISV=0 MMIO traps during testing):
- AdvSIMD structure loads/stores (LD1/ST1 etc.)
- MTE load/stores (FEAT_MTE)
- 128-bit atomics (FEAT_LSE128)
- MOPS (FEAT_MOPS)
KVM NISV handling is a natural follow-up -- it requires similar
arm_emul_ops callbacks using KVM vcpu ioctls.
v1 -> v2:
- Moved from HVF-specific inline decoder to shared library
in target/arm/emulate/ (Mohamed Mediouni)
- Added decodetree decoder for structured instruction parsing
(Peter Maydell)
- Made hypervisor-agnostic; wired HVF and WHPX (Peter Maydell)
- Added CASP register-pair validation (odd/r31 -> UNHANDLED)
- Added unit tests (19 test cases)
- Split into 3 patches for reviewability
Lucas Amaral (3):
target/arm: add AArch64 ISV=0 instruction emulation library
tests: add unit tests for ISV=0 emulation library
target/arm: wire ISV=0 emulation into HVF and WHPX
target/arm/emulate/a64-ldst.decode | 293 ++++++++++++
target/arm/emulate/arm_emulate.c | 738 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 55 +++
target/arm/emulate/meson.build | 16 +
target/arm/hvf/hvf.c | 94 +++-
target/arm/meson.build | 1 +
target/arm/whpx/whpx-all.c | 86 +++-
tests/unit/meson.build | 1 +
tests/unit/test-arm-emulate.c | 540 +++++++++++++++++++++
9 files changed, 1820 insertions(+), 4 deletions(-)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
create mode 100644 tests/unit/test-arm-emulate.c
--
2.52.0
Add a shared emulation library for AArch64 load/store instructions that
cause ISV=0 data aborts under hardware virtualization, and wire it into
HVF (macOS) and WHPX (Windows).
When the Instruction Syndrome Valid bit is clear, the hypervisor cannot
determine the faulting instruction's target register or access size from
the syndrome alone. This previously hit an assert(isv) and killed the
VM. The library fetches and decodes the faulting instruction using a
decodetree-generated decoder, then emulates it directly against the vCPU
register file and memory.
As suggested in v1 review, the library uses its own a64-ldst.decode
rather than sharing target/arm/tcg/a64.decode. Beyond the practical
complexity noted in review, the two have incompatible purposes: TCG's
trans_* functions are a compiler — they emit IR ops into a translation
block for later execution. This library's trans_* functions are an
interpreter — they execute directly against the vCPU register file and
memory. The decodetree-generated dispatcher calls trans_* by name, so
both cannot coexist in the same translation unit. Decode patterns are
kept consistent with TCG's where possible.
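The compiler/interpreter split can be shown schematically (types and names
are simplified stand-ins, not the real QEMU APIs):

```c
#include <stdint.h>
#include <string.h>

/* TCG-style translator: records an operation for later execution */
typedef struct { int op; int rt; uint64_t addr; } IROp;

static int tcg_style_trans_ldr(IROp *tb, int n, int rt, uint64_t addr)
{
    tb[n] = (IROp){ .op = 1 /* LOAD */, .rt = rt, .addr = addr };
    return n + 1;   /* nothing has been loaded yet */
}

/* Interpreter-style handler: performs the access immediately */
static void emul_style_trans_ldr(uint64_t *xregs, int rt, const void *mem)
{
    memcpy(&xregs[rt], mem, 8);   /* the load happens right here */
}
```

Both functions would be named trans_LDR_i by the generated dispatcher,
which is why the two decoders cannot share a translation unit.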
This series wires the library into HVF (macOS) and WHPX (Windows). KVM
on ARM already handles ISV=0 data aborts in-kernel via
kvm_arm_handle_dabt_nisv(), but could use this library as a userspace
fallback in the future.
Changes since v2:
- Split monolithic patch into 6 incremental patches: framework, then
one patch per coherent instruction group (Peter)
- Removed per-backend callback ops; library uses CPUArchState directly
with cpu_memory_rw_debug() for memory access (Mohamed)
- Removed mock unit tests (Mohamed; kvm-unit-tests is the right
vehicle for decoder validation)
- Added architectural justification for separate decode file
Lucas Amaral (6):
target/arm/emulate: add ISV=0 emulation library with load/store
immediate
target/arm/emulate: add load/store register offset
target/arm/emulate: add load/store pair
target/arm/emulate: add load/store exclusive
target/arm/emulate: add atomic, compare-and-swap, and PAC load
target/arm/hvf,whpx: wire ISV=0 emulation for data aborts
target/arm/emulate/a64-ldst.decode | 293 +++++++++++
target/arm/emulate/arm_emulate.c | 747 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 30 ++
target/arm/emulate/meson.build | 6 +
target/arm/hvf/hvf.c | 41 +-
target/arm/meson.build | 1 +
target/arm/whpx/whpx-all.c | 39 +-
7 files changed, 1153 insertions(+), 4 deletions(-)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
--
2.52.0
Add a shared emulation library for AArch64 load/store instructions that
cause ISV=0 data aborts under hardware virtualization, and wire it into
HVF (macOS) and WHPX (Windows).
When the Instruction Syndrome Valid bit is clear, the hypervisor cannot
determine the faulting instruction's target register or access size from
the syndrome alone. This previously hit an assert(isv) and killed the
VM. The library fetches and decodes the faulting instruction using a
decodetree-generated decoder, then emulates it directly against the vCPU
register file and memory.
The library uses its own a64-ldst.decode rather than sharing
target/arm/tcg/a64.decode: TCG's trans_* functions emit IR into a
translation block, while this library's trans_* functions execute
directly. Decode patterns are kept consistent with TCG's where possible.
Changes since v3:
- Inject synchronous external abort (matching kvm_inject_arm_sea()
syndrome) on unhandled instruction or memory error, instead of
silently advancing PC or returning an error.
- Fix WHPX advance_pc bug: error paths no longer advance PC.
- Add page-crossing guard in mem_read/mem_write to prevent partial
side effects from cpu_memory_rw_debug().
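The page-crossing guard amounts to a single bounds check; a sketch with a
fixed 4 KiB page standing in for QEMU's TARGET_PAGE_SIZE:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* stand-in for TARGET_PAGE_SIZE */

/*
 * Reject accesses that would span a page boundary, so a failed
 * cpu_memory_rw_debug() cannot leave a partial write behind.
 */
static bool crosses_page(uint64_t va, int size)
{
    return (va & (PAGE_SIZE - 1)) + size > PAGE_SIZE;
}
```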
Changes since v2:
- Split monolithic patch into 6 incremental patches: framework, then
one patch per coherent instruction group (Peter)
- Removed per-backend callback ops; library uses CPUArchState directly
with cpu_memory_rw_debug() for memory access (Mohamed)
- Removed mock unit tests (Mohamed; kvm-unit-tests is the right
vehicle for decoder validation)
- Added architectural justification for separate decode file
Lucas Amaral (6):
target/arm/emulate: add ISV=0 emulation library with load/store
immediate
target/arm/emulate: add load/store register offset
target/arm/emulate: add load/store pair
target/arm/emulate: add load/store exclusive
target/arm/emulate: add atomic, compare-and-swap, and PAC load
target/arm/hvf,whpx: wire ISV=0 emulation for data aborts
target/arm/emulate/a64-ldst.decode | 293 +++++++++++
target/arm/emulate/arm_emulate.c | 758 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 30 ++
target/arm/emulate/meson.build | 6 +
target/arm/hvf/hvf.c | 46 +-
target/arm/meson.build | 1 +
target/arm/whpx/whpx-all.c | 61 ++-
7 files changed, 1191 insertions(+), 4 deletions(-)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
--
2.52.0
Lucas Amaral <lucaaamaral@gmail.com> writes:

> Add a shared emulation library for AArch64 load/store instructions that
> cause ISV=0 data aborts under hardware virtualization, and wire it into
> HVF (macOS) and WHPX (Windows).

FYI posting follow-up versions as reply to existing threads is likely to
hide your series from the patchew tooling and possibly the maintainers
as it hides in the old threads.

--
Alex Bennée
Virtualisation Tech Lead @ Linaro
Add a shared emulation library for AArch64 load/store instructions that
cause ISV=0 data aborts under hardware virtualization (HVF, WHPX).
When the Instruction Syndrome Valid bit is clear, the hypervisor cannot
determine the faulting instruction's target register or access size from
the syndrome alone. This library fetches and decodes the instruction
using a decodetree-generated decoder, then emulates it by accessing the
vCPU's register file (CPUARMState) and memory (cpu_memory_rw_debug)
directly.
This patch establishes the framework and adds load/store single with
immediate addressing — the most common ISV=0 trigger. Subsequent
patches add register-offset, pair, exclusive, and atomic instructions.
Instruction coverage:
- STR/LDR (GPR): unscaled, post-indexed, unprivileged, pre-indexed,
unsigned offset — all sizes (8/16/32/64-bit), sign/zero extension
- STR/LDR (SIMD/FP): same addressing modes, 8-128 bit elements
- PRFM: prefetch treated as NOP
- DC cache maintenance (SYS CRn=C7): NOP on MMIO
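As a standalone illustration of the sign/zero-extension rules listed above
(mirroring the load_extend helper added in this patch):

```c
#include <stdint.h>

/* sz: log2(access bytes); sign: sign-extend; ext: destination is a Wt */
static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
{
    int shift = 64 - (8 << sz);

    if (sign) {
        /* replicate the top bit of the loaded datum upward */
        val = (uint64_t)((int64_t)(val << shift) >> shift);
    }
    if (ext) {
        /* writes to a W register zero the upper 32 bits */
        val &= 0xFFFFFFFFu;
    }
    return val;
}
```

For example, LDRSB into an X register widens 0x80 to all-ones above bit 7,
while LDRSB into a W register clears bits 63:32 after sign extension.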
This library uses its own a64-ldst.decode rather than sharing
target/arm/tcg/a64.decode. TCG's trans_* functions are a compiler:
they emit IR ops into a translation block for later execution. This
library's trans_* functions are an interpreter: they execute directly
against the vCPU register file and memory. The decodetree-generated
dispatcher calls trans_* by name, so both cannot coexist in the same
translation unit. Decode patterns are kept consistent with TCG's
where possible.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 129 ++++++++++++++++
target/arm/emulate/arm_emulate.c | 237 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 30 ++++
target/arm/emulate/meson.build | 6 +
target/arm/meson.build | 1 +
5 files changed, 403 insertions(+)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
new file mode 100644
index 00000000..c887dcba
--- /dev/null
+++ b/target/arm/emulate/a64-ldst.decode
@@ -0,0 +1,129 @@
+# AArch64 load/store instruction patterns for ISV=0 emulation
+#
+# Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+### Argument sets
+
+# Load/store immediate (unscaled, pre/post-index, unprivileged, unsigned offset)
+# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
+&ldst_imm rt rn imm sz sign w p unpriv ext u
+
+### Format templates
+
+# Load/store immediate (9-bit signed)
+@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
+@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
+@ldst_imm_post .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=1 w=1
+@ldst_imm_user .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=1 p=0 w=0
+
+# Load/store unsigned offset (12-bit, handler scales by << sz)
+@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+
+### Load/store register — unscaled immediate (LDUR/STUR)
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+
+### Load/store register — post-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+
+### Load/store register — unprivileged
+
+# GPR only (no SIMD/FP unprivileged forms)
+STR_i sz:2 111 0 00 00 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=1
+
+### Load/store register — pre-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+
+### PRFM — unscaled immediate: prefetch is a NOP
+
+NOP 11 111 0 00 10 0 --------- 00 ----- -----
+
+### Load/store register — unsigned offset
+
+# GPR
+STR_i sz:2 111 0 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_i 00 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=0
+LDR_i 01 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=1
+LDR_i 10 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=2
+LDR_i 11 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=3
+LDR_i 00 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=0
+LDR_i 01 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=1
+LDR_i 10 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=2
+LDR_i 00 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=0
+LDR_i 01 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=1
+
+# PRFM — unsigned offset
+NOP 11 111 0 01 10 ------------ ----- -----
+
+# SIMD/FP
+STR_v_i sz:2 111 1 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+
+### System instructions — DC cache maintenance
+
+# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
+# On MMIO regions, cache maintenance is a harmless no-op.
+NOP 1101 0101 0000 1 --- 0111 ---- --- -----
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
new file mode 100644
index 00000000..02fefc30
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.c
@@ -0,0 +1,237 @@
+/*
+ * AArch64 instruction emulation for ISV=0 data aborts
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "arm_emulate.h"
+#include "target/arm/cpu.h"
+#include "exec/cpu-common.h"
+#include "exec/target_page.h"
+
+/* TODO: assumes LE guest data layout (sufficient for HVF/WHPX, both LE-only) */
+
+/* Named "DisasContext" as required by the decodetree code generator */
+typedef struct {
+ CPUState *cpu;
+ CPUARMState *env;
+ ArmEmulResult result;
+} DisasContext;
+
+#include "decode-a64-ldst.c.inc"
+
+/* GPR data access (Rt, Rs, Rt2) -- register 31 = XZR */
+
+static uint64_t gpr_read(DisasContext *ctx, int reg)
+{
+ if (reg == 31) {
+ return 0; /* XZR */
+ }
+ return ctx->env->xregs[reg];
+}
+
+static void gpr_write(DisasContext *ctx, int reg, uint64_t val)
+{
+ if (reg == 31) {
+ return; /* XZR -- discard */
+ }
+ ctx->env->xregs[reg] = val;
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* Base register access (Rn) -- register 31 = SP */
+
+static uint64_t base_read(DisasContext *ctx, int rn)
+{
+ return ctx->env->xregs[rn];
+}
+
+static void base_write(DisasContext *ctx, int rn, uint64_t val)
+{
+ ctx->env->xregs[rn] = val;
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* SIMD/FP register access */
+
+static void fpreg_read(DisasContext *ctx, int reg, void *buf, int size)
+{
+ memcpy(buf, &ctx->env->vfp.zregs[reg], size);
+}
+
+static void fpreg_write(DisasContext *ctx, int reg, const void *buf, int size)
+{
+ memset(&ctx->env->vfp.zregs[reg], 0, sizeof(ctx->env->vfp.zregs[reg]));
+ memcpy(&ctx->env->vfp.zregs[reg], buf, size);
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* Memory access wrappers */
+
+static int mem_read(DisasContext *ctx, uint64_t va, void *buf, int size)
+{
+ if (((va & ~TARGET_PAGE_MASK) + size) > TARGET_PAGE_SIZE) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ return -1;
+ }
+ int ret = cpu_memory_rw_debug(ctx->cpu, va, buf, size, false);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+static int mem_write(DisasContext *ctx, uint64_t va, const void *buf, int size)
+{
+ if (((va & ~TARGET_PAGE_MASK) + size) > TARGET_PAGE_SIZE) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ return -1;
+ }
+ int ret = cpu_memory_rw_debug(ctx->cpu, va, (void *)buf, size, true);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+/* Sign/zero extension helpers */
+
+static uint64_t sign_extend(uint64_t val, int from_bits)
+{
+ int shift = 64 - from_bits;
+ return (int64_t)(val << shift) >> shift;
+}
+
+/* Apply sign/zero extension */
+static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
+{
+ int data_bits = 8 << sz;
+
+ if (sign) {
+ val = sign_extend(val, data_bits);
+ if (ext) {
+ /* Sign-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ } else if (ext) {
+ /* Zero-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ return val;
+}
+
+/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
+
+static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/*
+ * Load/store single -- immediate (SIMD/FP)
+ * STR_v_i / LDR_v_i (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ if (mem_write(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* PRFM, DC cache maintenance -- treated as NOP */
+static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
+{
+ (void)ctx;
+ (void)a;
+ return true;
+}
+
+/* Entry point */
+
+ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn)
+{
+ DisasContext ctx = {
+ .cpu = env_cpu(env),
+ .env = env,
+ .result = ARM_EMUL_OK,
+ };
+
+ if (!decode_a64_ldst(&ctx, insn)) {
+ return ARM_EMUL_UNHANDLED;
+ }
+
+ return ctx.result;
+}
diff --git a/target/arm/emulate/arm_emulate.h b/target/arm/emulate/arm_emulate.h
new file mode 100644
index 00000000..7fe29839
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.h
@@ -0,0 +1,30 @@
+/*
+ * AArch64 instruction emulation library
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ARM_EMULATE_H
+#define ARM_EMULATE_H
+
+#include "qemu/osdep.h"
+
+/**
+ * ArmEmulResult - return status from arm_emul_insn()
+ */
+typedef enum {
+ ARM_EMUL_OK, /* Instruction emulated successfully */
+ ARM_EMUL_UNHANDLED, /* Instruction not recognized by decoder */
+ ARM_EMUL_ERR_MEM, /* Memory access failed */
+} ArmEmulResult;
+
+/**
+ * arm_emul_insn - decode and emulate one AArch64 instruction
+ *
+ * Caller must synchronize CPU state and fetch @insn before calling.
+ */
+ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn);
+
+#endif /* ARM_EMULATE_H */
diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
new file mode 100644
index 00000000..c0b38dd1
--- /dev/null
+++ b/target/arm/emulate/meson.build
@@ -0,0 +1,6 @@
+gen_a64_ldst = decodetree.process('a64-ldst.decode',
+ extra_args: ['--static-decode=decode_a64_ldst'])
+
+arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
+ gen_a64_ldst, files('arm_emulate.c')
+])
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 6e0e504a..a4b2291b 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -57,6 +57,7 @@ arm_common_system_ss.add(files(
'vfp_fpscr.c',
))
+subdir('emulate')
subdir('hvf')
subdir('whpx')
--
2.52.0
On 3/16/26 15:50, Lucas Amaral wrote:
> +typedef struct {
> + CPUState *cpu;
> + CPUARMState *env;
> + ArmEmulResult result;
> +} DisasContext;
...
> +ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn)
> +{
> + DisasContext ctx = {
> + .cpu = env_cpu(env),
> + .env = env,
The env_cpu function is trivial pointer arithmetic.
Put the one that's used more into DisasContext and use env_cpu or cpu_env inline to get to
the other.
> diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
> new file mode 100644
> index 00000000..c0b38dd1
> --- /dev/null
> +++ b/target/arm/emulate/meson.build
> @@ -0,0 +1,6 @@
> +gen_a64_ldst = decodetree.process('a64-ldst.decode',
> + extra_args: ['--static-decode=decode_a64_ldst'])
> +
> +arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
> + gen_a64_ldst, files('arm_emulate.c')
> +])
Do we really want to include this emulation when the host virtualization won't use it?
I'm sure Kconfig can be used to select it from the relevant virt configs.
r~
Add emulation for load/store register offset addressing mode
(DDI 0487 C3.3.9). The offset register value is extended via
UXTB/UXTH/UXTW/UXTX/SXTB/SXTH/SXTW/SXTX and optionally
shifted by the element size.
Instruction coverage:
- STR/LDR (GPR): register offset with extend, all sizes
- STR/LDR (SIMD/FP): register offset with extend, 8-128 bit
- PRFM register offset: NOP
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 29 ++++++++
target/arm/emulate/arm_emulate.c | 103 +++++++++++++++++++++++++++++
2 files changed, 132 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index c887dcba..af6babe1 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store register offset
+&ldst rm rn rt sign ext sz opt s
+
### Format templates
# Load/store immediate (9-bit signed)
@@ -21,6 +24,9 @@
# Load/store unsigned offset (12-bit, handler scales by << sz)
@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+# Load/store register offset
+@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
+
### Load/store register — unscaled immediate (LDUR/STUR)
# GPR
@@ -122,6 +128,29 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store register — register offset
+
+# GPR
+STR sz:2 111 0 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR 00 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=0
+LDR 01 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=1
+LDR 10 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=2
+LDR 11 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=3
+LDR 00 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=0
+LDR 01 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=1
+LDR 10 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=2
+LDR 00 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=0
+LDR 01 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=1
+
+# PRFM — register offset
+NOP 11 111 0 00 10 1 ----- -1- - 10 ----- -----
+
+# SIMD/FP
+STR_v sz:2 111 1 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+
### System instructions — DC cache maintenance
# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 02fefc30..bf09e2a6 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -211,6 +211,109 @@ static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
return true;
}
+/* Register offset extension (DDI 0487 C6.2.131) */
+
+static uint64_t extend_reg(uint64_t val, int option, int shift)
+{
+ switch (option) {
+ case 0: /* UXTB */
+ val = (uint8_t)val;
+ break;
+ case 1: /* UXTH */
+ val = (uint16_t)val;
+ break;
+ case 2: /* UXTW */
+ val = (uint32_t)val;
+ break;
+ case 3: /* UXTX / LSL */
+ break;
+ case 4: /* SXTB */
+ val = (int64_t)(int8_t)val;
+ break;
+ case 5: /* SXTH */
+ val = (int64_t)(int16_t)val;
+ break;
+ case 6: /* SXTW */
+ val = (int64_t)(int32_t)val;
+ break;
+ case 7: /* SXTX */
+ break;
+ }
+ return val << shift;
+}
+
+/*
+ * Load/store single -- register offset (GPR)
+ * STR / LDR (DDI 0487 C3.3.9)
+ */
+
+static bool trans_STR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ mem_write(ctx, va, &val, esize);
+ return true;
+}
+
+static bool trans_LDR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+/*
+ * Load/store single -- register offset (SIMD/FP)
+ * STR_v / LDR_v (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ mem_write(ctx, va, buf, esize);
+ return true;
+}
+
+static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
Add emulation for load/store pair instructions (DDI 0487 C3.3.14 --
C3.3.16). All four encoding classes are covered: non-temporal (STNP/LDNP),
post-indexed, signed offset, and pre-indexed.
Instruction coverage:
- STP/LDP (GPR): 32/64-bit pairs, all addressing modes
- STP/LDP (SIMD/FP): 32/64/128-bit pairs, all addressing modes
- LDPSW: sign-extending 32-bit pair load
- STGP: store allocation tag pair (tag operation is NOP for MMIO)
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 68 ++++++++++++++++++
target/arm/emulate/arm_emulate.c | 111 +++++++++++++++++++++++++++++
2 files changed, 179 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index af6babe1..f3de3f86 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store pair (GPR and SIMD/FP)
+&ldstpair rt2 rt rn imm sz sign w p
+
# Load/store register offset
&ldst rm rn rt sign ext sz opt s
@@ -24,6 +27,9 @@
# Load/store unsigned offset (12-bit, handler scales by << sz)
@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+# Load/store pair: imm7 is signed, scaled by element size in handler
+@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+
# Load/store register offset
@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
@@ -128,6 +134,68 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store pair — non-temporal (STNP/LDNP)
+
+# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
+STP 00 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP 10 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — post-indexed
+
+STP 00 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 00 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 01 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=1 w=1
+STP 10 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP 10 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 00 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP_v 00 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+STP_v 01 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP_v 01 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 10 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+LDP_v 10 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+
+### Load/store pair — signed offset
+
+STP 00 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 01 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=0
+STP 10 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — pre-indexed
+
+STP 00 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 00 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 01 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=1
+STP 10 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP 10 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 00 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP_v 00 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+STP_v 01 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP_v 01 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 10 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+LDP_v 10 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+
+### Load/store pair — STGP (store allocation tag + pair)
+
+STGP 01 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STGP 01 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STGP 01 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+
### Load/store register — register offset
# GPR
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index bf09e2a6..6c63a0d0 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -122,6 +122,117 @@ static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
return val;
}
+/*
+ * Load/store pair: STP, LDP, STNP, LDNP, STGP, LDPSW
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4 or 8 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset; /* post-index: unmodified base */
+ uint8_t buf[16]; /* max 2 x 8 bytes */
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+
+ /* LDPSW: sign-extend 32-bit values to 64-bit (sign=1, sz=2) */
+ if (a->sign) {
+ v1 = sign_extend(v1, 8 * esize);
+ v2 = sign_extend(v2, 8 * esize);
+ }
+
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* STGP: tag operation is a NOP for emulation; data stored via STP */
+static bool trans_STGP(DisasContext *ctx, arg_ldstpair *a)
+{
+ return trans_STP(ctx, a);
+}
+
+/*
+ * SIMD/FP load/store pair: STP_v, LDP_v
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4, 8, or 16 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32]; /* max 2 x 16 bytes */
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ fpreg_read(ctx, a->rt2, buf + esize, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32];
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+ fpreg_write(ctx, a->rt2, buf + esize, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
--
2.52.0
Add emulation for load/store exclusive instructions (DDI 0487 C3.3.6).
Exclusive monitors have no meaning on emulated MMIO accesses, so STXR
always reports success (Rs=0) and LDXR does not set a monitor.
Instruction coverage:
- STXR/STLXR: exclusive store, 8/16/32/64-bit
- LDXR/LDAXR: exclusive load, 8/16/32/64-bit
- STXP/STLXP: exclusive store pair, 32/64-bit
- LDXP/LDAXP: exclusive load pair, 32/64-bit
STXP/LDXP use two explicit decode patterns (sz=2, sz=3) for the
32/64-bit size variants.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 22 +++++++++
target/arm/emulate/arm_emulate.c | 74 ++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index f3de3f86..fadf6fd2 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store exclusive
+&stxr rn rt rt2 rs sz lasr
+
# Load/store pair (GPR and SIMD/FP)
&ldstpair rt2 rt rn imm sz sign w p
@@ -18,6 +21,9 @@
### Format templates
+# Exclusives
+@stxr sz:2 ...... ... rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr
+
# Load/store immediate (9-bit signed)
@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
@@ -134,6 +140,22 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store exclusive
+
+# STXR / STLXR (sz encodes 8/16/32/64-bit)
+STXR .. 001000 000 ..... . ..... ..... ..... @stxr
+
+# LDXR / LDAXR
+LDXR .. 001000 010 ..... . ..... ..... ..... @stxr
+
+# STXP / STLXP (bit[31]=1, bit[30]=sf → sz=2 for 32-bit, sz=3 for 64-bit)
+STXP 10 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+STXP 11 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
+# LDXP / LDAXP
+LDXP 10 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+LDXP 11 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
### Load/store pair — non-temporal (STNP/LDNP)
# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 6c63a0d0..52e41703 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -425,6 +425,80 @@ static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
return true;
}
+/*
+ * Load/store exclusive: STXR, LDXR, STXP, LDXP
+ * (DDI 0487 C3.3.6)
+ *
+ * Exclusive monitors have no meaning on MMIO. STXR always reports
+ * success (Rs=0) and LDXR does not set an exclusive monitor.
+ */
+
+static bool trans_STXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = gpr_read(ctx, a->rt);
+
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ /* Report success -- no exclusive monitor on emulated access */
+ gpr_write(ctx, a->rs, 0);
+ return true;
+}
+
+static bool trans_LDXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+static bool trans_STXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz; /* sz=2->4, sz=3->8 */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rs, 0); /* success */
+ return true;
+}
+
+static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
Add emulation for remaining ISV=0 load/store instruction classes.
Atomic memory operations (DDI 0487 C3.3.2):
- LDADD, LDCLR, LDEOR, LDSET: arithmetic/logic atomics
- LDSMAX, LDSMIN, LDUMAX, LDUMIN: signed/unsigned min/max
- SWP: atomic swap
Non-atomic read-modify-write, sufficient for MMIO where concurrent
access is not a concern. Acquire/release semantics are ignored.
Compare-and-swap (DDI 0487 C3.3.1):
- CAS/CASA/CASAL/CASL: single-register compare-and-swap
- CASP/CASPA/CASPAL/CASPL: register-pair compare-and-swap
CASP validates that Rs and Rt name even, non-r31 registers; anything
else returns UNHANDLED.
Load with PAC (DDI 0487 C6.2.121):
- LDRAA/LDRAB: pointer-authenticated load, offset/pre-indexed
Pointer authentication is not emulated (equivalent to auth always
succeeding), which is correct for MMIO since PAC is a software
security mechanism, not a memory access semantic.
CASP uses two explicit decode patterns for the 32/64-bit size
variants. LDRA's offset immediate is stored raw in the decode;
the handler scales by << 3.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 45 ++++++
target/arm/emulate/arm_emulate.c | 233 +++++++++++++++++++++++++++++
2 files changed, 278 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index fadf6fd2..9292bfdf 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -16,6 +16,16 @@
# Load/store pair (GPR and SIMD/FP)
&ldstpair rt2 rt rn imm sz sign w p
+# Atomic memory operations
+&atomic rs rn rt a r sz
+
+# Compare-and-swap
+&cas rs rn rt sz a r
+
+# Load with PAC (LDRAA/LDRAB, FEAT_PAuth)
+%ldra_imm 22:s1 12:9
+&ldra rt rn imm m w
+
# Load/store register offset
&ldst rm rn rt sign ext sz opt s
@@ -36,6 +46,15 @@
# Load/store pair: imm7 is signed, scaled by element size in handler
@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+# Atomics
+@atomic sz:2 ... . .. a:1 r:1 . rs:5 . ... .. rn:5 rt:5 &atomic
+
+# Compare-and-swap: sz extracted by pattern (CAS) or set constant (CASP)
+@cas .. ...... . a:1 . rs:5 r:1 ..... rn:5 rt:5 &cas
+
+# Load with PAC
+@ldra .. ... . .. m:1 . . ......... w:1 . rn:5 rt:5 &ldra imm=%ldra_imm
+
# Load/store register offset
@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
@@ -241,6 +260,32 @@ STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=
LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+### Compare-and-swap
+
+# CAS / CASA / CASAL / CASL
+CAS sz:2 001000 1 . 1 ..... . 11111 ..... ..... @cas
+
+# CASP / CASPA / CASPAL / CASPL (pair: Rt,Rt+1 and Rs,Rs+1)
+CASP 00 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=2
+CASP 01 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=3
+
+### Atomic memory operations
+
+LDADD .. 111 0 00 . . 1 ..... 0000 00 ..... ..... @atomic
+LDCLR .. 111 0 00 . . 1 ..... 0001 00 ..... ..... @atomic
+LDEOR .. 111 0 00 . . 1 ..... 0010 00 ..... ..... @atomic
+LDSET .. 111 0 00 . . 1 ..... 0011 00 ..... ..... @atomic
+LDSMAX .. 111 0 00 . . 1 ..... 0100 00 ..... ..... @atomic
+LDSMIN .. 111 0 00 . . 1 ..... 0101 00 ..... ..... @atomic
+LDUMAX .. 111 0 00 . . 1 ..... 0110 00 ..... ..... @atomic
+LDUMIN .. 111 0 00 . . 1 ..... 0111 00 ..... ..... @atomic
+SWP .. 111 0 00 . . 1 ..... 1000 00 ..... ..... @atomic
+
+### Load with PAC (FEAT_PAuth)
+
+# LDRAA (M=0) / LDRAB (M=1), offset (W=0) / pre-indexed (W=1)
+LDRA 11 111 0 00 . . 1 ......... . 1 ..... ..... @ldra
+
### System instructions — DC cache maintenance
# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 52e41703..44a559ad 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -499,6 +499,239 @@ static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
return true;
}
+/*
+ * Atomic memory operations (DDI 0487 C3.3.2)
+ *
+ * Non-atomic read-modify-write; sufficient for MMIO.
+ * Acquire/release semantics ignored (sequentially consistent by design).
+ */
+
+typedef uint64_t (*atomic_op_fn)(uint64_t old, uint64_t operand, int bits);
+
+static uint64_t atomic_add(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old + op;
+}
+
+static uint64_t atomic_clr(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old & ~op;
+}
+
+static uint64_t atomic_eor(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old ^ op;
+}
+
+static uint64_t atomic_set(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old | op;
+}
+
+static uint64_t atomic_smax(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a >= b) ? old : op;
+}
+
+static uint64_t atomic_smin(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a <= b) ? old : op;
+}
+
+static uint64_t atomic_umax(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) >= (op & mask)) ? old : op;
+}
+
+static uint64_t atomic_umin(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) <= (op & mask)) ? old : op;
+}
+
+static bool do_atomic(DisasContext *ctx, arg_atomic *a, atomic_op_fn fn)
+{
+ int esize = 1 << a->sz;
+ int bits = 8 * esize;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t operand = gpr_read(ctx, a->rs);
+ uint64_t result = fn(old, operand, bits);
+
+ if (mem_write(ctx, va, &result, esize) != 0) {
+ return true;
+ }
+
+ /* Rt receives the old value (before modification) */
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+static bool trans_LDADD(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_add);
+}
+
+static bool trans_LDCLR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_clr);
+}
+
+static bool trans_LDEOR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_eor);
+}
+
+static bool trans_LDSET(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_set);
+}
+
+static bool trans_LDSMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smax);
+}
+
+static bool trans_LDSMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smin);
+}
+
+static bool trans_LDUMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umax);
+}
+
+static bool trans_LDUMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umin);
+}
+
+static bool trans_SWP(DisasContext *ctx, arg_atomic *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t newval = gpr_read(ctx, a->rs);
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+/* Compare-and-swap: CAS, CASP (DDI 0487 C3.3.1) */
+
+static bool trans_CAS(DisasContext *ctx, arg_cas *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t current = 0;
+
+ if (mem_read(ctx, va, &current, esize) != 0) {
+ return true;
+ }
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t compare = gpr_read(ctx, a->rs) & mask;
+
+ if ((current & mask) == compare) {
+ uint64_t newval = gpr_read(ctx, a->rt) & mask;
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+ }
+
+ /* Rs receives the old memory value (whether or not swap occurred) */
+ gpr_write(ctx, a->rs, current);
+ return true;
+}
+
+/* CASP: compare-and-swap pair (Rs,Rs+1 compared; Rt,Rt+1 stored) */
+static bool trans_CASP(DisasContext *ctx, arg_cas *a)
+{
+ /* CASP requires even register pairs; odd or r31 is UNPREDICTABLE */
+ if ((a->rs & 1) || a->rs >= 31 || (a->rt & 1) || a->rt >= 31) {
+ return false;
+ }
+
+ int esize = 1 << a->sz; /* per-register size */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t cur1 = 0, cur2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&cur1, buf, esize);
+ memcpy(&cur2, buf + esize, esize);
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t cmp1 = gpr_read(ctx, a->rs) & mask;
+ uint64_t cmp2 = gpr_read(ctx, a->rs + 1) & mask;
+
+ if ((cur1 & mask) == cmp1 && (cur2 & mask) == cmp2) {
+ uint64_t new1 = gpr_read(ctx, a->rt) & mask;
+ uint64_t new2 = gpr_read(ctx, a->rt + 1) & mask;
+ memcpy(buf, &new1, esize);
+ memcpy(buf + esize, &new2, esize);
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ }
+
+ gpr_write(ctx, a->rs, cur1);
+ gpr_write(ctx, a->rs + 1, cur2);
+ return true;
+}
+
+/*
+ * Load with PAC: LDRAA / LDRAB (FEAT_PAuth)
+ * (DDI 0487 C6.2.121)
+ *
+ * Pointer authentication is not emulated -- the base register is used
+ * directly (equivalent to auth always succeeding).
+ */
+
+static bool trans_LDRA(DisasContext *ctx, arg_ldra *a)
+{
+ int64_t offset = (int64_t)a->imm << 3; /* S:imm9, scaled by 8 */
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = base + offset; /* auth not emulated */
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, 8) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, va);
+ }
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
When a data abort with ISV=0 occurs during MMIO emulation, the
syndrome register does not carry the access size or target register.
Previously this hit an assert(isv) and killed the VM.
Replace the assert with instruction fetch + decode + emulate using the
shared library in target/arm/emulate/. The faulting instruction is read
from guest memory via cpu_memory_rw_debug(), decoded by the decodetree-
generated decoder, and emulated against the vCPU register file.
Both HVF (macOS) and WHPX (Windows Hyper-V) use the same pattern:
1. cpu_synchronize_state() to flush hypervisor registers
2. Fetch 4-byte instruction at env->pc
3. arm_emul_insn(env, insn)
4. On success, advance PC past the emulated instruction
If the instruction is unhandled or a memory error occurs, a synchronous
external abort is injected into the guest via syn_data_abort_no_iss()
with fnv=1 and fsc=0x10, matching the syndrome that KVM uses in
kvm_inject_arm_sea(). The guest kernel's fault handler then reports
the error through its normal data abort path.
WHPX adds a whpx_inject_data_abort() helper and adjusts the
whpx_handle_mmio() return convention so the caller skips PC advancement
when an exception has been injected.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/hvf/hvf.c | 46 ++++++++++++++++++++++++++--
target/arm/whpx/whpx-all.c | 61 +++++++++++++++++++++++++++++++++++++-
2 files changed, 103 insertions(+), 4 deletions(-)
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 5fc8f6bb..000e54bd 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -32,6 +32,7 @@
#include "arm-powerctl.h"
#include "target/arm/cpu.h"
#include "target/arm/internals.h"
+#include "emulate/arm_emulate.h"
#include "target/arm/multiprocessing.h"
#include "target/arm/gtimer.h"
#include "target/arm/trace.h"
@@ -2175,10 +2176,49 @@ static int hvf_handle_exception(CPUState *cpu, hv_vcpu_exit_exception_t *excp)
assert(!s1ptw);
/*
- * TODO: ISV will be 0 for SIMD or SVE accesses.
- * Inject the exception into the guest.
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and emulate via target/arm/emulate/.
*/
- assert(isv);
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ bool same_el = arm_current_el(env) == 1;
+ uint32_t esr = syn_data_abort_no_iss(same_el,
+ 1, 0, 0, 0, iswrite, 0x10);
+
+ error_report("HVF: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ env->exception.vaddress = excp->virtual_address;
+ hvf_raise_exception(cpu, EXCP_DATA_ABORT, esr, 1);
+ break;
+ }
+
+ r = arm_emul_insn(env, insn);
+ if (r == ARM_EMUL_UNHANDLED || r == ARM_EMUL_ERR_MEM) {
+ bool same_el = arm_current_el(env) == 1;
+ uint32_t esr = syn_data_abort_no_iss(same_el,
+ 1, 0, 0, 0, iswrite, 0x10);
+
+ error_report("HVF: ISV=0 %s insn 0x%08x at "
+ "pc=0x%" PRIx64 ", injecting data abort",
+ r == ARM_EMUL_UNHANDLED ? "unhandled"
+ : "memory error",
+ insn, (uint64_t)env->pc);
+ env->exception.vaddress = excp->virtual_address;
+ hvf_raise_exception(cpu, EXCP_DATA_ABORT, esr, 1);
+ break;
+ }
+
+ advance_pc = true;
+ break;
+ }
/*
* Emulate MMIO.
diff --git a/target/arm/whpx/whpx-all.c b/target/arm/whpx/whpx-all.c
index 513551be..0c04073e 100644
--- a/target/arm/whpx/whpx-all.c
+++ b/target/arm/whpx/whpx-all.c
@@ -29,6 +29,7 @@
#include "syndrome.h"
#include "target/arm/cpregs.h"
#include "internals.h"
+#include "emulate/arm_emulate.h"
#include "system/whpx-internal.h"
#include "system/whpx-accel-ops.h"
@@ -352,6 +353,27 @@ static void whpx_set_gp_reg(CPUState *cpu, int rt, uint64_t val)
whpx_set_reg(cpu, reg, reg_val);
}
+/*
+ * Inject a synchronous external abort (data abort) into the guest.
+ * Used when ISV=0 instruction emulation fails. Matches the syndrome
+ * that KVM uses in kvm_inject_arm_sea().
+ */
+static void whpx_inject_data_abort(CPUState *cpu, bool iswrite)
+{
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ bool same_el = arm_current_el(env) == 1;
+ uint32_t esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, iswrite, 0x10);
+
+ cpu->exception_index = EXCP_DATA_ABORT;
+ env->exception.target_el = 1;
+ env->exception.syndrome = esr;
+
+ bql_lock();
+ arm_cpu_do_interrupt(cpu);
+ bql_unlock();
+}
+
static int whpx_handle_mmio(CPUState *cpu, WHV_MEMORY_ACCESS_CONTEXT *ctx)
{
uint64_t syndrome = ctx->Syndrome;
@@ -366,7 +388,40 @@ static int whpx_handle_mmio(CPUState *cpu, WHV_MEMORY_ACCESS_CONTEXT *ctx)
uint64_t val = 0;
assert(!cm);
- assert(isv);
+
+ /*
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and decode the faulting instruction via the emulation library.
+ */
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ error_report("WHPX: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ whpx_inject_data_abort(cpu, iswrite);
+ return 1;
+ }
+
+ r = arm_emul_insn(env, insn);
+ if (r == ARM_EMUL_UNHANDLED || r == ARM_EMUL_ERR_MEM) {
+ error_report("WHPX: ISV=0 %s insn 0x%08x at "
+ "pc=0x%" PRIx64 ", injecting data abort",
+ r == ARM_EMUL_UNHANDLED ? "unhandled"
+ : "memory error",
+ insn, (uint64_t)env->pc);
+ whpx_inject_data_abort(cpu, iswrite);
+ return 1;
+ }
+
+ return 0;
+ }
if (iswrite) {
val = whpx_get_gp_reg(cpu, srt);
@@ -451,6 +506,10 @@ int whpx_vcpu_run(CPUState *cpu)
}
ret = whpx_handle_mmio(cpu, &vcpu->exit_ctx.MemoryAccess);
+ if (ret > 0) {
+ advance_pc = false;
+ ret = 0;
+ }
break;
case WHvRunVpExitReasonCanceled:
cpu->exception_index = EXCP_INTERRUPT;
--
2.52.0
Add a shared emulation library for AArch64 load/store instructions that
cause ISV=0 data aborts under hardware virtualization (HVF, WHPX).
When the Instruction Syndrome Valid bit is clear, the hypervisor cannot
determine the faulting instruction's target register or access size from
the syndrome alone. This library fetches and decodes the instruction
using a decodetree-generated decoder, then emulates it by accessing the
vCPU's register file (CPUARMState) and memory (cpu_memory_rw_debug)
directly.
This patch establishes the framework and adds load/store single with
immediate addressing — the most common ISV=0 trigger. Subsequent
patches add register-offset, pair, exclusive, and atomic instructions.
Instruction coverage:
- STR/LDR (GPR): unscaled, post-indexed, unprivileged, pre-indexed,
unsigned offset — all sizes (8/16/32/64-bit), sign/zero extension
- STR/LDR (SIMD/FP): same addressing modes, 8-128 bit elements
- PRFM: prefetch treated as NOP
- DC cache maintenance (SYS CRn=C7): NOP on MMIO
This library uses its own a64-ldst.decode rather than sharing
target/arm/tcg/a64.decode. TCG's trans_* functions are a compiler:
they emit IR ops into a translation block for later execution. This
library's trans_* functions are an interpreter: they execute directly
against the vCPU register file and memory. The decodetree-generated
dispatcher calls trans_* by name, so both cannot coexist in the same
translation unit. Decode patterns are kept consistent with TCG's
where possible.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 129 ++++++++++++++++
target/arm/emulate/arm_emulate.c | 226 +++++++++++++++++++++++++++++
target/arm/emulate/arm_emulate.h | 30 ++++
target/arm/emulate/meson.build | 6 +
target/arm/meson.build | 1 +
5 files changed, 392 insertions(+)
create mode 100644 target/arm/emulate/a64-ldst.decode
create mode 100644 target/arm/emulate/arm_emulate.c
create mode 100644 target/arm/emulate/arm_emulate.h
create mode 100644 target/arm/emulate/meson.build
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
new file mode 100644
index 00000000..c887dcba
--- /dev/null
+++ b/target/arm/emulate/a64-ldst.decode
@@ -0,0 +1,129 @@
+# AArch64 load/store instruction patterns for ISV=0 emulation
+#
+# Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+### Argument sets
+
+# Load/store immediate (unscaled, pre/post-index, unprivileged, unsigned offset)
+# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
+&ldst_imm rt rn imm sz sign w p unpriv ext u
+
+### Format templates
+
+# Load/store immediate (9-bit signed)
+@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
+@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
+@ldst_imm_post .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=1 w=1
+@ldst_imm_user .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=1 p=0 w=0
+
+# Load/store unsigned offset (12-bit, handler scales by << sz)
+@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+
+### Load/store register — unscaled immediate (LDUR/STUR)
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 00 ..... ..... @ldst_imm sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 00 ..... ..... @ldst_imm sign=0 ext=0 sz=4
+
+### Load/store register — post-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 01 ..... ..... @ldst_imm_post sign=0 ext=0 sz=4
+
+### Load/store register — unprivileged
+
+# GPR only (no SIMD/FP unprivileged forms)
+STR_i sz:2 111 0 00 00 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 10 ..... ..... @ldst_imm_user sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 10 ..... ..... @ldst_imm_user sign=1 ext=1 sz=1
+
+### Load/store register — pre-indexed
+
+# GPR
+STR_i sz:2 111 0 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_i 00 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=0
+LDR_i 01 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=1
+LDR_i 10 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=1 sz=2
+LDR_i 11 111 0 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=3
+LDR_i 00 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=0
+LDR_i 01 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=1
+LDR_i 10 111 0 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=0 sz=2
+LDR_i 00 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=0
+LDR_i 01 111 0 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=1 ext=1 sz=1
+
+# SIMD/FP
+STR_v_i sz:2 111 1 00 00 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+STR_v_i 00 111 1 00 10 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 00 01 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0
+LDR_v_i 00 111 1 00 11 0 ......... 11 ..... ..... @ldst_imm_pre sign=0 ext=0 sz=4
+
+### PRFM — unscaled immediate: prefetch is a NOP
+
+NOP 11 111 0 00 10 0 --------- 00 ----- -----
+
+### Load/store register — unsigned offset
+
+# GPR
+STR_i sz:2 111 0 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_i 00 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=0
+LDR_i 01 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=1
+LDR_i 10 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=1 sz=2
+LDR_i 11 111 0 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=3
+LDR_i 00 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=0
+LDR_i 01 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=1
+LDR_i 10 111 0 01 10 ............ ..... ..... @ldst_uimm sign=1 ext=0 sz=2
+LDR_i 00 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=0
+LDR_i 01 111 0 01 11 ............ ..... ..... @ldst_uimm sign=1 ext=1 sz=1
+
+# PRFM — unsigned offset
+NOP 11 111 0 01 10 ------------ ----- -----
+
+# SIMD/FP
+STR_v_i sz:2 111 1 01 00 ............ ..... ..... @ldst_uimm sign=0 ext=0
+STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
+LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+
+### System instructions — DC cache maintenance
+
+# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
+# On MMIO regions, cache maintenance is a harmless no-op.
+NOP 1101 0101 0000 1 --- 0111 ---- --- -----
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
new file mode 100644
index 00000000..2b4e2a9e
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.c
@@ -0,0 +1,226 @@
+/*
+ * AArch64 instruction emulation for ISV=0 data aborts
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "arm_emulate.h"
+#include "target/arm/cpu.h"
+#include "exec/cpu-common.h"
+
+/* Named "DisasContext" as required by the decodetree code generator */
+typedef struct {
+ CPUState *cpu;
+ CPUARMState *env;
+ ArmEmulResult result;
+} DisasContext;
+
+#include "decode-a64-ldst.c.inc"
+
+/* GPR data access (Rt, Rs, Rt2) -- register 31 = XZR */
+
+static uint64_t gpr_read(DisasContext *ctx, int reg)
+{
+ if (reg == 31) {
+ return 0; /* XZR */
+ }
+ return ctx->env->xregs[reg];
+}
+
+static void gpr_write(DisasContext *ctx, int reg, uint64_t val)
+{
+ if (reg == 31) {
+ return; /* XZR -- discard */
+ }
+ ctx->env->xregs[reg] = val;
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* Base register access (Rn) -- register 31 = SP */
+
+static uint64_t base_read(DisasContext *ctx, int rn)
+{
+ return ctx->env->xregs[rn];
+}
+
+static void base_write(DisasContext *ctx, int rn, uint64_t val)
+{
+ ctx->env->xregs[rn] = val;
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* SIMD/FP register access */
+
+static void fpreg_read(DisasContext *ctx, int reg, void *buf, int size)
+{
+ memcpy(buf, &ctx->env->vfp.zregs[reg], size);
+}
+
+static void fpreg_write(DisasContext *ctx, int reg, const void *buf, int size)
+{
+ memset(&ctx->env->vfp.zregs[reg], 0, sizeof(ctx->env->vfp.zregs[reg]));
+ memcpy(&ctx->env->vfp.zregs[reg], buf, size);
+ ctx->cpu->vcpu_dirty = true;
+}
+
+/* Memory access wrappers */
+
+static int mem_read(DisasContext *ctx, uint64_t va, void *buf, int size)
+{
+ int ret = cpu_memory_rw_debug(ctx->cpu, va, buf, size, false);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+static int mem_write(DisasContext *ctx, uint64_t va, const void *buf, int size)
+{
+ int ret = cpu_memory_rw_debug(ctx->cpu, va, (void *)buf, size, true);
+ if (ret != 0) {
+ ctx->result = ARM_EMUL_ERR_MEM;
+ }
+ return ret;
+}
+
+/* Sign/zero extension helpers */
+
+static uint64_t sign_extend(uint64_t val, int from_bits)
+{
+ int shift = 64 - from_bits;
+ return (int64_t)(val << shift) >> shift;
+}
+
+/* Apply sign/zero extension */
+static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
+{
+ int data_bits = 8 << sz;
+
+ if (sign) {
+ val = sign_extend(val, data_bits);
+ if (ext) {
+ /* Sign-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ } else if (ext) {
+ /* Zero-extend to 32 bits (W register) */
+ val &= 0xFFFFFFFF;
+ }
+ return val;
+}
+
+/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
+
+static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/*
+ * Load/store single -- immediate (SIMD/FP)
+ * STR_v_i / LDR_v_i (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ if (mem_write(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int64_t offset = a->u ? ((int64_t)(uint64_t)a->imm << a->sz)
+ : (int64_t)a->imm;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* PRFM, DC cache maintenance -- treated as NOP */
+static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
+{
+ (void)ctx;
+ (void)a;
+ return true;
+}
+
+/* Entry point */
+
+ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn)
+{
+ DisasContext ctx = {
+ .cpu = env_cpu(env),
+ .env = env,
+ .result = ARM_EMUL_OK,
+ };
+
+ if (!decode_a64_ldst(&ctx, insn)) {
+ return ARM_EMUL_UNHANDLED;
+ }
+
+ return ctx.result;
+}
diff --git a/target/arm/emulate/arm_emulate.h b/target/arm/emulate/arm_emulate.h
new file mode 100644
index 00000000..7fe29839
--- /dev/null
+++ b/target/arm/emulate/arm_emulate.h
@@ -0,0 +1,30 @@
+/*
+ * AArch64 instruction emulation library
+ *
+ * Copyright (c) 2026 Lucas Amaral <lucaaamaral@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ARM_EMULATE_H
+#define ARM_EMULATE_H
+
+#include "qemu/osdep.h"
+
+/**
+ * ArmEmulResult - return status from arm_emul_insn()
+ */
+typedef enum {
+ ARM_EMUL_OK, /* Instruction emulated successfully */
+ ARM_EMUL_UNHANDLED, /* Instruction not recognized by decoder */
+ ARM_EMUL_ERR_MEM, /* Memory access failed */
+} ArmEmulResult;
+
+/**
+ * arm_emul_insn - decode and emulate one AArch64 instruction
+ *
+ * Caller must synchronize CPU state and fetch @insn before calling.
+ */
+ArmEmulResult arm_emul_insn(CPUArchState *env, uint32_t insn);
+
+#endif /* ARM_EMULATE_H */
diff --git a/target/arm/emulate/meson.build b/target/arm/emulate/meson.build
new file mode 100644
index 00000000..c0b38dd1
--- /dev/null
+++ b/target/arm/emulate/meson.build
@@ -0,0 +1,6 @@
+gen_a64_ldst = decodetree.process('a64-ldst.decode',
+ extra_args: ['--static-decode=decode_a64_ldst'])
+
+arm_common_system_ss.add(when: 'TARGET_AARCH64', if_true: [
+ gen_a64_ldst, files('arm_emulate.c')
+])
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 6e0e504a..a4b2291b 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -57,6 +57,7 @@ arm_common_system_ss.add(files(
'vfp_fpscr.c',
))
+subdir('emulate')
subdir('hvf')
subdir('whpx')
--
2.52.0
Add emulation for load/store register offset addressing mode
(DDI 0487 C3.3.9). The offset register value is extended via
UXTB/UXTH/UXTW/UXTX/SXTB/SXTH/SXTW/SXTX and optionally
shifted by the element size.
Instruction coverage:
- STR/LDR (GPR): register offset with extend, all sizes
- STR/LDR (SIMD/FP): register offset with extend, 8-128 bit
- PRFM register offset: NOP
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 29 ++++++++
target/arm/emulate/arm_emulate.c | 103 +++++++++++++++++++++++++++++
2 files changed, 132 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index c887dcba..af6babe1 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store register offset
+&ldst rm rn rt sign ext sz opt s
+
### Format templates
# Load/store immediate (9-bit signed)
@@ -21,6 +24,9 @@
# Load/store unsigned offset (12-bit, handler scales by << sz)
@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+# Load/store register offset
+@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
+
### Load/store register — unscaled immediate (LDUR/STUR)
# GPR
@@ -122,6 +128,29 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store register — register offset
+
+# GPR
+STR sz:2 111 0 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR 00 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=0
+LDR 01 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=1
+LDR 10 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=1 sz=2
+LDR 11 111 0 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=3
+LDR 00 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=0
+LDR 01 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=1
+LDR 10 111 0 00 10 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=0 sz=2
+LDR 00 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=0
+LDR 01 111 0 00 11 1 ..... ... . 10 ..... ..... @ldst sign=1 ext=1 sz=1
+
+# PRFM — register offset
+NOP 11 111 0 00 10 1 ----- -1- - 10 ----- -----
+
+# SIMD/FP
+STR_v sz:2 111 1 00 00 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
+LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+
### System instructions — DC cache maintenance
# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 2b4e2a9e..0e77cf33 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -200,6 +200,109 @@ static bool trans_LDR_v_i(DisasContext *ctx, arg_ldst_imm *a)
return true;
}
+/* Register offset extension (DDI 0487 C6.2.131) */
+
+static uint64_t extend_reg(uint64_t val, int option, int shift)
+{
+ switch (option) {
+ case 0: /* UXTB */
+ val = (uint8_t)val;
+ break;
+ case 1: /* UXTH */
+ val = (uint16_t)val;
+ break;
+ case 2: /* UXTW */
+ val = (uint32_t)val;
+ break;
+ case 3: /* UXTX / LSL */
+ break;
+ case 4: /* SXTB */
+ val = (int64_t)(int8_t)val;
+ break;
+ case 5: /* SXTH */
+ val = (int64_t)(int16_t)val;
+ break;
+ case 6: /* SXTW */
+ val = (int64_t)(int32_t)val;
+ break;
+ case 7: /* SXTX */
+ break;
+ }
+ return val << shift;
+}
+
+/*
+ * Load/store single -- register offset (GPR)
+ * STR / LDR (DDI 0487 C3.3.9)
+ */
+
+static bool trans_STR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+
+ uint64_t val = gpr_read(ctx, a->rt);
+ mem_write(ctx, va, &val, esize);
+ return true;
+}
+
+static bool trans_LDR(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ val = load_extend(val, a->sz, a->sign, a->ext);
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+/*
+ * Load/store single -- register offset (SIMD/FP)
+ * STR_v / LDR_v (DDI 0487 C3.3.10)
+ */
+
+static bool trans_STR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ mem_write(ctx, va, buf, esize);
+ return true;
+}
+
+static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
+{
+ int esize = (a->sz <= 3) ? (1 << a->sz) : 16;
+ int shift = a->s ? a->sz : 0;
+ uint64_t rm_val = gpr_read(ctx, a->rm);
+ uint64_t offset = extend_reg(rm_val, a->opt, shift);
+ uint64_t va = base_read(ctx, a->rn) + offset;
+ uint8_t buf[16];
+
+ if (mem_read(ctx, va, buf, esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
Add emulation for load/store pair instructions (DDI 0487 C3.3.14 --
C3.3.16). All addressing modes are covered: non-temporal (STNP/LDNP),
post-indexed, signed offset, and pre-indexed.
Instruction coverage:
- STP/LDP (GPR): 32/64-bit pairs, all addressing modes
- STP/LDP (SIMD/FP): 32/64/128-bit pairs, all addressing modes
- LDPSW: sign-extending 32-bit pair load
- STGP: store allocation tag pair (tag operation is NOP for MMIO)
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 68 ++++++++++++++++++
target/arm/emulate/arm_emulate.c | 111 +++++++++++++++++++++++++++++
2 files changed, 179 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index af6babe1..f3de3f86 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store pair (GPR and SIMD/FP)
+&ldstpair rt2 rt rn imm sz sign w p
+
# Load/store register offset
&ldst rm rn rt sign ext sz opt s
@@ -24,6 +27,9 @@
# Load/store unsigned offset (12-bit, handler scales by << sz)
@ldst_uimm .. ... . .. .. imm:12 rn:5 rt:5 &ldst_imm u=1 unpriv=0 p=0 w=0
+# Load/store pair: imm7 is signed, scaled by element size in handler
+@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+
# Load/store register offset
@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
@@ -128,6 +134,68 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store pair — non-temporal (STNP/LDNP)
+
+# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
+STP 00 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP 10 101 0 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 000 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 000 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — post-indexed
+
+STP 00 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 00 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP 01 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=1 w=1
+STP 10 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP 10 101 0 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 00 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+LDP_v 00 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=1 w=1
+STP_v 01 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+LDP_v 01 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STP_v 10 101 1 001 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+LDP_v 10 101 1 001 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=1 w=1
+
+### Load/store pair — signed offset
+
+STP 00 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 00 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP 01 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=0
+STP 10 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP 10 101 0 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 00 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+LDP_v 00 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=0
+STP_v 01 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+LDP_v 01 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STP_v 10 101 1 010 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+LDP_v 10 101 1 010 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=0
+
+### Load/store pair — pre-indexed
+
+STP 00 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 00 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP 01 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=1 p=0 w=1
+STP 10 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP 10 101 0 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 00 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+LDP_v 00 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=2 sign=0 p=0 w=1
+STP_v 01 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+LDP_v 01 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+STP_v 10 101 1 011 0 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+LDP_v 10 101 1 011 1 ....... ..... ..... ..... @ldstpair sz=4 sign=0 p=0 w=1
+
+### Load/store pair — STGP (store allocation tag + pair)
+
+STGP 01 101 0 001 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=1 w=1
+STGP 01 101 0 010 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=0
+STGP 01 101 0 011 0 ....... ..... ..... ..... @ldstpair sz=3 sign=0 p=0 w=1
+
### Load/store register — register offset
# GPR
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index 0e77cf33..a7c62b44 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -111,6 +111,117 @@ static uint64_t load_extend(uint64_t val, int sz, int sign, int ext)
return val;
}
+/*
+ * Load/store pair: STP, LDP, STNP, LDNP, STGP, LDPSW
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4 or 8 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset; /* post-index: unmodified base */
+ uint8_t buf[16]; /* max 2 x 8 bytes */
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+
+ /* LDPSW: sign-extend 32-bit values to 64-bit (sign=1, sz=2) */
+ if (a->sign) {
+ v1 = sign_extend(v1, 8 * esize);
+ v2 = sign_extend(v2, 8 * esize);
+ }
+
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+/* STGP: tag operation is a NOP for emulation; data stored via STP.
+ * The imm7 offset is scaled by the 16-byte tag granule, not the 8-byte
+ * element size, so double it before reusing trans_STP (shift by sz=3). */
+static bool trans_STGP(DisasContext *ctx, arg_ldstpair *a)
+{
+    a->imm <<= 1;
+    return trans_STP(ctx, a);
+}
+
+/*
+ * SIMD/FP load/store pair: STP_v, LDP_v
+ * (DDI 0487 C3.3.14 -- C3.3.16)
+ */
+
+static bool trans_STP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz; /* 4, 8, or 16 bytes */
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32]; /* max 2 x 16 bytes */
+
+ fpreg_read(ctx, a->rt, buf, esize);
+ fpreg_read(ctx, a->rt2, buf + esize, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
+static bool trans_LDP_v(DisasContext *ctx, arg_ldstpair *a)
+{
+ int esize = 1 << a->sz;
+ int64_t offset = (int64_t)a->imm << a->sz;
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = a->p ? base : base + offset;
+ uint8_t buf[32];
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ fpreg_write(ctx, a->rt, buf, esize);
+ fpreg_write(ctx, a->rt2, buf + esize, esize);
+
+ if (a->w) {
+ base_write(ctx, a->rn, base + offset);
+ }
+ return true;
+}
+
/* Load/store single -- immediate (GPR) (DDI 0487 C3.3.8 -- C3.3.13) */
static bool trans_STR_i(DisasContext *ctx, arg_ldst_imm *a)
--
2.52.0
Add emulation for load/store exclusive instructions (DDI 0487 C3.3.6).
Exclusive monitors have no meaning on emulated MMIO accesses, so STXR
always reports success (Rs=0) and LDXR does not set a monitor.
Instruction coverage:
- STXR/STLXR: exclusive store, 8/16/32/64-bit
- LDXR/LDAXR: exclusive load, 8/16/32/64-bit
- STXP/STLXP: exclusive store pair, 32/64-bit
- LDXP/LDAXP: exclusive load pair, 32/64-bit
STXP/LDXP use two explicit decode patterns (sz=2, sz=3) for the
32/64-bit size variants.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 22 +++++++++
target/arm/emulate/arm_emulate.c | 74 ++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index f3de3f86..fadf6fd2 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -10,6 +10,9 @@
# 'u' flag: 0 = 9-bit signed immediate (byte offset), 1 = 12-bit unsigned (needs << sz)
&ldst_imm rt rn imm sz sign w p unpriv ext u
+# Load/store exclusive
+&stxr rn rt rt2 rs sz lasr
+
# Load/store pair (GPR and SIMD/FP)
&ldstpair rt2 rt rn imm sz sign w p
@@ -18,6 +21,9 @@
### Format templates
+# Exclusives
+@stxr sz:2 ...... ... rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr
+
# Load/store immediate (9-bit signed)
@ldst_imm .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=0
@ldst_imm_pre .. ... . .. .. . imm:s9 .. rn:5 rt:5 &ldst_imm u=0 unpriv=0 p=0 w=1
@@ -134,6 +140,22 @@ STR_v_i 00 111 1 01 10 ............ ..... ..... @ldst_uimm sign=
LDR_v_i sz:2 111 1 01 01 ............ ..... ..... @ldst_uimm sign=0 ext=0
LDR_v_i 00 111 1 01 11 ............ ..... ..... @ldst_uimm sign=0 ext=0 sz=4
+### Load/store exclusive
+
+# STXR / STLXR (sz encodes 8/16/32/64-bit)
+STXR .. 001000 000 ..... . ..... ..... ..... @stxr
+
+# LDXR / LDAXR
+LDXR .. 001000 010 ..... . ..... ..... ..... @stxr
+
+# STXP / STLXP (bit[31]=1, bit[30]=sf → sz=2 for 32-bit, sz=3 for 64-bit)
+STXP 10 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+STXP 11 001000 001 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
+# LDXP / LDAXP
+LDXP 10 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=2
+LDXP 11 001000 011 rs:5 lasr:1 rt2:5 rn:5 rt:5 &stxr sz=3
+
### Load/store pair — non-temporal (STNP/LDNP)
# STNP/LDNP: offset only, no writeback. Non-temporal hint ignored.
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index a7c62b44..fd567e65 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -414,6 +414,80 @@ static bool trans_LDR_v(DisasContext *ctx, arg_ldst *a)
return true;
}
+/*
+ * Load/store exclusive: STXR, LDXR, STXP, LDXP
+ * (DDI 0487 C3.3.6)
+ *
+ * Exclusive monitors have no meaning on MMIO. STXR always reports
+ * success (Rs=0) and LDXR does not set an exclusive monitor.
+ */
+
+static bool trans_STXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = gpr_read(ctx, a->rt);
+
+ if (mem_write(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ /* Report success -- no exclusive monitor on emulated access */
+ gpr_write(ctx, a->rs, 0);
+ return true;
+}
+
+static bool trans_LDXR(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+ return true;
+}
+
+static bool trans_STXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz; /* sz=2->4, sz=3->8 */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+
+ uint64_t v1 = gpr_read(ctx, a->rt);
+ uint64_t v2 = gpr_read(ctx, a->rt2);
+ memcpy(buf, &v1, esize);
+ memcpy(buf + esize, &v2, esize);
+
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rs, 0); /* success */
+ return true;
+}
+
+static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t v1 = 0, v2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+
+ memcpy(&v1, buf, esize);
+ memcpy(&v2, buf + esize, esize);
+ gpr_write(ctx, a->rt, v1);
+ gpr_write(ctx, a->rt2, v2);
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
Add emulation for remaining ISV=0 load/store instruction classes.
Atomic memory operations (DDI 0487 C3.3.2):
- LDADD, LDCLR, LDEOR, LDSET: arithmetic/logic atomics
- LDSMAX, LDSMIN, LDUMAX, LDUMIN: signed/unsigned min/max
- SWP: atomic swap
These are implemented as non-atomic read-modify-write sequences, which is
sufficient for MMIO, where concurrent access is not a concern.
Acquire/release semantics are ignored.
Compare-and-swap (DDI 0487 C3.3.1):
- CAS/CASA/CASAL/CASL: single-register compare-and-swap
- CASP/CASPA/CASPAL/CASPL: register-pair compare-and-swap
CASP validates that Rs and Rt are even-numbered register pairs; an odd
register or r31 returns UNHANDLED.
Load with PAC (DDI 0487 C6.2.121):
- LDRAA/LDRAB: pointer-authenticated load, offset/pre-indexed
Pointer authentication is not emulated (equivalent to auth always
succeeding), which is correct for MMIO since PAC is a software
security mechanism, not a memory access semantic.
CASP uses two explicit decode patterns for the 32/64-bit size
variants. LDRA's offset immediate is stored raw in the decode;
the handler scales by << 3.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/emulate/a64-ldst.decode | 45 ++++++
target/arm/emulate/arm_emulate.c | 233 +++++++++++++++++++++++++++++
2 files changed, 278 insertions(+)
diff --git a/target/arm/emulate/a64-ldst.decode b/target/arm/emulate/a64-ldst.decode
index fadf6fd2..9292bfdf 100644
--- a/target/arm/emulate/a64-ldst.decode
+++ b/target/arm/emulate/a64-ldst.decode
@@ -16,6 +16,16 @@
# Load/store pair (GPR and SIMD/FP)
&ldstpair rt2 rt rn imm sz sign w p
+# Atomic memory operations
+&atomic rs rn rt a r sz
+
+# Compare-and-swap
+&cas rs rn rt sz a r
+
+# Load with PAC (LDRAA/LDRAB, FEAT_PAuth)
+%ldra_imm 22:s1 12:9
+&ldra rt rn imm m w
+
# Load/store register offset
&ldst rm rn rt sign ext sz opt s
@@ -36,6 +46,15 @@
# Load/store pair: imm7 is signed, scaled by element size in handler
@ldstpair .. ... . ... . imm:s7 rt2:5 rn:5 rt:5 &ldstpair
+# Atomics
+@atomic sz:2 ... . .. a:1 r:1 . rs:5 . ... .. rn:5 rt:5 &atomic
+
+# Compare-and-swap: sz extracted by pattern (CAS) or set constant (CASP)
+@cas .. ...... . a:1 . rs:5 r:1 ..... rn:5 rt:5 &cas
+
+# Load with PAC
+@ldra .. ... . .. m:1 . . ......... w:1 . rn:5 rt:5 &ldra imm=%ldra_imm
+
# Load/store register offset
@ldst .. ... . .. .. . rm:5 opt:3 s:1 .. rn:5 rt:5 &ldst
@@ -241,6 +260,32 @@ STR_v 00 111 1 00 10 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=
LDR_v sz:2 111 1 00 01 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0
LDR_v 00 111 1 00 11 1 ..... ... . 10 ..... ..... @ldst sign=0 ext=0 sz=4
+### Compare-and-swap
+
+# CAS / CASA / CASAL / CASL
+CAS sz:2 001000 1 . 1 ..... . 11111 ..... ..... @cas
+
+# CASP / CASPA / CASPAL / CASPL (pair: Rt,Rt+1 and Rs,Rs+1)
+CASP 00 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=2
+CASP 01 001000 0 . 1 ..... . 11111 ..... ..... @cas sz=3
+
+### Atomic memory operations
+
+LDADD .. 111 0 00 . . 1 ..... 0000 00 ..... ..... @atomic
+LDCLR .. 111 0 00 . . 1 ..... 0001 00 ..... ..... @atomic
+LDEOR .. 111 0 00 . . 1 ..... 0010 00 ..... ..... @atomic
+LDSET .. 111 0 00 . . 1 ..... 0011 00 ..... ..... @atomic
+LDSMAX .. 111 0 00 . . 1 ..... 0100 00 ..... ..... @atomic
+LDSMIN .. 111 0 00 . . 1 ..... 0101 00 ..... ..... @atomic
+LDUMAX .. 111 0 00 . . 1 ..... 0110 00 ..... ..... @atomic
+LDUMIN .. 111 0 00 . . 1 ..... 0111 00 ..... ..... @atomic
+SWP .. 111 0 00 . . 1 ..... 1000 00 ..... ..... @atomic
+
+### Load with PAC (FEAT_PAuth)
+
+# LDRAA (M=0) / LDRAB (M=1), offset (W=0) / pre-indexed (W=1)
+LDRA 11 111 0 00 . . 1 ......... . 1 ..... ..... @ldra
+
### System instructions — DC cache maintenance
# SYS with CRn=C7 covers all data cache operations (DC CIVAC, CVAC, etc.).
diff --git a/target/arm/emulate/arm_emulate.c b/target/arm/emulate/arm_emulate.c
index fd567e65..1b959745 100644
--- a/target/arm/emulate/arm_emulate.c
+++ b/target/arm/emulate/arm_emulate.c
@@ -488,6 +488,239 @@ static bool trans_LDXP(DisasContext *ctx, arg_stxr *a)
return true;
}
+/*
+ * Atomic memory operations (DDI 0487 C3.3.2)
+ *
+ * Non-atomic read-modify-write; sufficient for MMIO.
+ * Acquire/release semantics ignored (no concurrency in MMIO emulation).
+ */
+
+typedef uint64_t (*atomic_op_fn)(uint64_t old, uint64_t operand, int bits);
+
+static uint64_t atomic_add(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old + op;
+}
+
+static uint64_t atomic_clr(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old & ~op;
+}
+
+static uint64_t atomic_eor(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old ^ op;
+}
+
+static uint64_t atomic_set(uint64_t old, uint64_t op, int bits)
+{
+ (void)bits;
+ return old | op;
+}
+
+static uint64_t atomic_smax(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a >= b) ? old : op;
+}
+
+static uint64_t atomic_smin(uint64_t old, uint64_t op, int bits)
+{
+ int64_t a = sign_extend(old, bits);
+ int64_t b = sign_extend(op, bits);
+ return (a <= b) ? old : op;
+}
+
+static uint64_t atomic_umax(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) >= (op & mask)) ? old : op;
+}
+
+static uint64_t atomic_umin(uint64_t old, uint64_t op, int bits)
+{
+ uint64_t mask = (bits == 64) ? UINT64_MAX : (1ULL << bits) - 1;
+ return ((old & mask) <= (op & mask)) ? old : op;
+}
+
+static bool do_atomic(DisasContext *ctx, arg_atomic *a, atomic_op_fn fn)
+{
+ int esize = 1 << a->sz;
+ int bits = 8 * esize;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t operand = gpr_read(ctx, a->rs);
+ uint64_t result = fn(old, operand, bits);
+
+ if (mem_write(ctx, va, &result, esize) != 0) {
+ return true;
+ }
+
+ /* Rt receives the old value (before modification) */
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+static bool trans_LDADD(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_add);
+}
+
+static bool trans_LDCLR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_clr);
+}
+
+static bool trans_LDEOR(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_eor);
+}
+
+static bool trans_LDSET(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_set);
+}
+
+static bool trans_LDSMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smax);
+}
+
+static bool trans_LDSMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_smin);
+}
+
+static bool trans_LDUMAX(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umax);
+}
+
+static bool trans_LDUMIN(DisasContext *ctx, arg_atomic *a)
+{
+ return do_atomic(ctx, a, atomic_umin);
+}
+
+static bool trans_SWP(DisasContext *ctx, arg_atomic *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t old = 0;
+
+ if (mem_read(ctx, va, &old, esize) != 0) {
+ return true;
+ }
+
+ uint64_t newval = gpr_read(ctx, a->rs);
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, old);
+ return true;
+}
+
+/* Compare-and-swap: CAS, CASP (DDI 0487 C3.3.1) */
+
+static bool trans_CAS(DisasContext *ctx, arg_cas *a)
+{
+ int esize = 1 << a->sz;
+ uint64_t va = base_read(ctx, a->rn);
+ uint64_t current = 0;
+
+ if (mem_read(ctx, va, &current, esize) != 0) {
+ return true;
+ }
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t compare = gpr_read(ctx, a->rs) & mask;
+
+ if ((current & mask) == compare) {
+ uint64_t newval = gpr_read(ctx, a->rt) & mask;
+ if (mem_write(ctx, va, &newval, esize) != 0) {
+ return true;
+ }
+ }
+
+ /* Rs receives the old memory value (whether or not swap occurred) */
+ gpr_write(ctx, a->rs, current);
+ return true;
+}
+
+/* CASP: compare-and-swap pair (Rs,Rs+1 compared; Rt,Rt+1 stored) */
+static bool trans_CASP(DisasContext *ctx, arg_cas *a)
+{
+ /* CASP requires even register pairs; odd or r31 is UNPREDICTABLE */
+ if ((a->rs & 1) || a->rs >= 31 || (a->rt & 1) || a->rt >= 31) {
+ return false;
+ }
+
+ int esize = 1 << a->sz; /* per-register size */
+ uint64_t va = base_read(ctx, a->rn);
+ uint8_t buf[16];
+ uint64_t cur1 = 0, cur2 = 0;
+
+ if (mem_read(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ memcpy(&cur1, buf, esize);
+ memcpy(&cur2, buf + esize, esize);
+
+ uint64_t mask = (esize == 8) ? UINT64_MAX : (1ULL << (8 * esize)) - 1;
+ uint64_t cmp1 = gpr_read(ctx, a->rs) & mask;
+ uint64_t cmp2 = gpr_read(ctx, a->rs + 1) & mask;
+
+ if ((cur1 & mask) == cmp1 && (cur2 & mask) == cmp2) {
+ uint64_t new1 = gpr_read(ctx, a->rt) & mask;
+ uint64_t new2 = gpr_read(ctx, a->rt + 1) & mask;
+ memcpy(buf, &new1, esize);
+ memcpy(buf + esize, &new2, esize);
+ if (mem_write(ctx, va, buf, 2 * esize) != 0) {
+ return true;
+ }
+ }
+
+ gpr_write(ctx, a->rs, cur1);
+ gpr_write(ctx, a->rs + 1, cur2);
+ return true;
+}
+
+/*
+ * Load with PAC: LDRAA / LDRAB (FEAT_PAuth)
+ * (DDI 0487 C6.2.121)
+ *
+ * Pointer authentication is not emulated -- the base register is used
+ * directly (equivalent to auth always succeeding).
+ */
+
+static bool trans_LDRA(DisasContext *ctx, arg_ldra *a)
+{
+ int64_t offset = (int64_t)a->imm << 3; /* S:imm9, scaled by 8 */
+ uint64_t base = base_read(ctx, a->rn);
+ uint64_t va = base + offset; /* auth not emulated */
+ uint64_t val = 0;
+
+ if (mem_read(ctx, va, &val, 8) != 0) {
+ return true;
+ }
+
+ gpr_write(ctx, a->rt, val);
+
+ if (a->w) {
+ base_write(ctx, a->rn, va);
+ }
+ return true;
+}
+
/* PRFM, DC cache maintenance -- treated as NOP */
static bool trans_NOP(DisasContext *ctx, arg_NOP *a)
{
--
2.52.0
When a data abort with ISV=0 occurs during MMIO emulation, the
syndrome register does not carry the access size or target register.
Previously this hit an assert(isv) and killed the VM.
Replace the assert with instruction fetch + decode + emulate using the
shared library in target/arm/emulate/. The faulting instruction is read
from guest memory via cpu_memory_rw_debug(), decoded by the decodetree-
generated decoder, and emulated against the vCPU register file.
Both HVF (macOS) and WHPX (Windows Hyper-V) use the same pattern:
1. cpu_synchronize_state() to flush hypervisor registers
2. Fetch 4-byte instruction at env->pc
3. arm_emul_insn(env, insn)
4. Log errors for unhandled/memory-fault cases, advance PC
This makes ISV=0 data aborts non-fatal, enabling MMIO access from
SIMD/FP loads, load/store pairs, atomics, and other instructions
that hardware does not decode into the syndrome.
Signed-off-by: Lucas Amaral <lucaaamaral@gmail.com>
---
target/arm/hvf/hvf.c | 41 +++++++++++++++++++++++++++++++++++---
target/arm/whpx/whpx-all.c | 39 +++++++++++++++++++++++++++++++++++-
2 files changed, 76 insertions(+), 4 deletions(-)
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 5fc8f6bb..219dbbca 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -32,6 +32,7 @@
#include "arm-powerctl.h"
#include "target/arm/cpu.h"
#include "target/arm/internals.h"
+#include "emulate/arm_emulate.h"
#include "target/arm/multiprocessing.h"
#include "target/arm/gtimer.h"
#include "target/arm/trace.h"
@@ -2175,10 +2176,44 @@ static int hvf_handle_exception(CPUState *cpu, hv_vcpu_exit_exception_t *excp)
assert(!s1ptw);
/*
- * TODO: ISV will be 0 for SIMD or SVE accesses.
- * Inject the exception into the guest.
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and emulate via target/arm/emulate/.
+ * Unhandled instructions log an error and advance PC.
*/
- assert(isv);
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ error_report("HVF: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ advance_pc = true;
+ break;
+ }
+
+ r = arm_emul_insn(env, insn);
+ if (r == ARM_EMUL_UNHANDLED) {
+ /*
+ * TODO: Inject data abort into guest instead of
+ * advancing PC. Requires setting ESR_EL1/FAR_EL1/
+ * ELR_EL1/SPSR_EL1 and redirecting to VBAR_EL1.
+ */
+ error_report("HVF: ISV=0 unhandled insn 0x%08x at "
+ "pc=0x%" PRIx64, insn, (uint64_t)env->pc);
+ } else if (r == ARM_EMUL_ERR_MEM) {
+ error_report("HVF: ISV=0 memory error emulating "
+ "insn 0x%08x at pc=0x%" PRIx64,
+ insn, (uint64_t)env->pc);
+ }
+
+ advance_pc = true;
+ break;
+ }
/*
* Emulate MMIO.
diff --git a/target/arm/whpx/whpx-all.c b/target/arm/whpx/whpx-all.c
index 513551be..2f8ffc7f 100644
--- a/target/arm/whpx/whpx-all.c
+++ b/target/arm/whpx/whpx-all.c
@@ -29,6 +29,7 @@
#include "syndrome.h"
#include "target/arm/cpregs.h"
#include "internals.h"
+#include "emulate/arm_emulate.h"
#include "system/whpx-internal.h"
#include "system/whpx-accel-ops.h"
@@ -366,7 +367,43 @@ static int whpx_handle_mmio(CPUState *cpu, WHV_MEMORY_ACCESS_CONTEXT *ctx)
uint64_t val = 0;
assert(!cm);
- assert(isv);
+
+ /*
+ * ISV=0: syndrome doesn't carry access size/register info.
+ * Fetch and decode the faulting instruction via the emulation library.
+ */
+ if (!isv) {
+ ARMCPU *arm_cpu = ARM_CPU(cpu);
+ CPUARMState *env = &arm_cpu->env;
+ uint32_t insn;
+ ArmEmulResult r;
+
+ cpu_synchronize_state(cpu);
+
+ if (cpu_memory_rw_debug(cpu, env->pc,
+ (uint8_t *)&insn, 4, false) != 0) {
+ error_report("WHPX: cannot read insn at pc=0x%" PRIx64,
+ (uint64_t)env->pc);
+ return 0;
+ }
+
+ r = arm_emul_insn(env, insn);
+ if (r == ARM_EMUL_UNHANDLED) {
+ /*
+ * TODO: Inject data abort into guest instead of
+ * advancing PC. Requires setting ESR_EL1/FAR_EL1/
+ * ELR_EL1/SPSR_EL1 and redirecting to VBAR_EL1.
+ */
+ error_report("WHPX: ISV=0 unhandled insn 0x%08x at "
+ "pc=0x%" PRIx64, insn, (uint64_t)env->pc);
+ } else if (r == ARM_EMUL_ERR_MEM) {
+ error_report("WHPX: ISV=0 memory error emulating "
+ "insn 0x%08x at pc=0x%" PRIx64,
+ insn, (uint64_t)env->pc);
+ }
+
+ return 0;
+ }
if (iswrite) {
val = whpx_get_gp_reg(cpu, srt);
--
2.52.0