From: John Högberg <john.hogberg@ericsson.com>
Unlike architectures with precise self-modifying code semantics
(e.g. x86) ARM processors do not maintain coherency for instruction
execution and memory, and require the explicit use of cache
management instructions as well as an instruction barrier to make
code updates visible (the latter on every core that is going to
execute said code).
While this is required to make JITs work on actual hardware, QEMU
has gotten away with not handling this since it does not emulate
caches, and unconditionally invalidates code whenever the softmmu
or the user-mode page protection logic detects that code has been
modified.
Unfortunately the latter does not work in the face of dual-mapped
code (a common W^X workaround), where one page is executable and
the other is writable: user-mode has no way to connect one with the
other as that is only known to the kernel and the emulated
application.
This commit works around the issue by invalidating code in
IC IVAU instructions.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1034
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: John Högberg <john.hogberg@ericsson.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.c | 47 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 3 deletions(-)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index d4bee43bd0..235e3cd0b6 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5228,6 +5228,36 @@ static void mdcr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
}
}
+#ifdef CONFIG_USER_ONLY
+/*
+ * `IC IVAU` is handled to improve compatibility with JITs that dual-map their
+ * code to get around W^X restrictions, where one region is writable and the
+ * other is executable.
+ *
+ * Since the executable region is never written to we cannot detect code
+ * changes when running in user mode, and rely on the emulated JIT telling us
+ * that the code has changed by executing this instruction.
+ */
+static void ic_ivau_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                          uint64_t value)
+{
+    uint64_t icache_line_mask, start_address, end_address;
+    const ARMCPU *cpu;
+
+    cpu = env_archcpu(env);
+
+    icache_line_mask = (4 << extract32(cpu->ctr, 0, 4)) - 1;
+    start_address = value & ~icache_line_mask;
+    end_address = value | icache_line_mask;
+
+    mmap_lock();
+
+    tb_invalidate_phys_range(start_address, end_address);
+
+    mmap_unlock();
+}
+#endif
+
static const ARMCPRegInfo v8_cp_reginfo[] = {
/*
* Minimal set of EL0-visible registers. This will need to be expanded
@@ -5267,7 +5297,10 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
{ .name = "CURRENTEL", .state = ARM_CP_STATE_AA64,
.opc0 = 3, .opc1 = 0, .opc2 = 2, .crn = 4, .crm = 2,
.access = PL1_R, .type = ARM_CP_CURRENTEL },
- /* Cache ops: all NOPs since we don't emulate caches */
+ /*
+ * Instruction cache ops. All of these except `IC IVAU` NOP because we
+ * don't emulate caches.
+ */
{ .name = "IC_IALLUIS", .state = ARM_CP_STATE_AA64,
.opc0 = 1, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
.access = PL1_W, .type = ARM_CP_NOP,
@@ -5280,9 +5313,17 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
.accessfn = access_tocu },
{ .name = "IC_IVAU", .state = ARM_CP_STATE_AA64,
.opc0 = 1, .opc1 = 3, .crn = 7, .crm = 5, .opc2 = 1,
- .access = PL0_W, .type = ARM_CP_NOP,
+ .access = PL0_W,
.fgt = FGT_ICIVAU,
- .accessfn = access_tocu },
+ .accessfn = access_tocu,
+#ifdef CONFIG_USER_ONLY
+ .type = ARM_CP_NO_RAW,
+ .writefn = ic_ivau_write
+#else
+ .type = ARM_CP_NOP
+#endif
+ },
+ /* Cache ops: all NOPs since we don't emulate caches */
{ .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
.opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
.access = PL1_W, .accessfn = aa64_cacheop_poc_access,
--
2.38.5
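For readers following the address arithmetic in ic_ivau_write(), here is a minimal standalone sketch of the same computation outside QEMU. It assumes only that CTR_EL0.IminLine (bits [3:0]) holds log2 of the smallest instruction cache line size in 4-byte words, as the patch's `4 << extract32(cpu->ctr, 0, 4)` implies. The names `struct line_range` and `icache_line_range` are hypothetical and do not exist in QEMU.

```c
#include <stdint.h>

/*
 * Illustrative only: mirrors the masking done in ic_ivau_write().
 * `struct line_range` and `icache_line_range` are hypothetical names.
 */
struct line_range {
    uint64_t start; /* first byte of the cache line containing addr */
    uint64_t last;  /* last byte of that cache line (inclusive) */
};

static struct line_range icache_line_range(uint64_t ctr, uint64_t addr)
{
    /* CTR_EL0.IminLine, bits [3:0]: log2 of the line size in 4-byte words. */
    uint64_t imin_line = ctr & 0xf;
    uint64_t line_bytes = UINT64_C(4) << imin_line;
    uint64_t mask = line_bytes - 1;

    /* Round down to the line start, and fill the low bits for the last byte. */
    struct line_range r = { addr & ~mask, addr | mask };
    return r;
}
```

With IminLine == 4 the line is 64 bytes, so an IC IVAU on address 0x1234 would cover 0x1200 through 0x123f under this sketch.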
On Tue, 20 Jun 2023 at 02:04, ~jhogberg <jhogberg@git.sr.ht> wrote:
>
> From: John Högberg <john.hogberg@ericsson.com>
>
> Unlike architectures with precise self-modifying code semantics
> (e.g. x86) ARM processors do not maintain coherency for instruction
> execution and memory, and require the explicit use of cache
> management instructions as well as an instruction barrier to make
> code updates visible (the latter on every core that is going to
> execute said code).
This is implementation-dependent: if the
implementation reports CTR_EL0.{DIC,IDC} == {1,1} then
it doesn't need icache invalidation or data cache clean
to provide data-to-instruction or instruction-to-data
coherence. This is currently not true for any CPU QEMU
models, but the Neoverse-V1 (which I'm about to send a patch
for) can do this. (It's also tempting to make 'max' set
these bits, which would save the guest some effort in
doing cache ops which we NOP anyway.)
So maybe we should also force CTR_EL0.DIC to 0 in user-mode
so that the guest won't decide based on the value of that bit
that it doesn't need to issue the IC IVAU?
arm_cpu_realizefn() would be the place to do this, I think.
thanks
-- PMM
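The point about CTR_EL0.{DIC,IDC} can be sketched as follows. This is a hedged illustration, not QEMU code: it assumes the architectural bit positions DIC = bit 29 and IDC = bit 28 of CTR_EL0, takes the register value as a plain argument so it can be exercised on any host, and uses hypothetical helper names throughout. The last helper shows the kind of adjustment suggested above for user-mode (clearing DIC so that guests keep issuing IC IVAU); where and how QEMU would actually do this is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CTR_EL0 bit positions (per the Arm architecture). */
#define CTR_EL0_IDC_BIT 28
#define CTR_EL0_DIC_BIT 29

/*
 * Guest side (hypothetical helper): DIC == 0 means instruction cache
 * invalidation is required for data-to-instruction coherence.
 */
static bool needs_icache_invalidate(uint64_t ctr)
{
    return ((ctr >> CTR_EL0_DIC_BIT) & 1) == 0;
}

/*
 * Guest side (hypothetical helper): IDC == 0 means a data cache clean is
 * required for instruction-to-data coherence.
 */
static bool needs_dcache_clean(uint64_t ctr)
{
    return ((ctr >> CTR_EL0_IDC_BIT) & 1) == 0;
}

/*
 * Emulator side (hypothetical helper): report DIC as 0 so that a guest
 * which honours the bit still executes IC IVAU. All other fields are kept.
 */
static uint64_t ctr_clear_dic(uint64_t ctr)
{
    return ctr & ~(UINT64_C(1) << CTR_EL0_DIC_BIT);
}
```

A guest that sees both bits set would skip all cache maintenance; after ctr_clear_dic() it would still skip the data cache clean but would issue the IC IVAU that the patch relies on.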
> This is implementation-dependent: if the
> implementation reports CTR_EL0.{DIC,IDC} == {1,1} then
> it doesn't need icache invalidation or data cache clean
> to provide data-to-instruction or instruction-to-data
> coherence. This is currently not true for any CPU QEMU
> models, but the Neoverse-V1 (which I'm about to send a patch
> for) can do this. (It's also tempting to make 'max' set
> these bits, which would save the guest some effort in
> doing cache ops which we NOP anyway.)
Sure, I'll update the commit message to this effect.
> So maybe we should also force CTR_EL0.DIC to 0 in user-mode
> so that the guest won't decide based on the value of that bit
> that it doesn't need to issue the IC IVAU?
> arm_cpu_realizefn() would be the place to do this, I think.
Sounds good, I'll fix that. Thanks :)
/John
-----Original Message-----
From: Peter Maydell <peter.maydell@linaro.org>
To: ~jhogberg <john.hogberg@ericsson.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH qemu v3 1/2] target/arm: Handle IC IVAU to improve
compatibility with JITs
Date: Mon, 26 Jun 2023 13:38:16 +0100