[PATCH] target/arm: Implement WFE, SEV and SEVONPEND for Cortex-M

Ashish Anand posted 1 patch 1 month, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20251223120012.541777-1-ashish.a6@samsung.com
Maintainers: Peter Maydell <peter.maydell@linaro.org>
There is a newer version of this series
hw/intc/armv7m_nvic.c         | 22 ++++++++++++++++-
include/hw/intc/armv7m_nvic.h |  1 +
target/arm/cpu.c              |  7 ++++++
target/arm/cpu.h              |  1 +
target/arm/machine.c          | 19 +++++++++++++++
target/arm/tcg/helper.h       |  1 +
target/arm/tcg/op_helper.c    | 45 +++++++++++++++++++++++++++++------
target/arm/tcg/t16.decode     |  2 +-
target/arm/tcg/t32.decode     |  2 +-
target/arm/tcg/translate.c    | 20 ++++++++++++----
10 files changed, 106 insertions(+), 14 deletions(-)
[PATCH] target/arm: Implement WFE, SEV and SEVONPEND for Cortex-M
Posted by Ashish Anand 1 month, 2 weeks ago
From: "ashish.a6" <ashish.a6@samsung.com>

    Currently, QEMU implements the 'Wait For Event' (WFE) instruction as a
    a simple yield. This causes high host CPU usage because Guest
    RTOS idle loops effectively become busy-wait loops.

    To improve efficiency, this patch transitions WFE to use the architectural
    'Halt' state (EXCP_HLT) for M-profile CPUs. This allows the host thread
    to sleep when the guest is idle.

    To support this transition, we implement the full architectural behavior
    required for WFE, specifically the 'Event Register' and 'SEVONPEND' logic
    defined in the ARMv7-M specification. This ensures that the CPU correctly
    identifies wakeup conditions as defined by the architecture.

    Changes include:
    1.  target/arm/cpu.h: Added 'event_register' to the v7m state struct.
    2.  target/arm/cpu.c: Ensured event_register is cleared on reset and
        updated arm_cpu_has_work() to check the event register state.
    3.  target/arm/machine.c: Added VMState subsection for migration compatibility.
    4.  hw/intc/armv7m_nvic.h: Added was_pending flag, to check for SEVONPEND logic.
    5.  hw/intc/armv7m_nvic.c: Implemented SEVONPEND logic in :
        nvic_recompute_state() and nvic_recompute_state_secure().
        This sets the event register and kicks the CPU when a new interrupt
        becomes pending.
    6.  target/arm/tcg/helper.h: Declare the new helper_sev function.
    7.  target/arm/tcg/op_helper.c:
        - Update HELPER(wfe) to use EXCP_HLT for M-profile CPUs.
        - Implement HELPER(sev) to set the event register and kick all vCPUs.
    8.  target/arm/tcg/t16.decode / t32.decode: Enable decoding of the SEV
        instruction in Thumb/Thumb-2 mode.
    9.  target/arm/tcg/translate.c: Implement trans_SEV using inline
        TCG generation to ensure system-wide visibility.

    This patch enables resource-efficient idle emulation for Cortex-M.

    Signed-off-by: Ashish Anand ashish.a6@samsung.com
---
Testing Notes:
- Performance Test: Monitored host CPU usage while the guest was in an 
  idle loop. Usage dropped from 100% (busy-yield) to <5% (EXCP_HALT).
- Functional Test: Verified SEVONPEND logic using a bare-metal test 
  case on Cortex-M. Confirmed that pending interrupts correctly 
  trigger a wakeup when SEVONPEND is enabled, as per Armv7-M spec.
- Regression Test: 'make check' passes for all ARM targets.
- Note: 4 x86-specific test cases (lpc-ich9, e1000e) failed, but 
  I confirmed these also fail on a clean master branch.

 hw/intc/armv7m_nvic.c         | 22 ++++++++++++++++-
 include/hw/intc/armv7m_nvic.h |  1 +
 target/arm/cpu.c              |  7 ++++++
 target/arm/cpu.h              |  1 +
 target/arm/machine.c          | 19 +++++++++++++++
 target/arm/tcg/helper.h       |  1 +
 target/arm/tcg/op_helper.c    | 45 +++++++++++++++++++++++++++++------
 target/arm/tcg/t16.decode     |  2 +-
 target/arm/tcg/t32.decode     |  2 +-
 target/arm/tcg/translate.c    | 20 ++++++++++++----
 10 files changed, 106 insertions(+), 14 deletions(-)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index 7c78961040..a60990c71f 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -16,6 +16,7 @@
 #include "migration/vmstate.h"
 #include "qemu/timer.h"
 #include "hw/intc/armv7m_nvic.h"
+#include "hw/arm/armv7m.h"
 #include "hw/irq.h"
 #include "hw/qdev-properties.h"
 #include "system/tcg.h"
@@ -232,6 +233,8 @@ static void nvic_recompute_state_secure(NVICState *s)
     int pend_irq = 0;
     bool pending_is_s_banked = false;
     int pend_subprio = 0;
+    ARMv7MState *armv7m = container_of(s, ARMv7MState, nvic);
+    ARMCPU *cpu = armv7m->cpu;
 
     /* R_CQRV: precedence is by:
      *  - lowest group priority; if both the same then
@@ -259,6 +262,14 @@ static void nvic_recompute_state_secure(NVICState *s)
                 targets_secure = !exc_is_banked(i) && exc_targets_secure(s, i);
             }
 
+            if (!vec->was_pending && vec->pending) {
+                if (cpu->env.v7m.scr[bank] & R_V7M_SCR_SEVONPEND_MASK) {
+                    cpu->env.v7m.event_register = 1;
+                    qemu_cpu_kick(CPU(cpu));
+                }
+            }
+            vec->was_pending = vec->pending;
+
             prio = exc_group_prio(s, vec->prio, targets_secure);
             subprio = vec->prio & ~nvic_gprio_mask(s, targets_secure);
             if (vec->enabled && vec->pending &&
@@ -293,6 +304,8 @@ static void nvic_recompute_state(NVICState *s)
     int pend_prio = NVIC_NOEXC_PRIO;
     int active_prio = NVIC_NOEXC_PRIO;
     int pend_irq = 0;
+    ARMv7MState *armv7m = container_of(s, ARMv7MState, nvic);
+    ARMCPU *cpu = armv7m->cpu;
 
     /* In theory we could write one function that handled both
      * the "security extension present" and "not present"; however
@@ -316,6 +329,13 @@ static void nvic_recompute_state(NVICState *s)
         if (vec->active && vec->prio < active_prio) {
             active_prio = vec->prio;
         }
+        if (!vec->was_pending && vec->pending) {
+            if (cpu->env.v7m.scr[M_REG_NS] & R_V7M_SCR_SEVONPEND_MASK) {
+                cpu->env.v7m.event_register = 1;
+                qemu_cpu_kick(CPU(cpu));
+            }
+        }
+        vec->was_pending = vec->pending;
     }
 
     if (active_prio > 0) {
@@ -1657,7 +1677,7 @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
         }
         /* We don't implement deep-sleep so these bits are RAZ/WI.
          * The other bits in the register are banked.
-         * QEMU's implementation ignores SEVONPEND and SLEEPONEXIT, which
+         * QEMU's implementation ignores SLEEPONEXIT, which
          * is architecturally permitted.
          */
         value &= ~(R_V7M_SCR_SLEEPDEEP_MASK | R_V7M_SCR_SLEEPDEEPS_MASK);
diff --git a/include/hw/intc/armv7m_nvic.h b/include/hw/intc/armv7m_nvic.h
index 7b9964fe7e..305eaf6e6a 100644
--- a/include/hw/intc/armv7m_nvic.h
+++ b/include/hw/intc/armv7m_nvic.h
@@ -32,6 +32,7 @@ typedef struct VecInfo {
     uint8_t pending;
     uint8_t active;
     uint8_t level; /* exceptions <=15 never set level */
+    bool was_pending;
 } VecInfo;
 
 struct NVICState {
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 39292fb9bc..0a044f7254 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -143,6 +143,12 @@ static bool arm_cpu_has_work(CPUState *cs)
 {
     ARMCPU *cpu = ARM_CPU(cs);
 
+    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
+        if (cpu->env.v7m.event_register) {
+            return true;
+        }
+    }
+
     return (cpu->power_state != PSCI_OFF)
         && cpu_test_interrupt(cs,
                CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
@@ -480,6 +486,7 @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
             ~(R_V7M_FPCCR_LSPEN_MASK | R_V7M_FPCCR_S_MASK);
         env->v7m.control[M_REG_S] |= R_V7M_CONTROL_FPCA_MASK;
 #endif
+        env->v7m.event_register = 0;
     }
 
     /* M profile requires that reset clears the exclusive monitor;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 39f2b2e54d..44433a444c 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -639,6 +639,7 @@ typedef struct CPUArchState {
         uint32_t nsacr;
         uint32_t ltpsize;
         uint32_t vpr;
+        uint32_t event_register;
     } v7m;
 
     /* Information associated with an exception about to be taken:
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 0befdb0b28..acc79188f2 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -508,6 +508,24 @@ static const VMStateDescription vmstate_m_mve = {
     },
 };
 
+static bool m_event_needed(void *opaque)
+{
+    ARMCPU *cpu = opaque;
+    /* Only save the state if the event register is set (non-zero) */
+    return cpu->env.v7m.event_register != 0;
+}
+
+static const VMStateDescription vmstate_m_event = {
+    .name = "cpu/m/event",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = m_event_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT32(env.v7m.event_register, ARMCPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_m = {
     .name = "cpu/m",
     .version_id = 4,
@@ -535,6 +553,7 @@ static const VMStateDescription vmstate_m = {
         &vmstate_m_v8m,
         &vmstate_m_fp,
         &vmstate_m_mve,
+        &vmstate_m_event,
         NULL
     }
 };
diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h
index 4636d1bc03..5a10a9fba3 100644
--- a/target/arm/tcg/helper.h
+++ b/target/arm/tcg/helper.h
@@ -60,6 +60,7 @@ DEF_HELPER_1(yield, void, env)
 DEF_HELPER_1(pre_hvc, void, env)
 DEF_HELPER_2(pre_smc, void, env, i32)
 DEF_HELPER_1(vesb, void, env)
+DEF_HELPER_1(sev, void, env)
 
 DEF_HELPER_3(cpsr_write, void, env, i32, i32)
 DEF_HELPER_2(cpsr_write_eret, void, env, i32)
diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
index 4fbd219555..ad79724778 100644
--- a/target/arm/tcg/op_helper.c
+++ b/target/arm/tcg/op_helper.c
@@ -469,16 +469,47 @@ void HELPER(wfit)(CPUARMState *env, uint64_t timeout)
 #endif
 }
 
+void HELPER(sev)(CPUARMState *env)
+{
+    CPUState *cs = env_cpu(env);
+    CPU_FOREACH(cs) {
+        ARMCPU *target_cpu = ARM_CPU(cs);
+        if (arm_feature(&target_cpu->env, ARM_FEATURE_M)) {
+            target_cpu->env.v7m.event_register = 1;
+        }
+        qemu_cpu_kick(cs);
+    }
+}
+
 void HELPER(wfe)(CPUARMState *env)
 {
-    /* This is a hint instruction that is semantically different
-     * from YIELD even though we currently implement it identically.
-     * Don't actually halt the CPU, just yield back to top
-     * level loop. This is not going into a "low power state"
-     * (ie halting until some event occurs), so we never take
-     * a configurable trap to a different exception level.
+    /*
+     * WFE (Wait For Event) is a hint instruction.
+     * For Cortex-M (M-profile), we implement the strict architectural behavior:
+     * 1. Check the Event Register (set by SEV or SEVONPEND).
+     * 2. If set, clear it and continue (consume the event).
      */
-    HELPER(yield)(env);
+    if (arm_feature(env, ARM_FEATURE_M)) {
+        CPUState *cs = env_cpu(env);
+
+        if (env->v7m.event_register) {
+            env->v7m.event_register = 0;
+            return;
+        }
+
+        cs->exception_index = EXCP_HLT;
+        cs->halted = 1;
+        cpu_loop_exit(cs);
+    } else {
+        /*
+         * For A-profile and others, we rely on the existing "yield" behavior.
+         * Don't actually halt the CPU, just yield back to top
+         * level loop. This is not going into a "low power state"
+         * (ie halting until some event occurs), so we never take
+         * a configurable trap to a different exception level
+         */
+        HELPER(yield)(env);
+    }
 }
 
 void HELPER(yield)(CPUARMState *env)
diff --git a/target/arm/tcg/t16.decode b/target/arm/tcg/t16.decode
index 646c74929d..ac6e24ac14 100644
--- a/target/arm/tcg/t16.decode
+++ b/target/arm/tcg/t16.decode
@@ -229,7 +229,7 @@ REVSH           1011 1010 11 ... ...            @rdm
     WFI         1011 1111 0011 0000
 
     # TODO: Implement SEV, SEVL; may help SMP performance.
-    # SEV       1011 1111 0100 0000
+    SEV         1011 1111 0100 0000
     # SEVL      1011 1111 0101 0000
 
     # The canonical nop has the second nibble as 0000, but the whole of the
diff --git a/target/arm/tcg/t32.decode b/target/arm/tcg/t32.decode
index d327178829..59b0edf63f 100644
--- a/target/arm/tcg/t32.decode
+++ b/target/arm/tcg/t32.decode
@@ -370,7 +370,7 @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
         WFI      1111 0011 1010 1111 1000 0000 0000 0011
 
         # TODO: Implement SEV, SEVL; may help SMP performance.
-        # SEV    1111 0011 1010 1111 1000 0000 0000 0100
+        SEV      1111 0011 1010 1111 1000 0000 0000 0100
         # SEVL   1111 0011 1010 1111 1000 0000 0000 0101
 
         ESB      1111 0011 1010 1111 1000 0000 0001 0000
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index 63735d9789..bfe2691884 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -3241,14 +3241,25 @@ static bool trans_YIELD(DisasContext *s, arg_YIELD *a)
     return true;
 }
 
+static bool trans_SEV(DisasContext *s, arg_SEV *a)
+{
+    gen_update_pc(s, curr_insn_len(s));
+    gen_helper_sev(tcg_env);
+    tcg_gen_exit_tb(NULL, 0);
+    s->base.is_jmp = DISAS_NORETURN;
+    return true;
+}
+
 static bool trans_WFE(DisasContext *s, arg_WFE *a)
 {
     /*
      * When running single-threaded TCG code, use the helper to ensure that
-     * the next round-robin scheduled vCPU gets a crack.  In MTTCG mode we
-     * just skip this instruction.  Currently the SEV/SEVL instructions,
-     * which are *one* of many ways to wake the CPU from WFE, are not
-     * implemented so we can't sleep like WFI does.
+     * the next round-robin scheduled vCPU gets a crack.
+     *
+     * For Cortex-M, we implement the architectural WFE behavior (sleeping
+     * until an event occurs or the Event Register is set).
+     * For other profiles, we currently treat this as a NOP or yield,
+     * to preserve existing performance characteristics.
      */
     if (!(tb_cflags(s->base.tb) & CF_PARALLEL)) {
         gen_update_pc(s, curr_insn_len(s));
@@ -6807,6 +6818,7 @@ static void arm_tr_tb_stop(DisasContextBase *dcbase, CPUState *cpu)
             break;
         case DISAS_WFE:
             gen_helper_wfe(tcg_env);
+            tcg_gen_exit_tb(NULL, 0);
             break;
         case DISAS_YIELD:
             gen_helper_yield(tcg_env);
-- 
2.43.0
Re: [PATCH] target/arm: Implement WFE, SEV and SEVONPEND for Cortex-M
Posted by Peter Maydell 3 weeks, 3 days ago
On Tue, 23 Dec 2025 at 12:02, Ashish Anand <ashish.a6@samsung.com> wrote:
>
> From: "ashish.a6" <ashish.a6@samsung.com>
>
>     Currently, QEMU implements the 'Wait For Event' (WFE) instruction as a
>     a simple yield. This causes high host CPU usage because Guest
>     RTOS idle loops effectively become busy-wait loops.
>
>     To improve efficiency, this patch transitions WFE to use the architectural
>     'Halt' state (EXCP_HLT) for M-profile CPUs. This allows the host thread
>     to sleep when the guest is idle.


> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 39f2b2e54d..44433a444c 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -639,6 +639,7 @@ typedef struct CPUArchState {
>          uint32_t nsacr;
>          uint32_t ltpsize;
>          uint32_t vpr;
> +        uint32_t event_register;
>      } v7m;

One more small thing that I thought of -- although we're only
implementing WFE support for M-profile here, the concept of the
event register is the same for A-profile. So we should put the
field in the top level of the CPU state struct, not inside the v7m
sub-struct. That way it's available for us to use and share
if and when we ever do it for A-profile.

> +static bool m_event_needed(void *opaque)
> +{
> +    ARMCPU *cpu = opaque;
> +    /* Only save the state if the event register is set (non-zero) */
> +    return cpu->env.v7m.event_register != 0;
> +}
> +
> +static const VMStateDescription vmstate_m_event = {
> +    .name = "cpu/m/event",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = m_event_needed,
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_UINT32(env.v7m.event_register, ARMCPU),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};

...and so similarly the migration handling should not have
anything M-profile specific to it.

thanks
-- PMM
Re: [PATCH] target/arm: Implement WFE, SEV and SEVONPEND for Cortex-M
Posted by Ashish Anand 3 weeks, 2 days ago
On 13/01/26 03:21PM, Peter Maydell wrote:
>On Tue, 23 Dec 2025 at 12:02, Ashish Anand <ashish.a6@samsung.com> wrote:
>>
>> From: "ashish.a6" <ashish.a6@samsung.com>
>>
>>     Currently, QEMU implements the 'Wait For Event' (WFE) instruction as a
>>     a simple yield. This causes high host CPU usage because Guest
>>     RTOS idle loops effectively become busy-wait loops.
>>
>>     To improve efficiency, this patch transitions WFE to use the architectural
>>     'Halt' state (EXCP_HLT) for M-profile CPUs. This allows the host thread
>>     to sleep when the guest is idle.
>
>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 39f2b2e54d..44433a444c 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -639,6 +639,7 @@ typedef struct CPUArchState {
>>          uint32_t nsacr;
>>          uint32_t ltpsize;
>>          uint32_t vpr;
>> +        uint32_t event_register;
>>      } v7m;
>
>One more small thing that I thought of -- although we're only
>implementing WFE support for M-profile here, the concept of the
>event register is the same for A-profile. So we should put the
>field in the top level of the CPU state struct, not inside the v7m
>sub-struct. That way it's available for us to use and share
>if and when we ever do it for A-profile.
>
>> +static bool m_event_needed(void *opaque)
>> +{
>> +    ARMCPU *cpu = opaque;
>> +    /* Only save the state if the event register is set (non-zero) */
>> +    return cpu->env.v7m.event_register != 0;
>> +}
>> +
>> +static const VMStateDescription vmstate_m_event = {
>> +    .name = "cpu/m/event",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = m_event_needed,
>> +    .fields = (const VMStateField[]) {
>> +        VMSTATE_UINT32(env.v7m.event_register, ARMCPU),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>
>...and so similarly the migration handling should not have
>anything M-profile specific to it.
>
>thanks
>-- PMM

Hi Peter,

Thanks for the detailed review. I acknowledge the comments and will incorporate the suggested changes and send a v2 shortly.


Regards,
Ashish

>
Re: [PATCH] target/arm: Implement WFE, SEV and SEVONPEND for Cortex-M
Posted by Peter Maydell 3 weeks, 3 days ago
On Tue, 23 Dec 2025 at 12:02, Ashish Anand <ashish.a6@samsung.com> wrote:
>
> From: "ashish.a6" <ashish.a6@samsung.com>
>
>     Currently, QEMU implements the 'Wait For Event' (WFE) instruction as a
>     a simple yield. This causes high host CPU usage because Guest
>     RTOS idle loops effectively become busy-wait loops.
>
>     To improve efficiency, this patch transitions WFE to use the architectural
>     'Halt' state (EXCP_HLT) for M-profile CPUs. This allows the host thread
>     to sleep when the guest is idle.

Thanks for this patch; it's been a missing piece in QEMU's emulation
for a long time. I think implementing it just for M-profile is fine.


>     To support this transition, we implement the full architectural behavior
>     required for WFE, specifically the 'Event Register' and 'SEVONPEND' logic
>     defined in the ARMv7-M specification. This ensures that the CPU correctly
>     identifies wakeup conditions as defined by the architecture.
>
>     Changes include:
>     1.  target/arm/cpu.h: Added 'event_register' to the v7m state struct.
>     2.  target/arm/cpu.c: Ensured event_register is cleared on reset and
>         updated arm_cpu_has_work() to check the event register state.
>     3.  target/arm/machine.c: Added VMState subsection for migration compatibility.
>     4.  hw/intc/armv7m_nvic.h: Added was_pending flag, to check for SEVONPEND logic.
>     5.  hw/intc/armv7m_nvic.c: Implemented SEVONPEND logic in :
>         nvic_recompute_state() and nvic_recompute_state_secure().
>         This sets the event register and kicks the CPU when a new interrupt
>         becomes pending.
>     6.  target/arm/tcg/helper.h: Declare the new helper_sev function.
>     7.  target/arm/tcg/op_helper.c:
>         - Update HELPER(wfe) to use EXCP_HLT for M-profile CPUs.
>         - Implement HELPER(sev) to set the event register and kick all vCPUs.
>     8.  target/arm/tcg/t16.decode / t32.decode: Enable decoding of the SEV
>         instruction in Thumb/Thumb-2 mode.
>     9.  target/arm/tcg/translate.c: Implement trans_SEV using inline
>         TCG generation to ensure system-wide visibility.

You don't really need to provide this kind of fine-grained "what
the changes are" info in the commit message -- it's more interesting
to talk about "what" at a higher level, and about "why".

>
>     This patch enables resource-efficient idle emulation for Cortex-M.
>
>     Signed-off-by: Ashish Anand ashish.a6@samsung.com

>  hw/intc/armv7m_nvic.c         | 22 ++++++++++++++++-
>  include/hw/intc/armv7m_nvic.h |  1 +
>  target/arm/cpu.c              |  7 ++++++
>  target/arm/cpu.h              |  1 +
>  target/arm/machine.c          | 19 +++++++++++++++
>  target/arm/tcg/helper.h       |  1 +
>  target/arm/tcg/op_helper.c    | 45 +++++++++++++++++++++++++++++------
>  target/arm/tcg/t16.decode     |  2 +-
>  target/arm/tcg/t32.decode     |  2 +-
>  target/arm/tcg/translate.c    | 20 ++++++++++++----
>  10 files changed, 106 insertions(+), 14 deletions(-)

It looks like there's a missing piece here: R_BPBR in the v8M
Arm ARM says that the event register is also set on exception
entry and exception return.

> diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
> index 7c78961040..a60990c71f 100644
> --- a/hw/intc/armv7m_nvic.c
> +++ b/hw/intc/armv7m_nvic.c
> @@ -16,6 +16,7 @@
>  #include "migration/vmstate.h"
>  #include "qemu/timer.h"
>  #include "hw/intc/armv7m_nvic.h"
> +#include "hw/arm/armv7m.h"

This shouldn't be needed.

>  #include "hw/irq.h"
>  #include "hw/qdev-properties.h"
>  #include "system/tcg.h"
> @@ -232,6 +233,8 @@ static void nvic_recompute_state_secure(NVICState *s)
>      int pend_irq = 0;
>      bool pending_is_s_banked = false;
>      int pend_subprio = 0;
> +    ARMv7MState *armv7m = container_of(s, ARMv7MState, nvic);
> +    ARMCPU *cpu = armv7m->cpu;

You don't need this roundabout way to get at the CPU object:
we already have a direct pointer to it in s->cpu (which is
how the rest of the code in this file gets to it).

>
>      /* R_CQRV: precedence is by:
>       *  - lowest group priority; if both the same then
> @@ -259,6 +262,14 @@ static void nvic_recompute_state_secure(NVICState *s)
>                  targets_secure = !exc_is_banked(i) && exc_targets_secure(s, i);
>              }
>
> +            if (!vec->was_pending && vec->pending) {
> +                if (cpu->env.v7m.scr[bank] & R_V7M_SCR_SEVONPEND_MASK) {
> +                    cpu->env.v7m.event_register = 1;
> +                    qemu_cpu_kick(CPU(cpu));
> +                }
> +            }
> +            vec->was_pending = vec->pending;

I don't think this is the right way to implement the "set event
register when an exception transitions from inactive to pending"
requirement. We should do this when we write vec->pending from
0 to 1. This means we'll need to abstract out the "set vec->pending"
action into a function, because currently we do it in multiple places
in the source code as a direct write. When we do that we will
have a single place we can do the "if this is 0->1 then set
the event register" change.

Also, the spec says it is interrupts going to pending that are
WFE wakeup events, not all exceptions, so I think we only want
to do this for anything NVIC_FIRST_IRQ and above.

> +
>              prio = exc_group_prio(s, vec->prio, targets_secure);
>              subprio = vec->prio & ~nvic_gprio_mask(s, targets_secure);
>              if (vec->enabled && vec->pending &&
> @@ -293,6 +304,8 @@ static void nvic_recompute_state(NVICState *s)
>      int pend_prio = NVIC_NOEXC_PRIO;
>      int active_prio = NVIC_NOEXC_PRIO;
>      int pend_irq = 0;
> +    ARMv7MState *armv7m = container_of(s, ARMv7MState, nvic);
> +    ARMCPU *cpu = armv7m->cpu;
>
>      /* In theory we could write one function that handled both
>       * the "security extension present" and "not present"; however
> @@ -316,6 +329,13 @@ static void nvic_recompute_state(NVICState *s)
>          if (vec->active && vec->prio < active_prio) {
>              active_prio = vec->prio;
>          }
> +        if (!vec->was_pending && vec->pending) {
> +            if (cpu->env.v7m.scr[M_REG_NS] & R_V7M_SCR_SEVONPEND_MASK) {
> +                cpu->env.v7m.event_register = 1;
> +                qemu_cpu_kick(CPU(cpu));
> +            }
> +        }
> +        vec->was_pending = vec->pending;
>      }
>
>      if (active_prio > 0) {
> @@ -1657,7 +1677,7 @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
>          }
>          /* We don't implement deep-sleep so these bits are RAZ/WI.
>           * The other bits in the register are banked.
> -         * QEMU's implementation ignores SEVONPEND and SLEEPONEXIT, which
> +         * QEMU's implementation ignores SLEEPONEXIT, which
>           * is architecturally permitted.
>           */
>          value &= ~(R_V7M_SCR_SLEEPDEEP_MASK | R_V7M_SCR_SLEEPDEEPS_MASK);
> diff --git a/include/hw/intc/armv7m_nvic.h b/include/hw/intc/armv7m_nvic.h
> index 7b9964fe7e..305eaf6e6a 100644
> --- a/include/hw/intc/armv7m_nvic.h
> +++ b/include/hw/intc/armv7m_nvic.h
> @@ -32,6 +32,7 @@ typedef struct VecInfo {
>      uint8_t pending;
>      uint8_t active;
>      uint8_t level; /* exceptions <=15 never set level */
> +    bool was_pending;
>  } VecInfo;
>
>  struct NVICState {
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 39292fb9bc..0a044f7254 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -143,6 +143,12 @@ static bool arm_cpu_has_work(CPUState *cs)
>  {
>      ARMCPU *cpu = ARM_CPU(cs);
>
> +    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
> +        if (cpu->env.v7m.event_register) {
> +            return true;
> +        }
> +    }
> +
>      return (cpu->power_state != PSCI_OFF)
>          && cpu_test_interrupt(cs,
>                 CPU_INTERRUPT_FIQ | CPU_INTERRUPT_HARD
> @@ -480,6 +486,7 @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
>              ~(R_V7M_FPCCR_LSPEN_MASK | R_V7M_FPCCR_S_MASK);
>          env->v7m.control[M_REG_S] |= R_V7M_CONTROL_FPCA_MASK;
>  #endif
> +        env->v7m.event_register = 0;

CPU reset resets most state fields to 0 by default, so we
don't need to do this explicitly here.

>      }
>
>      /* M profile requires that reset clears the exclusive monitor;
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 39f2b2e54d..44433a444c 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -639,6 +639,7 @@ typedef struct CPUArchState {
>          uint32_t nsacr;
>          uint32_t ltpsize;
>          uint32_t vpr;
> +        uint32_t event_register;
>      } v7m;
>
>      /* Information associated with an exception about to be taken:
> diff --git a/target/arm/machine.c b/target/arm/machine.c
> index 0befdb0b28..acc79188f2 100644
> --- a/target/arm/machine.c
> +++ b/target/arm/machine.c

All the migration handling looks good; thanks for remembering
to include it.

> diff --git a/target/arm/tcg/helper.h b/target/arm/tcg/helper.h
> index 4636d1bc03..5a10a9fba3 100644
> --- a/target/arm/tcg/helper.h
> +++ b/target/arm/tcg/helper.h
> @@ -60,6 +60,7 @@ DEF_HELPER_1(yield, void, env)
>  DEF_HELPER_1(pre_hvc, void, env)
>  DEF_HELPER_2(pre_smc, void, env, i32)
>  DEF_HELPER_1(vesb, void, env)
> +DEF_HELPER_1(sev, void, env)
>
>  DEF_HELPER_3(cpsr_write, void, env, i32, i32)
>  DEF_HELPER_2(cpsr_write_eret, void, env, i32)
> diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
> index 4fbd219555..ad79724778 100644
> --- a/target/arm/tcg/op_helper.c
> +++ b/target/arm/tcg/op_helper.c
> @@ -469,16 +469,47 @@ void HELPER(wfit)(CPUARMState *env, uint64_t timeout)
>  #endif
>  }
>
> +void HELPER(sev)(CPUARMState *env)
> +{
> +    CPUState *cs = env_cpu(env);
> +    CPU_FOREACH(cs) {
> +        ARMCPU *target_cpu = ARM_CPU(cs);
> +        if (arm_feature(&target_cpu->env, ARM_FEATURE_M)) {
> +            target_cpu->env.v7m.event_register = 1;
> +        }
> +        qemu_cpu_kick(cs);

I don't think we need to kick the CPU if it is ourselves
(by definition we can't be waiting in WFE if we're executing
SEV), so we could write

        if (!qemu_cpu_is_self(cs)) {
            qemu_cpu_kick(cs);
        }

> +    }
> +}
> +
>  void HELPER(wfe)(CPUARMState *env)
>  {
> -    /* This is a hint instruction that is semantically different
> -     * from YIELD even though we currently implement it identically.
> -     * Don't actually halt the CPU, just yield back to top
> -     * level loop. This is not going into a "low power state"
> -     * (ie halting until some event occurs), so we never take
> -     * a configurable trap to a different exception level.
> +    /*
> +     * WFE (Wait For Event) is a hint instruction.
> +     * For Cortex-M (M-profile), we implement the strict architectural behavior:
> +     * 1. Check the Event Register (set by SEV or SEVONPEND).
> +     * 2. If set, clear it and continue (consume the event).
>       */

You should add a CONFIG_USER_ONLY version similar to what
we have in the wfi helper that makes WFE a nop there. Otherwise
trying to do WFE in the user-mode emulator will abort when
we raise EXCP_HLT.

> -    HELPER(yield)(env);
> +    if (arm_feature(env, ARM_FEATURE_M)) {
> +        CPUState *cs = env_cpu(env);
> +
> +        if (env->v7m.event_register) {
> +            env->v7m.event_register = 0;
> +            return;
> +        }
> +
> +        cs->exception_index = EXCP_HLT;
> +        cs->halted = 1;
> +        cpu_loop_exit(cs);
> +    } else {
> +        /*
> +         * For A-profile and others, we rely on the existing "yield" behavior.
> +         * Don't actually halt the CPU, just yield back to top
> +         * level loop. This is not going into a "low power state"
> +         * (ie halting until some event occurs), so we never take
> +         * a configurable trap to a different exception level
> +         */
> +        HELPER(yield)(env);
> +    }
>  }
>
>  void HELPER(yield)(CPUARMState *env)
> diff --git a/target/arm/tcg/t16.decode b/target/arm/tcg/t16.decode
> index 646c74929d..ac6e24ac14 100644
> --- a/target/arm/tcg/t16.decode
> +++ b/target/arm/tcg/t16.decode
> @@ -229,7 +229,7 @@ REVSH           1011 1010 11 ... ...            @rdm
>      WFI         1011 1111 0011 0000
>
>      # TODO: Implement SEV, SEVL; may help SMP performance.

We should update the TODO comment to note that we only
implement SEV for M-profile, and that A-profile SEV, SEVL
are still a TODO item. (Ditto the comment in t32.decode.)

> -    # SEV       1011 1111 0100 0000
> +    SEV         1011 1111 0100 0000
>      # SEVL      1011 1111 0101 0000
>
>      # The canonical nop has the second nibble as 0000, but the whole of the
> diff --git a/target/arm/tcg/t32.decode b/target/arm/tcg/t32.decode
> index d327178829..59b0edf63f 100644
> --- a/target/arm/tcg/t32.decode
> +++ b/target/arm/tcg/t32.decode
> @@ -370,7 +370,7 @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
>          WFI      1111 0011 1010 1111 1000 0000 0000 0011
>
>          # TODO: Implement SEV, SEVL; may help SMP performance.
> -        # SEV    1111 0011 1010 1111 1000 0000 0000 0100
> +        SEV      1111 0011 1010 1111 1000 0000 0000 0100
>          # SEVL   1111 0011 1010 1111 1000 0000 0000 0101
>
>          ESB      1111 0011 1010 1111 1000 0000 0001 0000
> diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
> index 63735d9789..bfe2691884 100644
> --- a/target/arm/tcg/translate.c
> +++ b/target/arm/tcg/translate.c
> @@ -3241,14 +3241,25 @@ static bool trans_YIELD(DisasContext *s, arg_YIELD *a)
>      return true;
>  }
>
> +static bool trans_SEV(DisasContext *s, arg_SEV *a)
> +{

This should translate to a NOP if CONFIG_USER_ONLY, or
if we're not M-profile.

> +    gen_update_pc(s, curr_insn_len(s));
> +    gen_helper_sev(tcg_env);
> +    tcg_gen_exit_tb(NULL, 0);
> +    s->base.is_jmp = DISAS_NORETURN;

Why do we need to do this tcg_gen_exit_tb() and set DISAS_NORETURN ?

Do we even need to end the current TB on a SEV instruction ?

> +    return true;

> +}
> +
>  static bool trans_WFE(DisasContext *s, arg_WFE *a)
>  {
>      /*
>       * When running single-threaded TCG code, use the helper to ensure that
> -     * the next round-robin scheduled vCPU gets a crack.  In MTTCG mode we
> -     * just skip this instruction.  Currently the SEV/SEVL instructions,
> -     * which are *one* of many ways to wake the CPU from WFE, are not
> -     * implemented so we can't sleep like WFI does.
> +     * the next round-robin scheduled vCPU gets a crack.
> +     *
> +     * For Cortex-M, we implement the architectural WFE behavior (sleeping
> +     * until an event occurs or the Event Register is set).
> +     * For other profiles, we currently treat this as a NOP or yield,
> +     * to preserve existing performance characteristics.
>       */
>      if (!(tb_cflags(s->base.tb) & CF_PARALLEL)) {
>          gen_update_pc(s, curr_insn_len(s));
> @@ -6807,6 +6818,7 @@ static void arm_tr_tb_stop(DisasContextBase *dcbase, CPUState *cpu)
>              break;
>          case DISAS_WFE:
>              gen_helper_wfe(tcg_env);
> +            tcg_gen_exit_tb(NULL, 0);

Why is this necessary ?

>              break;
>          case DISAS_YIELD:
>              gen_helper_yield(tcg_env);
> --
> 2.43.0

thanks
-- PMM
Re: [PATCH] target/arm: Implement WFE, SEV and SEVONPEND for Cortex-M
Posted by Ashish Anand 3 weeks, 2 days ago
>It looks like there's a missing piece here: R_BPBR in the v8M
>Arm ARM says that the event register is also set on exception
>entry and exception return.
>

You're right that I'm missing this. Looking at DDI0403E_B & DDI0419E (ARMv7-M & ARMv6-M) 
Section B1.5.18:
- "An asynchronous exception at a priority that preempts any currently 
   active exception" is a WFE wakeup event
- "Any WFE wakeup event, or the execution of an exception return instruction, sets the Event Register"  

So does this imply that even in ARMv7-M/ARMv6-M (not just ARMv8-M), the event register should 
be set on:
1. Exception entry (when exception is taken/preempts)
2. Exception return

>Why do we need to do this tcg_gen_exit_tb() and set DISAS_NORETURN ?
>
>Do we even need to end the current TB on a SEV instruction ?
>

You're absolutely right - SEV doesn't need to end the TB. It just sets 
the event register. I'll remove the tcg_gen_exit_tb() and DISAS_NORETURN.


>>          case DISAS_WFE:
>>              gen_helper_wfe(tcg_env);
>> +            tcg_gen_exit_tb(NULL, 0);
>
>Why is this necessary ?
>

I think this is necessary because helper_wfe() can conditionally return (when 
event register is set), following the same pattern as WFI.

WFI helper can return if cpu_has_work(), so it needs tcg_gen_exit_tb() 
for that return path. WFE helper can return if event_register is set, 
so it also needs tcg_gen_exit_tb() for that return path


Thanks,
Ashish


Re: [PATCH] target/arm: Implement WFE, SEV and SEVONPEND for Cortex-M
Posted by Peter Maydell 3 weeks, 1 day ago
On Wed, 14 Jan 2026 at 14:20, Ashish Anand <ashish.a6@samsung.com> wrote:
>
> >It looks like there's a missing piece here: R_BPBR in the v8M
> >Arm ARM says that the event register is also set on exception
> >entry and exception return.
> >
>
> You're right that I'm missing this. Looking at DDI0403E_B & DDI0419E (ARMv7-M & ARMv6-M)
> Section B1.5.18:
> - "An asynchronous exception at a priority that preempts any currently
>    active exception" is a WFE wakeup event
> - "Any WFE wakeup event, or the execution of an exception return instruction, sets the Event Register"
>
> So does this imply that even in ARMv7-M/ARMv6-M (not just ARMv8-M), the event register should
> be set on:
> 1. Exception entry (when exception is taken/preempts)
> 2. Exception return

I hadn't looked at the v7M documentation. Looking at the v7M Arm ARM,
the text doesn't mention the exception-entry case, but the pseudocode
ExceptionTaken() function unconditionally calls SetEventRegister().
So I think this is likely a case where the manual text was wrong and
it got corrected for v8M, rather than being a divergence in behaviour.
For QEMU we should set the event register in both these cases
regardless of M-profile version.

v7M's text is also a bit confused about whether the SEVONPEND
bit affects only interrupts, or all exceptions. v8M seems to
consistently say "interrupts". I guess we go with "assume this
was a mistake in the v7M text". If you happen to have convenient
access to real hardware we could test the actual behaviour, but
if not I would go with "only interrupts" for both v7M and v8M.

> >>          case DISAS_WFE:
> >>              gen_helper_wfe(tcg_env);
> >> +            tcg_gen_exit_tb(NULL, 0);
> >
> >Why is this necessary ?
> >
>
> I think this is necessary because helper_wfe() can conditionally return (when
> event register is set), following the same pattern as WFI.
>
> WFI helper can return if cpu_has_work(), so it needs tcg_gen_exit_tb()
> for that return path. WFE helper can return if event_register is set,
> so it also needs tcg_gen_exit_tb() for that return path

Ah, I see -- currently the wfe helper calls the yield helper,
and that always calls cpu_loop_exit(), so the helper never
returns. (I'd assumed that because WFE is conceptually a NOP
for us at the moment that the helper would return.)

We should add a comment similar to the one we have for the
DISAS_WFI case:
    /* Go back to the main loop to check for interrupts */

thanks
-- PMM
Re: [PATCH] target/arm: Implement WFE, SEV and SEVONPEND for Cortex-M
Posted by Ashish Anand 2 weeks, 4 days ago
>I hadn't looked at the v7M documentation. Looking at the v7M Arm ARM,
>the text doesn't mention the exception-entry case, but the pseudocode
>ExceptionTaken() function unconditionally calls SetEventRegister().
>So I think this is likely a case where the manual text was wrong and
>it got corrected for v8M, rather than being a divergence in behaviour.
>For QEMU we should set the event register in both these cases
>regardless of M-profile version.
>

Hi Peter,
Thanks for clarifying this I will take this into account for v2. 

>v7M's text is also a bit confused about whether the SEVONPEND
>bit affects only interrupts, or all exceptions. v8M seems to
>consistently say "interrupts". I guess we go with "assume this
>was a mistake in the v7M text". If you happen to have convenient
>access to real hardware we could test the actual behaviour, but
>if not I would go with "only interrupts" for both v7M and v8M.

Currently I don't have access to hardware, so I will take "only interrupts" into consideration.
In future if I do get access to real hardware I will test this separately.

>Ah, I see -- currently the wfe helper calls the yield helper,
>and that always calls cpu_loop_exit(), so the helper never
>returns. (I'd assumed that because WFE is conceptually a NOP
>for us at the moment that the helper would return.)
>
>We should add a comment similar to the one we have for the
>DISAS_WFI case:
>    /* Go back to the main loop to check for interrupts */
>

Sure will take care of this in v2.


Thanks,
Ashish