[PATCH v3 5/8] riscv: smp: use NMI for CPU stop

Yunhui Cui posted 8 patches 2 months, 2 weeks ago
[PATCH v3 5/8] riscv: smp: use NMI for CPU stop
Posted by Yunhui Cui 2 months, 2 weeks ago
Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 arch/riscv/include/asm/smp.h           |  2 ++
 arch/riscv/kernel/smp.c                | 10 +++++++---
 drivers/firmware/riscv/riscv_sse_nmi.c |  1 +
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index f53f1f0e7aa9e..e01ea962adfc4 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -63,6 +63,8 @@ static inline void cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
 }
 #endif
 
+void cpu_stop(void);
+
 /* Secondary hart entry */
 asmlinkage void smp_callin(void);
 
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 1b8cf986abbd0..bca95ec0b0f74 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -69,7 +69,7 @@ int riscv_hartid_to_cpuid(unsigned long hartid)
 	return -ENOENT;
 }
 
-static void ipi_stop(void)
+void cpu_stop(void)
 {
 	set_cpu_online(smp_processor_id(), false);
 	while (1)
@@ -127,7 +127,7 @@ static irqreturn_t handle_IPI(int irq, void *data)
 		generic_smp_call_function_interrupt();
 		break;
 	case IPI_CPU_STOP:
-		ipi_stop();
+		cpu_stop();
 		break;
 	case IPI_CPU_CRASH_STOP:
 		cpu_crash_stop(cpu, get_irq_regs());
@@ -259,7 +259,11 @@ void smp_send_stop(void)
 
 		if (system_state <= SYSTEM_RUNNING)
 			pr_crit("SMP: stopping secondary CPUs\n");
-		send_ipi_mask(&mask, IPI_CPU_STOP);
+
+		if (!nmi_support())
+			send_ipi_mask(&mask, IPI_CPU_STOP);
+		else
+			send_nmi_mask(&mask, LOCAL_NMI_CRASH);
 	}
 
 	/* Wait up to one second for other CPUs to stop */
diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
index add028efd25a0..02e2de2bb70f7 100644
--- a/drivers/firmware/riscv/riscv_sse_nmi.c
+++ b/drivers/firmware/riscv/riscv_sse_nmi.c
@@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
 	type = atomic_read(this_cpu_ptr(&local_nmi));
 
 	NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
+	NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
 
 	atomic_andnot(type, this_cpu_ptr(&local_nmi));
 
-- 
2.39.5
Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
Posted by Radim Krčmář 2 months, 1 week ago
2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
>
> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> ---
> diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
> @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
>  	type = atomic_read(this_cpu_ptr(&local_nmi));
>  
>  	NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
> +	NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);

Please document the intended preemption design for all SSE events,
because it will be a nightmare if we forget some assumptions in the
coming years.  (That includes the relative priorities of RAS/PMU/...)

Thanks.
Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
Posted by yunhui cui 2 months, 1 week ago
Hi Radim,

On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
> >
> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> > ---
> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
> >       type = atomic_read(this_cpu_ptr(&local_nmi));
> >
> >       NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
> > +     NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
>
> Please document the intended preemption design for all SSE events,
> because it will be a nightmare if we forget some assumptions in the
> coming years.  (That includes the relative priorities of RAS/PMU/...)

Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.

>
> Thanks.

Thanks,
Yunhui
Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
Posted by Radim Krčmář 2 months, 1 week ago
2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> Hi Radim,
>
> On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>>
>> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
>> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
>> >
>> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
>> > ---
>> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
>> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
>> >       type = atomic_read(this_cpu_ptr(&local_nmi));
>> >
>> >       NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
>> > +     NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
>>
>> Please document the intended preemption design for all SSE events,
>> because it will be a nightmare if we forget some assumptions in the
>> coming years.  (That includes the relative priorities of RAS/PMU/...)
>
> Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
> LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
> SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
> preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.

That is how it is.  I don't understand why it must be like that.

For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
it will currently interrupt NMI_CRASH as they both have priority 0,
although NMI_CRASH probably shouldn't be masked by anything, and should
preempt everything.
NMI_BACKTRACE, on the other hand, probably shouldn't have that high
priority as there seem more important events (e.g. RAS and NMI_CRASH).

The issues can be avoided by event priorities, masking, or deemed as
non-issue, but I think it would be beneficial to provide some reasoning
behind the design, as the choices don't seem obvious to me.

Thanks.
Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
Posted by yunhui cui 2 months ago
Hi Radim,

On Thu, Dec 4, 2025 at 9:16 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> > Hi Radim,
> >
> > On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
> >>
> >> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> >> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
> >> >
> >> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> >> > ---
> >> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
> >> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
> >> >       type = atomic_read(this_cpu_ptr(&local_nmi));
> >> >
> >> >       NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
> >> > +     NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
> >>
> >> Please document the intended preemption design for all SSE events,
> >> because it will be a nightmare if we forget some assumptions in the
> >> coming years.  (That includes the relative priorities of RAS/PMU/...)
> >
> > Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
> > LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
> > SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
> > preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
>
> That is how it is.  I don't understand why it must be like that.
>
> For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
> it will currently interrupt NMI_CRASH as they both have priority 0,
> although NMI_CRASH probably shouldn't be masked by anything, and should
> preempt everything.
> NMI_BACKTRACE, on the other hand, probably shouldn't have that high
> priority as there seem more important events (e.g. RAS and NMI_CRASH).
>
> The issues can be avoided by event priorities, masking, or deemed as
> non-issue, but I think it would be beneficial to provide some reasoning
> behind the design, as the choices don't seem obvious to me.

Indeed, it is necessary to consider the priority among different
events. Should different priorities also be assigned to NMI_CRASH,
NMI_BACKTRACE, NMI_STOP, and NMI_KGDB? Do these operations need to be
visible to the BIOS? Could you kindly provide some good suggestions?

>
> Thanks.

Thanks,
Yunhui
Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
Posted by Radim Krčmář 2 months ago
2025-12-08T19:40:39+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> Hi Radim,
>
> On Thu, Dec 4, 2025 at 9:16 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>>
>> 2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui@bytedance.com>:
>> > Hi Radim,
>> >
>> > On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>> >>
>> >> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
>> >> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
>> >> >
>> >> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
>> >> > ---
>> >> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
>> >> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
>> >> >       type = atomic_read(this_cpu_ptr(&local_nmi));
>> >> >
>> >> >       NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
>> >> > +     NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
>> >>
>> >> Please document the intended preemption design for all SSE events,
>> >> because it will be a nightmare if we forget some assumptions in the
>> >> coming years.  (That includes the relative priorities of RAS/PMU/...)
>> >
>> > Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
>> > LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
>> > SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
>> > preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
>>
>> That is how it is.  I don't understand why it must be like that.
>>
>> For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
>> it will currently interrupt NMI_CRASH as they both have priority 0,
>> although NMI_CRASH probably shouldn't be masked by anything, and should
>> preempt everything.
>> NMI_BACKTRACE, on the other hand, probably shouldn't have that high
>> priority as there seem more important events (e.g. RAS and NMI_CRASH).
>>
>> The issues can be avoided by event priorities, masking, or deemed as
>> non-issue, but I think it would be beneficial to provide some reasoning
>> behind the design, as the choices don't seem obvious to me.
>
> Indeed, it is necessary to consider the priority among different
> events. Should different priorities also be assigned to NMI_CRASH,
> NMI_BACKTRACE, NMI_STOP, and NMI_KGDB?

I think it would be beneficial to document the desired behavior even if
we can't (currently?) implement it, because like you said, SSE can't
directly express the priority when multiplexing SOFTWARE_INJECTED.

>                                        Do these operations need to be
> visible to the BIOS?

BIOS shouldn't care what lower privilege wants to do.
SBI could define more events for software use, though.

>                      Could you kindly provide some good suggestions?

I think it would be good practice to explicitly set a unique priority
when registering SSE events.  Maybe through a global priority enum, and
make sure that all event registrations are passing a value from that
enum.

That would make sure that different events interact like we expect them
to, but it doesn't solve the multiplexing issue of SOFTWARE_INJECTED.

If we're fine with all SOFTWARE_INJECTED sub-handlers having the maximal
priority (higher than RAS/PMU/UNKNOWN_NMI/...), then we could hope that
lower imporance handlers (e.g. BACKTRACE) won't hang, so the higher
importance handlers (e.g. CRASH) would eventually run.
We're dealing with low-occurrence scenarios, so this might be "good
enough for now"...

Situation would get simpler if we could avoid some sub-handlers;
alternatively, it would get more complicated if SOFTWARE_INJECTED had
lower priority than some other event -- we'd make CRASH partially
recover its high priority image by masking other SSE events during its
execution (and we'd need warding amulets against hangs and starvation).

Thanks.
Re: [External] Re: [PATCH v3 5/8] riscv: smp: use NMI for CPU stop
Posted by yunhui cui 1 month, 4 weeks ago
Hi Radim,

On Wed, Dec 10, 2025 at 10:22 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
>
> 2025-12-08T19:40:39+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> > Hi Radim,
> >
> > On Thu, Dec 4, 2025 at 9:16 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
> >>
> >> 2025-12-04T13:28:45+08:00, yunhui cui <cuiyunhui@bytedance.com>:
> >> > Hi Radim,
> >> >
> >> > On Thu, Dec 4, 2025 at 12:07 PM Radim Krčmář <rkrcmar@ventanamicro.com> wrote:
> >> >>
> >> >> 2025-11-27T20:53:02+08:00, Yunhui Cui <cuiyunhui@bytedance.com>:
> >> >> > Use NMI instead of IPI for CPU stop if RISC-V SSE NMI is supported.
> >> >> >
> >> >> > Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> >> >> > ---
> >> >> > diff --git a/drivers/firmware/riscv/riscv_sse_nmi.c b/drivers/firmware/riscv/riscv_sse_nmi.c
> >> >> > @@ -58,6 +58,7 @@ static int local_nmi_handler(u32 evt, void *arg, struct pt_regs *regs)
> >> >> >       type = atomic_read(this_cpu_ptr(&local_nmi));
> >> >> >
> >> >> >       NMI_HANDLE(LOCAL_NMI_CRASH, cpu_crash_stop, cpu, regs);
> >> >> > +     NMI_HANDLE(LOCAL_NMI_STOP, cpu_stop);
> >> >>
> >> >> Please document the intended preemption design for all SSE events,
> >> >> because it will be a nightmare if we forget some assumptions in the
> >> >> coming years.  (That includes the relative priorities of RAS/PMU/...)
> >> >
> >> > Actually, LOCAL_NMI_CRASH, LOCAL_NMI_STOP, LOCAL_NMI_BACKTRACE,
> >> > LOCAL_NMI_KGDB, ... are all implemented via the single SSE event
> >> > SBI_SSE_EVENT_LOCAL_SOFTWARE_INJECTED. Per the SSE design, no
> >> > preemption will occur among CRASH, STOP, BACKTRACE, and KGDB events.
> >>
> >> That is how it is.  I don't understand why it must be like that.
> >>
> >> For example: PMU_OVERFLOW has lower event_id than SOFTWARE_INJECTED, so
> >> it will currently interrupt NMI_CRASH as they both have priority 0,
> >> although NMI_CRASH probably shouldn't be masked by anything, and should
> >> preempt everything.
> >> NMI_BACKTRACE, on the other hand, probably shouldn't have that high
> >> priority as there seem more important events (e.g. RAS and NMI_CRASH).
> >>
> >> The issues can be avoided by event priorities, masking, or deemed as
> >> non-issue, but I think it would be beneficial to provide some reasoning
> >> behind the design, as the choices don't seem obvious to me.
> >
> > Indeed, it is necessary to consider the priority among different
> > events. Should different priorities also be assigned to NMI_CRASH,
> > NMI_BACKTRACE, NMI_STOP, and NMI_KGDB?
>
> I think it would be beneficial to document the desired behavior even if
> we can't (currently?) implement it, because like you said, SSE can't
> directly express the priority when multiplexing SOFTWARE_INJECTED.
>
> >                                        Do these operations need to be
> > visible to the BIOS?
>
> BIOS shouldn't care what lower privilege wants to do.
> SBI could define more events for software use, though.
>
> >                      Could you kindly provide some good suggestions?
>
> I think it would be good practice to explicitly set a unique priority
> when registering SSE events.  Maybe through a global priority enum, and
> make sure that all event registrations are passing a value from that
> enum.

Yes, each distinct event should have a preset priority. I noticed that
SBI_SSE_EVENT_LOCAL_PMU_OVERFLOW in Clément's patchset
(https://lore.kernel.org/all/20251105082639.342973-5-cleger@rivosinc.com/)
also does not explicitly set a priority.
As you mentioned, a "global priority enum" is required. I will also
respond to Clément's email.

>
> That would make sure that different events interact like we expect them
> to, but it doesn't solve the multiplexing issue of SOFTWARE_INJECTED.
>
> If we're fine with all SOFTWARE_INJECTED sub-handlers having the maximal
> priority (higher than RAS/PMU/UNKNOWN_NMI/...), then we could hope that
> lower imporance handlers (e.g. BACKTRACE) won't hang, so the higher
> importance handlers (e.g. CRASH) would eventually run.
> We're dealing with low-occurrence scenarios, so this might be "good
> enough for now"...

Currently, x86 also handles multiple local NMI events, including
BACKTRACE, via a single NMI vector. We might as well set the priority
of SOFTWARE_INJECTED to the highest.

>
> Situation would get simpler if we could avoid some sub-handlers;
> alternatively, it would get more complicated if SOFTWARE_INJECTED had
> lower priority than some other event -- we'd make CRASH partially
> recover its high priority image by masking other SSE events during its
> execution (and we'd need warding amulets against hangs and starvation).
>
> Thanks.

Thanks,
Yunhui