[PATCH v7 08/10] x86/nmi: Enable NMI-source for IPIs delivered as NMIs

Posted by Sohil Mehta 8 months ago
With the IPI handling APIs ready to support the new NMI encoding, encode
the NMI delivery mode directly with the NMI-source vectors to trigger
NMIs.

Move most of the existing NMI-based IPIs to use the new NMI-source
vectors, except for the microcode rendezvous NMI and the crash reboot
NMI. NMI handling for them is special-cased in exc_nmi() and does not
need NMI-source reporting.

However, in the future, it might be useful to assign a source vector to
all NMI sources to improve isolation and debuggability.

Originally-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v7: No change.

v6: Include asm/nmi.h to avoid compile errors. (LKP)

v5: Encode APIC_DM_NMI directly with the NMI-source vector.
---
 arch/x86/include/asm/apic.h      | 8 ++++++++
 arch/x86/kernel/apic/hw_nmi.c    | 2 +-
 arch/x86/kernel/cpu/mce/inject.c | 2 +-
 arch/x86/kernel/kgdb.c           | 2 +-
 arch/x86/kernel/nmi_selftest.c   | 2 +-
 arch/x86/kernel/smp.c            | 2 +-
 6 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 32cdd81e5e45..5789df1708bd 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -14,6 +14,7 @@
 #include <asm/msr.h>
 #include <asm/hardirq.h>
 #include <asm/io.h>
+#include <asm/nmi.h>
 #include <asm/posted_intr.h>
 
 #define ARCH_APICTIMER_STOPS_ON_C3	1
@@ -23,6 +24,13 @@
 #define APIC_EXTNMI_ALL		1
 #define APIC_EXTNMI_NONE	2
 
+/* Trigger NMIs with source information */
+#define TEST_NMI		(APIC_DM_NMI | NMIS_VECTOR_TEST)
+#define SMP_STOP_NMI		(APIC_DM_NMI | NMIS_VECTOR_SMP_STOP)
+#define BT_NMI			(APIC_DM_NMI | NMIS_VECTOR_BT)
+#define KGDB_NMI		(APIC_DM_NMI | NMIS_VECTOR_KGDB)
+#define MCE_NMI			(APIC_DM_NMI | NMIS_VECTOR_MCE)
+
 /*
  * Debugging macros
  */
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 4e04f13d2de9..586f4b25feae 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -33,7 +33,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
 #ifdef arch_trigger_cpumask_backtrace
 static void nmi_raise_cpu_backtrace(cpumask_t *mask)
 {
-	__apic_send_IPI_mask(mask, NMI_VECTOR);
+	__apic_send_IPI_mask(mask, BT_NMI);
 }
 
 void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 320068e01c22..81a04836ac74 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -270,7 +270,7 @@ static void __maybe_unused raise_mce(struct mce *m)
 					mce_irq_ipi, NULL, 0);
 				preempt_enable();
 			} else if (m->inject_flags & MCJ_NMI_BROADCAST)
-				__apic_send_IPI_mask(mce_inject_cpumask, NMI_VECTOR);
+				__apic_send_IPI_mask(mce_inject_cpumask, MCE_NMI);
 		}
 		start = jiffies;
 		while (!cpumask_empty(mce_inject_cpumask)) {
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index dfde434d2992..013feeb6a9ef 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -416,7 +416,7 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
  */
 void kgdb_roundup_cpus(void)
 {
-	apic_send_IPI_allbutself(NMI_VECTOR);
+	apic_send_IPI_allbutself(KGDB_NMI);
 }
 #endif
 
diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
index f3918888e494..d5578370b47f 100644
--- a/arch/x86/kernel/nmi_selftest.c
+++ b/arch/x86/kernel/nmi_selftest.c
@@ -72,7 +72,7 @@ static void __init test_nmi_ipi(struct cpumask *mask)
 	/* sync above data before sending NMI */
 	wmb();
 
-	__apic_send_IPI_mask(mask, NMI_VECTOR);
+	__apic_send_IPI_mask(mask, TEST_NMI);
 
 	/* Don't wait longer than a second */
 	timeout = USEC_PER_SEC;
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 1ed065b78467..239e9bf97abb 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -217,7 +217,7 @@ static void native_stop_other_cpus(int wait)
 			pr_emerg("Shutting down cpus with NMI\n");
 
 			for_each_cpu(cpu, &cpus_stop_mask)
-				__apic_send_IPI(cpu, NMI_VECTOR);
+				__apic_send_IPI(cpu, SMP_STOP_NMI);
 		}
 		/*
 		 * Don't wait longer than 10 ms if the caller didn't
-- 
2.43.0
Re: [PATCH v7 08/10] x86/nmi: Enable NMI-source for IPIs delivered as NMIs
Posted by Sean Christopherson 7 months ago
On Thu, Jun 12, 2025, Sohil Mehta wrote:
> With the IPI handling APIs ready to support the new NMI encoding, encode
> the NMI delivery mode directly with the NMI-source vectors to trigger
> NMIs.
> 
> Move most of the existing NMI-based IPIs to use the new NMI-source
> vectors, except for the microcode rendezvous NMI and the crash reboot
> NMI. NMI handling for them is special-cased in exc_nmi() and does not
> need NMI-source reporting.
> 
> However, in the future, it might be useful to assign a source vector to
> all NMI sources to improve isolation and debuggability.
> 
> Originally-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Xin Li (Intel) <xin@zytor.com>
> Signed-off-by: Xin Li (Intel) <xin@zytor.com>
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> ---
> v7: No change.
> 
> v6: Include asm/nmi.h to avoid compile errors. (LKP)
> 
> v5: Encode APIC_DM_NMI directly with the NMI-source vector.
> ---
>  arch/x86/include/asm/apic.h      | 8 ++++++++
>  arch/x86/kernel/apic/hw_nmi.c    | 2 +-
>  arch/x86/kernel/cpu/mce/inject.c | 2 +-
>  arch/x86/kernel/kgdb.c           | 2 +-
>  arch/x86/kernel/nmi_selftest.c   | 2 +-
>  arch/x86/kernel/smp.c            | 2 +-
>  6 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index 32cdd81e5e45..5789df1708bd 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -14,6 +14,7 @@
>  #include <asm/msr.h>
>  #include <asm/hardirq.h>
>  #include <asm/io.h>
> +#include <asm/nmi.h>
>  #include <asm/posted_intr.h>
>  
>  #define ARCH_APICTIMER_STOPS_ON_C3	1
> @@ -23,6 +24,13 @@
>  #define APIC_EXTNMI_ALL		1
>  #define APIC_EXTNMI_NONE	2
>  
> +/* Trigger NMIs with source information */
> +#define TEST_NMI		(APIC_DM_NMI | NMIS_VECTOR_TEST)
> +#define SMP_STOP_NMI		(APIC_DM_NMI | NMIS_VECTOR_SMP_STOP)
> +#define BT_NMI			(APIC_DM_NMI | NMIS_VECTOR_BT)

s/BT/BACKTRACE?

> +#define KGDB_NMI		(APIC_DM_NMI | NMIS_VECTOR_KGDB)
> +#define MCE_NMI			(APIC_DM_NMI | NMIS_VECTOR_MCE)

IMO, NMI_xxx reads better, e.g. it's easier to see that code is sending an NMI
at the call sites.

> +
>  /*
>   * Debugging macros
>   */
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index 4e04f13d2de9..586f4b25feae 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -33,7 +33,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
>  #ifdef arch_trigger_cpumask_backtrace
>  static void nmi_raise_cpu_backtrace(cpumask_t *mask)
>  {
> -	__apic_send_IPI_mask(mask, NMI_VECTOR);
> +	__apic_send_IPI_mask(mask, BT_NMI);

This patch is buggy.  There are at least two implementations of ->send_IPI_mask()
that this breaks:

  uv_send_IPI_mask() = > uv_send_IPI_one():

	if (vector == NMI_VECTOR)
		dmode = APIC_DELIVERY_MODE_NMI;
	else
		dmode = APIC_DELIVERY_MODE_FIXED;


and xen_send_IPI_mask() => xen_map_vector():

	switch (vector) {
	case RESCHEDULE_VECTOR:
		xen_vector = XEN_RESCHEDULE_VECTOR;
		break;
	case CALL_FUNCTION_VECTOR:
		xen_vector = XEN_CALL_FUNCTION_VECTOR;
		break;
	case CALL_FUNCTION_SINGLE_VECTOR:
		xen_vector = XEN_CALL_FUNCTION_SINGLE_VECTOR;
		break;
	case IRQ_WORK_VECTOR:
		xen_vector = XEN_IRQ_WORK_VECTOR;
		break;
#ifdef CONFIG_X86_64
	case NMI_VECTOR:
	case APIC_DM_NMI: /* Some use that instead of NMI_VECTOR */
		xen_vector = XEN_NMI_VECTOR;
		break;
#endif
	default:
		xen_vector = -1;
		printk(KERN_ERR "xen: vector 0x%x is not implemented\n",
			vector);
	}

	return xen_vector;

Looking at all of this again, shoving the NMI source information into the @vector
is quite brittle.  Nothing forces implementations to handle embedded delivery
mode information.

One thought would be to pass a small struct (by value), and then provide macros
to generate the structure for a specific vector.  That provides some amount of
type safety and should make it a bit harder to pass in garbage, without making
the callers any less readable.

struct apic_ipi {
	u8 vector;
	u8 type;
};

#define APIC_IPI(v, t) ({ struct apic_ipi i = { .vector = v, .type = t }; i; })
#define APIC_IPI_IRQ(vector) APIC_IPI(vector, APIC_DELIVERY_MODE_FIXED)
#define APIC_IPI_NMI(vector) APIC_IPI(vector, APIC_DELIVERY_MODE_NMI)

#define IPI_IRQ_WORK		APIC_IPI_IRQ(IRQ_WORK_VECTOR)
#define IPI_POSTED_INTR_WAKEUP	APIC_IPI_IRQ(POSTED_INTR_WAKEUP_VECTOR)

#define IPI_NMI_BACKTRACE	APIC_IPI_NMI(NMI_BACKTRACE_VECTOR)

static __always_inline void __apic_send_IPI_self(struct apic_ipi ipi)
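{
	/*
	 * Hedged sketch of the rest: assumes the ->send_IPI_self()
	 * callback has been converted to take struct apic_ipi by value.
	 */
	apic->send_IPI_self(ipi);
}

/*
 * Hedged sketch of the back-end side: apic_ipi_to_icr() is a
 * hypothetical helper, not existing code, showing how a back end could
 * translate the struct into an ICR command word instead of relying on
 * delivery-mode bits smuggled into the vector.
 */
static inline u32 apic_ipi_to_icr(struct apic_ipi ipi)
{
	if (ipi.type == APIC_DELIVERY_MODE_NMI)
		return APIC_DM_NMI | ipi.vector;	/* vector carries the NMI source */

	return APIC_DM_FIXED | ipi.vector;
}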
Re: [PATCH v7 08/10] x86/nmi: Enable NMI-source for IPIs delivered as NMIs
Posted by Sohil Mehta 7 months ago
On 7/8/2025 11:37 AM, Sean Christopherson wrote:

> This patch is buggy.  There are at least two implementations of ->send_IPI_mask()
> that this breaks:
> 

Thank you for pointing this out. I should have been more diligent.


> Looking at all of this again, shoving the NMI source information into the @vector
> is quite brittle.  Nothing forces implementations to handle embedded delivery
> mode information.
> 

I agree. There is already some confusion, with NMI_VECTOR and
APIC_DM_NMI sometimes used interchangeably. Adding the new NMI-source
vectors with the encoded delivery mode makes it worse.


> One thought would be to pass a small struct (by value), and then provide macros
> to generate the structure for a specific vector.  That provides some amount of
> type safety and should make it a bit harder to pass in garbage, without making
> the callers any less readable.
>
> struct apic_ipi {
> 	u8 vector;
> 	u8 type;
> };
>  

I am fine with this approach. Though, the changes would be massive since
we have quite a few interfaces and a lot of "struct apic".

	.send_IPI
	.send_IPI_mask
	.send_IPI_mask_allbutself
	.send_IPI_allbutself
	.send_IPI_all
	.send_IPI_self


An option I was considering was whether we should avoid exposing the raw
delivery mode to the callers since it is mainly an APIC internal thing.
The callers should only have to say NMI or IRQ along with the vector and
let the APIC code figure out how to generate it.

One option is to add a separate set of send_IPI_NMI APIs parallel to
the send_IPI ones that we have. But then we would end up with >10 ways
to generate IPIs.

Another way would be to assign the NMI vectors in a different range and
use the range to differentiate between IRQ and NMI.

For example:
	IRQ => 0x0-0xFF
	NMI => 0x10000-0x1000F.

However, this would still be fragile and would probably have issues
similar to the ones you pointed out.

> 
> static __always_inline void __apic_send_IPI_self(struct apic_ipi ipi)

Taking a step back:

Since we are considering changing the interface, would it be worth
consolidating the multiple send_IPI APIs into one or two? Mainly, by
moving the destination information from the function name to the
function parameter.

  apic_send_IPI(DEST, MASK, TYPE, VECTOR)

  DEST   => self, all, allbutself, mask, maskbutself

  MASK   => cpumask

  TYPE   => IRQ, NMI

  VECTOR => Vector number specific to the type.

I like the single line IPI invocation. All of this can still be passed
in a neat "struct apic_ipi" with a macro helping the callers fill the
struct.
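
For illustration, a rough sketch of what that consolidated call could
look like; the enum, field names, and apic_send_IPI() signature below
are hypothetical, extending the struct apic_ipi idea above rather than
describing existing code:

enum apic_ipi_dest {
	APIC_IPI_SELF,
	APIC_IPI_ALL,
	APIC_IPI_ALLBUTSELF,
	APIC_IPI_MASK,
	APIC_IPI_MASK_ALLBUTSELF,
};

struct apic_ipi {
	enum apic_ipi_dest	dest;
	const struct cpumask	*mask;		/* only for the mask destinations */
	u8			type;		/* APIC_DELIVERY_MODE_FIXED or _NMI */
	u8			vector;		/* vector number specific to the type */
};

/* e.g. an NMI backtrace request to a set of CPUs: */
apic_send_IPI((struct apic_ipi){
	.dest	= APIC_IPI_MASK,
	.mask	= mask,
	.type	= APIC_DELIVERY_MODE_NMI,
	.vector	= NMIS_VECTOR_BT,
});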

These interfaces are decades old. So, maybe I am being too ambitious and
this isn't practically feasible. Thoughts/Suggestions?


Note: Another part of me says there are only a handful of NMI IPI usages
and the heavy lifting isn't worth it. We should fix the bugs, improve
testing and use the existing approach since it is the least invasive :)
Re: [PATCH v7 08/10] x86/nmi: Enable NMI-source for IPIs delivered as NMIs
Posted by Sean Christopherson 7 months ago
On Thu, Jul 10, 2025, Sohil Mehta wrote:
> On 7/8/2025 11:37 AM, Sean Christopherson wrote:
> 
> > This patch is buggy.  There are at least two implementations of ->send_IPI_mask()
> > that this breaks:
> > 
> 
> Thank you for pointing this out. I should have been more diligent.
> 
> 
> > Looking at all of this again, shoving the NMI source information into the @vector
> > is quite brittle.  Nothing forces implementations to handle embedded delivery
> > mode information.
> > 
> 
> I agree. There is already some confusion, with NMI_VECTOR and
> APIC_DM_NMI sometimes used interchangeably. Adding the new NMI-source
> vectors with the encoded delivery mode makes it worse.
> 
> 
> > One thought would be to pass a small struct (by value), and then provide macros
> > to generate the structure for a specific vector.  That provides some amount of
> > type safety and should make it a bit harder to pass in garbage, without making
> > the callers any less readable.
> >
> > struct apic_ipi {
> > 	u8 vector;
> > 	u8 type;
> > };
> >  
> 
> I am fine with this approach. Though, the changes would be massive since
> we have quite a few interfaces and a lot of "struct apic".

It'd definitely be big, but it doesn't seem like it'd be overwhelmingly painful.
Though it's certainly enough churn that I wouldn't do anything until there's a
consensus one way or the other :-)

> 	.send_IPI
> 	.send_IPI_mask
> 	.send_IPI_mask_allbutself
> 	.send_IPI_allbutself
> 	.send_IPI_all
> 	.send_IPI_self
> 
> 
> An option I was considering was whether we should avoid exposing the raw
> delivery mode to the callers since it is mainly an APIC internal thing.
> The callers should only have to say NMI or IRQ along with the vector and
> let the APIC code figure out how to generate it.
> 
> One option is to add a separate set of send_IPI_NMI APIs parallel to
> the send_IPI ones that we have. But then we would end up with >10 ways
> to generate IPIs.

Yeah, that idea crossed my mind too, and I came to the same conclusion.

> Another way would be to assign the NMI vectors in a different range and
> use the range to differentiate between IRQ and NMI.
> 
> For example:
> 	IRQ => 0x0-0xFF
> 	NMI => 0x10000-0x1000F.
> 
> However, this would still be fragile and would probably have issues
> similar to the ones you pointed out.
> 
> > 
> > static __always_inline void __apic_send_IPI_self(struct apic_ipi ipi)
> 
> Taking a step back:
> 
> Since we are considering changing the interface, would it be worth
> consolidating the multiple send_IPI APIs into one or two? Mainly, by
> moving the destination information from the function name to the
> function parameter.
> 
>   apic_send_IPI(DEST, MASK, TYPE, VECTOR)
> 
>   DEST   => self, all, allbutself, mask, maskbutself
> 
>   MASK   => cpumask
> 
>   TYPE   => IRQ, NMI
> 
>   VECTOR => Vector number specific to the type.
> 
> I like the single line IPI invocation. All of this can still be passed
> in a neat "struct apic_ipi" with a macro helping the callers fill the
> struct.
> 
> These interfaces are decades old. So, maybe I am being too ambitious and
> this isn't practically feasible. Thoughts/Suggestions?

I suspect making DEST a parameter will be a net negative.  Many (most?) implementations
will likely de-multiplex the DEST on the back end, i.e. the amount of churn will
be roughly the same, and we might end up with *more* code due to multiple
implementations having to do the fan out.

I think we'd also end up with slightly less readable code in the callers.

> Note: Another part of me says there are only a handful of NMI IPI usages
> and the heavy lifting isn't worth it. We should fix the bugs, improve
> testing and use the existing approach since it is the least invasive :)

FWIW, I think the churn would be worthwhile in the long run.  But I'm also not
volunteering to do said work...
Re: [PATCH v7 08/10] x86/nmi: Enable NMI-source for IPIs delivered as NMIs
Posted by Sohil Mehta 6 months, 2 weeks ago
Hi Thomas, Dave,

Seeking your input on the direction below.

On 7/10/2025 3:40 PM, Sean Christopherson wrote:
>>> One thought would be to pass a small struct (by value), and then provide macros
>>> to generate the structure for a specific vector.  That provides some amount of
>>> type safety and should make it a bit harder to pass in garbage, without making
>>> the callers any less readable.
>>>
>>> struct apic_ipi {
>>> 	u8 vector;
>>> 	u8 type;
>>> };
>>>  
>>> #define APIC_IPI(v, t) ({ struct apic_ipi i = { .vector = v, .type = t }; i; })
>>> #define APIC_IPI_IRQ(vector) APIC_IPI(vector, APIC_DELIVERY_MODE_FIXED)
>>> #define APIC_IPI_NMI(vector) APIC_IPI(vector, APIC_DELIVERY_MODE_NMI)
>>> 
>>> #define IPI_IRQ_WORK		APIC_IPI_IRQ(IRQ_WORK_VECTOR)
>>> #define IPI_POSTED_INTR_WAKEUP	APIC_IPI_IRQ(POSTED_INTR_WAKEUP_VECTOR)
>>> 
>>> #define IPI_NMI_BACKTRACE	APIC_IPI_NMI(NMI_BACKTRACE_VECTOR)
>>> 
>>> static __always_inline void __apic_send_IPI_self(struct apic_ipi ipi)
>>
>> I am fine with this approach. Though, the changes would be massive since
>> we have quite a few interfaces and a lot of "struct apic".
> 
> It'd definitely be big, but it doesn't seem like it'd be overwhelmingly painful.
> Though it's certainly enough churn that I wouldn't do anything until there's a
> consensus one way or the other :-)
> 

In order to accommodate NMI source vectors, updating the send_IPI_*()
APIs seems like the right thing to do for the long run. But it would
introduce some churn. Is this the optimal path forward? Any other
options we should consider?