[PATCH v7 06/10] x86/nmi: Add support to handle NMIs with source information

Sohil Mehta posted 10 patches 7 months, 4 weeks ago
[PATCH v7 06/10] x86/nmi: Add support to handle NMIs with source information
Posted by Sohil Mehta 7 months, 4 weeks ago
The NMI-source bitmap is delivered as FRED event data to the kernel.
When available, use NMI-source based filtering to determine the exact
handlers to run.

Activate NMI-source based filtering only for Local NMIs. While handling
platform NMI types (such as SERR and IOCHK), do not use the source
bitmap. They have only one handler registered per type, so there is no
need to disambiguate between multiple handlers.

Some third-party chipsets may send NMI messages with a fixed vector of
2, which would result in bit 2 being set in the NMI-source bitmap. Skip
the local NMI handlers in this situation.

Bit 0 of the source bitmap is set by the hardware whenever a source
vector was not used while generating an NMI, or the originator could not
be reliably identified. Poll all the registered handlers in that case.

When multiple handlers need to be executed, adhere to the existing
priority scheme and execute the handlers registered with NMI_FLAG_FIRST
before others.

The logic for handling legacy NMIs is unaffected since the source bitmap
would always have all bits set.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
---
v7: Add review tag from Xin.

v6: Get rid of a separate NMI source matching function (PeterZ)
    Set source_bitmap to ULONG_MAX to match all sources by default

v5: Significantly simplify NMI-source handling logic.
    Get rid of a separate lookup table for NMI-source vectors.
    Adhere to existing priority scheme for handling NMIs.
---
 arch/x86/kernel/nmi.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 8071ad32aa11..3d2b636e9379 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -130,6 +130,7 @@ static void nmi_check_duration(struct nmiaction *action, u64 duration)
 static int nmi_handle(unsigned int type, struct pt_regs *regs)
 {
 	struct nmi_desc *desc = nmi_to_desc(type);
+	unsigned long source_bitmap = ULONG_MAX;
 	nmi_handler_t ehandler;
 	struct nmiaction *a;
 	int handled=0;
@@ -148,16 +149,45 @@ static int nmi_handle(unsigned int type, struct pt_regs *regs)
 
 	rcu_read_lock();
 
+	/*
+	 * Activate NMI source-based filtering only for Local NMIs.
+	 *
+	 * Platform NMI types (such as SERR and IOCHK) have only one
+	 * handler registered per type, so there is no need to
+	 * disambiguate between multiple handlers.
+	 *
+	 * Also, if a platform source ends up setting bit 2 in the
+	 * source bitmap, the local NMI handlers would be skipped since
+	 * none of them use this reserved vector.
+	 *
+	 * For Unknown NMIs, avoid using the source bitmap to ensure all
+	 * potential handlers have a chance to claim responsibility.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) && type == NMI_LOCAL) {
+		source_bitmap = fred_event_data(regs);
+
+		/* Reset the bitmap if a valid source could not be identified */
+		if (WARN_ON_ONCE(!source_bitmap) || (source_bitmap & BIT(NMIS_NO_SOURCE)))
+			source_bitmap = ULONG_MAX;
+	}
+
 	/*
 	 * NMIs are edge-triggered, which means if you have enough
 	 * of them concurrently, you can lose some because only one
 	 * can be latched at any given time.  Walk the whole list
 	 * to handle those situations.
+	 *
+	 * However, NMI-source reporting does not have this limitation.
+	 * When NMI sources have been identified, only run the handlers
+	 * that match the reported vectors.
 	 */
 	list_for_each_entry_rcu(a, &desc->head, list) {
 		int thishandled;
 		u64 delta;
 
+		if (!(source_bitmap & BIT(a->source_vector)))
+			continue;
+
 		delta = sched_clock();
 		thishandled = a->handler(type, regs);
 		handled += thishandled;
-- 
2.43.0
RE: [PATCH v7 06/10] x86/nmi: Add support to handle NMIs with source information
Posted by Zhuo, Qiuxu 7 months ago
> From: Sohil Mehta <sohil.mehta@intel.com>
> [...]
> Activate NMI-source based filtering only for Local NMIs. While handling
> platform NMI types (such as SERR and IOCHK), do not use the source bitmap.
> They have only one handler registered per type, so there is no need to

Same as the comments for patch 6. 
Platform NMI types may have multiple handlers registered per type.

> [...]

> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -130,6 +130,7 @@ static void nmi_check_duration(struct nmiaction
> *action, u64 duration)  static int nmi_handle(unsigned int type, struct pt_regs
> *regs)  {
>  	struct nmi_desc *desc = nmi_to_desc(type);
> +	unsigned long source_bitmap = ULONG_MAX;
>  	nmi_handler_t ehandler;
>  	struct nmiaction *a;
>  	int handled=0;
> @@ -148,16 +149,45 @@ static int nmi_handle(unsigned int type, struct
> pt_regs *regs)
> 
>  	rcu_read_lock();
> 
> +	/*
> +	 * Activate NMI source-based filtering only for Local NMIs.
> +	 *
> +	 * Platform NMI types (such as SERR and IOCHK) have only one
> +	 * handler registered per type, so there is no need to

Ditto.

> +	 * disambiguate between multiple handlers.
> +	 *
> +	 * Also, if a platform source ends up setting bit 2 in the
> +	 * source bitmap, the local NMI handlers would be skipped since
> +	 * none of them use this reserved vector.
> +	 *
> +	 * For Unknown NMIs, avoid using the source bitmap to ensure all
> +	 * potential handlers have a chance to claim responsibility.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) && type ==
> NMI_LOCAL) {
> +		source_bitmap = fred_event_data(regs);
> +
> +		/* Reset the bitmap if a valid source could not be identified */
> +		if (WARN_ON_ONCE(!source_bitmap) || (source_bitmap &
> BIT(NMIS_NO_SOURCE)))
> +			source_bitmap = ULONG_MAX;
> +	}
> +
>  	/*
>  	 * NMIs are edge-triggered, which means if you have enough
>  	 * of them concurrently, you can lose some because only one
>  	 * can be latched at any given time.  Walk the whole list
>  	 * to handle those situations.
> +	 *
> +	 * However, NMI-source reporting does not have this limitation.
> +	 * When NMI sources have been identified, only run the handlers
> +	 * that match the reported vectors.
>  	 */
>  	list_for_each_entry_rcu(a, &desc->head, list) {
>  		int thishandled;
>  		u64 delta;
> 
> +		if (!(source_bitmap & BIT(a->source_vector)))
> +			continue;
> +

Is it possible for the "source_bitmap" to have some non-NMIS_NO_SOURCE bit set 
while the user registers their NMI handler with the NMIS_NO_SOURCE type? 

If so, we may need to allow the NMI handler with the NMIS_NO_SOURCE type to be 
invoked unconditionally to ensure no NMIs are lost.

>  		delta = sched_clock();
>  		thishandled = a->handler(type, regs);
>  		handled += thishandled;
> --
> 2.43.0
> 
Re: [PATCH v7 06/10] x86/nmi: Add support to handle NMIs with source information
Posted by Sohil Mehta 7 months ago
On 7/7/2025 6:50 AM, Zhuo, Qiuxu wrote:

> Is it possible for the "source_bitmap" to have some non-NMIS_NO_SOURCE bit set 
> while the user registers their NMI handler with the NMIS_NO_SOURCE type? 
> 

IIUC, this is what you are asking:

Someone registers a handler without any source information.
For example, GHES does this.

register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes", NMIS_NO_SOURCE);

Now, when a GHES NMI shows up, can it have anything other than bit 0
(NMIS_NO_SOURCE) set in the source bitmap?

I believe the answer is no. Unless the GHES implementation or the
hardware has a bug, this should not happen. If this happens, it would
get logged as an unknown NMI in the kernel log.


> If so, we may need to allow the NMI handler with the NMIS_NO_SOURCE type to be 
> invoked unconditionally to ensure no NMIs are lost.
> 

If the kernel can't rely on the accuracy of the source_bitmap, then
NMI-source as a feature starts losing value.

Sohil