[v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

[PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by Waiman Long 1 year, 2 months ago

Depending on the type of panics, it was found that the
__register_nmi_handler() function can be called in NMI context from
nmi_shootdown_cpus() leading to a lockdep splat like the following.

[ 1123.133573] ================================
[ 1123.137845] WARNING: inconsistent lock state
[ 1123.142118] 6.12.0-31.el10.x86_64+debug #1 Not tainted
[ 1123.147257] --------------------------------
[ 1123.151529] inconsistent {INITIAL USE} -> {IN-NMI} usage.
  :
[ 1123.261544]  Possible unsafe locking scenario:
[ 1123.261544]
[ 1123.267463]        CPU0
[ 1123.269915]        ----
[ 1123.272368]   lock(&nmi_desc[0].lock);
[ 1123.276122]   <Interrupt>
[ 1123.278746]     lock(&nmi_desc[0].lock);
[ 1123.282671]
[ 1123.282671]  *** DEADLOCK ***
  :
[ 1123.314088] Call Trace:
[ 1123.316542]  <NMI>
[ 1123.318562]  dump_stack_lvl+0x6f/0xb0
[ 1123.322230]  print_usage_bug.part.0+0x3d3/0x610
[ 1123.330618]  lock_acquire.part.0+0x2e6/0x360
[ 1123.357217]  _raw_spin_lock_irqsave+0x46/0x90
[ 1123.366193]  __register_nmi_handler+0x8f/0x3a0
[ 1123.374401]  nmi_shootdown_cpus+0x95/0x120
[ 1123.378509]  kdump_nmi_shootdown_cpus+0x15/0x20
[ 1123.383040]  native_machine_crash_shutdown+0x54/0x160
[ 1123.388095]  __crash_kexec+0x10f/0x1f0
[ 1123.421465]  ? __ghes_panic.cold+0x4f/0x5d
[ 1123.482648]  </NMI>

In this particular case, the following panic message was printed before.

[ 1122.808188] Kernel panic - not syncing: Fatal hardware error!

This message seemed to be given out from __ghes_panic() running in
NMI context.

The __register_nmi_handler() function which takes the nmi_desc lock
with irq disabled shouldn't be called from NMI context as this can
lead to deadlock.

The nmi_shootdown_cpus() function can only be invoked once. After the
first invocation, all other CPUs should be stuck in the newly added
crash_nmi_callback() and cannot respond to a second NMI.

One way to address this problem is to remove all the panic() calls from
NMI context, but that can be too restrictive.

Another way to fix this problem while allowing panic() calls from
NMI context is by adding a new emergency NMI handler to the nmi_desc
structure and provide a new set_emergency_nmi_handler() helper to
atomically set crash_nmi_callback() in any context. The new emergency
handler will be invoked first before other handlers in the linked
list. That will eliminate the need to take any lock and serve the panic
in NMI use case.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 arch/x86/include/asm/nmi.h |  2 ++
 arch/x86/kernel/nmi.c      | 40 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/reboot.c   | 11 ++++-------
 3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 41a0ebb699ec..6715c123eff4 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -56,6 +56,8 @@ int __register_nmi_handler(unsigned int, struct nmiaction *);
 
 void unregister_nmi_handler(unsigned int, const char *);
 
+int set_emergency_nmi_handler(unsigned int type, nmi_handler_t handler);
+
 void stop_nmi(void);
 void restart_nmi(void);
 void local_touch_nmi(void);
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index ed163c8c8604..d551f2814cf2 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -40,8 +40,12 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
 
+/*
+ * An emergency handler can be set in any context
+ */
 struct nmi_desc {
 	raw_spinlock_t lock;
+	nmi_handler_t emerg_handler;	/* Emergency handler */
 	struct list_head head;
 };
 
@@ -132,9 +136,18 @@ static void nmi_check_duration(struct nmiaction *action, u64 duration)
 static int nmi_handle(unsigned int type, struct pt_regs *regs)
 {
 	struct nmi_desc *desc = nmi_to_desc(type);
+	nmi_handler_t ehandler;
 	struct nmiaction *a;
 	int handled=0;
 
+	/*
+	 * Call the emergency handler first, if set
+	 * Emergency handler is not traced or checked by nmi_check_duration().
+	 */
+	ehandler = READ_ONCE(desc->emerg_handler);
+	if (ehandler)
+		handled = ehandler(type, regs);
+
 	rcu_read_lock();
 
 	/*
@@ -224,6 +237,33 @@ void unregister_nmi_handler(unsigned int type, const char *name)
 }
 EXPORT_SYMBOL_GPL(unregister_nmi_handler);
 
+/**
+ * set_emergency_nmi_handler - Set emergency handler
+ * @handler - the emergency handler to be stored
+ * Return: 0 if success, -EEXIST if a handler had been stored
+ *
+ * Atomically set an emergency NMI handler which, if set, will be invoked
+ * before all the other handlers in the linked list. If a NULL handler is
+ * passed in, it will clear it.
+ */
+int set_emergency_nmi_handler(unsigned int type, nmi_handler_t handler)
+{
+	struct nmi_desc *desc = nmi_to_desc(type);
+	nmi_handler_t orig = NULL;
+
+	if (!handler) {
+		orig = READ_ONCE(desc->emerg_handler);
+		WARN_ON_ONCE(!orig);
+	}
+
+	if (try_cmpxchg(&desc->emerg_handler, &orig, handler))
+		return 0;
+	if (WARN_ON_ONCE(orig == handler))
+		return 0;
+	WARN_ONCE(1, "%s: failed to set emergency NMI handler!\n", __func__);
+	return -EEXIST;
+}
+
 static void
 pci_serr_error(unsigned char reason, struct pt_regs *regs)
 {
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 615922838c51..c1c8e1334343 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -926,15 +926,12 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	shootdown_callback = callback;
 
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-	/* Would it be better to replace the trap vector here? */
-	if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
-				 NMI_FLAG_FIRST, "crash"))
-		return;		/* Return what? */
+
 	/*
-	 * Ensure the new callback function is set before sending
-	 * out the NMI
+	 * Atomically set emergency handler to be invoked first before other
+	 * handlers. The action shouldn't fail or a warning will be printed.
 	 */
-	wmb();
+	set_emergency_nmi_handler(NMI_LOCAL, crash_nmi_callback);
 
 	apic_send_IPI_allbutself(NMI_VECTOR);
 
-- 
2.47.0

Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by Thomas Gleixner 1 year, 2 months ago

On Tue, Dec 03 2024 at 10:07, Waiman Long wrote:
> +	/*
> +	 * Call the emergency handler first, if set
> +	 * Emergency handler is not traced or checked by nmi_check_duration().
> +	 */
> +	ehandler = READ_ONCE(desc->emerg_handler);
> +	if (ehandler)
> +		handled = ehandler(type, regs);

Shouldn't this just stop processing right here?

Thanks,

        tglx

Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by Waiman Long 1 year, 2 months ago

On 12/4/24 8:10 AM, Thomas Gleixner wrote:
> On Tue, Dec 03 2024 at 10:07, Waiman Long wrote:
>> +	/*
>> +	 * Call the emergency handler first, if set
>> +	 * Emergency handler is not traced or checked by nmi_check_duration().
>> +	 */
>> +	ehandler = READ_ONCE(desc->emerg_handler);
>> +	if (ehandler)
>> +		handled = ehandler(type, regs);
> Shouldn't this just stop processing right here?

Yes in the case of crash_nmi_callback(). I suppose it is a no-return 
call. As the emergency handler is supposed to be a general mechanism in 
design, I don't want to make too many assumptions of what will happen 
when the handler is invoked.

What suggested changes do you have in your mind? I do make a minor 
mistake in the functional comment that is reported by kernel test robot 
which I will fix in the next version.

Cheers,
Longman

Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by Thomas Gleixner 1 year, 2 months ago

On Wed, Dec 04 2024 at 12:23, Waiman Long wrote:
> On 12/4/24 8:10 AM, Thomas Gleixner wrote:
>> On Tue, Dec 03 2024 at 10:07, Waiman Long wrote:
>>> +	/*
>>> +	 * Call the emergency handler first, if set
>>> +	 * Emergency handler is not traced or checked by nmi_check_duration().
>>> +	 */
>>> +	ehandler = READ_ONCE(desc->emerg_handler);
>>> +	if (ehandler)
>>> +		handled = ehandler(type, regs);
>> Shouldn't this just stop processing right here?
>
> Yes in the case of crash_nmi_callback(). I suppose it is a no-return 
> call. As the emergency handler is supposed to be a general mechanism in 
> design, I don't want to make too many assumptions of what will happen 
> when the handler is invoked.

I'm not convinced that this should be used as a general mechanism. It's
for emergency situations and that's where it stops. If the thing
returns, it's a bug IMO.

Thanks,

        tglx

Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by Waiman Long 1 year, 2 months ago

On 12/4/24 1:03 PM, Thomas Gleixner wrote:
> On Wed, Dec 04 2024 at 12:23, Waiman Long wrote:
>> On 12/4/24 8:10 AM, Thomas Gleixner wrote:
>>> On Tue, Dec 03 2024 at 10:07, Waiman Long wrote:
>>>> +	/*
>>>> +	 * Call the emergency handler first, if set
>>>> +	 * Emergency handler is not traced or checked by nmi_check_duration().
>>>> +	 */
>>>> +	ehandler = READ_ONCE(desc->emerg_handler);
>>>> +	if (ehandler)
>>>> +		handled = ehandler(type, regs);
>>> Shouldn't this just stop processing right here?
>> Yes in the case of crash_nmi_callback(). I suppose it is a no-return
>> call. As the emergency handler is supposed to be a general mechanism in
>> design, I don't want to make too many assumptions of what will happen
>> when the handler is invoked.
> I'm not convinced that this should be used as a general mechanism. It's
> for emergency situations and that's where it stops. If the thing
> returns, it's a bug IMO.

OK, I am fine with that. I will put a BUG_ON() after that in the next 
version.

Thanks,
Longman

Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by Waiman Long 1 year, 2 months ago

On 12/4/24 2:28 PM, Waiman Long wrote:
>
> On 12/4/24 1:03 PM, Thomas Gleixner wrote:
>> On Wed, Dec 04 2024 at 12:23, Waiman Long wrote:
>>> On 12/4/24 8:10 AM, Thomas Gleixner wrote:
>>>> On Tue, Dec 03 2024 at 10:07, Waiman Long wrote:
>>>>> +    /*
>>>>> +     * Call the emergency handler first, if set
>>>>> +     * Emergency handler is not traced or checked by 
>>>>> nmi_check_duration().
>>>>> +     */
>>>>> +    ehandler = READ_ONCE(desc->emerg_handler);
>>>>> +    if (ehandler)
>>>>> +        handled = ehandler(type, regs);
>>>> Shouldn't this just stop processing right here?
>>> Yes in the case of crash_nmi_callback(). I suppose it is a no-return
>>> call. As the emergency handler is supposed to be a general mechanism in
>>> design, I don't want to make too many assumptions of what will happen
>>> when the handler is invoked.
>> I'm not convinced that this should be used as a general mechanism. It's
>> for emergency situations and that's where it stops. If the thing
>> returns, it's a bug IMO.
>
> OK, I am fine with that. I will put a BUG_ON() after that in the next 
> version.

Actually, crash_nmi_callback() can return in the case of the crashing 
CPUs, though all the other CPUs will not return once called. So I 
believe the current form is correct. I will update the comment to 
reflect that.

Cheers,
Longman

Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by Thomas Gleixner 1 year, 2 months ago

On Wed, Dec 04 2024 at 23:01, Waiman Long wrote:
> On 12/4/24 2:28 PM, Waiman Long wrote:
>>> I'm not convinced that this should be used as a general mechanism. It's
>>> for emergency situations and that's where it stops. If the thing
>>> returns, it's a bug IMO.
>>
>> OK, I am fine with that. I will put a BUG_ON() after that in the next 
>> version.
>
> Actually, crash_nmi_callback() can return in the case of the crashing 
> CPUs, though all the other CPUs will not return once called. So I 
> believe the current form is correct. I will update the comment to 
> reflect that.

Why would you continue servicing the NMI on a CPU which just crashed?

Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by kernel test robot 1 year, 2 months ago

Hi Waiman,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/master]
[also build test WARNING on tip/x86/core tip/auto-latest bp/for-next linus/master v6.13-rc1 next-20241203]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/x86-nmi-Add-an-emergency-handler-in-nmi_desc-use-it-in-nmi_shootdown_cpus/20241204-105505
base:   tip/master
patch link:    https://lore.kernel.org/r/20241203150732.182065-1-longman%40redhat.com
patch subject: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
config: i386-buildonly-randconfig-001 (https://download.01.org/0day-ci/archive/20241204/202412041510.eJUQbvy7-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241204/202412041510.eJUQbvy7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202412041510.eJUQbvy7-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/nmi.c:250: warning: Function parameter or struct member 'type' not described in 'set_emergency_nmi_handler'
>> arch/x86/kernel/nmi.c:250: warning: Function parameter or struct member 'handler' not described in 'set_emergency_nmi_handler'


vim +250 arch/x86/kernel/nmi.c

   239	
   240	/**
   241	 * set_emergency_nmi_handler - Set emergency handler
   242	 * @handler - the emergency handler to be stored
   243	 * Return: 0 if success, -EEXIST if a handler had been stored
   244	 *
   245	 * Atomically set an emergency NMI handler which, if set, will be invoked
   246	 * before all the other handlers in the linked list. If a NULL handler is
   247	 * passed in, it will clear it.
   248	 */
   249	int set_emergency_nmi_handler(unsigned int type, nmi_handler_t handler)
 > 250	{
   251		struct nmi_desc *desc = nmi_to_desc(type);
   252		nmi_handler_t orig = NULL;
   253	
   254		if (!handler) {
   255			orig = READ_ONCE(desc->emerg_handler);
   256			WARN_ON_ONCE(!orig);
   257		}
   258	
   259		if (try_cmpxchg(&desc->emerg_handler, &orig, handler))
   260			return 0;
   261		if (WARN_ON_ONCE(orig == handler))
   262			return 0;
   263		WARN_ONCE(1, "%s: failed to set emergency NMI handler!\n", __func__);
   264		return -EEXIST;
   265	}
   266	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by Rik van Riel 1 year, 2 months ago

On Tue, 2024-12-03 at 10:07 -0500, Waiman Long wrote:
> 
> Another way to fix this problem while allowing panic() calls from
> NMI context is by adding a new emergency NMI handler to the nmi_desc
> structure and provide a new set_emergency_nmi_handler() helper to
> atomically set crash_nmi_callback() in any context. The new emergency
> handler will be invoked first before other handlers in the linked
> list. That will eliminate the need to take any lock and serve the
> panic
> in NMI use case.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>

I've seen that panic come by a number of times, but
never came up with a good fix.

Your idea certainly looks like it should work, and
avoid all the issues along the way.

Acked-by: Rik van Riel <riel@surriel.com>

-- 
All Rights Reversed.

Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()

Posted by Waiman Long 1 year, 2 months ago

On 12/3/24 8:00 PM, Rik van Riel wrote:
> On Tue, 2024-12-03 at 10:07 -0500, Waiman Long wrote:
>> Another way to fix this problem while allowing panic() calls from
>> NMI context is by adding a new emergency NMI handler to the nmi_desc
>> structure and provide a new set_emergency_nmi_handler() helper to
>> atomically set crash_nmi_callback() in any context. The new emergency
>> handler will be invoked first before other handlers in the linked
>> list. That will eliminate the need to take any lock and serve the
>> panic
>> in NMI use case.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
> I've seen that panic come by a number of times, but
> never came up with a good fix.
>
> Your idea certainly looks like it should work, and
> avoid all the issues along the way.
>
> Acked-by: Rik van Riel <riel@surriel.com>

Thanks for the review.

Cheers,
Longman