arch/x86/kernel/nmi.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
The __register_nmi_handler() function can be called in NMI context from
nmi_shootdown_cpus() leading to a lockdep splat like the following.
[ 1123.133573] ================================
[ 1123.137845] WARNING: inconsistent lock state
[ 1123.142118] 6.12.0-31.el10.x86_64+debug #1 Not tainted
[ 1123.147257] --------------------------------
[ 1123.151529] inconsistent {INITIAL USE} -> {IN-NMI} usage.
:
[ 1123.261544] Possible unsafe locking scenario:
[ 1123.261544]
[ 1123.267463] CPU0
[ 1123.269915] ----
[ 1123.272368] lock(&nmi_desc[0].lock);
[ 1123.276122] <Interrupt>
[ 1123.278746] lock(&nmi_desc[0].lock);
[ 1123.282671]
[ 1123.282671] *** DEADLOCK ***
:
[ 1123.314088] Call Trace:
[ 1123.316542] <NMI>
[ 1123.318562] dump_stack_lvl+0x6f/0xb0
[ 1123.322230] print_usage_bug.part.0+0x3d3/0x610
[ 1123.330618] lock_acquire.part.0+0x2e6/0x360
[ 1123.357217] _raw_spin_lock_irqsave+0x46/0x90
[ 1123.366193] __register_nmi_handler+0x8f/0x3a0
[ 1123.374401] nmi_shootdown_cpus+0x95/0x120
[ 1123.378509] kdump_nmi_shootdown_cpus+0x15/0x20
[ 1123.383040] native_machine_crash_shutdown+0x54/0x160
[ 1123.388095] __crash_kexec+0x10f/0x1f0
In this particular case, a panic had just happened.
[ 1122.808188] Kernel panic - not syncing: Fatal hardware error!
It can be argued that a lockdep splat after a panic is not a big deal,
but it can still confuse users. Fix that by using trylock in NMI context
to avoid the lockdep splat. If trylock fails, we still need to acquire
the lock on the best effort basis to avoid potential NMI descriptor
list corruption.
Signed-off-by: Waiman Long <longman@redhat.com>
---
arch/x86/kernel/nmi.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index ed163c8c8604..0b8ad64be117 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -171,8 +171,23 @@ int __register_nmi_handler(unsigned int type, struct nmiaction *action)
if (WARN_ON_ONCE(!action->handler || !list_empty(&action->list)))
return -EINVAL;
+ if (in_nmi()) {
+ /*
+ * register_nmi_handler() can be called in NMI context from
+ * nmi_shootdown_cpus(). In this case, we use trylock to
+ * acquire the NMI descriptor lock to avoid potential lockdep
+ * splat. If that fails, we still acquire the lock on the best
+ * effort basis, but a real deadlock may happen if the CPU is
+ * acquiring that lock when NMI happens.
+ */
+ if (raw_spin_trylock_irqsave(&desc->lock, flags))
+ goto locked;
+ pr_err("%s: trylock of NMI desc lock failed in NMI context, may deadlock!\n",
+ __func__);
+ }
raw_spin_lock_irqsave(&desc->lock, flags);
+locked:
/*
* Indicate if there are multiple registrations on the
* internal NMI handler call chains (SERR and IO_CHECK).
--
2.47.0
On Wed, Nov 27, 2024 at 06:34:55PM -0500, Waiman Long wrote: > The __register_nmi_handler() function can be called in NMI context from > nmi_shootdown_cpus() leading to a lockdep splat like the following. This seems fundamentally insane. Why are we okay with this?
On 11/28/24 4:28 AM, Peter Zijlstra wrote: > On Wed, Nov 27, 2024 at 06:34:55PM -0500, Waiman Long wrote: >> The __register_nmi_handler() function can be called in NMI context from >> nmi_shootdown_cpus() leading to a lockdep splat like the following. > This seems fundamentally insane. Why are we okay with this? According to the functional comment of nmi_shootdown_cpus(), * nmi_shootdown_cpus() can only be invoked once. After the first * invocation all other CPUs are stuck in crash_nmi_callback() and * cannot respond to a second NMI. That is why it has to insert the crash_nmi_callback() call with register_nmi_handler() here in the NMI context. Changing this will require a fundamental redesign of the way this shutdown process need to be handled and I am not knowledgeable enough to do that. I will certainly appreciate idea to handle it in a more graceful way. Cheers, Longman
On 11/28/24 8:06 PM, Waiman Long wrote: > > On 11/28/24 4:28 AM, Peter Zijlstra wrote: >> On Wed, Nov 27, 2024 at 06:34:55PM -0500, Waiman Long wrote: >>> The __register_nmi_handler() function can be called in NMI context from >>> nmi_shootdown_cpus() leading to a lockdep splat like the following. >> This seems fundamentally insane. Why are we okay with this? > > According to the functional comment of nmi_shootdown_cpus(), > > * nmi_shootdown_cpus() can only be invoked once. After the first > * invocation all other CPUs are stuck in crash_nmi_callback() and > * cannot respond to a second NMI. > > That is why it has to insert the crash_nmi_callback() call with > register_nmi_handler() here in the NMI context. Changing this will > require a fundamental redesign of the way this shutdown process need > to be handled and I am not knowledgeable enough to do that. I will > certainly appreciate idea to handle it in a more graceful way. One idea that I current have is to add a emergency callback pointer to the nmi_desc structure which, if set, has priority over the handlers in the linked list and will be called first. In this way, nmi_shootdown_cpus() can set the pointer to point to crash_nmi_callback() without the need to take a lock and insert another handler at the front of the list. Please let me know if this idea is acceptable or not. Cheers, Longman
On Thu, Nov 28 2024 at 20:55, Waiman Long wrote: > On 11/28/24 8:06 PM, Waiman Long wrote: >> >> On 11/28/24 4:28 AM, Peter Zijlstra wrote: >>> On Wed, Nov 27, 2024 at 06:34:55PM -0500, Waiman Long wrote: >>>> The __register_nmi_handler() function can be called in NMI context from >>>> nmi_shootdown_cpus() leading to a lockdep splat like the following. >>> This seems fundamentally insane. Why are we okay with this? >> >> According to the functional comment of nmi_shootdown_cpus(), >> >> * nmi_shootdown_cpus() can only be invoked once. After the first >> * invocation all other CPUs are stuck in crash_nmi_callback() and >> * cannot respond to a second NMI. >> >> That is why it has to insert the crash_nmi_callback() call with >> register_nmi_handler() here in the NMI context. Changing this will >> require a fundamental redesign of the way this shutdown process need >> to be handled and I am not knowledgeable enough to do that. I will >> certainly appreciate idea to handle it in a more graceful way. > > One idea that I current have is to add a emergency callback pointer to > the nmi_desc structure which, if set, has priority over the handlers in > the linked list and will be called first. In this way, > nmi_shootdown_cpus() can set the pointer to point to > crash_nmi_callback() without the need to take a lock and insert another > handler at the front of the list. Please let me know if this idea is > acceptable or not. That's way more sane.
On 11/29/24 11:57 AM, Thomas Gleixner wrote: > On Thu, Nov 28 2024 at 20:55, Waiman Long wrote: >> On 11/28/24 8:06 PM, Waiman Long wrote: >>> On 11/28/24 4:28 AM, Peter Zijlstra wrote: >>>> On Wed, Nov 27, 2024 at 06:34:55PM -0500, Waiman Long wrote: >>>>> The __register_nmi_handler() function can be called in NMI context from >>>>> nmi_shootdown_cpus() leading to a lockdep splat like the following. >>>> This seems fundamentally insane. Why are we okay with this? >>> According to the functional comment of nmi_shootdown_cpus(), >>> >>> * nmi_shootdown_cpus() can only be invoked once. After the first >>> * invocation all other CPUs are stuck in crash_nmi_callback() and >>> * cannot respond to a second NMI. >>> >>> That is why it has to insert the crash_nmi_callback() call with >>> register_nmi_handler() here in the NMI context. Changing this will >>> require a fundamental redesign of the way this shutdown process need >>> to be handled and I am not knowledgeable enough to do that. I will >>> certainly appreciate idea to handle it in a more graceful way. >> One idea that I current have is to add a emergency callback pointer to >> the nmi_desc structure which, if set, has priority over the handlers in >> the linked list and will be called first. In this way, >> nmi_shootdown_cpus() can set the pointer to point to >> crash_nmi_callback() without the need to take a lock and insert another >> handler at the front of the list. Please let me know if this idea is >> acceptable or not. > That's way more sane. Thanks for the feedback, I will work on a patch to do just that. Cheers, Longman
© 2016 - 2026 Red Hat, Inc.