arch/x86/virt/hw.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching the crashing CPU, which triggers a suspicious
RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
panic/kdump:
WARNING: suspicious RCU usage
arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by tee/11119:
#0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
Call Trace:
<TASK>
dump_stack_lvl+0x84/0xd0
lockdep_rcu_suspicious.cold+0x37/0x8f
x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
native_machine_crash_shutdown+0x72/0x170
__crash_kexec+0x137/0x280
panic+0xce/0xd0
sysrq_handle_crash+0x1f/0x20
__handle_sysrq.cold+0x192/0x335
write_sysrq_trigger+0x8c/0xc0
proc_reg_write+0x1c3/0x3c0
vfs_write+0x1d0/0xf80
ksys_write+0x116/0x250
do_syscall_64+0x11c/0x1480
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
panic context (RCU may ignore the crashing CPU during synchronization),
and a concurrent KVM module unload could in principle race with the
callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
notifier registered on reboot/shutdown") which notes that nothing
prevents module unload during panic/reboot.
However, the alternatives are worse:
- smp_store_release()/smp_load_acquire() handles ordering but not
liveness; the kernel still needs to keep the module text alive
while the callback is in flight.
- Taking a lock in the panic path is risky — any lock could be held
by a CPU that has already been NMI'd to a halt.
Use rcu_dereference_raw() to silence the splat and accept the
vanishingly small remaining race. Panic context inherently cannot
guarantee complete correctness; the goal here is to keep debug builds
quiet on the kdump path so the splat doesn't obscure the actual
kernel state being captured.
Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:
echo c > /proc/sysrq-trigger
Suggested-by: Sean Christopherson <seanjc@google.com>
Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
arch/x86/virt/hw.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index f647557d38ac..7e9091c640be 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -49,7 +49,20 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
{
cpu_emergency_virt_cb *kvm_callback;
- kvm_callback = rcu_dereference(kvm_emergency_callback);
+ /*
+ * RCU may not be watching the crashing CPU here, so rcu_dereference()
+ * triggers a suspicious-RCU-usage splat. In principle, a concurrent
+ * KVM module unload could race with this read; see commit 2baa33a8ddd6
+ * ("KVM: x86: Leave user-return notifier registered on reboot/shutdown")
+ * which notes that nothing prevents module unload during panic/reboot.
+ *
+ * However, taking a lock here would be riskier than the current race:
+ * the system is going down via NMI shootdown, and any lock could be
+ * held by an already-stopped CPU. Use rcu_dereference_raw() to silence
+ * the lockdep splat and accept the comically small remaining race;
+ * panic context inherently cannot guarantee complete correctness.
+ */
+ kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
if (kvm_callback)
kvm_callback();
}
--
2.54.0
On Tue, 05 May 2026 04:54:35 +0500, Mikhail Gavrilov wrote:
> x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
> through machine_crash_shutdown() with IRQs disabled but with RCU not
> necessarily watching the crashing CPU, which triggers a suspicious
> RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
> panic/kdump:
>
> WARNING: suspicious RCU usage
> arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
>
> [...]
Applied to kvm-x86 fixes, thanks!
[1/1] x86/virt: Silence RCU lockdep splat in emergency virt callback path
https://github.com/kvm-x86/linux/commit/fff82ea9d900
--
https://github.com/kvm-x86/linux/tree/next
On Tue, May 05, 2026, Mikhail Gavrilov wrote:
> x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
> through machine_crash_shutdown() with IRQs disabled but with RCU not
> necessarily watching the crashing CPU, which triggers a suspicious
> RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
> panic/kdump:
>
> WARNING: suspicious RCU usage
> arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
>
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by tee/11119:
> #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
>
> Call Trace:
> <TASK>
> dump_stack_lvl+0x84/0xd0
> lockdep_rcu_suspicious.cold+0x37/0x8f
> x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
> x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
> x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
> native_machine_crash_shutdown+0x72/0x170
> __crash_kexec+0x137/0x280
> panic+0xce/0xd0
> sysrq_handle_crash+0x1f/0x20
> __handle_sysrq.cold+0x192/0x335
> write_sysrq_trigger+0x8c/0xc0
> proc_reg_write+0x1c3/0x3c0
> vfs_write+0x1d0/0xf80
> ksys_write+0x116/0x250
> do_syscall_64+0x11c/0x1480
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> </TASK>
>
> A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
> panic context (RCU may ignore the crashing CPU during synchronization),
> and a concurrent KVM module unload could in principle race with the
> callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
> notifier registered on reboot/shutdown") which notes that nothing
> prevents module unload during panic/reboot.
>
> However, the alternatives are worse:
>
> - smp_store_release()/smp_load_acquire() handles ordering but not
> liveness; the kernel still needs to keep the module text alive
> while the callback is in flight.
> - Taking a lock in the panic path is risky — any lock could be held
> by a CPU that has already been NMI'd to a halt.
>
> Use rcu_dereference_raw() to silence the splat and accept the
> vanishingly small remaining race. Panic context inherently cannot
> guarantee complete correctness; the goal here is to keep debug builds
> quiet on the kdump path so the splat doesn't obscure the actual
> kernel state being captured.
>
> Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
> with kvm_amd or kvm_intel loaded by triggering kdump:
>
> echo c > /proc/sysrq-trigger
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---
Acked-by: Sean Christopherson <seanjc@google.com>
(I can also take this through kvm-x86; I have no preference whatsoever)
On Fri, May 8, 2026 at 2:59 AM Sean Christopherson <seanjc@google.com> wrote: > > Acked-by: Sean Christopherson <seanjc@google.com> > > (I can also take this through kvm-x86; I have no preference whatsoever) Thanks Sean! Whichever path is most convenient works for me. -- Best Regards, Mike Gavrilov.
© 2016 - 2026 Red Hat, Inc.