x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching, which triggers a suspicious RCU usage splat on
debug kernels (CONFIG_PROVE_RCU=y) during panic/kdump:
WARNING: suspicious RCU usage
arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by tee/11119:
#0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
Call Trace:
<TASK>
dump_stack_lvl+0x84/0xd0
lockdep_rcu_suspicious.cold+0x37/0x8f
x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
native_machine_crash_shutdown+0x72/0x170
__crash_kexec+0x137/0x280
panic+0xce/0xd0
sysrq_handle_crash+0x1f/0x20
__handle_sysrq.cold+0x192/0x335
write_sysrq_trigger+0x8c/0xc0
proc_reg_write+0x1c3/0x3c0
vfs_write+0x1d0/0xf80
ksys_write+0x116/0x250
do_syscall_64+0x11c/0x1480
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
The RCU usage is correct: writers
(x86_virt_{register,unregister}_emergency_callback()) serialize via
rcu_assign_pointer() + synchronize_rcu(), while the reader on the
emergency path runs with IRQs disabled (the only caller is
x86_virt_emergency_disable_virtualization_cpu(), which has
lockdep_assert_irqs_disabled()), which is a valid classic-RCU read-side
critical section.
Use rcu_dereference_check() with irqs_disabled() to silence the splat
without weakening the protection.
Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:
echo c > /proc/sysrq-trigger
Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
arch/x86/virt/hw.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index f647557d38ac..57eebc99299d 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -49,7 +49,13 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
 {
 	cpu_emergency_virt_cb *kvm_callback;
 
-	kvm_callback = rcu_dereference(kvm_emergency_callback);
+	/*
+	 * Callers invoke this with IRQs disabled (see
+	 * x86_virt_emergency_disable_virtualization_cpu()), which is a valid
+	 * RCU read-side critical section. Tell lockdep so it doesn't complain
+	 * during panic/reboot paths.
+	 */
+	kvm_callback = rcu_dereference_check(kvm_emergency_callback, irqs_disabled());
 	if (kvm_callback)
 		kvm_callback();
 }
--
2.54.0
On Sun, May 03, 2026, Mikhail Gavrilov wrote:
> x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
> through machine_crash_shutdown() with IRQs disabled but with RCU not
> necessarily watching, which triggers a suspicious RCU usage splat on
> debug kernels (CONFIG_PROVE_RCU=y) during panic/kdump:
>
> WARNING: suspicious RCU usage
> arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by tee/11119:
> #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
>
> Call Trace:
> <TASK>
> dump_stack_lvl+0x84/0xd0
> lockdep_rcu_suspicious.cold+0x37/0x8f
> x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
> x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
> x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
> native_machine_crash_shutdown+0x72/0x170
> __crash_kexec+0x137/0x280
> panic+0xce/0xd0
> sysrq_handle_crash+0x1f/0x20
> __handle_sysrq.cold+0x192/0x335
> write_sysrq_trigger+0x8c/0xc0
> proc_reg_write+0x1c3/0x3c0
> vfs_write+0x1d0/0xf80
> ksys_write+0x116/0x250
> do_syscall_64+0x11c/0x1480
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> </TASK>
>
> The RCU usage is correct: writers
> (x86_virt_{register,unregister}_emergency_callback()) serialize via
> rcu_assign_pointer() + synchronize_rcu(), while the reader on the
> emergency path runs with IRQs disabled (the only caller is
> x86_virt_emergency_disable_virtualization_cpu(), which has
> lockdep_assert_irqs_disabled()), which is a valid classic-RCU read-side
> critical section.
>
> Use rcu_dereference_check() with irqs_disabled() to silence the splat
> without weakening the protection.
>
> Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
> with kvm_amd or kvm_intel loaded by triggering kdump:
>
> echo c > /proc/sysrq-trigger
>
> Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---
> arch/x86/virt/hw.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
> index f647557d38ac..57eebc99299d 100644
> --- a/arch/x86/virt/hw.c
> +++ b/arch/x86/virt/hw.c
> @@ -49,7 +49,13 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
>  {
>  	cpu_emergency_virt_cb *kvm_callback;
>
> -	kvm_callback = rcu_dereference(kvm_emergency_callback);
> +	/*
> +	 * Callers invoke this with IRQs disabled (see
> +	 * x86_virt_emergency_disable_virtualization_cpu()), which is a valid
> +	 * RCU read-side critical section. Tell lockdep so it doesn't complain
> +	 * during panic/reboot paths.
> +	 */
> +	kvm_callback = rcu_dereference_check(kvm_emergency_callback, irqs_disabled());

This feels wrong. If RCU truly isn't watching this CPU, then isn't RCU allowed
to ignore this CPU when synchronizing?

>  	if (kvm_callback)
>  		kvm_callback();
>  }
> --
> 2.54.0
>
On Mon, May 4, 2026 at 10:48 PM Sean Christopherson <seanjc@google.com> wrote:
>
> This feels wrong. If RCU truly isn't watching this CPU, then isn't RCU allowed
> to ignore this CPU when synchronizing?
>
You're correct that irqs_disabled() doesn't imply RCU is watching, and
in the general case that would be a real concern. However, on the
emergency virt callback path the practical situation is narrower:
1. The reader (x86_virt_invoke_kvm_emergency_callback) only runs from
panic/kexec/reboot via x86_virt_emergency_disable_virtualization_cpu()
and machine_crash_shutdown().
2. The writer (x86_virt_unregister_emergency_callback) calls
synchronize_rcu(), which would observe an RCU read-side critical
section started by rcu_read_lock(). But on the panic path we don't
have rcu_read_lock() — we just have IRQs disabled. So even with my
patch, a concurrent unregister could in principle free the callback
out from under us.
3. In practice, the writer can only run from KVM module unload. By
the time we're in panic context, all CPUs except the crashing one
have been NMI'd into x86_svm_emergency_disable_virtualization_cpu
too — a kvm_amd unload happening concurrently with panic seems
extraordinarily unlikely, and the system is going down regardless.
So the splat is technically a real issue, but the underlying race is
already so vanishingly small that I'm not sure what the right fix
shape is. Some options:
a) Treat this as "panic context can't be RCU-correct anyway" and
use rcu_dereference_raw() with a comment.
b) Convert kvm_emergency_callback away from RCU (it's only set/cleared
once per KVM module lifetime; a regular pointer with smp_store_release/
smp_load_acquire would suffice).
c) Keep my patch but document that it's a minor lockdep silencer for
a path where the use-after-free window is closed by other means
(panic-time module unload being unrealistic).
What direction would you prefer? I'm happy to spin v2 as needed.
--
Best Regards,
Mike Gavrilov.
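
To make the difference between the accessors under discussion concrete, here is a small userspace model, not kernel code. It assumes the lockdep check behind rcu_dereference_check(p, c) amounts to "c is true, or an RCU read lock is held", with rcu_dereference(p) being the same check with no extra condition and rcu_dereference_raw(p) skipping the check. Every model_* name is hypothetical and exists only for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

typedef void (*emergency_cb)(void);

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
static bool model_read_lock_held;	/* stands in for rcu_read_lock_held() */
static bool model_irqs_disabled;	/* stands in for irqs_disabled() */
static int model_splats;		/* how many times the check complained */
static emergency_cb model_callback;	/* stands in for kvm_emergency_callback */

/* Models rcu_dereference_check(p, c): complain unless c or the read lock holds. */
static emergency_cb model_deref_check(bool cond)
{
	if (!cond && !model_read_lock_held) {
		model_splats++;
		fprintf(stderr, "model: suspicious dereference\n");
	}
	return model_callback;
}

/* Models rcu_dereference(p): the same check with no extra condition. */
static emergency_cb model_deref(void)
{
	return model_deref_check(false);
}

/* Models rcu_dereference_raw(p): the pointer is read with no check at all. */
static emergency_cb model_deref_raw(void)
{
	return model_callback;
}

/* Crash-path scenario: IRQs off, no read lock held. Returns the splat count. */
static int model_crash_path(void)
{
	model_read_lock_held = false;
	model_irqs_disabled = true;
	model_splats = 0;

	(void)model_deref();				/* unpatched code: complains */
	(void)model_deref_check(model_irqs_disabled);	/* v1 patch: quiet */
	(void)model_deref_raw();			/* option (a): quiet */
	return model_splats;
}
```

In this model only the plain accessor complains on the crash path. The model captures the lockdep side only: neither quiet variant says anything about whether a grace period actually waits for the reader, which is the concern raised in review.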
On Mon, May 4, 2026 at 11:50 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> What direction would you prefer? I'm happy to spin v2 as needed.
>

After looking at how other places in the kernel handle this: kernel/notifier.c,
kernel/cgroup/cgroup.c, kernel/fork.c, kernel/sched/fair.c all use
rcu_dereference_raw() when the caller has context-specific knowledge that
makes lockdep checks inappropriate.

I'll send v2 using rcu_dereference_raw() with a comment explaining the
panic-context reasoning. The diff would look like:

	/*
	 * The crashing CPU may be outside RCU's watching set in panic context.
	 * Use rcu_dereference_raw() to avoid lockdep complaints; the writers
	 * (KVM module load/unload) cannot run during emergency virt callback
	 * invocation, so the pointer is effectively stable here.
	 */
	kvm_callback = rcu_dereference_raw(kvm_emergency_callback);

Let me know if you'd prefer a different approach (option (b) from my
previous mail, converting away from RCU entirely, is a bigger change
but I can do that instead).

--
Best Regards,
Mike Gavrilov.
On Tue, May 05, 2026, Mikhail Gavrilov wrote:
> On Mon, May 4, 2026 at 11:50 PM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > What direction would you prefer? I'm happy to spin v2 as needed.
> >
>
> After looking at how other places in the kernel handle this — kernel/notifier.c,
> kernel/cgroup/cgroup.c, kernel/fork.c, kernel/sched/fair.c all use
> rcu_dereference_raw() when the caller has context-specific knowledge that
> makes lockdep checks inappropriate.
>
> I'll send v2 using rcu_dereference_raw() with a comment explaining the
> panic-context reasoning. The diff would look like:
>
> /*
> * The crashing CPU may be outside RCU's watching set in panic context.
> * Use rcu_dereference_raw() to avoid lockdep complaints — the writers
> * (KVM module load/unload) cannot run during emergency virt callback
> * invocation, so the pointer is effectively stable here.
AFAIK, nothing actually prevents module unload when the kernel is panicking and/or
rebooting. E.g. see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return notifier
registered on reboot/shutdown").
> */
> kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
>
> Let me know if you'd prefer a different approach (option (b) from my
> previous mail — converting away from RCU entirely — is a bigger change
> but I can do that instead).
For "normal" usage, if there really is even such a thing for this case,
smp_store_release() / smp_load_acquire() won't suffice, because the kernel needs
to ensure the module text isn't freed while the callback is in-flight.
But as you noted before, if the kernel is panicking, (a) the window for anything
to go wrong is comically small, and (b) at some point the kernel _can't_ guarantee
that everything will be "fine". So I'd probably be ok with just sweeping this
under the rug? Assuming we can't come up with an easy-ish solution that doesn't
require taking locks (which to me, would have a higher probability of causing
problems).
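
The ordering-versus-liveness point can be illustrated with a userspace sketch that uses C11 release/acquire atomics as stand-ins for smp_store_release()/smp_load_acquire(). This is an assumption-laden model, not the kernel implementation: the demo_* names are hypothetical, and the "module text" here is just an ordinary function that can never be freed, which is exactly the part a real module unload would not guarantee.

```c
#include <stdatomic.h>

typedef void (*emergency_cb)(void);

/* Hypothetical stand-in for kvm_emergency_callback as a plain atomic pointer. */
static _Atomic(emergency_cb) demo_callback;
static int demo_calls;

/* Stand-in for the callback that would live in module text. */
static void demo_disable_virt(void)
{
	demo_calls++;
}

/* Writer: publish the callback with release ordering. */
static void demo_register(emergency_cb cb)
{
	atomic_store_explicit(&demo_callback, cb, memory_order_release);
}

/*
 * Writer: clear the pointer. Nothing here waits for readers, so a reader
 * that already loaded the old pointer may still be executing it when this
 * returns. In the kernel, freeing module text at that point would be a
 * use-after-free; release/acquire alone does not close that window.
 */
static void demo_unregister(void)
{
	atomic_store_explicit(&demo_callback, (emergency_cb)0,
			      memory_order_release);
}

/* Reader: load with acquire ordering, then call through the snapshot. */
static void demo_invoke(void)
{
	emergency_cb cb = atomic_load_explicit(&demo_callback,
					       memory_order_acquire);

	if (cb)
		cb();
}
```

Single-threaded, the sequence register, invoke, unregister, invoke runs the callback exactly once. The gap is between the load and the call in demo_invoke(): ordering makes the pointer value well defined, but only some form of grace period or reference would keep the callee alive across that gap.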
x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching the crashing CPU, which triggers a suspicious
RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
panic/kdump:
WARNING: suspicious RCU usage
arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by tee/11119:
#0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
Call Trace:
<TASK>
dump_stack_lvl+0x84/0xd0
lockdep_rcu_suspicious.cold+0x37/0x8f
x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
native_machine_crash_shutdown+0x72/0x170
__crash_kexec+0x137/0x280
panic+0xce/0xd0
sysrq_handle_crash+0x1f/0x20
__handle_sysrq.cold+0x192/0x335
write_sysrq_trigger+0x8c/0xc0
proc_reg_write+0x1c3/0x3c0
vfs_write+0x1d0/0xf80
ksys_write+0x116/0x250
do_syscall_64+0x11c/0x1480
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
panic context (RCU may ignore the crashing CPU during synchronization),
and a concurrent KVM module unload could in principle race with the
callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
notifier registered on reboot/shutdown") which notes that nothing
prevents module unload during panic/reboot.
However, the alternatives are worse:
- smp_store_release()/smp_load_acquire() handles ordering but not
liveness; the kernel still needs to keep the module text alive
while the callback is in flight.
- Taking a lock in the panic path is risky — any lock could be held
by a CPU that has already been NMI'd to a halt.
Use rcu_dereference_raw() to silence the splat and accept the
vanishingly small remaining race. Panic context inherently cannot
guarantee complete correctness; the goal here is to keep debug builds
quiet on the kdump path so the splat doesn't obscure the actual
kernel state being captured.
Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:
echo c > /proc/sysrq-trigger
Suggested-by: Sean Christopherson <seanjc@google.com>
Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
arch/x86/virt/hw.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index f647557d38ac..7e9091c640be 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -49,7 +49,20 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
 {
 	cpu_emergency_virt_cb *kvm_callback;
 
-	kvm_callback = rcu_dereference(kvm_emergency_callback);
+	/*
+	 * RCU may not be watching the crashing CPU here, so rcu_dereference()
+	 * triggers a suspicious-RCU-usage splat. In principle, a concurrent
+	 * KVM module unload could race with this read; see commit 2baa33a8ddd6
+	 * ("KVM: x86: Leave user-return notifier registered on reboot/shutdown")
+	 * which notes that nothing prevents module unload during panic/reboot.
+	 *
+	 * However, taking a lock here would be riskier than the current race:
+	 * the system is going down via NMI shootdown, and any lock could be
+	 * held by an already-stopped CPU. Use rcu_dereference_raw() to silence
+	 * the lockdep splat and accept the comically small remaining race;
+	 * panic context inherently cannot guarantee complete correctness.
+	 */
+	kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
 	if (kvm_callback)
 		kvm_callback();
 }
--
2.54.0
On Tue, 05 May 2026 04:54:35 +0500, Mikhail Gavrilov wrote:
> x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
> through machine_crash_shutdown() with IRQs disabled but with RCU not
> necessarily watching the crashing CPU, which triggers a suspicious
> RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
> panic/kdump:
>
> WARNING: suspicious RCU usage
> arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
>
> [...]
Applied to kvm-x86 fixes, thanks!
[1/1] x86/virt: Silence RCU lockdep splat in emergency virt callback path
https://github.com/kvm-x86/linux/commit/fff82ea9d900
--
https://github.com/kvm-x86/linux/tree/next
On Tue, May 05, 2026, Mikhail Gavrilov wrote:
> x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
> through machine_crash_shutdown() with IRQs disabled but with RCU not
> necessarily watching the crashing CPU, which triggers a suspicious
> RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
> panic/kdump:
>
> WARNING: suspicious RCU usage
> arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
>
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by tee/11119:
> #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
>
> Call Trace:
> <TASK>
> dump_stack_lvl+0x84/0xd0
> lockdep_rcu_suspicious.cold+0x37/0x8f
> x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
> x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
> x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
> native_machine_crash_shutdown+0x72/0x170
> __crash_kexec+0x137/0x280
> panic+0xce/0xd0
> sysrq_handle_crash+0x1f/0x20
> __handle_sysrq.cold+0x192/0x335
> write_sysrq_trigger+0x8c/0xc0
> proc_reg_write+0x1c3/0x3c0
> vfs_write+0x1d0/0xf80
> ksys_write+0x116/0x250
> do_syscall_64+0x11c/0x1480
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> </TASK>
>
> A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
> panic context (RCU may ignore the crashing CPU during synchronization),
> and a concurrent KVM module unload could in principle race with the
> callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
> notifier registered on reboot/shutdown") which notes that nothing
> prevents module unload during panic/reboot.
>
> However, the alternatives are worse:
>
> - smp_store_release()/smp_load_acquire() handles ordering but not
> liveness; the kernel still needs to keep the module text alive
> while the callback is in flight.
> - Taking a lock in the panic path is risky — any lock could be held
> by a CPU that has already been NMI'd to a halt.
>
> Use rcu_dereference_raw() to silence the splat and accept the
> vanishingly small remaining race. Panic context inherently cannot
> guarantee complete correctness; the goal here is to keep debug builds
> quiet on the kdump path so the splat doesn't obscure the actual
> kernel state being captured.
>
> Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
> with kvm_amd or kvm_intel loaded by triggering kdump:
>
> echo c > /proc/sysrq-trigger
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---
Acked-by: Sean Christopherson <seanjc@google.com>
(I can also take this through kvm-x86; I have no preference whatsoever)
On Fri, May 8, 2026 at 2:59 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Acked-by: Sean Christopherson <seanjc@google.com>
>
> (I can also take this through kvm-x86; I have no preference whatsoever)

Thanks Sean! Whichever path is most convenient works for me.

--
Best Regards,
Mike Gavrilov.