x86/virt: Fix RCU lockdep splat in emergency virt callback path

[PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path

Posted by Mikhail Gavrilov 1 month, 1 week ago

x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching, which triggers a suspicious RCU usage splat on
debug kernels (CONFIG_PROVE_RCU=y) during panic/kdump:

WARNING: suspicious RCU usage
arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by tee/11119:
 #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write

Call Trace:
 <TASK>
 dump_stack_lvl+0x84/0xd0
 lockdep_rcu_suspicious.cold+0x37/0x8f
 x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
 x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
 x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
 native_machine_crash_shutdown+0x72/0x170
 __crash_kexec+0x137/0x280
 panic+0xce/0xd0
 sysrq_handle_crash+0x1f/0x20
 __handle_sysrq.cold+0x192/0x335
 write_sysrq_trigger+0x8c/0xc0
 proc_reg_write+0x1c3/0x3c0
 vfs_write+0x1d0/0xf80
 ksys_write+0x116/0x250
 do_syscall_64+0x11c/0x1480
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>

The RCU usage is correct: writers
(x86_virt_{register,unregister}_emergency_callback()) serialize via
rcu_assign_pointer() + synchronize_rcu(), while the reader on the
emergency path runs with IRQs disabled (the only caller is
x86_virt_emergency_disable_virtualization_cpu(), which has
lockdep_assert_irqs_disabled()), which is a valid classic-RCU read-side
critical section.

Use rcu_dereference_check() with irqs_disabled() to silence the splat
without weakening the protection.

Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:

  echo c > /proc/sysrq-trigger

Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
 arch/x86/virt/hw.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index f647557d38ac..57eebc99299d 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -49,7 +49,13 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
 {
 	cpu_emergency_virt_cb *kvm_callback;
 
-	kvm_callback = rcu_dereference(kvm_emergency_callback);
+	/*
+	 * Callers invoke this with IRQs disabled (see
+	 * x86_virt_emergency_disable_virtualization_cpu()), which is a valid
+	 * RCU read-side critical section. Tell lockdep so it doesn't complain
+	 * during panic/reboot paths.
+	 */
+	kvm_callback = rcu_dereference_check(kvm_emergency_callback, irqs_disabled());
 	if (kvm_callback)
 		kvm_callback();
 }
-- 
2.54.0

Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path

Posted by Sean Christopherson 1 month, 1 week ago

On Sun, May 03, 2026, Mikhail Gavrilov wrote:
> [...]
> @@ -49,7 +49,13 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
>  {
>  	cpu_emergency_virt_cb *kvm_callback;
>  
> -	kvm_callback = rcu_dereference(kvm_emergency_callback);
> +	/*
> +	 * Callers invoke this with IRQs disabled (see
> +	 * x86_virt_emergency_disable_virtualization_cpu()), which is a valid
> +	 * RCU read-side critical section. Tell lockdep so it doesn't complain
> +	 * during panic/reboot paths.
> +	 */
> +	kvm_callback = rcu_dereference_check(kvm_emergency_callback, irqs_disabled());

This feels wrong.  If RCU truly isn't watching this CPU, then isn't RCU allowed
to ignore this CPU when synchronizing?

>  	if (kvm_callback)
>  		kvm_callback();
>  }
> -- 
> 2.54.0
>
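
The concern in the reply above can be illustrated with a deliberately
simplified userspace model. This is only a sketch: struct toy_cpu,
toy_grace_period_may_end() and both flags are hypothetical names made up
for illustration, not kernel APIs, and real RCU tracks quiescent states
rather than scanning flags. The one assumption carried over from the
thread is that a grace period need not wait for a CPU that RCU is not
watching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state for the toy model; not a kernel structure. */
struct toy_cpu {
	bool watching;	/* does the toy "RCU" pay attention to this CPU? */
	bool in_reader;	/* is the CPU inside a read-side critical section? */
};

/*
 * Toy stand-in for the wait inside synchronize_rcu(): the grace period may
 * end once no *watched* CPU is still inside a reader. CPUs that are not
 * being watched are skipped, whatever they happen to be doing.
 */
static bool toy_grace_period_may_end(const struct toy_cpu *cpus, size_t n)
{
	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (cpus[i].watching && cpus[i].in_reader)
			return false;
	}
	return true;
}
```

In this model, a CPU with in_reader set but watching cleared does not
hold up the grace period, so a writer could proceed to free the data
while that CPU is still using it. That is the scenario the question is
about.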

Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path

Posted by Mikhail Gavrilov 1 month, 1 week ago

On Mon, May 4, 2026 at 10:48 PM Sean Christopherson <seanjc@google.com> wrote:
>
> This feels wrong.  If RCU truly isn't watching this CPU, then isn't RCU allowed
> to ignore this CPU when synchronizing?
>

You're correct that irqs_disabled() doesn't imply RCU is watching, and
in the general case that would be a real concern. However, on the
emergency virt callback path the practical situation is narrower:

  1. The reader (x86_virt_invoke_kvm_emergency_callback) only runs from
     panic/kexec/reboot via x86_virt_emergency_disable_virtualization_cpu()
     and machine_crash_shutdown().

  2. The writer (x86_virt_unregister_emergency_callback) calls
     synchronize_rcu(), which would observe an RCU read-side critical
     section started by rcu_read_lock(). But on the panic path we don't
     have rcu_read_lock() — we just have IRQs disabled. So even with my
     patch, a concurrent unregister could in principle free the callback
     out from under us.

  3. In practice, the writer can only run from KVM module unload. By
     the time we're in panic context, all CPUs except the crashing one
     have been NMI'd into x86_svm_emergency_disable_virtualization_cpu
     too — a kvm_amd unload happening concurrently with panic seems
     extraordinarily unlikely, and the system is going down regardless.

So the splat points at a technically real issue, but the underlying
race window is so vanishingly small that I'm not sure what shape the
right fix should take. Some options:

  a) Treat this as "panic context can't be RCU-correct anyway" and
     use rcu_dereference_raw() with a comment.

  b) Convert kvm_emergency_callback away from RCU (it's only set/cleared
     once per KVM module lifetime; a regular pointer with smp_store_release/
     smp_load_acquire would suffice).

  c) Keep my patch but document that it's a minor lockdep silencer for
     a path where the use-after-free window is closed by other means
     (panic-time module unload being unrealistic).

What direction would you prefer? I'm happy to spin v2 as needed.

-- 
Best Regards,
Mike Gavrilov.

Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path

Posted by Mikhail Gavrilov 1 month, 1 week ago

On Mon, May 4, 2026 at 11:50 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> What direction would you prefer? I'm happy to spin v2 as needed.
>

I looked at how other places in the kernel handle this: kernel/notifier.c,
kernel/cgroup/cgroup.c, kernel/fork.c, and kernel/sched/fair.c all use
rcu_dereference_raw() when the caller has context-specific knowledge that
makes lockdep checks inappropriate.

I'll send v2 using rcu_dereference_raw() with a comment explaining the
panic-context reasoning. The diff would look like:

  /*
   * The crashing CPU may be outside RCU's watching set in panic context.
   * Use rcu_dereference_raw() to avoid lockdep complaints — the writers
   * (KVM module load/unload) cannot run during emergency virt callback
   * invocation, so the pointer is effectively stable here.
   */
  kvm_callback = rcu_dereference_raw(kvm_emergency_callback);

Let me know if you'd prefer a different approach (option (b) from my
previous mail — converting away from RCU entirely — is a bigger change
but I can do that instead).

-- 
Best Regards,
Mike Gavrilov.
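
The difference between the v1 accessor and the one proposed for v2 can
be modeled in userspace, as a rough sketch. All toy_* names are
hypothetical. The model assumes that the checked accessor complains
unless a read lock is held or the caller-supplied condition is true, and
that the raw accessor performs the same load with no check at all. The
acquire load is a stand-in chosen for the model; the kernel macros do
not use C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*toy_cb_t)(void);

static _Atomic(toy_cb_t) toy_callback;	/* models kvm_emergency_callback */
static bool toy_read_lock_held;		/* models rcu_read_lock_held() */
static int toy_splats;			/* counts would-be lockdep splats */

/* Hypothetical callback used only to have something to publish. */
static void toy_noop_cb(void)
{
}

static void toy_publish(toy_cb_t cb)
{
	assert(cb != NULL);
	atomic_store_explicit(&toy_callback, cb, memory_order_release);
}

/* Models rcu_dereference_raw(): load the pointer, no debugging check. */
static toy_cb_t toy_dereference_raw(void)
{
	return atomic_load_explicit(&toy_callback, memory_order_acquire);
}

/*
 * Models rcu_dereference_check(p, cond): same load, but record a "splat"
 * when neither the read lock nor the caller's condition vouches for it.
 */
static toy_cb_t toy_dereference_check(bool cond)
{
	if (!(cond || toy_read_lock_held))
		toy_splats++;
	return toy_dereference_raw();
}
```

Here toy_dereference_check(false) plays the role of plain
rcu_dereference() outside a reader and bumps toy_splats, while
toy_dereference_check(true) (the v1 idea, with irqs_disabled() as the
condition) and toy_dereference_raw() (the v2 idea) return the same
pointer silently. Neither variant changes what actually protects the
pointee.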

Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path

Posted by Sean Christopherson 1 month, 1 week ago

On Tue, May 05, 2026, Mikhail Gavrilov wrote:
> On Mon, May 4, 2026 at 11:50 PM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > What direction would you prefer? I'm happy to spin v2 as needed.
> >
> 
> I looked at how other places in the kernel handle this: kernel/notifier.c,
> kernel/cgroup/cgroup.c, kernel/fork.c, and kernel/sched/fair.c all use
> rcu_dereference_raw() when the caller has context-specific knowledge that
> makes lockdep checks inappropriate.
> 
> I'll send v2 using rcu_dereference_raw() with a comment explaining the
> panic-context reasoning. The diff would look like:
> 
>   /*
>    * The crashing CPU may be outside RCU's watching set in panic context.
>    * Use rcu_dereference_raw() to avoid lockdep complaints — the writers
>    * (KVM module load/unload) cannot run during emergency virt callback
>    * invocation, so the pointer is effectively stable here.

AFAIK, nothing actually prevents module unload when the kernel is panicking and/or
rebooting.  E.g. see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return notifier
registered on reboot/shutdown").

>    */
>   kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
> 
> Let me know if you'd prefer a different approach (option (b) from my
> previous mail — converting away from RCU entirely — is a bigger change
> but I can do that instead).

For "normal" usage, if there really is even such a thing for this case,
smp_store_release() / smp_load_acquire() won't suffice, because the kernel needs
to ensure the module text isn't freed while the callback is in-flight.

But as you noted before, if the kernel is panicking, (a) the window for anything
to go wrong is comically small, and (b) at some point the kernel _can't_ guarantee
that everything will be "fine".  So I'd probably be ok with just sweeping this
under the rug?  Assuming we can't come up with an easy-ish solution that doesn't
require taking locks (which to me, would have a higher probability of causing
problems).
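
The ordering-versus-liveness point above can be sketched in userspace,
with C11 release/acquire atomics standing in for smp_store_release() and
smp_load_acquire(). This is an illustrative assumption rather than kernel
code, and cb_t, published_cb, demo_cb, demo_calls and the three helpers
are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*cb_t)(void);

static _Atomic(cb_t) published_cb;	/* stand-in for the callback pointer */
static int demo_calls;			/* lets a caller observe invocations */

/* Hypothetical callback standing in for code that lives in a module. */
static void demo_cb(void)
{
	demo_calls++;
}

/* Writer side: publish the callback with release ordering. */
static void publish_cb(cb_t cb)
{
	assert(cb != NULL);
	atomic_store_explicit(&published_cb, cb, memory_order_release);
}

/*
 * Writer side: clear the pointer. Nothing here waits for a reader that has
 * already loaded the old value, so this handles ordering but not liveness.
 */
static void retract_cb(void)
{
	atomic_store_explicit(&published_cb, NULL, memory_order_release);
}

/* Reader side: load with acquire ordering and call the callback if set. */
static bool invoke_cb(void)
{
	cb_t cb = atomic_load_explicit(&published_cb, memory_order_acquire);

	if (!cb)
		return false;
	cb();	/* retract_cb() may already have returned by this point */
	return true;
}
```

Because retract_cb() returns immediately, a reader that loaded the
pointer just before the store can still be inside cb() afterwards. In
the kernel case that is the window in which the module text could be
freed, which is what the synchronize_rcu() in the existing unregister
path is there to close.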

[PATCH v2] x86/virt: Silence RCU lockdep splat in emergency virt callback path

Posted by Mikhail Gavrilov 1 month, 1 week ago

x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching the crashing CPU, which triggers a suspicious
RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
panic/kdump:

  WARNING: suspicious RCU usage
  arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by tee/11119:
   #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write

  Call Trace:
   <TASK>
   dump_stack_lvl+0x84/0xd0
   lockdep_rcu_suspicious.cold+0x37/0x8f
   x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
   x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
   x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
   native_machine_crash_shutdown+0x72/0x170
   __crash_kexec+0x137/0x280
   panic+0xce/0xd0
   sysrq_handle_crash+0x1f/0x20
   __handle_sysrq.cold+0x192/0x335
   write_sysrq_trigger+0x8c/0xc0
   proc_reg_write+0x1c3/0x3c0
   vfs_write+0x1d0/0xf80
   ksys_write+0x116/0x250
   do_syscall_64+0x11c/0x1480
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   </TASK>

A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
panic context (RCU may ignore the crashing CPU during synchronization),
and a concurrent KVM module unload could in principle race with the
callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
notifier registered on reboot/shutdown") which notes that nothing
prevents module unload during panic/reboot.

However, the alternatives are worse:

  - smp_store_release()/smp_load_acquire() handles ordering but not
    liveness; the kernel still needs to keep the module text alive
    while the callback is in flight.
  - Taking a lock in the panic path is risky — any lock could be held
    by a CPU that has already been NMI'd to a halt.

Use rcu_dereference_raw() to silence the splat and accept the
vanishingly small remaining race. Panic context inherently cannot
guarantee complete correctness; the goal here is to keep debug builds
quiet on the kdump path so the splat doesn't obscure the actual
kernel state being captured.

Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:

  echo c > /proc/sysrq-trigger

Suggested-by: Sean Christopherson <seanjc@google.com>
Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
 arch/x86/virt/hw.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index f647557d38ac..7e9091c640be 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -49,7 +49,20 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
 {
 	cpu_emergency_virt_cb *kvm_callback;
 
-	kvm_callback = rcu_dereference(kvm_emergency_callback);
+	/*
+	 * RCU may not be watching the crashing CPU here, so rcu_dereference()
+	 * triggers a suspicious-RCU-usage splat. In principle, a concurrent
+	 * KVM module unload could race with this read; see commit 2baa33a8ddd6
+	 * ("KVM: x86: Leave user-return notifier registered on reboot/shutdown")
+	 * which notes that nothing prevents module unload during panic/reboot.
+	 *
+	 * However, taking a lock here would be riskier than the current race:
+	 * the system is going down via NMI shootdown, and any lock could be
+	 * held by an already-stopped CPU. Use rcu_dereference_raw() to silence
+	 * the lockdep splat and accept the comically small remaining race;
+	 * panic context inherently cannot guarantee complete correctness.
+	 */
+	kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
 	if (kvm_callback)
 		kvm_callback();
 }
-- 
2.54.0

Re: [PATCH v2] x86/virt: Silence RCU lockdep splat in emergency virt callback path

Posted by Sean Christopherson 3 weeks, 5 days ago

On Tue, 05 May 2026 04:54:35 +0500, Mikhail Gavrilov wrote:
> x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
> through machine_crash_shutdown() with IRQs disabled but with RCU not
> necessarily watching the crashing CPU, which triggers a suspicious
> RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
> panic/kdump:
> 
>   WARNING: suspicious RCU usage
>   arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
> 
> [...]

Applied to kvm-x86 fixes, thanks!

[1/1] x86/virt: Silence RCU lockdep splat in emergency virt callback path
      https://github.com/kvm-x86/linux/commit/fff82ea9d900

--
https://github.com/kvm-x86/linux/tree/next

Re: [PATCH v2] x86/virt: Silence RCU lockdep splat in emergency virt callback path

Posted by Sean Christopherson 1 month, 1 week ago

On Tue, May 05, 2026, Mikhail Gavrilov wrote:
> [...]
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---

Acked-by: Sean Christopherson <seanjc@google.com>

(I can also take this through kvm-x86; I have no preference whatsoever)

Re: [PATCH v2] x86/virt: Silence RCU lockdep splat in emergency virt callback path

Posted by Mikhail Gavrilov 1 month ago

On Fri, May 8, 2026 at 2:59 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Acked-by: Sean Christopherson <seanjc@google.com>
>
> (I can also take this through kvm-x86; I have no preference whatsoever)

Thanks Sean! Whichever path is most convenient works for me.

-- 
Best Regards,
Mike Gavrilov.