[PATCH v3] x86/virt/tdx: Fix lockdep assertion failure in cache flush for kexec

Kai Huang posted 1 patch 22 hours ago
arch/x86/virt/vmx/tdx/tdx.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
TDX can leave the cache in an incoherent state for the memory it uses.
During kexec, the kernel does a WBINVD on all CPUs that may have an
incoherent cache before memory gets reused in the second kernel.

The kernel tracks the need for the kexec-time WBINVD generically (i.e.,
not just for TDX) in a per-cpu boolean.  The TDX core also provides a
function, tdx_cpu_flush_cache_for_kexec(), to do the WBINVD earlier so
that the kexec-time WBINVD can be skipped.  This function manipulates
the per-cpu boolean, and it has a lockdep_assert_preemption_disabled()
to check that the context cannot be preempted.

This function is called from both KVM's syscore ops shutdown() handler
(which is called at an early stage in the kexec path) and the module
unload path.  The lockdep assert passes in the kexec path since the
shutdown() handler calls that function in IRQ context.  However, the
module unload path invokes it via the CPUHP callback which runs that
function in a per-cpu thread with preemption enabled.  This triggers the
lockdep warning when unloading the KVM module:

  IS_ENABLED(CONFIG_PREEMPT_COUNT) && __lockdep_enabled && (preempt_count() == 0 && this_cpu_read(hardirqs_enabled))
  WARNING: arch/x86/virt/vmx/tdx/tdx.c:1875 at tdx_cpu_flush_cache_for_kexec+0x36/0x60, CPU#0: cpuhp/0/22
  ...
  Call Trace:
   <TASK>
   vt_disable_virtualization_cpu+0x1c/0x30 [kvm_intel]
   kvm_arch_disable_virtualization_cpu+0x12/0x80 [kvm]
   kvm_offline_cpu+0x24/0x40 [kvm]
   cpuhp_invoke_callback+0x1b0/0x740
   ...

The lockdep warning is actually a false positive, though, because the
CPUHP callback guarantees that the function runs on the same CPU even
when preemption is enabled.  In other words, the lockdep assert of
preemption being disabled is overkill.

Remove the overly strict lockdep_assert_preemption_disabled(), and
change this_cpu_{read|write}() to __this_cpu_{read|write}().  When
CONFIG_DEBUG_PREEMPT is enabled, the latter provide a more suitable
check: they verify that the context cannot be moved to another CPU in
the middle of the operation.

Fixes: 61221d07e815 ("KVM/TDX: Explicitly do WBINVD when no more TDX SEAMCALLs")
Cc: stable@vger.kernel.org
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Acked-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v2 -> v3:

- Add Kirill's tag (Thanks!)
- Address Dave's comments (provided in internal chat) about the changelog
  not being concise enough to get the main points across:

  1. The code manipulates a per-cpu variable
  2. There is a preempt disable check for #1
  3. The preempt check passes in the kexec path, but WARN()s in the "KVM
     unload" path
  4. The KVM unload path is actually OK so #3 is a false positive

v2: https://lore.kernel.org/lkml/20260312100009.924136-1-kai.huang@intel.com/
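
For reviewers who want the difference between the two checks spelled
out, below is a rough user-space model, not kernel code.  The struct
and function names are hypothetical.  The lockdep condition mirrors the
WARNING expression quoted in the changelog; the conditions accepted by
the CONFIG_DEBUG_PREEMPT check are a simplified reading of the kernel's
per-cpu debug check and are an assumption here, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an execution context; not a kernel structure. */
struct ctx_model {
	int  preempt_count;      /* non-zero means preemption is disabled */
	bool irqs_disabled;      /* hard IRQs are off */
	bool percpu_thread;      /* task is bound to one CPU, e.g. cpuhp/N */
	bool migration_disabled; /* task is temporarily pinned to its CPU */
};

/*
 * Models lockdep_assert_preemption_disabled(): it only passes when
 * preemption or hard IRQs are disabled.
 */
static bool lockdep_style_check_passes(const struct ctx_model *c)
{
	return c->preempt_count != 0 || c->irqs_disabled;
}

/*
 * Models the CONFIG_DEBUG_PREEMPT check behind __this_cpu_*()
 * (assumed, simplified): it also accepts contexts that are preemptible
 * but cannot be moved to another CPU.
 */
static bool debug_preempt_style_check_passes(const struct ctx_model *c)
{
	return c->preempt_count != 0 || c->irqs_disabled ||
	       c->percpu_thread || c->migration_disabled;
}

/* Walks through the two call paths described in the changelog. */
static void check_model(void)
{
	/* kexec path: shutdown() handler, IRQs are off */
	struct ctx_model kexec_path = { .irqs_disabled = true };
	/* module unload path: CPUHP callback in a preemptible per-cpu thread */
	struct ctx_model cpuhp_path = { .percpu_thread = true };
	/* ordinary preemptible task that may migrate at any time */
	struct ctx_model plain_task = { 0 };

	/* Both checks are satisfied in the kexec path. */
	assert(lockdep_style_check_passes(&kexec_path));
	assert(debug_preempt_style_check_passes(&kexec_path));

	/* The lockdep-style check fires here although the CPU is stable. */
	assert(!lockdep_style_check_passes(&cpuhp_path));
	assert(debug_preempt_style_check_passes(&cpuhp_path));

	/* A truly migratable context is still rejected by both. */
	assert(!lockdep_style_check_passes(&plain_task));
	assert(!debug_preempt_style_check_passes(&plain_task));
}
```

In this model the CPUHP case is the only one where the two checks
disagree, which corresponds to the false positive described in the
changelog.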


---
 arch/x86/virt/vmx/tdx/tdx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 8b8e165a2001..6f6be1df4b78 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1872,9 +1872,7 @@ EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid);
 #ifdef CONFIG_KEXEC_CORE
 void tdx_cpu_flush_cache_for_kexec(void)
 {
-	lockdep_assert_preemption_disabled();
-
-	if (!this_cpu_read(cache_state_incoherent))
+	if (!__this_cpu_read(cache_state_incoherent))
 		return;
 
 	/*
@@ -1883,7 +1881,7 @@ void tdx_cpu_flush_cache_for_kexec(void)
 	 * there should be no more SEAMCALLs on this CPU.
 	 */
 	wbinvd();
-	this_cpu_write(cache_state_incoherent, false);
+	__this_cpu_write(cache_state_incoherent, false);
 }
 EXPORT_SYMBOL_FOR_KVM(tdx_cpu_flush_cache_for_kexec);
 #endif

base-commit: 87d034b5b9f36c66bf02af587fb6935af88ffbf1
-- 
2.53.0