From nobody Sun Feb 8 07:08:28 2026
Reply-To: Sean Christopherson
Date: Thu, 30 Oct 2025 12:15:25 -0700
In-Reply-To: <20251030191528.3380553-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20251030191528.3380553-1-seanjc@google.com>
X-Mailer: git-send-email 2.51.1.930.gacf6e81ea2-goog
Message-ID: <20251030191528.3380553-2-seanjc@google.com>
Subject: [PATCH v5 1/4] KVM: TDX: Explicitly set user-return MSRs that *may* be clobbered by the TDX-Module
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini, "Kirill A. Shutemov"
Cc: kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, Yan Zhao, Xiaoyao Li, Rick Edgecombe, Hou Wenlong
Content-Type: text/plain; charset="utf-8"

Set all user-return MSRs to their post-TD-exit value when preparing to run
a TDX vCPU to ensure the value that KVM expects to be loaded after running
the vCPU is indeed the value that's loaded in hardware. If the TDX-Module
doesn't actually enter the guest, i.e. doesn't do VM-Enter, then it won't
"restore" VMM state, i.e. won't clobber user-return MSRs to their expected
post-run values, in which case simply updating KVM's "cached" value will
effectively corrupt the cache due to hardware still holding the original
value.

In theory, KVM could conditionally update the current user-return value if
and only if tdh_vp_enter() succeeds, but in practice "success" doesn't
guarantee the TDX-Module actually entered the guest, e.g. if the TDX-Module
synthesizes an EPT Violation because it suspects a zero-step attack.

Force-load the expected values instead of trying to decipher whether or
not the TDX-Module restored/clobbered MSRs, as the benefits don't justify
the risk.
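The cache-corruption scenario described above can be modeled in plain
userspace C. This is only a sketch under stated assumptions: hw_msr[],
cached[], update_cache_only() and set_msr() are hypothetical stand-ins, not
KVM's helpers, and the real kvm_set_user_return_msr() additionally takes a
mask (see its signature in the diff) and does bookkeeping that is omitted
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SLOTS 4	/* hypothetical; mirrors the four MSRs discussed here */

/* Hypothetical stand-ins, not kernel data structures. */
static uint64_t hw_msr[NR_SLOTS];	/* what the CPU actually holds */
static uint64_t cached[NR_SLOTS];	/* what KVM believes the CPU holds */

/* Old flow: after the run, record the value the TDX-Module should have written. */
void update_cache_only(unsigned int slot, uint64_t val)
{
	assert(slot < NR_SLOTS);
	cached[slot] = val;
}

/* New flow: before the run, write the value unless the cache says it is loaded. */
void set_msr(unsigned int slot, uint64_t val)
{
	assert(slot < NR_SLOTS);
	if (cached[slot] == val)
		return;
	hw_msr[slot] = val;	/* models WRMSR */
	cached[slot] = val;
}

/* True if the cache matches the modeled hardware for every slot. */
bool cache_matches_hw(void)
{
	for (unsigned int i = 0; i < NR_SLOTS; i++) {
		if (hw_msr[i] != cached[i])
			return false;
	}
	return true;
}
```

If the module never enters the guest, hw_msr[] is left untouched, so
update_cache_only() leaves the cache describing a value hardware never held.
With set_msr() the two already agree before the run, whether or not the
module later writes the same value again.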
Effectively avoiding four WRMSRs once per run loop (even if the vCPU is
scheduled out, user-return MSRs only need to be reloaded if the CPU exits
to userspace or runs a non-TDX vCPU) is likely in the noise when amortized
over all entries, given the cost of running a TDX vCPU. E.g. the cost of
the WRMSRs is somewhere between ~300 and ~500 cycles, whereas the cost of
a _single_ roundtrip to/from a TDX guest is thousands of cycles.

Fixes: e0b4f31a3c65 ("KVM: TDX: restore user ret MSRs")
Cc: stable@vger.kernel.org
Cc: Yan Zhao
Cc: Xiaoyao Li
Cc: Rick Edgecombe
Signed-off-by: Sean Christopherson
Reviewed-by: Xiaoyao Li
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/vmx/tdx.c          | 52 +++++++++++++++------------------
 arch/x86/kvm/vmx/tdx.h          |  1 -
 arch/x86/kvm/x86.c              |  9 ------
 4 files changed, 23 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48598d017d6f..d158dfd1842e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2378,7 +2378,6 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
 int kvm_add_user_return_msr(u32 msr);
 int kvm_find_user_return_msr(u32 msr);
 int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
-void kvm_user_return_msr_update_cache(unsigned int index, u64 val);
 u64 kvm_get_user_return_msr(unsigned int slot);
 
 static inline bool kvm_is_supported_user_return_msr(u32 msr)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 326db9b9c567..cde91a995076 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -763,25 +763,6 @@ static bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	return tdx_vcpu_state_details_intr_pending(vcpu_state_details);
 }
 
-/*
- * Compared to vmx_prepare_switch_to_guest(), there is not much to do
- * as SEAMCALL/SEAMRET calls take care of most of save and restore.
- */
-void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_vt *vt = to_vt(vcpu);
-
-	if (vt->guest_state_loaded)
-		return;
-
-	if (likely(is_64bit_mm(current->mm)))
-		vt->msr_host_kernel_gs_base = current->thread.gsbase;
-	else
-		vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
-
-	vt->guest_state_loaded = true;
-}
-
 struct tdx_uret_msr {
 	u32 msr;
 	unsigned int slot;
@@ -795,19 +776,38 @@ static struct tdx_uret_msr tdx_uret_msrs[] = {
 	{.msr = MSR_TSC_AUX,},
 };
 
-static void tdx_user_return_msr_update_cache(void)
+void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_vt *vt = to_vt(vcpu);
 	int i;
 
+	if (vt->guest_state_loaded)
+		return;
+
+	if (likely(is_64bit_mm(current->mm)))
+		vt->msr_host_kernel_gs_base = current->thread.gsbase;
+	else
+		vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
+
+	vt->guest_state_loaded = true;
+
+	/*
+	 * Explicitly set user-return MSRs that are clobbered by the TDX-Module
+	 * if VP.ENTER succeeds, i.e. on TD-Exit, with the values that would be
+	 * written by the TDX-Module. Don't rely on the TDX-Module to actually
+	 * clobber the MSRs, as the contract is poorly defined and not upheld.
+	 * E.g. the TDX-Module will synthesize an EPT Violation without doing
+	 * VM-Enter if it suspects a zero-step attack, and never "restore" VMM
+	 * state.
+	 */
 	for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++)
-		kvm_user_return_msr_update_cache(tdx_uret_msrs[i].slot,
-						 tdx_uret_msrs[i].defval);
+		kvm_set_user_return_msr(tdx_uret_msrs[i].slot,
+					tdx_uret_msrs[i].defval, -1ull);
 }
 
 static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vt *vt = to_vt(vcpu);
-	struct vcpu_tdx *tdx = to_tdx(vcpu);
 
 	if (!vt->guest_state_loaded)
 		return;
@@ -815,11 +815,6 @@ static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
 	++vcpu->stat.host_state_reload;
 	wrmsrl(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base);
 
-	if (tdx->guest_entered) {
-		tdx_user_return_msr_update_cache();
-		tdx->guest_entered = false;
-	}
-
 	vt->guest_state_loaded = false;
 }
 
@@ -1059,7 +1054,6 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 	update_debugctlmsr(vcpu->arch.host_debugctl);
 
 	tdx_load_host_xsave_state(vcpu);
-	tdx->guest_entered = true;
 
 	vcpu->arch.regs_avail &= TDX_REGS_AVAIL_SET;
 
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index ca39a9391db1..7f258870dc41 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -67,7 +67,6 @@ struct vcpu_tdx {
 	u64 vp_enter_ret;
 
 	enum vcpu_tdx_state state;
-	bool guest_entered;
 
 	u64 map_gpa_next;
 	u64 map_gpa_end;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4b5d2d09634..639589af7cbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -681,15 +681,6 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_user_return_msr);
 
-void kvm_user_return_msr_update_cache(unsigned int slot, u64 value)
-{
-	struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
-
-	msrs->values[slot].curr = value;
-	kvm_user_return_register_notifier(msrs);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_user_return_msr_update_cache);
-
 u64 kvm_get_user_return_msr(unsigned int slot)
 {
 	return this_cpu_ptr(user_return_msrs)->values[slot].curr;
-- 
2.51.1.930.gacf6e81ea2-goog

From nobody Sun Feb 8 07:08:28 2026
Reply-To: Sean Christopherson
Date: Thu, 30 Oct 2025 12:15:26 -0700
In-Reply-To: <20251030191528.3380553-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20251030191528.3380553-1-seanjc@google.com>
X-Mailer: git-send-email 2.51.1.930.gacf6e81ea2-goog
Message-ID: <20251030191528.3380553-3-seanjc@google.com>
Subject: [PATCH v5 2/4] KVM: x86: WARN if user-return MSR notifier is registered on exit
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini, "Kirill A. Shutemov"
Cc: kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, Yan Zhao, Xiaoyao Li, Rick Edgecombe, Hou Wenlong
Content-Type: text/plain; charset="utf-8"

When freeing the per-CPU user-return MSRs structures, WARN if any CPU has
a registered notifier to help detect and/or debug potential use-after-free
issues. The lifecycle of the notifiers is rather convoluted, and has
several non-obvious paths where notifiers are unregistered, i.e. isn't
exactly the most robust code possible.

The notifiers are registered on-demand in KVM, on the first WRMSR to a
tracked register. _Usually_ the notifier is unregistered whenever the CPU
returns to userspace. But because any given CPU isn't guaranteed to return
to userspace, e.g. the CPU could be offlined before doing so, KVM also
"drops", a.k.a. unregisters, the notifiers when virtualization is disabled
on the CPU.

Further complicating the unregister path is the fact that the calls to
disable virtualization come from common KVM, and the per-CPU calls are
guarded by a per-CPU flag (to harden _that_ code against bugs, e.g. due to
mishandling reboot). Reboot/shutdown in particular is problematic, as KVM
disables virtualization via IPI function call, i.e. from IRQ context,
instead of using the cpuhp framework, which runs in task context. I.e. on
reboot/shutdown, drop_user_return_notifiers() is called asynchronously.
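The check performed when freeing the per-CPU structures can be illustrated
with a small userspace model. This is a sketch only: struct uret_state,
uret_alloc() and uret_free() are hypothetical names, a plain array stands in
for per-CPU data, and a message on stderr stands in for the kernel's
WARN_ON_ONCE() (which fires once, whereas this model reports every CPU).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-CPU state; only the flag relevant to the check is modeled. */
struct uret_state {
	bool registered;	/* is a user-return notifier registered on this CPU? */
};

/* Allocate zeroed state for nr_cpus CPUs, i.e. with no notifier registered. */
struct uret_state *uret_alloc(unsigned int nr_cpus)
{
	return calloc(nr_cpus, sizeof(struct uret_state));
}

/*
 * Free the state, complaining about every CPU that still has a notifier
 * registered. Returns the number of such CPUs; a non-zero result means a
 * notifier could later run against freed memory.
 */
unsigned int uret_free(struct uret_state *state, unsigned int nr_cpus)
{
	unsigned int stale = 0;

	assert(state);
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
		if (state[cpu].registered) {
			fprintf(stderr, "WARN: notifier still registered on CPU %u\n", cpu);
			stale++;
		}
	}
	free(state);
	return stale;
}
```

The model only detects the stale registration, as the patch does. It does not
try to unregister anything, since the interesting failure is that some
unregister path was missed earlier.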
Forced reboot/shutdown is the most problematic scenario, as userspace tasks
are not frozen before kvm_shutdown() is invoked, i.e. KVM could be actively
manipulating the user-return MSR lists and/or notifiers when the IPI
arrives. To a certain extent, all bets are off when userspace forces a
reboot/shutdown, but KVM should at least avoid a use-after-free, e.g. to
avoid crashing the kernel when trying to reboot.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 639589af7cbe..bb7a7515f280 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -575,6 +575,27 @@ static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
 		vcpu->arch.apf.gfns[i] = ~0;
 }
 
+static int kvm_init_user_return_msrs(void)
+{
+	user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
+	if (!user_return_msrs) {
+		pr_err("failed to allocate percpu user_return_msrs\n");
+		return -ENOMEM;
+	}
+	kvm_nr_uret_msrs = 0;
+	return 0;
+}
+
+static void kvm_free_user_return_msrs(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		WARN_ON_ONCE(per_cpu_ptr(user_return_msrs, cpu)->registered);
+
+	free_percpu(user_return_msrs);
+}
+
 static void kvm_on_user_return(struct user_return_notifier *urn)
 {
 	unsigned slot;
@@ -10023,13 +10044,9 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		return -ENOMEM;
 	}
 
-	user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
-	if (!user_return_msrs) {
-		pr_err("failed to allocate percpu kvm_user_return_msrs\n");
-		r = -ENOMEM;
+	r = kvm_init_user_return_msrs();
+	if (r)
 		goto out_free_x86_emulator_cache;
-	}
-	kvm_nr_uret_msrs = 0;
 
 	r = kvm_mmu_vendor_module_init();
 	if (r)
@@ -10132,7 +10149,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 out_mmu_exit:
 	kvm_mmu_vendor_module_exit();
 out_free_percpu:
-	free_percpu(user_return_msrs);
+	kvm_free_user_return_msrs();
 out_free_x86_emulator_cache:
 	kmem_cache_destroy(x86_emulator_cache);
 	return r;
@@ -10161,7 +10178,7 @@ void kvm_x86_vendor_exit(void)
 #endif
 	kvm_x86_call(hardware_unsetup)();
 	kvm_mmu_vendor_module_exit();
-	free_percpu(user_return_msrs);
+	kvm_free_user_return_msrs();
 	kmem_cache_destroy(x86_emulator_cache);
 #ifdef CONFIG_KVM_XEN
 	static_key_deferred_flush(&kvm_xen_enabled);
-- 
2.51.1.930.gacf6e81ea2-goog

From nobody Sun Feb 8 07:08:28 2026
Reply-To: Sean Christopherson
Date: Thu, 30 Oct 2025 12:15:27 -0700
In-Reply-To: <20251030191528.3380553-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20251030191528.3380553-1-seanjc@google.com>
X-Mailer: git-send-email 2.51.1.930.gacf6e81ea2-goog
Message-ID: <20251030191528.3380553-4-seanjc@google.com>
Subject: [PATCH v5 3/4] KVM: x86: Leave user-return notifier registered on reboot/shutdown
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini, "Kirill A. Shutemov"
Cc: kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, Yan Zhao, Xiaoyao Li, Rick Edgecombe, Hou Wenlong
Content-Type: text/plain; charset="utf-8"

Leave KVM's user-return notifier registered in the unlikely case that the
notifier is registered when disabling virtualization via IPI callback in
response to reboot/shutdown. On reboot/shutdown, keeping the notifier
registered is ok as far as MSR state is concerned (arguably better than
restoring MSRs at an unknown point in time), as the callback will run
cleanly and restore host MSRs if the CPU manages to return to userspace
before the system goes down.

The only wrinkle is that if kvm.ko module unload manages to race with
reboot/shutdown, then leaving the notifier registered could lead to
use-after-free due to calling into unloaded kvm.ko module code.
But such a race is only possible on --forced reboot/shutdown, because
otherwise userspace tasks would be frozen before kvm_shutdown() is called,
i.e. on a "normal" reboot/shutdown, it should be impossible for the CPU to
return to userspace after kvm_shutdown().

Furthermore, on a --forced reboot/shutdown, unregistering the user-return
hook from IRQ context doesn't fully guard against use-after-free, because
KVM could immediately re-register the hook, e.g. if the IRQ arrives before
kvm_user_return_register_notifier() is called.

Rather than trying to guard against the IPI in the "normal" user-return
code, which is difficult and noisy, simply leave the user-return notifier
registered on a reboot, and bump the kvm.ko module refcount to defend
against a use-after-free due to kvm.ko unload racing against reboot.

Alternatively, KVM could allow kvm.ko unload and try to drop the notifiers
during kvm_x86_exit(), but that's also a can of worms as registration is
per-CPU, and so KVM would need to blast an IPI, and doing so while a
reboot/shutdown is in-progress is far riskier than preventing userspace
from unloading KVM.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb7a7515f280..c927326344b1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13086,7 +13086,21 @@ int kvm_arch_enable_virtualization_cpu(void)
 void kvm_arch_disable_virtualization_cpu(void)
 {
 	kvm_x86_call(disable_virtualization_cpu)();
-	drop_user_return_notifiers();
+
+	/*
+	 * Leave the user-return notifiers as-is when disabling virtualization
+	 * for reboot, i.e. when disabling via IPI function call, and instead
+	 * pin kvm.ko (if it's a module) to defend against use-after-free (in
+	 * the *very* unlikely scenario module unload is racing with reboot).
+	 * On a forced reboot, tasks aren't frozen before shutdown, and so KVM
+	 * could be actively modifying user-return MSR state when the IPI to
+	 * disable virtualization arrives. Handle the extreme edge case here
+	 * instead of trying to account for it in the normal flows.
+	 */
+	if (in_task() || WARN_ON_ONCE(!kvm_rebooting))
+		drop_user_return_notifiers();
+	else
+		__module_get(THIS_MODULE);
 }
 
 bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
-- 
2.51.1.930.gacf6e81ea2-goog

From nobody Sun Feb 8 07:08:28 2026
Reply-To: Sean Christopherson
Date: Thu, 30 Oct 2025 12:15:28 -0700
In-Reply-To: <20251030191528.3380553-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20251030191528.3380553-1-seanjc@google.com>
X-Mailer: git-send-email 2.51.1.930.gacf6e81ea2-goog
Message-ID: <20251030191528.3380553-5-seanjc@google.com>
Subject: [PATCH v5 4/4] KVM: x86: Don't disable IRQs when unregistering user-return notifier
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini, "Kirill A. Shutemov"
Cc: kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, Yan Zhao, Xiaoyao Li, Rick Edgecombe, Hou Wenlong
Content-Type: text/plain; charset="utf-8"

From: Hou Wenlong

Remove the code to disable IRQs when unregistering KVM's user-return
notifier now that KVM doesn't invoke kvm_on_user_return() when disabling
virtualization via IPI function call, i.e. now that there's no need to
guard against re-entrancy via IPI callback.

Note, disabling IRQs has largely been unnecessary since commit
a377ac1cd9d7b ("x86/entry: Move user return notifier out of loop") moved
fire_user_return_notifiers() into the section with IRQs disabled. In doing
so, the commit somewhat inadvertently fixed the underlying issue that was
papered over by commit 1650b4ebc99d ("KVM: Disable irq while unregistering
user notifier"). I.e. in practice, the code and comment have been stale
since commit a377ac1cd9d7b.
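The simplified callback can be modeled in userspace roughly as follows. This
is a sketch under assumptions: the struct and function names are
hypothetical, hw[] stands in for the physical MSRs, and the body of the
restore loop (write the host value, then update curr) is assumed rather than
taken from the diff, which only shows the loop's first lines as context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SLOTS 4	/* hypothetical slot count */

struct msr_values {
	uint64_t host;	/* value the host expects to be loaded */
	uint64_t curr;	/* value believed to be loaded right now */
};

struct uret_msrs {
	bool registered;
	struct msr_values values[NR_SLOTS];
};

/*
 * Model of the callback after this change: clear the flag unconditionally
 * (no IRQ save/restore, no "if registered" check), then restore the host
 * value of every slot that differs. Returns the number of modeled writes.
 */
unsigned int on_user_return(struct uret_msrs *msrs, uint64_t hw[NR_SLOTS])
{
	unsigned int writes = 0;

	assert(msrs && hw);
	msrs->registered = false;

	for (unsigned int slot = 0; slot < NR_SLOTS; slot++) {
		struct msr_values *v = &msrs->values[slot];

		if (v->host != v->curr) {
			hw[slot] = v->host;	/* models WRMSR (assumed step) */
			v->curr = v->host;
			writes++;
		}
	}
	return writes;
}
```

Calling the model a second time performs no writes, since every slot already
holds its host value. The unconditional unregister is only sound because, per
the changelog above, nothing can re-enter the callback from IPI context any
more.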
Signed-off-by: Hou Wenlong
[sean: rewrite changelog after rebasing, drop lockdep assert]
Signed-off-by: Sean Christopherson
Reviewed-by: Kai Huang
---
 arch/x86/kvm/x86.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c927326344b1..719a5fa45eb1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -602,18 +602,10 @@ static void kvm_on_user_return(struct user_return_notifier *urn)
 	struct kvm_user_return_msrs *msrs
 		= container_of(urn, struct kvm_user_return_msrs, urn);
 	struct kvm_user_return_msr_values *values;
-	unsigned long flags;
 
-	/*
-	 * Disabling irqs at this point since the following code could be
-	 * interrupted and executed through kvm_arch_disable_virtualization_cpu()
-	 */
-	local_irq_save(flags);
-	if (msrs->registered) {
-		msrs->registered = false;
-		user_return_notifier_unregister(urn);
-	}
-	local_irq_restore(flags);
+	msrs->registered = false;
+	user_return_notifier_unregister(urn);
+
 	for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
 		values = &msrs->values[slot];
 		if (values->host != values->curr) {
-- 
2.51.1.930.gacf6e81ea2-goog