Date: Fri, 13 Feb 2026 17:26:47 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
References: <20260214012702.2368778-1-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Message-ID: <20260214012702.2368778-2-seanjc@google.com>
Subject: [PATCH v3 01/16] KVM: x86: Move kvm_rebooting to x86
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams

Move kvm_rebooting, which is only read by x86, to KVM x86 so that it can
be moved again to core x86 code.  Add a "shutdown" arch hook to facilitate
setting the flag in KVM x86, along with a pile of comments to provide more
context around what KVM x86 is doing and why.

Reviewed-by: Chao Gao
Acked-by: Dave Hansen
Signed-off-by: Sean Christopherson
Reviewed-by: Dan Williams
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 arch/x86/kvm/x86.c       | 22 ++++++++++++++++++++++
 arch/x86/kvm/x86.h       |  1 +
 include/linux/kvm_host.h |  8 +++++++-
 virt/kvm/kvm_main.c      | 14 +++++++-------
 4 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..77edc24f8309 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -700,6 +700,9 @@ static void drop_user_return_notifiers(void)
 	kvm_on_user_return(&msrs->urn);
 }
 
+__visible bool kvm_rebooting;
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_rebooting);
+
 /*
  * Handle a fault on a hardware virtualization (VMX or SVM) instruction.
  *
@@ -13178,6 +13181,25 @@ int kvm_arch_enable_virtualization_cpu(void)
 	return 0;
 }
 
+void kvm_arch_shutdown(void)
+{
+	/*
+	 * Set kvm_rebooting to indicate that KVM has asynchronously disabled
+	 * hardware virtualization, i.e. that errors and/or exceptions on SVM
+	 * and VMX instructions are expected and should be ignored.
+	 */
+	kvm_rebooting = true;
+
+	/*
+	 * Ensure kvm_rebooting is visible before IPIs are sent to other CPUs
+	 * to disable virtualization.  Effectively pairs with the reception of
+	 * the IPI (kvm_rebooting is read in task/exception context, but only
+	 * _needs_ to be read as %true after the IPI function callback disables
+	 * virtualization).
+	 */
+	smp_wmb();
+}
+
 void kvm_arch_disable_virtualization_cpu(void)
 {
 	kvm_x86_call(disable_virtualization_cpu)();
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 94d4f07aaaa0..b314649e5c02 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -54,6 +54,7 @@ struct kvm_host_values {
 	u64 arch_capabilities;
 };
 
+extern bool kvm_rebooting;
 void kvm_spurious_fault(void);
 
 #define SIZE_OF_MEMSLOTS_HASHTABLE \
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2c7d76262898..981b55c0a3a7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1630,6 +1630,13 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
 #endif
 
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
+/*
+ * kvm_arch_shutdown() is invoked immediately prior to forcefully disabling
+ * hardware virtualization on all CPUs via IPI function calls (in preparation
+ * for shutdown or reboot), e.g. to allow arch code to prepare for disabling
+ * virtualization while KVM may be actively running vCPUs.
+ */
+void kvm_arch_shutdown(void);
 /*
  * kvm_arch_{enable,disable}_virtualization() are called on one CPU, under
  * kvm_usage_lock, immediately after/before 0=>1 and 1=>0 transitions of
@@ -2305,7 +2312,6 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
 
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
 extern bool enable_virt_at_load;
-extern bool kvm_rebooting;
 #endif
 
 extern unsigned int halt_poll_ns;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 571cf0d6ec01..e081e7244299 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5593,13 +5593,15 @@ bool enable_virt_at_load = true;
 module_param(enable_virt_at_load, bool, 0444);
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(enable_virt_at_load);
 
-__visible bool kvm_rebooting;
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_rebooting);
-
 static DEFINE_PER_CPU(bool, virtualization_enabled);
 static DEFINE_MUTEX(kvm_usage_lock);
 static int kvm_usage_count;
 
+__weak void kvm_arch_shutdown(void)
+{
+
+}
+
 __weak void kvm_arch_enable_virtualization(void)
 {
 
@@ -5653,10 +5655,9 @@ static int kvm_offline_cpu(unsigned int cpu)
 
 static void kvm_shutdown(void *data)
 {
+	kvm_arch_shutdown();
+
 	/*
-	 * Disable hardware virtualization and set kvm_rebooting to indicate
-	 * that KVM has asynchronously disabled hardware virtualization, i.e.
-	 * that relevant errors and exceptions aren't entirely unexpected.
 	 * Some flavors of hardware virtualization need to be disabled before
 	 * transferring control to firmware (to perform shutdown/reboot), e.g.
 	 * on x86, virtualization can block INIT interrupts, which are used by
@@ -5665,7 +5666,6 @@ static void kvm_shutdown(void *data)
 	 * 100% comprehensive.
 	 */
 	pr_info("kvm: exiting hardware virtualization\n");
-	kvm_rebooting = true;
 	on_each_cpu(kvm_disable_virtualization_cpu, NULL, 1);
 }
 
-- 
2.53.0.310.g728cabbaf7-goog
Date: Fri, 13 Feb 2026 17:26:48 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260214012702.2368778-1-seanjc@google.com>
Message-ID: <20260214012702.2368778-3-seanjc@google.com>
Subject: [PATCH v3 02/16] KVM: VMX: Move architectural "vmcs" and "vmcs_hdr" structures to public vmx.h
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams

Move "struct vmcs" and "struct vmcs_hdr" to asm/vmx.h in anticipation of
moving VMXON/VMXOFF to the core kernel (VMXON requires a "root" VMCS with
the appropriate revision ID in its header).

No functional change intended.
Signed-off-by: Sean Christopherson
Reviewed-by: Dan Williams
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 arch/x86/include/asm/vmx.h | 11 +++++++++++
 arch/x86/kvm/vmx/vmcs.h    | 11 -----------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index b92ff87e3560..37080382df54 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -20,6 +20,17 @@
 #include
 #include
 
+struct vmcs_hdr {
+	u32 revision_id:31;
+	u32 shadow_vmcs:1;
+};
+
+struct vmcs {
+	struct vmcs_hdr hdr;
+	u32 abort;
+	char data[];
+};
+
 #define VMCS_CONTROL_BIT(x)	BIT(VMX_FEATURE_##x & 0x1f)
 
 /*
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index 66d747e265b1..1f16ddeae9cb 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -22,17 +22,6 @@
 #define VMCS12_IDX_TO_ENC(idx)	ROL16(idx, 10)
 #define ENC_TO_VMCS12_IDX(enc)	ROL16(enc, 6)
 
-struct vmcs_hdr {
-	u32 revision_id:31;
-	u32 shadow_vmcs:1;
-};
-
-struct vmcs {
-	struct vmcs_hdr hdr;
-	u32 abort;
-	char data[];
-};
-
 DECLARE_PER_CPU(struct vmcs *, current_vmcs);
 
 /*
-- 
2.53.0.310.g728cabbaf7-goog
Date: Fri, 13 Feb 2026 17:26:49 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260214012702.2368778-1-seanjc@google.com>
Message-ID: <20260214012702.2368778-4-seanjc@google.com>
Subject: [PATCH v3 03/16] KVM: x86: Move "kvm_rebooting" to kernel as "virt_rebooting"
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams

Move "kvm_rebooting" to the kernel, exported for KVM, as one of many
steps towards extracting the innermost VMXON and EFER.SVME management
logic out of KVM and into core x86.
For lack of a better name, call the new file "hw.c", to yield "virt
hardware" when combined with its parent directory.

No functional change intended.

Signed-off-by: Sean Christopherson
Reviewed-by: Dan Williams
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 arch/x86/include/asm/virt.h | 11 +++++++++++
 arch/x86/kvm/svm/svm.c      |  3 ++-
 arch/x86/kvm/svm/vmenter.S  | 10 +++++-----
 arch/x86/kvm/vmx/tdx.c      |  3 ++-
 arch/x86/kvm/vmx/vmenter.S  |  2 +-
 arch/x86/kvm/vmx/vmx.c      |  5 +++--
 arch/x86/kvm/x86.c          | 17 +++++++++--------
 arch/x86/kvm/x86.h          |  1 -
 arch/x86/virt/Makefile      |  2 ++
 arch/x86/virt/hw.c          |  7 +++++++
 10 files changed, 41 insertions(+), 20 deletions(-)
 create mode 100644 arch/x86/include/asm/virt.h
 create mode 100644 arch/x86/virt/hw.c

diff --git a/arch/x86/include/asm/virt.h b/arch/x86/include/asm/virt.h
new file mode 100644
index 000000000000..131b9bf9ef3c
--- /dev/null
+++ b/arch/x86/include/asm/virt.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_X86_VIRT_H
+#define _ASM_X86_VIRT_H
+
+#include
+
+#if IS_ENABLED(CONFIG_KVM_X86)
+extern bool virt_rebooting;
+#endif
+
+#endif /* _ASM_X86_VIRT_H */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..0ae66c770ebc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -495,7 +496,7 @@ static inline void kvm_cpu_svm_disable(void)
 
 static void svm_emergency_disable_virtualization_cpu(void)
 {
-	kvm_rebooting = true;
+	virt_rebooting = true;
 
 	kvm_cpu_svm_disable();
 }
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 3392bcadfb89..d47c5c93c991 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -298,16 +298,16 @@ SYM_FUNC_START(__svm_vcpu_run)
 	RESTORE_GUEST_SPEC_CTRL_BODY
 	RESTORE_HOST_SPEC_CTRL_BODY (%_ASM_SP)
 
-10:	cmpb $0, _ASM_RIP(kvm_rebooting)
+10:	cmpb $0, _ASM_RIP(virt_rebooting)
 	jne 2b
 	ud2
-30:	cmpb $0, _ASM_RIP(kvm_rebooting)
+30:	cmpb $0, _ASM_RIP(virt_rebooting)
 	jne 4b
 	ud2
-50:	cmpb $0, _ASM_RIP(kvm_rebooting)
+50:	cmpb $0, _ASM_RIP(virt_rebooting)
 	jne 6b
 	ud2
-70:	cmpb $0, _ASM_RIP(kvm_rebooting)
+70:	cmpb $0, _ASM_RIP(virt_rebooting)
 	jne 8b
 	ud2
 
@@ -394,7 +394,7 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
 	RESTORE_GUEST_SPEC_CTRL_BODY
 	RESTORE_HOST_SPEC_CTRL_BODY %sil
 
-3:	cmpb $0, kvm_rebooting(%rip)
+3:	cmpb $0, virt_rebooting(%rip)
 	jne 2b
 	ud2
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 5df9d32d2058..0c790eb0bfa6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include "capabilities.h"
 #include "mmu.h"
 #include "x86_ops.h"
@@ -1994,7 +1995,7 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	 * TDX_SEAMCALL_VMFAILINVALID.
 	 */
 	if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
-		KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
+		KVM_BUG_ON(!virt_rebooting, vcpu->kvm);
 		goto unhandled_exit;
 	}
 
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 4426d34811fc..8a481dae9cae 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -310,7 +310,7 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL)
 	RET
 
 .Lfixup:
-	cmpb $0, _ASM_RIP(kvm_rebooting)
+	cmpb $0, _ASM_RIP(virt_rebooting)
 	jne .Lvmfail
 	ud2
 .Lvmfail:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 967b58a8ab9d..fc6e3b620866 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -48,6 +48,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include
@@ -814,13 +815,13 @@ void vmx_emergency_disable_virtualization_cpu(void)
 	int cpu = raw_smp_processor_id();
 	struct loaded_vmcs *v;
 
-	kvm_rebooting = true;
+	virt_rebooting = true;
 
 	/*
 	 * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be
 	 * set in task context.  If this races with VMX is disabled by an NMI,
 	 * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to
-	 * kvm_rebooting set.
+	 * virt_rebooting set.
 	 */
 	if (!(__read_cr4() & X86_CR4_VMXE))
 		return;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77edc24f8309..69937d14f5e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -83,6 +83,8 @@
 #include
 #include
 #include
+#include
+
 #include
 
 #define CREATE_TRACE_POINTS
@@ -700,9 +702,6 @@ static void drop_user_return_notifiers(void)
 	kvm_on_user_return(&msrs->urn);
 }
 
-__visible bool kvm_rebooting;
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_rebooting);
-
 /*
  * Handle a fault on a hardware virtualization (VMX or SVM) instruction.
  *
@@ -713,7 +712,7 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_rebooting);
 noinstr void kvm_spurious_fault(void)
 {
 	/* Fault while not rebooting.  We want the trace. */
-	BUG_ON(!kvm_rebooting);
+	BUG_ON(!virt_rebooting);
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_spurious_fault);
 
@@ -13184,16 +13183,16 @@ int kvm_arch_enable_virtualization_cpu(void)
 void kvm_arch_shutdown(void)
 {
 	/*
-	 * Set kvm_rebooting to indicate that KVM has asynchronously disabled
+	 * Set virt_rebooting to indicate that KVM has asynchronously disabled
 	 * hardware virtualization, i.e. that errors and/or exceptions on SVM
 	 * and VMX instructions are expected and should be ignored.
 	 */
-	kvm_rebooting = true;
+	virt_rebooting = true;
 
 	/*
-	 * Ensure kvm_rebooting is visible before IPIs are sent to other CPUs
+	 * Ensure virt_rebooting is visible before IPIs are sent to other CPUs
 	 * to disable virtualization.  Effectively pairs with the reception of
-	 * the IPI (kvm_rebooting is read in task/exception context, but only
+	 * the IPI (virt_rebooting is read in task/exception context, but only
 	 * _needs_ to be read as %true after the IPI function callback disables
 	 * virtualization).
 	 */
@@ -13214,7 +13213,7 @@ void kvm_arch_disable_virtualization_cpu(void)
 	 * disable virtualization arrives.  Handle the extreme edge case here
 	 * instead of trying to account for it in the normal flows.
 	 */
-	if (in_task() || WARN_ON_ONCE(!kvm_rebooting))
+	if (in_task() || WARN_ON_ONCE(!virt_rebooting))
 		drop_user_return_notifiers();
 	else
 		__module_get(THIS_MODULE);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index b314649e5c02..94d4f07aaaa0 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -54,7 +54,6 @@ struct kvm_host_values {
 	u64 arch_capabilities;
 };
 
-extern bool kvm_rebooting;
 void kvm_spurious_fault(void);
 
 #define SIZE_OF_MEMSLOTS_HASHTABLE \
diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
index ea343fc392dc..6e485751650c 100644
--- a/arch/x86/virt/Makefile
+++ b/arch/x86/virt/Makefile
@@ -1,2 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-y += svm/ vmx/
+
+obj-$(subst m,y,$(CONFIG_KVM_X86)) += hw.o
\ No newline at end of file
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
new file mode 100644
index 000000000000..df3dc18d19b4
--- /dev/null
+++ b/arch/x86/virt/hw.c
@@ -0,0 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include
+
+#include
+
+__visible bool virt_rebooting;
+EXPORT_SYMBOL_FOR_KVM(virt_rebooting);
-- 
2.53.0.310.g728cabbaf7-goog
Date: Fri, 13 Feb 2026 17:26:50 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260214012702.2368778-1-seanjc@google.com>
Message-ID: <20260214012702.2368778-5-seanjc@google.com>
Subject: [PATCH v3 04/16] KVM: VMX: Unconditionally allocate root VMCSes during boot CPU bringup
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams

Allocate the root VMCS (misleadingly called "vmxarea" and "kvm_area" in
KVM) for each possible CPU during
early boot CPU bringup, before early TDX initialization, so that TDX can
eventually do VMXON on-demand (to make SEAMCALLs) without needing to load
kvm-intel.ko.  Allocate the pages early on, e.g. instead of trying to do
so on-demand, to avoid having to juggle allocation failures at runtime.

Opportunistically rename the per-CPU pointers to better reflect the role
of the VMCS.  Use Intel's "root VMCS" terminology, e.g. from various VMCS
patents[1][2] and older SDMs, not the more opaque "VMXON region" used in
recent versions of the SDM.  While it's possible the VMCS passed to VMXON
no longer serves as _the_ root VMCS on modern CPUs, it is still in effect
a "root mode VMCS", as described in the patents.

Link: https://patentimages.storage.googleapis.com/c7/e4/32/d7a7def5580667/WO2013101191A1.pdf [1]
Link: https://patentimages.storage.googleapis.com/13/f6/8d/1361fab8c33373/US20080163205A1.pdf [2]
Signed-off-by: Sean Christopherson
Reviewed-by: Dan Williams
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 arch/x86/include/asm/virt.h  | 13 ++++++-
 arch/x86/kernel/cpu/common.c |  2 +
 arch/x86/kvm/vmx/vmx.c       | 58 ++---------------------------
 arch/x86/virt/hw.c           | 71 ++++++++++++++++++++++++++++++++++++
 4 files changed, 89 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/virt.h b/arch/x86/include/asm/virt.h
index 131b9bf9ef3c..0da6db4f5b0c 100644
--- a/arch/x86/include/asm/virt.h
+++ b/arch/x86/include/asm/virt.h
@@ -2,10 +2,21 @@
 #ifndef _ASM_X86_VIRT_H
 #define _ASM_X86_VIRT_H
 
-#include
+#include
+
+#include
 
 #if IS_ENABLED(CONFIG_KVM_X86)
 extern bool virt_rebooting;
+
+void __init x86_virt_init(void);
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+DECLARE_PER_CPU(struct vmcs *, root_vmcs);
+#endif
+
+#else
+static __always_inline void x86_virt_init(void) {}
 #endif
 
 #endif /* _ASM_X86_VIRT_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e7ab22fce3b5..dda9e41292db 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -71,6 +71,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -2143,6 +2144,7 @@ static __init void identify_boot_cpu(void)
 	cpu_detect_tlb(&boot_cpu_data);
 	setup_cr_pinning();
 
+	x86_virt_init();
 	tsx_init();
 	tdx_init();
 	lkgs_init();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fc6e3b620866..abd4830f71d8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -580,7 +580,6 @@ noinline void invept_error(unsigned long ext, u64 eptp)
 	vmx_insn_failed("invept failed: ext=0x%lx eptp=%llx\n", ext, eptp);
 }
 
-static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 /*
  * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
@@ -2934,6 +2933,9 @@ static bool __kvm_is_vmx_supported(void)
 		return false;
 	}
 
+	if (!per_cpu(root_vmcs, cpu))
+		return false;
+
 	return true;
 }
 
@@ -3008,7 +3010,7 @@ static int kvm_cpu_vmxon(u64 vmxon_pointer)
 int vmx_enable_virtualization_cpu(void)
 {
 	int cpu = raw_smp_processor_id();
-	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
+	u64 phys_addr = __pa(per_cpu(root_vmcs, cpu));
 	int r;
 
 	if (cr4_read_shadow() & X86_CR4_VMXE)
@@ -3129,47 +3131,6 @@ int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
 	return -ENOMEM;
 }
 
-static void free_kvm_area(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		free_vmcs(per_cpu(vmxarea, cpu));
-		per_cpu(vmxarea, cpu) = NULL;
-	}
-}
-
-static __init int alloc_kvm_area(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct vmcs *vmcs;
-
-		vmcs = alloc_vmcs_cpu(false, cpu, GFP_KERNEL);
-		if (!vmcs) {
-			free_kvm_area();
-			return -ENOMEM;
-		}
-
-		/*
-		 * When eVMCS is enabled, alloc_vmcs_cpu() sets
-		 * vmcs->revision_id to KVM_EVMCS_VERSION instead of
-		 * revision_id reported by MSR_IA32_VMX_BASIC.
-		 *
-		 * However, even though not explicitly documented by
-		 * TLFS, VMXArea passed as VMXON argument should
-		 * still be marked with revision_id reported by
-		 * physical CPU.
-		 */
-		if (kvm_is_using_evmcs())
-			vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic);
-
-		per_cpu(vmxarea, cpu) = vmcs;
-	}
-	return 0;
-}
-
 static void fix_pmode_seg(struct kvm_vcpu *vcpu, int seg,
 			  struct kvm_segment *save)
 {
@@ -8566,8 +8527,6 @@ void vmx_hardware_unsetup(void)
 
 	if (nested)
 		nested_vmx_hardware_unsetup();
-
-	free_kvm_area();
 }
 
 void vmx_vm_destroy(struct kvm *kvm)
@@ -8870,10 +8829,6 @@ __init int vmx_hardware_setup(void)
 		return r;
 	}
 
-	r = alloc_kvm_area();
-	if (r)
-		goto err_kvm_area;
-
 	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
 
 	/*
@@ -8900,11 +8855,6 @@ __init int vmx_hardware_setup(void)
 	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
 
 	return 0;
-
-err_kvm_area:
-	if (nested)
-		nested_vmx_hardware_unsetup();
-	return r;
 }
 
 void vmx_exit(void)
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index df3dc18d19b4..56972f594d90 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -1,7 +1,78 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include
+#include
+#include
 #include
+#include
+#include
 
+#include
+#include
 #include
+#include
 
 __visible bool virt_rebooting;
 EXPORT_SYMBOL_FOR_KVM(virt_rebooting);
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+DEFINE_PER_CPU(struct vmcs *, root_vmcs);
+EXPORT_PER_CPU_SYMBOL(root_vmcs);
+
+static __init void x86_vmx_exit(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		free_page((unsigned long)per_cpu(root_vmcs, cpu));
+		per_cpu(root_vmcs, cpu) = NULL;
+	}
+}
+
+static __init int x86_vmx_init(void)
+{
+	u64 basic_msr;
+	u32 rev_id;
+	int cpu;
+
+	if (!cpu_feature_enabled(X86_FEATURE_VMX))
+		return -EOPNOTSUPP;
+
+	rdmsrq(MSR_IA32_VMX_BASIC, basic_msr);
+
+	/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB.
*/ + if (WARN_ON_ONCE(vmx_basic_vmcs_size(basic_msr) > PAGE_SIZE)) + return -EIO; + + /* + * Even if eVMCS is enabled (or will be enabled?), and even though not + * explicitly documented by TLFS, the root VMCS passed to VMXON should + * still be marked with the revision_id reported by the physical CPU. + */ + rev_id =3D vmx_basic_vmcs_revision_id(basic_msr); + + for_each_possible_cpu(cpu) { + int node =3D cpu_to_node(cpu); + struct page *page; + struct vmcs *vmcs; + + page =3D __alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0); + if (!page) { + x86_vmx_exit(); + return -ENOMEM; + } + + vmcs =3D page_address(page); + vmcs->hdr.revision_id =3D rev_id; + per_cpu(root_vmcs, cpu) =3D vmcs; + } + + return 0; +} +#else +static __init int x86_vmx_init(void) { return -EOPNOTSUPP; } +#endif + +void __init x86_virt_init(void) +{ + x86_vmx_init(); +} --=20 2.53.0.310.g728cabbaf7-goog From nobody Thu Apr 2 22:23:03 2026 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0547628506B for ; Sat, 14 Feb 2026 01:27:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771032440; cv=none; b=aYWR+XlKBjteiXyNvuQMWhZTFiEavKC2n1En74IY4C5PMbUlwvENyPSCb1qyVEyEWDexE6qkZw0kRAxPv1wiprCL8cB65gm4tsNqrFWIPzT7vfNNPQK9x2p6NxZRP0yYTLvkzTvoTl884gSKZwf/WWlT1vtKuSqAYXpOcmWqplo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771032440; c=relaxed/simple; bh=KxQaeOFTUwNddWWAwysEkVI+6gLN8UxXhcnpjxklb+A=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=n3dkQRdwZL+a3kjCsVrarXM4rxJy2njkuWdDM1Dpi4MAoq1EJv7KvHma7mc8yteF9jm8mDrPzk5xmD9aH/f0xLOroOah9mrEMYHgltvYf6c10uqkmYm5mJa2lucxBG7XRV/FcABWS7Bsph93jZ+jf7eh6WM5WacO/7NyBYM1NiM= 
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:26:51 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
Mime-Version: 1.0
References: <20260214012702.2368778-1-seanjc@google.com>
X-Mailer: git-send-email 2.53.0.310.g728cabbaf7-goog
Message-ID: <20260214012702.2368778-6-seanjc@google.com>
Subject: [PATCH v3 05/16] x86/virt: Force-clear X86_FEATURE_VMX if configuring root VMCS fails
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

If allocating and configuring a root VMCS fails, clear X86_FEATURE_VMX on all CPUs so that KVM doesn't need to manually check root_vmcs. As added bonuses, clearing VMX will reflect that VMX is unusable in /proc/cpuinfo, and will avoid a futile auto-probe of kvm-intel.ko. WARN if allocating a root VMCS page fails, e.g.
to help users figure out why VMX is broken in the unlikely scenario something goes sideways during boot (and because the allocation should succeed unless there's a kernel bug). Tweak KVM's error message to suggest checking kernel logs if VMX is unsupported (in addition to checking BIOS). Signed-off-by: Sean Christopherson Reviewed-by: Dan Williams Tested-by: Chao Gao Tested-by: Sagi Shahar --- arch/x86/kvm/vmx/vmx.c | 7 ++++--- arch/x86/virt/hw.c | 14 ++++++++++++-- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index abd4830f71d8..e767835a4f3a 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2927,14 +2927,15 @@ static bool __kvm_is_vmx_supported(void) return false; } =20 - if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) || - !this_cpu_has(X86_FEATURE_VMX)) { + if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL)) { pr_err("VMX not enabled (by BIOS) in MSR_IA32_FEAT_CTL on CPU %d\n", cpu= ); return false; } =20 - if (!per_cpu(root_vmcs, cpu)) + if (!this_cpu_has(X86_FEATURE_VMX)) { + pr_err("VMX not fully enabled on CPU %d. 
Check kernel logs and/or BIOS\n", cpu); return false; + } =20 return true; } diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c index 56972f594d90..40495872fdfb 100644 --- a/arch/x86/virt/hw.c +++ b/arch/x86/virt/hw.c @@ -28,7 +28,7 @@ static __init void x86_vmx_exit(void) } } =20 -static __init int x86_vmx_init(void) +static __init int __x86_vmx_init(void) { u64 basic_msr; u32 rev_id; @@ -56,7 +56,7 @@ static __init int x86_vmx_init(void) struct vmcs *vmcs; =20 page =3D __alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0); - if (!page) { + if (WARN_ON_ONCE(!page)) { x86_vmx_exit(); return -ENOMEM; } @@ -68,6 +68,16 @@ static __init int x86_vmx_init(void) =20 return 0; } + +static __init int x86_vmx_init(void) +{ + int r; + + r =3D __x86_vmx_init(); + if (r) + setup_clear_cpu_cap(X86_FEATURE_VMX); + return r; +} #else static __init int x86_vmx_init(void) { return -EOPNOTSUPP; } #endif --=20 2.53.0.310.g728cabbaf7-goog

From nobody Thu Apr 2 22:23:03 2026
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:26:52 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
Mime-Version: 1.0
References: <20260214012702.2368778-1-seanjc@google.com>
X-Mailer: git-send-email 2.53.0.310.g728cabbaf7-goog
Message-ID: <20260214012702.2368778-7-seanjc@google.com>
Subject: [PATCH v3 06/16] KVM: VMX: Move core VMXON enablement to kernel
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Move the innermost VMXON+VMXOFF logic out of KVM and into core x86 so that TDX can (eventually) force VMXON without having to rely on KVM being loaded, e.g. to do SEAMCALLs during initialization.
Opportunistically update the comment regarding emergency disabling via NMI to clarify that virt_rebooting will be set by _another_ emergency callback, i.e. that virt_rebooting doesn't need to be set before VMCLEAR, only before _this_ invocation does VMXOFF. Signed-off-by: Sean Christopherson Acked-by: Dave Hansen Reviewed-by: Dan Williams Tested-by: Chao Gao Tested-by: Sagi Shahar --- arch/x86/events/intel/pt.c | 1 - arch/x86/include/asm/virt.h | 6 +-- arch/x86/kvm/vmx/vmx.c | 73 +++---------------------------- arch/x86/virt/hw.c | 85 ++++++++++++++++++++++++++++++++++++- 4 files changed, 92 insertions(+), 73 deletions(-) diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c index 44524a387c58..b5726b50e77d 100644 --- a/arch/x86/events/intel/pt.c +++ b/arch/x86/events/intel/pt.c @@ -1591,7 +1591,6 @@ void intel_pt_handle_vmx(int on) =20 local_irq_restore(flags); } -EXPORT_SYMBOL_FOR_KVM(intel_pt_handle_vmx); =20 /* * PMU callbacks diff --git a/arch/x86/include/asm/virt.h b/arch/x86/include/asm/virt.h index 0da6db4f5b0c..cca0210a5c16 100644 --- a/arch/x86/include/asm/virt.h +++ b/arch/x86/include/asm/virt.h @@ -2,8 +2,6 @@ #ifndef _ASM_X86_VIRT_H #define _ASM_X86_VIRT_H =20 -#include - #include =20 #if IS_ENABLED(CONFIG_KVM_X86) @@ -12,7 +10,9 @@ extern bool virt_rebooting; void __init x86_virt_init(void); =20 #if IS_ENABLED(CONFIG_KVM_INTEL) -DECLARE_PER_CPU(struct vmcs *, root_vmcs); +int x86_vmx_enable_virtualization_cpu(void); +int x86_vmx_disable_virtualization_cpu(void); +void x86_vmx_emergency_disable_virtualization_cpu(void); #endif =20 #else diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e767835a4f3a..36238cc694fd 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -786,41 +786,16 @@ static int vmx_set_guest_uret_msr(struct vcpu_vmx *vm= x, return ret; } =20 -/* - * Disable VMX and clear CR4.VMXE (even if VMXOFF faults) - * - * Note, VMXOFF causes a #UD if the CPU is !post-VMXON, but it's impossibl= e to - 
* atomically track post-VMXON state, e.g. this may be called in NMI conte= xt. - * Eat all faults as all other faults on VMXOFF faults are mode related, i= .e. - * faults are guaranteed to be due to the !post-VMXON check unless the CPU= is - * magically in RM, VM86, compat mode, or at CPL>0. - */ -static int kvm_cpu_vmxoff(void) -{ - asm goto("1: vmxoff\n\t" - _ASM_EXTABLE(1b, %l[fault]) - ::: "cc", "memory" : fault); - - cr4_clear_bits(X86_CR4_VMXE); - return 0; - -fault: - cr4_clear_bits(X86_CR4_VMXE); - return -EIO; -} - void vmx_emergency_disable_virtualization_cpu(void) { int cpu =3D raw_smp_processor_id(); struct loaded_vmcs *v; =20 - virt_rebooting =3D true; - /* * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be - * set in task context. If this races with VMX is disabled by an NMI, - * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to - * virt_rebooting set. + * set in task context. If this races with _another_ emergency call + * from NMI context, VMCLEAR may #UD, but KVM will eat those faults due + * to virt_rebooting being set by the interrupting NMI callback. */ if (!(__read_cr4() & X86_CR4_VMXE)) return; @@ -832,7 +807,7 @@ void vmx_emergency_disable_virtualization_cpu(void) vmcs_clear(v->shadow_vmcs); } =20 - kvm_cpu_vmxoff(); + x86_vmx_emergency_disable_virtualization_cpu(); } =20 static void __loaded_vmcs_clear(void *arg) @@ -2988,34 +2963,9 @@ int vmx_check_processor_compat(void) return 0; } =20 -static int kvm_cpu_vmxon(u64 vmxon_pointer) -{ - u64 msr; - - cr4_set_bits(X86_CR4_VMXE); - - asm goto("1: vmxon %[vmxon_pointer]\n\t" - _ASM_EXTABLE(1b, %l[fault]) - : : [vmxon_pointer] "m"(vmxon_pointer) - : : fault); - return 0; - -fault: - WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) =3D 0x%llx\n", - rdmsrq_safe(MSR_IA32_FEAT_CTL, &msr) ? 
0xdeadbeef : msr); - cr4_clear_bits(X86_CR4_VMXE); - - return -EFAULT; -} - int vmx_enable_virtualization_cpu(void) { int cpu =3D raw_smp_processor_id(); - u64 phys_addr =3D __pa(per_cpu(root_vmcs, cpu)); - int r; - - if (cr4_read_shadow() & X86_CR4_VMXE) - return -EBUSY; =20 /* * This can happen if we hot-added a CPU but failed to allocate @@ -3024,15 +2974,7 @@ int vmx_enable_virtualization_cpu(void) if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu)) return -EFAULT; =20 - intel_pt_handle_vmx(1); - - r =3D kvm_cpu_vmxon(phys_addr); - if (r) { - intel_pt_handle_vmx(0); - return r; - } - - return 0; + return x86_vmx_enable_virtualization_cpu(); } =20 static void vmclear_local_loaded_vmcss(void) @@ -3049,12 +2991,9 @@ void vmx_disable_virtualization_cpu(void) { vmclear_local_loaded_vmcss(); =20 - if (kvm_cpu_vmxoff()) - kvm_spurious_fault(); + x86_vmx_disable_virtualization_cpu(); =20 hv_reset_evmcs(); - - intel_pt_handle_vmx(0); } =20 struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags) diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c index 40495872fdfb..dc426c2bc24a 100644 --- a/arch/x86/virt/hw.c +++ b/arch/x86/virt/hw.c @@ -15,8 +15,89 @@ __visible bool virt_rebooting; EXPORT_SYMBOL_FOR_KVM(virt_rebooting); =20 #if IS_ENABLED(CONFIG_KVM_INTEL) -DEFINE_PER_CPU(struct vmcs *, root_vmcs); -EXPORT_PER_CPU_SYMBOL(root_vmcs); +static DEFINE_PER_CPU(struct vmcs *, root_vmcs); + +static int x86_virt_cpu_vmxon(void) +{ + u64 vmxon_pointer =3D __pa(per_cpu(root_vmcs, raw_smp_processor_id())); + u64 msr; + + cr4_set_bits(X86_CR4_VMXE); + + asm goto("1: vmxon %[vmxon_pointer]\n\t" + _ASM_EXTABLE(1b, %l[fault]) + : : [vmxon_pointer] "m"(vmxon_pointer) + : : fault); + return 0; + +fault: + WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) =3D 0x%llx\n", + rdmsrq_safe(MSR_IA32_FEAT_CTL, &msr) ? 
0xdeadbeef : msr); + cr4_clear_bits(X86_CR4_VMXE); + + return -EFAULT; +} + +int x86_vmx_enable_virtualization_cpu(void) +{ + int r; + + if (cr4_read_shadow() & X86_CR4_VMXE) + return -EBUSY; + + intel_pt_handle_vmx(1); + + r =3D x86_virt_cpu_vmxon(); + if (r) { + intel_pt_handle_vmx(0); + return r; + } + + return 0; +} +EXPORT_SYMBOL_FOR_KVM(x86_vmx_enable_virtualization_cpu); + +/* + * Disable VMX and clear CR4.VMXE (even if VMXOFF faults) + * + * Note, VMXOFF causes a #UD if the CPU is !post-VMXON, but it's impossibl= e to + * atomically track post-VMXON state, e.g. this may be called in NMI conte= xt. + * Eat all faults as all other faults on VMXOFF faults are mode related, i= .e. + * faults are guaranteed to be due to the !post-VMXON check unless the CPU= is + * magically in RM, VM86, compat mode, or at CPL>0. + */ +int x86_vmx_disable_virtualization_cpu(void) +{ + int r =3D -EIO; + + asm goto("1: vmxoff\n\t" + _ASM_EXTABLE(1b, %l[fault]) + ::: "cc", "memory" : fault); + r =3D 0; + +fault: + cr4_clear_bits(X86_CR4_VMXE); + intel_pt_handle_vmx(0); + return r; +} +EXPORT_SYMBOL_FOR_KVM(x86_vmx_disable_virtualization_cpu); + +void x86_vmx_emergency_disable_virtualization_cpu(void) +{ + virt_rebooting =3D true; + + /* + * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be + * set in task context. If this races with _another_ emergency call + * from NMI context, VMXOFF may #UD, but kernel will eat those faults + * due to virt_rebooting being set by the interrupting NMI callback. 
+ */ + if (!(__read_cr4() & X86_CR4_VMXE)) + return; + + x86_vmx_disable_virtualization_cpu(); +} +EXPORT_SYMBOL_FOR_KVM(x86_vmx_emergency_disable_virtualization_cpu); =20 static __init void x86_vmx_exit(void) { --=20 2.53.0.310.g728cabbaf7-goog From nobody Thu Apr 2 22:23:03 2026 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 065A728D83E for ; Sat, 14 Feb 2026 01:27:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771032440; cv=none; b=EXgHV2zkAK7fcJKJAMB1jUzhPErsi6RcYqtQPZqoaQAOwbFUnGbtaVGan+ejvobQXNr0Vjo4z8h4ct/ZWAvVj1X8rHv1Ol84PoksRTg3MvGak8SdHRuodC7EJdPgwdu9VKgz4vNiFB4+Q1eS6phLANIy+WDTN+PclxCzJBR4h+Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771032440; c=relaxed/simple; bh=kfZ1Zz3kqt0Lijh2yW8pjFnNAWPf0J1gDLwcJsJYrSU=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=jQp2zAdAnWUv3HJsePs7+n264oYZilNmg4faHpLp7sZ5nXEuqOsA78ETRAvRTHNG326MGS6dnnUQROPdeSyJaprj/zJUzeBQmuLIcTarMnr94zOWMfbgwBSulGYU/kT7RfdnGVVgoiVA2EjiSL2cxtgXJXV1GEf8P6kmAl2FHd0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Dt1vt/HZ; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Dt1vt/HZ" 
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:26:53 -0800
In-Reply-To:
<20260214012702.2368778-1-seanjc@google.com>
Mime-Version: 1.0
References: <20260214012702.2368778-1-seanjc@google.com>
X-Mailer: git-send-email 2.53.0.310.g728cabbaf7-goog
Message-ID: <20260214012702.2368778-8-seanjc@google.com>
Subject: [PATCH v3 07/16] KVM: SVM: Move core EFER.SVME enablement to kernel
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Move the innermost EFER.SVME logic out of KVM and into core x86 to land the SVM support alongside VMX support. This will allow providing a more unified API from the kernel to KVM, and will allow moving the bulk of the emergency disabling insanity out of KVM without having a weird split between kernel and KVM for SVM vs. VMX. No functional change intended.
Signed-off-by: Sean Christopherson Reviewed-by: Dan Williams Tested-by: Chao Gao Tested-by: Sagi Shahar --- arch/x86/include/asm/virt.h | 6 +++++ arch/x86/kvm/svm/svm.c | 33 +++++------------------ arch/x86/virt/hw.c | 53 +++++++++++++++++++++++++++++++++++++ 3 files changed, 65 insertions(+), 27 deletions(-) diff --git a/arch/x86/include/asm/virt.h b/arch/x86/include/asm/virt.h index cca0210a5c16..9a0753eaa20c 100644 --- a/arch/x86/include/asm/virt.h +++ b/arch/x86/include/asm/virt.h @@ -15,6 +15,12 @@ int x86_vmx_disable_virtualization_cpu(void); void x86_vmx_emergency_disable_virtualization_cpu(void); #endif =20 +#if IS_ENABLED(CONFIG_KVM_AMD) +int x86_svm_enable_virtualization_cpu(void); +int x86_svm_disable_virtualization_cpu(void); +void x86_svm_emergency_disable_virtualization_cpu(void); +#endif + #else static __always_inline void x86_virt_init(void) {} #endif diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 0ae66c770ebc..5f033bf3ba83 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -478,27 +478,9 @@ static __always_inline struct sev_es_save_area *sev_es= _host_save_area(struct svm return &sd->save_area->host_sev_es_save; } =20 -static inline void kvm_cpu_svm_disable(void) -{ - uint64_t efer; - - wrmsrq(MSR_VM_HSAVE_PA, 0); - rdmsrq(MSR_EFER, efer); - if (efer & EFER_SVME) { - /* - * Force GIF=3D1 prior to disabling SVM, e.g. to ensure INIT and - * NMI aren't blocked. 
- */ - stgi(); - wrmsrq(MSR_EFER, efer & ~EFER_SVME); - } -} - static void svm_emergency_disable_virtualization_cpu(void) { - virt_rebooting =3D true; - - kvm_cpu_svm_disable(); + wrmsrq(MSR_VM_HSAVE_PA, 0); } =20 static void svm_disable_virtualization_cpu(void) @@ -507,7 +489,7 @@ static void svm_disable_virtualization_cpu(void) if (tsc_scaling) __svm_write_tsc_multiplier(SVM_TSC_RATIO_DEFAULT); =20 - kvm_cpu_svm_disable(); + x86_svm_disable_virtualization_cpu(); =20 amd_pmu_disable_virt(); } @@ -516,12 +498,12 @@ static int svm_enable_virtualization_cpu(void) { =20 struct svm_cpu_data *sd; - uint64_t efer; int me =3D raw_smp_processor_id(); + int r; =20 - rdmsrq(MSR_EFER, efer); - if (efer & EFER_SVME) - return -EBUSY; + r =3D x86_svm_enable_virtualization_cpu(); + if (r) + return r; =20 sd =3D per_cpu_ptr(&svm_data, me); sd->asid_generation =3D 1; @@ -529,8 +511,6 @@ static int svm_enable_virtualization_cpu(void) sd->next_asid =3D sd->max_asid + 1; sd->min_asid =3D max_sev_asid + 1; =20 - wrmsrq(MSR_EFER, efer | EFER_SVME); - wrmsrq(MSR_VM_HSAVE_PA, sd->save_area_pa); =20 if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) { @@ -541,7 +521,6 @@ static int svm_enable_virtualization_cpu(void) __svm_write_tsc_multiplier(SVM_TSC_RATIO_DEFAULT); } =20 - /* * Get OSVW bits. 
* diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c index dc426c2bc24a..014e9dfab805 100644 --- a/arch/x86/virt/hw.c +++ b/arch/x86/virt/hw.c @@ -163,6 +163,59 @@ static __init int x86_vmx_init(void) static __init int x86_vmx_init(void) { return -EOPNOTSUPP; } #endif =20 +#if IS_ENABLED(CONFIG_KVM_AMD) +int x86_svm_enable_virtualization_cpu(void) +{ + u64 efer; + + if (!cpu_feature_enabled(X86_FEATURE_SVM)) + return -EOPNOTSUPP; + + rdmsrq(MSR_EFER, efer); + if (efer & EFER_SVME) + return -EBUSY; + + wrmsrq(MSR_EFER, efer | EFER_SVME); + return 0; +} +EXPORT_SYMBOL_FOR_KVM(x86_svm_enable_virtualization_cpu); + +int x86_svm_disable_virtualization_cpu(void) +{ + int r =3D -EIO; + u64 efer; + + /* + * Force GIF=3D1 prior to disabling SVM, e.g. to ensure INIT and + * NMI aren't blocked. + */ + asm goto("1: stgi\n\t" + _ASM_EXTABLE(1b, %l[fault]) + ::: "memory" : fault); + r =3D 0; + +fault: + rdmsrq(MSR_EFER, efer); + wrmsrq(MSR_EFER, efer & ~EFER_SVME); + return r; +} +EXPORT_SYMBOL_FOR_KVM(x86_svm_disable_virtualization_cpu); + +void x86_svm_emergency_disable_virtualization_cpu(void) +{ + u64 efer; + + virt_rebooting =3D true; + + rdmsrq(MSR_EFER, efer); + if (!(efer & EFER_SVME)) + return; + + x86_svm_disable_virtualization_cpu(); +} +EXPORT_SYMBOL_FOR_KVM(x86_svm_emergency_disable_virtualization_cpu); +#endif + void __init x86_virt_init(void) { x86_vmx_init(); --=20 2.53.0.310.g728cabbaf7-goog From nobody Thu Apr 2 22:23:03 2026 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C15492BEC4E for ; Sat, 14 Feb 2026 01:27:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771032446; cv=none; 
Reply-To: Sean Christopherson Date: Fri, 13 Feb 2026 17:26:54 -0800 In-Reply-To: <20260214012702.2368778-1-seanjc@google.com> References: <20260214012702.2368778-1-seanjc@google.com> Message-ID: <20260214012702.2368778-9-seanjc@google.com> Subject: [PATCH v3 08/16] KVM: x86: Move bulk of emergency virtualization logic to virt subsystem From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Peter Zijlstra , Arnaldo Carvalho de Melo , Namhyung Kim , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org,
linux-perf-users@vger.kernel.org, Chao Gao , Xu Yilun , Dan Williams Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Move the majority of the code related to disabling hardware virtualization in an emergency from KVM into the virt subsystem so that virt can take full ownership of the state of SVM/VMX. This will allow refcounting usage of SVM/VMX so that KVM and the TDX subsystem can enable VMX without stomping on each other. To route the emergency callback to the "right" vendor code, and to avoid mixing vendor and generic code, implement an x86_virt_ops structure to track the emergency callback, along with the SVM vs. VMX (vs. "none") feature that is active. To avoid having to choose between SVM and VMX, simply refuse to enable either if both are somehow supported. No known CPU supports both SVM and VMX, and it's comically unlikely such a CPU will ever exist. Leave KVM's clearing of loaded VMCSes and MSR_VM_HSAVE_PA in KVM, via a callback explicitly scoped to KVM. Loading VMCSes and saving/restoring host state are firmly tied to running VMs, and thus are (a) KVM's responsibility and (b) operations that are still exclusively reserved for KVM (as far as in-tree code is concerned). I.e. the contract being established is that non-KVM subsystems can utilize virtualization, but for all intents and purposes cannot act as full-blown hypervisors.
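As an aside, the vendor dispatch described above can be modeled in plain userspace C. All names below (struct virt_ops_model, model_svm_init, etc.) are hypothetical stand-ins for illustration, not the kernel's actual symbols:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the x86_virt_ops dispatch: init code fills in the
 * ops for whichever feature (SVM or VMX) is present, and the generic
 * emergency path refuses to do anything if neither was initialized.
 */
struct virt_ops_model {
	int feature;                 /* 0 == no virtualization support */
	void (*emergency_disable)(void);
};

static struct virt_ops_model ops;    /* zero-initialized, like a BSS var */
static int emergency_calls;

static void fake_svm_emergency_disable(void)
{
	emergency_calls++;           /* stands in for clearing EFER.SVME */
}

static int model_svm_init(void)
{
	ops.feature = 1;             /* stands in for X86_FEATURE_SVM */
	ops.emergency_disable = fake_svm_emergency_disable;
	return 0;
}

static int model_emergency_disable_cpu(void)
{
	if (!ops.feature)
		return -1;           /* the kernel returns -EOPNOTSUPP */
	ops.emergency_disable();
	return 0;
}
```

The point of the model is the ordering: before init the generic entry point fails fast, and after init it routes to the one registered vendor handler.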
Signed-off-by: Sean Christopherson Reviewed-by: Chao Gao Reviewed-by: Dan Williams Tested-by: Chao Gao Tested-by: Sagi Shahar --- arch/x86/include/asm/kvm_host.h | 3 +- arch/x86/include/asm/reboot.h | 11 --- arch/x86/include/asm/virt.h | 9 ++- arch/x86/kernel/crash.c | 3 +- arch/x86/kernel/reboot.c | 63 ++-------------- arch/x86/kernel/smp.c | 5 +- arch/x86/kvm/vmx/vmx.c | 11 --- arch/x86/kvm/x86.c | 4 +- arch/x86/virt/hw.c | 123 +++++++++++++++++++++++++++++--- 9 files changed, 138 insertions(+), 94 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ff07c45e3c73..0bda52fbcae5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -40,7 +40,8 @@ #include #include #include -#include +#include + #include #define __KVM_HAVE_ARCH_VCPU_DEBUGFS diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h index ecd58ea9a837..a671a1145906 100644 --- a/arch/x86/include/asm/reboot.h +++ b/arch/x86/include/asm/reboot.h @@ -25,17 +25,6 @@ void __noreturn machine_real_restart(unsigned int type); #define MRR_BIOS 0 #define MRR_APM 1 -typedef void (cpu_emergency_virt_cb)(void); -#if IS_ENABLED(CONFIG_KVM_X86) -void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback); -void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback); -void cpu_emergency_disable_virtualization(void); -#else -static inline void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback) {} -static inline void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback) {} -static inline void cpu_emergency_disable_virtualization(void) {} -#endif /* CONFIG_KVM_X86 */ - typedef void (*nmi_shootdown_cb)(int, struct pt_regs*); void nmi_shootdown_cpus(nmi_shootdown_cb callback); void run_crash_ipi_callback(struct pt_regs *regs); diff --git a/arch/x86/include/asm/virt.h b/arch/x86/include/asm/virt.h index 9a0753eaa20c..2c35534437e0 100644 ---
a/arch/x86/include/asm/virt.h +++ b/arch/x86/include/asm/virt.h @@ -4,6 +4,8 @@ #include +typedef void (cpu_emergency_virt_cb)(void); + #if IS_ENABLED(CONFIG_KVM_X86) extern bool virt_rebooting; @@ -12,17 +14,20 @@ void __init x86_virt_init(void); #if IS_ENABLED(CONFIG_KVM_INTEL) int x86_vmx_enable_virtualization_cpu(void); int x86_vmx_disable_virtualization_cpu(void); -void x86_vmx_emergency_disable_virtualization_cpu(void); #endif #if IS_ENABLED(CONFIG_KVM_AMD) int x86_svm_enable_virtualization_cpu(void); int x86_svm_disable_virtualization_cpu(void); -void x86_svm_emergency_disable_virtualization_cpu(void); #endif +int x86_virt_emergency_disable_virtualization_cpu(void); + +void x86_virt_register_emergency_callback(cpu_emergency_virt_cb *callback); +void x86_virt_unregister_emergency_callback(cpu_emergency_virt_cb *callback); #else static __always_inline void x86_virt_init(void) {} +static inline int x86_virt_emergency_disable_virtualization_cpu(void) { return -ENOENT; } #endif #endif /* _ASM_X86_VIRT_H */ diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 335fd2ee9766..cd796818d94d 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -42,6 +42,7 @@ #include #include #include +#include /* Used while preparing memory map entries for second kernel */ struct crash_memmap_data { @@ -111,7 +112,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) crash_smp_send_stop(); - cpu_emergency_disable_virtualization(); + x86_virt_emergency_disable_virtualization_cpu(); /* * Disable Intel PT to stop its logging diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 6032fa9ec753..0bab8863375a 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -532,51 +533,6 @@ static inline void kb_wait(void) static inline void nmi_shootdown_cpus_on_restart(void); #if
IS_ENABLED(CONFIG_KVM_X86) -/* RCU-protected callback to disable virtualization prior to reboot. */ -static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback; - -void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback) -{ - if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback))) - return; - - rcu_assign_pointer(cpu_emergency_virt_callback, callback); -} -EXPORT_SYMBOL_FOR_KVM(cpu_emergency_register_virt_callback); - -void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback) -{ - if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback) != callback)) - return; - - rcu_assign_pointer(cpu_emergency_virt_callback, NULL); - synchronize_rcu(); -} -EXPORT_SYMBOL_FOR_KVM(cpu_emergency_unregister_virt_callback); - -/* - * Disable virtualization, i.e. VMX or SVM, to ensure INIT is recognized during - * reboot. VMX blocks INIT if the CPU is post-VMXON, and SVM blocks INIT if - * GIF=0, i.e. if the crash occurred between CLGI and STGI. - */ -void cpu_emergency_disable_virtualization(void) -{ - cpu_emergency_virt_cb *callback; - - /* - * IRQs must be disabled as KVM enables virtualization in hardware via - * function call IPIs, i.e. IRQs need to be disabled to guarantee - * virtualization stays disabled. - */ - lockdep_assert_irqs_disabled(); - - rcu_read_lock(); - callback = rcu_dereference(cpu_emergency_virt_callback); - if (callback) - callback(); - rcu_read_unlock(); -} - static void emergency_reboot_disable_virtualization(void) { local_irq_disable(); @@ -588,16 +544,11 @@ static void emergency_reboot_disable_virtualization(void) * We can't take any locks and we may be on an inconsistent state, so * use NMIs as IPIs to tell the other CPUs to disable VMX/SVM and halt. * - * Do the NMI shootdown even if virtualization is off on _this_ CPU, as - * other CPUs may have virtualization enabled.
+ Safely force _this_ CPU out of VMX/SVM operation, and if necessary, + blast NMIs to force other CPUs out of VMX/SVM as well. */ - if (rcu_access_pointer(cpu_emergency_virt_callback)) { - /* Safely force _this_ CPU out of VMX/SVM operation. */ - cpu_emergency_disable_virtualization(); - - /* Disable VMX/SVM and halt on other CPUs. */ + if (!x86_virt_emergency_disable_virtualization_cpu()) nmi_shootdown_cpus_on_restart(); - } } #else static void emergency_reboot_disable_virtualization(void) { } @@ -875,10 +826,10 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs) shootdown_callback(cpu, regs); /* - * Prepare the CPU for reboot _after_ invoking the callback so that the - * callback can safely use virtualization instructions, e.g. VMCLEAR. + * Disable virtualization, as both VMX and SVM can block INIT and thus + * prevent AP bringup, e.g. in a kdump kernel or in firmware. */ - cpu_emergency_disable_virtualization(); + x86_virt_emergency_disable_virtualization_cpu(); atomic_dec(&waiting_for_crash_ipi); diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index b014e6d229f9..cbf95fe2b207 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -35,6 +35,7 @@ #include #include #include +#include /* * Some notes on x86 processor bugs affecting SMP operation: @@ -124,7 +125,7 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs) if (raw_smp_processor_id() == atomic_read(&stopping_cpu)) return NMI_HANDLED; - cpu_emergency_disable_virtualization(); + x86_virt_emergency_disable_virtualization_cpu(); stop_this_cpu(NULL); return NMI_HANDLED; @@ -136,7 +137,7 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs) DEFINE_IDTENTRY_SYSVEC(sysvec_reboot) { apic_eoi(); - cpu_emergency_disable_virtualization(); + x86_virt_emergency_disable_virtualization_cpu(); stop_this_cpu(NULL); } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
36238cc694fd..c02fd7e91809 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -791,23 +791,12 @@ void vmx_emergency_disable_virtualization_cpu(void) int cpu = raw_smp_processor_id(); struct loaded_vmcs *v; - /* - * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be - * set in task context. If this races with _another_ emergency call - * from NMI context, VMCLEAR may #UD, but KVM will eat those faults due - * to virt_rebooting being set by the interrupting NMI callback. - */ - if (!(__read_cr4() & X86_CR4_VMXE)) - return; - list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu), loaded_vmcss_on_cpu_link) { vmcs_clear(v->vmcs); if (v->shadow_vmcs) vmcs_clear(v->shadow_vmcs); } - - x86_vmx_emergency_disable_virtualization_cpu(); } static void __loaded_vmcs_clear(void *arg) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 69937d14f5e1..4f30acd639f3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13076,12 +13076,12 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_vcpu_deliver_sipi_vector); void kvm_arch_enable_virtualization(void) { - cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); + x86_virt_register_emergency_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); } void kvm_arch_disable_virtualization(void) { - cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); + x86_virt_unregister_emergency_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); } int kvm_arch_enable_virtualization_cpu(void) diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c index 014e9dfab805..73c8309ba3fb 100644 --- a/arch/x86/virt/hw.c +++ b/arch/x86/virt/hw.c @@ -11,9 +11,45 @@ #include #include +struct x86_virt_ops { + int feature; + void (*emergency_disable_virtualization_cpu)(void); +}; +static struct x86_virt_ops virt_ops __ro_after_init; + __visible bool virt_rebooting; EXPORT_SYMBOL_FOR_KVM(virt_rebooting); +static
cpu_emergency_virt_cb __rcu *kvm_emergency_callback; + +void x86_virt_register_emergency_callback(cpu_emergency_virt_cb *callback) +{ + if (WARN_ON_ONCE(rcu_access_pointer(kvm_emergency_callback))) + return; + + rcu_assign_pointer(kvm_emergency_callback, callback); +} +EXPORT_SYMBOL_FOR_KVM(x86_virt_register_emergency_callback); + +void x86_virt_unregister_emergency_callback(cpu_emergency_virt_cb *callback) +{ + if (WARN_ON_ONCE(rcu_access_pointer(kvm_emergency_callback) != callback)) + return; + + rcu_assign_pointer(kvm_emergency_callback, NULL); + synchronize_rcu(); +} +EXPORT_SYMBOL_FOR_KVM(x86_virt_unregister_emergency_callback); + +static void x86_virt_invoke_kvm_emergency_callback(void) +{ + cpu_emergency_virt_cb *kvm_callback; + + kvm_callback = rcu_dereference(kvm_emergency_callback); + if (kvm_callback) + kvm_callback(); +} + #if IS_ENABLED(CONFIG_KVM_INTEL) static DEFINE_PER_CPU(struct vmcs *, root_vmcs); @@ -42,6 +78,9 @@ int x86_vmx_enable_virtualization_cpu(void) { int r; + if (virt_ops.feature != X86_FEATURE_VMX) + return -EOPNOTSUPP; + if (cr4_read_shadow() & X86_CR4_VMXE) return -EBUSY; @@ -82,22 +121,24 @@ int x86_vmx_disable_virtualization_cpu(void) } EXPORT_SYMBOL_FOR_KVM(x86_vmx_disable_virtualization_cpu); -void x86_vmx_emergency_disable_virtualization_cpu(void) +static void x86_vmx_emergency_disable_virtualization_cpu(void) { virt_rebooting = true; /* * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be * set in task context. If this races with _another_ emergency call - * from NMI context, VMXOFF may #UD, but kernel will eat those faults - * due to virt_rebooting being set by the interrupting NMI callback. + * from NMI context, VMCLEAR (in KVM) and VMXOFF may #UD, but KVM and + * the kernel will eat those faults due to virt_rebooting being set by + * the interrupting NMI callback.
*/ if (!(__read_cr4() & X86_CR4_VMXE)) return; + x86_virt_invoke_kvm_emergency_callback(); + x86_vmx_disable_virtualization_cpu(); } -EXPORT_SYMBOL_FOR_KVM(x86_vmx_emergency_disable_virtualization_cpu); static __init void x86_vmx_exit(void) { @@ -111,6 +152,11 @@ static __init void x86_vmx_exit(void) static __init int __x86_vmx_init(void) { + const struct x86_virt_ops vmx_ops = { + .feature = X86_FEATURE_VMX, + .emergency_disable_virtualization_cpu = x86_vmx_emergency_disable_virtualization_cpu, + }; + u64 basic_msr; u32 rev_id; int cpu; @@ -147,6 +193,7 @@ static __init int __x86_vmx_init(void) per_cpu(root_vmcs, cpu) = vmcs; } + memcpy(&virt_ops, &vmx_ops, sizeof(virt_ops)); return 0; } @@ -161,6 +208,7 @@ static __init int x86_vmx_init(void) } #else static __init int x86_vmx_init(void) { return -EOPNOTSUPP; } +static __init void x86_vmx_exit(void) { } #endif #if IS_ENABLED(CONFIG_KVM_AMD) @@ -168,7 +216,7 @@ int x86_svm_enable_virtualization_cpu(void) { u64 efer; - if (!cpu_feature_enabled(X86_FEATURE_SVM)) + if (virt_ops.feature != X86_FEATURE_SVM) return -EOPNOTSUPP; rdmsrq(MSR_EFER, efer); @@ -201,7 +249,7 @@ int x86_svm_disable_virtualization_cpu(void) } EXPORT_SYMBOL_FOR_KVM(x86_svm_disable_virtualization_cpu); -void x86_svm_emergency_disable_virtualization_cpu(void) +static void x86_svm_emergency_disable_virtualization_cpu(void) { u64 efer; @@ -211,12 +259,71 @@ void x86_svm_emergency_disable_virtualization_cpu(void) if (!(efer & EFER_SVME)) return; + x86_virt_invoke_kvm_emergency_callback(); + x86_svm_disable_virtualization_cpu(); } -EXPORT_SYMBOL_FOR_KVM(x86_svm_emergency_disable_virtualization_cpu); + +static __init int x86_svm_init(void) +{ + const struct x86_virt_ops svm_ops = { + .feature = X86_FEATURE_SVM, + .emergency_disable_virtualization_cpu = x86_svm_emergency_disable_virtualization_cpu, + }; + + if (!cpu_feature_enabled(X86_FEATURE_SVM)) + return -EOPNOTSUPP; + +
memcpy(&virt_ops, &svm_ops, sizeof(virt_ops)); + return 0; +} +#else +static __init int x86_svm_init(void) { return -EOPNOTSUPP; } #endif +/* + * Disable virtualization, i.e. VMX or SVM, to ensure INIT is recognized during + * reboot. VMX blocks INIT if the CPU is post-VMXON, and SVM blocks INIT if + * GIF=0, i.e. if the crash occurred between CLGI and STGI. + */ +int x86_virt_emergency_disable_virtualization_cpu(void) +{ + /* Ensure the !feature check can't get false positives. */ + BUILD_BUG_ON(!X86_FEATURE_SVM || !X86_FEATURE_VMX); + + if (!virt_ops.feature) + return -EOPNOTSUPP; + + /* + * IRQs must be disabled as virtualization is enabled in hardware via + * function call IPIs, i.e. IRQs need to be disabled to guarantee + * virtualization stays disabled. + */ + lockdep_assert_irqs_disabled(); + + /* + * Do the NMI shootdown even if virtualization is off on _this_ CPU, as + * other CPUs may have virtualization enabled. + * + * TODO: Track whether or not virtualization might be enabled on other + * CPUs? May not be worth avoiding the NMI shootdown... + */ + virt_ops.emergency_disable_virtualization_cpu(); + return 0; +} + void __init x86_virt_init(void) { - x86_vmx_init(); + /* + * Attempt to initialize both SVM and VMX, and simply use whichever one + * is present. Refuse to enable/use SVM or VMX if both are somehow + * supported. No known CPU supports both SVM and VMX.
+ */ + bool has_vmx = !x86_vmx_init(); + bool has_svm = !x86_svm_init(); + + if (WARN_ON_ONCE(has_vmx && has_svm)) { + x86_vmx_exit(); + memset(&virt_ops, 0, sizeof(virt_ops)); + } } -- 2.53.0.310.g728cabbaf7-goog From nobody Thu Apr 2 22:23:03 2026
Reply-To: Sean Christopherson Date: Fri, 13 Feb 2026 17:26:55 -0800 In-Reply-To:
<20260214012702.2368778-1-seanjc@google.com> References: <20260214012702.2368778-1-seanjc@google.com> Message-ID: <20260214012702.2368778-10-seanjc@google.com> Subject: [PATCH v3 09/16] x86/virt: Add refcounting of VMX/SVM usage to support multiple in-kernel users From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Peter Zijlstra , Arnaldo Carvalho de Melo , Namhyung Kim , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao , Xu Yilun , Dan Williams Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement a per-CPU refcounting scheme so that "users" of hardware virtualization, e.g. KVM and the future TDX code, can co-exist without pulling the rug out from under each other. E.g. if KVM were to disable VMX on module unload or when the last KVM VM was destroyed, SEAMCALLs from the TDX subsystem would #UD and panic the kernel. Disable preemption in the get/put APIs to ensure virtualization is fully enabled/disabled before returning to the caller. E.g. if the task were preempted after a 0=>1 transition, the new task would see a 1=>2 and thus return without enabling virtualization. Explicitly disable preemption instead of requiring the caller to do so, because the need to disable preemption is an artifact of the implementation. E.g. from KVM's perspective there is no _need_ to disable preemption as KVM guarantees the pCPU on which it is running is stable (but preemption is enabled). Opportunistically abstract away SVM vs. VMX in the public APIs by using X86_FEATURE_{SVM,VMX} to communicate what technology the caller wants to enable and use.
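As an aside, the enable-on-0=>1 / disable-on-1=>0 semantics described above can be sketched as a minimal userspace C model. "nr_users" stands in for the kernel's per-CPU counter and "hw_enabled" for EFER.SVME / CR4.VMXE; the names are illustrative, not the kernel's:

```c
#include <assert.h>

/*
 * Userspace model of the per-CPU refcounting: only the 0->1 transition
 * actually enables hardware virtualization, and only the 1->0 transition
 * disables it, so intermediate users are nops.
 */
static int nr_users;
static int hw_enabled;

static int virt_get_ref(void)
{
	/* Another user already enabled virtualization; just take a ref. */
	if (++nr_users > 1)
		return 0;

	hw_enabled = 1;		/* 0->1: actually enable virtualization */
	return 0;
}

static void virt_put_ref(void)
{
	/* Models the kernel's WARN_ON_ONCE() underflow check. */
	assert(nr_users > 0);

	/* Other users remain; keep virtualization enabled. */
	if (--nr_users)
		return;

	hw_enabled = 0;		/* 1->0: actually disable virtualization */
}
```

In the kernel the counter is per-CPU and the get/put pair runs under a preemption guard, per the commit message; the model omits that to keep the transition logic visible.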
Cc: Xu Yilun Signed-off-by: Sean Christopherson Reviewed-by: Chao Gao Reviewed-by: Dan Williams Tested-by: Chao Gao Tested-by: Sagi Shahar --- arch/x86/include/asm/virt.h | 11 ++----- arch/x86/kvm/svm/svm.c | 4 +-- arch/x86/kvm/vmx/vmx.c | 4 +-- arch/x86/virt/hw.c | 64 +++++++++++++++++++++++++++---------- 4 files changed, 53 insertions(+), 30 deletions(-) diff --git a/arch/x86/include/asm/virt.h b/arch/x86/include/asm/virt.h index 2c35534437e0..1558a0673d06 100644 --- a/arch/x86/include/asm/virt.h +++ b/arch/x86/include/asm/virt.h @@ -11,15 +11,8 @@ extern bool virt_rebooting; void __init x86_virt_init(void); -#if IS_ENABLED(CONFIG_KVM_INTEL) -int x86_vmx_enable_virtualization_cpu(void); -int x86_vmx_disable_virtualization_cpu(void); -#endif - -#if IS_ENABLED(CONFIG_KVM_AMD) -int x86_svm_enable_virtualization_cpu(void); -int x86_svm_disable_virtualization_cpu(void); -#endif +int x86_virt_get_ref(int feat); +void x86_virt_put_ref(int feat); int x86_virt_emergency_disable_virtualization_cpu(void); diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 5f033bf3ba83..539fb4306dce 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -489,7 +489,7 @@ static void svm_disable_virtualization_cpu(void) if (tsc_scaling) __svm_write_tsc_multiplier(SVM_TSC_RATIO_DEFAULT); - x86_svm_disable_virtualization_cpu(); + x86_virt_put_ref(X86_FEATURE_SVM); amd_pmu_disable_virt(); } @@ -501,7 +501,7 @@ static int svm_enable_virtualization_cpu(void) int me = raw_smp_processor_id(); int r; - r = x86_svm_enable_virtualization_cpu(); + r = x86_virt_get_ref(X86_FEATURE_SVM); if (r) return r; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c02fd7e91809..6200cf4dbd26 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2963,7 +2963,7 @@ int vmx_enable_virtualization_cpu(void) if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu)) return -EFAULT; - return
x86_vmx_enable_virtualization_cpu(); + return x86_virt_get_ref(X86_FEATURE_VMX); } =20 static void vmclear_local_loaded_vmcss(void) @@ -2980,7 +2980,7 @@ void vmx_disable_virtualization_cpu(void) { vmclear_local_loaded_vmcss(); =20 - x86_vmx_disable_virtualization_cpu(); + x86_virt_put_ref(X86_FEATURE_VMX); =20 hv_reset_evmcs(); } diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c index 73c8309ba3fb..c898f16fe612 100644 --- a/arch/x86/virt/hw.c +++ b/arch/x86/virt/hw.c @@ -13,6 +13,8 @@ =20 struct x86_virt_ops { int feature; + int (*enable_virtualization_cpu)(void); + int (*disable_virtualization_cpu)(void); void (*emergency_disable_virtualization_cpu)(void); }; static struct x86_virt_ops virt_ops __ro_after_init; @@ -20,6 +22,8 @@ static struct x86_virt_ops virt_ops __ro_after_init; __visible bool virt_rebooting; EXPORT_SYMBOL_FOR_KVM(virt_rebooting); =20 +static DEFINE_PER_CPU(int, virtualization_nr_users); + static cpu_emergency_virt_cb __rcu *kvm_emergency_callback; =20 void x86_virt_register_emergency_callback(cpu_emergency_virt_cb *callback) @@ -74,13 +78,10 @@ static int x86_virt_cpu_vmxon(void) return -EFAULT; } =20 -int x86_vmx_enable_virtualization_cpu(void) +static int x86_vmx_enable_virtualization_cpu(void) { int r; =20 - if (virt_ops.feature !=3D X86_FEATURE_VMX) - return -EOPNOTSUPP; - if (cr4_read_shadow() & X86_CR4_VMXE) return -EBUSY; =20 @@ -94,7 +95,6 @@ int x86_vmx_enable_virtualization_cpu(void) =20 return 0; } -EXPORT_SYMBOL_FOR_KVM(x86_vmx_enable_virtualization_cpu); =20 /* * Disable VMX and clear CR4.VMXE (even if VMXOFF faults) @@ -105,7 +105,7 @@ EXPORT_SYMBOL_FOR_KVM(x86_vmx_enable_virtualization_cpu= ); * faults are guaranteed to be due to the !post-VMXON check unless the CPU= is * magically in RM, VM86, compat mode, or at CPL>0. 
*/ -int x86_vmx_disable_virtualization_cpu(void) +static int x86_vmx_disable_virtualization_cpu(void) { int r = -EIO; @@ -119,7 +119,6 @@ int x86_vmx_disable_virtualization_cpu(void) intel_pt_handle_vmx(0); return r; } -EXPORT_SYMBOL_FOR_KVM(x86_vmx_disable_virtualization_cpu); static void x86_vmx_emergency_disable_virtualization_cpu(void) { @@ -154,6 +153,8 @@ static __init int __x86_vmx_init(void) { const struct x86_virt_ops vmx_ops = { .feature = X86_FEATURE_VMX, + .enable_virtualization_cpu = x86_vmx_enable_virtualization_cpu, + .disable_virtualization_cpu = x86_vmx_disable_virtualization_cpu, .emergency_disable_virtualization_cpu = x86_vmx_emergency_disable_virtualization_cpu, }; @@ -212,13 +213,10 @@ static __init void x86_vmx_exit(void) { } #endif #if IS_ENABLED(CONFIG_KVM_AMD) -int x86_svm_enable_virtualization_cpu(void) +static int x86_svm_enable_virtualization_cpu(void) { u64 efer; - if (virt_ops.feature != X86_FEATURE_SVM) - return -EOPNOTSUPP; - rdmsrq(MSR_EFER, efer); if (efer & EFER_SVME) return -EBUSY; @@ -226,9 +224,8 @@ int x86_svm_enable_virtualization_cpu(void) wrmsrq(MSR_EFER, efer | EFER_SVME); return 0; } -EXPORT_SYMBOL_FOR_KVM(x86_svm_enable_virtualization_cpu); -int x86_svm_disable_virtualization_cpu(void) +static int x86_svm_disable_virtualization_cpu(void) { int r = -EIO; u64 efer; @@ -247,7 +244,6 @@ int x86_svm_disable_virtualization_cpu(void) wrmsrq(MSR_EFER, efer & ~EFER_SVME); return r; } -EXPORT_SYMBOL_FOR_KVM(x86_svm_disable_virtualization_cpu); static void x86_svm_emergency_disable_virtualization_cpu(void) { @@ -268,6 +264,8 @@ static __init int x86_svm_init(void) { const struct x86_virt_ops svm_ops = { .feature = X86_FEATURE_SVM, + .enable_virtualization_cpu = x86_svm_enable_virtualization_cpu, + .disable_virtualization_cpu = x86_svm_disable_virtualization_cpu, .emergency_disable_virtualization_cpu = x86_svm_emergency_disable_virtualization_cpu, }; @@ -281,6 +279,41 @@
static __init int x86_svm_init(void)
 static __init int x86_svm_init(void) { return -EOPNOTSUPP; }
 #endif

+int x86_virt_get_ref(int feat)
+{
+	int r;
+
+	/* Ensure the !feature check can't get false positives. */
+	BUILD_BUG_ON(!X86_FEATURE_SVM || !X86_FEATURE_VMX);
+
+	if (!virt_ops.feature || virt_ops.feature != feat)
+		return -EOPNOTSUPP;
+
+	guard(preempt)();
+
+	if (this_cpu_inc_return(virtualization_nr_users) > 1)
+		return 0;
+
+	r = virt_ops.enable_virtualization_cpu();
+	if (r)
+		WARN_ON_ONCE(this_cpu_dec_return(virtualization_nr_users));
+
+	return r;
+}
+EXPORT_SYMBOL_FOR_KVM(x86_virt_get_ref);
+
+void x86_virt_put_ref(int feat)
+{
+	guard(preempt)();
+
+	if (WARN_ON_ONCE(!this_cpu_read(virtualization_nr_users)) ||
+	    this_cpu_dec_return(virtualization_nr_users))
+		return;
+
+	BUG_ON(virt_ops.disable_virtualization_cpu() && !virt_rebooting);
+}
+EXPORT_SYMBOL_FOR_KVM(x86_virt_put_ref);
+
 /*
  * Disable virtualization, i.e. VMX or SVM, to ensure INIT is recognized during
  * reboot.  VMX blocks INIT if the CPU is post-VMXON, and SVM blocks INIT if
@@ -288,9 +321,6 @@ static __init int x86_svm_init(void) { return -EOPNOTSUPP; }
  */
 int x86_virt_emergency_disable_virtualization_cpu(void)
 {
-	/* Ensure the !feature check can't get false positives.
 */
-	BUILD_BUG_ON(!X86_FEATURE_SVM || !X86_FEATURE_VMX);
-
 	if (!virt_ops.feature)
 		return -EOPNOTSUPP;

-- 
2.53.0.310.g728cabbaf7-goog

From nobody Thu Apr  2 22:23:03 2026
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:26:56 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: 
List-Subscribe: 
List-Unsubscribe: 
Mime-Version: 1.0
References: <20260214012702.2368778-1-seanjc@google.com>
X-Mailer: git-send-email 2.53.0.310.g728cabbaf7-goog
Message-ID: <20260214012702.2368778-11-seanjc@google.com>
Subject: [PATCH v3 10/16] x86/virt/tdx: Drop the outdated requirement that
 TDX be enabled in IRQ context
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo,
 Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao,
 Xu Yilun, Dan Williams
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Remove TDX's outdated requirement that per-CPU enabling be done via IPI
function call, which was a stale artifact left over from early versions
of the TDX enablement series.  The requirement that IRQs be disabled
should have been dropped as part of the revamped series that relied on
the KVM rework to enable VMX at module load.

In other words, the kernel's "requirement" was never a requirement at
all, but instead a reflection of how KVM enabled VMX (via IPI callback)
when the TDX subsystem code was merged.

Note, accessing per-CPU information is safe even without disabling IRQs,
as tdx_online_cpu() is invoked via a cpuhp callback, i.e. from a per-CPU
thread.
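The IPI-free flow relies on the per-CPU reference counting introduced earlier in the series (x86_virt_get_ref()/x86_virt_put_ref()). As a sanity check of that logic, here is a minimal userspace model — plain ints stand in for the per-CPU counter and the enable/disable hooks are stubs, so none of these names are the kernel's actual API:

```c
#include <assert.h>

/*
 * Userspace model of the per-CPU virtualization refcount; NOT kernel
 * code.  Plain ints stand in for the this_cpu_*() counter, and the
 * enable/disable hooks are stubs.
 */
static int nr_users;		/* models virtualization_nr_users */
static int hw_enabled;		/* models post-VMXON / EFER.SVME state */
static int fail_next_enable;	/* test knob: force an enable failure */

static int enable_virtualization_cpu(void)
{
	if (fail_next_enable)
		return -1;
	hw_enabled = 1;
	return 0;
}

static void disable_virtualization_cpu(void)
{
	hw_enabled = 0;
}

/* First user enables the hardware; a failed enable rolls the count back. */
static int virt_get_ref(void)
{
	int r;

	if (++nr_users > 1)
		return 0;

	r = enable_virtualization_cpu();
	if (r)
		nr_users--;
	return r;
}

/* Last user disables the hardware; an unbalanced put is a bug. */
static void virt_put_ref(void)
{
	assert(nr_users > 0);
	if (--nr_users)
		return;
	disable_virtualization_cpu();
}
```

The kernel version additionally disables preemption around the counter updates (guard(preempt)()) so the count and the hardware state stay CPU-local; that detail has no userspace equivalent and is omitted here.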
Link: https://lore.kernel.org/all/ZyJOiPQnBz31qLZ7@google.com
Signed-off-by: Sean Christopherson
Reviewed-by: Dan Williams
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 arch/x86/kvm/vmx/tdx.c      | 9 +--------
 arch/x86/virt/vmx/tdx/tdx.c | 4 ----
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 0c790eb0bfa6..582469118b79 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3294,17 +3294,10 @@ int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)

 static int tdx_online_cpu(unsigned int cpu)
 {
-	unsigned long flags;
-	int r;
-
 	/* Sanity check CPU is already in post-VMXON */
 	WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE));

-	local_irq_save(flags);
-	r = tdx_cpu_enable();
-	local_irq_restore(flags);
-
-	return r;
+	return tdx_cpu_enable();
 }

 static int tdx_offline_cpu(unsigned int cpu)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 5ce4ebe99774..dfd82fac0498 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -148,8 +148,6 @@ static int try_init_module_global(void)
  * global initialization SEAMCALL if not done) on local cpu to make this
  * cpu be ready to run any other SEAMCALLs.
  *
- * Always call this function via IPI function calls.
- *
  * Return 0 on success, otherwise errors.
 */
 int tdx_cpu_enable(void)
@@ -160,8 +158,6 @@ int tdx_cpu_enable(void)
 	if (!boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM))
 		return -ENODEV;

-	lockdep_assert_irqs_disabled();
-
 	if (__this_cpu_read(tdx_lp_initialized))
 		return 0;

-- 
2.53.0.310.g728cabbaf7-goog

From nobody Thu Apr  2 22:23:03 2026
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026
 17:26:57 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: 
List-Subscribe: 
List-Unsubscribe: 
Mime-Version: 1.0
References: <20260214012702.2368778-1-seanjc@google.com>
X-Mailer: git-send-email 2.53.0.310.g728cabbaf7-goog
Message-ID: <20260214012702.2368778-12-seanjc@google.com>
Subject: [PATCH v3 11/16] KVM: x86/tdx: Do VMXON and TDX-Module
 initialization during subsys init
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo,
 Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao,
 Xu Yilun, Dan Williams
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Now that VMXON can be done without bouncing through KVM, do TDX-Module
initialization during subsys init (specifically before module_init() so
that it runs before KVM when both are built-in).

Aside from the obvious benefits of separating core TDX code from KVM,
this will allow tagging a pile of TDX functions and globals as being
__init and __ro_after_init.
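The ordering leaned on here — subsys_initcall() runs before module_init() (device_initcall level) for built-in code — can be illustrated in userspace with ELF constructor priorities. This is only an analogy (kernel initcall levels are a linker-section mechanism, not ELF constructors), and the function names below are made up:

```c
/*
 * Analogy only: like kernel initcall levels, GCC/Clang constructor
 * priorities run in ascending order, so the "subsys" step is guaranteed
 * to complete before the "module" step observes its results.
 * Priorities 0-100 are reserved by the implementation, hence 201/202.
 */
static int order[2];
static int next_slot;

__attribute__((constructor(201)))	/* models subsys_initcall(tdx_enable) */
static void fake_tdx_subsys_init(void)
{
	order[next_slot++] = 1;		/* "TDX-Module initialized" */
}

__attribute__((constructor(202)))	/* models KVM's module_init() */
static void fake_kvm_module_init(void)
{
	order[next_slot++] = 2;		/* KVM init sees TDX already ready */
}
```

Both constructors run before main(), with the lower-priority one first, which mirrors why tdx_get_sysinfo() can simply return already-populated data by the time KVM initializes.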
Reviewed-by: Dan Williams Signed-off-by: Sean Christopherson Acked-by: Dave Hansen Reviewed-by: Chao Gao Tested-by: Chao Gao Tested-by: Sagi Shahar --- Documentation/arch/x86/tdx.rst | 36 +------ arch/x86/include/asm/tdx.h | 4 - arch/x86/kvm/vmx/tdx.c | 148 ++++++----------------------- arch/x86/virt/vmx/tdx/tdx.c | 168 +++++++++++++++++++-------------- arch/x86/virt/vmx/tdx/tdx.h | 8 -- 5 files changed, 130 insertions(+), 234 deletions(-) diff --git a/Documentation/arch/x86/tdx.rst b/Documentation/arch/x86/tdx.rst index 61670e7df2f7..ff6b110291bc 100644 --- a/Documentation/arch/x86/tdx.rst +++ b/Documentation/arch/x86/tdx.rst @@ -60,44 +60,18 @@ Besides initializing the TDX module, a per-cpu initiali= zation SEAMCALL must be done on one cpu before any other SEAMCALLs can be made on that cpu. =20 -The kernel provides two functions, tdx_enable() and tdx_cpu_enable() to -allow the user of TDX to enable the TDX module and enable TDX on local -cpu respectively. - -Making SEAMCALL requires VMXON has been done on that CPU. Currently only -KVM implements VMXON. For now both tdx_enable() and tdx_cpu_enable() -don't do VMXON internally (not trivial), but depends on the caller to -guarantee that. - -To enable TDX, the caller of TDX should: 1) temporarily disable CPU -hotplug; 2) do VMXON and tdx_enable_cpu() on all online cpus; 3) call -tdx_enable(). For example:: - - cpus_read_lock(); - on_each_cpu(vmxon_and_tdx_cpu_enable()); - ret =3D tdx_enable(); - cpus_read_unlock(); - if (ret) - goto no_tdx; - // TDX is ready to use - -And the caller of TDX must guarantee the tdx_cpu_enable() has been -successfully done on any cpu before it wants to run any other SEAMCALL. -A typical usage is do both VMXON and tdx_cpu_enable() in CPU hotplug -online callback, and refuse to online if tdx_cpu_enable() fails. - User can consult dmesg to see whether the TDX module has been initialized. =20 If the TDX module is initialized successfully, dmesg shows something like below:: =20 [..] 
virt/tdx: 262668 KBs allocated for PAMT - [..] virt/tdx: module initialized + [..] virt/tdx: TDX-Module initialized =20 If the TDX module failed to initialize, dmesg also shows it failed to initialize:: =20 - [..] virt/tdx: module initialization failed ... + [..] virt/tdx: TDX-Module initialization failed ... =20 TDX Interaction to Other Kernel Components ------------------------------------------ @@ -129,9 +103,9 @@ CPU Hotplug ~~~~~~~~~~~ =20 TDX module requires the per-cpu initialization SEAMCALL must be done on -one cpu before any other SEAMCALLs can be made on that cpu. The kernel -provides tdx_cpu_enable() to let the user of TDX to do it when the user -wants to use a new cpu for TDX task. +one cpu before any other SEAMCALLs can be made on that cpu. The kernel, +via the CPU hotplug framework, performs the necessary initialization when +a CPU is first brought online. =20 TDX doesn't support physical (ACPI) CPU hotplug. During machine boot, TDX verifies all boot-time present logical CPUs are TDX compatible before diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 6b338d7f01b7..a149740b24e8 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -145,8 +145,6 @@ static __always_inline u64 sc_retry(sc_func_t func, u64= fn, #define seamcall(_fn, _args) sc_retry(__seamcall, (_fn), (_args)) #define seamcall_ret(_fn, _args) sc_retry(__seamcall_ret, (_fn), (_args)) #define seamcall_saved_ret(_fn, _args) sc_retry(__seamcall_saved_ret, (_fn= ), (_args)) -int tdx_cpu_enable(void); -int tdx_enable(void); const char *tdx_dump_mce_info(struct mce *m); const struct tdx_sys_info *tdx_get_sysinfo(void); =20 @@ -223,8 +221,6 @@ u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); #else static inline void tdx_init(void) { } -static inline int tdx_cpu_enable(void) { return -ENODEV; } -static inline int tdx_enable(void) { return -ENODEV; } static inline u32 
tdx_get_nr_guest_keyids(void) { return 0; } static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NU= LL; } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 582469118b79..0ac01c119336 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -59,7 +59,7 @@ module_param_named(tdx, enable_tdx, bool, 0444); #define TDX_SHARED_BIT_PWL_5 gpa_to_gfn(BIT_ULL(51)) #define TDX_SHARED_BIT_PWL_4 gpa_to_gfn(BIT_ULL(47)) =20 -static enum cpuhp_state tdx_cpuhp_state; +static enum cpuhp_state tdx_cpuhp_state __ro_after_init; =20 static const struct tdx_sys_info *tdx_sysinfo; =20 @@ -3294,10 +3294,7 @@ int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_= pfn_t pfn, bool is_private) =20 static int tdx_online_cpu(unsigned int cpu) { - /* Sanity check CPU is already in post-VMXON */ - WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE)); - - return tdx_cpu_enable(); + return 0; } =20 static int tdx_offline_cpu(unsigned int cpu) @@ -3336,51 +3333,6 @@ static int tdx_offline_cpu(unsigned int cpu) return -EBUSY; } =20 -static void __do_tdx_cleanup(void) -{ - /* - * Once TDX module is initialized, it cannot be disabled and - * re-initialized again w/o runtime update (which isn't - * supported by kernel). Only need to remove the cpuhp here. - * The TDX host core code tracks TDX status and can handle - * 'multiple enabling' scenario. - */ - WARN_ON_ONCE(!tdx_cpuhp_state); - cpuhp_remove_state_nocalls_cpuslocked(tdx_cpuhp_state); - tdx_cpuhp_state =3D 0; -} - -static void __tdx_cleanup(void) -{ - cpus_read_lock(); - __do_tdx_cleanup(); - cpus_read_unlock(); -} - -static int __init __do_tdx_bringup(void) -{ - int r; - - /* - * TDX-specific cpuhp callback to call tdx_cpu_enable() on all - * online CPUs before calling tdx_enable(), and on any new - * going-online CPU to make sure it is ready for TDX guest. 
- */
-	r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
-					 "kvm/cpu/tdx:online",
-					 tdx_online_cpu, tdx_offline_cpu);
-	if (r < 0)
-		return r;
-
-	tdx_cpuhp_state = r;
-
-	r = tdx_enable();
-	if (r)
-		__do_tdx_cleanup();
-
-	return r;
-}
-
 static int __init __tdx_bringup(void)
 {
	const struct tdx_sys_info_td_conf *td_conf;
@@ -3400,34 +3352,18 @@ static int __init __tdx_bringup(void)
 		}
 	}

-	/*
-	 * Enabling TDX requires enabling hardware virtualization first,
-	 * as making SEAMCALLs requires CPU being in post-VMXON state.
-	 */
-	r = kvm_enable_virtualization();
-	if (r)
-		return r;
-
-	cpus_read_lock();
-	r = __do_tdx_bringup();
-	cpus_read_unlock();
-
-	if (r)
-		goto tdx_bringup_err;
-
-	r = -EINVAL;
 	/* Get TDX global information for later use */
 	tdx_sysinfo = tdx_get_sysinfo();
-	if (WARN_ON_ONCE(!tdx_sysinfo))
-		goto get_sysinfo_err;
+	if (!tdx_sysinfo)
+		return -ENODEV;

 	/* Check TDX module and KVM capabilities */
 	if (!tdx_get_supported_attrs(&tdx_sysinfo->td_conf) ||
 	    !tdx_get_supported_xfam(&tdx_sysinfo->td_conf))
-		goto get_sysinfo_err;
+		return -EINVAL;

 	if (!(tdx_sysinfo->features.tdx_features0 & MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM))
-		goto get_sysinfo_err;
+		return -EINVAL;

 	/*
 	 * TDX has its own limit of maximum vCPUs it can support for all
@@ -3462,34 +3398,31 @@ static int __init __tdx_bringup(void)
 	if (td_conf->max_vcpus_per_td < num_present_cpus()) {
 		pr_err("Disable TDX: MAX_VCPU_PER_TD (%u) smaller than number of logical CPUs (%u).\n",
 		       td_conf->max_vcpus_per_td, num_present_cpus());
-		goto get_sysinfo_err;
+		return -EINVAL;
 	}

 	if (misc_cg_set_capacity(MISC_CG_RES_TDX, tdx_get_nr_guest_keyids()))
-		goto get_sysinfo_err;
+		return -EINVAL;

 	/*
-	 * Leave hardware virtualization enabled after TDX is enabled
-	 * successfully.  TDX CPU hotplug depends on this.
+	 * TDX-specific cpuhp callback to disallow offlining the last CPU in a
+	 * package while KVM is running one or more TDs.
Reclaiming HKIDs + * requires doing PAGE.WBINVD on every package, i.e. offlining all CPUs + * of a package would prevent reclaiming the HKID. */ + r =3D cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "kvm/cpu/tdx:online", + tdx_online_cpu, tdx_offline_cpu); + if (r < 0) + goto err_cpuhup; + + tdx_cpuhp_state =3D r; return 0; =20 -get_sysinfo_err: - __tdx_cleanup(); -tdx_bringup_err: - kvm_disable_virtualization(); +err_cpuhup: + misc_cg_set_capacity(MISC_CG_RES_TDX, 0); return r; } =20 -void tdx_cleanup(void) -{ - if (enable_tdx) { - misc_cg_set_capacity(MISC_CG_RES_TDX, 0); - __tdx_cleanup(); - kvm_disable_virtualization(); - } -} - int __init tdx_bringup(void) { int r, i; @@ -3521,39 +3454,11 @@ int __init tdx_bringup(void) goto success_disable_tdx; } =20 - if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) { - pr_err("tdx: MOVDIR64B is required for TDX\n"); - goto success_disable_tdx; - } - - if (!cpu_feature_enabled(X86_FEATURE_SELFSNOOP)) { - pr_err("Self-snoop is required for TDX\n"); - goto success_disable_tdx; - } - if (!cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM)) { - pr_err("tdx: no TDX private KeyIDs available\n"); + pr_err("TDX not supported by the host platform\n"); goto success_disable_tdx; } =20 - if (!enable_virt_at_load) { - pr_err("tdx: tdx requires kvm.enable_virt_at_load=3D1\n"); - goto success_disable_tdx; - } - - /* - * Ideally KVM should probe whether TDX module has been loaded - * first and then try to bring it up. But TDX needs to use SEAMCALL - * to probe whether the module is loaded (there is no CPUID or MSR - * for that), and making SEAMCALL requires enabling virtualization - * first, just like the rest steps of bringing up TDX module. - * - * So, for simplicity do everything in __tdx_bringup(); the first - * SEAMCALL will return -ENODEV when the module is not loaded. The - * only complication is having to make sure that initialization - * SEAMCALLs don't return TDX_SEAMCALL_VMFAILINVALID in other - * cases. 
- */ r =3D __tdx_bringup(); if (r) { /* @@ -3568,8 +3473,6 @@ int __init tdx_bringup(void) */ if (r =3D=3D -ENODEV) goto success_disable_tdx; - - enable_tdx =3D 0; } =20 return r; @@ -3579,6 +3482,15 @@ int __init tdx_bringup(void) return 0; } =20 +void tdx_cleanup(void) +{ + if (!enable_tdx) + return; + + misc_cg_set_capacity(MISC_CG_RES_TDX, 0); + cpuhp_remove_state(tdx_cpuhp_state); +} + void __init tdx_hardware_setup(void) { KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_tdx); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index dfd82fac0498..feea8dd6920d 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -39,6 +40,7 @@ #include #include #include +#include #include "tdx.h" =20 static u32 tdx_global_keyid __ro_after_init; @@ -51,13 +53,11 @@ static DEFINE_PER_CPU(bool, tdx_lp_initialized); =20 static struct tdmr_info_list tdx_tdmr_list; =20 -static enum tdx_module_status_t tdx_module_status; -static DEFINE_MUTEX(tdx_module_lock); - /* All TDX-usable memory regions. Protected by mem_hotplug_lock. */ static LIST_HEAD(tdx_memlist); =20 static struct tdx_sys_info tdx_sysinfo; +static bool tdx_module_initialized; =20 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *arg= s); =20 @@ -142,22 +142,15 @@ static int try_init_module_global(void) } =20 /** - * tdx_cpu_enable - Enable TDX on local cpu - * - * Do one-time TDX module per-cpu initialization SEAMCALL (and TDX module - * global initialization SEAMCALL if not done) on local cpu to make this - * cpu be ready to run any other SEAMCALLs. - * - * Return 0 on success, otherwise errors. + * Enable VMXON and then do one-time TDX module per-cpu initialization SEA= MCALL + * (and TDX module global initialization SEAMCALL if not done) on local cp= u to + * make this cpu be ready to run any other SEAMCALLs. 
*/ -int tdx_cpu_enable(void) +static int tdx_cpu_enable(void) { struct tdx_module_args args =3D {}; int ret; =20 - if (!boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM)) - return -ENODEV; - if (__this_cpu_read(tdx_lp_initialized)) return 0; =20 @@ -178,7 +171,58 @@ int tdx_cpu_enable(void) =20 return 0; } -EXPORT_SYMBOL_FOR_KVM(tdx_cpu_enable); + +static int tdx_online_cpu(unsigned int cpu) +{ + int ret; + + ret =3D x86_virt_get_ref(X86_FEATURE_VMX); + if (ret) + return ret; + + ret =3D tdx_cpu_enable(); + if (ret) + x86_virt_put_ref(X86_FEATURE_VMX); + + return ret; +} + +static int tdx_offline_cpu(unsigned int cpu) +{ + x86_virt_put_ref(X86_FEATURE_VMX); + return 0; +} + +static void tdx_shutdown_cpu(void *ign) +{ + x86_virt_put_ref(X86_FEATURE_VMX); +} + +static void tdx_shutdown(void *ign) +{ + on_each_cpu(tdx_shutdown_cpu, NULL, 1); +} + +static int tdx_suspend(void *ign) +{ + x86_virt_put_ref(X86_FEATURE_VMX); + return 0; +} + +static void tdx_resume(void *ign) +{ + WARN_ON_ONCE(x86_virt_get_ref(X86_FEATURE_VMX)); +} + +static const struct syscore_ops tdx_syscore_ops =3D { + .suspend =3D tdx_suspend, + .resume =3D tdx_resume, + .shutdown =3D tdx_shutdown, +}; + +static struct syscore tdx_syscore =3D { + .ops =3D &tdx_syscore_ops, +}; =20 /* * Add a memory region as a TDX memory block. 
The caller must make sure @@ -1153,67 +1197,50 @@ static int init_tdx_module(void) goto out_put_tdxmem; } =20 -static int __tdx_enable(void) +static int tdx_enable(void) { + enum cpuhp_state state; int ret; =20 + if (!cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM)) { + pr_err("TDX not supported by the host platform\n"); + return -ENODEV; + } + + if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) { + pr_err("XSAVE is required for TDX\n"); + return -EINVAL; + } + + if (!cpu_feature_enabled(X86_FEATURE_MOVDIR64B)) { + pr_err("MOVDIR64B is required for TDX\n"); + return -EINVAL; + } + + if (!cpu_feature_enabled(X86_FEATURE_SELFSNOOP)) { + pr_err("Self-snoop is required for TDX\n"); + return -ENODEV; + } + + state =3D cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "virt/tdx:online", + tdx_online_cpu, tdx_offline_cpu); + if (state < 0) + return state; + ret =3D init_tdx_module(); if (ret) { - pr_err("module initialization failed (%d)\n", ret); - tdx_module_status =3D TDX_MODULE_ERROR; + pr_err("TDX-Module initialization failed (%d)\n", ret); + cpuhp_remove_state(state); return ret; } =20 - pr_info("module initialized\n"); - tdx_module_status =3D TDX_MODULE_INITIALIZED; + register_syscore(&tdx_syscore); =20 + tdx_module_initialized =3D true; + pr_info("TDX-Module initialized\n"); return 0; } - -/** - * tdx_enable - Enable TDX module to make it ready to run TDX guests - * - * This function assumes the caller has: 1) held read lock of CPU hotplug - * lock to prevent any new cpu from becoming online; 2) done both VMXON - * and tdx_cpu_enable() on all online cpus. - * - * This function requires there's at least one online cpu for each CPU - * package to succeed. - * - * This function can be called in parallel by multiple callers. - * - * Return 0 if TDX is enabled successfully, otherwise error. 
- */ -int tdx_enable(void) -{ - int ret; - - if (!boot_cpu_has(X86_FEATURE_TDX_HOST_PLATFORM)) - return -ENODEV; - - lockdep_assert_cpus_held(); - - mutex_lock(&tdx_module_lock); - - switch (tdx_module_status) { - case TDX_MODULE_UNINITIALIZED: - ret =3D __tdx_enable(); - break; - case TDX_MODULE_INITIALIZED: - /* Already initialized, great, tell the caller. */ - ret =3D 0; - break; - default: - /* Failed to initialize in the previous attempts */ - ret =3D -EINVAL; - break; - } - - mutex_unlock(&tdx_module_lock); - - return ret; -} -EXPORT_SYMBOL_FOR_KVM(tdx_enable); +subsys_initcall(tdx_enable); =20 static bool is_pamt_page(unsigned long phys) { @@ -1464,15 +1491,10 @@ void __init tdx_init(void) =20 const struct tdx_sys_info *tdx_get_sysinfo(void) { - const struct tdx_sys_info *p =3D NULL; + if (!tdx_module_initialized) + return NULL; =20 - /* Make sure all fields in @tdx_sysinfo have been populated */ - mutex_lock(&tdx_module_lock); - if (tdx_module_status =3D=3D TDX_MODULE_INITIALIZED) - p =3D (const struct tdx_sys_info *)&tdx_sysinfo; - mutex_unlock(&tdx_module_lock); - - return p; + return (const struct tdx_sys_info *)&tdx_sysinfo; } EXPORT_SYMBOL_FOR_KVM(tdx_get_sysinfo); =20 diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 82bb82be8567..dde219c823b4 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -91,14 +91,6 @@ struct tdmr_info { * Do not put any hardware-defined TDX structure representations below * this comment! */ - -/* Kernel defined TDX module status during module initialization. 
 */
-enum tdx_module_status_t {
-	TDX_MODULE_UNINITIALIZED,
-	TDX_MODULE_INITIALIZED,
-	TDX_MODULE_ERROR
-};
-
 struct tdx_memblock {
 	struct list_head list;
 	unsigned long start_pfn;
-- 
2.53.0.310.g728cabbaf7-goog

From nobody Thu Apr  2 22:23:03 2026
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:26:58 -0800
In-Reply-To:
<20260214012702.2368778-1-seanjc@google.com>
References: <20260214012702.2368778-1-seanjc@google.com>
Message-ID: <20260214012702.2368778-13-seanjc@google.com>
Subject: [PATCH v3 12/16] x86/virt/tdx: Tag a pile of functions as __init, and globals as __ro_after_init
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams
Content-Type: text/plain; charset="utf-8"

Now that TDX-Module initialization is done during subsys init, tag all
related functions as __init, and relevant data as __ro_after_init.

Reviewed-by: Dan Williams
Reviewed-by: Chao Gao
Signed-off-by: Sean Christopherson
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 arch/x86/virt/vmx/tdx/tdx.c                 | 119 ++++++++++----------
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c |  10 +-
 2 files changed, 66 insertions(+), 63 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index feea8dd6920d..05d634caa4e8 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -56,8 +56,8 @@ static struct tdmr_info_list tdx_tdmr_list;
 
 /* All TDX-usable memory regions. Protected by mem_hotplug_lock.
  */
 static LIST_HEAD(tdx_memlist);
 
-static struct tdx_sys_info tdx_sysinfo;
-static bool tdx_module_initialized;
+static struct tdx_sys_info tdx_sysinfo __ro_after_init;
+static bool tdx_module_initialized __ro_after_init;
 
 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
 
@@ -229,8 +229,9 @@ static struct syscore tdx_syscore = {
  * all memory regions are added in address ascending order and don't
  * overlap.
  */
-static int add_tdx_memblock(struct list_head *tmb_list, unsigned long start_pfn,
-			    unsigned long end_pfn, int nid)
+static __init int add_tdx_memblock(struct list_head *tmb_list,
+				   unsigned long start_pfn,
+				   unsigned long end_pfn, int nid)
 {
 	struct tdx_memblock *tmb;
 
@@ -248,7 +249,7 @@ static int add_tdx_memblock(struct list_head *tmb_list, unsigned long start_pfn,
 	return 0;
 }
 
-static void free_tdx_memlist(struct list_head *tmb_list)
+static __init void free_tdx_memlist(struct list_head *tmb_list)
 {
 	/* @tmb_list is protected by mem_hotplug_lock */
 	while (!list_empty(tmb_list)) {
@@ -266,7 +267,7 @@ static void free_tdx_memlist(struct list_head *tmb_list)
  * ranges off in a secondary structure because memblock is modified
  * in memory hotplug while TDX memory regions are fixed.
  */
-static int build_tdx_memlist(struct list_head *tmb_list)
+static __init int build_tdx_memlist(struct list_head *tmb_list)
 {
 	unsigned long start_pfn, end_pfn;
 	int i, nid, ret;
@@ -298,7 +299,7 @@ static int build_tdx_memlist(struct list_head *tmb_list)
 	return ret;
 }
 
-static int read_sys_metadata_field(u64 field_id, u64 *data)
+static __init int read_sys_metadata_field(u64 field_id, u64 *data)
 {
 	struct tdx_module_args args = {};
 	int ret;
@@ -320,7 +321,7 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
 
 #include "tdx_global_metadata.c"
 
-static int check_features(struct tdx_sys_info *sysinfo)
+static __init int check_features(struct tdx_sys_info *sysinfo)
 {
 	u64 tdx_features0 = sysinfo->features.tdx_features0;
 
@@ -333,7 +334,7 @@ static int check_features(struct tdx_sys_info *sysinfo)
 }
 
 /* Calculate the actual TDMR size */
-static int tdmr_size_single(u16 max_reserved_per_tdmr)
+static __init int tdmr_size_single(u16 max_reserved_per_tdmr)
 {
 	int tdmr_sz;
 
@@ -347,8 +348,8 @@ static int tdmr_size_single(u16 max_reserved_per_tdmr)
 	return ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT);
 }
 
-static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
-			   struct tdx_sys_info_tdmr *sysinfo_tdmr)
+static __init int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
+				  struct tdx_sys_info_tdmr *sysinfo_tdmr)
 {
 	size_t tdmr_sz, tdmr_array_sz;
 	void *tdmr_array;
@@ -379,7 +380,7 @@ static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
 	return 0;
 }
 
-static void free_tdmr_list(struct tdmr_info_list *tdmr_list)
+static __init void free_tdmr_list(struct tdmr_info_list *tdmr_list)
 {
 	free_pages_exact(tdmr_list->tdmrs,
 			 tdmr_list->max_tdmrs * tdmr_list->tdmr_sz);
@@ -408,8 +409,8 @@ static inline u64 tdmr_end(struct tdmr_info *tdmr)
 * preallocated @tdmr_list, following all the special alignment
 * and size rules for TDMR.
  */
-static int fill_out_tdmrs(struct list_head *tmb_list,
-			  struct tdmr_info_list *tdmr_list)
+static __init int fill_out_tdmrs(struct list_head *tmb_list,
+				 struct tdmr_info_list *tdmr_list)
 {
 	struct tdx_memblock *tmb;
 	int tdmr_idx = 0;
@@ -485,8 +486,8 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
 * Calculate PAMT size given a TDMR and a page size. The returned
 * PAMT size is always aligned up to 4K page boundary.
 */
-static unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr, int pgsz,
-				      u16 pamt_entry_size)
+static __init unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr, int pgsz,
+					     u16 pamt_entry_size)
 {
 	unsigned long pamt_sz, nr_pamt_entries;
 
@@ -517,7 +518,7 @@ static unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr, int pgsz,
 * PAMT. This node will have some memory covered by the TDMR. The
 * relative amount of memory covered is not considered.
 */
-static int tdmr_get_nid(struct tdmr_info *tdmr, struct list_head *tmb_list)
+static __init int tdmr_get_nid(struct tdmr_info *tdmr, struct list_head *tmb_list)
 {
 	struct tdx_memblock *tmb;
 
@@ -546,9 +547,9 @@ static int tdmr_get_nid(struct tdmr_info *tdmr, struct list_head *tmb_list)
 * Allocate PAMTs from the local NUMA node of some memory in @tmb_list
 * within @tdmr, and set up PAMTs for @tdmr.
  */
-static int tdmr_set_up_pamt(struct tdmr_info *tdmr,
-			    struct list_head *tmb_list,
-			    u16 pamt_entry_size[])
+static __init int tdmr_set_up_pamt(struct tdmr_info *tdmr,
+				   struct list_head *tmb_list,
+				   u16 pamt_entry_size[])
 {
 	unsigned long pamt_base[TDX_PS_NR];
 	unsigned long pamt_size[TDX_PS_NR];
@@ -618,7 +619,7 @@ static void tdmr_get_pamt(struct tdmr_info *tdmr, unsigned long *pamt_base,
 	*pamt_size = pamt_sz;
 }
 
-static void tdmr_do_pamt_func(struct tdmr_info *tdmr,
+static __init void tdmr_do_pamt_func(struct tdmr_info *tdmr,
 		void (*pamt_func)(unsigned long base, unsigned long size))
 {
 	unsigned long pamt_base, pamt_size;
@@ -635,17 +636,17 @@ static void tdmr_do_pamt_func(struct tdmr_info *tdmr,
 	pamt_func(pamt_base, pamt_size);
 }
 
-static void free_pamt(unsigned long pamt_base, unsigned long pamt_size)
+static __init void free_pamt(unsigned long pamt_base, unsigned long pamt_size)
 {
 	free_contig_range(pamt_base >> PAGE_SHIFT, pamt_size >> PAGE_SHIFT);
 }
 
-static void tdmr_free_pamt(struct tdmr_info *tdmr)
+static __init void tdmr_free_pamt(struct tdmr_info *tdmr)
 {
 	tdmr_do_pamt_func(tdmr, free_pamt);
 }
 
-static void tdmrs_free_pamt_all(struct tdmr_info_list *tdmr_list)
+static __init void tdmrs_free_pamt_all(struct tdmr_info_list *tdmr_list)
 {
 	int i;
 
@@ -654,9 +655,9 @@ static void tdmrs_free_pamt_all(struct tdmr_info_list *tdmr_list)
 }
 
 /* Allocate and set up PAMTs for all TDMRs */
-static int tdmrs_set_up_pamt_all(struct tdmr_info_list *tdmr_list,
-				 struct list_head *tmb_list,
-				 u16 pamt_entry_size[])
+static __init int tdmrs_set_up_pamt_all(struct tdmr_info_list *tdmr_list,
+					struct list_head *tmb_list,
+					u16 pamt_entry_size[])
 {
 	int i, ret = 0;
 
@@ -705,12 +706,13 @@ void tdx_quirk_reset_page(struct page *page)
 }
 EXPORT_SYMBOL_FOR_KVM(tdx_quirk_reset_page);
 
-static void tdmr_quirk_reset_pamt(struct tdmr_info *tdmr)
+static __init void tdmr_quirk_reset_pamt(struct tdmr_info *tdmr)
+
 {
 	tdmr_do_pamt_func(tdmr,
			  tdx_quirk_reset_paddr);
 }
 
-static void tdmrs_quirk_reset_pamt_all(struct tdmr_info_list *tdmr_list)
+static __init void tdmrs_quirk_reset_pamt_all(struct tdmr_info_list *tdmr_list)
 {
 	int i;
 
@@ -718,7 +720,7 @@ static void tdmrs_quirk_reset_pamt_all(struct tdmr_info_list *tdmr_list)
 		tdmr_quirk_reset_pamt(tdmr_entry(tdmr_list, i));
 }
 
-static unsigned long tdmrs_count_pamt_kb(struct tdmr_info_list *tdmr_list)
+static __init unsigned long tdmrs_count_pamt_kb(struct tdmr_info_list *tdmr_list)
 {
 	unsigned long pamt_size = 0;
 	int i;
@@ -733,8 +735,8 @@ static unsigned long tdmrs_count_pamt_kb(struct tdmr_info_list *tdmr_list)
 	return pamt_size / 1024;
 }
 
-static int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx, u64 addr,
-			      u64 size, u16 max_reserved_per_tdmr)
+static __init int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx,
+				     u64 addr, u64 size, u16 max_reserved_per_tdmr)
 {
 	struct tdmr_reserved_area *rsvd_areas = tdmr->reserved_areas;
 	int idx = *p_idx;
@@ -767,10 +769,10 @@ static int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx, u64 addr,
 * those holes fall within @tdmr, set up a TDMR reserved area to cover
 * the hole.
 */
-static int tdmr_populate_rsvd_holes(struct list_head *tmb_list,
-				    struct tdmr_info *tdmr,
-				    int *rsvd_idx,
-				    u16 max_reserved_per_tdmr)
+static __init int tdmr_populate_rsvd_holes(struct list_head *tmb_list,
+					   struct tdmr_info *tdmr,
+					   int *rsvd_idx,
+					   u16 max_reserved_per_tdmr)
 {
 	struct tdx_memblock *tmb;
 	u64 prev_end;
@@ -831,10 +833,10 @@ static int tdmr_populate_rsvd_holes(struct list_head *tmb_list,
 * overlaps with @tdmr, set up a TDMR reserved area to cover the
 * overlapping part.
  */
-static int tdmr_populate_rsvd_pamts(struct tdmr_info_list *tdmr_list,
-				    struct tdmr_info *tdmr,
-				    int *rsvd_idx,
-				    u16 max_reserved_per_tdmr)
+static __init int tdmr_populate_rsvd_pamts(struct tdmr_info_list *tdmr_list,
+					   struct tdmr_info *tdmr,
+					   int *rsvd_idx,
+					   u16 max_reserved_per_tdmr)
 {
 	int i, ret;
 
@@ -869,7 +871,7 @@ static int tdmr_populate_rsvd_pamts(struct tdmr_info_list *tdmr_list,
 }
 
 /* Compare function called by sort() for TDMR reserved areas */
-static int rsvd_area_cmp_func(const void *a, const void *b)
+static __init int rsvd_area_cmp_func(const void *a, const void *b)
 {
 	struct tdmr_reserved_area *r1 = (struct tdmr_reserved_area *)a;
 	struct tdmr_reserved_area *r2 = (struct tdmr_reserved_area *)b;
@@ -888,10 +890,10 @@ static int rsvd_area_cmp_func(const void *a, const void *b)
 * Populate reserved areas for the given @tdmr, including memory holes
 * (via @tmb_list) and PAMTs (via @tdmr_list).
 */
-static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
-				    struct list_head *tmb_list,
-				    struct tdmr_info_list *tdmr_list,
-				    u16 max_reserved_per_tdmr)
+static __init int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
+					   struct list_head *tmb_list,
+					   struct tdmr_info_list *tdmr_list,
+					   u16 max_reserved_per_tdmr)
 {
 	int ret, rsvd_idx = 0;
 
@@ -916,9 +918,9 @@ static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
 * Populate reserved areas for all TDMRs in @tdmr_list, including memory
 * holes (via @tmb_list) and PAMTs.
 */
-static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
-					 struct list_head *tmb_list,
-					 u16 max_reserved_per_tdmr)
+static __init int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
+						struct list_head *tmb_list,
+						u16 max_reserved_per_tdmr)
 {
 	int i;
 
@@ -939,9 +941,9 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
 * to cover all TDX memory regions in @tmb_list based on the TDX module
 * TDMR global information in @sysinfo_tdmr.
  */
-static int construct_tdmrs(struct list_head *tmb_list,
-			   struct tdmr_info_list *tdmr_list,
-			   struct tdx_sys_info_tdmr *sysinfo_tdmr)
+static __init int construct_tdmrs(struct list_head *tmb_list,
+				  struct tdmr_info_list *tdmr_list,
+				  struct tdx_sys_info_tdmr *sysinfo_tdmr)
 {
 	u16 pamt_entry_size[TDX_PS_NR] = {
 		sysinfo_tdmr->pamt_4k_entry_size,
@@ -973,7 +975,7 @@ static int construct_tdmrs(struct list_head *tmb_list,
 	return ret;
 }
 
-static int config_tdx_module(struct tdmr_info_list *tdmr_list, u64 global_keyid)
+static __init int config_tdx_module(struct tdmr_info_list *tdmr_list,
+				    u64 global_keyid)
 {
 	struct tdx_module_args args = {};
 	u64 *tdmr_pa_array;
@@ -1008,7 +1011,7 @@ static int config_tdx_module(struct tdmr_info_list *tdmr_list, u64 global_keyid)
 	return ret;
 }
 
-static int do_global_key_config(void *unused)
+static __init int do_global_key_config(void *unused)
 {
 	struct tdx_module_args args = {};
 
@@ -1026,7 +1029,7 @@ static int do_global_key_config(void *unused)
 * KVM) can ensure success by ensuring sufficient CPUs are online and
 * can run SEAMCALLs.
  */
-static int config_global_keyid(void)
+static __init int config_global_keyid(void)
 {
 	cpumask_var_t packages;
 	int cpu, ret = -EINVAL;
@@ -1066,7 +1069,7 @@ static int config_global_keyid(void)
 	return ret;
 }
 
-static int init_tdmr(struct tdmr_info *tdmr)
+static __init int init_tdmr(struct tdmr_info *tdmr)
 {
 	u64 next;
 
@@ -1097,7 +1100,7 @@ static int init_tdmr(struct tdmr_info *tdmr)
 	return 0;
 }
 
-static int init_tdmrs(struct tdmr_info_list *tdmr_list)
+static __init int init_tdmrs(struct tdmr_info_list *tdmr_list)
 {
 	int i;
 
@@ -1116,7 +1119,7 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
 	return 0;
 }
 
-static int init_tdx_module(void)
+static __init int init_tdx_module(void)
 {
 	int ret;
 
@@ -1197,7 +1200,7 @@ static int init_tdx_module(void)
 	goto out_put_tdxmem;
 }
 
-static int tdx_enable(void)
+static __init int tdx_enable(void)
 {
 	enum cpuhp_state state;
 	int ret;
diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
index 13ad2663488b..360963bc9328 100644
--- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
+++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
@@ -7,7 +7,7 @@
 * Include this file to other C file instead.
  */
 
-static int get_tdx_sys_info_features(struct tdx_sys_info_features *sysinfo_features)
+static __init int get_tdx_sys_info_features(struct tdx_sys_info_features *sysinfo_features)
 {
 	int ret = 0;
 	u64 val;
@@ -18,7 +18,7 @@ static int get_tdx_sys_info_features(struct tdx_sys_info_features *sysinfo_featu
 	return ret;
 }
 
-static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
+static __init int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
 {
 	int ret = 0;
 	u64 val;
@@ -37,7 +37,7 @@ static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
 	return ret;
 }
 
-static int get_tdx_sys_info_td_ctrl(struct tdx_sys_info_td_ctrl *sysinfo_td_ctrl)
+static __init int get_tdx_sys_info_td_ctrl(struct tdx_sys_info_td_ctrl *sysinfo_td_ctrl)
 {
 	int ret = 0;
 	u64 val;
@@ -52,7 +52,7 @@ static int get_tdx_sys_info_td_ctrl(struct tdx_sys_info_td_ctrl *sysinfo_td_ctrl
 	return ret;
 }
 
-static int get_tdx_sys_info_td_conf(struct tdx_sys_info_td_conf *sysinfo_td_conf)
+static __init int get_tdx_sys_info_td_conf(struct tdx_sys_info_td_conf *sysinfo_td_conf)
 {
 	int ret = 0;
 	u64 val;
@@ -85,7 +85,7 @@ static int get_tdx_sys_info_td_conf(struct tdx_sys_info_td_conf *sysinfo_td_conf
 	return ret;
 }
 
-static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
+static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
 {
 	int ret = 0;
 
-- 
2.53.0.310.g728cabbaf7-goog

From nobody Thu Apr 2 22:23:03 2026
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:26:59 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
References: <20260214012702.2368778-1-seanjc@google.com>
Message-ID: <20260214012702.2368778-14-seanjc@google.com>
Subject: [PATCH v3 13/16] x86/virt/tdx: KVM: Consolidate TDX CPU hotplug handling
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org,
    linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams
Content-Type: text/plain; charset="utf-8"

From: Chao Gao

The core kernel registers a CPU hotplug callback to do VMX and TDX init
and deinit while KVM registers a separate CPU offline callback to block
offlining the last online CPU in a socket.

Splitting TDX-related CPU hotplug handling across two components is odd
and adds unnecessary complexity. Consolidate TDX-related CPU hotplug
handling by integrating KVM's tdx_offline_cpu() to the one in the core
kernel.

Also move nr_configured_hkid to the core kernel because tdx_offline_cpu()
references it. Since HKID allocation and free are handled in the core
kernel, it's more natural to track used HKIDs there.

Reviewed-by: Dan Williams
Signed-off-by: Chao Gao
Signed-off-by: Sean Christopherson
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 arch/x86/kvm/vmx/tdx.c      | 67 +-------------------------------------
 arch/x86/virt/vmx/tdx/tdx.c | 49 +++++++++++++++++++++++++--
 2 files changed, 47 insertions(+), 69 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 0ac01c119336..fea3dfc7ac8b 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -59,8 +59,6 @@ module_param_named(tdx, enable_tdx, bool, 0444);
 #define TDX_SHARED_BIT_PWL_5	gpa_to_gfn(BIT_ULL(51))
 #define TDX_SHARED_BIT_PWL_4	gpa_to_gfn(BIT_ULL(47))
 
-static enum cpuhp_state tdx_cpuhp_state __ro_after_init;
-
 static const struct tdx_sys_info *tdx_sysinfo;
 
 void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err)
@@ -219,8 +217,6 @@ static int init_kvm_tdx_caps(const struct tdx_sys_info_td_conf *td_conf,
  */
 static DEFINE_MUTEX(tdx_lock);
 
-static atomic_t nr_configured_hkid;
-
 static bool tdx_operand_busy(u64 err)
 {
 	return (err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY;
 }
@@ -268,7 +264,6 @@ static inline void tdx_hkid_free(struct kvm_tdx *kvm_tdx)
 {
 	tdx_guest_keyid_free(kvm_tdx->hkid);
 	kvm_tdx->hkid = -1;
-	atomic_dec(&nr_configured_hkid);
 	misc_cg_uncharge(MISC_CG_RES_TDX, kvm_tdx->misc_cg, 1);
 	put_misc_cg(kvm_tdx->misc_cg);
 	kvm_tdx->misc_cg = NULL;
@@ -2399,8 +2394,6 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
 
 	ret = -ENOMEM;
 
-	atomic_inc(&nr_configured_hkid);
-
 	tdr_page = alloc_page(GFP_KERNEL);
 	if (!tdr_page)
 		goto free_hkid;
@@ -3292,51 +3285,10 @@ int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 	return PG_LEVEL_4K;
 }
 
-static int tdx_online_cpu(unsigned int cpu)
-{
-	return 0;
-}
-
-static int tdx_offline_cpu(unsigned int cpu)
-{
-	int i;
-
-	/* No TD is running. Allow any cpu to be offline. */
-	if (!atomic_read(&nr_configured_hkid))
-		return 0;
-
-	/*
-	 * In order to reclaim TDX HKID, (i.e. when deleting guest TD), need to
-	 * call TDH.PHYMEM.PAGE.WBINVD on all packages to program all memory
-	 * controller with pconfig. If we have active TDX HKID, refuse to
-	 * offline the last online cpu.
-	 */
-	for_each_online_cpu(i) {
-		/*
-		 * Found another online cpu on the same package.
-		 * Allow to offline.
-		 */
-		if (i != cpu && topology_physical_package_id(i) ==
-				topology_physical_package_id(cpu))
-			return 0;
-	}
-
-	/*
-	 * This is the last cpu of this package. Don't offline it.
-	 *
-	 * Because it's hard for human operator to understand the
-	 * reason, warn it.
-	 */
-#define MSG_ALLPKG_ONLINE \
-	"TDX requires all packages to have an online CPU.
Delete all TDs in order to offline all CPUs of a package.\n"
-	pr_warn_ratelimited(MSG_ALLPKG_ONLINE);
-	return -EBUSY;
-}
-
 static int __init __tdx_bringup(void)
 {
 	const struct tdx_sys_info_td_conf *td_conf;
-	int r, i;
+	int i;
 
 	for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++) {
 		/*
@@ -3404,23 +3356,7 @@ static int __init __tdx_bringup(void)
 	if (misc_cg_set_capacity(MISC_CG_RES_TDX, tdx_get_nr_guest_keyids()))
 		return -EINVAL;
 
-	/*
-	 * TDX-specific cpuhp callback to disallow offlining the last CPU in a
-	 * packing while KVM is running one or more TDs.  Reclaiming HKIDs
-	 * requires doing PAGE.WBINVD on every package, i.e. offlining all CPUs
-	 * of a package would prevent reclaiming the HKID.
-	 */
-	r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "kvm/cpu/tdx:online",
-			      tdx_online_cpu, tdx_offline_cpu);
-	if (r < 0)
-		goto err_cpuhup;
-
-	tdx_cpuhp_state = r;
 	return 0;
-
-err_cpuhup:
-	misc_cg_set_capacity(MISC_CG_RES_TDX, 0);
-	return r;
 }
 
 int __init tdx_bringup(void)
@@ -3488,7 +3424,6 @@ void tdx_cleanup(void)
 		return;
 
 	misc_cg_set_capacity(MISC_CG_RES_TDX, 0);
-	cpuhp_remove_state(tdx_cpuhp_state);
 }
 
 void __init tdx_hardware_setup(void)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 05d634caa4e8..ddbab87d2467 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -59,6 +59,8 @@ static LIST_HEAD(tdx_memlist);
 static struct tdx_sys_info tdx_sysinfo __ro_after_init;
 static bool tdx_module_initialized __ro_after_init;
 
+static atomic_t nr_configured_hkid;
+
 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
 
 static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
@@ -189,6 +191,40 @@ static int tdx_online_cpu(unsigned int cpu)
 
 static int tdx_offline_cpu(unsigned int cpu)
 {
+	int i;
+
+	/* No TD is running. Allow any cpu to be offline.
+	 */
+	if (!atomic_read(&nr_configured_hkid))
+		goto done;
+
+	/*
+	 * In order to reclaim TDX HKID, (i.e. when deleting guest TD), need to
+	 * call TDH.PHYMEM.PAGE.WBINVD on all packages to program all memory
+	 * controller with pconfig. If we have active TDX HKID, refuse to
+	 * offline the last online cpu.
+	 */
+	for_each_online_cpu(i) {
+		/*
+		 * Found another online cpu on the same package.
+		 * Allow to offline.
+		 */
+		if (i != cpu && topology_physical_package_id(i) ==
+				topology_physical_package_id(cpu))
+			goto done;
+	}
+
+	/*
+	 * This is the last cpu of this package. Don't offline it.
+	 *
+	 * Because it's hard for human operator to understand the
+	 * reason, warn it.
+	 */
+#define MSG_ALLPKG_ONLINE \
+	"TDX requires all packages to have an online CPU. Delete all TDs in order to offline all CPUs of a package.\n"
+	pr_warn_ratelimited(MSG_ALLPKG_ONLINE);
+	return -EBUSY;
+
+done:
 	x86_virt_put_ref(X86_FEATURE_VMX);
 	return 0;
 }
@@ -1509,15 +1545,22 @@ EXPORT_SYMBOL_FOR_KVM(tdx_get_nr_guest_keyids);
 
 int tdx_guest_keyid_alloc(void)
 {
-	return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start,
-			       tdx_guest_keyid_start + tdx_nr_guest_keyids - 1,
-			       GFP_KERNEL);
+	int ret;
+
+	ret = ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start,
+			      tdx_guest_keyid_start + tdx_nr_guest_keyids - 1,
+			      GFP_KERNEL);
+	if (ret >= 0)
+		atomic_inc(&nr_configured_hkid);
+
+	return ret;
 }
 EXPORT_SYMBOL_FOR_KVM(tdx_guest_keyid_alloc);
 
 void tdx_guest_keyid_free(unsigned int keyid)
 {
 	ida_free(&tdx_guest_keyid_pool, keyid);
+	atomic_dec(&nr_configured_hkid);
 }
 EXPORT_SYMBOL_FOR_KVM(tdx_guest_keyid_free);
 
-- 
2.53.0.310.g728cabbaf7-goog

From nobody Thu Apr 2 22:23:03 2026
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:27:00 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
References: <20260214012702.2368778-1-seanjc@google.com>
Message-ID: <20260214012702.2368778-15-seanjc@google.com>
Subject: [PATCH v3 14/16] x86/virt/tdx: Use ida_is_empty() to detect if any TDs may be running
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams
Content-Type: text/plain; charset="utf-8"

Drop nr_configured_hkid and instead use ida_is_empty() to detect if any
HKIDs have been allocated/configured.

Suggested-by: Dan Williams
Reviewed-by: Dan Williams
Reviewed-by: Chao Gao
Signed-off-by: Sean Christopherson
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 arch/x86/virt/vmx/tdx/tdx.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index ddbab87d2467..bdee937b84d4 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -59,8 +59,6 @@ static LIST_HEAD(tdx_memlist);
 static struct tdx_sys_info tdx_sysinfo __ro_after_init;
 static bool tdx_module_initialized __ro_after_init;
 
-static atomic_t nr_configured_hkid;
-
 typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);
 
 static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
@@ -194,7 +192,7 @@ static int tdx_offline_cpu(unsigned int cpu)
 	int i;
 
 	/* No TD is running. Allow any cpu to be offline.
*/ - if (!atomic_read(&nr_configured_hkid)) + if (ida_is_empty(&tdx_guest_keyid_pool)) goto done; =20 /* @@ -1545,22 +1543,15 @@ EXPORT_SYMBOL_FOR_KVM(tdx_get_nr_guest_keyids); =20 int tdx_guest_keyid_alloc(void) { - int ret; - - ret =3D ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start, - tdx_guest_keyid_start + tdx_nr_guest_keyids - 1, - GFP_KERNEL); - if (ret >=3D 0) - atomic_inc(&nr_configured_hkid); - - return ret; + return ida_alloc_range(&tdx_guest_keyid_pool, tdx_guest_keyid_start, + tdx_guest_keyid_start + tdx_nr_guest_keyids - 1, + GFP_KERNEL); } EXPORT_SYMBOL_FOR_KVM(tdx_guest_keyid_alloc); =20 void tdx_guest_keyid_free(unsigned int keyid) { ida_free(&tdx_guest_keyid_pool, keyid); - atomic_dec(&nr_configured_hkid); } EXPORT_SYMBOL_FOR_KVM(tdx_guest_keyid_free); =20 --=20 2.53.0.310.g728cabbaf7-goog From nobody Thu Apr 2 22:23:03 2026 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8AD182E888C for ; Sat, 14 Feb 2026 01:27:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771032455; cv=none; b=rPvZuErm/NQ7RcPFZxvKTchwMj+DFn8BFdn/RXIuCv7LtoBXzARtjBGSR2czqGMN7ZNrXLU1Pp8CywNo+xmeo92DUs8eMSJ/INdmkYgYUSx98kZLrJxDQq+S8UgCHEOFJgOO6gQERy0erSVzvEt50/aJosRHGhM+iHHUzJQ9KKY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1771032455; c=relaxed/simple; bh=n0glIiKij5SE6jJtccYoydGsQfuK8yUaTbTzRTF8FPk=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=UCPNiMKMvVO5P/E6tCHF0+qirGHtJ8T93uy9ANtIYrcrgOgm9KLoaZXTt+2adrbdIc2HKAvgiRkon8dxHI5dqUBP3VnkNpzKakLYyPHdfejOt1r1dCqq1qLHhU3S4ZS5+qCzFwwsGK9ytDhvxhnwdewAZ4Tm/UJMvlRDjzPRyK8= ARC-Authentication-Results: i=1; 
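The idea behind this patch — derive "are any TDs running?" from the allocator's own state instead of mirroring it in a separate nr_configured_hkid counter — can be sketched in plain userspace C. Everything below is illustrative: keyid_pool, keyid_alloc(), and pool_is_empty() are made-up names standing in for the kernel's IDA API, not the real thing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE 64

/* Toy stand-in for struct ida: one bit per allocatable keyid. */
struct keyid_pool {
	uint64_t bitmap;		/* bit n set => keyid n allocated */
};

/* Analogue of ida_is_empty(): emptiness is read straight off the
 * allocator, so it can never drift from the actual allocations. */
static bool pool_is_empty(const struct keyid_pool *p)
{
	return p->bitmap == 0;
}

/* Analogue of ida_alloc_range() over [0, POOL_SIZE): return the
 * lowest free id, or -1 if the pool is exhausted. */
static int keyid_alloc(struct keyid_pool *p)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (!(p->bitmap & (1ULL << i))) {
			p->bitmap |= 1ULL << i;
			return i;
		}
	}
	return -1;
}

/* Analogue of ida_free(). */
static void keyid_free(struct keyid_pool *p, int id)
{
	p->bitmap &= ~(1ULL << id);
}
```

The point of the refactor is visible here: there is no separate counter to increment on alloc and decrement on free, so there is nothing that can get out of sync with the pool.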
From nobody Thu Apr 2 22:23:03 2026
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:27:01 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
References: <20260214012702.2368778-1-seanjc@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.53.0.310.g728cabbaf7-goog
Message-ID: <20260214012702.2368778-16-seanjc@google.com>
Subject: [PATCH v3 15/16] KVM: Bury kvm_{en,dis}able_virtualization() in kvm_main.c once more
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra,
 Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
 Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao,
 Xu Yilun, Dan Williams
Content-Type: text/plain; charset="utf-8"

Now that TDX handles doing VMXON without KVM's involvement, bury the
top-level APIs to enable and disable virtualization back in kvm_main.c.

No functional change intended.

Reviewed-by: Dan Williams
Reviewed-by: Chao Gao
Signed-off-by: Sean Christopherson
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 include/linux/kvm_host.h |  8 --------
 virt/kvm/kvm_main.c      | 17 +++++++++++++----
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 981b55c0a3a7..760e0ec2c8eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2605,12 +2605,4 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 					struct kvm_pre_fault_memory *range);
 #endif
 
-#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
-int kvm_enable_virtualization(void);
-void kvm_disable_virtualization(void);
-#else
-static inline int kvm_enable_virtualization(void) { return 0; }
-static inline void kvm_disable_virtualization(void) { }
-#endif
-
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e081e7244299..737b74b15bb5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1112,6 +1112,9 @@ static inline struct kvm_io_bus *kvm_get_bus_for_destruction(struct kvm *kvm,
 		      !refcount_read(&kvm->users_count));
 }
 
+static int kvm_enable_virtualization(void);
+static void kvm_disable_virtualization(void);
+
 static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 {
 	struct kvm *kvm = kvm_arch_alloc_vm();
@@ -5704,7 +5707,7 @@ static struct syscore kvm_syscore = {
 	.ops = &kvm_syscore_ops,
 };
 
-int kvm_enable_virtualization(void)
+static int kvm_enable_virtualization(void)
 {
 	int r;
 
@@ -5749,9 +5752,8 @@ int kvm_enable_virtualization(void)
 	--kvm_usage_count;
 	return r;
 }
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_enable_virtualization);
 
-void kvm_disable_virtualization(void)
+static void kvm_disable_virtualization(void)
 {
 	guard(mutex)(&kvm_usage_lock);
 
@@ -5762,7 +5764,6 @@ void kvm_disable_virtualization(void)
 	cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
 	kvm_arch_disable_virtualization();
 }
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_disable_virtualization);
 
 static int kvm_init_virtualization(void)
 {
@@ -5778,6 +5779,14 @@ static void kvm_uninit_virtualization(void)
 	kvm_disable_virtualization();
 }
 #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
+static int kvm_enable_virtualization(void)
+{
+	return 0;
+}
+static void kvm_disable_virtualization(void)
+{
+
+}
 static int kvm_init_virtualization(void)
 {
 	return 0;
-- 
2.53.0.310.g728cabbaf7-goog
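The #else stubs this patch (re)introduces follow a standard kernel pattern: when the config option is compiled out, same-signature no-op stubs keep the callers in the same translation unit building unchanged. A minimal userspace sketch of that pattern, with FEATURE_ENABLED standing in for CONFIG_KVM_GENERIC_HARDWARE_ENABLING and purely illustrative function names:

```c
#include <assert.h>

/* Stand-in for CONFIG_KVM_GENERIC_HARDWARE_ENABLING. */
#define FEATURE_ENABLED 1

static int usage_count;

#if FEATURE_ENABLED
static int enable_virtualization(void)
{
	usage_count++;		/* real enabling work would go here */
	return 0;
}

static void disable_virtualization(void)
{
	usage_count--;
}
#else
/* Feature compiled out: same-signature no-op stubs keep callers building. */
static int enable_virtualization(void) { return 0; }
static void disable_virtualization(void) { }
#endif
```

Making the real functions static (as the patch does) additionally lets the compiler prove nobody outside the file calls them, which is exactly why the EXPORT_SYMBOL_FOR_KVM_INTERNAL() lines could be dropped.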
From nobody Thu Apr 2 22:23:03 2026
Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:27:02 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
References: <20260214012702.2368778-1-seanjc@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.53.0.310.g728cabbaf7-goog
Message-ID: <20260214012702.2368778-17-seanjc@google.com>
Subject: [PATCH v3 16/16] KVM: TDX: Fold tdx_bringup() into tdx_hardware_setup()
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra,
 Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson,
 Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao,
 Xu Yilun, Dan Williams
Content-Type: text/plain; charset="utf-8"

Now that TDX doesn't need to manually enable virtualization through _KVM_
APIs during setup, fold tdx_bringup() into tdx_hardware_setup() where the
code belongs, e.g. so that KVM doesn't leave the S-EPT kvm_x86_ops wired
up when TDX is disabled.

The weird ordering (and naming) was necessary to allow KVM TDX to use
kvm_enable_virtualization(), which in turn had a hard dependency on
kvm_x86_ops.enable_virtualization_cpu and thus kvm_x86_vendor_init().

Signed-off-by: Sean Christopherson
Reviewed-by: Dan Williams
Tested-by: Chao Gao
Tested-by: Sagi Shahar
---
 arch/x86/kvm/vmx/main.c | 19 ++++++++-----------
 arch/x86/kvm/vmx/tdx.c  | 39 +++++++++++++++------------------------
 arch/x86/kvm/vmx/tdx.h  |  8 ++------
 3 files changed, 25 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index a46ccd670785..dbebddf648be 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -29,10 +29,15 @@ static __init int vt_hardware_setup(void)
 	if (ret)
 		return ret;
 
+	return enable_tdx ? tdx_hardware_setup() : 0;
+}
+
+static void vt_hardware_unsetup(void)
+{
 	if (enable_tdx)
-		tdx_hardware_setup();
+		tdx_hardware_unsetup();
 
-	return 0;
+	vmx_hardware_unsetup();
 }
 
 static int vt_vm_init(struct kvm *kvm)
@@ -869,7 +874,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 
 	.check_processor_compatibility = vmx_check_processor_compat,
 
-	.hardware_unsetup = vmx_hardware_unsetup,
+	.hardware_unsetup = vt_op(hardware_unsetup),
 
 	.enable_virtualization_cpu = vmx_enable_virtualization_cpu,
 	.disable_virtualization_cpu = vt_op(disable_virtualization_cpu),
@@ -1029,7 +1034,6 @@ struct kvm_x86_init_ops vt_init_ops __initdata = {
 static void __exit vt_exit(void)
 {
 	kvm_exit();
-	tdx_cleanup();
 	vmx_exit();
 }
 module_exit(vt_exit);
@@ -1043,11 +1047,6 @@ static int __init vt_init(void)
 	if (r)
 		return r;
 
-	/* tdx_init() has been taken */
-	r = tdx_bringup();
-	if (r)
-		goto err_tdx_bringup;
-
 	/*
 	 * TDX and VMX have different vCPU structures.  Calculate the
 	 * maximum size/align so that kvm_init() can use the larger
@@ -1074,8 +1073,6 @@ static int __init vt_init(void)
 	return 0;
 
 err_kvm_init:
-	tdx_cleanup();
-err_tdx_bringup:
 	vmx_exit();
 	return r;
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index fea3dfc7ac8b..d354022ba9c9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3285,7 +3285,12 @@ int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 	return PG_LEVEL_4K;
 }
 
-static int __init __tdx_bringup(void)
+void tdx_hardware_unsetup(void)
+{
+	misc_cg_set_capacity(MISC_CG_RES_TDX, 0);
+}
+
+static int __init __tdx_hardware_setup(void)
 {
 	const struct tdx_sys_info_td_conf *td_conf;
 	int i;
@@ -3359,7 +3364,7 @@ static int __init __tdx_bringup(void)
 	return 0;
 }
 
-int __init tdx_bringup(void)
+int __init tdx_hardware_setup(void)
 {
 	int r, i;
 
@@ -3395,7 +3400,7 @@ int __init tdx_bringup(void)
 		goto success_disable_tdx;
 	}
 
-	r = __tdx_bringup();
+	r = __tdx_hardware_setup();
 	if (r) {
 		/*
 		 * Disable TDX only but don't fail to load module if the TDX
@@ -3409,31 +3414,12 @@ int __init tdx_bringup(void)
 		 */
 		if (r == -ENODEV)
 			goto success_disable_tdx;
+
+		return r;
 	}
 
-	return r;
-
-success_disable_tdx:
-	enable_tdx = 0;
-	return 0;
-}
-
-void tdx_cleanup(void)
-{
-	if (!enable_tdx)
-		return;
-
-	misc_cg_set_capacity(MISC_CG_RES_TDX, 0);
-}
-
-void __init tdx_hardware_setup(void)
-{
 	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_tdx);
 
-	/*
-	 * Note, if the TDX module can't be loaded, KVM TDX support will be
-	 * disabled but KVM will continue loading (see tdx_bringup()).
-	 */
 	vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx));
 
 	vt_x86_ops.link_external_spt = tdx_sept_link_private_spt;
@@ -3441,4 +3427,9 @@ void __init tdx_hardware_setup(void)
 	vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
 	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
 	vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
+	return 0;
+
+success_disable_tdx:
+	enable_tdx = 0;
+	return 0;
 }
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 45b5183ccb36..b5cd2ffb303e 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -8,9 +8,8 @@
 #ifdef CONFIG_KVM_INTEL_TDX
 #include "common.h"
 
-void tdx_hardware_setup(void);
-int tdx_bringup(void);
-void tdx_cleanup(void);
+int tdx_hardware_setup(void);
+void tdx_hardware_unsetup(void);
 
 extern bool enable_tdx;
 
@@ -187,9 +186,6 @@ TDX_BUILD_TDVPS_ACCESSORS(8, MANAGEMENT, management);
 TDX_BUILD_TDVPS_ACCESSORS(64, STATE_NON_ARCH, state_non_arch);
 
 #else
-static inline int tdx_bringup(void) { return 0; }
-static inline void tdx_cleanup(void) {}
-
 #define enable_tdx 0
 
 struct kvm_tdx {
-- 
2.53.0.310.g728cabbaf7-goog
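The "don't leave the S-EPT kvm_x86_ops wired up when TDX is disabled" concern from the changelog can be modeled in a few lines of userspace C: publish optional function pointers into a shared ops table only from the setup path that actually enables the backend, so a disabled backend leaves the pointers NULL instead of pointing at hooks that will never be torn down. All names below are illustrative, not KVM code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy ops table; optional_hook stands in for the S-EPT hooks. */
struct x86_ops {
	void (*optional_hook)(void);
};

static void backend_hook(void)
{
	/* backend-specific work would go here */
}

/* Analogue of vt_hardware_setup(): hooks are wired up only when the
 * backend is enabled and its setup runs, so a disabled backend leaves
 * nothing behind that later teardown would have to unwire. */
static int hardware_setup(struct x86_ops *ops, bool backend_enabled)
{
	if (!backend_enabled)
		return 0;		/* hooks stay NULL */
	ops->optional_hook = backend_hook;
	return 0;
}
```

This mirrors the patch's structure: tdx_hardware_setup() both decides whether TDX is usable and wires up the TDX-specific vt_x86_ops entries in one place, rather than splitting the decision (tdx_bringup) from the wiring (the old tdx_hardware_setup).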