Reply-To: Sean Christopherson
Date: Fri, 13 Feb 2026 17:26:50 -0800
In-Reply-To: <20260214012702.2368778-1-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260214012702.2368778-1-seanjc@google.com>
Message-ID: <20260214012702.2368778-5-seanjc@google.com>
Subject: [PATCH v3 04/16] KVM: VMX: Unconditionally allocate root VMCSes during boot CPU bringup
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, linux-perf-users@vger.kernel.org, Chao Gao, Xu Yilun, Dan Williams

Allocate the root VMCS (misleadingly called "vmxarea" and "kvm_area" in
KVM) for each possible CPU during early boot CPU bringup, before early TDX
initialization, so that TDX can eventually do VMXON on-demand (to make
SEAMCALLs) without needing to load kvm-intel.ko. Allocate the pages up
front, i.e. instead of trying to do so on-demand, to avoid having to juggle
allocation failures at runtime.

Opportunistically rename the per-CPU pointers to better reflect the role
of the VMCS. Use Intel's "root VMCS" terminology, as found in various VMCS
patents[1][2] and older SDMs, rather than the more opaque "VMXON region"
used in recent versions of the SDM. While it's possible the VMCS passed to
VMXON no longer serves as _the_ root VMCS on modern CPUs, it is still in
effect a "root mode VMCS", as described in the patents.
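The all-or-nothing strategy described above (one zeroed page per possible CPU, allocated once at boot, with every page freed again if any single allocation fails) can be modeled in userspace C. This is a sketch under stated assumptions, not the kernel code: `NR_POSSIBLE_CPUS`, `struct model_vmcs`, `alloc_zeroed_page()` and the `fail_at` failure-injection hook are hypothetical stand-ins for `for_each_possible_cpu()`, `struct vmcs`, `__alloc_pages_node()` and a real allocation failure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_POSSIBLE_CPUS 4	/* hypothetical number of possible CPUs */
#define MODEL_PAGE_SIZE  4096	/* assumes 4 KiB pages, as on x86 */

/* Hypothetical stand-in for struct vmcs: only the revision ID is modeled. */
struct model_vmcs {
	uint32_t revision_id;
};

/* Userspace stand-in for the per-CPU root_vmcs pointers. */
struct model_vmcs *root_vmcs[NR_POSSIBLE_CPUS];

/* Failure-injection hook: CPU whose allocation should fail, or -1 for none. */
int fail_at = -1;

/* Stand-in for a zeroed, page-aligned, single-page allocation. */
static void *alloc_zeroed_page(int cpu)
{
	void *p;

	if (cpu == fail_at)
		return NULL;
	p = aligned_alloc(MODEL_PAGE_SIZE, MODEL_PAGE_SIZE);
	if (p)
		memset(p, 0, MODEL_PAGE_SIZE);
	return p;
}

void free_root_vmcses(void)
{
	for (int cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++) {
		free(root_vmcs[cpu]);	/* free(NULL) is a no-op */
		root_vmcs[cpu] = NULL;
	}
}

/* All-or-nothing: either every CPU gets a root VMCS, or none does. */
int alloc_root_vmcses(uint32_t rev_id)
{
	for (int cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++) {
		struct model_vmcs *vmcs = alloc_zeroed_page(cpu);

		if (!vmcs) {
			free_root_vmcses();	/* undo the CPUs already populated */
			return -1;
		}
		/* The VMXON pointer must be 4 KiB aligned. */
		assert(((uintptr_t)vmcs & (MODEL_PAGE_SIZE - 1)) == 0);
		vmcs->revision_id = rev_id;
		root_vmcs[cpu] = vmcs;
	}
	return 0;
}
```

Because the pages either all exist after boot or none do, the only runtime check KVM needs is whether `per_cpu(root_vmcs, cpu)` is NULL, as in the `__kvm_is_vmx_supported()` hunk, rather than error handling when virtualization is enabled.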
Link: https://patentimages.storage.googleapis.com/c7/e4/32/d7a7def5580667/WO2013101191A1.pdf [1]
Link: https://patentimages.storage.googleapis.com/13/f6/8d/1361fab8c33373/US20080163205A1.pdf [2]
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/virt.h  | 13 ++++++-
 arch/x86/kernel/cpu/common.c |  2 +
 arch/x86/kvm/vmx/vmx.c       | 58 ++----------------------------
 arch/x86/virt/hw.c           | 71 ++++++++++++++++++++++++++++++++++++
 4 files changed, 89 insertions(+), 55 deletions(-)

diff --git a/arch/x86/include/asm/virt.h b/arch/x86/include/asm/virt.h
index 131b9bf9ef3c..0da6db4f5b0c 100644
--- a/arch/x86/include/asm/virt.h
+++ b/arch/x86/include/asm/virt.h
@@ -2,10 +2,21 @@
 #ifndef _ASM_X86_VIRT_H
 #define _ASM_X86_VIRT_H
 
-#include
+#include
+
+#include
 
 #if IS_ENABLED(CONFIG_KVM_X86)
 extern bool virt_rebooting;
+
+void __init x86_virt_init(void);
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+DECLARE_PER_CPU(struct vmcs *, root_vmcs);
+#endif
+
+#else
+static __always_inline void x86_virt_init(void) {}
 #endif
 
 #endif /* _ASM_X86_VIRT_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e7ab22fce3b5..dda9e41292db 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -71,6 +71,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -2143,6 +2144,7 @@ static __init void identify_boot_cpu(void)
 	cpu_detect_tlb(&boot_cpu_data);
 	setup_cr_pinning();
 
+	x86_virt_init();
 	tsx_init();
 	tdx_init();
 	lkgs_init();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fc6e3b620866..abd4830f71d8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -580,7 +580,6 @@ noinline void invept_error(unsigned long ext, u64 eptp)
 	vmx_insn_failed("invept failed: ext=0x%lx eptp=%llx\n", ext, eptp);
 }
 
-static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 /*
  * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
@@ -2934,6 +2933,9 @@ static bool __kvm_is_vmx_supported(void)
 		return false;
 	}
 
+	if (!per_cpu(root_vmcs, cpu))
+		return false;
+
 	return true;
 }
 
@@ -3008,7 +3010,7 @@ static int kvm_cpu_vmxon(u64 vmxon_pointer)
 int vmx_enable_virtualization_cpu(void)
 {
 	int cpu = raw_smp_processor_id();
-	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
+	u64 phys_addr = __pa(per_cpu(root_vmcs, cpu));
 	int r;
 
 	if (cr4_read_shadow() & X86_CR4_VMXE)
@@ -3129,47 +3131,6 @@ int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs)
 	return -ENOMEM;
 }
 
-static void free_kvm_area(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		free_vmcs(per_cpu(vmxarea, cpu));
-		per_cpu(vmxarea, cpu) = NULL;
-	}
-}
-
-static __init int alloc_kvm_area(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct vmcs *vmcs;
-
-		vmcs = alloc_vmcs_cpu(false, cpu, GFP_KERNEL);
-		if (!vmcs) {
-			free_kvm_area();
-			return -ENOMEM;
-		}
-
-		/*
-		 * When eVMCS is enabled, alloc_vmcs_cpu() sets
-		 * vmcs->revision_id to KVM_EVMCS_VERSION instead of
-		 * revision_id reported by MSR_IA32_VMX_BASIC.
-		 *
-		 * However, even though not explicitly documented by
-		 * TLFS, VMXArea passed as VMXON argument should
-		 * still be marked with revision_id reported by
-		 * physical CPU.
-		 */
-		if (kvm_is_using_evmcs())
-			vmcs->hdr.revision_id = vmx_basic_vmcs_revision_id(vmcs_config.basic);
-
-		per_cpu(vmxarea, cpu) = vmcs;
-	}
-	return 0;
-}
-
 static void fix_pmode_seg(struct kvm_vcpu *vcpu, int seg,
 		struct kvm_segment *save)
 {
@@ -8566,8 +8527,6 @@ void vmx_hardware_unsetup(void)
 
 	if (nested)
 		nested_vmx_hardware_unsetup();
-
-	free_kvm_area();
 }
 
 void vmx_vm_destroy(struct kvm *kvm)
@@ -8870,10 +8829,6 @@ __init int vmx_hardware_setup(void)
 		return r;
 	}
 
-	r = alloc_kvm_area();
-	if (r)
-		goto err_kvm_area;
-
 	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
 
 	/*
@@ -8900,11 +8855,6 @@ __init int vmx_hardware_setup(void)
 	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_IGNORE_GUEST_PAT;
 
 	return 0;
-
-err_kvm_area:
-	if (nested)
-		nested_vmx_hardware_unsetup();
-	return r;
 }
 
 void vmx_exit(void)
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index df3dc18d19b4..56972f594d90 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -1,7 +1,78 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include
+#include
+#include
 #include
+#include
+#include
 
+#include
+#include
 #include
+#include
 
 __visible bool virt_rebooting;
 EXPORT_SYMBOL_FOR_KVM(virt_rebooting);
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+DEFINE_PER_CPU(struct vmcs *, root_vmcs);
+EXPORT_PER_CPU_SYMBOL(root_vmcs);
+
+static __init void x86_vmx_exit(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		free_page((unsigned long)per_cpu(root_vmcs, cpu));
+		per_cpu(root_vmcs, cpu) = NULL;
+	}
+}
+
+static __init int x86_vmx_init(void)
+{
+	u64 basic_msr;
+	u32 rev_id;
+	int cpu;
+
+	if (!cpu_feature_enabled(X86_FEATURE_VMX))
+		return -EOPNOTSUPP;
+
+	rdmsrq(MSR_IA32_VMX_BASIC, basic_msr);
+
+	/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
+	if (WARN_ON_ONCE(vmx_basic_vmcs_size(basic_msr) > PAGE_SIZE))
+		return -EIO;
+
+	/*
+	 * Even if eVMCS is enabled (or will be enabled?), and even though not
+	 * explicitly documented by TLFS, the root VMCS passed to VMXON should
+	 * still be marked with the revision_id reported by the physical CPU.
+	 */
+	rev_id = vmx_basic_vmcs_revision_id(basic_msr);
+
+	for_each_possible_cpu(cpu) {
+		int node = cpu_to_node(cpu);
+		struct page *page;
+		struct vmcs *vmcs;
+
+		page = __alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
+		if (!page) {
+			x86_vmx_exit();
+			return -ENOMEM;
+		}
+
+		vmcs = page_address(page);
+		vmcs->hdr.revision_id = rev_id;
+		per_cpu(root_vmcs, cpu) = vmcs;
+	}
+
+	return 0;
+}
+#else
+static __init int x86_vmx_init(void) { return -EOPNOTSUPP; }
+#endif
+
+void __init x86_virt_init(void)
+{
+	x86_vmx_init();
+}
-- 
2.53.0.310.g728cabbaf7-goog
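`x86_vmx_init()` relies on two fields of `MSR_IA32_VMX_BASIC`. Per the Intel SDM, bits 30:0 hold the VMCS revision identifier, bit 31 is always 0, and bits 44:32 report how many bytes to allocate for the VMXON region and each VMCS (at most 4096). The userspace sketch below mirrors the size check and revision-ID extraction. The helper names are hypothetical, modeled on (but not copied from) the kernel's `vmx_basic_vmcs_revision_id()` and `vmx_basic_vmcs_size()`, and it takes a raw MSR value as input instead of executing `rdmsrq()`.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumes 4 KiB pages, as on x86 */

/* Bits 30:0 of IA32_VMX_BASIC: the 31-bit VMCS revision identifier. */
uint32_t basic_revision_id(uint64_t basic)
{
	return (uint32_t)(basic & 0x7fffffffULL);
}

/* Bits 44:32 of IA32_VMX_BASIC: bytes to allocate for a VMXON region/VMCS. */
uint32_t basic_vmcs_size(uint64_t basic)
{
	return (uint32_t)((basic >> 32) & 0x1fffULL);
}

/*
 * Mirrors the two checks in x86_vmx_init(): refuse a region that would not
 * fit in one page, otherwise report the revision ID to stamp into each root
 * VMCS. Returns 0 on success, -1 if the reported size exceeds a page.
 */
int check_vmx_basic(uint64_t basic, uint32_t *rev_id)
{
	/* The SDM defines bit 31 as always 0. */
	assert((basic & (1ULL << 31)) == 0);

	if (basic_vmcs_size(basic) > MODEL_PAGE_SIZE)
		return -1;

	*rev_id = basic_revision_id(basic);
	return 0;
}
```

For example, a made-up value of `0x0000100000000004` decodes to revision ID 4 and a 4096-byte region, so it passes the check; a value whose bits 44:32 encode 4097 would trip the equivalent of the `WARN_ON_ONCE()` path.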