arch/x86/include/asm/kvm-x86-ops.h | 4 +-
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/shared/tdx.h | 7 +-
arch/x86/include/asm/tdx.h | 25 +
.../tdx => include/asm}/tdx_global_metadata.h | 19 +
arch/x86/include/uapi/asm/kvm.h | 59 +
arch/x86/kvm/Kconfig | 2 +
arch/x86/kvm/cpuid.c | 21 +
arch/x86/kvm/cpuid.h | 3 +
arch/x86/kvm/mmu/mmu.c | 9 +-
arch/x86/kvm/vmx/main.c | 143 +-
arch/x86/kvm/vmx/pmu_intel.c | 50 +-
arch/x86/kvm/vmx/pmu_intel.h | 28 +
arch/x86/kvm/vmx/tdx.c | 1369 ++++++++++++++++-
arch/x86/kvm/vmx/tdx.h | 92 ++
arch/x86/kvm/vmx/tdx_arch.h | 165 ++
arch/x86/kvm/vmx/tdx_errno.h | 37 +
arch/x86/kvm/vmx/vmx.h | 34 +-
arch/x86/kvm/vmx/x86_ops.h | 24 +
arch/x86/kvm/x86.c | 15 +-
arch/x86/virt/vmx/tdx/tdx.c | 264 +++-
arch/x86/virt/vmx/tdx/tdx.h | 39 +-
arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 46 +
23 files changed, 2391 insertions(+), 66 deletions(-)
rename arch/x86/{virt/vmx/tdx => include/asm}/tdx_global_metadata.h (68%)
create mode 100644 arch/x86/kvm/vmx/pmu_intel.h
create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
Hi,
Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits
from v1[0] have been applied and it’s ready to hand off to Paolo. A few
items remain that may be worth further discussion:
- Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t
been been tested.
- The Retry loop around tdh_phymem_page_reclaim() in “KVM: TDX:
create/destroy VM structure” likely can be dropped.
- Drop support for TDX Module’s that don’t support
MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM. [1]
- Type-safety in to_vmx()/to_tdx(). [2]
This series has 9 commits intended to collect acks from x86 maintainers.
The first group of commits is for reading TDX static metadata for KVM to
use:
x86/virt/tdx: Share the global metadata structure for KVM to use
x86/virt/tdx: Read essential global metadata for KVM
The second group is for exporting a TDX keyid allocator for KVM to use:
x86/virt/tdx: Add tdx_guest_keyid_alloc/free() to alloc and free
TDX guest KeyID
The third group is for exporting various SEAMCALLs needed by KVM for this
series. SEAMCALL patches for the later TDX sections will come with those
series’:
x86/virt/tdx: Add SEAMCALL wrappers for TDX KeyID management
x86/virt/tdx: Add SEAMCALL wrappers for TDX TD creation
x86/virt/tdx: Add SEAMCALL wrappers for TDX vCPU creation
x86/virt/tdx: Add SEAMCALL wrappers for TDX page cache management
x86/virt/tdx: Add SEAMCALL wrappers for TDX VM/vCPU field access
x86/virt/tdx: Add SEAMCALL wrappers for TDX flush operations
This series is based off of a kvm-coco-queue commit and some pre-req
series:
1. d659088d46df "KVM: x86/mmu: Prevent aliased memslot GFNs" (in
kvm-coco-queue).
2. v6 of “TDX host: metadata reading tweaks, bug fix and info dump”
3. “KVM: VMX: Initialize TDX when loading KVM module” re-ordered from
the commits in kvm-coco-queue
It requires TDX module 1.5.06.00.0744[3], or later. This is due to removal
of the workarounds for the lack of NO_RBP_MOD. Now NO_RBP_MOD is enabled,
and this particular version of the TDX module has a NO_RBP_MOD related bug
fix.
The full KVM branch is here:
https://github.com/intel/tdx/tree/tdx_kvm_dev-2024-10-30
Matching QEMU:
https://github.com/intel-staging/qemu-tdx/tree/tdx-qemu-wip-2024-10-11
[0] https://lore.kernel.org/kvm/20240812224820.34826-1-rick.p.edgecombe@intel.com/
[1] https://lore.kernel.org/kvm/d71540ab13e728d1326baae92e8ea82d00c08abe.camel@intel.com/
[2] https://lore.kernel.org/kvm/89657f96-0ed1-4543-9074-f13f62cc4694@redhat.com/
[3] https://github.com/intel/tdx-module/releases/tag/TDX_1.5.06
Isaku Yamahata (19):
x86/virt/tdx: Add tdx_guest_keyid_alloc/free() to alloc and free TDX
guest KeyID
x86/virt/tdx: Add SEAMCALL wrappers for TDX KeyID management
x86/virt/tdx: Add SEAMCALL wrappers for TDX TD creation
x86/virt/tdx: Add SEAMCALL wrappers for TDX vCPU creation
x86/virt/tdx: Add SEAMCALL wrappers for TDX page cache management
x86/virt/tdx: Add SEAMCALL wrappers for TDX VM/vCPU field access
x86/virt/tdx: Add SEAMCALL wrappers for TDX flush operations
KVM: TDX: Add placeholders for TDX VM/vCPU structures
KVM: TDX: Define TDX architectural definitions
KVM: TDX: Add helper functions to print TDX SEAMCALL error
KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
KVM: TDX: Get system-wide info about TDX module on initialization
KVM: TDX: create/destroy VM structure
KVM: TDX: Support per-VM KVM_CAP_MAX_VCPUS extension check
KVM: TDX: initialize VM with TDX specific parameters
KVM: TDX: Make pmu_intel.c ignore guest TD case
KVM: TDX: Don't offline the last cpu of one package when there's TDX
guest
KVM: TDX: create/free TDX vcpu structure
KVM: TDX: Do TDX specific vcpu initialization
Kai Huang (3):
x86/virt/tdx: Share the global metadata structure for KVM to use
KVM: TDX: Get TDX global information
x86/virt/tdx: Read essential global metadata for KVM
Sean Christopherson (1):
KVM: TDX: Add TDX "architectural" error codes
Xiaoyao Li (2):
KVM: x86: Introduce KVM_TDX_GET_CPUID
KVM: x86/mmu: Taking guest pa into consideration when calculate tdp
level
arch/x86/include/asm/kvm-x86-ops.h | 4 +-
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/asm/shared/tdx.h | 7 +-
arch/x86/include/asm/tdx.h | 25 +
.../tdx => include/asm}/tdx_global_metadata.h | 19 +
arch/x86/include/uapi/asm/kvm.h | 59 +
arch/x86/kvm/Kconfig | 2 +
arch/x86/kvm/cpuid.c | 21 +
arch/x86/kvm/cpuid.h | 3 +
arch/x86/kvm/mmu/mmu.c | 9 +-
arch/x86/kvm/vmx/main.c | 143 +-
arch/x86/kvm/vmx/pmu_intel.c | 50 +-
arch/x86/kvm/vmx/pmu_intel.h | 28 +
arch/x86/kvm/vmx/tdx.c | 1369 ++++++++++++++++-
arch/x86/kvm/vmx/tdx.h | 92 ++
arch/x86/kvm/vmx/tdx_arch.h | 165 ++
arch/x86/kvm/vmx/tdx_errno.h | 37 +
arch/x86/kvm/vmx/vmx.h | 34 +-
arch/x86/kvm/vmx/x86_ops.h | 24 +
arch/x86/kvm/x86.c | 15 +-
arch/x86/virt/vmx/tdx/tdx.c | 264 +++-
arch/x86/virt/vmx/tdx/tdx.h | 39 +-
arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 46 +
23 files changed, 2391 insertions(+), 66 deletions(-)
rename arch/x86/{virt/vmx/tdx => include/asm}/tdx_global_metadata.h (68%)
create mode 100644 arch/x86/kvm/vmx/pmu_intel.h
create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
--
2.47.0
Applied to kvm-coco-queue, thanks. Tomorrow I will go through the changes and review. Paolo
On 30/10/24 21:00, Rick Edgecombe wrote: > Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits > from v1[0] have been applied and it’s ready to hand off to Paolo. A few > items remain that may be worth further discussion: > - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t > been been tested. It seems for Intel PT we have no support for restoring host state. IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means the TDX Module sets the MSR to its RESET value after TD Enty/Exit. So it seems to me XFAM(8) does need to be disabled until that is supported.
On Thu, 2024-10-31 at 21:21 +0200, Adrian Hunter wrote: > On 30/10/24 21:00, Rick Edgecombe wrote: > > Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits > > from v1[0] have been applied and it’s ready to hand off to Paolo. A few > > items remain that may be worth further discussion: > > - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t > > been been tested. > > It seems for Intel PT we have no support for restoring host > state. IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means > the TDX Module sets the MSR to its RESET value after TD Enty/Exit. > So it seems to me XFAM(8) does need to be disabled until that is > supported. Good point. Let's disable it and CET. We can try a fixup patch when these land in kvm-coco-queue.
On Thu, Oct 31, 2024 at 09:21:29PM +0200, Adrian Hunter wrote: > On 30/10/24 21:00, Rick Edgecombe wrote: > > Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits > > from v1[0] have been applied and it’s ready to hand off to Paolo. A few > > items remain that may be worth further discussion: > > - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t > > been been tested. > > It seems for Intel PT we have no support for restoring host > state. IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means > the TDX Module sets the MSR to its RESET value after TD Enty/Exit. > So it seems to me XFAM(8) does need to be disabled until that is > supported. So for now, we should remove the PT bit from tdx_get_supported_xfam(), but can still keep it in tdx_restore_host_xsave_state()? Then for save/restore, maybe we can just use the pt_guest_enter() and pt_guest_exit() also for TDX. Some additional checks are needed for the pt_mode though as the TDX module always clears the state if PT is enabled. And the PT_MODE_SYSTEM will be missing TDX enter/exit data but might be otherwise usable. Regards, Tony
On 11/11/24 11:49, Tony Lindgren wrote: > On Thu, Oct 31, 2024 at 09:21:29PM +0200, Adrian Hunter wrote: >> On 30/10/24 21:00, Rick Edgecombe wrote: >>> Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits >>> from v1[0] have been applied and it’s ready to hand off to Paolo. A few >>> items remain that may be worth further discussion: >>> - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t >>> been been tested. >> >> It seems for Intel PT we have no support for restoring host >> state. IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means >> the TDX Module sets the MSR to its RESET value after TD Enty/Exit. >> So it seems to me XFAM(8) does need to be disabled until that is >> supported. > > So for now, we should remove the PT bit from tdx_get_supported_xfam(), > but can still keep it in tdx_restore_host_xsave_state()? Yes > > Then for save/restore, maybe we can just use the pt_guest_enter() and > pt_guest_exit() also for TDX. Some additional checks are needed for > the pt_mode though as the TDX module always clears the state if PT is > enabled. And the PT_MODE_SYSTEM will be missing TDX enter/exit data > but might be otherwise usable. pt_guest_enter() / pt_guest_exit() are not suitable for TDX. pt_mode is not relevant for TDX because the TDX guest is always hidden from the host behind SEAM. However, restoring host MSRs is not the only issue. The TDX Module does not validate Intel PT CPUID leaf 0x14 (except it must be all zero if Intel PT is not supported i.e. if XFAM bit 8 is zero). For invalid MSR accesses by the guest, the TDX Module will inject #GP. Host VMM could provide valid CPUID to avoid that, but it would also need to be valid for the destination platform if migration was to be attempted. Disabling Intel PT for TDX for now also avoids that issue.
On Tue, Nov 12, 2024 at 09:26:36AM +0200, Adrian Hunter wrote: > On 11/11/24 11:49, Tony Lindgren wrote: > > On Thu, Oct 31, 2024 at 09:21:29PM +0200, Adrian Hunter wrote: > >> On 30/10/24 21:00, Rick Edgecombe wrote: > >>> Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits > >>> from v1[0] have been applied and it’s ready to hand off to Paolo. A few > >>> items remain that may be worth further discussion: > >>> - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t > >>> been been tested. > >> > >> It seems for Intel PT we have no support for restoring host > >> state. IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means > >> the TDX Module sets the MSR to its RESET value after TD Enty/Exit. > >> So it seems to me XFAM(8) does need to be disabled until that is > >> supported. > > > > So for now, we should remove the PT bit from tdx_get_supported_xfam(), > > but can still keep it in tdx_restore_host_xsave_state()? > > Yes > > > > > Then for save/restore, maybe we can just use the pt_guest_enter() and > > pt_guest_exit() also for TDX. Some additional checks are needed for > > the pt_mode though as the TDX module always clears the state if PT is > > enabled. And the PT_MODE_SYSTEM will be missing TDX enter/exit data > > but might be otherwise usable. > > pt_guest_enter() / pt_guest_exit() are not suitable for TDX. pt_mode > is not relevant for TDX because the TDX guest is always hidden from the > host behind SEAM. However, restoring host MSRs is not the only issue. > > The TDX Module does not validate Intel PT CPUID leaf 0x14 > (except it must be all zero if Intel PT is not supported > i.e. if XFAM bit 8 is zero). For invalid MSR accesses by the guest, > the TDX Module will inject #GP. Host VMM could provide valid CPUID > to avoid that, but it would also need to be valid for the destination > platform if migration was to be attempted. > > Disabling Intel PT for TDX for now also avoids that issue. OK thanks for the detailed explanation. Regards, Tony
On Wed, Oct 30, 2024 at 8:01 PM Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > > Hi, > > Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits > from v1[0] have been applied and it’s ready to hand off to Paolo. A few > items remain that may be worth further discussion: > - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t > been been tested. > - The Retry loop around tdh_phymem_page_reclaim() in “KVM: TDX: > create/destroy VM structure” likely can be dropped. > - Drop support for TDX Module’s that don’t support > MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM. [1] > - Type-safety in to_vmx()/to_tdx(). [2] To sum up: removed: 04 replaced by add wrapper functions for SEAMCALLs subseries 06: not needed anymore, all logic for KeyID mgmt now in x86/virt/tdx 10: tdx_capabilities dropped, replaced mostly by 02 11: KVM_TDX_CAPABILITIES moved to patch 16 19: not needed anymore 20: was needed by patch 24 22: folded in other patches 24: left for later 25: left for later/for userspace 01/02:ok 03: need to change 32 to 128 04: ok 05/06/07/08/09/10: replaced with https://lore.kernel.org/kvm/20241203010317.827803-2-rick.p.edgecombe@intel.com/ 11: see the type safety comment above: > The ugly part here is the type-unsafety of to_vmx/to_tdx. We probably > should add some "#pragma poison" of to_vmx/to_tdx: for example both can > be poisoned in pmu_intel.c after the definition of > vcpu_to_lbr_records(), while one of them can be poisoned in > sgx.c/posted_intr.c/vmx.c/tdx.c. 12/13/14/15: ok 16/17: to review 18: not sure why the check against num_present_cpus() is needed? 19: ok 20: ok 21: ok 22: missing review comment from v1 > + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ > + if (!vcpu->arch.apic) > + return -EINVAL; nit: Use kvm_apic_present() 23: ok 24: need to apply fix - if (sub_leaf & TDX_MD_UNREADABLE_LEAF_MASK || + if (leaf & TDX_MD_UNREADABLE_LEAF_MASK || 25: ok
On Mon, 2024-12-23 at 17:25 +0100, Paolo Bonzini wrote: > To sum up: > > removed: > 04 replaced by add wrapper functions for SEAMCALLs subseries > 06: not needed anymore, all logic for KeyID mgmt now in x86/virt/tdx > 10: tdx_capabilities dropped, replaced mostly by 02 Sorry, what is this? Not from patch 10 "x86/virt/tdx: Add SEAMCALL wrappers for TDX flush operations". What was dropped from which patch? > 11: KVM_TDX_CAPABILITIES moved to patch 16 > 19: not needed anymore I guess this is not referring to "KVM: TDX: initialize VM with TDX specific parameters", so not sure which one is dropped. > 20: was needed by patch 24 > 22: folded in other patches > 24: left for later > 25: left for later/for userspace Ok. I'm can't figure out what these numbers correspond to, but kvm-coco-queue doesn't seem to have dropped any patches yet, so maybe it will make more sense when I can take a look at the refresh there. > > 01/02:ok > 03: need to change 32 to 128 > 04: ok > 05/06/07/08/09/10: replaced with > https://lore.kernel.org/kvm/20241203010317.827803-2-rick.p.edgecombe@intel.com/ > 11: see the type safety comment above: > > The ugly part here is the type-unsafety of to_vmx/to_tdx. We probably > > should add some "#pragma poison" of to_vmx/to_tdx: for example both can > > be poisoned in pmu_intel.c after the definition of > > vcpu_to_lbr_records(), while one of them can be poisoned in > > sgx.c/posted_intr.c/vmx.c/tdx.c. I left it off because you said "Not a strict requirement though." and gave it a RB tag. Other stuff seemed higher priority. We can look at some options for a follow on patch if it lightens your load. > > 12/13/14/15: ok > 16/17: to review > 18: not sure why the check against num_present_cpus() is needed? The per-vm KVM_MAX_VCPUS will be min_t(int, kvm->max_vcpus, num_present_cpus()). So if td_conf->max_vcpus_per_td < num_present_cpus(), then it might report supporting more CPUs then actually supported by the TDX module. As to why not just report td_conf->max_vcpus_per_td, that value is the max CPUs that are supported by any platform the TDX module supports. So it is more about what the TDX module supports, then what userspace cares about (how many vCPUs they can use). I think we could probably get by without the check and blame the TDX module if it does something strange. It is seems safer ABI-wise to have the check. But we are being a bit more cavalier around protecting against TDX supported CPUID bit changes then originally planned, so the check here now seems inconsistent. Let me flag Kai to confirm there was not some known violating configuration. He explored a bunch of edge cases on this corner. > 19: ok > 20: ok > 21: ok > > 22: missing review comment from v1 > > > + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ > > + if (!vcpu->arch.apic) > > + return -EINVAL; > > nit: Use kvm_apic_present() Oops, nice catch. > > 23: ok > > 24: need to apply fix > > - if (sub_leaf & TDX_MD_UNREADABLE_LEAF_MASK || > + if (leaf & TDX_MD_UNREADABLE_LEAF_MASK || > > 25: ok
On Sat, Jan 04, 2025 at 01:43:56AM +0000, Edgecombe, Rick P wrote: > On Mon, 2024-12-23 at 17:25 +0100, Paolo Bonzini wrote: > > 11: see the type safety comment above: > > > The ugly part here is the type-unsafety of to_vmx/to_tdx. We probably > > > should add some "#pragma poison" of to_vmx/to_tdx: for example both can > > > be poisoned in pmu_intel.c after the definition of > > > vcpu_to_lbr_records(), while one of them can be poisoned in > > > sgx.c/posted_intr.c/vmx.c/tdx.c. > > I left it off because you said "Not a strict requirement though." and gave it a > RB tag. Other stuff seemed higher priority. We can look at some options for a > follow on patch if it lightens your load. I suggest we do this: - Make to_kvm_tdx() and to_tdx() private to tdx.c as they're only used in tdx.c - Add pragma poison to_vmx at the top of tdx.c - Add pragma poison to_vmx in pmu_intel.c after vcpu_to_lbr_records() Other pragma poison to_vmx can be added as needed, but AFAIK there's not need to add it for to_tdx(). Regards, Tony
On Sat, Jan 04, 2025 at 01:43:56AM +0000, Edgecombe, Rick P wrote: > On Mon, 2024-12-23 at 17:25 +0100, Paolo Bonzini wrote: > > 22: missing review comment from v1 > > > > > + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ > > > + if (!vcpu->arch.apic) > > > + return -EINVAL; > > > > nit: Use kvm_apic_present() > > Oops, nice catch. Sorry this fell through. I made a patch for this earlier but missed it while rebasing to a later dev branch and never sent it. Below is a rebased version against the current KVM CoCo queue to fold in if still needed. Sounds like this might be already dealt with in Paolo's upcoming CoCo queue branch though. Regards, Tony 8< -------------------- From aac264e9923c15522baf9ae765b1d58165c24523 Mon Sep 17 00:00:00 2001 From: Tony Lindgren <tony.lindgren@linux.intel.com> Date: Mon, 2 Sep 2024 13:52:20 +0300 Subject: [PATCH 1/1] KVM/TDX: Use kvm_apic_present() in tdx_vcpu_create() Use kvm_apic_present() in tdx_vcpu_create(). We need to now export apic_hw_disabled for kvm-intel to use it. Suggested-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com> --- arch/x86/kvm/lapic.c | 2 ++ arch/x86/kvm/vmx/tdx.c | 3 ++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index fcf3a8907196..2b83092eace2 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu); EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu); __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ); +EXPORT_SYMBOL_GPL(apic_hw_disabled); + __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, HZ); static inline int apic_enabled(struct kvm_lapic *apic) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index d0dc3200fa37..6c68567d964d 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -8,6 +8,7 @@ #include "capabilities.h" #include "mmu.h" #include "x86_ops.h" +#include "lapic.h" #include "tdx.h" #include "vmx.h" #include "mmu/spte.h" @@ -674,7 +675,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) return -EIO; /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ - if (!vcpu->arch.apic) + if (!kvm_apic_present(vcpu)) return -EINVAL; fpstate_set_confidential(&vcpu->arch.guest_fpu); -- 2.47.1
On 7.01.25 г. 9:37 ч., Tony Lindgren wrote: > On Sat, Jan 04, 2025 at 01:43:56AM +0000, Edgecombe, Rick P wrote: >> On Mon, 2024-12-23 at 17:25 +0100, Paolo Bonzini wrote: >>> 22: missing review comment from v1 >>> >>>> + /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ >>>> + if (!vcpu->arch.apic) >>>> + return -EINVAL; >>> >>> nit: Use kvm_apic_present() >> >> Oops, nice catch. > > Sorry this fell through. I made a patch for this earlier but missed it > while rebasing to a later dev branch and never sent it. > > Below is a rebased version against the current KVM CoCo queue to fold > in if still needed. Sounds like this might be already dealt with in > Paolo's upcoming CoCo queue branch though. > > Regards, > > Tony > > 8< -------------------- > From aac264e9923c15522baf9ae765b1d58165c24523 Mon Sep 17 00:00:00 2001 > From: Tony Lindgren <tony.lindgren@linux.intel.com> > Date: Mon, 2 Sep 2024 13:52:20 +0300 > Subject: [PATCH 1/1] KVM/TDX: Use kvm_apic_present() in tdx_vcpu_create() > > Use kvm_apic_present() in tdx_vcpu_create(). We need to now export > apic_hw_disabled for kvm-intel to use it. > > Suggested-by: Nikolay Borisov <nik.borisov@suse.com> > Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com> > --- > arch/x86/kvm/lapic.c | 2 ++ > arch/x86/kvm/vmx/tdx.c | 3 ++- > 2 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index fcf3a8907196..2b83092eace2 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu); > EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu); > > __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ); > +EXPORT_SYMBOL_GPL(apic_hw_disabled); Is it really required to expose this symbol? apic_hw_disabled is defined as static inline in the header? > +> __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, HZ); > > static inline int apic_enabled(struct kvm_lapic *apic) > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index d0dc3200fa37..6c68567d964d 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -8,6 +8,7 @@ > #include "capabilities.h" > #include "mmu.h" > #include "x86_ops.h" > +#include "lapic.h" > #include "tdx.h" > #include "vmx.h" > #include "mmu/spte.h" > @@ -674,7 +675,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) > return -EIO; > > /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ > - if (!vcpu->arch.apic) > + if (!kvm_apic_present(vcpu)) > return -EINVAL; > > fpstate_set_confidential(&vcpu->arch.guest_fpu);
On Tue, Jan 07, 2025 at 02:41:51PM +0200, Nikolay Borisov wrote: > On 7.01.25 г. 9:37 ч., Tony Lindgren wrote: > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu); > > EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu); > > __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ); > > +EXPORT_SYMBOL_GPL(apic_hw_disabled); > > Is it really required to expose this symbol? apic_hw_disabled is defined as > static inline in the header? For loadable modules yes, otherwise we'll get: ERROR: modpost: "apic_hw_disabled" [arch/x86/kvm/kvm-intel.ko] undefined! This is similar to the EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu) already there. Regards, Tony
On Wed, Jan 08, 2025, Tony Lindgren wrote: > On Tue, Jan 07, 2025 at 02:41:51PM +0200, Nikolay Borisov wrote: > > On 7.01.25 г. 9:37 ч., Tony Lindgren wrote: > > > --- a/arch/x86/kvm/lapic.c > > > +++ b/arch/x86/kvm/lapic.c > > > @@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu); > > > EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu); > > > __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ); > > > +EXPORT_SYMBOL_GPL(apic_hw_disabled); > > > > Is it really required to expose this symbol? apic_hw_disabled is defined as > > static inline in the header? No, apic_hw_disabled can't be "static inline", because it's a variable, not a function. > For loadable modules yes, otherwise we'll get: > > ERROR: modpost: "apic_hw_disabled" [arch/x86/kvm/kvm-intel.ko] undefined! > > This is similar to the EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu) already > there. Heh, which is a hint that you're using the wrong helper. TDX should check lapic_in_kernel(), not kvm_apic_present(). The former verifies that local APIC emulation/virtualization is handed in-kernel, i.e. by KVM. The latter checks that the local APIC is in-kernel *and* that the vCPU's local APIC is hardware enabled, and checking that the local APIC is hardware enabled is unnecessary and only works by sheer dumb luck. The only reason kvm_create_lapic() stuffs the enable bit is to avoid toggling the static key, which incurs costly IPIs to patch kernel text. If apic_hw_disabled were to be removed (which is somewhat seriously being considered), this code would be deleted and TDX would break. /* * Stuff the APIC ENABLE bit in lieu of temporarily incrementing * apic_hw_disabled; the full RESET value is set by kvm_lapic_reset(). */ vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE;
On Wed, Jan 08, 2025 at 07:01:01AM -0800, Sean Christopherson wrote: > On Wed, Jan 08, 2025, Tony Lindgren wrote: > > On Tue, Jan 07, 2025 at 02:41:51PM +0200, Nikolay Borisov wrote: > > > On 7.01.25 г. 9:37 ч., Tony Lindgren wrote: > > > > --- a/arch/x86/kvm/lapic.c > > > > +++ b/arch/x86/kvm/lapic.c > > > > @@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu); > > > > EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu); > > > > __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ); > > > > +EXPORT_SYMBOL_GPL(apic_hw_disabled); > > > > > > Is it really required to expose this symbol? apic_hw_disabled is defined as > > > static inline in the header? > > No, apic_hw_disabled can't be "static inline", because it's a variable, not a > function. > > > For loadable modules yes, otherwise we'll get: > > > > ERROR: modpost: "apic_hw_disabled" [arch/x86/kvm/kvm-intel.ko] undefined! > > > > This is similar to the EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu) already > > there. > > Heh, which is a hint that you're using the wrong helper. TDX should check > lapic_in_kernel(), not kvm_apic_present(). The former verifies that local APIC > emulation/virtualization is handed in-kernel, i.e. by KVM. The latter checks > that the local APIC is in-kernel *and* that the vCPU's local APIC is hardware > enabled, and checking that the local APIC is hardware enabled is unnecessary > and only works by sheer dumb luck. OK makes sense :) > The only reason kvm_create_lapic() stuffs the enable bit is to avoid toggling > the static key, which incurs costly IPIs to patch kernel text. If > apic_hw_disabled were to be removed (which is somewhat seriously being considered), > this code would be deleted and TDX would break. > > /* > * Stuff the APIC ENABLE bit in lieu of temporarily incrementing > * apic_hw_disabled; the full RESET value is set by kvm_lapic_reset(). > */ > vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE; Thanks for the clarification. Updated patch below for reference in case it's still needed. Regards, Tony 8< ---------------------- From 1e4b72fe4a69f0bdd7c8379315b97be79fb6cf8a Mon Sep 17 00:00:00 2001 From: Tony Lindgren <tony.lindgren@linux.intel.com> Date: Mon, 2 Sep 2024 13:52:20 +0300 Subject: [PATCH 1/1] KVM/TDX: Use lapic_in_kernel() in tdx_vcpu_create() Use lapic_in_kernel() in tdx_vcpu_create(). Suggested-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com> --- arch/x86/kvm/vmx/tdx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index d0dc3200fa37..b905a7c9e2ff 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -8,6 +8,7 @@ #include "capabilities.h" #include "mmu.h" #include "x86_ops.h" +#include "lapic.h" #include "tdx.h" #include "vmx.h" #include "mmu/spte.h" @@ -674,7 +675,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) return -EIO; /* TDX only supports x2APIC, which requires an in-kernel local APIC. */ - if (!vcpu->arch.apic) + if (!lapic_in_kernel(vcpu)) return -EINVAL; fpstate_set_confidential(&vcpu->arch.guest_fpu); -- 2.47.1
On Sat, 2025-01-04 at 01:43 +0000, Edgecombe, Rick P wrote: > > 18: not sure why the check against num_present_cpus() is needed? > > The per-vm KVM_MAX_VCPUS will be min_t(int, kvm->max_vcpus, num_present_cpus()). > So if td_conf->max_vcpus_per_td < num_present_cpus(), then it might report > supporting more CPUs then actually supported by the TDX module. Right. > > As to why not just report td_conf->max_vcpus_per_td, that value is the max CPUs > that are supported by any platform the TDX module supports. So it is more about > what the TDX module supports, then what userspace cares about (how many vCPUs > they can use). Sean didn't want to make reporting maximum vcpus depend on the whims of TDX module since this doesn't provide a predictable ABI: https://lore.kernel.org/kvm/ZmzaqRy2zjvlsDfL@google.com/ > > I think we could probably get by without the check and blame the TDX module if > it does something strange. It is seems safer ABI-wise to have the check. But we > are being a bit more cavalier around protecting against TDX supported CPUID bit > changes then originally planned, so the check here now seems inconsistent. > > Let me flag Kai to confirm there was not some known violating configuration. He > explored a bunch of edge cases on this corner. In practice the "max_vcpu_per_td" will never be smaller than the maximum logical CPUs that ALL the platforms that the module supports can possibly have. I got this from the TDX module guys, and I don't think there's any reason for the TDX module to break this. However from module ABI's perspective (from the JSON), it could be any value, so I think we should have a sanity check. I think this is also different from the "array size of CPUID_CONFIGs" ABI breakage (assuming this is what you meant "protecting TDX supported CPUID bits" above) since it is currently documented as 32 in the JSON.
© 2016 - 2026 Red Hat, Inc.