[v2] TDX vCPU/VM creation

[PATCH v2 00/25] TDX vCPU/VM creation

Posted by Rick Edgecombe 1 year, 3 months ago

Hi,

Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
items remain that may be worth further discussion:
 - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
   been been tested.
 - The Retry loop around tdh_phymem_page_reclaim() in “KVM: TDX: 
   create/destroy VM structure” likely can be dropped.
 - Drop support for TDX Module’s that don’t support
   MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM. [1]
 - Type-safety in to_vmx()/to_tdx(). [2]

This series has 9 commits intended to collect acks from x86 maintainers. 
The first group of commits is for reading TDX static metadata for KVM to 
use:
	x86/virt/tdx: Share the global metadata structure for KVM to use
	x86/virt/tdx: Read essential global metadata for KVM

The second group is for exporting a TDX keyid allocator for KVM to use:
	x86/virt/tdx: Add tdx_guest_keyid_alloc/free() to alloc and free
	  TDX guest KeyID

The third group is for exporting various SEAMCALLs needed by KVM for this 
series. SEAMCALL patches for the later TDX sections will come with those 
series’:
	x86/virt/tdx: Add SEAMCALL wrappers for TDX KeyID management
	x86/virt/tdx: Add SEAMCALL wrappers for TDX TD creation
	x86/virt/tdx: Add SEAMCALL wrappers for TDX vCPU creation
	x86/virt/tdx: Add SEAMCALL wrappers for TDX page cache management
	x86/virt/tdx: Add SEAMCALL wrappers for TDX VM/vCPU field access
	x86/virt/tdx: Add SEAMCALL wrappers for TDX flush operations

This series is based off of a kvm-coco-queue commit and some pre-req
series:
1. d659088d46df "KVM: x86/mmu: Prevent aliased memslot GFNs" (in
   kvm-coco-queue).
2. v6 of “TDX host: metadata reading tweaks, bug fix and info dump”
3. “KVM: VMX: Initialize TDX when loading KVM module” re-ordered from 
   the commits in kvm-coco-queue

It requires TDX module 1.5.06.00.0744[3], or later. This is due to removal
of the workarounds for the lack of NO_RBP_MOD. Now NO_RBP_MOD is enabled,
and this particular version of the TDX module has a NO_RBP_MOD related bug
fix.

The full KVM branch is here:
https://github.com/intel/tdx/tree/tdx_kvm_dev-2024-10-30

Matching QEMU:
https://github.com/intel-staging/qemu-tdx/tree/tdx-qemu-wip-2024-10-11

[0] https://lore.kernel.org/kvm/20240812224820.34826-1-rick.p.edgecombe@intel.com/
[1] https://lore.kernel.org/kvm/d71540ab13e728d1326baae92e8ea82d00c08abe.camel@intel.com/
[2] https://lore.kernel.org/kvm/89657f96-0ed1-4543-9074-f13f62cc4694@redhat.com/
[3] https://github.com/intel/tdx-module/releases/tag/TDX_1.5.06

Isaku Yamahata (19):
  x86/virt/tdx: Add tdx_guest_keyid_alloc/free() to alloc and free TDX
    guest KeyID
  x86/virt/tdx: Add SEAMCALL wrappers for TDX KeyID management
  x86/virt/tdx: Add SEAMCALL wrappers for TDX TD creation
  x86/virt/tdx: Add SEAMCALL wrappers for TDX vCPU creation
  x86/virt/tdx: Add SEAMCALL wrappers for TDX page cache management
  x86/virt/tdx: Add SEAMCALL wrappers for TDX VM/vCPU field access
  x86/virt/tdx: Add SEAMCALL wrappers for TDX flush operations
  KVM: TDX: Add placeholders for TDX VM/vCPU structures
  KVM: TDX: Define TDX architectural definitions
  KVM: TDX: Add helper functions to print TDX SEAMCALL error
  KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
  KVM: TDX: Get system-wide info about TDX module on initialization
  KVM: TDX: create/destroy VM structure
  KVM: TDX: Support per-VM KVM_CAP_MAX_VCPUS extension check
  KVM: TDX: initialize VM with TDX specific parameters
  KVM: TDX: Make pmu_intel.c ignore guest TD case
  KVM: TDX: Don't offline the last cpu of one package when there's TDX
    guest
  KVM: TDX: create/free TDX vcpu structure
  KVM: TDX: Do TDX specific vcpu initialization

Kai Huang (3):
  x86/virt/tdx: Share the global metadata structure for KVM to use
  KVM: TDX: Get TDX global information
  x86/virt/tdx: Read essential global metadata for KVM

Sean Christopherson (1):
  KVM: TDX: Add TDX "architectural" error codes

Xiaoyao Li (2):
  KVM: x86: Introduce KVM_TDX_GET_CPUID
  KVM: x86/mmu: Taking guest pa into consideration when calculate tdp
    level

 arch/x86/include/asm/kvm-x86-ops.h            |    4 +-
 arch/x86/include/asm/kvm_host.h               |    2 +
 arch/x86/include/asm/shared/tdx.h             |    7 +-
 arch/x86/include/asm/tdx.h                    |   25 +
 .../tdx => include/asm}/tdx_global_metadata.h |   19 +
 arch/x86/include/uapi/asm/kvm.h               |   59 +
 arch/x86/kvm/Kconfig                          |    2 +
 arch/x86/kvm/cpuid.c                          |   21 +
 arch/x86/kvm/cpuid.h                          |    3 +
 arch/x86/kvm/mmu/mmu.c                        |    9 +-
 arch/x86/kvm/vmx/main.c                       |  143 +-
 arch/x86/kvm/vmx/pmu_intel.c                  |   50 +-
 arch/x86/kvm/vmx/pmu_intel.h                  |   28 +
 arch/x86/kvm/vmx/tdx.c                        | 1369 ++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h                        |   92 ++
 arch/x86/kvm/vmx/tdx_arch.h                   |  165 ++
 arch/x86/kvm/vmx/tdx_errno.h                  |   37 +
 arch/x86/kvm/vmx/vmx.h                        |   34 +-
 arch/x86/kvm/vmx/x86_ops.h                    |   24 +
 arch/x86/kvm/x86.c                            |   15 +-
 arch/x86/virt/vmx/tdx/tdx.c                   |  264 +++-
 arch/x86/virt/vmx/tdx/tdx.h                   |   39 +-
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c   |   46 +
 23 files changed, 2391 insertions(+), 66 deletions(-)
 rename arch/x86/{virt/vmx/tdx => include/asm}/tdx_global_metadata.h (68%)
 create mode 100644 arch/x86/kvm/vmx/pmu_intel.h
 create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
 create mode 100644 arch/x86/kvm/vmx/tdx_errno.h

-- 
2.47.0

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Paolo Bonzini 1 year, 1 month ago

Applied to kvm-coco-queue, thanks.  Tomorrow I will go through the
changes and review.

Paolo

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Adrian Hunter 1 year, 3 months ago

On 30/10/24 21:00, Rick Edgecombe wrote:
> Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
> from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
> items remain that may be worth further discussion:
>  - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
>    been been tested.

It seems for Intel PT we have no support for restoring host
state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
So it seems to me XFAM(8) does need to be disabled until that is
supported.

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Edgecombe, Rick P 1 year, 2 months ago

On Thu, 2024-10-31 at 21:21 +0200, Adrian Hunter wrote:
> On 30/10/24 21:00, Rick Edgecombe wrote:
> > Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
> > from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
> > items remain that may be worth further discussion:
> >   - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
> >     been been tested.
> 
> It seems for Intel PT we have no support for restoring host
> state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
> the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
> So it seems to me XFAM(8) does need to be disabled until that is
> supported.

Good point. Let's disable it and CET. We can try a fixup patch when these land
in kvm-coco-queue.

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Tony Lindgren 1 year, 2 months ago

On Thu, Oct 31, 2024 at 09:21:29PM +0200, Adrian Hunter wrote:
> On 30/10/24 21:00, Rick Edgecombe wrote:
> > Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
> > from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
> > items remain that may be worth further discussion:
> >  - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
> >    been been tested.
> 
> It seems for Intel PT we have no support for restoring host
> state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
> the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
> So it seems to me XFAM(8) does need to be disabled until that is
> supported.

So for now, we should remove the PT bit from tdx_get_supported_xfam(),
but can still keep it in tdx_restore_host_xsave_state()?

Then for save/restore, maybe we can just use the pt_guest_enter() and
pt_guest_exit() also for TDX. Some additional checks are needed for
the pt_mode though as the TDX module always clears the state if PT is
enabled. And the PT_MODE_SYSTEM will be missing TDX enter/exit data
but might be otherwise usable.

Regards,

Tony

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Adrian Hunter 1 year, 2 months ago

On 11/11/24 11:49, Tony Lindgren wrote:
> On Thu, Oct 31, 2024 at 09:21:29PM +0200, Adrian Hunter wrote:
>> On 30/10/24 21:00, Rick Edgecombe wrote:
>>> Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
>>> from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
>>> items remain that may be worth further discussion:
>>>  - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
>>>    been been tested.
>>
>> It seems for Intel PT we have no support for restoring host
>> state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
>> the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
>> So it seems to me XFAM(8) does need to be disabled until that is
>> supported.
> 
> So for now, we should remove the PT bit from tdx_get_supported_xfam(),
> but can still keep it in tdx_restore_host_xsave_state()?

Yes

> 
> Then for save/restore, maybe we can just use the pt_guest_enter() and
> pt_guest_exit() also for TDX. Some additional checks are needed for
> the pt_mode though as the TDX module always clears the state if PT is
> enabled. And the PT_MODE_SYSTEM will be missing TDX enter/exit data
> but might be otherwise usable.

pt_guest_enter() / pt_guest_exit() are not suitable for TDX.  pt_mode
is not relevant for TDX because the TDX guest is always hidden from the
host behind SEAM.  However, restoring host MSRs is not the only issue.

The TDX Module does not validate Intel PT CPUID leaf 0x14
(except it must be all zero if Intel PT is not supported
i.e. if XFAM bit 8 is zero).  For invalid MSR accesses by the guest,
the TDX Module will inject #GP.  Host VMM could provide valid CPUID
to avoid that, but it would also need to be valid for the destination
platform if migration was to be attempted.

Disabling Intel PT for TDX for now also avoids that issue.

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Tony Lindgren 1 year, 2 months ago

On Tue, Nov 12, 2024 at 09:26:36AM +0200, Adrian Hunter wrote:
> On 11/11/24 11:49, Tony Lindgren wrote:
> > On Thu, Oct 31, 2024 at 09:21:29PM +0200, Adrian Hunter wrote:
> >> On 30/10/24 21:00, Rick Edgecombe wrote:
> >>> Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
> >>> from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
> >>> items remain that may be worth further discussion:
> >>>  - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
> >>>    been been tested.
> >>
> >> It seems for Intel PT we have no support for restoring host
> >> state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
> >> the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
> >> So it seems to me XFAM(8) does need to be disabled until that is
> >> supported.
> > 
> > So for now, we should remove the PT bit from tdx_get_supported_xfam(),
> > but can still keep it in tdx_restore_host_xsave_state()?
> 
> Yes
> 
> > 
> > Then for save/restore, maybe we can just use the pt_guest_enter() and
> > pt_guest_exit() also for TDX. Some additional checks are needed for
> > the pt_mode though as the TDX module always clears the state if PT is
> > enabled. And the PT_MODE_SYSTEM will be missing TDX enter/exit data
> > but might be otherwise usable.
> 
> pt_guest_enter() / pt_guest_exit() are not suitable for TDX.  pt_mode
> is not relevant for TDX because the TDX guest is always hidden from the
> host behind SEAM.  However, restoring host MSRs is not the only issue.
> 
> The TDX Module does not validate Intel PT CPUID leaf 0x14
> (except it must be all zero if Intel PT is not supported
> i.e. if XFAM bit 8 is zero).  For invalid MSR accesses by the guest,
> the TDX Module will inject #GP.  Host VMM could provide valid CPUID
> to avoid that, but it would also need to be valid for the destination
> platform if migration was to be attempted.
> 
> Disabling Intel PT for TDX for now also avoids that issue.

OK thanks for the detailed explanation.

Regards,

Tony

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Paolo Bonzini 1 year, 1 month ago

On Wed, Oct 30, 2024 at 8:01 PM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
>
> Hi,
>
> Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits
> from v1[0] have been applied and it’s ready to hand off to Paolo. A few
> items remain that may be worth further discussion:
>  - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t
>    been been tested.
>  - The Retry loop around tdh_phymem_page_reclaim() in “KVM: TDX:
>    create/destroy VM structure” likely can be dropped.
>  - Drop support for TDX Module’s that don’t support
>    MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM. [1]
>  - Type-safety in to_vmx()/to_tdx(). [2]

To sum up:

removed:
04 replaced by add wrapper functions for SEAMCALLs subseries
06: not needed anymore, all logic for KeyID mgmt now in x86/virt/tdx
10: tdx_capabilities dropped, replaced mostly by 02
11: KVM_TDX_CAPABILITIES moved to patch 16
19: not needed anymore
20: was needed by patch 24
22: folded in other patches
24: left for later
25: left for later/for userspace

01/02:ok
03: need to change 32 to 128
04: ok
05/06/07/08/09/10: replaced with
https://lore.kernel.org/kvm/20241203010317.827803-2-rick.p.edgecombe@intel.com/
11: see the type safety comment above:
> The ugly part here is the type-unsafety of to_vmx/to_tdx.  We probably
> should add some "#pragma poison" of to_vmx/to_tdx: for example both can
> be poisoned in pmu_intel.c after the definition of
> vcpu_to_lbr_records(), while one of them can be poisoned in
> sgx.c/posted_intr.c/vmx.c/tdx.c.

12/13/14/15: ok
16/17: to review
18: not sure why the check against num_present_cpus() is needed?
19: ok
20: ok
21: ok

22: missing review comment from v1

> +     /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> +     if (!vcpu->arch.apic)
> +             return -EINVAL;

nit: Use kvm_apic_present()

23: ok

24: need to apply fix

-       if (sub_leaf & TDX_MD_UNREADABLE_LEAF_MASK ||
+       if (leaf & TDX_MD_UNREADABLE_LEAF_MASK ||

25: ok

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Edgecombe, Rick P 1 year, 1 month ago

On Mon, 2024-12-23 at 17:25 +0100, Paolo Bonzini wrote:
> To sum up:
> 
> removed:
> 04 replaced by add wrapper functions for SEAMCALLs subseries
> 06: not needed anymore, all logic for KeyID mgmt now in x86/virt/tdx
> 10: tdx_capabilities dropped, replaced mostly by 02

Sorry, what is this? Not from patch 10 "x86/virt/tdx: Add SEAMCALL wrappers for
TDX flush operations". What was dropped from which patch?

> 11: KVM_TDX_CAPABILITIES moved to patch 16
> 19: not needed anymore

I guess this is not referring to "KVM: TDX: initialize VM with TDX specific
parameters", so not sure which one is dropped.

> 20: was needed by patch 24
> 22: folded in other patches

> 24: left for later
> 25: left for later/for userspace
Ok.

I'm can't figure out what these numbers correspond to, but kvm-coco-queue
doesn't seem to have dropped any patches yet, so maybe it will make more sense
when I can take a look at the refresh there.

> 
> 01/02:ok
> 03: need to change 32 to 128
> 04: ok
> 05/06/07/08/09/10: replaced with
> https://lore.kernel.org/kvm/20241203010317.827803-2-rick.p.edgecombe@intel.com/
> 11: see the type safety comment above:
> > The ugly part here is the type-unsafety of to_vmx/to_tdx.  We probably
> > should add some "#pragma poison" of to_vmx/to_tdx: for example both can
> > be poisoned in pmu_intel.c after the definition of
> > vcpu_to_lbr_records(), while one of them can be poisoned in
> > sgx.c/posted_intr.c/vmx.c/tdx.c.

I left it off because you said "Not a strict requirement though." and gave it a
RB tag. Other stuff seemed higher priority. We can look at some options for a
follow on patch if it lightens your load.

> 
> 12/13/14/15: ok
> 16/17: to review
> 18: not sure why the check against num_present_cpus() is needed?

The per-vm KVM_MAX_VCPUS will be min_t(int, kvm->max_vcpus, num_present_cpus()).
So if td_conf->max_vcpus_per_td < num_present_cpus(), then it might report
supporting more CPUs then actually supported by the TDX module.

As to why not just report td_conf->max_vcpus_per_td, that value is the max CPUs
that are supported by any platform the TDX module supports. So it is more about
what the TDX module supports, then what userspace cares about (how many vCPUs
they can use).

I think we could probably get by without the check and blame the TDX module if
it does something strange. It is seems safer ABI-wise to have the check. But we
are being a bit more cavalier around protecting against TDX supported CPUID bit
changes then originally planned, so the check here now seems inconsistent.

Let me flag Kai to confirm there was not some known violating configuration. He
explored a bunch of edge cases on this corner.

> 19: ok
> 20: ok
> 21: ok
> 
> 22: missing review comment from v1
> 
> > +     /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> > +     if (!vcpu->arch.apic)
> > +             return -EINVAL;
> 
> nit: Use kvm_apic_present()

Oops, nice catch.

> 
> 23: ok
> 
> 24: need to apply fix
> 
> -       if (sub_leaf & TDX_MD_UNREADABLE_LEAF_MASK ||
> +       if (leaf & TDX_MD_UNREADABLE_LEAF_MASK ||
> 
> 25: ok

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Tony Lindgren 1 year ago

On Sat, Jan 04, 2025 at 01:43:56AM +0000, Edgecombe, Rick P wrote:
> On Mon, 2024-12-23 at 17:25 +0100, Paolo Bonzini wrote:
> > 11: see the type safety comment above:
> > > The ugly part here is the type-unsafety of to_vmx/to_tdx.  We probably
> > > should add some "#pragma poison" of to_vmx/to_tdx: for example both can
> > > be poisoned in pmu_intel.c after the definition of
> > > vcpu_to_lbr_records(), while one of them can be poisoned in
> > > sgx.c/posted_intr.c/vmx.c/tdx.c.
> 
> I left it off because you said "Not a strict requirement though." and gave it a
> RB tag. Other stuff seemed higher priority. We can look at some options for a
> follow on patch if it lightens your load.

I suggest we do this:

- Make to_kvm_tdx() and to_tdx() private to tdx.c as they're only used
  in tdx.c

- Add pragma poison to_vmx at the top of tdx.c

- Add pragma poison to_vmx in pmu_intel.c after vcpu_to_lbr_records()

Other pragma poison to_vmx can be added as needed, but AFAIK there's
not need to add it for to_tdx().

Regards,

Tony

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Tony Lindgren 1 year, 1 month ago

On Sat, Jan 04, 2025 at 01:43:56AM +0000, Edgecombe, Rick P wrote:
> On Mon, 2024-12-23 at 17:25 +0100, Paolo Bonzini wrote:
> > 22: missing review comment from v1
> > 
> > > +     /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> > > +     if (!vcpu->arch.apic)
> > > +             return -EINVAL;
> > 
> > nit: Use kvm_apic_present()
> 
> Oops, nice catch.

Sorry this fell through. I made a patch for this earlier but missed it
while rebasing to a later dev branch and never sent it.

Below is a rebased version against the current KVM CoCo queue to fold
in if still needed. Sounds like this might be already dealt with in
Paolo's upcoming CoCo queue branch though.

Regards,

Tony

8< --------------------
From aac264e9923c15522baf9ae765b1d58165c24523 Mon Sep 17 00:00:00 2001
From: Tony Lindgren <tony.lindgren@linux.intel.com>
Date: Mon, 2 Sep 2024 13:52:20 +0300
Subject: [PATCH 1/1] KVM/TDX: Use kvm_apic_present() in tdx_vcpu_create()

Use kvm_apic_present() in tdx_vcpu_create(). We need to now export
apic_hw_disabled for kvm-intel to use it.

Suggested-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com>
---
 arch/x86/kvm/lapic.c   | 2 ++
 arch/x86/kvm/vmx/tdx.c | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fcf3a8907196..2b83092eace2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
 EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu);
 
 __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ);
+EXPORT_SYMBOL_GPL(apic_hw_disabled);
+
 __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, HZ);
 
 static inline int apic_enabled(struct kvm_lapic *apic)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d0dc3200fa37..6c68567d964d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -8,6 +8,7 @@
 #include "capabilities.h"
 #include "mmu.h"
 #include "x86_ops.h"
+#include "lapic.h"
 #include "tdx.h"
 #include "vmx.h"
 #include "mmu/spte.h"
@@ -674,7 +675,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 		return -EIO;
 
 	/* TDX only supports x2APIC, which requires an in-kernel local APIC. */
-	if (!vcpu->arch.apic)
+	if (!kvm_apic_present(vcpu))
 		return -EINVAL;
 
 	fpstate_set_confidential(&vcpu->arch.guest_fpu);
-- 
2.47.1

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Nikolay Borisov 1 year, 1 month ago


On 7.01.25 г. 9:37 ч., Tony Lindgren wrote:
> On Sat, Jan 04, 2025 at 01:43:56AM +0000, Edgecombe, Rick P wrote:
>> On Mon, 2024-12-23 at 17:25 +0100, Paolo Bonzini wrote:
>>> 22: missing review comment from v1
>>>
>>>> +     /* TDX only supports x2APIC, which requires an in-kernel local APIC. */
>>>> +     if (!vcpu->arch.apic)
>>>> +             return -EINVAL;
>>>
>>> nit: Use kvm_apic_present()
>>
>> Oops, nice catch.
> 
> Sorry this fell through. I made a patch for this earlier but missed it
> while rebasing to a later dev branch and never sent it.
> 
> Below is a rebased version against the current KVM CoCo queue to fold
> in if still needed. Sounds like this might be already dealt with in
> Paolo's upcoming CoCo queue branch though.
> 
> Regards,
> 
> Tony
> 
> 8< --------------------
>  From aac264e9923c15522baf9ae765b1d58165c24523 Mon Sep 17 00:00:00 2001
> From: Tony Lindgren <tony.lindgren@linux.intel.com>
> Date: Mon, 2 Sep 2024 13:52:20 +0300
> Subject: [PATCH 1/1] KVM/TDX: Use kvm_apic_present() in tdx_vcpu_create()
> 
> Use kvm_apic_present() in tdx_vcpu_create(). We need to now export
> apic_hw_disabled for kvm-intel to use it.
> 
> Suggested-by: Nikolay Borisov <nik.borisov@suse.com>
> Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com>
> ---
>   arch/x86/kvm/lapic.c   | 2 ++
>   arch/x86/kvm/vmx/tdx.c | 3 ++-
>   2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index fcf3a8907196..2b83092eace2 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
>   EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu);
>   
>   __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ);
> +EXPORT_SYMBOL_GPL(apic_hw_disabled);

Is it really required to expose this symbol? apic_hw_disabled is defined 
as static inline in the header?

 > +>   __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, 
HZ);
>   
>   static inline int apic_enabled(struct kvm_lapic *apic)
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index d0dc3200fa37..6c68567d964d 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -8,6 +8,7 @@
>   #include "capabilities.h"
>   #include "mmu.h"
>   #include "x86_ops.h"
> +#include "lapic.h"
>   #include "tdx.h"
>   #include "vmx.h"
>   #include "mmu/spte.h"
> @@ -674,7 +675,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
>   		return -EIO;
>   
>   	/* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> -	if (!vcpu->arch.apic)
> +	if (!kvm_apic_present(vcpu))
>   		return -EINVAL;
>   
>   	fpstate_set_confidential(&vcpu->arch.guest_fpu);

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Tony Lindgren 1 year, 1 month ago

On Tue, Jan 07, 2025 at 02:41:51PM +0200, Nikolay Borisov wrote:
> On 7.01.25 г. 9:37 ч., Tony Lindgren wrote:
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
> >   EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu);
> >   __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ);
> > +EXPORT_SYMBOL_GPL(apic_hw_disabled);
> 
> Is it really required to expose this symbol? apic_hw_disabled is defined as
> static inline in the header?

For loadable modules yes, otherwise we'll get:

ERROR: modpost: "apic_hw_disabled" [arch/x86/kvm/kvm-intel.ko] undefined!

This is similar to the EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu) already
there.

Regards,

Tony

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Sean Christopherson 1 year ago

On Wed, Jan 08, 2025, Tony Lindgren wrote:
> On Tue, Jan 07, 2025 at 02:41:51PM +0200, Nikolay Borisov wrote:
> > On 7.01.25 г. 9:37 ч., Tony Lindgren wrote:
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
> > >   EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu);
> > >   __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ);
> > > +EXPORT_SYMBOL_GPL(apic_hw_disabled);
> > 
> > Is it really required to expose this symbol? apic_hw_disabled is defined as
> > static inline in the header?

No, apic_hw_disabled can't be "static inline", because it's a variable, not a
function.

> For loadable modules yes, otherwise we'll get:
> 
> ERROR: modpost: "apic_hw_disabled" [arch/x86/kvm/kvm-intel.ko] undefined!
> 
> This is similar to the EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu) already
> there.

Heh, which is a hint that you're using the wrong helper.  TDX should check
lapic_in_kernel(), not kvm_apic_present().  The former verifies that local APIC
emulation/virtualization is handed in-kernel, i.e. by KVM.  The latter checks
that the local APIC is in-kernel *and* that the vCPU's local APIC is hardware
enabled, and checking that the local APIC is hardware enabled is unnecessary
and only works by sheer dumb luck.

The only reason kvm_create_lapic() stuffs the enable bit is to avoid toggling
the static key, which incurs costly IPIs to patch kernel text.  If
apic_hw_disabled were to be removed (which is somewhat seriously being considered),
this code would be deleted and TDX would break.

	/*
	 * Stuff the APIC ENABLE bit in lieu of temporarily incrementing
	 * apic_hw_disabled; the full RESET value is set by kvm_lapic_reset().
	 */
	vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE;

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Tony Lindgren 1 year ago

On Wed, Jan 08, 2025 at 07:01:01AM -0800, Sean Christopherson wrote:
> On Wed, Jan 08, 2025, Tony Lindgren wrote:
> > On Tue, Jan 07, 2025 at 02:41:51PM +0200, Nikolay Borisov wrote:
> > > On 7.01.25 г. 9:37 ч., Tony Lindgren wrote:
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -139,6 +139,8 @@ __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
> > > >   EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu);
> > > >   __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ);
> > > > +EXPORT_SYMBOL_GPL(apic_hw_disabled);
> > > 
> > > Is it really required to expose this symbol? apic_hw_disabled is defined as
> > > static inline in the header?
> 
> No, apic_hw_disabled can't be "static inline", because it's a variable, not a
> function.
> 
> > For loadable modules yes, otherwise we'll get:
> > 
> > ERROR: modpost: "apic_hw_disabled" [arch/x86/kvm/kvm-intel.ko] undefined!
> > 
> > This is similar to the EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu) already
> > there.
> 
> Heh, which is a hint that you're using the wrong helper.  TDX should check
> lapic_in_kernel(), not kvm_apic_present().  The former verifies that local APIC
> emulation/virtualization is handed in-kernel, i.e. by KVM.  The latter checks
> that the local APIC is in-kernel *and* that the vCPU's local APIC is hardware
> enabled, and checking that the local APIC is hardware enabled is unnecessary
> and only works by sheer dumb luck.

OK makes sense :)

> The only reason kvm_create_lapic() stuffs the enable bit is to avoid toggling
> the static key, which incurs costly IPIs to patch kernel text.  If
> apic_hw_disabled were to be removed (which is somewhat seriously being considered),
> this code would be deleted and TDX would break.
> 
> 	/*
> 	 * Stuff the APIC ENABLE bit in lieu of temporarily incrementing
> 	 * apic_hw_disabled; the full RESET value is set by kvm_lapic_reset().
> 	 */
> 	vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE;

Thanks for the clarification. Updated patch below for reference in
case it's still needed.

Regards,

Tony

8< ----------------------
From 1e4b72fe4a69f0bdd7c8379315b97be79fb6cf8a Mon Sep 17 00:00:00 2001
From: Tony Lindgren <tony.lindgren@linux.intel.com>
Date: Mon, 2 Sep 2024 13:52:20 +0300
Subject: [PATCH 1/1] KVM/TDX: Use lapic_in_kernel() in tdx_vcpu_create()

Use lapic_in_kernel() in tdx_vcpu_create().

Suggested-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com>
---
 arch/x86/kvm/vmx/tdx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d0dc3200fa37..b905a7c9e2ff 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -8,6 +8,7 @@
 #include "capabilities.h"
 #include "mmu.h"
 #include "x86_ops.h"
+#include "lapic.h"
 #include "tdx.h"
 #include "vmx.h"
 #include "mmu/spte.h"
@@ -674,7 +675,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 		return -EIO;
 
 	/* TDX only supports x2APIC, which requires an in-kernel local APIC. */
-	if (!vcpu->arch.apic)
+	if (!lapic_in_kernel(vcpu))
 		return -EINVAL;
 
 	fpstate_set_confidential(&vcpu->arch.guest_fpu);
-- 
2.47.1

Re: [PATCH v2 00/25] TDX vCPU/VM creation

Posted by Huang, Kai 1 year, 1 month ago

On Sat, 2025-01-04 at 01:43 +0000, Edgecombe, Rick P wrote:
> > 18: not sure why the check against num_present_cpus() is needed?
> 
> The per-vm KVM_MAX_VCPUS will be min_t(int, kvm->max_vcpus, num_present_cpus()).
> So if td_conf->max_vcpus_per_td < num_present_cpus(), then it might report
> supporting more CPUs then actually supported by the TDX module.

Right.

> 
> As to why not just report td_conf->max_vcpus_per_td, that value is the max CPUs
> that are supported by any platform the TDX module supports. So it is more about
> what the TDX module supports, then what userspace cares about (how many vCPUs
> they can use).

Sean didn't want to make reporting maximum vcpus depend on the whims of TDX
module since this doesn't provide a predictable ABI:

https://lore.kernel.org/kvm/ZmzaqRy2zjvlsDfL@google.com/

> 
> I think we could probably get by without the check and blame the TDX module if
> it does something strange. It is seems safer ABI-wise to have the check. But we
> are being a bit more cavalier around protecting against TDX supported CPUID bit
> changes then originally planned, so the check here now seems inconsistent.
> 
> Let me flag Kai to confirm there was not some known violating configuration. He
> explored a bunch of edge cases on this corner.

In practice the "max_vcpu_per_td" will never be smaller than the maximum logical
CPUs that ALL the platforms that the module supports can possibly have.  I got
this from the TDX module guys, and I don't think there's any reason for the TDX
module to break this.

However from module ABI's perspective (from the JSON), it could be any value, so
I think we should have a sanity check.  I think this is also different from the
"array size of CPUID_CONFIGs" ABI breakage (assuming this is what you meant
"protecting TDX supported CPUID bits" above) since it is currently documented as
32 in the JSON.