[PATCH v2 00/25] TDX vCPU/VM creation

Rick Edgecombe posted 25 patches 3 weeks, 4 days ago
arch/x86/include/asm/kvm-x86-ops.h            |    4 +-
arch/x86/include/asm/kvm_host.h               |    2 +
arch/x86/include/asm/shared/tdx.h             |    7 +-
arch/x86/include/asm/tdx.h                    |   25 +
.../tdx => include/asm}/tdx_global_metadata.h |   19 +
arch/x86/include/uapi/asm/kvm.h               |   59 +
arch/x86/kvm/Kconfig                          |    2 +
arch/x86/kvm/cpuid.c                          |   21 +
arch/x86/kvm/cpuid.h                          |    3 +
arch/x86/kvm/mmu/mmu.c                        |    9 +-
arch/x86/kvm/vmx/main.c                       |  143 +-
arch/x86/kvm/vmx/pmu_intel.c                  |   50 +-
arch/x86/kvm/vmx/pmu_intel.h                  |   28 +
arch/x86/kvm/vmx/tdx.c                        | 1369 ++++++++++++++++-
arch/x86/kvm/vmx/tdx.h                        |   92 ++
arch/x86/kvm/vmx/tdx_arch.h                   |  165 ++
arch/x86/kvm/vmx/tdx_errno.h                  |   37 +
arch/x86/kvm/vmx/vmx.h                        |   34 +-
arch/x86/kvm/vmx/x86_ops.h                    |   24 +
arch/x86/kvm/x86.c                            |   15 +-
arch/x86/virt/vmx/tdx/tdx.c                   |  264 +++-
arch/x86/virt/vmx/tdx/tdx.h                   |   39 +-
arch/x86/virt/vmx/tdx/tdx_global_metadata.c   |   46 +
23 files changed, 2391 insertions(+), 66 deletions(-)
rename arch/x86/{virt/vmx/tdx => include/asm}/tdx_global_metadata.h (68%)
create mode 100644 arch/x86/kvm/vmx/pmu_intel.h
create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
[PATCH v2 00/25] TDX vCPU/VM creation
Posted by Rick Edgecombe 3 weeks, 4 days ago
Hi,

Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
items remain that may be worth further discussion:
 - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
   been been tested.
 - The Retry loop around tdh_phymem_page_reclaim() in “KVM: TDX: 
   create/destroy VM structure” likely can be dropped.
 - Drop support for TDX Module’s that don’t support
   MD_FIELD_ID_FEATURES0_TOPOLOGY_ENUM. [1]
 - Type-safety in to_vmx()/to_tdx(). [2]

This series has 9 commits intended to collect acks from x86 maintainers. 
The first group of commits is for reading TDX static metadata for KVM to 
use:
	x86/virt/tdx: Share the global metadata structure for KVM to use
	x86/virt/tdx: Read essential global metadata for KVM

The second group is for exporting a TDX keyid allocator for KVM to use:
	x86/virt/tdx: Add tdx_guest_keyid_alloc/free() to alloc and free
	  TDX guest KeyID

The third group is for exporting various SEAMCALLs needed by KVM for this 
series. SEAMCALL patches for the later TDX sections will come with those 
series’:
	x86/virt/tdx: Add SEAMCALL wrappers for TDX KeyID management
	x86/virt/tdx: Add SEAMCALL wrappers for TDX TD creation
	x86/virt/tdx: Add SEAMCALL wrappers for TDX vCPU creation
	x86/virt/tdx: Add SEAMCALL wrappers for TDX page cache management
	x86/virt/tdx: Add SEAMCALL wrappers for TDX VM/vCPU field access
	x86/virt/tdx: Add SEAMCALL wrappers for TDX flush operations

This series is based off of a kvm-coco-queue commit and some pre-req
series:
1. d659088d46df "KVM: x86/mmu: Prevent aliased memslot GFNs" (in
   kvm-coco-queue).
2. v6 of “TDX host: metadata reading tweaks, bug fix and info dump”
3. “KVM: VMX: Initialize TDX when loading KVM module” re-ordered from 
   the commits in kvm-coco-queue

It requires TDX module 1.5.06.00.0744[3], or later. This is due to removal
of the workarounds for the lack of NO_RBP_MOD. Now NO_RBP_MOD is enabled,
and this particular version of the TDX module has a NO_RBP_MOD related bug
fix.

The full KVM branch is here:
https://github.com/intel/tdx/tree/tdx_kvm_dev-2024-10-30

Matching QEMU:
https://github.com/intel-staging/qemu-tdx/tree/tdx-qemu-wip-2024-10-11

[0] https://lore.kernel.org/kvm/20240812224820.34826-1-rick.p.edgecombe@intel.com/
[1] https://lore.kernel.org/kvm/d71540ab13e728d1326baae92e8ea82d00c08abe.camel@intel.com/
[2] https://lore.kernel.org/kvm/89657f96-0ed1-4543-9074-f13f62cc4694@redhat.com/
[3] https://github.com/intel/tdx-module/releases/tag/TDX_1.5.06

Isaku Yamahata (19):
  x86/virt/tdx: Add tdx_guest_keyid_alloc/free() to alloc and free TDX
    guest KeyID
  x86/virt/tdx: Add SEAMCALL wrappers for TDX KeyID management
  x86/virt/tdx: Add SEAMCALL wrappers for TDX TD creation
  x86/virt/tdx: Add SEAMCALL wrappers for TDX vCPU creation
  x86/virt/tdx: Add SEAMCALL wrappers for TDX page cache management
  x86/virt/tdx: Add SEAMCALL wrappers for TDX VM/vCPU field access
  x86/virt/tdx: Add SEAMCALL wrappers for TDX flush operations
  KVM: TDX: Add placeholders for TDX VM/vCPU structures
  KVM: TDX: Define TDX architectural definitions
  KVM: TDX: Add helper functions to print TDX SEAMCALL error
  KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl
  KVM: TDX: Get system-wide info about TDX module on initialization
  KVM: TDX: create/destroy VM structure
  KVM: TDX: Support per-VM KVM_CAP_MAX_VCPUS extension check
  KVM: TDX: initialize VM with TDX specific parameters
  KVM: TDX: Make pmu_intel.c ignore guest TD case
  KVM: TDX: Don't offline the last cpu of one package when there's TDX
    guest
  KVM: TDX: create/free TDX vcpu structure
  KVM: TDX: Do TDX specific vcpu initialization

Kai Huang (3):
  x86/virt/tdx: Share the global metadata structure for KVM to use
  KVM: TDX: Get TDX global information
  x86/virt/tdx: Read essential global metadata for KVM

Sean Christopherson (1):
  KVM: TDX: Add TDX "architectural" error codes

Xiaoyao Li (2):
  KVM: x86: Introduce KVM_TDX_GET_CPUID
  KVM: x86/mmu: Taking guest pa into consideration when calculate tdp
    level

 arch/x86/include/asm/kvm-x86-ops.h            |    4 +-
 arch/x86/include/asm/kvm_host.h               |    2 +
 arch/x86/include/asm/shared/tdx.h             |    7 +-
 arch/x86/include/asm/tdx.h                    |   25 +
 .../tdx => include/asm}/tdx_global_metadata.h |   19 +
 arch/x86/include/uapi/asm/kvm.h               |   59 +
 arch/x86/kvm/Kconfig                          |    2 +
 arch/x86/kvm/cpuid.c                          |   21 +
 arch/x86/kvm/cpuid.h                          |    3 +
 arch/x86/kvm/mmu/mmu.c                        |    9 +-
 arch/x86/kvm/vmx/main.c                       |  143 +-
 arch/x86/kvm/vmx/pmu_intel.c                  |   50 +-
 arch/x86/kvm/vmx/pmu_intel.h                  |   28 +
 arch/x86/kvm/vmx/tdx.c                        | 1369 ++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h                        |   92 ++
 arch/x86/kvm/vmx/tdx_arch.h                   |  165 ++
 arch/x86/kvm/vmx/tdx_errno.h                  |   37 +
 arch/x86/kvm/vmx/vmx.h                        |   34 +-
 arch/x86/kvm/vmx/x86_ops.h                    |   24 +
 arch/x86/kvm/x86.c                            |   15 +-
 arch/x86/virt/vmx/tdx/tdx.c                   |  264 +++-
 arch/x86/virt/vmx/tdx/tdx.h                   |   39 +-
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c   |   46 +
 23 files changed, 2391 insertions(+), 66 deletions(-)
 rename arch/x86/{virt/vmx/tdx => include/asm}/tdx_global_metadata.h (68%)
 create mode 100644 arch/x86/kvm/vmx/pmu_intel.h
 create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
 create mode 100644 arch/x86/kvm/vmx/tdx_errno.h

-- 
2.47.0

Re: [PATCH v2 00/25] TDX vCPU/VM creation
Posted by Adrian Hunter 3 weeks, 3 days ago
On 30/10/24 21:00, Rick Edgecombe wrote:
> Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
> from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
> items remain that may be worth further discussion:
>  - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
>    been been tested.

It seems for Intel PT we have no support for restoring host
state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
So it seems to me XFAM(8) does need to be disabled until that is
supported.

Re: [PATCH v2 00/25] TDX vCPU/VM creation
Posted by Edgecombe, Rick P 1 week, 5 days ago
On Thu, 2024-10-31 at 21:21 +0200, Adrian Hunter wrote:
> On 30/10/24 21:00, Rick Edgecombe wrote:
> > Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
> > from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
> > items remain that may be worth further discussion:
> >   - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
> >     been been tested.
> 
> It seems for Intel PT we have no support for restoring host
> state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
> the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
> So it seems to me XFAM(8) does need to be disabled until that is
> supported.

Good point. Let's disable it and CET. We can try a fixup patch when these land
in kvm-coco-queue.
Re: [PATCH v2 00/25] TDX vCPU/VM creation
Posted by Tony Lindgren 1 week, 6 days ago
On Thu, Oct 31, 2024 at 09:21:29PM +0200, Adrian Hunter wrote:
> On 30/10/24 21:00, Rick Edgecombe wrote:
> > Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
> > from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
> > items remain that may be worth further discussion:
> >  - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
> >    been been tested.
> 
> It seems for Intel PT we have no support for restoring host
> state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
> the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
> So it seems to me XFAM(8) does need to be disabled until that is
> supported.

So for now, we should remove the PT bit from tdx_get_supported_xfam(),
but can still keep it in tdx_restore_host_xsave_state()?

Then for save/restore, maybe we can just use the pt_guest_enter() and
pt_guest_exit() also for TDX. Some additional checks are needed for
the pt_mode though as the TDX module always clears the state if PT is
enabled. And the PT_MODE_SYSTEM will be missing TDX enter/exit data
but might be otherwise usable.

Regards,

Tony
Re: [PATCH v2 00/25] TDX vCPU/VM creation
Posted by Adrian Hunter 1 week, 5 days ago
On 11/11/24 11:49, Tony Lindgren wrote:
> On Thu, Oct 31, 2024 at 09:21:29PM +0200, Adrian Hunter wrote:
>> On 30/10/24 21:00, Rick Edgecombe wrote:
>>> Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
>>> from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
>>> items remain that may be worth further discussion:
>>>  - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
>>>    been been tested.
>>
>> It seems for Intel PT we have no support for restoring host
>> state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
>> the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
>> So it seems to me XFAM(8) does need to be disabled until that is
>> supported.
> 
> So for now, we should remove the PT bit from tdx_get_supported_xfam(),
> but can still keep it in tdx_restore_host_xsave_state()?

Yes

> 
> Then for save/restore, maybe we can just use the pt_guest_enter() and
> pt_guest_exit() also for TDX. Some additional checks are needed for
> the pt_mode though as the TDX module always clears the state if PT is
> enabled. And the PT_MODE_SYSTEM will be missing TDX enter/exit data
> but might be otherwise usable.

pt_guest_enter() / pt_guest_exit() are not suitable for TDX.  pt_mode
is not relevant for TDX because the TDX guest is always hidden from the
host behind SEAM.  However, restoring host MSRs is not the only issue.

The TDX Module does not validate Intel PT CPUID leaf 0x14
(except it must be all zero if Intel PT is not supported
i.e. if XFAM bit 8 is zero).  For invalid MSR accesses by the guest,
the TDX Module will inject #GP.  Host VMM could provide valid CPUID
to avoid that, but it would also need to be valid for the destination
platform if migration was to be attempted.

Disabling Intel PT for TDX for now also avoids that issue.
Re: [PATCH v2 00/25] TDX vCPU/VM creation
Posted by Tony Lindgren 1 week, 5 days ago
On Tue, Nov 12, 2024 at 09:26:36AM +0200, Adrian Hunter wrote:
> On 11/11/24 11:49, Tony Lindgren wrote:
> > On Thu, Oct 31, 2024 at 09:21:29PM +0200, Adrian Hunter wrote:
> >> On 30/10/24 21:00, Rick Edgecombe wrote:
> >>> Here is v2 of TDX VM/vCPU creation series. As discussed earlier, non-nits 
> >>> from v1[0] have been applied and it’s ready to hand off to Paolo. A few 
> >>> items remain that may be worth further discussion:
> >>>  - Disable CET/PT in tdx_get_supported_xfam(), as these features haven’t 
> >>>    been been tested.
> >>
> >> It seems for Intel PT we have no support for restoring host
> >> state.  IA32_RTIT_* MSR preservation is Init(XFAM(8)) which means
> >> the TDX Module sets the MSR to its RESET value after TD Enty/Exit.
> >> So it seems to me XFAM(8) does need to be disabled until that is
> >> supported.
> > 
> > So for now, we should remove the PT bit from tdx_get_supported_xfam(),
> > but can still keep it in tdx_restore_host_xsave_state()?
> 
> Yes
> 
> > 
> > Then for save/restore, maybe we can just use the pt_guest_enter() and
> > pt_guest_exit() also for TDX. Some additional checks are needed for
> > the pt_mode though as the TDX module always clears the state if PT is
> > enabled. And the PT_MODE_SYSTEM will be missing TDX enter/exit data
> > but might be otherwise usable.
> 
> pt_guest_enter() / pt_guest_exit() are not suitable for TDX.  pt_mode
> is not relevant for TDX because the TDX guest is always hidden from the
> host behind SEAM.  However, restoring host MSRs is not the only issue.
> 
> The TDX Module does not validate Intel PT CPUID leaf 0x14
> (except it must be all zero if Intel PT is not supported
> i.e. if XFAM bit 8 is zero).  For invalid MSR accesses by the guest,
> the TDX Module will inject #GP.  Host VMM could provide valid CPUID
> to avoid that, but it would also need to be valid for the destination
> platform if migration was to be attempted.
> 
> Disabling Intel PT for TDX for now also avoids that issue.

OK thanks for the detailed explanation.

Regards,

Tony