[RFC PATCH 0/4] KVM: x86/tdx: Have TDX handle VMXON during bringup

Sean Christopherson posted 4 patches 2 months, 1 week ago
There is a newer version of this series
Documentation/arch/x86/tdx.rst              |  26 --
arch/x86/events/intel/pt.c                  |   1 -
arch/x86/include/asm/reboot.h               |   3 -
arch/x86/include/asm/tdx.h                  |   4 -
arch/x86/include/asm/virt.h                 |  21 ++
arch/x86/include/asm/vmx.h                  |  11 +
arch/x86/kernel/cpu/common.c                |   2 +
arch/x86/kernel/reboot.c                    |  11 -
arch/x86/kvm/svm/svm.c                      |  34 +-
arch/x86/kvm/svm/vmenter.S                  |  10 +-
arch/x86/kvm/vmx/tdx.c                      | 190 +++---------
arch/x86/kvm/vmx/vmcs.h                     |  11 -
arch/x86/kvm/vmx/vmenter.S                  |   2 +-
arch/x86/kvm/vmx/vmx.c                      | 128 +-------
arch/x86/kvm/x86.c                          |  18 +-
arch/x86/virt/Makefile                      |   2 +
arch/x86/virt/hw.c                          | 327 ++++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.c                 | 292 +++++++++--------
arch/x86/virt/vmx/tdx/tdx.h                 |   8 -
arch/x86/virt/vmx/tdx/tdx_global_metadata.c |  10 +-
include/linux/kvm_host.h                    |  10 +-
virt/kvm/kvm_main.c                         |  31 +-
22 files changed, 622 insertions(+), 530 deletions(-)
create mode 100644 arch/x86/include/asm/virt.h
create mode 100644 arch/x86/virt/hw.c
[RFC PATCH 0/4] KVM: x86/tdx: Have TDX handle VMXON during bringup
Posted by Sean Christopherson 2 months, 1 week ago
This is a sort of middle ground between fully yanking core virtualization
support out of KVM, and unconditionally doing VMXON during boot[0].

I got quite far long on rebasing some internal patches we have to extract the
core virtualization bits out of KVM x86, but as I paged back in all of the
things we had punted on (because they were waaay out of scope for our needs),
I realized more and more that providing truly generic virtualization
instrastructure is vastly different than providing infrastructure that can be
shared by multiple instances of KVM (or things very similar to KVM)[1].

So while I still don't want to blindly do VMXON, I also think that trying to
actually support another in-tree hypervisor, without an imminent user to drive
the development, is a waste of resources, and would saddle KVM with a pile of
pointless complexity.

The idea here is to extract _only_ VMXON+VMXOFF and EFER.SVME toggling.  AFAIK
there's no second user of SVM, i.e. no equivalent to TDX, but I wanted to keep
things as symmetrical as possible.

Emphasis on "only", because leaving VMCS tracking and clearing in KVM is
another key difference from Xin's series.  The "light bulb" moment on that
front is that TDX isn't a hypervisor, and isn't trying to be a hypervisor.
Specifically, TDX should _never_ have it's own VMCSes (that are visible to the
host; the TDX-Module has it's own VMCSes to do SEAMCALL/SEAMRET), and so there
is simply no reason to move that functionality out of KVM.

With that out of the way, dealing with VMXON/VMXOFF and EFER.SVME is a fairly
simple refcounting game.

Oh, and I didn't bother looking to see if it would work, but if TDX only needs
VMXON during boot, then the TDX use of VMXON could be transient.  I.e. TDX
could simply blast on_each_cpu() and forego the cpuhp and syscore hooks (a
non-emergency reboot during init isn't possible).  I don't particuarly care
what TDX does, as it's a fairly minor detail all things concerned.  I went with
the "harder" approach, e.g. to validate keeping the VMXON users count elevated
would do the right thing with respect to CPU offlining, etc.

Lightly tested (see the hacks below to verify the TDX side appears to do what
it's supposed to do), but it seems to work?  Heavily RFC, e.g. the third patch
in particular needs to be chunked up, I'm sure there's polishing to be done,
etc.

[0] https://lore.kernel.org/all/20250909182828.1542362-1-xin@zytor.com
[1] https://lore.kernel.org/all/aOl5EutrdL_OlVOO@google.com

Sean Christopherson (4):
  KVM: x86: Move kvm_rebooting to x86
  KVM: x86: Extract VMXON and EFER.SVME enablement to kernel
  KVM: x86/tdx: Do VMXON and TDX-Module initialization during tdx_init()
  KVM: Bury kvm_{en,dis}able_virtualization() in kvm_main.c once more

 Documentation/arch/x86/tdx.rst              |  26 --
 arch/x86/events/intel/pt.c                  |   1 -
 arch/x86/include/asm/reboot.h               |   3 -
 arch/x86/include/asm/tdx.h                  |   4 -
 arch/x86/include/asm/virt.h                 |  21 ++
 arch/x86/include/asm/vmx.h                  |  11 +
 arch/x86/kernel/cpu/common.c                |   2 +
 arch/x86/kernel/reboot.c                    |  11 -
 arch/x86/kvm/svm/svm.c                      |  34 +-
 arch/x86/kvm/svm/vmenter.S                  |  10 +-
 arch/x86/kvm/vmx/tdx.c                      | 190 +++---------
 arch/x86/kvm/vmx/vmcs.h                     |  11 -
 arch/x86/kvm/vmx/vmenter.S                  |   2 +-
 arch/x86/kvm/vmx/vmx.c                      | 128 +-------
 arch/x86/kvm/x86.c                          |  18 +-
 arch/x86/virt/Makefile                      |   2 +
 arch/x86/virt/hw.c                          | 327 ++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.c                 | 292 +++++++++--------
 arch/x86/virt/vmx/tdx/tdx.h                 |   8 -
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c |  10 +-
 include/linux/kvm_host.h                    |  10 +-
 virt/kvm/kvm_main.c                         |  31 +-
 22 files changed, 622 insertions(+), 530 deletions(-)
 create mode 100644 arch/x86/include/asm/virt.h
 create mode 100644 arch/x86/virt/hw.c


base-commit: efcebc8f7aeeba15feb1a5bde70af74d96bf1a76
-- 
2.51.0.740.g6adb054d12-goog
Re: [RFC PATCH 0/4] KVM: x86/tdx: Have TDX handle VMXON during bringup
Posted by dan.j.williams@intel.com 2 months ago
[ Add Alexey for question below about SEV-TIO needing to enable SNP from
the PSP driver? ]

Sean Christopherson wrote:
> This is a sort of middle ground between fully yanking core virtualization
> support out of KVM, and unconditionally doing VMXON during boot[0].

Thanks for this, Sean!

> I got quite far long on rebasing some internal patches we have to extract the
> core virtualization bits out of KVM x86, but as I paged back in all of the
> things we had punted on (because they were waaay out of scope for our needs),
> I realized more and more that providing truly generic virtualization
> instrastructure is vastly different than providing infrastructure that can be
> shared by multiple instances of KVM (or things very similar to KVM)[1].
> 
> So while I still don't want to blindly do VMXON, I also think that trying to
> actually support another in-tree hypervisor, without an imminent user to drive
> the development, is a waste of resources, and would saddle KVM with a pile of
> pointless complexity.
> 
> The idea here is to extract _only_ VMXON+VMXOFF and EFER.SVME toggling.  AFAIK
> there's no second user of SVM, i.e. no equivalent to TDX, but I wanted to keep
> things as symmetrical as possible.

Alexey did mention in the TEE I/O call that the PSP driver does need to
turn on SVM. Added him to the Cc to clarify if SEV-TIO needs at least
SVM enabled outside of KVM in some cases.

> Emphasis on "only", because leaving VMCS tracking and clearing in KVM is
> another key difference from Xin's series.  The "light bulb" moment on that
> front is that TDX isn't a hypervisor, and isn't trying to be a hypervisor.
> Specifically, TDX should _never_ have it's own VMCSes (that are visible to the
> host; the TDX-Module has it's own VMCSes to do SEAMCALL/SEAMRET), and so there
> is simply no reason to move that functionality out of KVM.
> 
> With that out of the way, dealing with VMXON/VMXOFF and EFER.SVME is a fairly
> simple refcounting game.
> 
> Oh, and I didn't bother looking to see if it would work, but if TDX only needs
> VMXON during boot, then the TDX use of VMXON could be transient.

With the work-in-progress "Host Services", the expectation is that VMX
would remain on especially because there is no current way to de-init
TDX.

Now, the "TDX always-on even outside of Host Services" this series is
proposing gives me slight pause. I.e. Any resources that TDX gobbles, or
features that TDX is incompatible (ACPI S3), need a trip through a BIOS
menu to turn off.  However, if that becomes a problem in practice we can
circle back later to fix that up.

> could simply blast on_each_cpu() and forego the cpuhp and syscore hooks (a
> non-emergency reboot during init isn't possible).  I don't particuarly care
> what TDX does, as it's a fairly minor detail all things concerned.  I went with
> the "harder" approach, e.g. to validate keeping the VMXON users count elevated
> would do the right thing with respect to CPU offlining, etc.
> 
> Lightly tested (see the hacks below to verify the TDX side appears to do what
> it's supposed to do), but it seems to work?  Heavily RFC, e.g. the third patch
> in particular needs to be chunked up, I'm sure there's polishing to be done,
> etc.

Sounds good and I read this as "hey, this is the form I would like to
see, when someone else cleans this up and sends it back to me as a
non-RFC".

Thanks again!
Re: [RFC PATCH 0/4] KVM: x86/tdx: Have TDX handle VMXON during bringup
Posted by Alexey Kardashevskiy 2 months ago

On 14/10/25 09:22, dan.j.williams@intel.com wrote:
> [ Add Alexey for question below about SEV-TIO needing to enable SNP from
> the PSP driver? ]
> 
> Sean Christopherson wrote:
>> This is a sort of middle ground between fully yanking core virtualization
>> support out of KVM, and unconditionally doing VMXON during boot[0].
> 
> Thanks for this, Sean!
> 
>> I got quite far long on rebasing some internal patches we have to extract the
>> core virtualization bits out of KVM x86, but as I paged back in all of the
>> things we had punted on (because they were waaay out of scope for our needs),
>> I realized more and more that providing truly generic virtualization
>> instrastructure is vastly different than providing infrastructure that can be
>> shared by multiple instances of KVM (or things very similar to KVM)[1].
>>
>> So while I still don't want to blindly do VMXON, I also think that trying to
>> actually support another in-tree hypervisor, without an imminent user to drive
>> the development, is a waste of resources, and would saddle KVM with a pile of
>> pointless complexity.
>>
>> The idea here is to extract _only_ VMXON+VMXOFF and EFER.SVME toggling.  AFAIK
>> there's no second user of SVM, i.e. no equivalent to TDX, but I wanted to keep
>> things as symmetrical as possible.
> 
> Alexey did mention in the TEE I/O call that the PSP driver does need to
> turn on SVM. Added him to the Cc to clarify if SEV-TIO needs at least
> SVM enabled outside of KVM in some cases.

Nah, the PSP driver needs to enable those encrypted flavors of KVM (SEV, SEV-ES, SEV-SNP) in the PSP (set up memory keys or RMP) but the basic SVM does not need that, and EFER.SVME enables just it. When those SEV* are needed - KVM calls the PSP driver to enable those in the PSP, and shuts SEV* down when the last SEV* VM stops (or when kvm_amd unloads?). And the PSP cannot see EFER.SVME at any moment to act differently.

So until KVM tries VMRUN'ing a vCPU (+few more instructions), EFER.SVME does not matter afaict. Thanks,


-- 
Alexey
Re: [RFC PATCH 0/4] KVM: x86/tdx: Have TDX handle VMXON during bringup
Posted by Sean Christopherson 2 months ago
On Mon, Oct 13, 2025, dan.j.williams@intel.com wrote:
> > Emphasis on "only", because leaving VMCS tracking and clearing in KVM is
> > another key difference from Xin's series.  The "light bulb" moment on that
> > front is that TDX isn't a hypervisor, and isn't trying to be a hypervisor.
> > Specifically, TDX should _never_ have it's own VMCSes (that are visible to the
> > host; the TDX-Module has it's own VMCSes to do SEAMCALL/SEAMRET), and so there
> > is simply no reason to move that functionality out of KVM.
> > 
> > With that out of the way, dealing with VMXON/VMXOFF and EFER.SVME is a fairly
> > simple refcounting game.
> > 
> > Oh, and I didn't bother looking to see if it would work, but if TDX only needs
> > VMXON during boot, then the TDX use of VMXON could be transient.
> 
> With the work-in-progress "Host Services", the expectation is that VMX
> would remain on especially because there is no current way to de-init
> TDX.

What are Host Services?

> Now, the "TDX always-on even outside of Host Services" this series is
> proposing gives me slight pause. I.e. Any resources that TDX gobbles, or
> features that TDX is incompatible (ACPI S3), need a trip through a BIOS
> menu to turn off.  However, if that becomes a problem in practice we can
> circle back later to fix that up.

Oooh, by "TDX always-on" you mean invoking tdx_enable() during boot, as opposed
to throwing it into a loadable module.  To be honest, I completely missed the
whole PAMT allocation and imcompatible features side of things.

And Rick already pointed out that doing tdx_enable() during tdx_init() would be
far too early.

So it seems like the simple answer is to continue to have __tdx_bringup() invoke
tdx_enable(), but without all the caveats about the caller needed to hold the
CPUs lock, be post-VMXON, etc.

> > could simply blast on_each_cpu() and forego the cpuhp and syscore hooks (a
> > non-emergency reboot during init isn't possible).  I don't particuarly care
> > what TDX does, as it's a fairly minor detail all things concerned.  I went with
> > the "harder" approach, e.g. to validate keeping the VMXON users count elevated
> > would do the right thing with respect to CPU offlining, etc.
> > 
> > Lightly tested (see the hacks below to verify the TDX side appears to do what
> > it's supposed to do), but it seems to work?  Heavily RFC, e.g. the third patch
> > in particular needs to be chunked up, I'm sure there's polishing to be done,
> > etc.
> 
> Sounds good and I read this as "hey, this is the form I would like to
> see, when someone else cleans this up and sends it back to me as a
> non-RFC".

Actually, I think I can take it forward.  Knock wood, but I don't think there's
all that much left to be done.  Heck, even writing the code for the initial RFC
was a pretty short adventure once I had my head wrapped around the concept.
Re: [RFC PATCH 0/4] KVM: x86/tdx: Have TDX handle VMXON during bringup
Posted by dan.j.williams@intel.com 2 months ago
Sean Christopherson wrote:
> On Mon, Oct 13, 2025, dan.j.williams@intel.com wrote:
> > > Emphasis on "only", because leaving VMCS tracking and clearing in KVM is
> > > another key difference from Xin's series.  The "light bulb" moment on that
> > > front is that TDX isn't a hypervisor, and isn't trying to be a hypervisor.
> > > Specifically, TDX should _never_ have it's own VMCSes (that are visible to the
> > > host; the TDX-Module has it's own VMCSes to do SEAMCALL/SEAMRET), and so there
> > > is simply no reason to move that functionality out of KVM.
> > > 
> > > With that out of the way, dealing with VMXON/VMXOFF and EFER.SVME is a fairly
> > > simple refcounting game.
> > > 
> > > Oh, and I didn't bother looking to see if it would work, but if TDX only needs
> > > VMXON during boot, then the TDX use of VMXON could be transient.
> > 
> > With the work-in-progress "Host Services", the expectation is that VMX
> > would remain on especially because there is no current way to de-init
> > TDX.
> 
> What are Host Services?

That is my catch all name for TDX things that are independent of VMs.
Also called "tdx-host" in the preview patches [1]. This is capabilities
like updating the TDX Module at runtime, and the TEE I/O (TDX Connect)
stuff like establishing PCI device link encryption even if you never
assign that device to a VM.

[1]: http://lore.kernel.org/20250919142237.418648-2-dan.j.williams@intel.com

> > Now, the "TDX always-on even outside of Host Services" this series is
> > proposing gives me slight pause. I.e. Any resources that TDX gobbles, or
> > features that TDX is incompatible (ACPI S3), need a trip through a BIOS
> > menu to turn off.  However, if that becomes a problem in practice we can
> > circle back later to fix that up.
> 
> Oooh, by "TDX always-on" you mean invoking tdx_enable() during boot, as opposed
> to throwing it into a loadable module.  To be honest, I completely missed the
> whole PAMT allocation and imcompatible features side of things.
> 
> And Rick already pointed out that doing tdx_enable() during tdx_init() would be
> far too early.
> 
> So it seems like the simple answer is to continue to have __tdx_bringup() invoke
> tdx_enable(), but without all the caveats about the caller needed to hold the
> CPUs lock, be post-VMXON, etc.

Yeah, I like the option to hold off on paying any costs until absolutely
necessary.

The tdx-host driver will also be a direct tdx_enable() consumer, and it
is already prepared for resolving the "multiple consumers to race to
enable" case.

> > > non-emergency reboot during init isn't possible).  I don't particuarly care
> > > what TDX does, as it's a fairly minor detail all things concerned.  I went with
> > > the "harder" approach, e.g. to validate keeping the VMXON users count elevated
> > > would do the right thing with respect to CPU offlining, etc.
> > > 
> > > Lightly tested (see the hacks below to verify the TDX side appears to do what
> > > it's supposed to do), but it seems to work?  Heavily RFC, e.g. the third patch
> > > in particular needs to be chunked up, I'm sure there's polishing to be done,
> > > etc.
> > 
> > Sounds good and I read this as "hey, this is the form I would like to
> > see, when someone else cleans this up and sends it back to me as a
> > non-RFC".
> 
> Actually, I think I can take it forward.  Knock wood, but I don't think there's
> all that much left to be done.  Heck, even writing the code for the initial RFC
> was a pretty short adventure once I had my head wrapped around the concept.

Ack.
Re: [RFC PATCH 0/4] KVM: x86/tdx: Have TDX handle VMXON during bringup
Posted by dan.j.williams@intel.com 1 month ago
dan.j.williams@ wrote:
[..]
> > > Sounds good and I read this as "hey, this is the form I would like to
> > > see, when someone else cleans this up and sends it back to me as a
> > > non-RFC".
> > 
> > Actually, I think I can take it forward.  Knock wood, but I don't think there's
> > all that much left to be done.  Heck, even writing the code for the initial RFC
> > was a pretty short adventure once I had my head wrapped around the concept.
> 
> Ack.

FYI, this series is now included in tsm.git#staging [1]. Chao had one
fixup to it [2].

Recall that tsm.git#staging is where all of us working on PCI Device
Security (TDX, SEV, and CCA) can start tripping over each others
implementations [3] in a unified tree. 

The initial core work in that tree is a v6.19 candidate as long as at
least one arch implementation is also ready. SEV looks nearly ready [4].
CCA will sit out as it is going through a specification update. TDX is
ready save for this vmxon dependency.

I recall being concerned about the new TDX always-on stance, but I now
think it is ok. Just puts more pressure on the dynamic-PAMT work to
land. In the meantime, disabling TDX in the BIOS is a stopgap for those
that can not tolerate the static-PAMT overhead.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm.git/log/?h=staging
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm.git/commit/?id=406cd719d2a2
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm.git/commit/?id=e3d238ddeec0
[4]: http://lore.kernel.org/20251111063819.4098701-1-aik@amd.com