[PATCH 00/10] KVM: SEV: allow customizing VMSA features

Paolo Bonzini posted 10 patches 1 year, 12 months ago
There is a newer version of this series
[PATCH 00/10] KVM: SEV: allow customizing VMSA features
Posted by Paolo Bonzini 1 year, 12 months ago
The idea that no parameter would ever be necessary when enabling SEV or
SEV-ES for a VM was decidedly optimistic.  The first source of variability
that was encountered is the desired set of VMSA features, as that affects
the measurement of the VM's initial state and cannot be changed
arbitrarily by the hypervisor.

This series adds all the APIs that are needed to customize the features,
with room for future enhancements:

- a new /dev/kvm device attribute to retrieve the set of supported
  features (right now, only debug swap)

- a new sub-operation for KVM_MEM_ENCRYPT_OP that can take a struct,
  replacing the existing KVM_SEV_INIT and KVM_SEV_ES_INIT
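From userspace, querying the first bullet's device attribute might look roughly like the sketch below. The kvm_device_attr layout and the KVM_GET_DEVICE_ATTR ioctl mirror the existing uAPI, but the group and attribute numbers here are placeholders, not the values defined by this series:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Mirrors the existing struct kvm_device_attr uAPI layout. */
struct kvm_device_attr {
	uint32_t flags;   /* no flags currently defined */
	uint32_t group;
	uint64_t attr;
	uint64_t addr;    /* userspace pointer to the attribute value */
};

#define KVMIO                0xAE
#define KVM_GET_DEVICE_ATTR  _IOW(KVMIO, 0xe2, struct kvm_device_attr)

/* Placeholder numbers for illustration only. */
#define KVM_X86_GRP_SEV           1
#define KVM_X86_SEV_VMSA_FEATURES 0

/* Returns 0 and fills *features on success, -1 on failure. */
static int get_supported_vmsa_features(int kvm_fd, uint64_t *features)
{
	struct kvm_device_attr attr = {
		.group = KVM_X86_GRP_SEV,
		.attr  = KVM_X86_SEV_VMSA_FEATURES,
		.addr  = (uint64_t)(unsigned long)features,
	};

	return ioctl(kvm_fd, KVM_GET_DEVICE_ATTR, &attr);
}
```

A caller would open /dev/kvm, retrieve the mask, and intersect it with the VMSA features it wants to enable before issuing the init operation.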

It then puts the new op to work by including the VMSA features as a field
of the new initialization struct.  The existing KVM_SEV_INIT and
KVM_SEV_ES_INIT use the full set of supported VMSA features for backwards
compatibility; but I am considering also making them use zero as the
feature mask, and will gladly adjust the patches if so requested.
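As a sketch, the argument struct for the new sub-operation could look like the following. The struct names and exact layout are illustrative rather than copied from the patches; the wrapper is shown with explicit padding, since the compat fix in this series addresses exactly the implicit-padding problem in the old layout:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative layout for the KVM_SEV_INIT2 argument.  Only
 * vmsa_features exists so far; the padding reserves room for
 * future knobs without breaking the ABI.
 */
struct kvm_sev_init2_args {
	uint64_t vmsa_features;  /* e.g. bit 0 = debug swap */
	uint32_t flags;          /* must be zero for now */
	uint32_t pad[9];
};

/* KVM_MEMORY_ENCRYPT_OP wrapper, with explicit padding so 32-bit
 * and 64-bit userspace agree on the layout. */
struct kvm_sev_cmd {
	uint32_t id;       /* sub-operation, e.g. a KVM_SEV_INIT2 value */
	uint32_t pad0;
	uint64_t data;     /* userspace pointer to the args struct */
	uint32_t error;    /* SEV firmware error code, if any */
	uint32_t sev_fd;   /* fd for /dev/sev */
};
```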

In order to avoid creating *two* new KVM_MEM_ENCRYPT_OPs, I decided that
I could as well make SEV and SEV-ES use VM types.  And then, why not make
a SEV-ES VM, when created with the new VM type instead of KVM_SEV_ES_INIT,
reject KVM_GET_REGS/KVM_SET_REGS and friends on the vCPU file descriptor
once the VMSA has been encrypted...  Which is how the API should have
always behaved.
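For illustration, that behavioral change can be modeled with a trivial standalone guard (this is a model, not the actual kernel diff; KVM's real accessors and error codes may differ): once the VMSA is encrypted, register accessors refuse service instead of returning stale plaintext state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Minimal model of a vCPU, for illustration only. */
struct vcpu {
	bool guest_state_protected;  /* set once the VMSA is encrypted */
	unsigned long regs[16];
};

/* Model of KVM_GET_REGS: fails on a protected guest, as the new
 * SEV-ES VM type would make it do. */
static int get_regs(struct vcpu *vcpu, unsigned long *out)
{
	if (vcpu->guest_state_protected)
		return -EINVAL;

	memcpy(out, vcpu->regs, sizeof(vcpu->regs));
	return 0;
}
```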

The series is organized as follows:

- patches 1 and 2 are unrelated fixes and improvements for the SEV API

- patches 3 to 5 introduce the new device attribute to retrieve supported
  VMSA features

- patches 6 to 7 introduce new infrastructure for VM types, partly lifted
  out of the TDX patches

- patches 8 and 9 introduce respectively the new VM types for SEV and
  SEV-ES, and KVM_SEV_INIT2 as a new sub-operation for KVM_MEM_ENCRYPT_OP.

- patches 10 and 11 are tests.  The last patch is not intended to be applied,
  so that selftests keep some coverage of KVM_SEV_INIT and KVM_SEV_ES_INIT;
  but it is there as "proof" that migration can be made to work with the
  new API as well.


The idea is that SEV-SNP will only ever support KVM_SEV_INIT2.  I have
patches in progress for QEMU to support this new API.

Thanks,

Paolo

Isaku Yamahata (1):
  KVM: x86: Add is_vm_type_supported callback

Paolo Bonzini (10):
  KVM: x86: define standard behavior for bits 0/1 of VM type
  KVM: SEV: fix compat ABI for KVM_MEMORY_ENCRYPT_OP
  KVM: introduce new vendor op for KVM_GET_DEVICE_ATTR
  Documentation: kvm/sev: separate description of firmware
  KVM: SEV: publish supported VMSA features
  KVM: SEV: store VMSA features in kvm_sev_info
  KVM: SEV: define VM types for SEV and SEV-ES
  KVM: SEV: introduce KVM_SEV_INIT2 operation
  selftests: kvm: add tests for KVM_SEV_INIT2
  selftests: kvm: switch sev_migrate_tests to KVM_SEV_INIT2

 Documentation/virt/kvm/api.rst                |   2 +
 .../virt/kvm/x86/amd-memory-encryption.rst    |  81 +++++++--
 arch/x86/include/asm/kvm-x86-ops.h            |   2 +
 arch/x86/include/asm/kvm_host.h               |   4 +-
 arch/x86/include/uapi/asm/kvm.h               |  42 ++++-
 arch/x86/kvm/svm/sev.c                        | 104 +++++++++++-
 arch/x86/kvm/svm/svm.c                        |  14 +-
 arch/x86/kvm/svm/svm.h                        |   6 +-
 arch/x86/kvm/x86.c                            | 158 ++++++++++++++----
 arch/x86/kvm/x86.h                            |   2 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/kvm_util_base.h     |   6 +-
 .../selftests/kvm/set_memory_region_test.c    |   8 +-
 .../selftests/kvm/x86_64/sev_init2_tests.c    | 147 ++++++++++++++++
 .../selftests/kvm/x86_64/sev_migrate_tests.c  |  45 ++---
 15 files changed, 530 insertions(+), 92 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/sev_init2_tests.c

-- 
2.39.0
Re: [PATCH 00/10] KVM: SEV: allow customizing VMSA features
Posted by Sean Christopherson 1 year, 12 months ago
On Fri, Feb 09, 2024, Paolo Bonzini wrote:
> The idea that no parameter would ever be necessary when enabling SEV or
> SEV-ES for a VM was decidedly optimistic.

That implies there was a conscious decision regarding the uAPI.  AFAICT, all of
the SEV uAPIs are direct reflections of the PSP invocations.  Which is why I'm
being so draconian about the SNP uAPIs; this time around, we need to actually
design something.

> The first source of variability that was encountered is the desired set of
> VMSA features, as that affects the measurement of the VM's initial state and
> cannot be changed arbitrarily by the hypervisor.
> 
> This series adds all the APIs that are needed to customize the features,
> with room for future enhancements:
> 
> - a new /dev/kvm device attribute to retrieve the set of supported
>   features (right now, only debug swap)
> 
> - a new sub-operation for KVM_MEM_ENCRYPT_OP that can take a struct,
>   replacing the existing KVM_SEV_INIT and KVM_SEV_ES_INIT
> 
> It then puts the new op to work by including the VMSA features as a field
> of the new initialization struct.  The existing KVM_SEV_INIT and
> KVM_SEV_ES_INIT use the full set of supported VMSA features for backwards
> compatibility; but I am considering also making them use zero as the
> feature mask, and will gladly adjust the patches if so requested.

Rather than add a new KVM_MEMORY_ENCRYPT_OP, I think we should go for broke and
start building the generic set of "protected VM" APIs.  E.g. TDX wants to add
KVM_TDX_INIT_VM, and I'm guessing ARM needs similar functionality.  And AFAIK,
every technology follows an INIT => ADD (MEASURE) * N => FINALIZE type sequence.

If need be, I would rather have a massive union, a la kvm_run, to hold the vendor
specific bits than end up with sub-sub-ioctls and every vendor implementation
reinventing the wheel.

If it's sane and feasible for userspace, maybe even KVM_CREATE_VM2?

> In order to avoid creating *two* new KVM_MEM_ENCRYPT_OPs, I decided that
> I could as well make SEV and SEV-ES use VM types.  And then, why not make
> a SEV-ES VM, when created with the new VM type instead of KVM_SEV_ES_INIT,
> reject KVM_GET_REGS/KVM_SET_REGS and friends on the vCPU file descriptor
> once the VMSA has been encrypted...  Which is how the API should have
> always behaved.

+1000
Re: [PATCH 00/10] KVM: SEV: allow customizing VMSA features
Posted by Paolo Bonzini 1 year, 12 months ago
On Fri, Feb 9, 2024 at 8:40 PM Sean Christopherson <seanjc@google.com> wrote:
> On Fri, Feb 09, 2024, Paolo Bonzini wrote:
> > The idea that no parameter would ever be necessary when enabling SEV or
> > SEV-ES for a VM was decidedly optimistic.
>
> That implies there was a conscious decision regarding the uAPI.  AFAICT, all of
> the SEV uAPIs are direct reflections of the PSP invocations.  Which is why I'm
> being so draconian about the SNP uAPIs; this time around, we need to actually
> design something.

You liked that word, heh? :) The part that I am less sure about is
whether it's actually _possible_ to design something.

If you end up with a KVM_CREATE_VM2 whose arguments are

   uint32_t flags;
   uint32_t vm_type;
   uint64_t arch_mishmash_0; /* Intel only */
   uint64_t arch_mishmash_1; /* AMD only */
   uint64_t arch_mishmash_2; /* Intel only */
   uint64_t arch_mishmash_3; /* AMD only */

and half of the flags are Intel only, the other half are AMD only...
do you actually gain anything over a vendor-specific ioctl?

Case in point being that the SEV VMSA features would be one of the
fields above, and they would obviously not be available for TDX.

kvm_run is a different story because it's the result of mmap, and not
an ioctl. But in this case we have:

- pretty generic APIs like UPDATE_DATA and MEASURE that should just be
renamed to remove SEV references. Even DBG_DECRYPT and DBG_ENCRYPT
fall in this category

- APIs that seem okay but may depend on specific initialization flows,
for example LAUNCH_UPDATE_VMSA. One example of the problems with
initialization flows is LAUNCH_FINISH, which seems pretty tame but is
different between SEV{,-ES} and SNP. Another example could be CPUID
which is slightly different between vendors.

- APIs that are currently vendor-specific, but where a second version
has a chance of being cross-vendor, for example LAUNCH_SECRET or
GET_ATTESTATION_REPORT. Or maybe not.

- others that have no hope, because they include so many pieces of
vendor-specific data that there's hardly anything to share. INIT is
one of them. I guess you could fit the Intel CPUID square peg into
AMD's CPUID round hole or vice versa, but there's really little in
common between the two.

I think we should try to go for the first three, but realistically
will have to stop at the first one in most cases. Honestly, this
unified API effort should have started years ago if we wanted to go
there. I see where you're coming from, but the benefits are marginal
(like the amount of userspace code that could be reused) and the
effort huge in comparison. And especially, it's much worse to get an
attempt at a unified API wrong, than to have N different APIs.

This is not a free-for-all, there are definitely some
KVM_MEMORY_ENCRYPT_OP that can be shared between SEV and TDX. The
series also adds VM type support for SEV which fixes a poor past
choice. I don't think there's much to gain from sharing the whole INIT
phase though.

Paolo
Re: [PATCH 00/10] KVM: SEV: allow customizing VMSA features
Posted by Sean Christopherson 1 year, 12 months ago
On Fri, Feb 09, 2024, Paolo Bonzini wrote:
> On Fri, Feb 9, 2024 at 8:40 PM Sean Christopherson <seanjc@google.com> wrote:
> > On Fri, Feb 09, 2024, Paolo Bonzini wrote:
> > > The idea that no parameter would ever be necessary when enabling SEV or
> > > SEV-ES for a VM was decidedly optimistic.
> >
> > That implies there was a conscious decision regarding the uAPI.  AFAICT, all of
> > the SEV uAPIs are direct reflections of the PSP invocations.  Which is why I'm
> > being so draconian about the SNP uAPIs; this time around, we need to actually
> > design something.
> 
> You liked that word, heh? :) The part that I am less sure about is
> whether it's actually _possible_ to design something.
> 
> If you end up with a KVM_CREATE_VM2 whose arguments are
> 
>    uint32_t flags;
>    uint32_t vm_type;
>    uint64_t arch_mishmash_0; /* Intel only */
>    uint64_t arch_mishmash_1; /* AMD only */
>    uint64_t arch_mishmash_2; /* Intel only */
>    uint64_t arch_mishmash_3; /* AMD only */
> 
> and half of the flags are Intel only, the other half are AMD only...
> do you actually gain anything over a vendor-specific ioctl?

Sane, generic names.  I agree the code gains are likely negligible, but for me
at least, having KVM-friendly names for the commands would be immensely helpful.
E.g. for KVM_CREATE_VM2, I was thinking more like:

  __u32 flags;
  __u32 vm_type;
  union {
	struct tdx;
	struct sev;
	struct sev_es;
	struct sev_snp;
	__u8 pad[<big size>]
  };

Rinse and repeat for APIs that have a common purpose, but different payloads.
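Fleshed out into compilable C, the union sketch above might look like the following. Every struct and field name here is hypothetical, invented to illustrate the shape rather than taken from any posted patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-technology payloads. */
struct sev_create_args { uint64_t vmsa_features; };
struct tdx_create_args { uint64_t attributes; uint64_t xfam; };

/* Hypothetical KVM_CREATE_VM2 argument: a common header plus a padded
 * union, so each vendor payload can grow independently up to the cap. */
struct kvm_create_vm2_args {
	uint32_t flags;
	uint32_t vm_type;
	union {
		struct sev_create_args sev;
		struct tdx_create_args tdx;
		uint8_t pad[64];  /* the "<big size>" from the sketch */
	};
};
```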

Similar to KVM_{SET,GET}_NESTED_STATE, where the data is wildly different, and
there's very little overlap between {svm,vmx}_set_nested_state(), I find it quite
valuable to have a single set of APIs.  E.g. I don't have to translate between
VMX and SVM terminology when thinking about the APIs, when discussing them, etc.

That's especially true for all this CoCo goo, where the names are ridiculously
divergent, and often not exactly intuitive.  E.g. LAUNCH_MEASURE reads like
"measure the launch", but surprise, it's "get the measurement".

> Case in point being that the SEV VMSA features would be one of the
> fields above, and they would obviously not be available for TDX.
> 
> kvm_run is a different story because it's the result of mmap, and not
> an ioctl. But in this case we have:
> 
> - pretty generic APIs like UPDATE_DATA and MEASURE that should just be
> renamed to remove SEV references. Even DBG_DECRYPT and DBG_ENCRYPT
> fall in this category
> 
> - APIs that seem okay but may depend on specific initialization flows,
> for example LAUNCH_UPDATE_VMSA. One example of the problems with
> initialization flows is LAUNCH_FINISH, which seems pretty tame but is
> different between SEV{,-ES} and SNP. Another example could be CPUID
> which is slightly different between vendors.
> 
> - APIs that are currently vendor-specific, but where a second version
> has a chance of being cross-vendor, for example LAUNCH_SECRET or
> GET_ATTESTATION_REPORT. Or maybe not.

AFAICT, LAUNCH_SECRET (a.k.a. LAUNCH_UPDATE_SECRET) and GET_ATTESTATION_REPORT
shouldn't even be KVM APIs.  Ditto for LAUNCH_MEASURE, and probably other PSP
commands.  IIUC, userspace has the VM's firmware handle, I don't see why KVM
needs to be a middle-man.

> - others that have no hope, because they include so many pieces of
> vendor-specific data that there's hardly anything to share. INIT is
> one of them. I guess you could fit the Intel CPUID square peg into
> AMD's CPUID round hole or vice versa, but there's really little in
> common between the two.
> 
> I think we should try to go for the first three, but realistically
> will have to stop at the first one in most cases. Honestly, this
> unified API effort should have started years ago if we wanted to go
> there. I see where you're coming from, but the benefits are marginal
> (like the amount of userspace code that could be reused) and the
> effort huge in comparison.

The effort doesn't seem huge, so long as we don't try to make the parameters
common across vendor code.  The list of APIs doesn't seem insurmountable (note,
I'm not entirely sure these are correct mappings):

  create
  init VM   (LAUNCH_START / TDH.MNG.INIT)
  update    (LAUNCH_UPDATE_DATA / TDH.MEM.PAGE.ADD+TDH.MR.EXTEND)
  init vCPU (LAUNCH_UPDATE_VMSA / TDH.VP.INIT)
  finalize  (LAUNCH_FINISH / TDH.MR.FINALIZE)
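A generic front end for that lifecycle could be as small as one enum plus a per-vendor dispatch table. This is purely illustrative; none of these names exist in KVM, and the SEV/TDX mappings follow the tentative (and possibly wrong) list above:

```c
#include <assert.h>
#include <stddef.h>

/* Generic launch lifecycle, per the tentative mapping above. */
enum coco_launch_op {
	COCO_INIT_VM,    /* LAUNCH_START       / TDH.MNG.INIT     */
	COCO_UPDATE,     /* LAUNCH_UPDATE_DATA / TDH.MEM.PAGE.ADD */
	COCO_INIT_VCPU,  /* LAUNCH_UPDATE_VMSA / TDH.VP.INIT      */
	COCO_FINALIZE,   /* LAUNCH_FINISH      / TDH.MR.FINALIZE  */
	COCO_NR_OPS,
};

/* Per-vendor backend: common verbs, vendor-specific payloads. */
struct coco_backend {
	const char *name;
	int (*ops[COCO_NR_OPS])(void *vm, void *payload);
};

/* Placeholder op that always succeeds, for the usage example. */
static int nop_op(void *vm, void *payload)
{
	(void)vm;
	(void)payload;
	return 0;
}

/* The same launch flow then runs regardless of vendor; unimplemented
 * ops are skipped. */
static int coco_launch(const struct coco_backend *be, void *vm,
		       void *payloads[COCO_NR_OPS])
{
	for (int op = 0; op < COCO_NR_OPS; op++) {
		int ret = be->ops[op] ? be->ops[op](vm, payloads[op]) : 0;
		if (ret)
			return ret;
	}
	return 0;
}
```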
Re: [PATCH 00/10] KVM: SEV: allow customizing VMSA features
Posted by Paolo Bonzini 1 year, 12 months ago
On Tue, Feb 13, 2024 at 3:46 AM Sean Christopherson <seanjc@google.com> wrote:
>   __u32 flags;
>   __u32 vm_type;
>   union {
>         struct tdx;
>         struct sev;
>         struct sev_es;
>         struct sev_snp;
>         __u8 pad[<big size>]
>   };
>
> Rinse and repeat for APIs that have a common purpose, but different payloads.
>
> Similar to KVM_{SET,GET}_NESTED_STATE, where the data is wildly different, and
> there's very little overlap between {svm,vmx}_set_nested_state(), I find it quite
> valuable to have a single set of APIs.  E.g. I don't have to translate between
> VMX and SVM terminology when thinking about the APIs, when discussing them, etc.
>
> That's especially true for all this CoCo goo, where the names are ridiculously
> divergent, and often not exactly intuitive.  E.g. LAUNCH_MEASURE reads like
> "measure the launch", but surprise, it's "get the measurement".

I agree, but then you'd have to do things like "CPUID data is passed
via UPDATE_DATA for SEV and INIT_VM for TDX (and probably not at all
for pKVM)". And in one case the firmware may prefer to encrypt in
place, in the other you cannot do that at all.

There was a reason why SVM support was not added from the beginning.
Before adding nested get/set support for SVM, the whole nested
virtualization was made as similar as possible in design and
functionality to VMX. Of course it cannot be entirely the same, but
for example they share the overall idea that pending events and L2
state are taken from vCPU state; kvm_nested_state only stores global
processor state (VMXON/VMCS pointers on VMX, and GIF on SVM) and,
while in guest mode, L1 state and control bits. This ensures that the
same userspace flow can work for both VMX and SVM. However, in this
case we can't really control what is done in firmware.

> The effort doesn't seem huge, so long as we don't try to make the parameters
> common across vendor code.  The list of APIs doesn't seem insurmountable (note,
> I'm not entirely sure these are correct mappings):

While the effort isn't huge, the benefit is also pretty small, which
brings me to a second big difference with GET/SET_NESTED_STATE: because
there is a GET ioctl, we have the possibility of retrieving the "black
box" and passing it back. With CoCo it's userspace's task to fill in
the parameter structs anyway. I just don't see the possibility of
sharing any code except the final ioctl, which to be honest is not
much to show. And the higher price might be in re-reviewing code that
has already been reviewed, both in KVM and in userspace.

Paolo

>   create
>   init VM   (LAUNCH_START / TDH.MNG.INIT)
>   update    (LAUNCH_UPDATE_DATA / TDH.MEM.PAGE.ADD+TDH.MR.EXTEND)
>   init vCPU (LAUNCH_UPDATE_VMSA / TDH.VP.INIT)
>   finalize  (LAUNCH_FINISH / TDH.MR.FINALIZE)
Re: [PATCH 00/10] KVM: SEV: allow customizing VMSA features
Posted by Sean Christopherson 1 year, 11 months ago
On Tue, Feb 13, 2024, Paolo Bonzini wrote:
> On Tue, Feb 13, 2024 at 3:46 AM Sean Christopherson <seanjc@google.com> wrote:
> >   __u32 flags;
> >   __u32 vm_type;
> >   union {
> >         struct tdx;
> >         struct sev;
> >         struct sev_es;
> >         struct sev_snp;
> >         __u8 pad[<big size>]
> >   };
> >
> > Rinse and repeat for APIs that have a common purpose, but different payloads.
> >
> > Similar to KVM_{SET,GET}_NESTED_STATE, where the data is wildly different, and
> > there's very little overlap between {svm,vmx}_set_nested_state(), I find it quite
> > valuable to have a single set of APIs.  E.g. I don't have to translate between
> > VMX and SVM terminology when thinking about the APIs, when discussing them, etc.
> >
> > That's especially true for all this CoCo goo, where the names are ridiculously
> > divergent, and often not exactly intuitive.  E.g. LAUNCH_MEASURE reads like
> > "measure the launch", but surprise, it's "get the measurement".
> 
> I agree, but then you'd have to do things like "CPUID data is passed
> via UPDATE_DATA for SEV and INIT_VM for TDX (and probably not at all
> for pKVM)". And in one case the firmware may prefer to encrypt in
> place, in the other you cannot do that at all.
> 
> There was a reason why SVM support was not added from the beginning.
> Before adding nested get/set support for SVM, the whole nested
> virtualization was made as similar as possible in design and
> functionality to VMX. Of course it cannot be entirely the same, but
> for example they share the overall idea that pending events and L2
> state are taken from vCPU state; kvm_nested_state only stores global
> processor state (VMXON/VMCS pointers on VMX, and GIF on SVM) and,
> while in guest mode, L1 state and control bits. This ensures that the
> same userspace flow can work for both VMX and SVM. However, in this
> case we can't really control what is done in firmware.
> 
> > The effort doesn't seem huge, so long as we don't try to make the parameters
> > common across vendor code.  The list of APIs doesn't seem insurmountable (note,
> > I'm not entirely sure these are correct mappings):
> 
> While the effort isn't huge, the benefit is also pretty small, which
> brings me to a second big difference with GET/SET_NESTED_STATE: because
> there is a GET ioctl, we have the possibility of retrieving the "black
> box" and passing it back. With CoCo it's userspace's task to fill in
> the parameter structs anyway. I just don't see the possibility of
> sharing any code except the final ioctl, which to be honest is not
> much to show. And the higher price might be in re-reviewing code that
> has already been reviewed, both in KVM and in userspace.

Yeah, I realize I'm probably grasping at straws.  *sigh*