[PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Michael Roth posted 20 patches 1 year, 7 months ago
[PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
Posted by Michael Roth 1 year, 7 months ago
Version 2 of GHCB specification added support for the SNP Extended Guest
Request Message NAE event. This event serves a nearly identical purpose
to the previously-added SNP_GUEST_REQUEST event, but allows for
additional certificate data to be supplied via an additional
guest-supplied buffer to be used mainly for verifying the signature of
an attestation report as returned by firmware.

This certificate data is supplied by userspace, so unlike with
SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
forwarded to userspace via a KVM_EXIT_VMGEXIT exit structure, and then
the firmware request is made after the certificate data has been fetched
from userspace.

Since there is a potential for race conditions where the
userspace-supplied certificate data may be out-of-sync relative to the
reported TCB or VLEK that firmware will use when signing attestation
reports, a hook is also provided so that userspace can be informed once
the attestation request is actually completed. See the updates to
Documentation/ for more details on these aspects.

Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 Documentation/virt/kvm/api.rst | 87 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/sev.c         | 86 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h         |  3 ++
 include/uapi/linux/kvm.h       | 23 +++++++++
 4 files changed, 199 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..f3780ac98d56 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7060,6 +7060,93 @@ Please note that the kernel is allowed to use the kvm_run structure as the
 primary storage for certain register types. Therefore, the kernel may use the
 values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
+::
+
+		/* KVM_EXIT_VMGEXIT */
+		struct kvm_user_vmgexit {
+  #define KVM_USER_VMGEXIT_REQ_CERTS		1
+			__u32 type; /* KVM_USER_VMGEXIT_* type */
+			union {
+				struct {
+					__u64 data_gpa;
+					__u64 data_npages;
+  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
+  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
+  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
+					__u32 ret;
+  #define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
+					__u8 flags;
+  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
+  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
+					__u8 status;
+				} req_certs;
+			};
+		};
+
+
+If exit reason is KVM_EXIT_VMGEXIT then it indicates that an SEV-SNP guest
+has issued a VMGEXIT instruction (as documented by the AMD Architecture
+Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
+userspace. These are generally handled by the host kernel, but in some
+cases some aspects of handling a VMGEXIT are done in userspace.
+
+A kvm_user_vmgexit structure is defined to encapsulate the data to be
+sent to or returned by userspace. The type field defines the specific type
+of exit that needs to be serviced, and that type is used as a discriminator
+to determine which union type should be used for input/output.
+
+KVM_USER_VMGEXIT_REQ_CERTS
+--------------------------
+
+When an SEV-SNP issues a guest request for an attestation report, it has the
+option of issuing it in the form an *extended* guest request when a
+certificate blob is returned alongside the attestation report so the guest
+can validate the endorsement key used by SNP firmware to sign the report.
+These certificates are managed by userspace and are requested via
+KVM_EXIT_VMGEXITs using the KVM_USER_VMGEXIT_REQ_CERTS type.
+
+For the KVM_USER_VMGEXIT_REQ_CERTS type, the req_certs union type
+is used. The kernel will supply in 'data_gpa' the value the guest supplies
+via the RAX field of the GHCB when issuing extended guest requests.
+'data_npages' will similarly contain the value the guest supplies in RBX
+denoting the number of shared pages available to write the certificate
+data into.
+
+  - If the supplied number of pages is sufficient, userspace should write
+    the certificate data blob (in the format defined by the GHCB spec) in
+    the address indicated by 'data_gpa' and set 'ret' to 0.
+
+  - If the number of pages supplied is not sufficient, userspace must write
+    the required number of pages in 'data_npages' and then set 'ret' to 1.
+
+  - If userspace is temporarily unable to handle the request, 'ret' should
+    be set to 2 to inform the guest to retry later.
+
+  - If some other error occurred, userspace should set 'ret' to a non-zero
+    value that is distinct from the specific return values mentioned above.
+
+Generally some care needs be taken to keep the returned certificate data in
+sync with the actual endorsement key in use by firmware at the time the
+attestation request is sent to SNP firmware. The recommended scheme to do
+this is for the VMM to obtain a shared or exclusive lock on the path the
+certificate blob file resides at before reading it and returning it to KVM,
+and that it continues to hold the lock until the attestation request is
+actually sent to firmware. To facilitate this, the VMM can set the
+KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag before returning the
+certificate blob, in which case another KVM_EXIT_VMGEXIT of type
+KVM_USER_VMGEXIT_REQ_CERTS will be sent to userspace with
+KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE being set in the status field to
+indicate the request is fully-completed and that any associated locks can be
+released.
+
+Tools/libraries that perform updates to SNP firmware TCB values or endorsement
+keys (e.g. firmware interfaces such as SNP_COMMIT, SNP_SET_CONFIG, or
+SNP_VLEK_LOAD, see Documentation/virt/coco/sev-guest.rst for more details) in
+such a way that the certificate blob needs to be updated, should similarly
+take an exclusive lock on the certificate blob for the duration of any updates
+to firmware or the certificate blob contents to ensure that VMMs using the
+above scheme will not return certificate blob data that is out of sync with
+firmware.
 
 6. Capabilities that can be enabled on vCPUs
 ============================================
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5c6262f3232f..35f0bd91f92e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3297,6 +3297,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 		if (!sev_snp_guest(vcpu->kvm))
 			goto vmgexit_err;
 		break;
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+		if (!sev_snp_guest(vcpu->kvm) || !kvm_ghcb_rax_is_valid(svm) ||
+		    !kvm_ghcb_rbx_is_valid(svm))
+			goto vmgexit_err;
+		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
 		goto vmgexit_err;
@@ -3988,6 +3993,84 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
 	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
 }
 
+static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control;
+	struct kvm *kvm = vcpu->kvm;
+	sev_ret_code fw_err = 0;
+	int vmm_ret;
+
+	vmm_ret = vcpu->run->vmgexit.req_certs.ret;
+	if (vmm_ret) {
+		if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
+			vcpu->arch.regs[VCPU_REGS_RBX] =
+				vcpu->run->vmgexit.req_certs.data_npages;
+		goto out;
+	}
+
+	/*
+	 * The request was completed on the previous completion callback and
+	 * this completion is only for the STATUS_DONE userspace notification.
+	 */
+	if (vcpu->run->vmgexit.req_certs.status == KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE)
+		goto out_resume;
+
+	control = &svm->vmcb->control;
+
+	if (__snp_handle_guest_req(kvm, control->exit_info_1,
+				   control->exit_info_2, &fw_err))
+		vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+
+out:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+
+	if (vcpu->run->vmgexit.req_certs.flags & KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE) {
+		vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE;
+		vcpu->run->vmgexit.req_certs.flags = 0;
+		return 0; /* notify userspace of completion */
+	}
+
+out_resume:
+	return 1; /* resume guest */
+}
+
+static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
+{
+	int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned long data_npages;
+	sev_ret_code fw_err;
+	gpa_t data_gpa;
+
+	if (!sev_snp_guest(vcpu->kvm))
+		goto abort_request;
+
+	data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
+	data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
+
+	if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
+		goto abort_request;
+
+	/*
+	 * Grab the certificates from userspace so that can be bundled with
+	 * attestation/guest requests.
+	 */
+	vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
+	vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_REQ_CERTS;
+	vcpu->run->vmgexit.req_certs.data_gpa = data_gpa;
+	vcpu->run->vmgexit.req_certs.data_npages = data_npages;
+	vcpu->run->vmgexit.req_certs.flags = 0;
+	vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING;
+	vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
+
+	return 0; /* forward request to userspace */
+
+abort_request:
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
+	return 1; /* resume guest */
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
@@ -4266,6 +4349,9 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		snp_handle_guest_req(svm, control->exit_info_1, control->exit_info_2);
 		ret = 1;
 		break;
+	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+		ret = snp_begin_ext_guest_req(vcpu);
+		break;
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 		vcpu_unimpl(vcpu,
 			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e325ede0f463..83c562b4712a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -309,6 +309,9 @@ struct vcpu_svm {
 
 	/* Guest GIF value, used when vGIF is not enabled */
 	bool guest_gif;
+
+	/* Transaction ID associated with SNP config updates */
+	u64 snp_transaction_id;
 };
 
 struct svm_cpu_data {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..106367d87189 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -135,6 +135,26 @@ struct kvm_xen_exit {
 	} u;
 };
 
+struct kvm_user_vmgexit {
+#define KVM_USER_VMGEXIT_REQ_CERTS		1
+	__u32 type; /* KVM_USER_VMGEXIT_* type */
+	union {
+		struct {
+			__u64 data_gpa;
+			__u64 data_npages;
+#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
+#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
+#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
+			__u32 ret;
+#define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
+			__u8 flags;
+#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
+#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
+			__u8 status;
+		} req_certs;
+	};
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX        1048576
 
@@ -178,6 +198,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
+#define KVM_EXIT_VMGEXIT          40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -433,6 +454,8 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
+		/* KVM_EXIT_VMGEXIT */
+		struct kvm_user_vmgexit vmgexit;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
-- 
2.25.1
Re: [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
Posted by Sean Christopherson 1 year, 7 months ago
On Wed, May 01, 2024, Michael Roth wrote:
> Version 2 of GHCB specification added support for the SNP Extended Guest
> Request Message NAE event. This event serves a nearly identical purpose
> to the previously-added SNP_GUEST_REQUEST event, but allows for
> additional certificate data to be supplied via an additional
> guest-supplied buffer to be used mainly for verifying the signature of
> an attestation report as returned by firmware.
> 
> This certificate data is supplied by userspace, so unlike with
> SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
> forwarded to userspace via a KVM_EXIT_VMGEXIT exit structure, and then
> the firmware request is made after the certificate data has been fetched
> from userspace.
> 
> Since there is a potential for race conditions where the
> userspace-supplied certificate data may be out-of-sync relative to the
> reported TCB or VLEK that firmware will use when signing attestation
> reports, a hook is also provided so that userspace can be informed once
> the attestation request is actually completed. See the updates to
> Documentation/ for more details on these aspects.
> 
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  Documentation/virt/kvm/api.rst | 87 ++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/sev.c         | 86 +++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.h         |  3 ++
>  include/uapi/linux/kvm.h       | 23 +++++++++
>  4 files changed, 199 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..f3780ac98d56 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7060,6 +7060,93 @@ Please note that the kernel is allowed to use the kvm_run structure as the
>  primary storage for certain register types. Therefore, the kernel may use the
>  values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
>  
> +::
> +
> +		/* KVM_EXIT_VMGEXIT */
> +		struct kvm_user_vmgexit {

LOL, it looks dumb, but maybe kvm_vmgexit_exit to avoid confusing about whether
the struct refers to host userspace vs. guest userspace?

Actually, I vote to punt on naming until more exits need to be kicked to userspace,
and just do (see below for details on how I got here):

		/* KVM_EXIT_VMGEXIT */
		struct {
			__u64 exit_code;
			union {
				struct {
					__u64 data_gpa;
					__u64 data_npages;
					__u64 ret;
				} req_certs;
			};
		} vmgexit;

> +  #define KVM_USER_VMGEXIT_REQ_CERTS		1
> +			__u32 type; /* KVM_USER_VMGEXIT_* type */

Regardless of whether or not requesting a certificate is vendor specific enough
to justify its own exit reason, I don't think KVM should have a #VMGEXIT that
adds its own layer.  Structuring the user exit this way will make it weird and/or
difficult to handle #VMGEXITs that _do_ fit a generic pattern, e.g. a user might
wonder why PSC #VMGEXITs don't show up here.

And defining an exit reason that is, for all intents and purposes, a regurgitation
of the raw #VMGEXIT reason, but with a different value, is also confusing.  E.g.
it wouldn't be unreasonable for a reader to expect that "type" matches the value
defined in the GHCB (or whever the values are defined).

Ah, you copied what KVM does for Hyper-V and Xen emulation.  Hrm.  But only
partially.

Assuming it's impractical to have a generic user exit for this, and we think
there is a high likelihood of needing to punt more #VMGEXITs to userspace, then
we should more closely (perhaps even exactly) follow the Hyper-V and Xen models.
I.e. for all values and whanot that are controlled/defined by a third party
(Hyper-V, Xen, the GHCB, etc.) #define those values in a header that is clearly
"owned" by the third party.

E.g. IIRC, include/xen/interface/xen.h is copied verbatim from Xen documentation
(source?).  And include/asm-generic/hyperv-tlfs.h is the kernel's copy of the
TLFS, which dictates all of the Hyper-V hypercalls.

If we do that, then my concerns/objections largely go away, e.g. KVM isn't
defining magic values, there's less chance for confusion about what "type" holds,
etc.

Oh, and if we go that route, the sizes for all fields should follow the GHCB,
e.g. I believe the "type" should be a __u64.

> +			union {
> +				struct {
> +					__u64 data_gpa;
> +					__u64 data_npages;
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)

Hopefully it won't matter, but are BUSY and GENERIC actually defined somewhere?
I don't see them in GHCB 2.0.

In a perfect world, it would be nice for KVM to not have to care about the error
codes.  But KVM disallows KVM_{G,S}ET_REGS for guest with protected state, which
means it's not feasible for userspace to set registers, at least not in any sane
way.

Heh, we could abuse KVM_SYNC_X86_REGS to let userspace specify RBX, but (a) that's
gross, and (b) KVM_SYNC_X86_REGS and KVM_SYNC_X86_SREGS really ought to be rejected
if guest state is protected.

> +					__u32 ret;
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)

This has no business being buried in a VMGEXIT_REQ_CERTS flags.  Notifying
userspace that KVM completed its portion of a userspace exit is completely generic.

And aside from where the notification flag lives, _if_ we add a notification
mechanism, it belongs in a separate patch, because it's purely a performance
optimization.  Userspace can use immediate_exit to force KVM to re-exit after
completing an exit.

Actually, I take that back, this isn't even an optimization, it's literally a
non-generic implementation of kvm_run.immediate_exit.

If this were an optimization, i.e. KVM truly notified userspace without exiting,
then it would need to be a lot more robust, e.g. to ensure userspace actually
received the notification before KVM moved on.

> +					__u8 flags;
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
> +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1

This is also a weird reimplementation of generic functionality.  KVM nullifies
vcpu->arch.complete_userspace_io _before_ invoking the callback.  So if a callback
needs to run again on the next KVM_RUN, it can simply set complete_userspace_io
again.  In other words, literally doing nothing will get you what you want :-)

> +					__u8 status;
> +				} req_certs;
> +			};
> +		};
Re: [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
Posted by Michael Roth 1 year, 7 months ago
On Mon, May 13, 2024 at 04:48:25PM -0700, Sean Christopherson wrote:
> On Wed, May 01, 2024, Michael Roth wrote:
> > Version 2 of GHCB specification added support for the SNP Extended Guest
> > Request Message NAE event. This event serves a nearly identical purpose
> > to the previously-added SNP_GUEST_REQUEST event, but allows for
> > additional certificate data to be supplied via an additional
> > guest-supplied buffer to be used mainly for verifying the signature of
> > an attestation report as returned by firmware.
> > 
> > This certificate data is supplied by userspace, so unlike with
> > SNP_GUEST_REQUEST events, SNP_EXTENDED_GUEST_REQUEST events are first
> > forwarded to userspace via a KVM_EXIT_VMGEXIT exit structure, and then
> > the firmware request is made after the certificate data has been fetched
> > from userspace.
> > 
> > Since there is a potential for race conditions where the
> > userspace-supplied certificate data may be out-of-sync relative to the
> > reported TCB or VLEK that firmware will use when signing attestation
> > reports, a hook is also provided so that userspace can be informed once
> > the attestation request is actually completed. See the updates to
> > Documentation/ for more details on these aspects.
> > 
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 87 ++++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/svm/sev.c         | 86 +++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/svm/svm.h         |  3 ++
> >  include/uapi/linux/kvm.h       | 23 +++++++++
> >  4 files changed, 199 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index f0b76ff5030d..f3780ac98d56 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -7060,6 +7060,93 @@ Please note that the kernel is allowed to use the kvm_run structure as the
> >  primary storage for certain register types. Therefore, the kernel may use the
> >  values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> >  
> > +::
> > +
> > +		/* KVM_EXIT_VMGEXIT */
> > +		struct kvm_user_vmgexit {
> 
> LOL, it looks dumb, but maybe kvm_vmgexit_exit to avoid confusing about whether
> the struct refers to host userspace vs. guest userspace?
> 
> Actually, I vote to punt on naming until more exits need to be kicked to userspace,
> and just do (see below for details on how I got here):
> 
> 		/* KVM_EXIT_VMGEXIT */
> 		struct {
> 			__u64 exit_code;
> 			union {
> 				struct {
> 					__u64 data_gpa;
> 					__u64 data_npages;
> 					__u64 ret;
> 				} req_certs;
> 			};
> 		} vmgexit;
> 
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS		1
> > +			__u32 type; /* KVM_USER_VMGEXIT_* type */
> 
> Regardless of whether or not requesting a certificate is vendor specific enough
> to justify its own exit reason, I don't think KVM should have a #VMGEXIT that
> adds its own layer.  Structuring the user exit this way will make it weird and/or
> difficult to handle #VMGEXITs that _do_ fit a generic pattern, e.g. a user might
> wonder why PSC #VMGEXITs don't show up here.
> 
> And defining an exit reason that is, for all intents and purposes, a regurgitation
> of the raw #VMGEXIT reason, but with a different value, is also confusing.  E.g.
> it wouldn't be unreasonable for a reader to expect that "type" matches the value
> defined in the GHCB (or whever the values are defined).

The type in this case is actually "extended guest request". You'd rightly
pointed out that that is miles away from describing what KVM wants
userspace to do, so I named it "request certificate". And now with PSC being
handled as seperate KVM_HC_MAP_GPA_RANGE event with no exposure of GHCB/etc
to userspace, it made further sense to not lean too heavily on the GHCB for
defining the types.

But continuing to name it KVM_EXIT_VMGEXIT sort of goes against that
decoupling, so I can see some potential for confusion there. KVM_EXIT_SNP is
probably a better generic name for what this exit is meant to cover. But I'm
not aware of anything specific that would involve requiring extending this in
the near-term, though maybe there's some potential with live migration. So a
renaming to something more generic and less specific to VMGEXIT/GHCB,
like KVM_EXIT_SNP, or something more specific like KVM_EXIT_SNP_REQ_CERTS,
both seem warranted, but I don't think moving to something more coupled to
VMGEXIT/GHCB would provide much benefit long-term.

> 
> Ah, you copied what KVM does for Hyper-V and Xen emulation.  Hrm.  But only
> partially.
> 
> Assuming it's impractical to have a generic user exit for this, and we think
> there is a high likelihood of needing to punt more #VMGEXITs to userspace, then
> we should more closely (perhaps even exactly) follow the Hyper-V and Xen models.
> I.e. for all values and whanot that are controlled/defined by a third party
> (Hyper-V, Xen, the GHCB, etc.) #define those values in a header that is clearly
> "owned" by the third party.
> 
> E.g. IIRC, include/xen/interface/xen.h is copied verbatim from Xen documentation
> (source?).  And include/asm-generic/hyperv-tlfs.h is the kernel's copy of the
> TLFS, which dictates all of the Hyper-V hypercalls.
> 
> If we do that, then my concerns/objections largely go away, e.g. KVM isn't
> defining magic values, there's less chance for confusion about what "type" holds,
> etc.
> 
> Oh, and if we go that route, the sizes for all fields should follow the GHCB,
> e.g. I believe the "type" should be a __u64.
> 
> > +			union {
> > +				struct {
> > +					__u64 data_gpa;
> > +					__u64 data_npages;
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
> 
> Hopefully it won't matter, but are BUSY and GENERIC actually defined somewhere?
> I don't see them in GHCB 2.0.

BUSY is defined in 4.1.7:

  It is not expected that a guest would issue many Guest Request NAE
  events. However, access to the SNP firmware is a sequential and
  synchronous operation. To avoid the possibility of a guest creating a
  denial-of-service attack against the SNP firmware, it is recommended
  that some form of rate limiting be implemented should it be detected
  that a high number of Guest Request NAE events are being issued. To
  allow for this, the hypervisor may set the SW_EXITINFO2 field to
  0x0000000200000000, which will inform the guest to retry the request

INVALID_LEN in 4.1.8.1:

  The hypervisor must validate that the guest has supplied enough pages
  to hold the certificates that will be returned before performing the SNP
  guest request. If there are not enough guest pages to hold the certificate
  table and certificate data, the hypervisor will return the required number
  of pages needed to hold the certificate table and certificate data in the
  RBX register and set the SW_EXITINFO2 field to 0x0000000100000000.

and GENERIC chosen to provide an non-zero error code that doesn't
conflict with that above (or future) GHCB-defined values. But KVM isn't
trying to expose the actual GHCB details, like how these values are to be in
the upper 32-bits of SW_EXITINFO2, it just re-uses the values to avoid
purposefully obfuscating the GHCB return codes they relate to.

> 
> In a perfect world, it would be nice for KVM to not have to care about the error
> codes.  But KVM disallows KVM_{G,S}ET_REGS for guest with protected state, which
> means it's not feasible for userspace to set registers, at least not in any sane
> way.
> 
> Heh, we could abuse KVM_SYNC_X86_REGS to let userspace specify RBX, but (a) that's
> gross, and (b) KVM_SYNC_X86_REGS and KVM_SYNC_X86_SREGS really ought to be rejected
> if guest state is protected.
> 
> > +					__u32 ret;
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
> 
> This has no business being buried in a VMGEXIT_REQ_CERTS flags.  Notifying
> userspace that KVM completed its portion of a userspace exit is completely generic.
> 
> And aside from where the notification flag lives, _if_ we add a notification
> mechanism, it belongs in a separate patch, because it's purely a performance
> optimization.  Userspace can use immediate_exit to force KVM to re-exit after
> completing an exit.
> 
> Actually, I take that back, this isn't even an optimization, it's literally a
> non-generic implementation of kvm_run.immediate_exit.

Relying on a generic -EINTR response resulting from kvm_run.immediate_exit
doesn't seem like a very robust way to ensure the attestation request
was made to firmware. It seems fully possible that future code changes
could result in EINTR being returned for other reasons. So how do you
reliably detect that the EINTR is a result of immediate_exit being called
after the attestation request is made to firmware? We could squirrel something
away in struct kvm_run to probe for, but delivering another
KVM_EXIT_SNP_REQ_CERT with an extra flag set seems to be reasonably
userspace-friendly.

> 
> If this were an optimization, i.e. KVM truly notified userspace without exiting,
> then it would need to be a lot more robust, e.g. to ensure userspace actually
> received the notification before KVM moved on.

Right, this does rely on exiting via , not userspace polling for flags or
anything along that line.

> 
> > +					__u8 flags;
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
> > +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
> 
> This is also a weird reimplementation of generic functionality.  KVM nullifies
> vcpu->arch.complete_userspace_io _before_ invoking the callback.  So if a callback
> needs to run again on the next KVM_RUN, it can simply set complete_userspace_io
> again.  In other words, literally doing nothing will get you what you want :-)

We could just have the completion callback set complete_userspace_io
again, but then you'd always get 2 userspace exit events per attestation
request. There could be some userspaces that don't implement the
file-locking scheme, in which case they wouldn't need the 2nd notification.
That's why the KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag is provided
as an opt-in.

The pending/done status bits are so userspace can distinguish between the
start of a certificate request and the completion side of it after it gets
bound a completed attestation request and the filelock can be released.

Thanks,

Mike

> 
> > +					__u8 status;
> > +				} req_certs;
> > +			};
> > +		};
Re: [PATCH v15 19/20] KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
Posted by Sean Christopherson 1 year, 7 months ago
On Mon, May 13, 2024, Michael Roth wrote:
> On Mon, May 13, 2024 at 04:48:25PM -0700, Sean Christopherson wrote:
> > Actually, I take that back, this isn't even an optimization, it's literally a
> > non-generic implementation of kvm_run.immediate_exit.
> 
> Relying on a generic -EINTR response resulting from kvm_run.immediate_exit
> doesn't seem like a very robust way to ensure the attestation request
> was made to firmware. It seems fully possible that future code changes
> could result in EINTR being returned for other reasons. So how do you
> reliably detect that the EINTR is a result of immediate_exit being called
> after the attestation request is made to firmware? We could squirrel something
> away in struct kvm_run to probe for, but delivering another
> KVM_EXIT_SNP_REQ_CERT with an extra flag set seems to be reasonably
> userspace-friendly.

And unnecessarily specific to a single exit.  But it's a non-issue (except
possibly on ARM).

I doubt it's formally documented anywhere, but userspace absolutely relies on
kvm_run.immediate_exit to be processed _after_ complete_userspace_io().  If KVM
exits with -EINTR before invoking cui(), live migration will break due to taking
a snapshot of vCPU state in the middle of an instruction.

Given that userspace has likely built up rigid expectations for immediate_exit,
I don't see any problem formally documenting KVM's behavior, i.e. signing a
contract guaranteeing that KVM will complete the "back half" of emulation if
immediate_exit is set and KVM_RUN return -EINTR.

ARM is the only arch that is at all suspect, due to its rather massive
kvm_arch_vcpu_run_pid_change() hook.  At a quick glance, it seems to be ok, too.
And if it's not, we need to fix that asap, because it's like a bug waiting to
happen.

> > If this were an optimization, i.e. KVM truly notified userspace without exiting,
> > then it would need to be a lot more robust, e.g. to ensure userspace actually
> > received the notification before KVM moved on.
> 
> Right, this does rely on exiting via , not userspace polling for flags or
> anything along that line.
> 
> > 
> > > +					__u8 flags;
> > > +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
> > > +  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
> > 
> > This is also a weird reimplementation of generic functionality.  KVM nullifies
> > vcpu->arch.complete_userspace_io _before_ invoking the callback.  So if a callback
> > needs to run again on the next KVM_RUN, it can simply set complete_userspace_io
> > again.  In other words, literally doing nothing will get you what you want :-)
> 
> We could just have the completion callback set complete_userspace_io
> again, but then you'd always get 2 userspace exit events per attestation
> request. There could be some userspaces that don't implement the
> file-locking scheme, in which case they wouldn't need the 2nd notification.

Then they don't set immediate_exit.

> That's why the KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag is provided
> as an opt-in.
> 
> The pending/done status bits are so userspace can distinguish between the
> start of a certificate request and the completion side of it after it gets
> bound a completed attestation request and the filelock can be released.
[PATCH] KVM: SEV: Replace KVM_EXIT_VMGEXIT with KVM_EXIT_SNP_REQ_CERTS
Posted by Michael Roth 1 year, 7 months ago
It's not clear if SNP guests will need any other SNP-specific userspace
exit types in the future, and if they do, it's not clear that they would
necessary be related to VMGEXIT events or something else entirely.

So, rather than trying to anticipate future use-cases and have a single
union structure to manage the associated parameters, just use a common
KVM_EXIT_SNP_* prefix, but otherwise treat these as separate events, and
go ahead and convert the only VMGEXIT type currently defined,
KVM_USER_VMGEXIT_REQ_CERTS, over to KVM_EXIT_SNP_REQ_CERTS.

Also, formally document that kvm_run->immediate_exit is guaranteed to
handle userspace IO completion callbacks before returning to userspace
with -EINTR, and use this mechanism to allow userspace to use this
mechanism as a means to know when an attestation request is complete and
the KVM_EXIT_SNP_REQ_CERTS event is fully-finished, allowing userspace
to at that point handle any certificate synchronization cleanup like
releasing file locks/etc.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
Hi Sean,

Here's an attempt to address your concerns regarding
KVM_EXIT_VMGEXIT/KVM_USER_VMGEXIT_REQ_CERTS. Please let me know what
you think.

The main gist of it is that it leverages kvm->immediate_edit rather than
a flag in the kvm_run exit struct to receive notification when the
attestation has been completed and the certificate is no longer needed.

Additionally it drops the confusing KVM_EXIT_VMGEXIT naming in favor of
the KVM_EXIT_SNP_* prefix, and rather than introduce infrastructure and
a union to handle other SNP-specific types in the future, it simply
defines KVM_EXIT_SNP_REQ_CERTS as a one-off event so we are free to
handle future SNP-specific/SNP-related userspace exits however makes
sense for future cases.

Thanks,

Mike

 Documentation/virt/kvm/api.rst | 64 +++++++++++-----------------------
 arch/x86/kvm/svm/sev.c         | 28 +++------------
 include/uapi/linux/kvm.h       | 31 ++++++++--------
 3 files changed, 43 insertions(+), 80 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6ab8b5b7c64e..ea05c16f3438 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6381,6 +6381,11 @@ to avoid usage of KVM_SET_SIGNAL_MASK, which has worse scalability.
 Rather than blocking the signal outside KVM_RUN, userspace can set up
 a signal handler that sets run->immediate_exit to a non-zero value.
 
+Also note that any KVM_EXIT_* events that have associated completion
+callbacks that KVM needs to process when KVM_RUN is called will be
+processed *before* exiting again to userspace with -EINTR as a result
+of run->immediate_exit.
+
 This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available.
 
 ::
@@ -7069,50 +7074,24 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
 
 ::
 
-		/* KVM_EXIT_VMGEXIT */
-		struct kvm_user_vmgexit {
-  #define KVM_USER_VMGEXIT_REQ_CERTS		1
-			__u32 type; /* KVM_USER_VMGEXIT_* type */
-			union {
-				struct {
-					__u64 data_gpa;
-					__u64 data_npages;
-  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
-  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
-  #define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
-					__u32 ret;
-  #define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
-					__u8 flags;
-  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
-  #define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
-					__u8 status;
-				} req_certs;
-			};
-		};
-
-
-If exit reason is KVM_EXIT_VMGEXIT then it indicates that an SEV-SNP guest
-has issued a VMGEXIT instruction (as documented by the AMD Architecture
-Programmer's Manual (APM)) to the hypervisor that needs to be serviced by
-userspace. These are generally handled by the host kernel, but in some
-cases some aspects of handling a VMGEXIT are done in userspace.
-
-A kvm_user_vmgexit structure is defined to encapsulate the data to be
-sent to or returned by userspace. The type field defines the specific type
-of exit that needs to be serviced, and that type is used as a discriminator
-to determine which union type should be used for input/output.
-
-KVM_USER_VMGEXIT_REQ_CERTS
---------------------------
+		/* KVM_EXIT_SNP_REQ_CERTS */
+		struct {
+			__u64 data_gpa;
+			__u64 data_npages;
+  #define KVM_EXIT_SNP_REQ_CERTS_ERROR_INVALID_LEN   1
+  #define KVM_EXIT_SNP_REQ_CERTS_ERROR_BUSY          2
+  #define KVM_EXIT_SNP_REQ_CERTS_ERROR_GENERIC       (1 << 31)
+			__u32 ret;
+		} snp_req_certs;
 
 When an SEV-SNP issues a guest request for an attestation report, it has the
 option of issuing it in the form an *extended* guest request when a
 certificate blob is returned alongside the attestation report so the guest
 can validate the endorsement key used by SNP firmware to sign the report.
 These certificates are managed by userspace and are requested via
-KVM_EXIT_VMGEXITs using the KVM_USER_VMGEXIT_REQ_CERTS type.
+KVM_EXIT_SNP exits using the KVM_EXIT_SNP_REQ_CERTS type.
 
-For the KVM_USER_VMGEXIT_REQ_CERTS type, the req_certs union type
+For the KVM_EXIT_SNP_REQ_CERTS type, the req_certs union type
 is used. The kernel will supply in 'data_gpa' the value the guest supplies
 via the RAX field of the GHCB when issuing extended guest requests.
 'data_npages' will similarly contain the value the guest supplies in RBX
@@ -7139,12 +7118,11 @@ this is for the VMM to obtain a shared or exclusive lock on the path the
 certificate blob file resides at before reading it and returning it to KVM,
 and that it continues to hold the lock until the attestation request is
 actually sent to firmware. To facilitate this, the VMM can set the
-KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE flag before returning the
-certificate blob, in which case another KVM_EXIT_VMGEXIT of type
-KVM_USER_VMGEXIT_REQ_CERTS will be sent to userspace with
-KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE being set in the status field to
-indicate the request is fully-completed and that any associated locks can be
-released.
+run->immediate_exit flag before returning the certificate blob, in which
+case KVM is guaranteed to complete the issuing of all pending IO completion
+callbacks before exiting to userspace with EINTR. At this point userspace
+can release any locks it may have taken when the KVM_EXIT_SNP_REQ_CERTS exit
+was originally received.
 
 Tools/libraries that perform updates to SNP firmware TCB values or endorsement
 keys (e.g. firmware interfaces such as SNP_COMMIT, SNP_SET_CONFIG, or
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6cf665c410b2..e6318bbd8a6a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4006,21 +4006,14 @@ static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
 	sev_ret_code fw_err = 0;
 	int vmm_ret;
 
-	vmm_ret = vcpu->run->vmgexit.req_certs.ret;
+	vmm_ret = vcpu->run->snp_req_certs.ret;
 	if (vmm_ret) {
 		if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
 			vcpu->arch.regs[VCPU_REGS_RBX] =
-				vcpu->run->vmgexit.req_certs.data_npages;
+				vcpu->run->snp_req_certs.data_npages;
 		goto out;
 	}
 
-	/*
-	 * The request was completed on the previous completion callback and
-	 * this completion is only for the STATUS_DONE userspace notification.
-	 */
-	if (vcpu->run->vmgexit.req_certs.status == KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE)
-		goto out_resume;
-
 	control = &svm->vmcb->control;
 
 	if (__snp_handle_guest_req(kvm, control->exit_info_1,
@@ -4029,14 +4022,6 @@ static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
 
 out:
 	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
-
-	if (vcpu->run->vmgexit.req_certs.flags & KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE) {
-		vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE;
-		vcpu->run->vmgexit.req_certs.flags = 0;
-		return 0; /* notify userspace of completion */
-	}
-
-out_resume:
 	return 1; /* resume guest */
 }
 
@@ -4060,12 +4045,9 @@ static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
 	 * Grab the certificates from userspace so that can be bundled with
 	 * attestation/guest requests.
 	 */
-	vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
-	vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_REQ_CERTS;
-	vcpu->run->vmgexit.req_certs.data_gpa = data_gpa;
-	vcpu->run->vmgexit.req_certs.data_npages = data_npages;
-	vcpu->run->vmgexit.req_certs.flags = 0;
-	vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING;
+	vcpu->run->exit_reason = KVM_EXIT_SNP_REQ_CERTS;
+	vcpu->run->snp_req_certs.data_gpa = data_gpa;
+	vcpu->run->snp_req_certs.data_npages = data_npages;
 	vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
 
 	return 0; /* forward request to userspace */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 106367d87189..8ebfc91dc967 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -135,22 +135,17 @@ struct kvm_xen_exit {
 	} u;
 };
 
-struct kvm_user_vmgexit {
-#define KVM_USER_VMGEXIT_REQ_CERTS		1
-	__u32 type; /* KVM_USER_VMGEXIT_* type */
+struct kvm_exit_snp {
+#define KVM_EXIT_SNP_REQ_CERTS		1
+	__u32 type; /* KVM_EXIT_SNP_* type */
 	union {
 		struct {
 			__u64 data_gpa;
 			__u64 data_npages;
-#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_INVALID_LEN   1
-#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_BUSY          2
-#define KVM_USER_VMGEXIT_REQ_CERTS_ERROR_GENERIC       (1 << 31)
+#define KVM_EXIT_SNP_REQ_CERTS_ERROR_INVALID_LEN   1
+#define KVM_EXIT_SNP_REQ_CERTS_ERROR_BUSY          2
+#define KVM_EXIT_SNP_REQ_CERTS_ERROR_GENERIC       (1 << 31)
 			__u32 ret;
-#define KVM_USER_VMGEXIT_REQ_CERTS_FLAGS_NOTIFY_DONE	BIT(0)
-			__u8 flags;
-#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING	0
-#define KVM_USER_VMGEXIT_REQ_CERTS_STATUS_DONE		1
-			__u8 status;
 		} req_certs;
 	};
 };
@@ -198,7 +193,7 @@ struct kvm_user_vmgexit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
-#define KVM_EXIT_VMGEXIT          40
+#define KVM_EXIT_SNP_REQUEST_CERTS 40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -454,8 +449,16 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
-		/* KVM_EXIT_VMGEXIT */
-		struct kvm_user_vmgexit vmgexit;
+		/* KVM_EXIT_SNP_REQ_CERTS */
+		struct {
+			__u64 data_gpa;
+			__u64 data_npages;
+#define KVM_EXIT_SNP_REQ_CERTS_ERROR_INVALID_LEN   1
+#define KVM_EXIT_SNP_REQ_CERTS_ERROR_BUSY          2
+#define KVM_EXIT_SNP_REQ_CERTS_ERROR_GENERIC       (1 << 31)
+			__u32 ret;
+		} snp_req_certs;
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
-- 
2.25.1
Re: [PATCH] KVM: SEV: Replace KVM_EXIT_VMGEXIT with KVM_EXIT_SNP_REQ_CERTS
Posted by Dionna Amalie Glaze 1 year, 4 months ago
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4006,21 +4006,14 @@ static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
>         sev_ret_code fw_err = 0;
>         int vmm_ret;
>
> -       vmm_ret = vcpu->run->vmgexit.req_certs.ret;
> +       vmm_ret = vcpu->run->snp_req_certs.ret;
>         if (vmm_ret) {
>                 if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
>                         vcpu->arch.regs[VCPU_REGS_RBX] =
> -                               vcpu->run->vmgexit.req_certs.data_npages;
> +                               vcpu->run->snp_req_certs.data_npages;
>                 goto out;
>         }

Finally getting around to this patch. Thanks for your patience.

Whether the exit to guest for certs is first or second when getting
the attestation report, the certificates need to be
consistent. Since we don't have any locks held before exiting, and no
checks happening on the result, there's a
chance that a well-intentioned host can still provide the wrong
certificate to the guest when VMs are running and requesting
attestations during a firmware hotload.

Thread 1:
DOWNLOAD_FIRMWARE_EX please
CURRENT_TCB > REPORTED_TCB
(notify service to get a new VCEK cert)
Interrupted

Thread 2:
VM extended guest request in.
Exit to user space
Call SNP_PLATFORM_STATUS to get REPORTED_TCB.
Get certs for REPORTED_TCB for the blob. It's at /x/y/z-REPORTED_TCB.crt.
Interrupted

Thread 1:
I got my VCEK cert delivered for CURRENT_TCB! I'll put it at
/x/y/z-CURRENT_TCB.crt
Great. SNP_COMMIT.
Now both REPORTED_TCB and COMMITTED_TCB to CURRENT_TCB, because that's
the spec. Different reported_tcb here. than in thread 1.
Interrupted

Thread 2:
Get the attestation report. It will be signed by the VCEK versioned to
the newer REPORTED_TCB.
Return to VM guest

VM guest:
My report's signature doesn't verify with the VCEK cert I was given.
Yes, 1-88-COM-PLAIN?

How do we avoid this?
1. We can advise that the guest parses the certificate and the
attestation report to determine if their TCBs match expectations and
retry if they're different because of a bad luck data race.
2. We can add a new global lock that KVM holds from CCP similar to
sev_cmd_lock to sequentialize req_certs, attestation reports, and
SNP_COMMIT. KVM releases the lock before returning to the guest.
  SNP_COMMIT must now hold this lock before attempting to grab the sev_cmd_lock.

I think probably 2 is better.

>
> @@ -4060,12 +4045,9 @@ static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
>          * Grab the certificates from userspace so that can be bundled with
>          * attestation/guest requests.
>          */
> -       vcpu->run->exit_reason = KVM_EXIT_VMGEXIT;
> -       vcpu->run->vmgexit.type = KVM_USER_VMGEXIT_REQ_CERTS;
> -       vcpu->run->vmgexit.req_certs.data_gpa = data_gpa;
> -       vcpu->run->vmgexit.req_certs.data_npages = data_npages;
> -       vcpu->run->vmgexit.req_certs.flags = 0;
> -       vcpu->run->vmgexit.req_certs.status = KVM_USER_VMGEXIT_REQ_CERTS_STATUS_PENDING;
> +       vcpu->run->exit_reason = KVM_EXIT_SNP_REQ_CERTS;

This should be whatever exit reason #define you go with (40), not the
(1) you defined for kvm_snp_exit.

> +       vcpu->run->snp_req_certs.data_gpa = data_gpa;
> +       vcpu->run->snp_req_certs.data_npages = data_npages;
>         vcpu->arch.complete_userspace_io = snp_complete_ext_guest_req;
>
>         return 0; /* forward request to userspace */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 106367d87189..8ebfc91dc967 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -135,22 +135,17 @@ struct kvm_xen_exit {
>         } u;
>  };
>
> -struct kvm_user_vmgexit {
> -#define KVM_USER_VMGEXIT_REQ_CERTS             1
> -       __u32 type; /* KVM_USER_VMGEXIT_* type */
> +struct kvm_exit_snp {
> +#define KVM_EXIT_SNP_REQ_CERTS         1
> +       __u32 type; /* KVM_EXIT_SNP_* type */

I think this whole struct should be removed since we're only doing the
one exit reason. This is unused.
You also double-#define the return value preprocessor directives.

>  };
> @@ -198,7 +193,7 @@ struct kvm_user_vmgexit {
>  #define KVM_EXIT_NOTIFY           37
>  #define KVM_EXIT_LOONGARCH_IOCSR  38
>  #define KVM_EXIT_MEMORY_FAULT     39
> -#define KVM_EXIT_VMGEXIT          40
> +#define KVM_EXIT_SNP_REQUEST_CERTS 40

Probably we should just make this KVM_EXT_SNP_REQ_CERTS so the rest of
the code works.
>
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -454,8 +449,16 @@ struct kvm_run {
>                         __u64 gpa;
>                         __u64 size;
>                 } memory_fault;
> -               /* KVM_EXIT_VMGEXIT */
> -               struct kvm_user_vmgexit vmgexit;
> +               /* KVM_EXIT_SNP_REQ_CERTS */
> +               struct {
> +                       __u64 data_gpa;
> +                       __u64 data_npages;
> +#define KVM_EXIT_SNP_REQ_CERTS_ERROR_INVALID_LEN   1
> +#define KVM_EXIT_SNP_REQ_CERTS_ERROR_BUSY          2
> +#define KVM_EXIT_SNP_REQ_CERTS_ERROR_GENERIC       (1 << 31)
> +                       __u32 ret;
> +               } snp_req_certs;
> +
>                 /* Fix the size of the union. */
>                 char padding[256];
>         };
> --
> 2.25.1
>
>


--
-Dionna Glaze, PhD, CISSP, CCSP (she/her)
Re: [PATCH] KVM: SEV: Replace KVM_EXIT_VMGEXIT with KVM_EXIT_SNP_REQ_CERTS
Posted by Dionna Amalie Glaze 1 year, 4 months ago
> How do we avoid this?
> 1. We can advise that the guest parses the certificate and the
> attestation report to determine if their TCBs match expectations and
> retry if they're different because of a bad luck data race.
> 2. We can add a new global lock that KVM holds from CCP similar to
> sev_cmd_lock to sequentialize req_certs, attestation reports, and
> SNP_COMMIT. KVM releases the lock before returning to the guest.
>   SNP_COMMIT must now hold this lock before attempting to grab the sev_cmd_lock.
>
> I think probably 2 is better.
>

Actually no, we shouldn't hold a global lock and only release it if
user space returns to KVM in a specific way, unless we can ensure it
will be unlocked safely on fd close.

-- 
-Dionna Glaze, PhD, CISSP, CCSP (she/her)