[v13] Add throttling detection to sev-guest

[PATCH v13 1/4] virt/coco/sev-guest: Add throttling awareness

Posted by Dionna Glaze 3 years ago

The host is permitted and encouraged to throttle guest requests to the
AMD-SP since it is a shared resource across all VMs. Without
throttling-awareness, the host returning an error will immediately lock
out access to the VMPCK, which makes the VM less useful as it can't
attest itself. Since throttling is expected to be a common occurrence, a
cooperative host can return a VMM error code that the request was
throttled.

The driver interprets the upper 32 bits of exitinfo2 as a VMM error code.
For safety, since the encryption algorithm in GHCBv2 is AES_GCM, control
must remain in the kernel to complete the request with the current
sequence number. Returning without finishing the request allows the the
guest to make another request but with different message contents. This
is IV reuse, and breaks cryptographic protections.

A guest request may not make it to the AMD-SP before the host returns to
the guest, so the err local variable in handle_guest_request must be
initialized the same way fw_err is. snp_issue_guest_request similarly
should set fw_err whether or not the value is non-zero, in order to
appropriately clear the error value when zero.

Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Gonda <pgonda@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <Borislav.Petkov@amd.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Venu Busireddy <venu.busireddy@oracle.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michael Sterritt <sterritt@google.com>

Fixes: d5af44dde546 ("x86/sev: Provide support for SNP guest request
NAEs")

Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
---
 arch/x86/include/asm/sev-common.h       |  3 ++-
 arch/x86/kernel/sev.c                   |  3 +--
 drivers/virt/coco/sev-guest/sev-guest.c | 11 ++++++++++-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index b8357d6ecd47..b63be696b776 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -128,8 +128,9 @@ struct snp_psc_desc {
 	struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
 } __packed;
 
-/* Guest message request error code */
+/* Guest message request error codes */
 #define SNP_GUEST_REQ_INVALID_LEN	BIT_ULL(32)
+#define SNP_GUEST_REQ_ERR_BUSY		BIT_ULL(33)
 
 #define GHCB_MSR_TERM_REQ		0x100
 #define GHCB_MSR_TERM_REASON_SET_POS	12
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 679026a640ef..a908ffc2dfba 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2212,14 +2212,13 @@ int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned
 	if (ret)
 		goto e_put;
 
+	*fw_err = ghcb->save.sw_exit_info_2;
 	if (ghcb->save.sw_exit_info_2) {
 		/* Number of expected pages are returned in RBX */
 		if (exit_code == SVM_VMGEXIT_EXT_GUEST_REQUEST &&
 		    ghcb->save.sw_exit_info_2 == SNP_GUEST_REQ_INVALID_LEN)
 			input->data_npages = ghcb_get_rbx(ghcb);
 
-		*fw_err = ghcb->save.sw_exit_info_2;
-
 		ret = -EIO;
 	}
 
diff --git a/drivers/virt/coco/sev-guest/sev-guest.c b/drivers/virt/coco/sev-guest/sev-guest.c
index 4ec4174e05a3..3d6551fdf06f 100644
--- a/drivers/virt/coco/sev-guest/sev-guest.c
+++ b/drivers/virt/coco/sev-guest/sev-guest.c
@@ -322,7 +322,7 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
 				u8 type, void *req_buf, size_t req_sz, void *resp_buf,
 				u32 resp_sz, __u64 *fw_err)
 {
-	unsigned long err;
+	unsigned long err = 0xff;
 	u64 seqno;
 	int rc;
 
@@ -338,6 +338,7 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
 	if (rc)
 		return rc;
 
+retry:
 	/*
 	 * Call firmware to process the request. In this function the encrypted
 	 * message enters shared memory with the host. So after this call the
@@ -346,6 +347,14 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
 	 */
 	rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
 
+	/*
+	 * The host may return SNP_GUEST_REQ_ERR_EBUSY if the request has been
+	 * throttled. Retry in the driver to avoid returning and reusing the
+	 * message sequence number on a different message.
+	 */
+	if (err == SNP_GUEST_REQ_ERR_BUSY)
+		goto retry;
+
 	/*
 	 * If the extended guest request fails due to having too small of a
 	 * certificate data buffer, retry the same guest request without the
-- 
2.39.1.405.gd4c25cc71f-goog

Re: [PATCH v13 1/4] virt/coco/sev-guest: Add throttling awareness

Posted by Borislav Petkov 3 years ago

On Tue, Jan 24, 2023 at 09:14:52PM +0000, Dionna Glaze wrote:
> The host is permitted and encouraged to throttle guest requests to the
> AMD-SP since it is a shared resource across all VMs. Without
> throttling-awareness, the host returning an error will immediately lock
> out access to the VMPCK, which makes the VM less useful as it can't
> attest itself. Since throttling is expected to be a common occurrence, a
> cooperative host can return a VMM error code that the request was
> throttled.

So where is this concept of guest throttling documented?

It sounds like this is something hypervisors do and it is all fine and
great to do that but where does it say: yes, this is what we do and this
is the usual behavior that's expected from guests and HVs to adhere to
when accessing this shared resource?

Tom, is that in the spec somewhere perhaps? Or was this decided upon on
some call?

In any case, I'd like for us to document it somewhere eventually if that
hasn't happened yet so that all parties are clear on what is supposed to
happen and what the protocol is.

> +retry:
>  	/*
>  	 * Call firmware to process the request. In this function the encrypted
>  	 * message enters shared memory with the host. So after this call the
> @@ -346,6 +347,14 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
>  	 */
>  	rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
>  
> +	/*
> +	 * The host may return SNP_GUEST_REQ_ERR_EBUSY if the request has been
> +	 * throttled. Retry in the driver to avoid returning and reusing the
> +	 * message sequence number on a different message.
> +	 */
> +	if (err == SNP_GUEST_REQ_ERR_BUSY)
> +		goto retry;

I don't like even potential endless loops.

How about you turn this into a loop with a sufficiently large retry
count which, when depleted, gets this request failed with a -ETIMEDOUT
or what not?

You could also stick a cond_resched() in that loop so that it can take a
breather between the requests and doesn't hammer the hw as much.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v13 1/4] virt/coco/sev-guest: Add throttling awareness

Posted by Tom Lendacky 3 years ago

On 1/30/23 05:13, Borislav Petkov wrote:
> On Tue, Jan 24, 2023 at 09:14:52PM +0000, Dionna Glaze wrote:
>> The host is permitted and encouraged to throttle guest requests to the
>> AMD-SP since it is a shared resource across all VMs. Without
>> throttling-awareness, the host returning an error will immediately lock
>> out access to the VMPCK, which makes the VM less useful as it can't
>> attest itself. Since throttling is expected to be a common occurrence, a
>> cooperative host can return a VMM error code that the request was
>> throttled.
> 
> So where is this concept of guest throttling documented?
> 
> It sounds like this is something hypervisors do and it is all fine and
> great to do that but where does it say: yes, this is what we do and this
> is the usual behavior that's expected from guests and HVs to adhere to
> when accessing this shared resource?
> 
> Tom, is that in the spec somewhere perhaps? Or was this decided upon on
> some call?

Yes, this is part of the GHCB 2.02 specification document that is in the 
process of being published.

Thanks,
Tom

> 
> In any case, I'd like for us to document it somewhere eventually if that
> hasn't happened yet so that all parties are clear on what is supposed to
> happen and what the protocol is.
> 
>> +retry:
>>   	/*
>>   	 * Call firmware to process the request. In this function the encrypted
>>   	 * message enters shared memory with the host. So after this call the
>> @@ -346,6 +347,14 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
>>   	 */
>>   	rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
>>   
>> +	/*
>> +	 * The host may return SNP_GUEST_REQ_ERR_EBUSY if the request has been
>> +	 * throttled. Retry in the driver to avoid returning and reusing the
>> +	 * message sequence number on a different message.
>> +	 */
>> +	if (err == SNP_GUEST_REQ_ERR_BUSY)
>> +		goto retry;
> 
> I don't like even potential endless loops.
> 
> How about you turn this into a loop with a sufficiently large retry
> count which, when depleted, gets this request failed with a -ETIMEDOUT
> or what not?
> 
> You could also stick a cond_resched() in that loop so that it can take a
> breather between the requests and doesn't hammer the hw as much.
> 
> Thx.
>

Re: [PATCH v13 1/4] virt/coco/sev-guest: Add throttling awareness

Posted by Dionna Amalie Glaze 3 years ago

>
> >
> > In any case, I'd like for us to document it somewhere eventually if that
> > hasn't happened yet so that all parties are clear on what is supposed to
> > happen and what the protocol is.
> >

Hi y'all, checking in on this thread. Are we waiting for the new GHCB
spec to be published before merging this fix?

-- 
-Dionna Glaze, PhD (she/her)

Re: [PATCH v13 1/4] virt/coco/sev-guest: Add throttling awareness

Posted by Borislav Petkov 3 years ago

On Wed, Feb 08, 2023 at 11:24:46AM -0800, Dionna Amalie Glaze wrote:
> Hi y'all, checking in on this thread. Are we waiting for the new GHCB
> spec to be published before merging this fix?

Not really - I'm waiting for you to remove the potential endless loop:

https://lore.kernel.org/r/Y9emVjoTBrM2%2BY5P@zn.tnic

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v13 1/4] virt/coco/sev-guest: Add throttling awareness

Posted by Dionna Amalie Glaze 3 years ago

> Not really - I'm waiting for you to remove the potential endless loop:
>
> https://lore.kernel.org/r/Y9emVjoTBrM2%2BY5P@zn.tnic
>
> Thx.
>

Ah that slipped my attention. Thanks, I'll update that.

-- 
-Dionna Glaze, PhD (she/her)

Re: [PATCH v13 1/4] virt/coco/sev-guest: Add throttling awareness

Posted by Tom Lendacky 3 years ago

On 1/24/23 15:14, Dionna Glaze wrote:
> The host is permitted and encouraged to throttle guest requests to the
> AMD-SP since it is a shared resource across all VMs. Without
> throttling-awareness, the host returning an error will immediately lock
> out access to the VMPCK, which makes the VM less useful as it can't
> attest itself. Since throttling is expected to be a common occurrence, a

It's not expected to be a common occurrence, but a host should protect 
itself from an un-cooperative guest.

> cooperative host can return a VMM error code that the request was
> throttled.
> 
> The driver interprets the upper 32 bits of exitinfo2 as a VMM error code.
> For safety, since the encryption algorithm in GHCBv2 is AES_GCM, control
> must remain in the kernel to complete the request with the current
> sequence number. Returning without finishing the request allows the the
> guest to make another request but with different message contents. This
> is IV reuse, and breaks cryptographic protections.
> 
> A guest request may not make it to the AMD-SP before the host returns to
> the guest, so the err local variable in handle_guest_request must be
> initialized the same way fw_err is. snp_issue_guest_request similarly
> should set fw_err whether or not the value is non-zero, in order to
> appropriately clear the error value when zero.
> 
> Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: Peter Gonda <pgonda@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <Borislav.Petkov@amd.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Venu Busireddy <venu.busireddy@oracle.com>
> Cc: Michael Roth <michael.roth@amd.com>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Michael Sterritt <sterritt@google.com>
> 
> Fixes: d5af44dde546 ("x86/sev: Provide support for SNP guest request
> NAEs")
> 
> Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
> ---
>   arch/x86/include/asm/sev-common.h       |  3 ++-
>   arch/x86/kernel/sev.c                   |  3 +--
>   drivers/virt/coco/sev-guest/sev-guest.c | 11 ++++++++++-
>   3 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> index b8357d6ecd47..b63be696b776 100644
> --- a/arch/x86/include/asm/sev-common.h
> +++ b/arch/x86/include/asm/sev-common.h
> @@ -128,8 +128,9 @@ struct snp_psc_desc {
>   	struct psc_entry entries[VMGEXIT_PSC_MAX_ENTRY];
>   } __packed;
>   
> -/* Guest message request error code */
> +/* Guest message request error codes */
>   #define SNP_GUEST_REQ_INVALID_LEN	BIT_ULL(32)
> +#define SNP_GUEST_REQ_ERR_BUSY		BIT_ULL(33)

It was probably a short cut to use BIT_ULL() to start with because the 
error codes are not intended to be single bit positions, e.g., the next 
value will be 3.

So these should really be:

#define SNP_GUEST_REQ_INVALID_LEN	(1ULL << 32)
#define SNP_GUEST_REQ_ERR_BUSY		(2ULL << 32)

Thanks,
Tom

>   
>   #define GHCB_MSR_TERM_REQ		0x100
>   #define GHCB_MSR_TERM_REASON_SET_POS	12
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 679026a640ef..a908ffc2dfba 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -2212,14 +2212,13 @@ int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, unsigned
>   	if (ret)
>   		goto e_put;
>   
> +	*fw_err = ghcb->save.sw_exit_info_2;
>   	if (ghcb->save.sw_exit_info_2) {
>   		/* Number of expected pages are returned in RBX */
>   		if (exit_code == SVM_VMGEXIT_EXT_GUEST_REQUEST &&
>   		    ghcb->save.sw_exit_info_2 == SNP_GUEST_REQ_INVALID_LEN)
>   			input->data_npages = ghcb_get_rbx(ghcb);
>   
> -		*fw_err = ghcb->save.sw_exit_info_2;
> -
>   		ret = -EIO;
>   	}
>   
> diff --git a/drivers/virt/coco/sev-guest/sev-guest.c b/drivers/virt/coco/sev-guest/sev-guest.c
> index 4ec4174e05a3..3d6551fdf06f 100644
> --- a/drivers/virt/coco/sev-guest/sev-guest.c
> +++ b/drivers/virt/coco/sev-guest/sev-guest.c
> @@ -322,7 +322,7 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
>   				u8 type, void *req_buf, size_t req_sz, void *resp_buf,
>   				u32 resp_sz, __u64 *fw_err)
>   {
> -	unsigned long err;
> +	unsigned long err = 0xff;
>   	u64 seqno;
>   	int rc;
>   
> @@ -338,6 +338,7 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
>   	if (rc)
>   		return rc;
>   
> +retry:
>   	/*
>   	 * Call firmware to process the request. In this function the encrypted
>   	 * message enters shared memory with the host. So after this call the
> @@ -346,6 +347,14 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
>   	 */
>   	rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
>   
> +	/*
> +	 * The host may return SNP_GUEST_REQ_ERR_EBUSY if the request has been
> +	 * throttled. Retry in the driver to avoid returning and reusing the
> +	 * message sequence number on a different message.
> +	 */
> +	if (err == SNP_GUEST_REQ_ERR_BUSY)
> +		goto retry;
> +
>   	/*
>   	 * If the extended guest request fails due to having too small of a
>   	 * certificate data buffer, retry the same guest request without the

Re: [PATCH v13 1/4] virt/coco/sev-guest: Add throttling awareness

Posted by Dionna Amalie Glaze 3 years ago

>
> So these should really be:
>
> #define SNP_GUEST_REQ_INVALID_LEN       (1ULL << 32)
> #define SNP_GUEST_REQ_ERR_BUSY          (2ULL << 32)
>
> Thanks,
> Tom
>

Patch 3/4 cleans them up with just such a shift. They also move from
arch/x86/include/asm/sev-common.h to include/uapi/linux/sev-guest.h.
Is it okay to delay that cleanup until after the fix patch?


--
-Dionna Glaze, PhD (she/her)

Re: [PATCH v13 1/4] virt/coco/sev-guest: Add throttling awareness

Posted by Tom Lendacky 3 years ago

On 1/25/23 11:48, Dionna Amalie Glaze wrote:
>>
>> So these should really be:
>>
>> #define SNP_GUEST_REQ_INVALID_LEN       (1ULL << 32)
>> #define SNP_GUEST_REQ_ERR_BUSY          (2ULL << 32)
>>
>> Thanks,
>> Tom
>>
> 
> Patch 3/4 cleans them up with just such a shift. They also move from
> arch/x86/include/asm/sev-common.h to include/uapi/linux/sev-guest.h.
> Is it okay to delay that cleanup until after the fix patch?

Yeah, if they're being deleted later and fixed up, I'm ok with that.

Thanks,
Tom

> 
> 
> --
> -Dionna Glaze, PhD (she/her)