[PATCH v10 1/2] x86/tdx: Retry partially-completed page conversion hypercalls

Dexuan Cui posted 2 patches 2 years, 4 months ago
Posted by Dexuan Cui 2 years, 4 months ago
TDX guest memory is private by default and the VMM may not access it.
However, in cases where the guest needs to share data with the VMM,
the guest and the VMM can coordinate to make memory shared between
them.

The guest side of this protocol includes the "MapGPA" hypercall.  This
call takes a guest physical address range.  The hypercall spec (aka.
the GHCI) says that the MapGPA call is allowed to return partial
progress in mapping this range and indicate that fact with a special
error code.  A guest that sees such partial progress is expected to
retry the operation for the portion of the address range that was not
completed.
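A minimal user-space sketch of the retry protocol described above. It assumes a hypothetical mock VMM (struct mock_vmm, mock_map_gpa()) that converts at most a fixed number of bytes per call and, on partial completion, returns status 1 (the value the patch names TDVMCALL_STATUS_RETRY) with the first unconverted GPA in an "R11" output. This is not the kernel code; it only models the guest-side loop. Unlike the patch, it treats a retry without forward progress as an immediate failure instead of spending a bounded retry budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status values modeled on the patch: 0 = success, 1 = retry. */
#define MOCK_STATUS_SUCCESS	0
#define MOCK_STATUS_RETRY	1

/* Hypothetical mock VMM: converts at most 'chunk' bytes per call. */
struct mock_vmm {
	uint64_t chunk;		/* max bytes converted per call */
	unsigned int calls;	/* number of MapGPA calls seen */
};

/*
 * Mock MapGPA: on partial completion, return RETRY and write the first
 * unconverted GPA to *r11, mirroring what the patch reads from R11.
 */
static uint64_t mock_map_gpa(struct mock_vmm *vmm, uint64_t start,
			     uint64_t size, uint64_t *r11)
{
	vmm->calls++;
	if (size <= vmm->chunk)
		return MOCK_STATUS_SUCCESS;
	*r11 = start + vmm->chunk;
	return MOCK_STATUS_RETRY;
}

/* Guest side: resume from the GPA the VMM reports until the range is done. */
static bool guest_map_range(struct mock_vmm *vmm, uint64_t start, uint64_t end)
{
	for (;;) {
		uint64_t r11 = 0;
		uint64_t ret = mock_map_gpa(vmm, start, end - start, &r11);

		if (ret != MOCK_STATUS_RETRY)
			return ret == MOCK_STATUS_SUCCESS;
		/*
		 * R11 is untrusted. This simplified sketch also rejects
		 * r11 == start (no progress) to avoid looping forever.
		 */
		if (r11 <= start || r11 >= end)
			return false;
		start = r11;
	}
}
```

With a 2MB-per-call mock VMM, mapping a 1GB range takes 512 calls; a mock that never makes progress fails on the first call.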

Hyper-V does this partial completion dance when set_memory_decrypted()
is called to "decrypt" swiotlb bounce buffers that can be up to 1GB
in size.  It is evidently the only VMM that does this, which is why
nobody noticed this until now.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 arch/x86/coco/tdx/tdx.c           | 64 +++++++++++++++++++++++++------
 arch/x86/include/asm/shared/tdx.h |  2 +
 2 files changed, 54 insertions(+), 12 deletions(-)

Changes in v10:
    Dave kindly re-wrote the changelog. No other changes.

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 1d6b863c42b00..746075d20cd2d 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -703,14 +703,15 @@ static bool tdx_cache_flush_required(void)
 }
 
 /*
- * Inform the VMM of the guest's intent for this physical page: shared with
- * the VMM or private to the guest.  The VMM is expected to change its mapping
- * of the page in response.
+ * Notify the VMM about page mapping conversion. More info about ABI
+ * can be found in TDX Guest-Host-Communication Interface (GHCI),
+ * section "TDG.VP.VMCALL<MapGPA>".
  */
-static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
 {
-	phys_addr_t start = __pa(vaddr);
-	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+	/* Retrying the hypercall a second time should succeed; use 3 just in case */
+	const int max_retries_per_page = 3;
+	int retry_count = 0;
 
 	if (!enc) {
 		/* Set the shared (decrypted) bits: */
@@ -718,12 +719,51 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 		end   |= cc_mkdec(0);
 	}
 
-	/*
-	 * Notify the VMM about page mapping conversion. More info about ABI
-	 * can be found in TDX Guest-Host-Communication Interface (GHCI),
-	 * section "TDG.VP.VMCALL<MapGPA>"
-	 */
-	if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
+	while (retry_count < max_retries_per_page) {
+		struct tdx_hypercall_args args = {
+			.r10 = TDX_HYPERCALL_STANDARD,
+			.r11 = TDVMCALL_MAP_GPA,
+			.r12 = start,
+			.r13 = end - start };
+
+		u64 map_fail_paddr;
+		u64 ret = __tdx_hypercall_ret(&args);
+
+		if (ret != TDVMCALL_STATUS_RETRY)
+			return !ret;
+		/*
+		 * The guest must retry the operation for the pages in the
+		 * region starting at the GPA specified in R11. R11 comes
+		 * from the untrusted VMM. Sanity check it.
+		 */
+		map_fail_paddr = args.r11;
+		if (map_fail_paddr < start || map_fail_paddr >= end)
+			return false;
+
+		/* "Consume" a retry without forward progress */
+		if (map_fail_paddr == start) {
+			retry_count++;
+			continue;
+		}
+
+		start = map_fail_paddr;
+		retry_count = 0;
+	}
+
+	return false;
+}
+
+/*
+ * Inform the VMM of the guest's intent for this physical page: shared with
+ * the VMM or private to the guest.  The VMM is expected to change its mapping
+ * of the page in response.
+ */
+static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
+{
+	phys_addr_t start = __pa(vaddr);
+	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
+
+	if (!tdx_map_gpa(start, end, enc))
 		return false;
 
 	/* shared->private conversion requires memory to be accepted before use */
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 7513b3bb69b7e..22ee23a3f24a6 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -24,6 +24,8 @@
 #define TDVMCALL_MAP_GPA		0x10001
 #define TDVMCALL_REPORT_FATAL_ERROR	0x10003
 
+#define TDVMCALL_STATUS_RETRY		1
+
 #ifndef __ASSEMBLY__
 
 /*
-- 
2.25.1
Re: [PATCH v10 1/2] x86/tdx: Retry partially-completed page conversion hypercalls
Posted by Dave Hansen 2 years, 3 months ago
On 8/11/23 14:48, Dexuan Cui wrote:
> TDX guest memory is private by default and the VMM may not access it.
> However, in cases where the guest needs to share data with the VMM,
> the guest and the VMM can coordinate to make memory shared between
> them.
> 
> The guest side of this protocol includes the "MapGPA" hypercall.  This
> call takes a guest physical address range.  The hypercall spec (aka.
> the GHCI) says that the MapGPA call is allowed to return partial
> progress in mapping this range and indicate that fact with a special
> error code.  A guest that sees such partial progress is expected to
> retry the operation for the portion of the address range that was not
> completed.
> 
> Hyper-V does this partial completion dance when set_memory_decrypted()
> is called to "decrypt" swiotlb bounce buffers that can be up to 1GB
> in size.  It is evidently the only VMM that does this, which is why
> nobody noticed this until now.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>

Is there any reason that this needs to go into the stable trees?  If so,
Fixes: and Cc:stable@ tags would be nice.
Re: [PATCH v10 1/2] x86/tdx: Retry partially-completed page conversion hypercalls
Posted by Huang, Kai 2 years, 3 months ago
On Fri, 2023-08-11 at 14:48 -0700, Dexuan Cui wrote:
> TDX guest memory is private by default and the VMM may not access it.
> However, in cases where the guest needs to share data with the VMM,
> the guest and the VMM can coordinate to make memory shared between
> them.
> 
> The guest side of this protocol includes the "MapGPA" hypercall.  This
> call takes a guest physical address range.  The hypercall spec (aka.
> the GHCI) says that the MapGPA call is allowed to return partial
> progress in mapping this range and indicate that fact with a special
> error code.  A guest that sees such partial progress is expected to
> retry the operation for the portion of the address range that was not
> completed.
> 
> Hyper-V does this partial completion dance when set_memory_decrypted()
> is called to "decrypt" swiotlb bounce buffers that can be up to 1GB
> in size.  It is evidently the only VMM that does this, which is why
> nobody noticed this until now.

Sorry for the late comment.

Nit:

IMHO this patch is doing two separate things together:

1) Convert tdx_enc_status_changed() to tdx_map_gpa(), which takes a physical address.
2) Handle the MapGPA() retry.

The reason for doing 1), IIUC, is hidden in the second patch: the Hyper-V guest
code uses vzalloc().  I.e., handling the MapGPA() retry doesn't strictly require
changing the API to take a PA rather than a VA.

So to me it's better to split this into two patches and give proper
justification for each of them.

Also, see below for the retry ...

> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  arch/x86/coco/tdx/tdx.c           | 64 +++++++++++++++++++++++++------
>  arch/x86/include/asm/shared/tdx.h |  2 +
>  2 files changed, 54 insertions(+), 12 deletions(-)
> 
> Changes in v10:
>     Dave kindly re-wrote the changelog. No other changes.
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 1d6b863c42b00..746075d20cd2d 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -703,14 +703,15 @@ static bool tdx_cache_flush_required(void)
>  }
>  
>  /*
> - * Inform the VMM of the guest's intent for this physical page: shared with
> - * the VMM or private to the guest.  The VMM is expected to change its mapping
> - * of the page in response.
> + * Notify the VMM about page mapping conversion. More info about ABI
> + * can be found in TDX Guest-Host-Communication Interface (GHCI),
> + * section "TDG.VP.VMCALL<MapGPA>".
>   */
> -static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> +static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
>  {
> -	phys_addr_t start = __pa(vaddr);
> -	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> +	/* Retrying the hypercall a second time should succeed; use 3 just in case */
> +	const int max_retries_per_page = 3;
> +	int retry_count = 0;

... I tried to dig through the full history, but sorry if I am still missing something.

Using 3 is fine if "Retrying the hypercall a second time should succeed" is
always true.  I assume this is because Hyper-V is able to handle a large number
of pages in one call?

That being said, this is purely hypervisor-implementation specific.  Here, IIUC,
Linux is defining a retry limit that is not based on any spec, and which happens
to work for Hyper-V's *current* implementation.  I am not sure whether that's a
good idea.  For instance, what happens if Hyper-V is changed in the future to
reduce the number of pages it can handle in one call?

Is there any Hyper-V specification that defines how many pages it can handle in
one call?

What's more, given that this function takes an arbitrary range of pages, a
hard-coded retry limit here is even stranger.  A more reasonable approach seems
to be to let the caller, which knows how many pages are going to be converted
and *ideally* also knows which hypervisor is running underneath, determine how
many pages to convert in one call.

For instance, Hyper-V specific guest code can safely assume how many pages
Hyper-V is able to handle, and thus can determine how many pages to try in one
call.

Just my 2 cents.  Feel free to ignore this if everyone else is fine with the
current solution in this patch.
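The alternative suggested above can be sketched as caller-side chunking: the caller picks the per-call size and issues one-shot conversions. This is a hypothetical illustration, not what the patch does; map_range_in_chunks(), map_fn_t and the limit_vmm mock are made-up names, and in practice the chunk size would have to come from hypervisor-aware code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-shot conversion callback; returns true on success. */
typedef bool (*map_fn_t)(void *ctx, uint64_t start, uint64_t size);

/*
 * Caller-side chunking: split [start, end) into pieces of at most
 * 'chunk' bytes, where 'chunk' is chosen by hypervisor-aware code.
 */
static bool map_range_in_chunks(void *ctx, map_fn_t map, uint64_t start,
				uint64_t end, uint64_t chunk)
{
	if (chunk == 0)
		return false;
	while (start < end) {
		uint64_t size = end - start < chunk ? end - start : chunk;

		if (!map(ctx, start, size))
			return false;
		start += size;
	}
	return true;
}

/* Hypothetical mock VMM that rejects any request larger than its limit. */
struct limit_vmm {
	uint64_t limit;
	unsigned int calls;
};

static bool limit_map(void *ctx, uint64_t start, uint64_t size)
{
	struct limit_vmm *vmm = ctx;

	(void)start;
	vmm->calls++;
	return size <= vmm->limit;
}
```

The trade-off raised here shows up directly: if the hypervisor's real per-call limit drops below the caller's chunk, the conversion fails, whereas the patch's loop adapts because it resumes from whatever GPA the VMM reports.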

>  
>  	if (!enc) {
>  		/* Set the shared (decrypted) bits: */
> @@ -718,12 +719,51 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
>  		end   |= cc_mkdec(0);
>  	}
>  
> -	/*
> -	 * Notify the VMM about page mapping conversion. More info about ABI
> -	 * can be found in TDX Guest-Host-Communication Interface (GHCI),
> -	 * section "TDG.VP.VMCALL<MapGPA>"
> -	 */
> -	if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0))
> +	while (retry_count < max_retries_per_page) {
> +		struct tdx_hypercall_args args = {
> +			.r10 = TDX_HYPERCALL_STANDARD,
> +			.r11 = TDVMCALL_MAP_GPA,
> +			.r12 = start,
> +			.r13 = end - start };

Nit:

Would it be better style to move the '}' to a new line?

> +
> +		u64 map_fail_paddr;
> +		u64 ret = __tdx_hypercall_ret(&args);

Nit:

It looks like people prefer the reverse-Christmas-tree style.  Perhaps separate
out the 'ret' declaration.
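For illustration, the two style nits above applied to one iteration of the loop body. The struct, constants and mock_hypercall_ret() are hypothetical stand-ins so the fragment compiles in user space; in the patch, the real names are struct tdx_hypercall_args and __tdx_hypercall_ret().

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical stand-ins for the kernel's hypercall plumbing. */
#define MOCK_HYPERCALL_STANDARD	0
#define MOCK_VMCALL_MAP_GPA	0x10001

struct mock_hypercall_args {
	u64 r10, r11, r12, r13;
};

/* Mock: report that conversion stopped halfway through the range. */
static u64 mock_hypercall_ret(struct mock_hypercall_args *args)
{
	args->r11 = args->r12 + args->r13 / 2;
	return 1;	/* "retry" */
}

/* One loop iteration, written with both style nits applied. */
static u64 one_map_gpa_attempt(u64 start, u64 end, u64 *map_fail_paddr)
{
	struct mock_hypercall_args args = {
		.r10 = MOCK_HYPERCALL_STANDARD,
		.r11 = MOCK_VMCALL_MAP_GPA,
		.r12 = start,
		.r13 = end - start,
	};	/* closing brace on its own line */
	u64 ret;	/* declarations ordered longest to shortest */

	ret = mock_hypercall_ret(&args);
	*map_fail_paddr = args.r11;
	return ret;
}
```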

> +
> +		if (ret != TDVMCALL_STATUS_RETRY)
> +			return !ret;
> +		/*
> +		 * The guest must retry the operation for the pages in the
> +		 * region starting at the GPA specified in R11. R11 comes
> +		 * from the untrusted VMM. Sanity check it.
> +		 */
> +		map_fail_paddr = args.r11;
> +		if (map_fail_paddr < start || map_fail_paddr >= end)
> +			return false;
> +
> +		/* "Consume" a retry without forward progress */
> +		if (map_fail_paddr == start) {
> +			retry_count++;
> +			continue;
> +		}
> +
> +		start = map_fail_paddr;
> +		retry_count = 0;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Inform the VMM of the guest's intent for this physical page: shared with
> + * the VMM or private to the guest.  The VMM is expected to change its mapping
> + * of the page in response.
> + */
> +static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> +{
> +	phys_addr_t start = __pa(vaddr);
> +	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> +
> +	if (!tdx_map_gpa(start, end, enc))
>  		return false;
>  
>  	/* shared->private conversion requires memory to be accepted before use */
> diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
> index 7513b3bb69b7e..22ee23a3f24a6 100644
> --- a/arch/x86/include/asm/shared/tdx.h
> +++ b/arch/x86/include/asm/shared/tdx.h
> @@ -24,6 +24,8 @@
>  #define TDVMCALL_MAP_GPA		0x10001
>  #define TDVMCALL_REPORT_FATAL_ERROR	0x10003
>  
> +#define TDVMCALL_STATUS_RETRY		1
> +
>  #ifndef __ASSEMBLY__
>  
>  /*

Re: [PATCH v10 1/2] x86/tdx: Retry partially-completed page conversion hypercalls
Posted by Huang, Kai 2 years, 3 months ago
On Wed, 2023-09-06 at 01:19 +0000, Huang, Kai wrote:
> On Fri, 2023-08-11 at 14:48 -0700, Dexuan Cui wrote:
> > TDX guest memory is private by default and the VMM may not access it.
> > However, in cases where the guest needs to share data with the VMM,
> > the guest and the VMM can coordinate to make memory shared between
> > them.
> > 
> > The guest side of this protocol includes the "MapGPA" hypercall.  This
> > call takes a guest physical address range.  The hypercall spec (aka.
> > the GHCI) says that the MapGPA call is allowed to return partial
> > progress in mapping this range and indicate that fact with a special
> > error code.  A guest that sees such partial progress is expected to
> > retry the operation for the portion of the address range that was not
> > completed.
> > 
> > Hyper-V does this partial completion dance when set_memory_decrypted()
> > is called to "decrypt" swiotlb bounce buffers that can be up to 1GB
> > in size.  It is evidently the only VMM that does this, which is why
> > nobody noticed this until now.
> 
> Sorry for the late comment.
> 
> Nit:
> 
> IMHO this patch is doing two separate things together:
> 
> 1) Convert tdx_enc_status_changed() to tdx_map_gpa(), which takes a physical address.
> 2) Handle the MapGPA() retry.
>
> The reason for doing 1), IIUC, is hidden in the second patch: the Hyper-V guest
> code uses vzalloc().  I.e., handling the MapGPA() retry doesn't strictly require
> changing the API to take a PA rather than a VA.
>
> So to me it's better to split this into two patches and give proper
> justification for each of them.

Sorry, I realized I missed one point in the retry logic below, so feel free to
ignore this comment.

> 
> Also, see below for the retry ...
> 
> > 
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> > Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > ---
> >  arch/x86/coco/tdx/tdx.c           | 64 +++++++++++++++++++++++++------
> >  arch/x86/include/asm/shared/tdx.h |  2 +
> >  2 files changed, 54 insertions(+), 12 deletions(-)
> > 
> > Changes in v10:
> >     Dave kindly re-wrote the changelog. No other changes.
> > 
> > diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> > index 1d6b863c42b00..746075d20cd2d 100644
> > --- a/arch/x86/coco/tdx/tdx.c
> > +++ b/arch/x86/coco/tdx/tdx.c
> > @@ -703,14 +703,15 @@ static bool tdx_cache_flush_required(void)
> >  }
> >  
> >  /*
> > - * Inform the VMM of the guest's intent for this physical page: shared with
> > - * the VMM or private to the guest.  The VMM is expected to change its mapping
> > - * of the page in response.
> > + * Notify the VMM about page mapping conversion. More info about ABI
> > + * can be found in TDX Guest-Host-Communication Interface (GHCI),
> > + * section "TDG.VP.VMCALL<MapGPA>".
> >   */
> > -static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> > +static bool tdx_map_gpa(phys_addr_t start, phys_addr_t end, bool enc)
> >  {
> > -	phys_addr_t start = __pa(vaddr);
> > -	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> > +	/* Retrying the hypercall a second time should succeed; use 3 just in case */
> > +	const int max_retries_per_page = 3;
> > +	int retry_count = 0;
> 
> ... I tried to dig through the full history, but sorry if I am still missing something.
>
> Using 3 is fine if "Retrying the hypercall a second time should succeed" is
> always true.  I assume this is because Hyper-V is able to handle a large number
> of pages in one call?
>
> That being said, this is purely hypervisor-implementation specific.  Here, IIUC,
> Linux is defining a retry limit that is not based on any spec, and which happens
> to work for Hyper-V's *current* implementation.  I am not sure whether that's a
> good idea.  For instance, what happens if Hyper-V is changed in the future to
> reduce the number of pages it can handle in one call?
>
> Is there any Hyper-V specification that defines how many pages it can handle in
> one call?
>
> What's more, given that this function takes an arbitrary range of pages, a
> hard-coded retry limit here is even stranger.  A more reasonable approach seems
> to be to let the caller, which knows how many pages are going to be converted
> and *ideally* also knows which hypervisor is running underneath, determine how
> many pages to convert in one call.
>
> For instance, Hyper-V specific guest code can safely assume how many pages
> Hyper-V is able to handle, and thus can determine how many pages to try in one
> call.
>
> Just my 2 cents.  Feel free to ignore this if everyone else is fine with the
> current solution in this patch.
> 
> 

Ah, reading the patch again, I see I missed the fact that a retry is only
consumed when no forward progress is made.  If the hypervisor has converted any
pages, retry_count is reset to 0, and this function loops until all pages are
converted.  So feel free to ignore my comments above.

But would it be better to call this out explicitly in the comment?

/*
 * When the hypercall made no forward progress, retrying it a second time
 * should make some progress.  Use 3 just in case.
 */

Also, this "retry w/o forward progress" case seems a little odd, i.e., it is the
special case where the hypervisor cannot convert any pages.  Have you seen it in
practice?  How confident can we be that "retrying a second time should succeed"?
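To make the retry-budget behavior concrete, here is a user-space sketch that keeps the patch's control flow (a budget of 3 retries that is only consumed when the reported GPA equals start, and is reset on any progress) but replaces the hypercall with a hypothetical mock VMM (struct stall_vmm, stall_map_gpa()) that makes no progress on every stall_every-th call. The mock is an assumption for illustration, not a model of any real hypervisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_STATUS_RETRY	1

/*
 * Hypothetical mock VMM: converts 'chunk' bytes per call, but if
 * 'stall_every' is nonzero, every stall_every-th call makes no progress.
 */
struct stall_vmm {
	uint64_t chunk;
	unsigned int stall_every;
	unsigned int calls;
};

static uint64_t stall_map_gpa(struct stall_vmm *vmm, uint64_t start,
			      uint64_t size, uint64_t *r11)
{
	vmm->calls++;
	if (vmm->stall_every && vmm->calls % vmm->stall_every == 0) {
		*r11 = start;		/* no forward progress */
		return MOCK_STATUS_RETRY;
	}
	if (size <= vmm->chunk)
		return 0;
	*r11 = start + vmm->chunk;
	return MOCK_STATUS_RETRY;
}

/* Same control flow as tdx_map_gpa() in the patch, minus the real hypercall. */
static bool map_with_retry_budget(struct stall_vmm *vmm, uint64_t start,
				  uint64_t end)
{
	const int max_retries_per_page = 3;
	int retry_count = 0;

	while (retry_count < max_retries_per_page) {
		uint64_t r11 = 0;
		uint64_t ret = stall_map_gpa(vmm, start, end - start, &r11);

		if (ret != MOCK_STATUS_RETRY)
			return !ret;
		if (r11 < start || r11 >= end)
			return false;	/* untrusted R11 out of range */
		if (r11 == start) {	/* consume a retry */
			retry_count++;
			continue;
		}
		start = r11;		/* progress: reset the budget */
		retry_count = 0;
	}
	return false;
}
```

With a VMM that stalls on every second call, a 16-page range still converts after 31 calls, far more than 3, because each stall is followed by progress; a VMM that always stalls exhausts the budget after exactly 3 calls.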

> > +
> > +		if (ret != TDVMCALL_STATUS_RETRY)
> > +			return !ret;
> > +		/*
> > +		 * The guest must retry the operation for the pages in the
> > +		 * region starting at the GPA specified in R11. R11 comes
> > +		 * from the untrusted VMM. Sanity check it.
> > +		 */
> > +		map_fail_paddr = args.r11;
> > +		if (map_fail_paddr < start || map_fail_paddr >= end)
> > +			return false;
> > +
> > +		/* "Consume" a retry without forward progress */
> > +		if (map_fail_paddr == start) {
> > +			retry_count++;
> > +			continue;
> > +		}
> > +
> > +		start = map_fail_paddr;
> > +		retry_count = 0;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +/*
> > + * Inform the VMM of the guest's intent for this physical page: shared with
> > + * the VMM or private to the guest.  The VMM is expected to change its mapping
> > + * of the page in response.
> > + */
> > +static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> > +{
> > +	phys_addr_t start = __pa(vaddr);
> > +	phys_addr_t end   = __pa(vaddr + numpages * PAGE_SIZE);
> > +
> > +	if (!tdx_map_gpa(start, end, enc))
> >  		return false;
> >  
> >  	/* shared->private conversion requires memory to be accepted before use */
> > diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
> > index 7513b3bb69b7e..22ee23a3f24a6 100644
> > --- a/arch/x86/include/asm/shared/tdx.h
> > +++ b/arch/x86/include/asm/shared/tdx.h
> > @@ -24,6 +24,8 @@
> >  #define TDVMCALL_MAP_GPA		0x10001
> >  #define TDVMCALL_REPORT_FATAL_ERROR	0x10003
> >  
> > +#define TDVMCALL_STATUS_RETRY		1
> > +
> >  #ifndef __ASSEMBLY__
> >  
> >  /*
> 

Re: [PATCH v10 1/2] x86/tdx: Retry partially-completed page conversion hypercalls
Posted by Isaku Yamahata 2 years, 4 months ago
On Fri, Aug 11, 2023 at 02:48:25PM -0700,
Dexuan Cui <decui@microsoft.com> wrote:

> TDX guest memory is private by default and the VMM may not access it.
> However, in cases where the guest needs to share data with the VMM,
> the guest and the VMM can coordinate to make memory shared between
> them.
> 
> The guest side of this protocol includes the "MapGPA" hypercall.  This
> call takes a guest physical address range.  The hypercall spec (aka.
> the GHCI) says that the MapGPA call is allowed to return partial
> progress in mapping this range and indicate that fact with a special
> error code.  A guest that sees such partial progress is expected to
> retry the operation for the portion of the address range that was not
> completed.
> 
> Hyper-V does this partial completion dance when set_memory_decrypted()
> is called to "decrypt" swiotlb bounce buffers that can be up to 1GB
> in size.  It is evidently the only VMM that does this, which is why
> nobody noticed this until now.

TDX KVM + TDX QEMU now support partial completion, because a TD guest can pass
a very large range, e.g. of 1GB order.  I tested this patch with (patched) TDX
KVM/QEMU.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Tested-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>
RE: [PATCH v10 1/2] x86/tdx: Retry partially-completed page conversion hypercalls
Posted by Dexuan Cui 2 years, 3 months ago
> From: Isaku Yamahata <isaku.yamahata@gmail.com>
> Sent: Monday, August 14, 2023 12:04 PM
> To: Dexuan Cui <decui@microsoft.com>
> [...]
> 
> On Fri, Aug 11, 2023 at 02:48:25PM -0700,
> Dexuan Cui <decui@microsoft.com> wrote:
> 
> > TDX guest memory is private by default and the VMM may not access it.
> > However, in cases where the guest needs to share data with the VMM,
> > the guest and the VMM can coordinate to make memory shared between
> > them.
> >
> > The guest side of this protocol includes the "MapGPA" hypercall.  This
> > call takes a guest physical address range.  The hypercall spec (aka.
> > the GHCI) says that the MapGPA call is allowed to return partial
> > progress in mapping this range and indicate that fact with a special
> > error code.  A guest that sees such partial progress is expected to
> > retry the operation for the portion of the address range that was not
> > completed.
> >
> > Hyper-V does this partial completion dance when set_memory_decrypted()
> > is called to "decrypt" swiotlb bounce buffers that can be up to 1GB
> > in size.  It is evidently the only VMM that does this, which is why
> > nobody noticed this until now.
> 
> TDX KVM + TDX QEMU now support partial completion, because a TD guest can pass
> a very large range, e.g. of 1GB order.  I tested this patch with (patched) TDX
> KVM/QEMU.
> 
> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Tested-by: Isaku Yamahata <isaku.yamahata@intel.com>

Thanks Isaku for reviewing and testing the patch!

@Dave, could you let me know whether the 2 updated patches look good to you?