[PATCH v3 13/25] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller

Sean Christopherson posted 25 patches 3 months, 3 weeks ago
There is a newer version of this series
[PATCH v3 13/25] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Posted by Sean Christopherson 3 months, 3 weeks ago
Fold tdx_mem_page_record_premap_cnt() into tdx_sept_set_private_spte() as
providing a one-off helper for effectively three lines of code is at best a
wash, and splitting the code makes the comment for smp_rmb()  _extremely_
confusing as the comment talks about reading kvm->arch.pre_fault_allowed
before kvm_tdx->state, but the immediately visible code does the exact
opposite.

Opportunistically rewrite the comments to more explicitly explain who is
checking what, as well as _why_ the ordering matters.

No functional change intended.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/tdx.c | 49 ++++++++++++++++++------------------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 6c0adc1b3bd5..c37591730cc5 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1605,29 +1605,6 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
-/*
- * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to map guest pages; the
- * callback tdx_gmem_post_populate() then maps pages into private memory.
- * through the a seamcall TDH.MEM.PAGE.ADD().  The SEAMCALL also requires the
- * private EPT structures for the page to have been built before, which is
- * done via kvm_tdp_map_page(). nr_premapped counts the number of pages that
- * were added to the EPT structures but not added with TDH.MEM.PAGE.ADD().
- * The counter has to be zero on KVM_TDX_FINALIZE_VM, to ensure that there
- * are no half-initialized shared EPT pages.
- */
-static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
-					  enum pg_level level, kvm_pfn_t pfn)
-{
-	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-
-	if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm))
-		return -EIO;
-
-	/* nr_premapped will be decreased when tdh_mem_page_add() is called. */
-	atomic64_inc(&kvm_tdx->nr_premapped);
-	return 0;
-}
-
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 				     enum pg_level level, kvm_pfn_t pfn)
 {
@@ -1638,14 +1615,30 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 		return -EIO;
 
 	/*
-	 * Read 'pre_fault_allowed' before 'kvm_tdx->state'; see matching
-	 * barrier in tdx_td_finalize().
+	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
+	 * before kvm_tdx->state.  Userspace must not be allowed to pre-fault
+	 * arbitrary memory until the initial memory image is finalized.  Pairs
+	 * with the smp_wmb() in tdx_td_finalize().
 	 */
 	smp_rmb();
-	if (likely(kvm_tdx->state == TD_STATE_RUNNABLE))
-		return tdx_mem_page_aug(kvm, gfn, level, pfn);
 
-	return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn);
+	/*
+	 * If the TD isn't finalized/runnable, then userspace is initializing
+	 * the VM image via KVM_TDX_INIT_MEM_REGION.  Increment the number of
+	 * pages that need to be mapped and initialized via TDH.MEM.PAGE.ADD.
+	 * KVM_TDX_FINALIZE_VM checks the counter to ensure all mapped pages
+	 * have been added to the image, to prevent running the TD with a
+	 * valid mapping in the mirror EPT, but not in the S-EPT.
+	 */
+	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE)) {
+		if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm))
+			return -EIO;
+
+		atomic64_inc(&kvm_tdx->nr_premapped);
+		return 0;
+	}
+
+	return tdx_mem_page_aug(kvm, gfn, level, pfn);
 }
 
 static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
-- 
2.51.0.858.gf9c4a03a3a-goog
Re: [PATCH v3 13/25] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Posted by Binbin Wu 3 months, 2 weeks ago

On 10/17/2025 8:32 AM, Sean Christopherson wrote:
> Fold tdx_mem_page_record_premap_cnt() into tdx_sept_set_private_spte() as
> providing a one-off helper for effectively three lines of code is at best a
> wash, and splitting the code makes the comment for smp_rmb()  _extremely_
> confusing as the comment talks about reading kvm->arch.pre_fault_allowed
> before kvm_tdx->state, but the immediately visible code does the exact
> opposite.
>
> Opportunistically rewrite the comments to more explicitly explain who is
> checking what, as well as _why_ the ordering matters.
>
> No functional change intended.
>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

One nit below.

[...]
> +	/*
> +	 * If the TD isn't finalized/runnable, then userspace is initializing
> +	 * the VM image via KVM_TDX_INIT_MEM_REGION.  Increment the number of
> +	 * pages that need to be mapped and initialized via TDH.MEM.PAGE.ADD.
> +	 * KVM_TDX_FINALIZE_VM checks the counter to ensure all mapped pages
                                                                    ^
                                                 Nit: Is pre-mapped better?
> +	 * have been added to the image, to prevent running the TD with a
> +	 * valid mapping in the mirror EPT, but not in the S-EPT.
> +	 */
> +	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE)) {
> +		if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm))
> +			return -EIO;
> +
> +		atomic64_inc(&kvm_tdx->nr_premapped);
> +		return 0;
> +	}
> +
> +	return tdx_mem_page_aug(kvm, gfn, level, pfn);
>   }
>   
>   static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,

Re: [PATCH v3 13/25] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Posted by Sean Christopherson 3 months, 2 weeks ago
On Fri, Oct 24, 2025, Binbin Wu wrote:
> 
> 
> On 10/17/2025 8:32 AM, Sean Christopherson wrote:
> > Fold tdx_mem_page_record_premap_cnt() into tdx_sept_set_private_spte() as
> > providing a one-off helper for effectively three lines of code is at best a
> > wash, and splitting the code makes the comment for smp_rmb()  _extremely_
> > confusing as the comment talks about reading kvm->arch.pre_fault_allowed
> > before kvm_tdx->state, but the immediately visible code does the exact
> > opposite.
> > 
> > Opportunistically rewrite the comments to more explicitly explain who is
> > checking what, as well as _why_ the ordering matters.
> > 
> > No functional change intended.
> > 
> > Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> 
> One nit below.
> 
> [...]
> > +	/*
> > +	 * If the TD isn't finalized/runnable, then userspace is initializing
> > +	 * the VM image via KVM_TDX_INIT_MEM_REGION.  Increment the number of
> > +	 * pages that need to be mapped and initialized via TDH.MEM.PAGE.ADD.
> > +	 * KVM_TDX_FINALIZE_VM checks the counter to ensure all mapped pages
>                                                                    ^
>                                                 Nit: Is pre-mapped better?

Yeah, updated (and then it gets deleted a few commits later :-) ).
Re: [PATCH v3 13/25] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Posted by Binbin Wu 3 months, 2 weeks ago

On 10/25/2025 12:33 AM, Sean Christopherson wrote:
> On Fri, Oct 24, 2025, Binbin Wu wrote:
>>
>> On 10/17/2025 8:32 AM, Sean Christopherson wrote:
>>> Fold tdx_mem_page_record_premap_cnt() into tdx_sept_set_private_spte() as
>>> providing a one-off helper for effectively three lines of code is at best a
>>> wash, and splitting the code makes the comment for smp_rmb()  _extremely_
>>> confusing as the comment talks about reading kvm->arch.pre_fault_allowed
>>> before kvm_tdx->state, but the immediately visible code does the exact
>>> opposite.
>>>
>>> Opportunistically rewrite the comments to more explicitly explain who is
>>> checking what, as well as _why_ the ordering matters.
>>>
>>> No functional change intended.
>>>
>>> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
>>
>> One nit below.
>>
>> [...]
>>> +	/*
>>> +	 * If the TD isn't finalized/runnable, then userspace is initializing
>>> +	 * the VM image via KVM_TDX_INIT_MEM_REGION.  Increment the number of
>>> +	 * pages that need to be mapped and initialized via TDH.MEM.PAGE.ADD.
>>> +	 * KVM_TDX_FINALIZE_VM checks the counter to ensure all mapped pages
>>                                                                     ^
>>                                                  Nit: Is pre-mapped better?
> Yeah, updated (and then it gets deleted a few commits later :-) ).
Oh, right, nr_premapped will be dropped later.

Since the whole nr_premapped will be dropped, do we still need a cleanup patch
like patch 12 which will be dropped finally?
Re: [PATCH v3 13/25] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Posted by Sean Christopherson 3 months, 1 week ago
On Mon, Oct 27, 2025, Binbin Wu wrote:
> 
> 
> On 10/25/2025 12:33 AM, Sean Christopherson wrote:
> > On Fri, Oct 24, 2025, Binbin Wu wrote:
> > > 
> > > On 10/17/2025 8:32 AM, Sean Christopherson wrote:
> > > > Fold tdx_mem_page_record_premap_cnt() into tdx_sept_set_private_spte() as
> > > > providing a one-off helper for effectively three lines of code is at best a
> > > > wash, and splitting the code makes the comment for smp_rmb()  _extremely_
> > > > confusing as the comment talks about reading kvm->arch.pre_fault_allowed
> > > > before kvm_tdx->state, but the immediately visible code does the exact
> > > > opposite.
> > > > 
> > > > Opportunistically rewrite the comments to more explicitly explain who is
> > > > checking what, as well as _why_ the ordering matters.
> > > > 
> > > > No functional change intended.
> > > > 
> > > > Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> > > 
> > > One nit below.
> > > 
> > > [...]
> > > > +	/*
> > > > +	 * If the TD isn't finalized/runnable, then userspace is initializing
> > > > +	 * the VM image via KVM_TDX_INIT_MEM_REGION.  Increment the number of
> > > > +	 * pages that need to be mapped and initialized via TDH.MEM.PAGE.ADD.
> > > > +	 * KVM_TDX_FINALIZE_VM checks the counter to ensure all mapped pages
> > >                                                                     ^
> > >                                                  Nit: Is pre-mapped better?
> > Yeah, updated (and then it gets deleted a few commits later :-) ).
> Oh, right, nr_premapped will be dropped later.
> 
> Since the whole nr_premapped will be dropped, do we still need a cleanup patch
> like patch 12 which will be dropped finally?

We don't strictly "need" the cleanups, but IMO intermediate cleanups are often
worth doing even if they get thrown away, soo that the code is in a (hopefully)
better state when the "big" functional change comes along.  I.e. if code 'X' is
easier to understand than code 'Y', then theoretically/hopefully X=>Z is also
easier to understand than Y=>Z.
Re: [PATCH v3 13/25] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Posted by Huang, Kai 3 months, 2 weeks ago
On Thu, 2025-10-16 at 17:32 -0700, Sean Christopherson wrote:
> Fold tdx_mem_page_record_premap_cnt() into tdx_sept_set_private_spte() as
> providing a one-off helper for effectively three lines of code is at best a
> wash, and splitting the code makes the comment for smp_rmb()  _extremely_
> confusing as the comment talks about reading kvm->arch.pre_fault_allowed
> before kvm_tdx->state, but the immediately visible code does the exact
> opposite.
> 
> Opportunistically rewrite the comments to more explicitly explain who is
> checking what, as well as _why_ the ordering matters.
> 
> No functional change intended.
> 
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Kai Huang <kai.huang@intel.com>

> ---
>  arch/x86/kvm/vmx/tdx.c | 49 ++++++++++++++++++------------------------
>  1 file changed, 21 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 6c0adc1b3bd5..c37591730cc5 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1605,29 +1605,6 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
>  	return 0;
>  }
>  
> -/*
> - * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to map guest pages; the
> - * callback tdx_gmem_post_populate() then maps pages into private memory.
> - * through the a seamcall TDH.MEM.PAGE.ADD().  The SEAMCALL also requires the
> - * private EPT structures for the page to have been built before, which is
> - * done via kvm_tdp_map_page(). nr_premapped counts the number of pages that
> - * were added to the EPT structures but not added with TDH.MEM.PAGE.ADD().
> - * The counter has to be zero on KVM_TDX_FINALIZE_VM, to ensure that there
> - * are no half-initialized shared EPT pages.
> - */
> -static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
> -					  enum pg_level level, kvm_pfn_t pfn)
> -{
> -	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> -
> -	if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm))
> -		return -EIO;
> -
> -	/* nr_premapped will be decreased when tdh_mem_page_add() is called. */
> -	atomic64_inc(&kvm_tdx->nr_premapped);
> -	return 0;
> -}
> -
>  static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
>  				     enum pg_level level, kvm_pfn_t pfn)
>  {
> @@ -1638,14 +1615,30 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
>  		return -EIO;
>  
>  	/*
> -	 * Read 'pre_fault_allowed' before 'kvm_tdx->state'; see matching
> -	 * barrier in tdx_td_finalize().
> +	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
> +	 * before kvm_tdx->state.  Userspace must not be allowed to pre-fault
> +	 * arbitrary memory until the initial memory image is finalized.  Pairs
> +	 * with the smp_wmb() in tdx_td_finalize().
>  	 */
>  	smp_rmb();
> -	if (likely(kvm_tdx->state == TD_STATE_RUNNABLE))
> -		return tdx_mem_page_aug(kvm, gfn, level, pfn);
>  
> -	return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn);
> +	/*
> +	 * If the TD isn't finalized/runnable, then userspace is initializing
> +	 * the VM image via KVM_TDX_INIT_MEM_REGION.  Increment the number of
> +	 * pages that need to be mapped and initialized via TDH.MEM.PAGE.ADD.
> +	 * KVM_TDX_FINALIZE_VM checks the counter to ensure all mapped pages
> +	 * have been added to the image, to prevent running the TD with a
> +	 * valid mapping in the mirror EPT, but not in the S-EPT.
> +	 */
> +	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE)) {
> +		if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm))
> +			return -EIO;
> +
> +		atomic64_inc(&kvm_tdx->nr_premapped);

Nit: the comment

  /* nr_premapped will be decreased when tdh_mem_page_add() is called. */

is lost.  I think we can somehow embed it to the big comment above?
Re: [PATCH v3 13/25] KVM: TDX: Fold tdx_mem_page_record_premap_cnt() into its sole caller
Posted by Huang, Kai 3 months, 2 weeks ago
> > +	/*
> > +	 * If the TD isn't finalized/runnable, then userspace is initializing
> > +	 * the VM image via KVM_TDX_INIT_MEM_REGION.  Increment the number of
> > +	 * pages that need to be mapped and initialized via TDH.MEM.PAGE.ADD.
> > +	 * KVM_TDX_FINALIZE_VM checks the counter to ensure all mapped pages
> > +	 * have been added to the image, to prevent running the TD with a
> > +	 * valid mapping in the mirror EPT, but not in the S-EPT.
> > +	 */
> > +	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE)) {
> > +		if (KVM_BUG_ON(kvm->arch.pre_fault_allowed, kvm))
> > +			return -EIO;
> > +
> > +		atomic64_inc(&kvm_tdx->nr_premapped);
> 
> Nit: the comment
> 
>   /* nr_premapped will be decreased when tdh_mem_page_add() is called. */
> 
> is lost.  I think we can somehow embed it to the big comment above?

Please ignore this.  I saw the whole 'nr_premapped' eventually got removed
later in this series so don't bother :-)