[PATCH v3 14/25] KVM: TDX: Bug the VM if extended the initial measurement fails

Sean Christopherson posted 25 patches 3 months, 3 weeks ago
There is a newer version of this series
[PATCH v3 14/25] KVM: TDX: Bug the VM if extended the initial measurement fails
Posted by Sean Christopherson 3 months, 3 weeks ago
WARN and terminate the VM if TDH_MR_EXTEND fails, as extending the
measurement should fail if and only if there is a KVM bug, or if the S-EPT
mapping is invalid, and it should be impossible for the S-EPT mappings to
be removed between kvm_tdp_mmu_map_private_pfn() and tdh_mr_extend().

Holding slots_lock prevents zaps due to memslot updates,
filemap_invalidate_lock() prevents zaps due to guest_memfd PUNCH_HOLE,
and all usage of kvm_zap_gfn_range() is mutually exclusive with S-EPT
entries that can be used for the initial image.  The call from sev.c is
obviously mutually exclusive, TDX disallows KVM_X86_QUIRK_IGNORE_GUEST_PAT
so same goes for kvm_noncoherent_dma_assignment_start_or_stop, and while
__kvm_set_or_clear_apicv_inhibit() can likely be tripped while building the
image, the APIC page has its own non-guest_memfd memslot and so can't be
used for the initial image, which means that too is mutually exclusive.

Opportunistically switch to "goto" to jump around the measurement code,
partly to make it clear that KVM needs to bail entirely if extending the
measurement fails, partly in anticipation of reworking how and when
TDH_MEM_PAGE_ADD is done.

Fixes: d789fa6efac9 ("KVM: TDX: Handle vCPU dissociation")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/tdx.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index c37591730cc5..f4bab75d3ffb 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3151,14 +3151,22 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 
 	KVM_BUG_ON(atomic64_dec_return(&kvm_tdx->nr_premapped) < 0, kvm);
 
-	if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) {
-		for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
-			err = tdh_mr_extend(&kvm_tdx->td, gpa + i, &entry,
-					    &level_state);
-			if (err) {
-				ret = -EIO;
-				break;
-			}
+	if (!(arg->flags & KVM_TDX_MEASURE_MEMORY_REGION))
+		goto out;
+
+	/*
+	 * Note, MR.EXTEND can fail if the S-EPT mapping is somehow removed
+	 * between mapping the pfn and now, but slots_lock prevents memslot
+	 * updates, filemap_invalidate_lock() prevents guest_memfd updates,
+	 * mmu_notifier events can't reach S-EPT entries, and KVM's internal
+	 * zapping flows are mutually exclusive with S-EPT mappings.
+	 */
+	for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
+		err = tdh_mr_extend(&kvm_tdx->td, gpa + i, &entry, &level_state);
+		if (KVM_BUG_ON(err, kvm)) {
+			pr_tdx_error_2(TDH_MR_EXTEND, err, entry, level_state);
+			ret = -EIO;
+			goto out;
 		}
 	}
 
-- 
2.51.0.858.gf9c4a03a3a-goog
Re: [PATCH v3 14/25] KVM: TDX: Bug the VM if extended the initial measurement fails
Posted by Huang, Kai 3 months, 2 weeks ago
On Thu, 2025-10-16 at 17:32 -0700, Sean Christopherson wrote:
> WARN and terminate the VM if TDH_MR_EXTEND fails, as extending the
> measurement should fail if and only if there is a KVM bug, or if the S-EPT
> mapping is invalid, and it should be impossible for the S-EPT mappings to
> be removed between kvm_tdp_mmu_map_private_pfn() and tdh_mr_extend().
> 
> Holding slots_lock prevents zaps due to memslot updates,
> filemap_invalidate_lock() prevents zaps due to guest_memfd PUNCH_HOLE,
> and all usage of kvm_zap_gfn_range() is mutually exclusive with S-EPT
> entries that can be used for the initial image.  The call from sev.c is
> obviously mutually exclusive, TDX disallows KVM_X86_QUIRK_IGNORE_GUEST_PAT
> so same goes for kvm_noncoherent_dma_assignment_start_or_stop, and while
> __kvm_set_or_clear_apicv_inhibit() can likely be tripped while building the
> image, the APIC page has its own non-guest_memfd memslot and so can't be
> used for the initial image, which means that too is mutually exclusive.
> 
> Opportunistically switch to "goto" to jump around the measurement code,
> partly to make it clear that KVM needs to bail entirely if extending the
> measurement fails, partly in anticipation of reworking how and when
> TDH_MEM_PAGE_ADD is done.
> 
> Fixes: d789fa6efac9 ("KVM: TDX: Handle vCPU dissociation")

So IIUC this patch only adds a KVM_BUG_ON() when TDH.MR.EXTEND fails.  How
does this fix anything?

Looking at v2, they may have a relationship, but it's quite confusing w/o
any explanation?

> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/tdx.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index c37591730cc5..f4bab75d3ffb 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -3151,14 +3151,22 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>  
>  	KVM_BUG_ON(atomic64_dec_return(&kvm_tdx->nr_premapped) < 0, kvm);
>  
> -	if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) {
> -		for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
> -			err = tdh_mr_extend(&kvm_tdx->td, gpa + i, &entry,
> -					    &level_state);
> -			if (err) {
> -				ret = -EIO;
> -				break;
> -			}
> +	if (!(arg->flags & KVM_TDX_MEASURE_MEMORY_REGION))
> +		goto out;
> +
> +	/*
> +	 * Note, MR.EXTEND can fail if the S-EPT mapping is somehow removed
> +	 * between mapping the pfn and now, but slots_lock prevents memslot
> +	 * updates, filemap_invalidate_lock() prevents guest_memfd updates,
> +	 * mmu_notifier events can't reach S-EPT entries, and KVM's internal
> +	 * zapping flows are mutually exclusive with S-EPT mappings.
> +	 */
> +	for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
> +		err = tdh_mr_extend(&kvm_tdx->td, gpa + i, &entry, &level_state);
> +		if (KVM_BUG_ON(err, kvm)) {
> +			pr_tdx_error_2(TDH_MR_EXTEND, err, entry, level_state);
> +			ret = -EIO;
> +			goto out;
>  		}
>  	}
>  
> -- 
> 2.51.0.858.gf9c4a03a3a-goog
Re: [PATCH v3 14/25] KVM: TDX: Bug the VM if extended the initial measurement fails
Posted by Sean Christopherson 3 months, 2 weeks ago
On Thu, Oct 23, 2025, Kai Huang wrote:
> On Thu, 2025-10-16 at 17:32 -0700, Sean Christopherson wrote:
> > WARN and terminate the VM if TDH_MR_EXTEND fails, as extending the
> > measurement should fail if and only if there is a KVM bug, or if the S-EPT
> > mapping is invalid, and it should be impossible for the S-EPT mappings to
> > be removed between kvm_tdp_mmu_map_private_pfn() and tdh_mr_extend().
> > 
> > Holding slots_lock prevents zaps due to memslot updates,
> > filemap_invalidate_lock() prevents zaps due to guest_memfd PUNCH_HOLE,
> > and all usage of kvm_zap_gfn_range() is mutually exclusive with S-EPT
> > entries that can be used for the initial image.  The call from sev.c is
> > obviously mutually exclusive, TDX disallows KVM_X86_QUIRK_IGNORE_GUEST_PAT
> > so same goes for kvm_noncoherent_dma_assignment_start_or_stop, and while
> > __kvm_set_or_clear_apicv_inhibit() can likely be tripped while building the
> > image, the APIC page has its own non-guest_memfd memslot and so can't be
> > used for the initial image, which means that too is mutually exclusive.
> > 
> > Opportunistically switch to "goto" to jump around the measurement code,
> > partly to make it clear that KVM needs to bail entirely if extending the
> > measurement fails, partly in anticipation of reworking how and when
> > TDH_MEM_PAGE_ADD is done.
> > 
> > Fixes: d789fa6efac9 ("KVM: TDX: Handle vCPU dissociation")
> 
> So IIUC this patch only adds a KVM_BUG_ON() when TDH.MR.EXTEND fails.  How
> does this fix anything?

Hmm, yeah, I'll drop the Fixes.  It made more sense when I thought it was
impossible for tdh_mr_extend() to fail, as returning an error and not explicitly
terminating the VM was "wrong".  But I agree it does far more harm than good,
even when relocated to the end of the series.
Re: [PATCH v3 14/25] KVM: TDX: Bug the VM if extended the initial measurement fails
Posted by Yan Zhao 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 09:35:43AM -0700, Sean Christopherson wrote:
> On Thu, Oct 23, 2025, Kai Huang wrote:
> > On Thu, 2025-10-16 at 17:32 -0700, Sean Christopherson wrote:
> > > WARN and terminate the VM if TDH_MR_EXTEND fails, as extending the
> > > measurement should fail if and only if there is a KVM bug, or if the S-EPT
> > > mapping is invalid, and it should be impossible for the S-EPT mappings to
> > > be removed between kvm_tdp_mmu_map_private_pfn() and tdh_mr_extend().
> > > 
> > > Holding slots_lock prevents zaps due to memslot updates,
> > > filemap_invalidate_lock() prevents zaps due to guest_memfd PUNCH_HOLE,
> > > and all usage of kvm_zap_gfn_range() is mutually exclusive with S-EPT
> > > entries that can be used for the initial image.  The call from sev.c is
> > > obviously mutually exclusive, TDX disallows KVM_X86_QUIRK_IGNORE_GUEST_PAT
> > > so same goes for kvm_noncoherent_dma_assignment_start_or_stop, and while
> > > __kvm_set_or_clear_apicv_inhibit() can likely be tripped while building the
> > > image, the APIC page has its own non-guest_memfd memslot and so can't be
> > > used for the initial image, which means that too is mutually exclusive.
> > > 
> > > Opportunistically switch to "goto" to jump around the measurement code,
> > > partly to make it clear that KVM needs to bail entirely if extending the
> > > measurement fails, partly in anticipation of reworking how and when
> > > TDH_MEM_PAGE_ADD is done.
> > > 
> > > Fixes: d789fa6efac9 ("KVM: TDX: Handle vCPU dissociation")
Maybe this tag is for patch 25?

> > So IIUC this patch only adds a KVM_BUG_ON() when TDH.MR.EXTEND fails.  How
> > does this fix anything?
> 
> Hmm, yeah, I'll drop the Fixes.  It made more sense when I thought it was
> impossible for tdh_mr_extend() to fail, as returning an error and not explicitly
> terminating the VM was "wrong".  But I agree it does far more harm than good,
> even when relocated to the end of the series.
Re: [PATCH v3 14/25] KVM: TDX: Bug the VM if extended the initial measurement fails
Posted by Edgecombe, Rick P 3 months, 2 weeks ago
On Thu, 2025-10-16 at 17:32 -0700, Sean Christopherson wrote:
> WARN and terminate the VM if TDH_MR_EXTEND fails, as extending the
> measurement should fail if and only if there is a KVM bug, or if the S-EPT
> mapping is invalid, and it should be impossible for the S-EPT mappings to
> be removed between kvm_tdp_mmu_map_private_pfn() and tdh_mr_extend().
> 
> Holding slots_lock prevents zaps due to memslot updates,
> filemap_invalidate_lock() prevents zaps due to guest_memfd PUNCH_HOLE,
> and all usage of kvm_zap_gfn_range() is mutually exclusive with S-EPT
> entries that can be used for the initial image.  The call from sev.c is
> obviously mutually exclusive, TDX disallows KVM_X86_QUIRK_IGNORE_GUEST_PAT
> so same goes for kvm_noncoherent_dma_assignment_start_or_stop, and while
> __kvm_set_or_clear_apicv_inhibit() can likely be tripped while building the
> image, the APIC page has its own non-guest_memfd memslot and so can't be
> used for the initial image, which means that too is mutually exclusive.
> 
> Opportunistically switch to "goto" to jump around the measurement code,
> partly to make it clear that KVM needs to bail entirely if extending the
> measurement fails, partly in anticipation of reworking how and when
> TDH_MEM_PAGE_ADD is done.
> 
> Fixes: d789fa6efac9 ("KVM: TDX: Handle vCPU dissociation")
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---

Per the discussion in v2, shouldn't it go after patch 24 'KVM: TDX: Guard VM
state transitions with "all" the locks'? Otherwise it introduces a KVM_BUG_ON()
that can be triggered from userspace. not a huge deal though.
Re: [PATCH v3 14/25] KVM: TDX: Bug the VM if extended the initial measurement fails
Posted by Sean Christopherson 3 months, 2 weeks ago
On Tue, Oct 21, 2025, Rick P Edgecombe wrote:
> On Thu, 2025-10-16 at 17:32 -0700, Sean Christopherson wrote:
> > WARN and terminate the VM if TDH_MR_EXTEND fails, as extending the
> > measurement should fail if and only if there is a KVM bug, or if the S-EPT
> > mapping is invalid, and it should be impossible for the S-EPT mappings to
> > be removed between kvm_tdp_mmu_map_private_pfn() and tdh_mr_extend().
> > 
> > Holding slots_lock prevents zaps due to memslot updates,
> > filemap_invalidate_lock() prevents zaps due to guest_memfd PUNCH_HOLE,
> > and all usage of kvm_zap_gfn_range() is mutually exclusive with S-EPT
> > entries that can be used for the initial image.  The call from sev.c is
> > obviously mutually exclusive, TDX disallows KVM_X86_QUIRK_IGNORE_GUEST_PAT
> > so same goes for kvm_noncoherent_dma_assignment_start_or_stop, and while
> > __kvm_set_or_clear_apicv_inhibit() can likely be tripped while building the
> > image, the APIC page has its own non-guest_memfd memslot and so can't be
> > used for the initial image, which means that too is mutually exclusive.
> > 
> > Opportunistically switch to "goto" to jump around the measurement code,
> > partly to make it clear that KVM needs to bail entirely if extending the
> > measurement fails, partly in anticipation of reworking how and when
> > TDH_MEM_PAGE_ADD is done.
> > 
> > Fixes: d789fa6efac9 ("KVM: TDX: Handle vCPU dissociation")
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> 
> Per the discussion in v2, shouldn't it go after patch 24 'KVM: TDX: Guard VM
> state transitions with "all" the locks'? Otherwise it introduces a KVM_BUG_ON()
> that can be triggered from userspace. not a huge deal though.

Oh, right.  And then the changelog needs to be updated too.