KVM: Drop vm_dead, pivot on vm_bugged for -EIO

[PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Sean Christopherson 2 months, 1 week ago

Exit to userspace with -EFAULT and a valid MEMORY_FAULT exit if a vCPU
hits an unexpected pending S-EPT Violation instead of marking the VM dead.
While it's unlikely the VM can continue on, whether or not to terminate
the VM is not KVM's decision to make.

Set memory_fault.size to zero to communicate to userspace that reported
fault is "bad", and to effectively terminate the VM if userspace blindly
treats the exit as a conversion attempt (KVM_SET_MEMORY_ATTRIBUTES will
fail with -EINVAL if the size is zero).

Opportunistically delete the pr_warn(), which could be abused to spam the
kernel log, and is largely useless outside of interact debug as it doesn't
specify which VM encountered a failure.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/tdx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 3e0d4edee849..c2ef03f39c32 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1937,10 +1937,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 
 	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
 		if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
-			pr_warn("Guest access before accepting 0x%llx on vCPU %d\n",
-				gpa, vcpu->vcpu_id);
-			kvm_vm_dead(vcpu->kvm);
-			return -EIO;
+			kvm_prepare_memory_fault_exit(vcpu, gpa, 0, true, false, true);
+			return -EFAULT;
 		}
 		/*
 		 * Always treat SEPT violations as write faults.  Ignore the
-- 
2.50.1.552.g942d659e1b-goog

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Xiaoyao Li 2 months, 1 week ago

On 7/30/2025 3:33 AM, Sean Christopherson wrote:
> Exit to userspace with -EFAULT and a valid MEMORY_FAULT exit if a vCPU
> hits an unexpected pending S-EPT Violation instead of marking the VM dead.
> While it's unlikely the VM can continue on, whether or not to terminate
> the VM is not KVM's decision to make.
> 
> Set memory_fault.size to zero to communicate to userspace that reported
> fault is "bad", and to effectively terminate the VM if userspace blindly
> treats the exit as a conversion attempt (KVM_SET_MEMORY_ATTRIBUTES will
> fail with -EINVAL if the size is zero).

This sets a special contract on size zero.

I had a patch internally, which introduce a new exit type:

+               /* KVM_EXIT_GUEST_ERROR */
+               struct {
+  #define KVM_GUEST_ERROR_TDX_ACCESS_PENDING_PAGE      0
+                       __u32 error_type;
+                       __u32 ndata;
+                       __u64 data[16];
+               } guest_error;

how about it?

> Opportunistically delete the pr_warn(), which could be abused to spam the
> kernel log, and is largely useless outside of interact debug as it doesn't
> specify which VM encountered a failure.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/vmx/tdx.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 3e0d4edee849..c2ef03f39c32 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1937,10 +1937,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
>   
>   	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
>   		if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
> -			pr_warn("Guest access before accepting 0x%llx on vCPU %d\n",
> -				gpa, vcpu->vcpu_id);
> -			kvm_vm_dead(vcpu->kvm);
> -			return -EIO;
> +			kvm_prepare_memory_fault_exit(vcpu, gpa, 0, true, false, true);
> +			return -EFAULT;
>   		}
>   		/*
>   		 * Always treat SEPT violations as write faults.  Ignore the

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Yan Zhao 2 months, 1 week ago

On Wed, Jul 30, 2025 at 10:07:36AM +0800, Xiaoyao Li wrote:
> On 7/30/2025 3:33 AM, Sean Christopherson wrote:
> > Exit to userspace with -EFAULT and a valid MEMORY_FAULT exit if a vCPU
> > hits an unexpected pending S-EPT Violation instead of marking the VM dead.
> > While it's unlikely the VM can continue on, whether or not to terminate
> > the VM is not KVM's decision to make.
> > 
> > Set memory_fault.size to zero to communicate to userspace that reported
> > fault is "bad", and to effectively terminate the VM if userspace blindly
> > treats the exit as a conversion attempt (KVM_SET_MEMORY_ATTRIBUTES will
> > fail with -EINVAL if the size is zero).
> 
> This sets a special contract on size zero.
+1.

Or would it be good to use pr_warn_once() to indicate that the VM termination
reason is "Guest access before accepting" if a new exit type is not specified?

This info is straightforward and helpful in cases when a TD is accidentally
killed.

> I had a patch internally, which introduce a new exit type:
> 
> +               /* KVM_EXIT_GUEST_ERROR */
> +               struct {
> +  #define KVM_GUEST_ERROR_TDX_ACCESS_PENDING_PAGE      0
> +                       __u32 error_type;
> +                       __u32 ndata;
> +                       __u64 data[16];
> +               } guest_error;
> 
> how about it?
> 
> > Opportunistically delete the pr_warn(), which could be abused to spam the
> > kernel log, and is largely useless outside of interact debug as it doesn't
> > specify which VM encountered a failure.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/vmx/tdx.c | 6 ++----
> >   1 file changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 3e0d4edee849..c2ef03f39c32 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -1937,10 +1937,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
> >   	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
> >   		if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
> > -			pr_warn("Guest access before accepting 0x%llx on vCPU %d\n",
> > -				gpa, vcpu->vcpu_id);
> > -			kvm_vm_dead(vcpu->kvm);
> > -			return -EIO;
> > +			kvm_prepare_memory_fault_exit(vcpu, gpa, 0, true, false, true);
> > +			return -EFAULT;
> >   		}
> >   		/*
> >   		 * Always treat SEPT violations as write faults.  Ignore the
> 
>

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Edgecombe, Rick P 2 months, 1 week ago

On Tue, 2025-07-29 at 12:33 -0700, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 3e0d4edee849..c2ef03f39c32 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1937,10 +1937,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
>  
>  	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
>  		if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
> -			pr_warn("Guest access before accepting 0x%llx on vCPU %d\n",
> -				gpa, vcpu->vcpu_id);
> -			kvm_vm_dead(vcpu->kvm);
> -			return -EIO;
> +			kvm_prepare_memory_fault_exit(vcpu, gpa, 0, true, false, true);
> +			return -EFAULT;
>  		}
>  		/*
>  		 * Always treat SEPT violations as write faults.  Ignore the

The vm_dead was added because mirror EPT will KVM_BUG_ON() if there is an
attempt to set the mirror EPT entry when it is already present. And the
unaccepted memory access will trigger an EPT violation for a mirror PTE that is
already set. I think this is a better solution irrespective of the vm_dead
changes.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

But hmm, tangentially related, but Yan do we have a similar problem with
KVM_PRE_FAULT_MEMORY after we started setting pre_fault_allowed during TD
finalization?

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Yan Zhao 2 months, 1 week ago

On Tue, Jul 29, 2025 at 10:27:34PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2025-07-29 at 12:33 -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 3e0d4edee849..c2ef03f39c32 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -1937,10 +1937,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
> >  
> >  	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
> >  		if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
> > -			pr_warn("Guest access before accepting 0x%llx on vCPU %d\n",
> > -				gpa, vcpu->vcpu_id);
> > -			kvm_vm_dead(vcpu->kvm);
> > -			return -EIO;
> > +			kvm_prepare_memory_fault_exit(vcpu, gpa, 0, true, false, true);
> > +			return -EFAULT;
> >  		}
> >  		/*
> >  		 * Always treat SEPT violations as write faults.  Ignore the
> 
> The vm_dead was added because mirror EPT will KVM_BUG_ON() if there is an
> attempt to set the mirror EPT entry when it is already present. And the
> unaccepted memory access will trigger an EPT violation for a mirror PTE that is
> already set. I think this is a better solution irrespective of the vm_dead
> changes.
> 
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> 
> But hmm, tangentially related, but Yan do we have a similar problem with
> KVM_PRE_FAULT_MEMORY after we started setting pre_fault_allowed during TD
> finalization?
Sean's commit 6385d01eec16 ("KVM: x86/mmu: Don't overwrite shadow-present MMU
SPTEs when prefaulting") should have prevented repeated invocation of
set_external_spte_present() with prefaulted entries.

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Edgecombe, Rick P 2 months, 1 week ago

On Wed, 2025-07-30 at 13:55 +0800, Yan Zhao wrote:
> > But hmm, tangentially related, but Yan do we have a similar problem with
> > KVM_PRE_FAULT_MEMORY after we started setting pre_fault_allowed during TD
> > finalization?
> Sean's commit 6385d01eec16 ("KVM: x86/mmu: Don't overwrite shadow-present MMU
> SPTEs when prefaulting") should have prevented repeated invocation of
> set_external_spte_present() with prefaulted entries.

Ok, great.

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Sean Christopherson 2 months, 1 week ago

On Tue, Jul 29, 2025, Rick P Edgecombe wrote:
> On Tue, 2025-07-29 at 12:33 -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 3e0d4edee849..c2ef03f39c32 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -1937,10 +1937,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
> >  
> >  	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
> >  		if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
> > -			pr_warn("Guest access before accepting 0x%llx on vCPU %d\n",
> > -				gpa, vcpu->vcpu_id);
> > -			kvm_vm_dead(vcpu->kvm);
> > -			return -EIO;
> > +			kvm_prepare_memory_fault_exit(vcpu, gpa, 0, true, false, true);
> > +			return -EFAULT;
> >  		}
> >  		/*
> >  		 * Always treat SEPT violations as write faults.  Ignore the
> 
> The vm_dead was added because mirror EPT will KVM_BUG_ON() if there is an
> attempt to set the mirror EPT entry when it is already present. And the
> unaccepted memory access will trigger an EPT violation for a mirror PTE that is
> already set. I think this is a better solution irrespective of the vm_dead
> changes.

In that case, this change will expose KVM to the KVM_BUG_ON(), because nothing
prevents userspace from re-running the vCPU.  Which KVM_BUG_ON() exactly gets
hit?

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Edgecombe, Rick P 2 months, 1 week ago

On Tue, 2025-07-29 at 15:54 -0700, Sean Christopherson wrote:
> > The vm_dead was added because mirror EPT will KVM_BUG_ON() if there is an
> > attempt to set the mirror EPT entry when it is already present. And the
> > unaccepted memory access will trigger an EPT violation for a mirror PTE that
> > is
> > already set. I think this is a better solution irrespective of the vm_dead
> > changes.
> 
> In that case, this change will expose KVM to the KVM_BUG_ON(), because nothing
> prevents userspace from re-running the vCPU. 

If userspace runs the vCPU again then an EPT violation gets triggered again,
which again gets kicked out to userspace. The new check will prevent it from
getting into the fault handler, right?

>  Which KVM_BUG_ON() exactly gets hit?

Should be:

static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t
sptep,
						 gfn_t gfn, u64 old_spte,
						 u64 new_spte, int level)
{
	bool was_present = is_shadow_present_pte(old_spte);
	bool is_present = is_shadow_present_pte(new_spte);
	bool is_leaf = is_present && is_last_spte(new_spte, level);
	kvm_pfn_t new_pfn = spte_to_pfn(new_spte);
	int ret = 0;

	KVM_BUG_ON(was_present, kvm);

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Yan Zhao 2 months, 1 week ago

On Tue, Jul 29, 2025 at 10:58:02PM +0000, Edgecombe, Rick P wrote:
> On Tue, 2025-07-29 at 15:54 -0700, Sean Christopherson wrote:
> > > The vm_dead was added because mirror EPT will KVM_BUG_ON() if there is an
> > > attempt to set the mirror EPT entry when it is already present. And the
> > > unaccepted memory access will trigger an EPT violation for a mirror PTE that
> > > is
> > > already set. I think this is a better solution irrespective of the vm_dead
> > > changes.
> > 
> > In that case, this change will expose KVM to the KVM_BUG_ON(), because nothing
> > prevents userspace from re-running the vCPU. 
> 
> If userspace runs the vCPU again then an EPT violation gets triggered again,
> which again gets kicked out to userspace. The new check will prevent it from
> getting into the fault handler, right?
> 
> >  Which KVM_BUG_ON() exactly gets hit?
> 
> Should be:
> 
> static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t
> sptep,
> 						 gfn_t gfn, u64 old_spte,
> 						 u64 new_spte, int level)
> {
> 	bool was_present = is_shadow_present_pte(old_spte);
> 	bool is_present = is_shadow_present_pte(new_spte);
> 	bool is_leaf = is_present && is_last_spte(new_spte, level);
> 	kvm_pfn_t new_pfn = spte_to_pfn(new_spte);
> 	int ret = 0;
> 
> 	KVM_BUG_ON(was_present, kvm);
Yes, this KVM_BUG_ON() could be triggered in an earlier code if
tdx_is_sept_violation_unexpected_pending() wasn't checked to prevent repeated
EPT faults.

But the KVM_BUG_ON() is no longer reachable after the commit cc7ed3358e41 ("KVM:
x86/mmu: Always set SPTE's dirty bit if it's created as writable").

With the current code, even without putting VM to dead, the worst case is the
endless invocation of EPT fault handler, returning RET_PF_SPURIOUS.

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Sean Christopherson 2 months, 1 week ago

On Tue, Jul 29, 2025, Rick P Edgecombe wrote:
> On Tue, 2025-07-29 at 15:54 -0700, Sean Christopherson wrote:
> > > The vm_dead was added because mirror EPT will KVM_BUG_ON() if there is an
> > > attempt to set the mirror EPT entry when it is already present. And the
> > > unaccepted memory access will trigger an EPT violation for a mirror PTE
> > > that is already set. I think this is a better solution irrespective of
> > > the vm_dead changes.
> > 
> > In that case, this change will expose KVM to the KVM_BUG_ON(), because nothing
> > prevents userspace from re-running the vCPU. 
> 
> If userspace runs the vCPU again then an EPT violation gets triggered again,
> which again gets kicked out to userspace. The new check will prevent it from
> getting into the fault handler, right?

Yes?  But I'm confused about why you mentioned vm_dead, and why you're calling
this a "new check".  This effectively does two things: drops kvm_vm_dead() and
switches from EOI => EFAULT.  _If_ setting vm_dead was necessary, then we have a
problem.

I assume by "The vm_dead was added" you really mean "forcing an exit to userspace",
and that kvm_vm_dead()+EIO was a somewhat arbitrary way of forcing an exit?

> >  Which KVM_BUG_ON() exactly gets hit?
> 
> Should be:
> 
> static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t
> sptep,
> 						 gfn_t gfn, u64 old_spte,
> 						 u64 new_spte, int level)
> {
> 	bool was_present = is_shadow_present_pte(old_spte);
> 	bool is_present = is_shadow_present_pte(new_spte);
> 	bool is_leaf = is_present && is_last_spte(new_spte, level);
> 	kvm_pfn_t new_pfn = spte_to_pfn(new_spte);
> 	int ret = 0;
> 
> 	KVM_BUG_ON(was_present, kvm);

Yeah, I don't see how that can be reach in this scenario.

Re: [PATCH 2/5] KVM: TDX: Exit with MEMORY_FAULT on unexpected pending S-EPT Violation

Posted by Edgecombe, Rick P 2 months, 1 week ago

On Tue, 2025-07-29 at 16:08 -0700, Sean Christopherson wrote:
> > If userspace runs the vCPU again then an EPT violation gets triggered again,
> > which again gets kicked out to userspace. The new check will prevent it from
> > getting into the fault handler, right?
> 
> Yes?  But I'm confused about why you mentioned vm_dead, and why you're calling
> this a "new check".  This effectively does two things: drops kvm_vm_dead() and
> switches from EOI => EFAULT.  _If_ setting vm_dead was necessary, then we have
> a
> problem.
> 
> I assume by "The vm_dead was added" you really mean "forcing an exit to
> userspace",
> and that kvm_vm_dead()+EIO was a somewhat arbitrary way of forcing an exit?

Sorry, yes vm_dead prevents an EPT violation loop but not the KVM_BUG_ON(). The
whole if clause prevents the KVM_BUG_ON(). Your patch prevents the ept violation
loop in a better way.