From: Sean Christopherson <seanjc@google.com>
Avoid local retries within the TDX EPT violation handler if a retry is
triggered by faulting in an invalid memslot, i.e. a memslot that is
undergoing removal.
This prevents memslot removal from being blocked while it waits for the
VM-Exit handler to release the SRCU read lock.
Opportunistically, export kvm_vcpu_gfn_to_memslot() to allow for per-vCPU
acceleration of the gfn_to_memslot translation.
[Yan: Wrote patch log and comment, fixed a minor error, exported the function]
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Closes: https://lore.kernel.org/all/20250519023737.30360-1-yan.y.zhao@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
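For context, the blocking scenario involves the deletion side of memslot
updates, sketched below.  This is a simplified illustration of the flow in
virt/kvm/kvm_main.c, not the verbatim code: only synchronize_srcu_expedited(),
kvm->srcu, and KVM_MEMSLOT_INVALID are taken from the actual implementation;
the function name and structure are illustrative.

	/*
	 * Sketch: memslot deletion first publishes the slot as invalid,
	 * then waits for all SRCU readers to drain before the slot can
	 * actually be removed and freed.
	 */
	static void delete_memslot_sketch(struct kvm *kvm,
					  struct kvm_memory_slot *slot)
	{
		/* Publish the slot as invalid; new faults on it fail fast. */
		slot->flags |= KVM_MEMSLOT_INVALID;

		/*
		 * Wait for every vCPU currently inside an SRCU read-side
		 * critical section (e.g. a VM-Exit handler) to release
		 * kvm->srcu.  A vCPU spinning in a local fault-retry loop
		 * without dropping SRCU stalls this step indefinitely,
		 * which is the problem this patch fixes.
		 */
		synchronize_srcu_expedited(&kvm->srcu);

		/* Remove the slot for real; subsequent lookups see NULL. */
		/* ... actual removal and freeing ... */
	}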
arch/x86/kvm/vmx/tdx.c | 11 +++++++++++
virt/kvm/kvm_main.c | 1 +
2 files changed, 12 insertions(+)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 6784aaaced87..de2c4bb36069 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1992,6 +1992,11 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
* blocked by TDs, false positives are inevitable i.e., KVM may re-enter
* the guest even if the IRQ/NMI can't be delivered.
*
+ * Break out of the local retries if a retry is caused by faulting
+ * in an invalid memslot (indicating the slot is under removal), so
+ * that the slot removal is not blocked waiting for the VM-Exit
+ * handler to release the SRCU lock.
+ *
* Note: even without breaking out of local retries, zero-step
* mitigation may still occur due to
* - invoking of TDH.VP.ENTER after KVM_EXIT_MEMORY_FAULT,
@@ -2002,6 +2007,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
* handle retries locally in their EPT violation handlers.
*/
while (1) {
+ struct kvm_memory_slot *slot;
+
ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
if (ret != RET_PF_RETRY || !local_retry)
@@ -2015,6 +2022,10 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
break;
}
+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa));
+ if (slot && slot->flags & KVM_MEMSLOT_INVALID)
+ break;
+
cond_resched();
}
return ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6c07dd423458..f769d1dccc21 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2661,6 +2661,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
return NULL;
}
+EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot);
bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
{
--
2.43.2
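A note on the export: kvm_vcpu_gfn_to_memslot() is used here instead of
kvm_gfn_to_memslot() because the per-vCPU variant can consult the vCPU's
last-used-slot cache before falling back to a full memslot search.  The
sketch below paraphrases the shape of that lookup; the helper names
try_get_memslot() and search_memslots() follow kvm_main.c, but details
(e.g. generation checks) vary by kernel version.

	/* Sketch of the per-vCPU lookup: fast path via the cached slot. */
	struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu,
							gfn_t gfn)
	{
		struct kvm_memslots *slots = kvm_vcpu_memslots(vcpu);
		struct kvm_memory_slot *slot;

		/* Most faults hit the same slot as the previous fault. */
		slot = try_get_memslot(vcpu->last_used_slot, gfn);
		if (slot)
			return slot;

		/* Slow path: search the active memslot set. */
		slot = search_memslots(slots, gfn, false);
		if (slot) {
			vcpu->last_used_slot = slot;
			return slot;
		}

		return NULL;
	}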
On 8/22/2025 3:05 PM, Yan Zhao wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Avoid local retries within the TDX EPT violation handler if a retry is
> triggered by faulting in an invalid memslot, i.e. a memslot that is
> undergoing removal.
[...]
> @@ -2015,6 +2022,10 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
>  		break;
>  	}
>
> +	slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa));
> +	if (slot && slot->flags & KVM_MEMSLOT_INVALID)

The slot couldn't be NULL here, right? So is the check for slot just to
avoid dereferencing a NULL pointer in case of a bug?

> +		break;
> +
>  	cond_resched();
>  }
On Tue, Sep 09, 2025, Binbin Wu wrote:
> On 8/22/2025 3:05 PM, Yan Zhao wrote:
> > +	slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa));
> > +	if (slot && slot->flags & KVM_MEMSLOT_INVALID)
>
> The slot couldn't be NULL here, right?

Uh, hmm.  It could be NULL.  If the memslot deletion starts concurrently
with the S-EPT violation, then the memslot could be transitioned to
INVALID (prepared for deletion) prior to the vCPU acquiring SRCU after
the VM-Exit.  Memslot deletion could then update kvm->memslots with a
NULL memslot.

  vCPU                                 DELETE
  S-EPT Violation
                                       Set KVM_MEMSLOT_INVALID
                                       synchronize_srcu_expedited()
  Acquire SRCU
  __vmx_handle_ept_violation()
  RET_PF_RETRY due to INVALID
                                       Set memslot NULL
  kvm_vcpu_gfn_to_memslot()
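To connect this back to the check the patch adds: two observations are
possible while deletion is in flight, annotated below.  This is the patch's
check with explanatory comments added, not new logic, and the behavior
described for the NULL case is a reading of the surrounding fault path
rather than something this patch changes.

	slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(gpa));

	/*
	 * While deletion is in flight, the lookup can observe either:
	 *  - a non-NULL slot with KVM_MEMSLOT_INVALID set: deletion has
	 *    started but not completed, so stop retrying locally and let
	 *    this vCPU drop kvm->srcu, unblocking
	 *    synchronize_srcu_expedited() on the deletion side;
	 *  - a NULL slot: deletion already removed the memslot.  The
	 *    NULL check avoids dereferencing a stale pointer; the loop
	 *    then continues, and a subsequent fault on a slot-less GPA
	 *    is expected to resolve outside this loop (e.g. by exiting
	 *    to userspace) rather than returning RET_PF_RETRY again.
	 */
	if (slot && slot->flags & KVM_MEMSLOT_INVALID)
		break;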
On 9/9/2025 10:18 PM, Sean Christopherson wrote:
> On Tue, Sep 09, 2025, Binbin Wu wrote:
> > The slot couldn't be NULL here, right?
>
> Uh, hmm.  It could be NULL.  If the memslot deletion starts concurrently
> with the S-EPT violation, then the memslot could be transitioned to
> INVALID (prepared for deletion) prior to the vCPU acquiring SRCU after
> the VM-Exit.  Memslot deletion could then update kvm->memslots with a
> NULL memslot.

Got it, thanks!