[RFC PATCH 11/13] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry

Yosry Ahmed posted 13 patches 1 year ago
[RFC PATCH 11/13] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry
Posted by Yosry Ahmed 1 year ago
TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to
L2. This is unnecessary because it should always be
TLB_CONTROL_DO_NOTHING at this point.

The flow for setting TLB_CONTROL is as follows:
1. In vcpu_enter_guest(), servicing a TLB flush request may set it to
TLB_CONTROL_FLUSH_ASID in svm_flush_tlb_asid().
2. In svm_vcpu_run() -> pre_svm_run(), it may get upgraded to
TLB_CONTROL_FLUSH_ALL_ASID when assigning a new ASID.
3. In svm_cpu_run(), it gets reset to TLB_CONTROL_DO_NOTHING after the
guest is run.

Hence, TLB_CONTROL is reset after each run and there is no need to do it
again on every nested transition to L2.

There is a TODO in nested_svm_transition_tlb_flush() about this reset
crushing pending TLB flushes. Remove it, as the reset is not really
crushing anything as explained above.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 12bb391884299..8e40ff21f7353 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -512,12 +512,7 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 		svm->nested.last_asid = svm->nested.ctl.asid;
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 	}
-	/*
-	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
-	 * things to fix before this can be conditional:
-	 *
-	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
-	 */
+	/* TODO: optimize unconditional TLB flush/MMU sync */
 	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
@@ -536,7 +531,7 @@ static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
 	if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 
-	/* See nested_svm_entry_tlb_flush() */
+	/* TODO: optimize unconditional TLB flush/MMU sync */
 	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
@@ -717,9 +712,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 
 	/* Done at vmrun: asid.  */
 
-	/* Also overwritten later if necessary.  */
-	svm_clear_tlb_ctl_flush(vmcb02);
-
 	/* nested_cr3.  */
 	if (nested_npt_enabled(svm))
 		nested_svm_init_mmu_context(vcpu);
-- 
2.48.1.362.g079036d154-goog
Re: [RFC PATCH 11/13] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry
Posted by Maxim Levitsky 11 months, 2 weeks ago
On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to
> L2. This is unnecessary because it should always be
> TLB_CONTROL_DO_NOTHING at this point.
> 
> The flow for setting TLB_CONTROL is as follows:
> 1. In vcpu_enter_guest(), servicing a TLB flush request may set it to
> TLB_CONTROL_FLUSH_ASID in svm_flush_tlb_asid().
> 2. In svm_vcpu_run() -> pre_svm_run(), it may get upgraded to
> TLB_CONTROL_FLUSH_ALL_ASID when assigning a new ASID.
> 3. In svm_cpu_run(), it gets reset to TLB_CONTROL_DO_NOTHING after the
> guest is run.
> 
> Hence, TLB_CONTROL is reset after each run and there is no need to do it
> again on every nested transition to L2.
> 
> There is a TODO in nested_svm_transition_tlb_flush() about this reset
> crushing pending TLB flushes. Remove it, as the reset is not really
> crushing anything as explained above.

I am not sure that we don't crush a pending tlb request: 

svm_flush_tlb_asid can also be called by KVM_REQ_TLB_FLUSH
and set the flush request in both vmcbs, thus later the nested_svm_exit_tlb_flush
can crush this request.

But the patch itself does look OK to me, although I might be mistaken.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>


Best regards,
	Maxim Levitsky


> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/nested.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 12bb391884299..8e40ff21f7353 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -512,12 +512,7 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
>  		svm->nested.last_asid = svm->nested.ctl.asid;
>  		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
>  	}
> -	/*
> -	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
> -	 * things to fix before this can be conditional:
> -	 *
> -	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
> -	 */
> +	/* TODO: optimize unconditional TLB flush/MMU sync */
>  	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>  	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>  }
> @@ -536,7 +531,7 @@ static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
>  	if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
>  		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
>  
> -	/* See nested_svm_entry_tlb_flush() */
> +	/* TODO: optimize unconditional TLB flush/MMU sync */
>  	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>  	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>  }
> @@ -717,9 +712,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  
>  	/* Done at vmrun: asid.  */
>  
> -	/* Also overwritten later if necessary.  */
> -	svm_clear_tlb_ctl_flush(vmcb02);
> -
>  	/* nested_cr3.  */
>  	if (nested_npt_enabled(svm))
>  		nested_svm_init_mmu_context(vcpu);
Re: [RFC PATCH 11/13] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry
Posted by Yosry Ahmed 11 months, 1 week ago
On Fri, Feb 28, 2025 at 09:17:52PM -0500, Maxim Levitsky wrote:
> On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to
> > L2. This is unnecessary because it should always be
> > TLB_CONTROL_DO_NOTHING at this point.
> > 
> > The flow for setting TLB_CONTROL is as follows:
> > 1. In vcpu_enter_guest(), servicing a TLB flush request may set it to
> > TLB_CONTROL_FLUSH_ASID in svm_flush_tlb_asid().
> > 2. In svm_vcpu_run() -> pre_svm_run(), it may get upgraded to
> > TLB_CONTROL_FLUSH_ALL_ASID when assigning a new ASID.
> > 3. In svm_cpu_run(), it gets reset to TLB_CONTROL_DO_NOTHING after the
> > guest is run.
> > 
> > Hence, TLB_CONTROL is reset after each run and there is no need to do it
> > again on every nested transition to L2.
> > 
> > There is a TODO in nested_svm_transition_tlb_flush() about this reset
> > crushing pending TLB flushes. Remove it, as the reset is not really
> > crushing anything as explained above.
> 
> I am not sure that we don't crush a pending tlb request: 
> 
> svm_flush_tlb_asid can also be called by KVM_REQ_TLB_FLUSH
> and set the flush request in both vmcbs, thus later the nested_svm_exit_tlb_flush
> can crush this request.

How so?

nested_svm_exit_tlb_flush() makes a KVM_REQ_TLB_FLUSH_GUEST request.
svm_flush_tlb_asid() is called when servicing KVM_REQ_TLB_* requests.

So svm_flush_tlb_asid() does not make a request in the sense of
KVM_REQ_*, it sets TLB_CONTROL or invalidates the ASID, which is can
more-or-less be described as "requesting" a TLB flush on VM-enter, but
is not the same thing as KVM_REQ_TLB_FLUSH.

So I am not sure there are any requests being crushed here.

> 
> But the patch itself does look OK to me, although I might be mistaken.
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Thanks!

> 
> 
> Best regards,
> 	Maxim Levitsky
Re: [RFC PATCH 11/13] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry
Posted by Maxim Levitsky 11 months, 1 week ago
On Mon, 2025-03-03 at 22:14 +0000, Yosry Ahmed wrote:
> On Fri, Feb 28, 2025 at 09:17:52PM -0500, Maxim Levitsky wrote:
> > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to
> > > L2. This is unnecessary because it should always be
> > > TLB_CONTROL_DO_NOTHING at this point.
> > > 
> > > The flow for setting TLB_CONTROL is as follows:
> > > 1. In vcpu_enter_guest(), servicing a TLB flush request may set it to
> > > TLB_CONTROL_FLUSH_ASID in svm_flush_tlb_asid().
> > > 2. In svm_vcpu_run() -> pre_svm_run(), it may get upgraded to
> > > TLB_CONTROL_FLUSH_ALL_ASID when assigning a new ASID.
> > > 3. In svm_cpu_run(), it gets reset to TLB_CONTROL_DO_NOTHING after the
> > > guest is run.
> > > 
> > > Hence, TLB_CONTROL is reset after each run and there is no need to do it
> > > again on every nested transition to L2.
> > > 
> > > There is a TODO in nested_svm_transition_tlb_flush() about this reset
> > > crushing pending TLB flushes. Remove it, as the reset is not really
> > > crushing anything as explained above.
> > 
> > I am not sure that we don't crush a pending tlb request: 
> > 
> > svm_flush_tlb_asid can also be called by KVM_REQ_TLB_FLUSH
> > and set the flush request in both vmcbs, thus later the nested_svm_exit_tlb_flush
> > can crush this request.
> 
> How so?
> 
> nested_svm_exit_tlb_flush() makes a KVM_REQ_TLB_FLUSH_GUEST request.
> svm_flush_tlb_asid() is called when servicing KVM_REQ_TLB_* requests.

I am probably missing something but:

Suppose KVM_REQ_TLB_FLUSH is raised and then processed while ordinary L1 entry is happening,
but nested state is allocated.

(KVM_REQ_TLB_FLUSH can be raised anytime MMU wants a 'big hammer flush everything')

In this case svm_flush_tlb_all will call svm_flush_tlb_asid on both vmcbs (see patch 8),
and that will set TLB_CONTROL_FLUSH_ASID in both vmcbs.
In particular it will be set in vmcb02.

Later, maybe even hours later in theory, L1 issues VMRUN, we reach nested_vmcb02_prepare_control,
and crush the value (TLB_CONTROL_FLUSH_ASID) set in vmcb02.

I think that this is what the removed comment referred to.


Best regards,
	Maxim Levitsky

> 
> So svm_flush_tlb_asid() does not make a request in the sense of
> KVM_REQ_*, it sets TLB_CONTROL or invalidates the ASID, which is can
> more-or-less be described as "requesting" a TLB flush on VM-enter, but
> is not the same thing as KVM_REQ_TLB_FLUSH.
> 
> So I am not sure there are any requests being crushed here.
> 
> > But the patch itself does look OK to me, although I might be mistaken.
> > 
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> Thanks!
> 
> > 
> > Best regards,
> > 	Maxim Levitsky
Re: [RFC PATCH 11/13] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry
Posted by Yosry Ahmed 11 months, 1 week ago
On Tue, Mar 04, 2025 at 10:01:25PM -0500, Maxim Levitsky wrote:
> On Mon, 2025-03-03 at 22:14 +0000, Yosry Ahmed wrote:
> > On Fri, Feb 28, 2025 at 09:17:52PM -0500, Maxim Levitsky wrote:
> > > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > > TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to
> > > > L2. This is unnecessary because it should always be
> > > > TLB_CONTROL_DO_NOTHING at this point.
> > > > 
> > > > The flow for setting TLB_CONTROL is as follows:
> > > > 1. In vcpu_enter_guest(), servicing a TLB flush request may set it to
> > > > TLB_CONTROL_FLUSH_ASID in svm_flush_tlb_asid().
> > > > 2. In svm_vcpu_run() -> pre_svm_run(), it may get upgraded to
> > > > TLB_CONTROL_FLUSH_ALL_ASID when assigning a new ASID.
> > > > 3. In svm_cpu_run(), it gets reset to TLB_CONTROL_DO_NOTHING after the
> > > > guest is run.
> > > > 
> > > > Hence, TLB_CONTROL is reset after each run and there is no need to do it
> > > > again on every nested transition to L2.
> > > > 
> > > > There is a TODO in nested_svm_transition_tlb_flush() about this reset
> > > > crushing pending TLB flushes. Remove it, as the reset is not really
> > > > crushing anything as explained above.
> > > 
> > > I am not sure that we don't crush a pending tlb request: 
> > > 
> > > svm_flush_tlb_asid can also be called by KVM_REQ_TLB_FLUSH
> > > and set the flush request in both vmcbs, thus later the nested_svm_exit_tlb_flush
> > > can crush this request.
> > 
> > How so?
> > 
> > nested_svm_exit_tlb_flush() makes a KVM_REQ_TLB_FLUSH_GUEST request.
> > svm_flush_tlb_asid() is called when servicing KVM_REQ_TLB_* requests.
> 
> I am probably missing something but:
> 
> Suppose KVM_REQ_TLB_FLUSH is raised and then processed while ordinary L1 entry is happening,
> but nested state is allocated.
> 
> (KVM_REQ_TLB_FLUSH can be raised anytime MMU wants a 'big hammer flush everything')
> 
> In this case svm_flush_tlb_all will call svm_flush_tlb_asid on both vmcbs (see patch 8),
> and that will set TLB_CONTROL_FLUSH_ASID in both vmcbs.
> In particular it will be set in vmcb02.
> 
> Later, maybe even hours later in theory, L1 issues VMRUN, we reach nested_vmcb02_prepare_control,
> and crush the value (TLB_CONTROL_FLUSH_ASID) set in vmcb02.
> 
> I think that this is what the removed comment referred to.

When KVM_REQ_TLB_FLUSH is raised, we do not call svm_flush_tlb_all()
immediately. We only call svm_flush_tlb_all() when the request is
serviced in vcpu_enter_guest():

	/*
	 * Note, the order matters here, as flushing "all" TLB entries
	 * also flushes the "current" TLB entries, i.e. servicing the
	 * flush "all" will clear any request to flush "current".
	 */
	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
		kvm_vcpu_flush_tlb_all(vcpu);

	kvm_service_local_tlb_flush_requests(vcpu);

IIUC, vcpu_enter_guest() will be called for L2 after
nested_vmcb02_prepare_control() is called. My understanding is that the
sequence of events is as follows:
- L1 executes VMRUN, which is trapped and emulated by L0.

- KVM executes handles the VM-exit in L0 by calling
  vmrun_interception() to setup VMCB02 in prepartion for running L2.
  This will call nested_svm_vmrun() -> enter_svm_guest_mode() ->
  nested_vmcb02_prepare_control() (setting tlb_ctl to
  TLB_CONTROL_DO_NOTHING).

- Execution will pick up after the VMRUN instruction in L0, eventually
  getting to the loop in vcpu_run(), and calling vcpu_enter_guest()
  for L2. At this point any pending TLB flush requests (e.g.
  KVM_REQ_TLB_FLUSH) will be handled, and svm_flush_tlb_*() functions
  may be called to set tlb_ctl to a non-zero value (e.g.
  TLB_CONTROL_FLUSH_ASID).

- A little bit later in svm_vcpu_run() -> pre_svm_run(), tlb_ctl may be
  upgraded to TLB_CONTROL_FLUSH_ALL_ASID if a new ASID is allocated.
 
- After the guest is run, svm_cpu_run() resets tlb_ctl to TLB_CONTROL_DO_NOTHING.

So nested_vmcb02_prepare_control() setting tlb_ctl to
TLB_CONTROL_DO_NOTHING should have no effect because tlb_ctl is set
after that anyway before L2 is run, and reset back to
TLB_CONTROL_DO_NOTHING after L2 is run.

I tried to clarify this in the commit log, but I don't think it is clear
enough. I will try to add more details in the next version.

Please correct me if I am wrong.