TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to
L2. This is unnecessary because it should always be
TLB_CONTROL_DO_NOTHING at this point.
The flow for setting TLB_CONTROL is as follows:
1. In vcpu_enter_guest(), servicing a TLB flush request may set it to
TLB_CONTROL_FLUSH_ASID in svm_flush_tlb_asid().
2. In svm_vcpu_run() -> pre_svm_run(), it may get upgraded to
TLB_CONTROL_FLUSH_ALL_ASID when assigning a new ASID.
3. In svm_vcpu_run(), it gets reset to TLB_CONTROL_DO_NOTHING after the
guest is run.
Hence, TLB_CONTROL is reset after each run and there is no need to do it
again on every nested transition to L2.
There is a TODO in nested_svm_transition_tlb_flush() about this reset
crushing pending TLB flushes. Remove it, as the reset is not really
crushing anything as explained above.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 12bb391884299..8e40ff21f7353 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -512,12 +512,7 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
svm->nested.last_asid = svm->nested.ctl.asid;
kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
}
- /*
- * TODO: optimize unconditional TLB flush/MMU sync. A partial list of
- * things to fix before this can be conditional:
- *
- * - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
- */
+ /* TODO: optimize unconditional TLB flush/MMU sync */
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
}
@@ -536,7 +531,7 @@ static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
- /* See nested_svm_entry_tlb_flush() */
+ /* TODO: optimize unconditional TLB flush/MMU sync */
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
}
@@ -717,9 +712,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
/* Done at vmrun: asid. */
- /* Also overwritten later if necessary. */
- svm_clear_tlb_ctl_flush(vmcb02);
-
/* nested_cr3. */
if (nested_npt_enabled(svm))
nested_svm_init_mmu_context(vcpu);
--
2.48.1.362.g079036d154-goog
On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> [...]
>
> There is a TODO in nested_svm_transition_tlb_flush() about this reset
> crushing pending TLB flushes. Remove it, as the reset is not really
> crushing anything as explained above.

I am not sure that we don't crush a pending TLB request:

svm_flush_tlb_asid() can also be called for KVM_REQ_TLB_FLUSH and set
the flush request in both vmcbs, thus later nested_svm_exit_tlb_flush()
can crush this request.

But the patch itself does look OK to me, although I might be mistaken.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
Maxim Levitsky

> [...]
On Fri, Feb 28, 2025 at 09:17:52PM -0500, Maxim Levitsky wrote:
> On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > [...]
>
> I am not sure that we don't crush a pending TLB request:
>
> svm_flush_tlb_asid() can also be called for KVM_REQ_TLB_FLUSH and set
> the flush request in both vmcbs, thus later nested_svm_exit_tlb_flush()
> can crush this request.

How so?

nested_svm_exit_tlb_flush() makes a KVM_REQ_TLB_FLUSH_GUEST request.
svm_flush_tlb_asid() is called when servicing KVM_REQ_TLB_* requests.

So svm_flush_tlb_asid() does not make a request in the sense of
KVM_REQ_*, it sets TLB_CONTROL or invalidates the ASID, which can
more-or-less be described as "requesting" a TLB flush on VM-enter, but
is not the same thing as KVM_REQ_TLB_FLUSH.

So I am not sure there are any requests being crushed here.

> But the patch itself does look OK to me, although I might be mistaken.
>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Thanks!

> Best regards,
> Maxim Levitsky
On Mon, 2025-03-03 at 22:14 +0000, Yosry Ahmed wrote:
> On Fri, Feb 28, 2025 at 09:17:52PM -0500, Maxim Levitsky wrote:
> > [...]
> >
> > I am not sure that we don't crush a pending TLB request:
> >
> > svm_flush_tlb_asid() can also be called for KVM_REQ_TLB_FLUSH and set
> > the flush request in both vmcbs, thus later
> > nested_svm_exit_tlb_flush() can crush this request.
>
> How so?
>
> nested_svm_exit_tlb_flush() makes a KVM_REQ_TLB_FLUSH_GUEST request.
> svm_flush_tlb_asid() is called when servicing KVM_REQ_TLB_* requests.

I am probably missing something but:

Suppose KVM_REQ_TLB_FLUSH is raised and then processed while an ordinary
L1 entry is happening, but nested state is allocated.

(KVM_REQ_TLB_FLUSH can be raised anytime the MMU wants a 'big hammer
flush everything'.)

In this case svm_flush_tlb_all() will call svm_flush_tlb_asid() on both
vmcbs (see patch 8), and that will set TLB_CONTROL_FLUSH_ASID in both
vmcbs. In particular it will be set in vmcb02.

Later, maybe even hours later in theory, L1 issues VMRUN, we reach
nested_vmcb02_prepare_control(), and crush the value
(TLB_CONTROL_FLUSH_ASID) set in vmcb02.

I think that this is what the removed comment referred to.

Best regards,
Maxim Levitsky

> [...]
On Tue, Mar 04, 2025 at 10:01:25PM -0500, Maxim Levitsky wrote:
> [...]
>
> I am probably missing something but:
>
> Suppose KVM_REQ_TLB_FLUSH is raised and then processed while an
> ordinary L1 entry is happening, but nested state is allocated.
>
> (KVM_REQ_TLB_FLUSH can be raised anytime the MMU wants a 'big hammer
> flush everything'.)
>
> In this case svm_flush_tlb_all() will call svm_flush_tlb_asid() on both
> vmcbs (see patch 8), and that will set TLB_CONTROL_FLUSH_ASID in both
> vmcbs. In particular it will be set in vmcb02.
>
> Later, maybe even hours later in theory, L1 issues VMRUN, we reach
> nested_vmcb02_prepare_control(), and crush the value
> (TLB_CONTROL_FLUSH_ASID) set in vmcb02.
>
> I think that this is what the removed comment referred to.

When KVM_REQ_TLB_FLUSH is raised, we do not call svm_flush_tlb_all()
immediately. We only call svm_flush_tlb_all() when the request is
serviced in vcpu_enter_guest():

	/*
	 * Note, the order matters here, as flushing "all" TLB entries
	 * also flushes the "current" TLB entries, i.e. servicing the
	 * flush "all" will clear any request to flush "current".
	 */
	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
		kvm_vcpu_flush_tlb_all(vcpu);

	kvm_service_local_tlb_flush_requests(vcpu);

IIUC, vcpu_enter_guest() will be called for L2 after
nested_vmcb02_prepare_control() is called. My understanding is that the
sequence of events is as follows:

- L1 executes VMRUN, which is trapped and emulated by L0.
- KVM handles the VM-exit in L0 by calling vmrun_interception() to set
  up vmcb02 in preparation for running L2. This will call
  nested_svm_vmrun() -> enter_svm_guest_mode() ->
  nested_vmcb02_prepare_control() (setting tlb_ctl to
  TLB_CONTROL_DO_NOTHING).
- Execution will pick up after the VMRUN instruction in L0, eventually
  getting to the loop in vcpu_run(), and calling vcpu_enter_guest() for
  L2. At this point any pending TLB flush requests (e.g.
  KVM_REQ_TLB_FLUSH) will be handled, and svm_flush_tlb_*() functions
  may be called to set tlb_ctl to a non-zero value (e.g.
  TLB_CONTROL_FLUSH_ASID).
- A little bit later in svm_vcpu_run() -> pre_svm_run(), tlb_ctl may be
  upgraded to TLB_CONTROL_FLUSH_ALL_ASID if a new ASID is allocated.
- After the guest is run, svm_vcpu_run() resets tlb_ctl to
  TLB_CONTROL_DO_NOTHING.

So nested_vmcb02_prepare_control() setting tlb_ctl to
TLB_CONTROL_DO_NOTHING should have no effect because tlb_ctl is set
after that anyway before L2 is run, and reset back to
TLB_CONTROL_DO_NOTHING after L2 is run.

I tried to clarify this in the commit log, but I don't think it is clear
enough. I will try to add more details in the next version.

Please correct me if I am wrong.