The ASID is currently tracked per-vCPU, because the same ASID is used by
L1 and L2. That ASID is flushed on every transition between L1 and L2.
Track the ASID separately for each VMCB (similar to the
asid_generation), giving L2 a separate ASID. This is in preparation for
doing fine-grained TLB flushes on nested transitions instead of
unconditional full flushes.
The ASIDs are still not fully maintained (e.g. a remote flush will only
flush the current ASID), so keep the TLB flush on every transition until
this is sorted out.
L1's ASID will be flushed on KVM_REQ_TLB_FLUSH_GUEST if it is the
active context, so remove the TODO in nested_svm_transition_tlb_flush()
about it.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
arch/x86/kvm/svm/nested.c | 1 -
arch/x86/kvm/svm/sev.c | 2 +-
arch/x86/kvm/svm/svm.c | 12 +++++++-----
arch/x86/kvm/svm/svm.h | 2 +-
4 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 04c375bf1ac2a..bbe4f3ac9f250 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -495,7 +495,6 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
* - Honor L1's request to flush an ASID on nested VMRUN
* - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
* - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
- * - Flush L1's ASID on KVM_REQ_TLB_FLUSH_GUEST
*
* [*] Unlike nested EPT, SVM's ASID management can invalidate nested
* NPT guest-physical mappings on VMRUN.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 799f8494b599c..b0adfd0537d00 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3468,7 +3468,7 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
unsigned int asid = sev_get_asid(svm->vcpu.kvm);
/* Assign the asid allocated with this SEV guest */
- svm->asid = asid;
+ svm->current_vmcb->asid = asid;
/*
* Flush guest TLB:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7640a84e554a6..08340ae57777b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1335,8 +1335,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
save->g_pat = vcpu->arch.pat;
save->cr3 = 0;
}
- svm->current_vmcb->asid_generation = 0;
- svm->asid = 0;
+ svm->vmcb01.asid_generation = 0;
+ svm->vmcb01.asid = 0;
+ svm->nested.vmcb02.asid_generation = 0;
+ svm->nested.vmcb02.asid = 0;
svm->nested.vmcb12_gpa = INVALID_GPA;
svm->nested.last_vmcb12_gpa = INVALID_GPA;
@@ -1988,7 +1990,7 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
}
svm->current_vmcb->asid_generation = sd->asid_generation;
- svm->asid = sd->next_asid++;
+ svm->current_vmcb->asid = sd->next_asid++;
}
static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
@@ -4235,8 +4237,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
sync_lapic_to_cr8(vcpu);
- if (unlikely(svm->asid != svm->vmcb->control.asid)) {
- svm->vmcb->control.asid = svm->asid;
+ if (unlikely(svm->current_vmcb->asid != svm->vmcb->control.asid)) {
+ svm->vmcb->control.asid = svm->current_vmcb->asid;
vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
}
svm->vmcb->save.cr2 = vcpu->arch.cr2;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9d7cdb8fbf872..ebbb0b1a64676 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -133,6 +133,7 @@ struct kvm_vmcb_info {
unsigned long pa;
int cpu;
uint64_t asid_generation;
+ u32 asid;
};
struct vmcb_save_area_cached {
@@ -247,7 +248,6 @@ struct vcpu_svm {
struct vmcb *vmcb;
struct kvm_vmcb_info vmcb01;
struct kvm_vmcb_info *current_vmcb;
- u32 asid;
u32 sysenter_esp_hi;
u32 sysenter_eip_hi;
uint64_t tsc_aux;
--
2.48.1.362.g079036d154-goog
On Wed, 2025-02-05 at 18:23 +0000, Yosry Ahmed wrote:
> The ASID is currently tracked per-vCPU, because the same ASID is used by
> L1 and L2. That ASID is flushed on every transition between L1 and L2.
>
> Track the ASID separately for each VMCB (similar to the
> asid_generation), giving L2 a separate ASID. This is in preparation for
> doing fine-grained TLB flushes on nested transitions instead of
> unconditional full flushes.
>
> The ASIDs are still not fully maintained (e.g. a remote flush will only
> flush the current ASID), so keep the TLB flush on every transition until
> this is sorted out.
>
> L1's ASID will be flushed on KVM_REQ_TLB_FLUSH_GUEST if it is the
> active context, so remove the TODO in nested_svm_transition_tlb_flush()
> about it.
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> arch/x86/kvm/svm/nested.c | 1 -
> arch/x86/kvm/svm/sev.c | 2 +-
> arch/x86/kvm/svm/svm.c | 12 +++++++-----
> arch/x86/kvm/svm/svm.h | 2 +-
> 4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 04c375bf1ac2a..bbe4f3ac9f250 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -495,7 +495,6 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
> * - Honor L1's request to flush an ASID on nested VMRUN
> * - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
> * - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
> - * - Flush L1's ASID on KVM_REQ_TLB_FLUSH_GUEST
> *
> * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
> * NPT guest-physical mappings on VMRUN.
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 799f8494b599c..b0adfd0537d00 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3468,7 +3468,7 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
> unsigned int asid = sev_get_asid(svm->vcpu.kvm);
>
> /* Assign the asid allocated with this SEV guest */
> - svm->asid = asid;
> + svm->current_vmcb->asid = asid;
>
> /*
> * Flush guest TLB:
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7640a84e554a6..08340ae57777b 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1335,8 +1335,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
> save->g_pat = vcpu->arch.pat;
> save->cr3 = 0;
> }
> - svm->current_vmcb->asid_generation = 0;
> - svm->asid = 0;
> + svm->vmcb01.asid_generation = 0;
> + svm->vmcb01.asid = 0;
> + svm->nested.vmcb02.asid_generation = 0;
> + svm->nested.vmcb02.asid = 0;
>
> svm->nested.vmcb12_gpa = INVALID_GPA;
> svm->nested.last_vmcb12_gpa = INVALID_GPA;
> @@ -1988,7 +1990,7 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
> }
>
> svm->current_vmcb->asid_generation = sd->asid_generation;
> - svm->asid = sd->next_asid++;
> + svm->current_vmcb->asid = sd->next_asid++;
> }
>
> static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
> @@ -4235,8 +4237,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
>
> sync_lapic_to_cr8(vcpu);
>
> - if (unlikely(svm->asid != svm->vmcb->control.asid)) {
> - svm->vmcb->control.asid = svm->asid;
> + if (unlikely(svm->current_vmcb->asid != svm->vmcb->control.asid)) {
> + svm->vmcb->control.asid = svm->current_vmcb->asid;
> vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
> }
> svm->vmcb->save.cr2 = vcpu->arch.cr2;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 9d7cdb8fbf872..ebbb0b1a64676 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -133,6 +133,7 @@ struct kvm_vmcb_info {
> unsigned long pa;
> int cpu;
> uint64_t asid_generation;
> + u32 asid;
> };
>
> struct vmcb_save_area_cached {
> @@ -247,7 +248,6 @@ struct vcpu_svm {
> struct vmcb *vmcb;
> struct kvm_vmcb_info vmcb01;
> struct kvm_vmcb_info *current_vmcb;
> - u32 asid;
> u32 sysenter_esp_hi;
> u32 sysenter_eip_hi;
> uint64_t tsc_aux;
Hi,
I think it should be possible to eliminate separate ASID field (current_vmcb->asid/svm->asid)
completely and instead just use the value stored in the vmcb.
When there is a need to update it, KVM can also set the corresponding dirty bit
as done in svm_vcpu_run (new_asid also already does this when the asid generation increases)
Also KVM already sets the tlb_ctl directly in the vmcb.
What do you think?
Best regards,
Maxim Levitsky
On Fri, Feb 28, 2025 at 08:23:48PM -0500, Maxim Levitsky wrote:
> On Wed, 2025-02-05 at 18:23 +0000, Yosry Ahmed wrote:
> > The ASID is currently tracked per-vCPU, because the same ASID is used by
> > L1 and L2. That ASID is flushed on every transition between L1 and L2.
> >
> > Track the ASID separately for each VMCB (similar to the
> > asid_generation), giving L2 a separate ASID. This is in preparation for
> > doing fine-grained TLB flushes on nested transitions instead of
> > unconditional full flushes.
> >
> > The ASIDs are still not fully maintained (e.g. a remote flush will only
> > flush the current ASID), so keep the TLB flush on every transition until
> > this is sorted out.
> >
> > L1's ASID will be flushed on KVM_REQ_TLB_FLUSH_GUEST if it is the
> > active context, so remove the TODO in nested_svm_transition_tlb_flush()
> > about it.
> >
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> > arch/x86/kvm/svm/nested.c | 1 -
> > arch/x86/kvm/svm/sev.c | 2 +-
> > arch/x86/kvm/svm/svm.c | 12 +++++++-----
> > arch/x86/kvm/svm/svm.h | 2 +-
> > 4 files changed, 9 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 04c375bf1ac2a..bbe4f3ac9f250 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -495,7 +495,6 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
> > * - Honor L1's request to flush an ASID on nested VMRUN
> > * - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
> > * - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
> > - * - Flush L1's ASID on KVM_REQ_TLB_FLUSH_GUEST
> > *
> > * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
> > * NPT guest-physical mappings on VMRUN.
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 799f8494b599c..b0adfd0537d00 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3468,7 +3468,7 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
> > unsigned int asid = sev_get_asid(svm->vcpu.kvm);
> >
> > /* Assign the asid allocated with this SEV guest */
> > - svm->asid = asid;
> > + svm->current_vmcb->asid = asid;
> >
> > /*
> > * Flush guest TLB:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 7640a84e554a6..08340ae57777b 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1335,8 +1335,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
> > save->g_pat = vcpu->arch.pat;
> > save->cr3 = 0;
> > }
> > - svm->current_vmcb->asid_generation = 0;
> > - svm->asid = 0;
> > + svm->vmcb01.asid_generation = 0;
> > + svm->vmcb01.asid = 0;
> > + svm->nested.vmcb02.asid_generation = 0;
> > + svm->nested.vmcb02.asid = 0;
> >
> > svm->nested.vmcb12_gpa = INVALID_GPA;
> > svm->nested.last_vmcb12_gpa = INVALID_GPA;
> > @@ -1988,7 +1990,7 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
> > }
> >
> > svm->current_vmcb->asid_generation = sd->asid_generation;
> > - svm->asid = sd->next_asid++;
> > + svm->current_vmcb->asid = sd->next_asid++;
> > }
> >
> > static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
> > @@ -4235,8 +4237,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
> >
> > sync_lapic_to_cr8(vcpu);
> >
> > - if (unlikely(svm->asid != svm->vmcb->control.asid)) {
> > - svm->vmcb->control.asid = svm->asid;
> > + if (unlikely(svm->current_vmcb->asid != svm->vmcb->control.asid)) {
> > + svm->vmcb->control.asid = svm->current_vmcb->asid;
> > vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
> > }
> > svm->vmcb->save.cr2 = vcpu->arch.cr2;
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 9d7cdb8fbf872..ebbb0b1a64676 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -133,6 +133,7 @@ struct kvm_vmcb_info {
> > unsigned long pa;
> > int cpu;
> > uint64_t asid_generation;
> > + u32 asid;
> > };
> >
> > struct vmcb_save_area_cached {
> > @@ -247,7 +248,6 @@ struct vcpu_svm {
> > struct vmcb *vmcb;
> > struct kvm_vmcb_info vmcb01;
> > struct kvm_vmcb_info *current_vmcb;
> > - u32 asid;
> > u32 sysenter_esp_hi;
> > u32 sysenter_eip_hi;
> > uint64_t tsc_aux;
>
> Hi,
>
Hi,
Thanks for taking a look!
>
> I think it should be possible to eliminate separate ASID field (current_vmcb->asid/svm->asid)
> completely and instead just use the value stored in the vmcb.
>
> When there is a need to update it, KVM can also set the corresponding dirty bit
> as done in svm_vcpu_run (new_asid also already does this when the asid generation increases)
>
> Also KVM already sets the tlb_ctl directly in the vmcb.
>
> What do you think?
Yeah I think we can do that, although if we go with Sean's suggestion of
a per VM or a per vCPU ASID, this will change anyway. If we use a per
vCPU ASID, I think it would be nice to have it directly in svm->asid and
svm->nested.asid02 to be consistent with VMX.
I will see how the code turns out to be after taking Sean's suggestion
and go from there.
+Jim, for his input on VPIDs. On Wed, Feb 05, 2025, Yosry Ahmed wrote: > The ASID is currently tracked per-vCPU, because the same ASID is used by > L1 and L2. That ASID is flushed on every transition between L1 and L2. > > Track the ASID separately for each VMCB (similar to the > asid_generation), giving L2 a separate ASID. This is in preparation for > doing fine-grained TLB flushes on nested transitions instead of > unconditional full flushes. After having some time to think about this, rather than track ASIDs per VMCB, I think we should converge on a single approach for nVMX (VPID) and nSVM (ASID). Per **VM**, one VPID/ASID for L1, and one VPID/ASID for L2. For SVM, the dynamic ASID crud is a holdover from KVM's support for CPUs that don't support FLUSHBYASID, i.e. needed to purge the entire TLB in order to flush guest mappings. FLUSHBYASID was added in 2010, and AFAIK has been supported by all AMD CPUs since. KVM already mostly keeps the same ASID, except for when a vCPU is migrated, in which case KVM assigns a new ASID. I suspect that following VMX's lead and simply doing a TLB flush in this situation would be an improvement for modern CPUs, as it would flush the entries that need to be flushed, and not pollute the TLBs with stale, unused entries. Using a static per-VM ASID would also allow using broadcast invalidations[*], would simplify the SVM code base, and I think/hope would allow us to move much of the TLB flushing logic, e.g. for task migration, to common code. For VPIDs, maybe it's because it's Friday afternoon, but for the life of me I can't think of any reason why KVM needs to assign VPIDs per vCPU. Especially since KVM is ridiculously conservative and flushes _all_ EPT/VPID contexts when running a different vCPU on a pCPU (which I suspect we can trim down?). Am I forgetting something? [*] https://lore.kernel.org/all/Z8HdBg3wj8M7a4ts@google.com
On Fri, Feb 28, 2025 at 04:03:08PM -0800, Sean Christopherson wrote: > +Jim, for his input on VPIDs. > > On Wed, Feb 05, 2025, Yosry Ahmed wrote: > > The ASID is currently tracked per-vCPU, because the same ASID is used by > > L1 and L2. That ASID is flushed on every transition between L1 and L2. > > > > Track the ASID separately for each VMCB (similar to the > > asid_generation), giving L2 a separate ASID. This is in preparation for > > doing fine-grained TLB flushes on nested transitions instead of > > unconditional full flushes. > > After having some time to think about this, rather than track ASIDs per VMCB, I > think we should converge on a single approach for nVMX (VPID) and nSVM (ASID). > > Per **VM**, one VPID/ASID for L1, and one VPID/ASID for L2. > > For SVM, the dynamic ASID crud is a holdover from KVM's support for CPUs that > don't support FLUSHBYASID, i.e. needed to purge the entire TLB in order to flush > guest mappings. FLUSHBYASID was added in 2010, and AFAIK has been supported by > all AMD CPUs since. This means that for these old CPUs, every TLB flush done for the guest will also flush the TLB entries of all other guests and the host IIUC. I am not sure what CPUs around do not support FLUSHBYASID, but this sounds like a big regression for them. I am all for simplifying the code and converging nVMX and nSVM, but I am a bit worried about this. Sounds like you are not though, so maybe I am missing something :P I initially thought that the ASID space is too small, but it turns out I was confused by the ASID messages from the SEV code. The max number of ASIDs seems to be (1 << 15) on Rome, Milan, and Genoa CPUs. That's half of VMX_NR_VPIDS, and probably good enough. > > KVM already mostly keeps the same ASID, except for when a vCPU is migrated, in > which case KVM assigns a new ASID. 
I suspect that following VMX's lead and > simply doing a TLB flush in this situation would be an improvement for modern > CPUs, as it would flush the entries that need to be flushed, and not pollute the > TLBs with stale, unused entries. > > Using a static per-VM ASID would also allow using broadcast invalidations[*], > would simplify the SVM code base, and I think/hope would allow us to move much > of the TLB flushing logic, e.g. for task migration, to common code. > > For VPIDs, maybe it's because it's Friday afternoon, but for the life of me I > can't think of any reason why KVM needs to assign VPIDs per vCPU. Especially > since KVM is ridiculously conservative and flushes _all_ EPT/VPID contexts when > running a different vCPU on a pCPU (which I suspect we can trim down?). I think for the purpose of this series we can switch SVM to use one ASID per vCPU to match the current nVMX behavior and simplify things. Moving both nSVM and nVMX to use a single ASID per VM instead of per vCPU, and potentially moving some of the logic to the common code, could be a separate followup effort (maybe something that I can work on later this year if no one picks it up :) ). WDYT? > > Am I forgetting something? > > [*] https://lore.kernel.org/all/Z8HdBg3wj8M7a4ts@google.com
On Fri, Feb 28, 2025 at 4:03 PM Sean Christopherson <seanjc@google.com> wrote: > > +Jim, for his input on VPIDs. > > On Wed, Feb 05, 2025, Yosry Ahmed wrote: > > The ASID is currently tracked per-vCPU, because the same ASID is used by > > L1 and L2. That ASID is flushed on every transition between L1 and L2. > > > > Track the ASID separately for each VMCB (similar to the > > asid_generation), giving L2 a separate ASID. This is in preparation for > > doing fine-grained TLB flushes on nested transitions instead of > > unconditional full flushes. > > After having some time to think about this, rather than track ASIDs per VMCB, I > think we should converge on a single approach for nVMX (VPID) and nSVM (ASID). > > Per **VM**, one VPID/ASID for L1, and one VPID/ASID for L2. When using EPT on VMX, there is probably no advantage to using one VPID per VM. The physical ASID is determined by <EPTRTA, VPID, PCID>. Two different VMs are not going to share an EPTRTA, so they already have different ASIDs, even if they have the same VPID. > For SVM, the dynamic ASID crud is a holdover from KVM's support for CPUs that > don't support FLUSHBYASID, i.e. needed to purge the entire TLB in order to flush > guest mappings. FLUSHBYASID was added in 2010, and AFAIK has been supported by > all AMD CPUs since. > KVM already mostly keeps the same ASID, except for when a vCPU is migrated, in > which case KVM assigns a new ASID. I suspect that following VMX's lead and > simply doing a TLB flush in this situation would be an improvement for modern > CPUs, as it would flush the entries that need to be flushed, and not pollute the > TLBs with stale, unused entries. > > Using a static per-VM ASID would also allow using broadcast invalidations[*], > would simplify the SVM code base, and I think/hope would allow us to move much > of the TLB flushing logic, e.g. for task migration, to common code. 
> > For VPIDs, maybe it's because it's Friday afternoon, but for the life of me I > can't think of any reason why KVM needs to assign VPIDs per vCPU. Especially > since KVM is ridiculously conservative and flushes _all_ EPT/VPID contexts when > running a different vCPU on a pCPU (which I suspect we can trim down?). > > Am I forgetting something? TDX? IIRC, TDX requires a unique VPID for each vCPU in a VM. > [*] https://lore.kernel.org/all/Z8HdBg3wj8M7a4ts@google.com
On Mon, Mar 03, 2025, Jim Mattson wrote: > On Fri, Feb 28, 2025 at 4:03 PM Sean Christopherson <seanjc@google.com> wrote: > > > > +Jim, for his input on VPIDs. > > > > On Wed, Feb 05, 2025, Yosry Ahmed wrote: > > > The ASID is currently tracked per-vCPU, because the same ASID is used by > > > L1 and L2. That ASID is flushed on every transition between L1 and L2. > > > > > > Track the ASID separately for each VMCB (similar to the > > > asid_generation), giving L2 a separate ASID. This is in preparation for > > > doing fine-grained TLB flushes on nested transitions instead of > > > unconditional full flushes. > > > > After having some time to think about this, rather than track ASIDs per VMCB, I > > think we should converge on a single approach for nVMX (VPID) and nSVM (ASID). > > > > Per **VM**, one VPID/ASID for L1, and one VPID/ASID for L2. > > When using EPT on VMX, there is probably no advantage to using one > VPID per VM. The physical ASID is determined by <EPTRTA, VPID, PCID>. > Two different VMs are not going to share an EPTRTA, so they already > have different ASIDs, even if they have the same VPID. For posterity, while the SDM says this: Linear mappings may be created. They are derived from the paging structures referenced (directly or indirectly) by the current value of CR3 and are associated with the current VPID and the current PCID. it explicitly disallows creating or using linear mappings when EPT is enabled: No linear mappings are created while EPT is in use. no linear mappings are used while EPT is in use. I think it's still worth assigning a unique VPID though, e.g. it would provide some amount of defense in depth. I.e. two different VMs *shouldn't* share an EPTRTA :-) > > For SVM, the dynamic ASID crud is a holdover from KVM's support for CPUs that > > don't support FLUSHBYASID, i.e. needed to purge the entire TLB in order to flush > > guest mappings. FLUSHBYASID was added in 2010, and AFAIK has been supported by > > all AMD CPUs since. 
> > > KVM already mostly keeps the same ASID, except for when a vCPU is migrated, in > > which case KVM assigns a new ASID. I suspect that following VMX's lead and > > simply doing a TLB flush in this situation would be an improvement for modern > > CPUs, as it would flush the entries that need to be flushed, and not pollute the > > TLBs with stale, unused entries. > > > > Using a static per-VM ASID would also allow using broadcast invalidations[*], > > would simplify the SVM code base, and I think/hope would allow us to move much > > of the TLB flushing logic, e.g. for task migration, to common code. > > > > For VPIDs, maybe it's because it's Friday afternoon, but for the life of me I > > can't think of any reason why KVM needs to assign VPIDs per vCPU. Especially > > since KVM is ridiculously conservative and flushes _all_ EPT/VPID contexts when > > running a different vCPU on a pCPU (which I suspect we can trim down?). > > > > Am I forgetting something? > > TDX? IIRC, TDX requires a unique VPID for each vCPU in a VM. Ha! Nope, the TDX module actually does what I'm suggesting, and uses a per-VM VPID. So if I'm forgetting some TLB edge case, TDX is already hosed. FWIW, the hypervisor, i.e. KVM, has no control over the VPID used by the TDX module. Intel incorporated SEAM mode into the ASID tag to prevent TLB collisions between the hypervisor and the TDX module, and that also conveniently provides separation between VPIDs for non-TDX VMs and TDX VMs (and now I'm curious if TDX enabling does the "right" thing and skips VPID allocation). FWIW, TDX's scheme would match what I'm proposing almost exactly. TDX "composes" the VPID using the HKID (guaranteed unique per VM) and then a "VM identifier", which at a glance differentiates L1 from L2. > > [*] https://lore.kernel.org/all/Z8HdBg3wj8M7a4ts@google.com
© 2016 - 2026 Red Hat, Inc.