From: Zheyun Shen <szy0127@sjtu.edu.cn>

Extract KVM's open-coded calls to do writeback caches on multiple CPUs to
common library helpers for both WBINVD and WBNOINVD (KVM will use both).
Put the onus on the caller to check for a non-empty mask to simplify the
SMP=n implementation, e.g. so that it doesn't need to check that the one
and only CPU in the system is present in the mask.

Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20250128015345.7929-2-szy0127@sjtu.edu.cn
[sean: move to lib, add SMP=n helpers, clarify usage]
Acked-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/smp.h | 12 ++++++++++++
 arch/x86/kvm/x86.c         |  8 +-------
 arch/x86/lib/cache-smp.c   | 12 ++++++++++++
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index e08f1ae25401..fe98e021f7f8 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -113,7 +113,9 @@ void native_play_dead(void);
 void play_dead_common(void);
 void wbinvd_on_cpu(int cpu);
 void wbinvd_on_all_cpus(void);
+void wbinvd_on_many_cpus(struct cpumask *cpus);
 void wbnoinvd_on_all_cpus(void);
+void wbnoinvd_on_many_cpus(struct cpumask *cpus);
 
 void smp_kick_mwait_play_dead(void);
 void __noreturn mwait_play_dead(unsigned int eax_hint);
@@ -154,11 +156,21 @@ static inline void wbinvd_on_all_cpus(void)
 	wbinvd();
 }
 
+static inline void wbinvd_on_many_cpus(struct cpumask *cpus)
+{
+	wbinvd();
+}
+
 static inline void wbnoinvd_on_all_cpus(void)
 {
 	wbnoinvd();
 }
 
+static inline void wbnoinvd_on_many_cpus(struct cpumask *cpus)
+{
+	wbnoinvd();
+}
+
 static inline struct cpumask *cpu_llc_shared_mask(int cpu)
 {
 	return (struct cpumask *)cpumask_of(0);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b8b72e8dac6e..e00a4b3a0e8c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4981,11 +4981,6 @@ long kvm_arch_dev_ioctl(struct file *filp,
 	return r;
 }
 
-static void wbinvd_ipi(void *garbage)
-{
-	wbinvd();
-}
-
 static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
 {
 	return kvm_arch_has_noncoherent_dma(vcpu->kvm);
@@ -8286,8 +8281,7 @@ static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
 		int cpu = get_cpu();
 
 		cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
-		on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask,
-				 wbinvd_ipi, NULL, 1);
+		wbinvd_on_many_cpus(vcpu->arch.wbinvd_dirty_mask);
 		put_cpu();
 		cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
 	} else
diff --git a/arch/x86/lib/cache-smp.c b/arch/x86/lib/cache-smp.c
index 1789db5d8825..ebbc91b3ac67 100644
--- a/arch/x86/lib/cache-smp.c
+++ b/arch/x86/lib/cache-smp.c
@@ -20,6 +20,12 @@ void wbinvd_on_all_cpus(void)
 }
 EXPORT_SYMBOL(wbinvd_on_all_cpus);
 
+void wbinvd_on_many_cpus(struct cpumask *cpus)
+{
+	on_each_cpu_mask(cpus, __wbinvd, NULL, 1);
+}
+EXPORT_SYMBOL_GPL(wbinvd_on_many_cpus);
+
 static void __wbnoinvd(void *dummy)
 {
 	wbnoinvd();
@@ -30,3 +36,9 @@ void wbnoinvd_on_all_cpus(void)
 	on_each_cpu(__wbnoinvd, NULL, 1);
 }
 EXPORT_SYMBOL(wbnoinvd_on_all_cpus);
+
+void wbnoinvd_on_many_cpus(struct cpumask *cpus)
+{
+	on_each_cpu_mask(cpus, __wbnoinvd, NULL, 1);
+}
+EXPORT_SYMBOL_GPL(wbnoinvd_on_many_cpus);
--
2.49.0.1112.g889b7c5bd8-goog
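
To make the caller contract above concrete: because the SMP=n stubs execute
WBINVD/WBNOINVD unconditionally, any call site whose mask may be empty has to
check first. A minimal sketch (the "mask" variable is a placeholder, not code
from the patch):

	/*
	 * The SMP=n stub runs WBINVD unconditionally, so skip the helper
	 * when the mask is empty.
	 */
	if (!cpumask_empty(mask))
		wbinvd_on_many_cpus(mask);

KVM's call site in kvm_emulate_wbinvd_noskip() gets away without the check
because cpumask_set_cpu() has just guaranteed the current CPU is in the mask.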
* Sean Christopherson <seanjc@google.com> wrote:
> From: Zheyun Shen <szy0127@sjtu.edu.cn>
>
> Extract KVM's open-coded calls to do writeback caches on multiple CPUs to
> common library helpers for both WBINVD and WBNOINVD (KVM will use both).
> Put the onus on the caller to check for a non-empty mask to simplify the
> SMP=n implementation, e.g. so that it doesn't need to check that the one
> and only CPU in the system is present in the mask.
>
> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> Link: https://lore.kernel.org/r/20250128015345.7929-2-szy0127@sjtu.edu.cn
> [sean: move to lib, add SMP=n helpers, clarify usage]
> Acked-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/include/asm/smp.h | 12 ++++++++++++
> arch/x86/kvm/x86.c | 8 +-------
> arch/x86/lib/cache-smp.c | 12 ++++++++++++
> 3 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index e08f1ae25401..fe98e021f7f8 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -113,7 +113,9 @@ void native_play_dead(void);
> void play_dead_common(void);
> void wbinvd_on_cpu(int cpu);
> void wbinvd_on_all_cpus(void);
> +void wbinvd_on_many_cpus(struct cpumask *cpus);
> void wbnoinvd_on_all_cpus(void);
> +void wbnoinvd_on_many_cpus(struct cpumask *cpus);
Let's go with the _on_cpumask() suffix:

 void wbinvd_on_cpu(int cpu);
+void wbinvd_on_cpumask(struct cpumask *cpus);
 void wbinvd_on_all_cpus(void);

And the wb*invd_all_cpus() methods should probably be inlined wrappers
with -1 as the cpumask, or so - not two separate functions?

In fact it would be nice to have the DRM preparatory patch and all the
x86 patches at the beginning of the next version of the series, so
those 4 patches can be applied to the x86 tree. Can make it a separate
permanent branch based on v6.15-rc6/rc7.

Thanks,

	Ingo
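
For reference, Ingo's wrapper idea would amount to something like the sketch
below (not code from the thread; note that cpu_online_mask is const-qualified,
so the helper's parameter would need a const for this to compile cleanly):

	/* Sketch of the suggested single-implementation form. */
	static inline void wbinvd_on_all_cpus(void)
	{
		wbinvd_on_cpumask(cpu_online_mask);
	}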
On Sat, May 17, 2025, Ingo Molnar wrote:
> * Sean Christopherson <seanjc@google.com> wrote:
> > void wbinvd_on_all_cpus(void);
> > +void wbinvd_on_many_cpus(struct cpumask *cpus);
> > void wbnoinvd_on_all_cpus(void);
> > +void wbnoinvd_on_many_cpus(struct cpumask *cpus);
>
> Let's go with the _on_cpumask() suffix:
>
>  void wbinvd_on_cpu(int cpu);
> +void wbinvd_on_cpumask(struct cpumask *cpus);
>  void wbinvd_on_all_cpus(void);

How about wbinvd_on_cpus_mask(), to make it more obvious that it operates on
multiple CPUs?  At a glance, wbinvd_on_cpumask() could be mistaken for a
masked version of wbinvd_on_cpu().
* Sean Christopherson <seanjc@google.com> wrote:

> How about wbinvd_on_cpus_mask(), to make it more obvious that it operates
> on multiple CPUs?  At a glance, wbinvd_on_cpumask() could be mistaken for
> a masked version of wbinvd_on_cpu().

Works for me!

Thanks,

	Ingo
On Sat, May 17, 2025, Ingo Molnar wrote:
> Let's go with the _on_cpumask() suffix:
>
>  void wbinvd_on_cpu(int cpu);
> +void wbinvd_on_cpumask(struct cpumask *cpus);
>  void wbinvd_on_all_cpus(void);
>
> And the wb*invd_all_cpus() methods should probably be inlined wrappers
> with -1 as the cpumask, or so - not two separate functions?

Using two separate functions allows _on_all_cpus() to defer the mask
generation to on_each_cpu(), i.e. avoids having to duplicate the passing
of cpu_online_mask.  IMO, duplicating passing __wbinvd is preferable to
duplicating the use of cpu_online_mask.

> In fact it would be nice to have the DRM preparatory patch and all the
> x86 patches at the beginning of the next version of the series, so
> those 4 patches can be applied to the x86 tree. Can make it a separate
> permanent branch based on v6.15-rc6/rc7.

Can do, assuming there's no lurking dependency I'm missing.
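
In code terms, Sean's argument is that the two-function form lets
on_each_cpu() supply the online mask internally, so cpu_online_mask never
appears in lib/cache-smp.c; only the tiny __wbinvd callback is repeated.
A sketch using the agreed-on wbinvd_on_cpus_mask() name (the patch as posted
still uses wbinvd_on_many_cpus()):

	/* All-CPUs variant: on_each_cpu() picks the online mask itself. */
	void wbinvd_on_all_cpus(void)
	{
		on_each_cpu(__wbinvd, NULL, 1);
	}

	/* Masked variant: the caller supplies a (non-empty) mask. */
	void wbinvd_on_cpus_mask(struct cpumask *cpus)
	{
		on_each_cpu_mask(cpus, __wbinvd, NULL, 1);
	}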