[PATCH v7 2/3] x86/cacheinfo: Delete global num_cache_leaves

Ricardo Neri posted 3 patches 2 months, 2 weeks ago
There is a newer version of this series
[PATCH v7 2/3] x86/cacheinfo: Delete global num_cache_leaves
Posted by Ricardo Neri 2 months, 2 weeks ago
Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all
CPUs from the same global "num_cache_leaves".

This is erroneous on systems such as Meteor Lake, where each CPU has a
distinct num_leaves value. Delete the global "num_cache_leaves" and
initialize num_leaves on each CPU.

Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Andreas Herrmann <aherrmann@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Cc: Andreas Herrmann <aherrmann@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Cc: Radu Rendec <rrendec@redhat.com>
Cc: Pierre Gondois <Pierre.Gondois@arm.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org # 6.3+
---
After this change, all CPUs will traverse CPUID leaf 0x4 when booted for
the first time. On systems with symmetric cache topologies this is
useless work.

Creating a list of processor models that have asymmetric cache topologies
was considered. The burden of maintaining such list would outweigh the
performance benefit of skipping this extra step.
---
Changes since v5:
 * Reordered the arguments of set_num_cache_leaves() for readability.
   (Nikolay)
 * Added Reviewed-by tag from Nikolay and Andreas. Thanks!
 * Added Tested-by tag from Andreas. Thanks!

Changes since v4:
 * None

Changes since v3:
 * Rebased on v6.7-rc5.

Changes since v2:
 * None

Changes since v1:
 * Do not make num_cache_leaves a per-CPU variable. Instead, reuse the
   existing per-CPU ci_cpu_cacheinfo variable. (Dave Hansen)
---
 arch/x86/kernel/cpu/cacheinfo.c | 44 +++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 392d09c936d6..182cacd772b8 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -178,7 +178,16 @@ struct _cpuid4_info_regs {
 	struct amd_northbridge *nb;
 };
 
-static unsigned short num_cache_leaves;
+static inline unsigned int get_num_cache_leaves(unsigned int cpu)
+{
+	return get_cpu_cacheinfo(cpu)->num_leaves;
+}
+
+static inline void
+set_num_cache_leaves(unsigned int cpu, unsigned int nr_leaves)
+{
+	get_cpu_cacheinfo(cpu)->num_leaves = nr_leaves;
+}
 
 /* AMD doesn't have CPUID4. Emulate it here to report the same
    information to the user.  This makes some assumptions about the machine:
@@ -718,19 +727,21 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c)
 void init_amd_cacheinfo(struct cpuinfo_x86 *c)
 {
 
+	unsigned int cpu = c->cpu_index;
+
 	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
-		num_cache_leaves = find_num_cache_leaves(c);
+		set_num_cache_leaves(cpu, find_num_cache_leaves(c));
 	} else if (c->extended_cpuid_level >= 0x80000006) {
 		if (cpuid_edx(0x80000006) & 0xf000)
-			num_cache_leaves = 4;
+			set_num_cache_leaves(cpu, 4);
 		else
-			num_cache_leaves = 3;
+			set_num_cache_leaves(cpu, 3);
 	}
 }
 
 void init_hygon_cacheinfo(struct cpuinfo_x86 *c)
 {
-	num_cache_leaves = find_num_cache_leaves(c);
+	set_num_cache_leaves(c->cpu_index, find_num_cache_leaves(c));
 }
 
 void init_intel_cacheinfo(struct cpuinfo_x86 *c)
@@ -742,19 +753,19 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 	unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;
 
 	if (c->cpuid_level > 3) {
-		static int is_initialized;
-
-		if (is_initialized == 0) {
-			/* Init num_cache_leaves from boot CPU */
-			num_cache_leaves = find_num_cache_leaves(c);
-			is_initialized++;
-		}
+		/*
+		 * There should be at least one leaf. A non-zero value means
+		 * that the number of leaves has been initialized.
+		 */
+		if (!get_num_cache_leaves(c->cpu_index))
+			set_num_cache_leaves(c->cpu_index,
+					     find_num_cache_leaves(c));
 
 		/*
 		 * Whenever possible use cpuid(4), deterministic cache
 		 * parameters cpuid leaf to find the cache details
 		 */
-		for (i = 0; i < num_cache_leaves; i++) {
+		for (i = 0; i < get_num_cache_leaves(c->cpu_index); i++) {
 			struct _cpuid4_info_regs this_leaf = {};
 			int retval;
 
@@ -790,14 +801,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 	 * Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for
 	 * trace cache
 	 */
-	if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) {
+	if ((!get_num_cache_leaves(c->cpu_index) || c->x86 == 15) && c->cpuid_level > 1) {
 		/* supports eax=2  call */
 		int j, n;
 		unsigned int regs[4];
 		unsigned char *dp = (unsigned char *)regs;
 		int only_trace = 0;
 
-		if (num_cache_leaves != 0 && c->x86 == 15)
+		if (get_num_cache_leaves(c->cpu_index) && c->x86 == 15)
 			only_trace = 1;
 
 		/* Number of times to iterate */
@@ -993,12 +1004,9 @@ int init_cache_level(unsigned int cpu)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 
-	if (!num_cache_leaves)
-		return -ENOENT;
 	if (!this_cpu_ci)
 		return -EINVAL;
 	this_cpu_ci->num_levels = 3;
-	this_cpu_ci->num_leaves = num_cache_leaves;
 	return 0;
 }
 
-- 
2.34.1
Re: [PATCH v7 2/3] x86/cacheinfo: Delete global num_cache_leaves
Posted by Borislav Petkov 1 month, 1 week ago
On Fri, Sep 13, 2024 at 01:31:54AM -0700, Ricardo Neri wrote:
> diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
> index 392d09c936d6..182cacd772b8 100644
> --- a/arch/x86/kernel/cpu/cacheinfo.c
> +++ b/arch/x86/kernel/cpu/cacheinfo.c
> @@ -178,7 +178,16 @@ struct _cpuid4_info_regs {
>  	struct amd_northbridge *nb;
>  };
>  
> -static unsigned short num_cache_leaves;
> +static inline unsigned int get_num_cache_leaves(unsigned int cpu)
> +{
> +	return get_cpu_cacheinfo(cpu)->num_leaves;
> +}

There already is

#define cache_leaves(cpu)       (ci_cacheinfo(cpu)->num_leaves)

And there's also get_cpu_cacheinfo().

And now you're adding more silly wrappers. Yuck.

Can we pls use *one* of those things and work with it everywhere?

> @@ -742,19 +753,19 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
>  	unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;
>  
>  	if (c->cpuid_level > 3) {
> -		static int is_initialized;
> -
> -		if (is_initialized == 0) {
> -			/* Init num_cache_leaves from boot CPU */
> -			num_cache_leaves = find_num_cache_leaves(c);
> -			is_initialized++;
> -		}
> +		/*
> +		 * There should be at least one leaf. A non-zero value means
> +		 * that the number of leaves has been initialized.
> +		 */
> +		if (!get_num_cache_leaves(c->cpu_index))
> +			set_num_cache_leaves(c->cpu_index,
> +					     find_num_cache_leaves(c));

Ugly linebreak.

>  
>  		/*
>  		 * Whenever possible use cpuid(4), deterministic cache
>  		 * parameters cpuid leaf to find the cache details
>  		 */
> -		for (i = 0; i < num_cache_leaves; i++) {
> +		for (i = 0; i < get_num_cache_leaves(c->cpu_index); i++) {
>  			struct _cpuid4_info_regs this_leaf = {};
>  			int retval;
>  
> @@ -790,14 +801,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
>  	 * Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for
>  	 * trace cache
>  	 */
> -	if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) {
> +	if ((!get_num_cache_leaves(c->cpu_index) || c->x86 == 15) && c->cpuid_level > 1) {
>  		/* supports eax=2  call */
>  		int j, n;
>  		unsigned int regs[4];
>  		unsigned char *dp = (unsigned char *)regs;
>  		int only_trace = 0;
>  
> -		if (num_cache_leaves != 0 && c->x86 == 15)
> +		if (get_num_cache_leaves(c->cpu_index) && c->x86 == 15)
>  			only_trace = 1;
>  
>  		/* Number of times to iterate */
> @@ -993,12 +1004,9 @@ int init_cache_level(unsigned int cpu)
>  {
>  	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
>  
> -	if (!num_cache_leaves)
> -		return -ENOENT;

Why not

	if (!cache_leaves(cpu))
		return -ENOENT;

?

>  	if (!this_cpu_ci)
>  		return -EINVAL;
>  	this_cpu_ci->num_levels = 3;
> -	this_cpu_ci->num_leaves = num_cache_leaves;
>  	return 0;
>  }
>  
> -- 
> 2.34.1
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v7 2/3] x86/cacheinfo: Delete global num_cache_leaves
Posted by Ricardo Neri 1 month, 1 week ago
On Tue, Oct 22, 2024 at 03:20:50PM +0200, Borislav Petkov wrote:
> On Fri, Sep 13, 2024 at 01:31:54AM -0700, Ricardo Neri wrote:
> > diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
> > index 392d09c936d6..182cacd772b8 100644
> > --- a/arch/x86/kernel/cpu/cacheinfo.c
> > +++ b/arch/x86/kernel/cpu/cacheinfo.c
> > @@ -178,7 +178,16 @@ struct _cpuid4_info_regs {
> >  	struct amd_northbridge *nb;
> >  };
> >  
> > -static unsigned short num_cache_leaves;
> > +static inline unsigned int get_num_cache_leaves(unsigned int cpu)
> > +{
> > +	return get_cpu_cacheinfo(cpu)->num_leaves;
> > +}
> 
> There already is
> 
> #define cache_leaves(cpu)       (ci_cacheinfo(cpu)->num_leaves)
> 
> And there's also get_cpu_cacheinfo().
> 
> And now you're adding more silly wrappers. Yuck.
> 
> Can we pls use *one* of those things and work with it everywhere?

I agree. Another wrapper is not needed. I did not use cache_leaves() because
it was internal to drivers/base/cacheinfo.c I can convert it to a function
and expose it in include/linux/cacheinfo.h. I can rename it as
get_cacheinfo_leaves(unsigned int cpu).

Would that make sense?

> 
> > @@ -742,19 +753,19 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
> >  	unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;
> >  
> >  	if (c->cpuid_level > 3) {
> > -		static int is_initialized;
> > -
> > -		if (is_initialized == 0) {
> > -			/* Init num_cache_leaves from boot CPU */
> > -			num_cache_leaves = find_num_cache_leaves(c);
> > -			is_initialized++;
> > -		}
> > +		/*
> > +		 * There should be at least one leaf. A non-zero value means
> > +		 * that the number of leaves has been initialized.
> > +		 */
> > +		if (!get_num_cache_leaves(c->cpu_index))
> > +			set_num_cache_leaves(c->cpu_index,
> > +					     find_num_cache_leaves(c));
> 
> Ugly linebreak.

I will make it a single line.

> 
> >  
> >  		/*
> >  		 * Whenever possible use cpuid(4), deterministic cache
> >  		 * parameters cpuid leaf to find the cache details
> >  		 */
> > -		for (i = 0; i < num_cache_leaves; i++) {
> > +		for (i = 0; i < get_num_cache_leaves(c->cpu_index); i++) {
> >  			struct _cpuid4_info_regs this_leaf = {};
> >  			int retval;
> >  
> > @@ -790,14 +801,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
> >  	 * Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for
> >  	 * trace cache
> >  	 */
> > -	if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) {
> > +	if ((!get_num_cache_leaves(c->cpu_index) || c->x86 == 15) && c->cpuid_level > 1) {
> >  		/* supports eax=2  call */
> >  		int j, n;
> >  		unsigned int regs[4];
> >  		unsigned char *dp = (unsigned char *)regs;
> >  		int only_trace = 0;
> >  
> > -		if (num_cache_leaves != 0 && c->x86 == 15)
> > +		if (get_num_cache_leaves(c->cpu_index) && c->x86 == 15)
> >  			only_trace = 1;
> >  
> >  		/* Number of times to iterate */
> > @@ -993,12 +1004,9 @@ int init_cache_level(unsigned int cpu)
> >  {
> >  	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> >  
> > -	if (!num_cache_leaves)
> > -		return -ENOENT;
> 
> Why not
> 
> 	if (!cache_leaves(cpu))
> 		return -ENOENT;
> 
> ?

The only caller of init_cache_level() also checks for !cache_leaves(cpu). I
saw no need to repeat the check here.

Also, I understand that the purpose of the function is to initialize
cpu_cacheinfo::num_levels, which is not used on x86. Moreover,
cpu_cacheinfo::num_levels do not depend on num_leaves.

Having said that, I see other architectures initializing both num_levels
and num_leaves in this function.

Adding this check probably makes the x86 implementation more future-proof
in case callers change their behavior.

Thanks and BR,
Ricardo
Re: [PATCH v7 2/3] x86/cacheinfo: Delete global num_cache_leaves
Posted by Borislav Petkov 3 weeks ago
On Tue, Oct 22, 2024 at 08:50:22PM -0700, Ricardo Neri wrote:
> I agree. Another wrapper is not needed. I did not use cache_leaves() because
> it was internal to drivers/base/cacheinfo.c I can convert it to a function
> and expose it in include/linux/cacheinfo.h. I can rename it as
> get_cacheinfo_leaves(unsigned int cpu).
> 
> Would that make sense?

I think you should use get_cpu_cacheinfo() everywhere and simply access the
struct members like ->num_leaves where you need it. No need for a bunch of
other silly one-liners.

> The only caller of init_cache_level() also checks for !cache_leaves(cpu). I
> saw no need to repeat the check here.
> 
> Also, I understand that the purpose of the function is to initialize
> cpu_cacheinfo::num_levels, which is not used on x86. Moreover,
> cpu_cacheinfo::num_levels do not depend on num_leaves.
> 
> Having said that, I see other architectures initializing both num_levels
> and num_leaves in this function.
> 
> Adding this check probably makes the x86 implementation more future-proof
> in case callers change their behavior.

But you're practically zapping its body in the next patch. So why does patch
3 even exist as a separate patch instead of being part of patch 2?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v7 2/3] x86/cacheinfo: Delete global num_cache_leaves
Posted by Ricardo Neri 2 weeks, 6 days ago
On Fri, Nov 08, 2024 at 12:58:25PM +0100, Borislav Petkov wrote:
> On Tue, Oct 22, 2024 at 08:50:22PM -0700, Ricardo Neri wrote:
> > I agree. Another wrapper is not needed. I did not use cache_leaves() because
> > it was internal to drivers/base/cacheinfo.c I can convert it to a function
> > and expose it in include/linux/cacheinfo.h. I can rename it as
> > get_cacheinfo_leaves(unsigned int cpu).
> > 
> > Would that make sense?
> 
> I think you should use get_cpu_cacheinfo() everywhere and simply access the
> struct members like ->num_leaves where you need it. No need for a bunch of
> other silly one-liners.

Sure, I can do this.

> 
> > The only caller of init_cache_level() also checks for !cache_leaves(cpu). I
> > saw no need to repeat the check here.
> > 
> > Also, I understand that the purpose of the function is to initialize
> > cpu_cacheinfo::num_levels, which is not used on x86. Moreover,
> > cpu_cacheinfo::num_levels do not depend on num_leaves.
> > 
> > Having said that, I see other architectures initializing both num_levels
> > and num_leaves in this function.
> > 
> > Adding this check probably makes the x86 implementation more future-proof
> > in case callers change their behavior.
> 
> But you're practically zapping its body in the next patch. So why does patch
> 3 even exist as a separate patch instead of being part of patch 2?

Because patch 2 deals with cpu_cacheinfo::num_leaves whereas patch 3 deals
with cpu_cacheinfo:::num_levels.

I think I see your point: it can be argued that both patches deal with
init_cache_level(). I can merge these two patches together.

Thanks and BR,
Ricardo