Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all
CPUs from the same global "num_cache_leaves".
This is erroneous on systems such as Meteor Lake, where each CPU has a
distinct num_leaves value. Delete the global "num_cache_leaves" and
initialize num_leaves on each CPU.
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Reviewed-by: Len Brown <len.brown@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Andreas Herrmann <aherrmann@suse.de>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Cc: Andreas Herrmann <aherrmann@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Cc: Radu Rendec <rrendec@redhat.com>
Cc: Pierre Gondois <Pierre.Gondois@arm.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org # 6.3+
---
After this change, all CPUs will traverse CPUID leaf 0x4 when booted for
the first time. On systems with symmetric cache topologies this is
useless work.
Creating a list of processor models that have asymmetric cache topologies
was considered. The burden of maintaining such list would outweigh the
performance benefit of skipping this extra step.
---
Changes since v5:
* Reordered the arguments of set_num_cache_leaves() for readability.
(Nikolay)
* Added Reviewed-by tag from Nikolay and Andreas. Thanks!
* Added Tested-by tag from Andreas. Thanks!
Changes since v4:
* None
Changes since v3:
* Rebased on v6.7-rc5.
Changes since v2:
* None
Changes since v1:
* Do not make num_cache_leaves a per-CPU variable. Instead, reuse the
existing per-CPU ci_cpu_cacheinfo variable. (Dave Hansen)
---
arch/x86/kernel/cpu/cacheinfo.c | 44 +++++++++++++++++++--------------
1 file changed, 26 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 392d09c936d6..182cacd772b8 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -178,7 +178,16 @@ struct _cpuid4_info_regs {
struct amd_northbridge *nb;
};
-static unsigned short num_cache_leaves;
+static inline unsigned int get_num_cache_leaves(unsigned int cpu)
+{
+ return get_cpu_cacheinfo(cpu)->num_leaves;
+}
+
+static inline void
+set_num_cache_leaves(unsigned int cpu, unsigned int nr_leaves)
+{
+ get_cpu_cacheinfo(cpu)->num_leaves = nr_leaves;
+}
/* AMD doesn't have CPUID4. Emulate it here to report the same
information to the user. This makes some assumptions about the machine:
@@ -718,19 +727,21 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c)
void init_amd_cacheinfo(struct cpuinfo_x86 *c)
{
+ unsigned int cpu = c->cpu_index;
+
if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
- num_cache_leaves = find_num_cache_leaves(c);
+ set_num_cache_leaves(cpu, find_num_cache_leaves(c));
} else if (c->extended_cpuid_level >= 0x80000006) {
if (cpuid_edx(0x80000006) & 0xf000)
- num_cache_leaves = 4;
+ set_num_cache_leaves(cpu, 4);
else
- num_cache_leaves = 3;
+ set_num_cache_leaves(cpu, 3);
}
}
void init_hygon_cacheinfo(struct cpuinfo_x86 *c)
{
- num_cache_leaves = find_num_cache_leaves(c);
+ set_num_cache_leaves(c->cpu_index, find_num_cache_leaves(c));
}
void init_intel_cacheinfo(struct cpuinfo_x86 *c)
@@ -742,19 +753,19 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;
if (c->cpuid_level > 3) {
- static int is_initialized;
-
- if (is_initialized == 0) {
- /* Init num_cache_leaves from boot CPU */
- num_cache_leaves = find_num_cache_leaves(c);
- is_initialized++;
- }
+ /*
+ * There should be at least one leaf. A non-zero value means
+ * that the number of leaves has been initialized.
+ */
+ if (!get_num_cache_leaves(c->cpu_index))
+ set_num_cache_leaves(c->cpu_index,
+ find_num_cache_leaves(c));
/*
* Whenever possible use cpuid(4), deterministic cache
* parameters cpuid leaf to find the cache details
*/
- for (i = 0; i < num_cache_leaves; i++) {
+ for (i = 0; i < get_num_cache_leaves(c->cpu_index); i++) {
struct _cpuid4_info_regs this_leaf = {};
int retval;
@@ -790,14 +801,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
* Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for
* trace cache
*/
- if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) {
+ if ((!get_num_cache_leaves(c->cpu_index) || c->x86 == 15) && c->cpuid_level > 1) {
/* supports eax=2 call */
int j, n;
unsigned int regs[4];
unsigned char *dp = (unsigned char *)regs;
int only_trace = 0;
- if (num_cache_leaves != 0 && c->x86 == 15)
+ if (get_num_cache_leaves(c->cpu_index) && c->x86 == 15)
only_trace = 1;
/* Number of times to iterate */
@@ -993,12 +1004,9 @@ int init_cache_level(unsigned int cpu)
{
struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
- if (!num_cache_leaves)
- return -ENOENT;
if (!this_cpu_ci)
return -EINVAL;
this_cpu_ci->num_levels = 3;
- this_cpu_ci->num_leaves = num_cache_leaves;
return 0;
}
--
2.34.1
On Fri, Sep 13, 2024 at 01:31:54AM -0700, Ricardo Neri wrote: > diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c > index 392d09c936d6..182cacd772b8 100644 > --- a/arch/x86/kernel/cpu/cacheinfo.c > +++ b/arch/x86/kernel/cpu/cacheinfo.c > @@ -178,7 +178,16 @@ struct _cpuid4_info_regs { > struct amd_northbridge *nb; > }; > > -static unsigned short num_cache_leaves; > +static inline unsigned int get_num_cache_leaves(unsigned int cpu) > +{ > + return get_cpu_cacheinfo(cpu)->num_leaves; > +} There already is #define cache_leaves(cpu) (ci_cacheinfo(cpu)->num_leaves) And there's also get_cpu_cacheinfo(). And now you're adding more silly wrappers. Yuck. Can we pls use *one* of those things and work with it everywhere? > @@ -742,19 +753,19 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c) > unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb; > > if (c->cpuid_level > 3) { > - static int is_initialized; > - > - if (is_initialized == 0) { > - /* Init num_cache_leaves from boot CPU */ > - num_cache_leaves = find_num_cache_leaves(c); > - is_initialized++; > - } > + /* > + * There should be at least one leaf. A non-zero value means > + * that the number of leaves has been initialized. > + */ > + if (!get_num_cache_leaves(c->cpu_index)) > + set_num_cache_leaves(c->cpu_index, > + find_num_cache_leaves(c)); Ugly linebreak. > > /* > * Whenever possible use cpuid(4), deterministic cache > * parameters cpuid leaf to find the cache details > */ > - for (i = 0; i < num_cache_leaves; i++) { > + for (i = 0; i < get_num_cache_leaves(c->cpu_index); i++) { > struct _cpuid4_info_regs this_leaf = {}; > int retval; > > @@ -790,14 +801,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c) > * Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for > * trace cache > */ > - if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) { > + if ((!get_num_cache_leaves(c->cpu_index) || c->x86 == 15) && c->cpuid_level > 1) { > /* supports eax=2 call */ > int j, n; > unsigned int regs[4]; > unsigned char *dp = (unsigned char *)regs; > int only_trace = 0; > > - if (num_cache_leaves != 0 && c->x86 == 15) > + if (get_num_cache_leaves(c->cpu_index) && c->x86 == 15) > only_trace = 1; > > /* Number of times to iterate */ > @@ -993,12 +1004,9 @@ int init_cache_level(unsigned int cpu) > { > struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); > > - if (!num_cache_leaves) > - return -ENOENT; Why not if (!cache_leaves(cpu)) return -ENOENT; ? > if (!this_cpu_ci) > return -EINVAL; > this_cpu_ci->num_levels = 3; > - this_cpu_ci->num_leaves = num_cache_leaves; > return 0; > } > > -- > 2.34.1 > -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Oct 22, 2024 at 03:20:50PM +0200, Borislav Petkov wrote: > On Fri, Sep 13, 2024 at 01:31:54AM -0700, Ricardo Neri wrote: > > diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c > > index 392d09c936d6..182cacd772b8 100644 > > --- a/arch/x86/kernel/cpu/cacheinfo.c > > +++ b/arch/x86/kernel/cpu/cacheinfo.c > > @@ -178,7 +178,16 @@ struct _cpuid4_info_regs { > > struct amd_northbridge *nb; > > }; > > > > -static unsigned short num_cache_leaves; > > +static inline unsigned int get_num_cache_leaves(unsigned int cpu) > > +{ > > + return get_cpu_cacheinfo(cpu)->num_leaves; > > +} > > There already is > > #define cache_leaves(cpu) (ci_cacheinfo(cpu)->num_leaves) > > And there's also get_cpu_cacheinfo(). > > And now you're adding more silly wrappers. Yuck. > > Can we pls use *one* of those things and work with it everywhere? I agree. Another wrapper is not needed. I did not use cache_leaves() because it was internal to drivers/base/cacheinfo.c I can convert it to a function and expose it in include/linux/cacheinfo.h. I can rename it as get_cacheinfo_leaves(unsigned int cpu). Would that make sense? > > > @@ -742,19 +753,19 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c) > > unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb; > > > > if (c->cpuid_level > 3) { > > - static int is_initialized; > > - > > - if (is_initialized == 0) { > > - /* Init num_cache_leaves from boot CPU */ > > - num_cache_leaves = find_num_cache_leaves(c); > > - is_initialized++; > > - } > > + /* > > + * There should be at least one leaf. A non-zero value means > > + * that the number of leaves has been initialized. > > + */ > > + if (!get_num_cache_leaves(c->cpu_index)) > > + set_num_cache_leaves(c->cpu_index, > > + find_num_cache_leaves(c)); > > Ugly linebreak. I will make it a single line. > > > > > /* > > * Whenever possible use cpuid(4), deterministic cache > > * parameters cpuid leaf to find the cache details > > */ > > - for (i = 0; i < num_cache_leaves; i++) { > > + for (i = 0; i < get_num_cache_leaves(c->cpu_index); i++) { > > struct _cpuid4_info_regs this_leaf = {}; > > int retval; > > > > @@ -790,14 +801,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c) > > * Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for > > * trace cache > > */ > > - if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) { > > + if ((!get_num_cache_leaves(c->cpu_index) || c->x86 == 15) && c->cpuid_level > 1) { > > /* supports eax=2 call */ > > int j, n; > > unsigned int regs[4]; > > unsigned char *dp = (unsigned char *)regs; > > int only_trace = 0; > > > > - if (num_cache_leaves != 0 && c->x86 == 15) > > + if (get_num_cache_leaves(c->cpu_index) && c->x86 == 15) > > only_trace = 1; > > > > /* Number of times to iterate */ > > @@ -993,12 +1004,9 @@ int init_cache_level(unsigned int cpu) > > { > > struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); > > > > - if (!num_cache_leaves) > > - return -ENOENT; > > Why not > > if (!cache_leaves(cpu)) > return -ENOENT; > > ? The only caller of init_cache_level() also checks for !cache_leaves(cpu). I saw no need to repeat the check here. Also, I understand that the purpose of the function is to initialize cpu_cacheinfo::num_levels, which is not used on x86. Moreover, cpu_cacheinfo::num_levels do not depend on num_leaves. Having said that, I see other architectures initializing both num_levels and num_leaves in this function. Adding this check probably makes the x86 implementation more future-proof in case callers change their behavior. Thanks and BR, Ricardo
On Tue, Oct 22, 2024 at 08:50:22PM -0700, Ricardo Neri wrote: > I agree. Another wrapper is not needed. I did not use cache_leaves() because > it was internal to drivers/base/cacheinfo.c I can convert it to a function > and expose it in include/linux/cacheinfo.h. I can rename it as > get_cacheinfo_leaves(unsigned int cpu). > > Would that make sense? I think you should use get_cpu_cacheinfo() everywhere and simply access the struct members like ->num_leaves where you need it. No need for a bunch of other silly one-liners. > The only caller of init_cache_level() also checks for !cache_leaves(cpu). I > saw no need to repeat the check here. > > Also, I understand that the purpose of the function is to initialize > cpu_cacheinfo::num_levels, which is not used on x86. Moreover, > cpu_cacheinfo::num_levels do not depend on num_leaves. > > Having said that, I see other architectures initializing both num_levels > and num_leaves in this function. > > Adding this check probably makes the x86 implementation more future-proof > in case callers change their behavior. But you're practically zapping its body in the next patch. So why does patch 3 even exist as a separate patch instead of being part of patch 2? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
On Fri, Nov 08, 2024 at 12:58:25PM +0100, Borislav Petkov wrote: > On Tue, Oct 22, 2024 at 08:50:22PM -0700, Ricardo Neri wrote: > > I agree. Another wrapper is not needed. I did not use cache_leaves() because > > it was internal to drivers/base/cacheinfo.c I can convert it to a function > > and expose it in include/linux/cacheinfo.h. I can rename it as > > get_cacheinfo_leaves(unsigned int cpu). > > > > Would that make sense? > > I think you should use get_cpu_cacheinfo() everywhere and simply access the > struct members like ->num_leaves where you need it. No need for a bunch of > other silly one-liners. Sure, I can do this. > > > The only caller of init_cache_level() also checks for !cache_leaves(cpu). I > > saw no need to repeat the check here. > > > > Also, I understand that the purpose of the function is to initialize > > cpu_cacheinfo::num_levels, which is not used on x86. Moreover, > > cpu_cacheinfo::num_levels do not depend on num_leaves. > > > > Having said that, I see other architectures initializing both num_levels > > and num_leaves in this function. > > > > Adding this check probably makes the x86 implementation more future-proof > > in case callers change their behavior. > > But you're practically zapping its body in the next patch. So why does patch > 3 even exist as a separate patch instead of being part of patch 2? Because patch 2 deals with cpu_cacheinfo::num_leaves whereas patch 3 deals with cpu_cacheinfo:::num_levels. I think I see your point: it can be argued that both patches deal with init_cache_level(). I can merge these two patches together. Thanks and BR, Ricardo
© 2016 - 2024 Red Hat, Inc.