[v1] arm_mpam: Add basic mpam driver

[PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Posted by James Morse 5 months, 2 weeks ago

The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.

Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.

Add a helper that walks every possible cache, until it finds the one
identified by cache-id, then return the level.
Add a cleanup based free-ing mechanism for acpi_get_table().

CC: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * acpi_count_levels() now returns a value.
 * Converted the table-get stuff to use Jonathan's cleanup helper.
 * Dropped Sudeep's Review tag due to the cleanup change.
---
 drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h | 17 ++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 8f9b9508acba..660457644a5b 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
 					  ACPI_PPTT_ACPI_IDENTICAL);
 }
+
+/**
+ * find_acpi_cache_level_from_id() - Get the level of the specified cache
+ * @cache_id: The id field of the unified cache
+ *
+ * Determine the level relative to any CPU for the unified cache identified by
+ * cache_id. This allows the property to be found even if the CPUs are offline.
+ *
+ * The returned level can be used to group unified caches that are peers.
+ *
+ * The PPTT table must be rev 3 or later,
+ *
+ * If one CPUs L2 is shared with another as L3, this function will return
+ * an unpredictable value.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns a value which represents the level of the specified cache.
+ */
+int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	u32 acpi_cpu_id;
+	int level, cpu, num_levels;
+	struct acpi_pptt_cache *cache;
+	struct acpi_pptt_cache_v1 *cache_v1;
+	struct acpi_pptt_processor *cpu_node;
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+	if (IS_ERR(table))
+		return PTR_ERR(table);
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	/*
+	 * If we found the cache first, we'd still need to walk from each CPU
+	 * to find the level...
+	 */
+	for_each_possible_cpu(cpu) {
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			return -ENOENT;
+		num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+		/* Start at 1 for L1 */
+		for (level = 1; level <= num_levels; level++) {
+			cache = acpi_find_cache_node(table, acpi_cpu_id,
+						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
+						     level, &cpu_node);
+			if (!cache)
+				continue;
+
+			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+						cache,
+						sizeof(struct acpi_pptt_cache));
+
+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id)
+				return level;
+		}
+	}
+
+	return -ENOENT;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index f97a9ff678cc..30c10b1dcdb2 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -8,6 +8,7 @@
 #ifndef _LINUX_ACPI_H
 #define _LINUX_ACPI_H
 
+#include <linux/cleanup.h>
 #include <linux/errno.h>
 #include <linux/ioport.h>	/* for struct resource */
 #include <linux/resource_ext.h>
@@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
 void acpi_table_init_complete (void);
 int acpi_table_init (void);
 
+static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
+{
+	struct acpi_table_header *table;
+	int status = acpi_get_table(signature, instance, &table);
+
+	if (ACPI_FAILURE(status))
+		return ERR_PTR(-ENOENT);
+	return table;
+}
+DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
+
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init_or_acpilib acpi_table_parse_entries(char *id,
 		unsigned long table_size, int entry_id,
@@ -1542,6 +1554,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
+int find_acpi_cache_level_from_id(u32 cache_id);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1565,6 +1578,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 }
 static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
 						     cpumask_t *cpus) { }
+static inline int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	return -EINVAL;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.20.1

Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Posted by Dave Martin 5 months, 2 weeks ago

Hi,

On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> Add a cleanup based free-ing mechanism for acpi_get_table().

Does this mean that the early secondaries must be spread out across the
whole topology so that everything can be probed?

(i.e., a random subset is no good?)

If so, is this documented somewhere, such as in booting.rst?


Maybe this is not a new requirement -- it's not an area that I'm very
familiar with.


> 
> CC: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h | 17 ++++++++++++
>  2 files changed, 81 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 8f9b9508acba..660457644a5b 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>  					  ACPI_PPTT_ACPI_IDENTICAL);
>  }
> +
> +/**
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the unified cache
> + *
> + * Determine the level relative to any CPU for the unified cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group unified caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * If one CPUs L2 is shared with another as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.

Nit: doesn't exist or its revision is too old.

> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);

acpi_get_pptt() ? (See comment on patch 3.)

Comments there also suggest that the acpi_put_table() may be
unnecessary, at least on some paths.

I haven't tried to understand the ins and outs of this.

> +
> +	if (IS_ERR(table))
> +		return PTR_ERR(table);
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	/*
> +	 * If we found the cache first, we'd still need to walk from each CPU
> +	 * to find the level...
> +	 */

^ Possibly confusing comment?  The cache id is the starting point for
calling this function.  Is there a world in which we are at this point
without first having found the cache node?

(If the comment is just a restatement of part of the kerneldoc
description, maybe just drop it.)

> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			return -ENOENT;
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);

Is the initial call to acpi_count_levels() really needed here?

It feels a bit like we end up enumerating the whole topology two or
three times here; once to count how many levels there are, and then
again to examine the nodes, and once more inside acpi_find_cache_node().

Why can't we just walk until we run out of levels?


I may be missing some details of how these functions interact -- if
this is only run at probe time, compact, well-factored code is
more important than making things as fast as possible.

> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				return level;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index f97a9ff678cc..30c10b1dcdb2 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h

[...]

> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
>  
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}

This feels like something that ought to exist already.  If not, why
not?  If so, are there open-coded versions of this spread around the
ACPI tree that should be ported to use it?

[...]

Cheers
---Dave

Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Posted by James Morse 5 months, 2 weeks ago

Hi Dave,

On 27/08/2025 11:50, Dave Martin wrote:
> Hi,
> 
> On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.
>> Add a cleanup based free-ing mechanism for acpi_get_table().

> Does this mean that the early secondaries must be spread out across the
> whole topology so that everything can be probed?
>
> (i.e., a random subset is no good?)

For the mpam driver - it needs to see each cache with mpam hardware, which means a CPU
associated with each cache needs to be online. Random is fine - provided you get lucky.


> If so, is this documented somewhere, such as in booting.rst?

booting.rst is for the bootloader.
Late secondaries is a bit of a niche sport, I've only seen it commonly done in VMs.
Most platforms so far have their MPAM controls on a global L3, so this requirement doesn't
make much of a difference.

The concern is that if resctrl gets probed after user-space has started, whatever
user-space service is supposed to set it up will have concluded its not supported. Working
with cache-ids for offline CPUs means you don't have to bring all the CPUs online - only
enough so that every piece of hardware is reachable.


> Maybe this is not a new requirement -- it's not an area that I'm very
> familiar with.

Hard to say - its a potentially surprising side effect of glomming OS accessible registers
onto the side of hardware that can be automatically powered off. (PSCI CPU_SUSPEND).

I did try getting cacheinfo to populate all the CPUs at boot, regardless of whether they
were online. Apparently that doesn't work for PowerPC where the properties of CPUs can
change while they are offline. (presumably due to RAS or a firmware update)


>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 8f9b9508acba..660457644a5b 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>>  					  ACPI_PPTT_ACPI_IDENTICAL);
>>  }
>> +
>> +/**
>> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
>> + * @cache_id: The id field of the unified cache
>> + *
>> + * Determine the level relative to any CPU for the unified cache identified by
>> + * cache_id. This allows the property to be found even if the CPUs are offline.
>> + *
>> + * The returned level can be used to group unified caches that are peers.
>> + *
>> + * The PPTT table must be rev 3 or later,
>> + *
>> + * If one CPUs L2 is shared with another as L3, this function will return
>> + * an unpredictable value.
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> 
> Nit: doesn't exist or its revision is too old.

... its not old, but there is no published spec for that revision... unsupported?


>> + * Otherwise returns a value which represents the level of the specified cache.
>> + */
>> +int find_acpi_cache_level_from_id(u32 cache_id)
>> +{
>> +	u32 acpi_cpu_id;
>> +	int level, cpu, num_levels;
>> +	struct acpi_pptt_cache *cache;
>> +	struct acpi_pptt_cache_v1 *cache_v1;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);

> acpi_get_pptt() ? (See comment on patch 3.)

Yup,

> Comments there also suggest that the acpi_put_table() may be
> unnecessary, at least on some paths.
> 
> I haven't tried to understand the ins and outs of this.

It's grabbing one reference and using it for everything, because it needs to 'map' the
table in atomic context due to cpuhp, but can't.
Given how frequently its used, there is no problem just leaving it mapped.


>> +
>> +	if (IS_ERR(table))
>> +		return PTR_ERR(table);
>> +
>> +	if (table->revision < 3)
>> +		return -ENOENT;
>> +
>> +	/*
>> +	 * If we found the cache first, we'd still need to walk from each CPU
>> +	 * to find the level...
>> +	 */

> ^ Possibly confusing comment?  The cache id is the starting point for
> calling this function.  Is there a world in which we are at this point
> without first having found the cache node?
> 
> (If the comment is just a restatement of part of the kerneldoc
> description, maybe just drop it.)

It's describing the alternate world where the table is searched to find the cache first,
but then we'd still need to walk the table another NR_CPUs times, which can't be avoided.
I'll drop it - it was justifying why its done this way round...


>> +	for_each_possible_cpu(cpu) {
>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +		if (!cpu_node)
>> +			return -ENOENT;
>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> 
> Is the initial call to acpi_count_levels() really needed here?
> 
> It feels a bit like we end up enumerating the whole topology two or
> three times here; once to count how many levels there are, and then
> again to examine the nodes, and once more inside acpi_find_cache_node().
> 
> Why can't we just walk until we run out of levels?

This is looking for a unified cache - and we don't know where those start.
We could walk the first 100 caches, and stop once we start getting unified caches, then
they stop again ... but this seemed simpler.


> I may be missing some details of how these functions interact -- if
> this is only run at probe time, compact, well-factored code is
> more important than making things as fast as possible.


>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index f97a9ff678cc..30c10b1dcdb2 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
> 
> [...]
> 
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>>  void acpi_table_init_complete (void);
>>  int acpi_table_init (void);
>>  
>> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
>> +{
>> +	struct acpi_table_header *table;
>> +	int status = acpi_get_table(signature, instance, &table);
>> +
>> +	if (ACPI_FAILURE(status))
>> +		return ERR_PTR(-ENOENT);
>> +	return table;
>> +}

> This feels like something that ought to exist already.  If not, why
> not?  If so, are there open-coded versions of this spread around the
> ACPI tree that should be ported to use it?


It's a cleanup idiom helper that lets the compiler do this automagically - but its moot as
its not going to be needed in the pptt because of the acpi_get_pptt() thing.


Thanks,

James

Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Posted by Dave Martin 5 months ago

Hi James,

On Thu, Aug 28, 2025 at 04:58:05PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 11:50, Dave Martin wrote:
> > Hi,
> > 
> > On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
> >> The MPAM table identifies caches by id. The MPAM driver also wants to know
> >> the cache level to determine if the platform is of the shape that can be
> >> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> >> are online.
> >>
> >> Waiting for all CPUs to come online is a problem for platforms where
> >> CPUs are brought online late by user-space.
> >>
> >> Add a helper that walks every possible cache, until it finds the one
> >> identified by cache-id, then return the level.
> >> Add a cleanup based free-ing mechanism for acpi_get_table().
> 
> > Does this mean that the early secondaries must be spread out across the
> > whole topology so that everything can be probed?
> >
> > (i.e., a random subset is no good?)
> 
> For the mpam driver - it needs to see each cache with mpam hardware, which means a CPU
> associated with each cache needs to be online. Random is fine - provided you get lucky.

"Fine" = "not dependent on luck".  So, random is not fine.

> > If so, is this documented somewhere, such as in booting.rst?
> 
> booting.rst is for the bootloader.
> Late secondaries is a bit of a niche sport, I've only seen it commonly done in VMs.
> Most platforms so far have their MPAM controls on a global L3, so this requirement doesn't
> make much of a difference.
> 
> The concern is that if resctrl gets probed after user-space has started, whatever
> user-space service is supposed to set it up will have concluded its not supported. Working
> with cache-ids for offline CPUs means you don't have to bring all the CPUs online - only
> enough so that every piece of hardware is reachable.
> 
> 
> > Maybe this is not a new requirement -- it's not an area that I'm very
> > familiar with.
> 
> Hard to say - its a potentially surprising side effect of glomming OS accessible registers
> onto the side of hardware that can be automatically powered off. (PSCI CPU_SUSPEND).
> 
> I did try getting cacheinfo to populate all the CPUs at boot, regardless of whether they
> were online. Apparently that doesn't work for PowerPC where the properties of CPUs can
> change while they are offline. (presumably due to RAS or a firmware update)

So, it sounds like there is a requirement, but we don't document it,
and if the requirement is not met then the user is presented with an
obscure failure in the MPAM driver.  This seems a bit unhelpful?

I'm not saying booting.rst is the right place for this -- maybe the
appropriate document doesn't exist yet.

I wonder whether the required property is reasonable and general enough
that it should be treated as a kernel boot requirement.

Or, we require caches to be symmetric for non-early CPUs and reject
those that don't match when they try to come online (similarly to
the way cpufeatures deals with mismatches).


> >> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> >> index 8f9b9508acba..660457644a5b 100644
> >> --- a/drivers/acpi/pptt.c
> >> +++ b/drivers/acpi/pptt.c
> >> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
> >>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
> >>  					  ACPI_PPTT_ACPI_IDENTICAL);
> >>  }
> >> +
> >> +/**
> >> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> >> + * @cache_id: The id field of the unified cache
> >> + *
> >> + * Determine the level relative to any CPU for the unified cache identified by
> >> + * cache_id. This allows the property to be found even if the CPUs are offline.
> >> + *
> >> + * The returned level can be used to group unified caches that are peers.
> >> + *
> >> + * The PPTT table must be rev 3 or later,
> >> + *
> >> + * If one CPUs L2 is shared with another as L3, this function will return
> >> + * an unpredictable value.
> >> + *
> >> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> > 
> > Nit: doesn't exist or its revision is too old.
> 
> ... its not old, but there is no published spec for that revision... unsupported?

That seems OK, say,

	"... if the PPTT doesn't exist or has an unsupported format, or ..."


> >> + * Otherwise returns a value which represents the level of the specified cache.
> >> + */
> >> +int find_acpi_cache_level_from_id(u32 cache_id)
> >> +{
> >> +	u32 acpi_cpu_id;
> >> +	int level, cpu, num_levels;
> >> +	struct acpi_pptt_cache *cache;
> >> +	struct acpi_pptt_cache_v1 *cache_v1;
> >> +	struct acpi_pptt_processor *cpu_node;
> >> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> 
> > acpi_get_pptt() ? (See comment on patch 3.)
> 
> Yup,
> 
> > Comments there also suggest that the acpi_put_table() may be
> > unnecessary, at least on some paths.
> > 
> > I haven't tried to understand the ins and outs of this.
> 
> It's grabbing one reference and using it for everything, because it needs to 'map' the
> table in atomic context due to cpuhp, but can't.
> Given how frequently its used, there is no problem just leaving it mapped.

That's rather what I thought -- in which case we can use it (as you
already concluded, by the looks of it).

> >> +
> >> +	if (IS_ERR(table))
> >> +		return PTR_ERR(table);
> >> +
> >> +	if (table->revision < 3)
> >> +		return -ENOENT;
> >> +
> >> +	/*
> >> +	 * If we found the cache first, we'd still need to walk from each CPU
> >> +	 * to find the level...
> >> +	 */
> 
> > ^ Possibly confusing comment?  The cache id is the starting point for
> > calling this function.  Is there a world in which we are at this point
> > without first having found the cache node?
> > 
> > (If the comment is just a restatement of part of the kerneldoc
> > description, maybe just drop it.)
> 
> It's describing the alternate world where the table is searched to find the cache first,
> but then we'd still need to walk the table another NR_CPUs times, which can't be avoided.
> I'll drop it - it was justifying why its done this way round...

Oh, I see, this is "if the code had been written in such-and-such a way",
not "if such-and-such a runtime precondition is met" ?

The comment can be read both ways, as it stands.

> 
> >> +	for_each_possible_cpu(cpu) {
> >> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> >> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> >> +		if (!cpu_node)
> >> +			return -ENOENT;
> >> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> > 
> > Is the initial call to acpi_count_levels() really needed here?
> > 
> > It feels a bit like we end up enumerating the whole topology two or
> > three times here; once to count how many levels there are, and then
> > again to examine the nodes, and once more inside acpi_find_cache_node().
> > 
> > Why can't we just walk until we run out of levels?
> 
> This is looking for a unified cache - and we don't know where those start.
> We could walk the first 100 caches, and stop once we start getting unified caches, then
> they stop again ... but this seemed simpler.

I'm still a bit confused.

We start at level one, and then trace parents until we hit a unified
cache or run out of levels.

Why do we need to know a priori how many levels there are, when the
way to determine that is part of the same procedure we're already doing
(i.e., start at level one and trace parents until we run out of levels)?

> > I may be missing some details of how these functions interact -- if
> > this is only run at probe time, compact, well-factored code is
> > more important than making things as fast as possible.

(This still stands.)


> >> diff --git a/include/linux/acpi.h b/include/linux/acpi.h

[...]

> >> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
> >>  void acpi_table_init_complete (void);
> >>  int acpi_table_init (void);
> >>  
> >> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> >> +{
> >> +	struct acpi_table_header *table;
> >> +	int status = acpi_get_table(signature, instance, &table);
> >> +
> >> +	if (ACPI_FAILURE(status))
> >> +		return ERR_PTR(-ENOENT);
> >> +	return table;
> >> +}
> 
> > This feels like something that ought to exist already.  If not, why
> > not?  If so, are there open-coded versions of this spread around the
> > ACPI tree that should be ported to use it?
> 
> 
> It's a cleanup idiom helper that lets the compiler do this automagically - but its moot as
> its not going to be needed in the pptt because of the acpi_get_pptt() thing.

Ah, OK.

Cheers
---Dave

Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Posted by James Morse 5 months ago

Hi Dave,

On 05/09/2025 17:27, Dave Martin wrote:
> On Thu, Aug 28, 2025 at 04:58:05PM +0100, James Morse wrote:
>> On 27/08/2025 11:50, Dave Martin wrote:
>>> On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
>>>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>>>> the cache level to determine if the platform is of the shape that can be
>>>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>>>> are online.
>>>>
>>>> Waiting for all CPUs to come online is a problem for platforms where
>>>> CPUs are brought online late by user-space.
>>>>
>>>> Add a helper that walks every possible cache, until it finds the one
>>>> identified by cache-id, then return the level.
>>>> Add a cleanup based free-ing mechanism for acpi_get_table().
>>
>>> Does this mean that the early secondaries must be spread out across the
>>> whole topology so that everything can be probed?
>>>
>>> (i.e., a random subset is no good?)
>>
>> For the mpam driver - it needs to see each cache with mpam hardware, which means a CPU
>> associated with each cache needs to be online. Random is fine - provided you get lucky.
> 
> "Fine" = "not dependent on luck".  So, random is not fine.

As in it doesn't matter which CPUs - as long as you manage to represent each cache.

I really don't care if people configure their platform such that the mpam driver can't
complete its probing. Everything continues to work, you just don't get to use the unusable
features.

I've only really seen it done in VMs, which are most likely to have a single global MSC
because the thing has to be emulated.


>>> If so, is this documented somewhere, such as in booting.rst?
>>
>> booting.rst is for the bootloader.
>> Late secondaries is a bit of a niche sport, I've only seen it commonly done in VMs.
>> Most platforms so far have their MPAM controls on a global L3, so this requirement doesn't
>> make much of a difference.
>>
>> The concern is that if resctrl gets probed after user-space has started, whatever
>> user-space service is supposed to set it up will have concluded its not supported. Working
>> with cache-ids for offline CPUs means you don't have to bring all the CPUs online - only
>> enough so that every piece of hardware is reachable.
>>
>>
>>> Maybe this is not a new requirement -- it's not an area that I'm very
>>> familiar with.
>>
>> Hard to say - its a potentially surprising side effect of glomming OS accessible registers
>> onto the side of hardware that can be automatically powered off. (PSCI CPU_SUSPEND).
>>
>> I did try getting cacheinfo to populate all the CPUs at boot, regardless of whether they
>> were online. Apparently that doesn't work for PowerPC where the properties of CPUs can
>> change while they are offline. (presumably due to RAS or a firmware update)

> So, it sounds like there is a requirement, but we don't document it,
> and if the requirement is not met then the user is presented with an
> obscure failure in the MPAM driver.  This seems a bit unhelpful?

Not a failure. MPAM isn't yet available. Bring the rest of the system up and it will
spring into life. The same goes for any device you keep turned off.


> I'm not saying booting.rst is the right place for this -- maybe the
> appropriate document doesn't exist yet.
> 
> I wonder whether the required property is reasonable and general enough
> that it should be treated as a kernel boot requirement.

It's not a boot requirement. Linux boots just fine.


> Or, we require caches to be symmetric for non-early CPUs and reject
> those that don't match when they try to come online (similarly to
> the way cpufeatures deals with mismatches).

MPAM isn't an interesting enough feature to reject CPUs!
We can't check the the MSC properties until we can schedule, (thank the 'hide it in
firmware' folk for that one), meaning the CPU has to be online before we realise the
MPAM properties are mismatched.

The best approach here is to wait until everything has been seen before declaring that
we know what is going on. The architecture even calls this out as something that needs
doing, meaning the firmware tables have to describe all possible MSC.

I agree its not nice - but it is what MPAM is.
I think the people clever enough to manage late online-ing secondaries will work this out.


>>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>>> index 8f9b9508acba..660457644a5b 100644
>>>> --- a/drivers/acpi/pptt.c
>>>> +++ b/drivers/acpi/pptt.c
>>>> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)

>>
>>>> +	for_each_possible_cpu(cpu) {
>>>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>>>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>>>> +		if (!cpu_node)
>>>> +			return -ENOENT;
>>>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
>>>
>>> Is the initial call to acpi_count_levels() really needed here?
>>>
>>> It feels a bit like we end up enumerating the whole topology two or
>>> three times here; once to count how many levels there are, and then
>>> again to examine the nodes, and once more inside acpi_find_cache_node().
>>>
>>> Why can't we just walk until we run out of levels?
>>
>> This is looking for a unified cache - and we don't know where those start.
>> We could walk the first 100 caches, and stop once we start getting unified caches, then
>> they stop again ... but this seemed simpler.
> 
> I'm still a bit confused.
> 
> We start at level one, and then trace parents until we hit a unified
> cache or run out of levels.
> 
> Why do we need to know a priori how many levels there are, when the
> way to determine that is part of the same procedure we're already doing
> (i.e., start at level one and trace parents until we run out of levels)?

| level = 1;
| acpi_find_cache_node(table, acpi_cpu_id, ACPI_PPTT_CACHE_TYPE_UNIFIED,
|			level, &cpu_node);

Fails. Do we stop the loop, or try level=2?

level = 1 fails because the L1 (probably) isn't a unified cache. Is L2? We don't know.


It's simpler to know how many levels there are, and walk the lot, than it is to try and
work out where the unified bit of the hierarchy starts - and start walking from there.

Yes the PPTT parsing could be different, but this is the kind of shape it has. Doing it
like this is in-keeping with the rest of it.


>>> I may be missing some details of how these functions interact -- if
>>> this is only run at probe time, compact, well-factored code is
>>> more important than making things as fast as possible.
> 
> (This still stands.)

This is all done at probe time, and never called again.
Nothing else in here caches values - it derives it all from the table every time so that
its stateless.


James

Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Posted by Ben Horgan 5 months, 2 weeks ago

Hi James,

On 8/22/25 16:29, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> Add a cleanup based free-ing mechanism for acpi_get_table().
> 
> CC: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h | 17 ++++++++++++
>  2 files changed, 81 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 8f9b9508acba..660457644a5b 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>  					  ACPI_PPTT_ACPI_IDENTICAL);
>  }
> +
> +/**
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the unified cache
> + *
> + * Determine the level relative to any CPU for the unified cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group unified caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * If one CPUs L2 is shared with another as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> +
> +	if (IS_ERR(table))
> +		return PTR_ERR(table);
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	/*
> +	 * If we found the cache first, we'd still need to walk from each CPU
> +	 * to find the level...
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			return -ENOENT;
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				return level;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index f97a9ff678cc..30c10b1dcdb2 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -8,6 +8,7 @@
>  #ifndef _LINUX_ACPI_H
>  #define _LINUX_ACPI_H
>  
> +#include <linux/cleanup.h>
>  #include <linux/errno.h>
>  #include <linux/ioport.h>	/* for struct resource */
>  #include <linux/resource_ext.h>
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
>  
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
nit: Is it useful to change the condition from !IS_ERR(_T) to
!IS_ERR_OR_NULL(_T)? This seems to be the common pattern. I do note that
acpi_put_table() can take NULL, so there is no real danger.
> +
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>  int __init_or_acpilib acpi_table_parse_entries(char *id,
>  		unsigned long table_size, int entry_id,
> @@ -1542,6 +1554,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
>  int find_acpi_cpu_topology_package(unsigned int cpu);
>  int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
>  void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
> +int find_acpi_cache_level_from_id(u32 cache_id);
>  #else
>  static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>  {
> @@ -1565,6 +1578,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  }
>  static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
>  						     cpumask_t *cpus) { }
> +static inline int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	return -EINVAL;
> +}
>  #endif
>  
>  void acpi_arch_init(void);


Thanks,

Ben

Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Posted by James Morse 5 months, 2 weeks ago

Hi Ben,

On 27/08/2025 10:25, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.
>> Add a cleanup based free-ing mechanism for acpi_get_table().

>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index f97a9ff678cc..30c10b1dcdb2 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>>  void acpi_table_init_complete (void);
>>  int acpi_table_init (void);
>>  
>> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
>> +{
>> +	struct acpi_table_header *table;
>> +	int status = acpi_get_table(signature, instance, &table);
>> +
>> +	if (ACPI_FAILURE(status))
>> +		return ERR_PTR(-ENOENT);
>> +	return table;
>> +}
>> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))

> nit: Is it useful to change the condition from !IS_ERR(_T) to
> !IS_ERR_OR_NULL(_T)? This seems to be the common pattern. I do note that
> acpi_put_table() can take NULL, so there is no real danger.

If it's the common pattern, sure.

But this code got dropped as Dave pointed out the PPTT doesn't care about the reference
counting anyway, its acpi_get_pptt() helper just uses the same reference for everything.

This might come back for the MPAM driver..


>> +
>>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>>  int __init_or_acpilib acpi_table_parse_entries(char *id,
>>  		unsigned long table_size, int entry_id,


Thanks,

James

Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Posted by Markus Elfring 5 months, 2 weeks ago

…
> +++ b/include/linux/acpi.h
…
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
…
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
> +
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
…

How do you think about to offer the addition of such a special macro call
by another separate update step?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.17-rc2#n81

Regards,
Markus

Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id

Posted by James Morse 5 months, 2 weeks ago

Hi Markus,

On 23/08/2025 13:14, Markus Elfring wrote:
> …
>> +++ b/include/linux/acpi.h
> …
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>>  void acpi_table_init_complete (void);
>>  int acpi_table_init (void);
> …
>> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
>> +
>>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
> …
> 
> How do you think about to offer the addition of such a special macro call
> by another separate update step?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.17-rc2#n81

As it goes via the same tree I don't think there is a strong reason either way.

Dave points out on an earlier patch that the PPTT code doesn't care about the reference
counting anyway, so this stuff can go.


Thanks,

James