[v2] cacheinfo: Set cache 'id' based on DT data

[PATCH v2 1/3] cacheinfo: Set cache 'id' based on DT data

Posted by James Morse 3 months ago

From: Rob Herring <robh@kernel.org>

Use the minimum CPU h/w id of the CPUs associated with the cache for the
cache 'id'. This will provide a stable id value for a given system. As
we need to check all possible CPUs, we can't use the shared_cpu_map
which is just online CPUs. As there's not a cache to CPUs mapping in DT,
we have to walk all CPU nodes and then walk cache levels.

The cache_id exposed to user-space has historically been 32 bits, and
it is too late to change. This value is parsed into a u32 by user-space
libraries such as libvirt:
https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588

Give up on assigning cache ids if a CPU h/w id wider than 32 bits
is found.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
[ ben: converted to use the __free cleanup idiom ]
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
[ morse: Add checks to give up if a value larger than 32 bits is seen. ]
Signed-off-by: James Morse <james.morse@arm.com>
---
Use as a 32bit value has also been seen in DPDK patches here:
http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/

Changes since v1:
 * Remove the second loop in favour of a helper.
 
An open question from v1 is whether it would be preferable to use an
index into the DT of the CPU nodes instead of the hardware id. This would
save an arch specific swizzle - but the numbers would change if the DT
were changed. This scheme isn't sensitive to the order of DT nodes.

---
 drivers/base/cacheinfo.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index cf0d455209d7..df593da0d5f7 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -8,6 +8,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/acpi.h>
+#include <linux/bitfield.h>
 #include <linux/bitops.h>
 #include <linux/cacheinfo.h>
 #include <linux/compiler.h>
@@ -183,6 +184,42 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
 	return of_property_read_bool(np, "cache-unified");
 }
 
+static bool match_cache_node(struct device_node *cpu,
+			     const struct device_node *cache_node)
+{
+	for (struct device_node *cache __free(device_node) = of_find_next_cache_node(cpu);
+	     cache != NULL; cache = of_find_next_cache_node(cache)) {
+		if (cache == cache_node)
+			return true;
+	}
+
+	return false;
+}
+
+static void cache_of_set_id(struct cacheinfo *this_leaf,
+			    struct device_node *cache_node)
+{
+	struct device_node *cpu;
+	u32 min_id = ~0;
+
+	for_each_of_cpu_node(cpu) {
+		u64 id = of_get_cpu_hwid(cpu, 0);
+
+		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
+			of_node_put(cpu);
+			return;
+		}
+
+		if (match_cache_node(cpu, cache_node))
+			min_id = min(min_id, id);
+	}
+
+	if (min_id != ~0) {
+		this_leaf->id = min_id;
+		this_leaf->attributes |= CACHE_ID;
+	}
+}
+
 static void cache_of_set_props(struct cacheinfo *this_leaf,
 			       struct device_node *np)
 {
@@ -198,6 +235,7 @@ static void cache_of_set_props(struct cacheinfo *this_leaf,
 	cache_get_line_size(this_leaf, np);
 	cache_nr_sets(this_leaf, np);
 	cache_associativity(this_leaf);
+	cache_of_set_id(this_leaf, np);
 }
 
 static int cache_setup_of_node(unsigned int cpu)
-- 
2.39.5
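
The selection rule in the commit message can be sketched outside the kernel.
This is only an illustration under stated assumptions: struct mock_cpu and
pick_cache_id() are hypothetical names, the shares_cache flag stands in for
the DT walk done by match_cache_node(), and a 64-bit sentinel is used for
"no CPU found" where the patch uses a u32 initialised to ~0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DT CPU node: its h/w id and whether it uses the cache. */
struct mock_cpu {
	uint64_t hwid;
	bool shares_cache;
};

/*
 * Store the smallest h/w id among the CPUs that share the cache in *id_out
 * and return true. Return false, leaving *id_out untouched, if any CPU h/w id
 * needs more than 32 bits or if no CPU shares the cache.
 */
static bool pick_cache_id(const struct mock_cpu *cpus, size_t n, uint32_t *id_out)
{
	uint64_t min_id = UINT64_MAX;	/* sentinel: no sharing CPU seen yet */

	assert(id_out != NULL);

	for (size_t i = 0; i < n; i++) {
		/* Same intent as FIELD_GET(GENMASK_ULL(63, 32), id) in the patch. */
		if (cpus[i].hwid >> 32)
			return false;

		if (cpus[i].shares_cache && cpus[i].hwid < min_id)
			min_id = cpus[i].hwid;
	}

	if (min_id == UINT64_MAX)
		return false;

	*id_out = (uint32_t)min_id;
	return true;
}
```

Because the result depends only on the set of h/w ids, reordering the CPU
entries does not change the id that is picked.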

Re: [PATCH v2 1/3] cacheinfo: Set cache 'id' based on DT data

Posted by Gavin Shan 2 months, 4 weeks ago

On 7/5/25 3:38 AM, James Morse wrote:
> From: Rob Herring <robh@kernel.org>
> 
> Use the minimum CPU h/w id of the CPUs associated with the cache for the
> cache 'id'. This will provide a stable id value for a given system. As
> we need to check all possible CPUs, we can't use the shared_cpu_map
> which is just online CPUs. As there's not a cache to CPUs mapping in DT,
> we have to walk all CPU nodes and then walk cache levels.
> 
> The cache_id exposed to user-space has historically been 32 bits, and
> is too late to change. This value is parsed into a u32 by user-space
> libraries such as libvirt:
> https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
> 
> Give up on assigning cache-id's if a CPU h/w id greater than 32 bits
> is found.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: Rob Herring <robh@kernel.org>
> [ ben: converted to use the __free cleanup idiom ]
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> [ morse: Add checks to give up if a value larger than 32 bits is seen. ]
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Use as a 32bit value has also been seen in DPDK patches here:
> http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
> 
> Changes since v1:
>   * Remove the second loop in favour of a helper.
>   
> An open question from v1 is whether it would be preferable to use an
> index into the DT of the CPU nodes instead of the hardware id. This would
> save an arch specific swizzle - but the numbers would change if the DT
> were changed. This scheme isn't sensitive to the order of DT nodes.
> 
> ---
>   drivers/base/cacheinfo.c | 38 ++++++++++++++++++++++++++++++++++++++
>   1 file changed, 38 insertions(+)
> 

With Ben Horgan's concern addressed, LGTM:

Reviewed-by: Gavin Shan <gshan@redhat.com>

Re: [PATCH v2 1/3] cacheinfo: Set cache 'id' based on DT data

Posted by Ben Horgan 3 months ago

Hi James,

On 7/4/25 18:38, James Morse wrote:
> From: Rob Herring <robh@kernel.org>
> 
> Use the minimum CPU h/w id of the CPUs associated with the cache for the
> cache 'id'. This will provide a stable id value for a given system. As
> we need to check all possible CPUs, we can't use the shared_cpu_map
> which is just online CPUs. As there's not a cache to CPUs mapping in DT,
> we have to walk all CPU nodes and then walk cache levels.
> 
> The cache_id exposed to user-space has historically been 32 bits, and
> is too late to change. This value is parsed into a u32 by user-space
> libraries such as libvirt:
> https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
> 
> Give up on assigning cache-id's if a CPU h/w id greater than 32 bits
> is found.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: Rob Herring <robh@kernel.org>
> [ ben: converted to use the __free cleanup idiom ]
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> [ morse: Add checks to give up if a value larger than 32 bits is seen. ]
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Use as a 32bit value has also been seen in DPDK patches here:
> http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
> 
> Changes since v1:
>   * Remove the second loop in favour of a helper.
>   
> An open question from v1 is whether it would be preferable to use an
> index into the DT of the CPU nodes instead of the hardware id. This would
> save an arch specific swizzle - but the numbers would change if the DT
> were changed. This scheme isn't sensitive to the order of DT nodes.
> 
> ---
>   drivers/base/cacheinfo.c | 38 ++++++++++++++++++++++++++++++++++++++
>   1 file changed, 38 insertions(+)
> 
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index cf0d455209d7..df593da0d5f7 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -8,6 +8,7 @@
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>   
>   #include <linux/acpi.h>
> +#include <linux/bitfield.h>
>   #include <linux/bitops.h>
>   #include <linux/cacheinfo.h>
>   #include <linux/compiler.h>
> @@ -183,6 +184,42 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
>   	return of_property_read_bool(np, "cache-unified");
>   }
>   
> +static bool match_cache_node(struct device_node *cpu,
> +			     const struct device_node *cache_node)
> +{
> +	for (struct device_node *cache __free(device_node) = of_find_next_cache_node(cpu);

Looks like the creation of this helper function has upset the
device_node reference counting. This first __free(device_node) will only
cause of_node_put() to be called in the case of the early return from
the loop. You've dropped the second __free(device_node) which accounts
for 'cache' changing on each iteration.

> +	     cache != NULL; cache = of_find_next_cache_node(cache)) {
> +		if (cache == cache_node)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static void cache_of_set_id(struct cacheinfo *this_leaf,
> +			    struct device_node *cache_node)
> +{
> +	struct device_node *cpu;
> +	u32 min_id = ~0;
> +
> +	for_each_of_cpu_node(cpu) {
> +		u64 id = of_get_cpu_hwid(cpu, 0);
> +
> +		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
> +			of_node_put(cpu);
> +			return;
> +		}
> +
> +		if (match_cache_node(cpu, cache_node))
> +			min_id = min(min_id, id);
> +	}
> +
> +	if (min_id != ~0) {
> +		this_leaf->id = min_id;
> +		this_leaf->attributes |= CACHE_ID;
> +	}
> +}
> +
>   static void cache_of_set_props(struct cacheinfo *this_leaf,
>   			       struct device_node *np)
>   {
> @@ -198,6 +235,7 @@ static void cache_of_set_props(struct cacheinfo *this_leaf,
>   	cache_get_line_size(this_leaf, np);
>   	cache_nr_sets(this_leaf, np);
>   	cache_associativity(this_leaf);
> +	cache_of_set_id(this_leaf, np);
>   }
>   
>   static int cache_setup_of_node(unsigned int cpu)


Thanks,

Ben
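
The reference counting problem described above can be reproduced in user
space. The sketch below is an assumption-laden mock, not kernel code:
struct mock_node, mock_get_next() and mock_put() are hypothetical stand-ins
for struct device_node, of_find_next_cache_node() and of_node_put(), and the
GCC/Clang cleanup attribute plays the role of __free(device_node). The
cleanup handler runs once, when the loop variable goes out of scope, so
references taken on earlier iterations are never dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node, with a visible refcount. */
struct mock_node {
	int refcount;
	struct mock_node *next;
};

/* Like of_find_next_cache_node(): takes a reference on the node it returns. */
static struct mock_node *mock_get_next(struct mock_node *n)
{
	struct mock_node *next = n->next;

	if (next)
		next->refcount++;
	return next;
}

/* Like of_node_put(): drops one reference, NULL is ignored. */
static void mock_put(struct mock_node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Cleanup handler: called with a pointer to the variable leaving scope. */
static void mock_put_cleanup(struct mock_node **p)
{
	mock_put(*p);
}

/* Same loop shape as the v2 helper: one cleanup for a variable that is reassigned. */
static int leaky_walk(struct mock_node *start, const struct mock_node *target)
{
	for (struct mock_node *n __attribute__((cleanup(mock_put_cleanup))) = mock_get_next(start);
	     n != NULL; n = mock_get_next(n)) {
		if (n == target)
			return 1;
	}

	return 0;
}
```

With a chain cpu -> l2 -> l3 and l3 as the target, the walk drops the
reference on l3 at the early return, but the reference taken on l2 stays
held.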

Re: [PATCH v2 1/3] cacheinfo: Set cache 'id' based on DT data

Posted by Jonathan Cameron 3 months ago

On Mon, 7 Jul 2025 11:27:06 +0100
Ben Horgan <ben.horgan@arm.com> wrote:

> Hi James,
> 
> On 7/4/25 18:38, James Morse wrote:
> > From: Rob Herring <robh@kernel.org>
> > 
> > Use the minimum CPU h/w id of the CPUs associated with the cache for the
> > cache 'id'. This will provide a stable id value for a given system. As
> > we need to check all possible CPUs, we can't use the shared_cpu_map
> > which is just online CPUs. As there's not a cache to CPUs mapping in DT,
> > we have to walk all CPU nodes and then walk cache levels.
> > 
> > The cache_id exposed to user-space has historically been 32 bits, and
> > is too late to change. This value is parsed into a u32 by user-space
> > libraries such as libvirt:
> > https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
> > 
> > Give up on assigning cache-id's if a CPU h/w id greater than 32 bits
> > is found.
> > 
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Signed-off-by: Rob Herring <robh@kernel.org>
> > [ ben: converted to use the __free cleanup idiom ]
> > Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> > [ morse: Add checks to give up if a value larger than 32 bits is seen. ]
> > Signed-off-by: James Morse <james.morse@arm.com>
> > ---
> > Use as a 32bit value has also been seen in DPDK patches here:
> > http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
> > 
> > Changes since v1:
> >   * Remove the second loop in favour of a helper.
> >   
> > An open question from v1 is whether it would be preferable to use an
> > index into the DT of the CPU nodes instead of the hardware id. This would
> > save an arch specific swizzle - but the numbers would change if the DT
> > were changed. This scheme isn't sensitive to the order of DT nodes.
> > 
> > ---
> >   drivers/base/cacheinfo.c | 38 ++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 38 insertions(+)
> > 
> > diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> > index cf0d455209d7..df593da0d5f7 100644
> > --- a/drivers/base/cacheinfo.c
> > +++ b/drivers/base/cacheinfo.c
> > @@ -8,6 +8,7 @@
> >   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >   
> >   #include <linux/acpi.h>
> > +#include <linux/bitfield.h>
> >   #include <linux/bitops.h>
> >   #include <linux/cacheinfo.h>
> >   #include <linux/compiler.h>
> > @@ -183,6 +184,42 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
> >   	return of_property_read_bool(np, "cache-unified");
> >   }
> >   
> > +static bool match_cache_node(struct device_node *cpu,
> > +			     const struct device_node *cache_node)
> > +{
> > +	for (struct device_node *cache __free(device_node) = of_find_next_cache_node(cpu);  
> Looks like the creation of this helper function has upset the 
> device_node reference counting. This first __free(device_node) will only 
> cause of_node_put() to be called in the case of the early return from 
> the loop. You've dropped the second __free(device_node) which accounts 
> for 'cache' changing on each iteration.

Good catch - this behaves differently from many of the of_get_next* type
helpers in that it doesn't drop the reference to the previous iteration
within the call.

Maybe it should?

I checked a few of the call sites: some would be simplified if it did;
others would need more complex restructuring but might benefit as
well.

> > +	     cache != NULL; cache = of_find_next_cache_node(cache)) {
> > +		if (cache == cache_node)
> > +			return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +static void cache_of_set_id(struct cacheinfo *this_leaf,
> > +			    struct device_node *cache_node)
> > +{
> > +	struct device_node *cpu;
> > +	u32 min_id = ~0;
> > +
> > +	for_each_of_cpu_node(cpu) {
> > +		u64 id = of_get_cpu_hwid(cpu, 0);
> > +
> > +		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
> > +			of_node_put(cpu);
> > +			return;
> > +		}
> > +
> > +		if (match_cache_node(cpu, cache_node))
> > +			min_id = min(min_id, id);
> > +	}
> > +
> > +	if (min_id != ~0) {
> > +		this_leaf->id = min_id;
> > +		this_leaf->attributes |= CACHE_ID;
> > +	}
> > +}
> > +
> >   static void cache_of_set_props(struct cacheinfo *this_leaf,
> >   			       struct device_node *np)
> >   {
> > @@ -198,6 +235,7 @@ static void cache_of_set_props(struct cacheinfo *this_leaf,
> >   	cache_get_line_size(this_leaf, np);
> >   	cache_nr_sets(this_leaf, np);
> >   	cache_associativity(this_leaf);
> > +	cache_of_set_id(this_leaf, np);
> >   }
> >   
> >   static int cache_setup_of_node(unsigned int cpu)  
> 
> 
> Thanks,
> 
> Ben
> 
>

Re: [PATCH v2 1/3] cacheinfo: Set cache 'id' based on DT data

Posted by James Morse 2 months, 4 weeks ago

Hi Ben, Jonathan,

On 07/07/2025 13:32, Jonathan Cameron wrote:
> On Mon, 7 Jul 2025 11:27:06 +0100
> Ben Horgan <ben.horgan@arm.com> wrote:
>> On 7/4/25 18:38, James Morse wrote:
>>> From: Rob Herring <robh@kernel.org>
>>> Use the minimum CPU h/w id of the CPUs associated with the cache for the
>>> cache 'id'. This will provide a stable id value for a given system. As
>>> we need to check all possible CPUs, we can't use the shared_cpu_map
>>> which is just online CPUs. As there's not a cache to CPUs mapping in DT,
>>> we have to walk all CPU nodes and then walk cache levels.
>>>
>>> The cache_id exposed to user-space has historically been 32 bits, and
>>> is too late to change. This value is parsed into a u32 by user-space
>>> libraries such as libvirt:
>>> https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
>>>
>>> Give up on assigning cache-id's if a CPU h/w id greater than 32 bits
>>> is found.

>>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>>> index cf0d455209d7..df593da0d5f7 100644
>>> --- a/drivers/base/cacheinfo.c
>>> +++ b/drivers/base/cacheinfo.c
>>> @@ -183,6 +184,42 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
>>>   	return of_property_read_bool(np, "cache-unified");
>>>   }
>>>   
>>> +static bool match_cache_node(struct device_node *cpu,
>>> +			     const struct device_node *cache_node)
>>> +{
>>> +	for (struct device_node *cache __free(device_node) = of_find_next_cache_node(cpu);  
>> Looks like the creation of this helper function has upset the 
>> device_node reference counting. This first __free(device_node) will only 
>> cause of_node_put() to be called in the case of the early return from 
>> the loop. You've dropped the second __free(device_node) which accounts 
>> for 'cache' changing on each iteration.

Heh, I just took this hunk verbatim. Fixing this up with the __free() magic is tricky as
the existing patterns all drop the reference to cpu, which we don't want to do here. I
think at this point the __free() magic is just making this harder to understand. How about
the old fashioned way:

| static bool match_cache_node(struct device_node *cpu,
|                              const struct device_node *cache_node)
| {
|         struct device_node *prev, *cache = of_find_next_cache_node(cpu);
|
|         while (cache) {
|                 if (cache == cache_node) {
|                         of_node_put(cache);
|                         return true;
|                 }
|
|                 prev = cache;
|                 cache = of_find_next_cache_node(cache);
|                 of_node_put(prev);
|         }
|
|         return false;
| }


> Good catch - this behaves differently from many of the of_get_next* type
> helpers in that it doesn't drop the reference to the previous iteration
> within the call.
> 
> Maybe it should?
> 
> I checked a few of the call sites and some would be simplified if it did
> others would need some more complex restructuring but might benefit as
> well.

If it did, we'd end up dropping the reference to cpu on the way in, which
of_get_next_cpu_node() in for_each_of_cpu_node() expects to do itself.


Thanks,

James
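
The reference balance of the open-coded loop proposed above can be checked
with a small user-space mock. This is a sketch under assumptions: struct
sim_node, sim_find_next() and sim_put() are hypothetical stand-ins for
struct device_node, of_find_next_cache_node() and of_node_put(), and
sim_match() mirrors the proposed match_cache_node() line for line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node, with a visible refcount. */
struct sim_node {
	int refcount;
	struct sim_node *next;
};

/* Takes a reference on the returned node and leaves the argument's count alone. */
static struct sim_node *sim_find_next(struct sim_node *n)
{
	struct sim_node *next = n->next;

	if (next)
		next->refcount++;
	return next;
}

/* Drops one reference. */
static void sim_put(struct sim_node *n)
{
	assert(n != NULL && n->refcount > 0);
	n->refcount--;
}

/* Mirrors the proposed helper: every reference taken in the walk is dropped. */
static bool sim_match(struct sim_node *cpu, const struct sim_node *target)
{
	struct sim_node *prev, *cache = sim_find_next(cpu);

	while (cache) {
		if (cache == target) {
			sim_put(cache);
			return true;
		}

		prev = cache;
		cache = sim_find_next(cache);
		sim_put(prev);	/* drop the previous level only after stepping past it */
	}

	return false;
}
```

In this mock the caller's reference on cpu is never touched, which matches
the point that the iterator in for_each_of_cpu_node() remains responsible
for dropping it.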

Re: [PATCH v2 1/3] cacheinfo: Set cache 'id' based on DT data

Posted by Ben Horgan 2 months, 4 weeks ago

Hi James and Jonathan,

On 7/10/25 12:15, James Morse wrote:
> Hi Ben, Jonathan,
> 
> On 07/07/2025 13:32, Jonathan Cameron wrote:
>> On Mon, 7 Jul 2025 11:27:06 +0100
>> Ben Horgan <ben.horgan@arm.com> wrote:
>>> On 7/4/25 18:38, James Morse wrote:
>>>> From: Rob Herring <robh@kernel.org>
>>>> Use the minimum CPU h/w id of the CPUs associated with the cache for the
>>>> cache 'id'. This will provide a stable id value for a given system. As
>>>> we need to check all possible CPUs, we can't use the shared_cpu_map
>>>> which is just online CPUs. As there's not a cache to CPUs mapping in DT,
>>>> we have to walk all CPU nodes and then walk cache levels.
>>>>
>>>> The cache_id exposed to user-space has historically been 32 bits, and
>>>> is too late to change. This value is parsed into a u32 by user-space
>>>> libraries such as libvirt:
>>>> https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
>>>>
>>>> Give up on assigning cache-id's if a CPU h/w id greater than 32 bits
>>>> is found.
> 
>>>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>>>> index cf0d455209d7..df593da0d5f7 100644
>>>> --- a/drivers/base/cacheinfo.c
>>>> +++ b/drivers/base/cacheinfo.c
>>>> @@ -183,6 +184,42 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
>>>>    	return of_property_read_bool(np, "cache-unified");
>>>>    }
>>>>    
>>>> +static bool match_cache_node(struct device_node *cpu,
>>>> +			     const struct device_node *cache_node)
>>>> +{
>>>> +	for (struct device_node *cache __free(device_node) = of_find_next_cache_node(cpu);
>>> Looks like the creation of this helper function has upset the
>>> device_node reference counting. This first __free(device_node) will only
>>> cause of_node_put() to be called in the case of the early return from
>>> the loop. You've dropped the second __free(device_node) which accounts
>>> for 'cache' changing on each iteration.
> 
> Heh, I just took this hunk verbatim. Fixing this up with the __free() magic is tricky as
> the existing patterns all drop the reference to cpu, which we don't want to do here. I
> think at this point the __free() magic is just making this harder to understand. How about
> the old fashioned way:
> 
> | static bool match_cache_node(struct device_node *cpu,
> |                              const struct device_node *cache_node)
> | {
> |         struct device_node *prev, *cache = of_find_next_cache_node(cpu);
> |
> |         while (cache) {
> |                 if (cache == cache_node) {
> |                         of_node_put(cache);
> |                         return true;
> |                 }
> |
> |                 prev = cache;
> |                 cache = of_find_next_cache_node(cache);
> |                 of_node_put(prev);
> |         }
> |
> |         return false;
> | }

Ok with me.

> 
> 
>> Good catch - this behaves differently from many of the of_get_next* type
>> helpers in that it doesn't drop the reference to the previous iteration
>> within the call.
>>
>> Maybe it should?
>>
>> I checked a few of the call sites and some would be simplified if it did
>> others would need some more complex restructuring but might benefit as
>> well.
> 
> If it did, we'd end up dropping the reference to cpu on the way in, which
> of_get_next_cpu_node() in for_each_of_cpu_node() was expecting to do.

Yes, I think the blurring of the lines between a cpu node and cache node 
is at least partially to blame for the confusion here.
> 
> 
> Thanks,
> 
> James

Thanks,

Ben

Re: [PATCH v2 1/3] cacheinfo: Set cache 'id' based on DT data

Posted by Jonathan Cameron 2 months, 4 weeks ago

On Thu, 10 Jul 2025 12:24:01 +0100
Ben Horgan <ben.horgan@arm.com> wrote:

> Hi James and Jonathan,
> 
> On 7/10/25 12:15, James Morse wrote:
> > Hi Ben, Jonathan,
> > 
> > On 07/07/2025 13:32, Jonathan Cameron wrote:  
> >> On Mon, 7 Jul 2025 11:27:06 +0100
> >> Ben Horgan <ben.horgan@arm.com> wrote:  
> >>> On 7/4/25 18:38, James Morse wrote:  
> >>>> From: Rob Herring <robh@kernel.org>
> >>>> Use the minimum CPU h/w id of the CPUs associated with the cache for the
> >>>> cache 'id'. This will provide a stable id value for a given system. As
> >>>> we need to check all possible CPUs, we can't use the shared_cpu_map
> >>>> which is just online CPUs. As there's not a cache to CPUs mapping in DT,
> >>>> we have to walk all CPU nodes and then walk cache levels.
> >>>>
> >>>> The cache_id exposed to user-space has historically been 32 bits, and
> >>>> is too late to change. This value is parsed into a u32 by user-space
> >>>> libraries such as libvirt:
> >>>> https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
> >>>>
> >>>> Give up on assigning cache-id's if a CPU h/w id greater than 32 bits
> >>>> is found.  
> >   
> >>>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> >>>> index cf0d455209d7..df593da0d5f7 100644
> >>>> --- a/drivers/base/cacheinfo.c
> >>>> +++ b/drivers/base/cacheinfo.c
> >>>> @@ -183,6 +184,42 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
> >>>>    	return of_property_read_bool(np, "cache-unified");
> >>>>    }
> >>>>    
> >>>> +static bool match_cache_node(struct device_node *cpu,
> >>>> +			     const struct device_node *cache_node)
> >>>> +{
> >>>> +	for (struct device_node *cache __free(device_node) = of_find_next_cache_node(cpu);  
> >>> Looks like the creation of this helper function has upset the
> >>> device_node reference counting. This first __free(device_node) will only
> >>> cause of_node_put() to be called in the case of the early return from
> >>> the loop. You've dropped the second __free(device_node) which accounts
> >>> for 'cache' changing on each iteration.  
> > 
> > Heh, I just took this hunk verbatim. Fixing this up with the __free() magic is tricky as
> > the existing patterns all drop the reference to cpu, which we don't want to do here. I
> > think at this point the __free() magic is just making this harder to understand. How about
> > the old fashioned way:
> > 
> > | static bool match_cache_node(struct device_node *cpu,
> > |                              const struct device_node *cache_node)
> > | {
> > |         struct device_node *prev, *cache = of_find_next_cache_node(cpu);
> > |
> > |         while (cache) {
> > |                 if (cache == cache_node) {
> > |                         of_node_put(cache);
> > |                         return true;
> > |                 }
> > |
> > |                 prev = cache;
> > |                 cache = of_find_next_cache_node(cache);
> > |                 of_node_put(prev);
> > |         }
> > |
> > |         return false;
> > | }  
> Ok with me.

Agreed.

> > 
> >   
> >> Good catch - this behaves differently from many of the of_get_next* type
> >> helpers in that it doesn't drop the reference to the previous iteration
> >> within the call.
> >>
> >> Maybe it should?
> >>
> >> I checked a few of the call sites and some would be simplified if it did
> >> others would need some more complex restructuring but might benefit as
> >> well.  
> > 
> > If it did, we'd end up dropping the reference to cpu on the way in, which
> > of_get_next_cpu_node() in for_each_of_cpu_node() was expecting to do.  
> 
> Yes, I think the blurring of the lines between a cpu node and cache node 
> is at least partially to blame for the confusion here.

Yes. That is more than a little ugly!

> > 
> > 
> > Thanks,
> > 
> > James  
> 
> Thanks,
> 
> Ben
> 
>

Re: [PATCH v2 1/3] cacheinfo: Set cache 'id' based on DT data

Posted by Jonathan Cameron 3 months ago

On Fri, 4 Jul 2025 17:38:24 +0000
James Morse <james.morse@arm.com> wrote:

> From: Rob Herring <robh@kernel.org>
> 
> Use the minimum CPU h/w id of the CPUs associated with the cache for the
> cache 'id'. This will provide a stable id value for a given system. As
> we need to check all possible CPUs, we can't use the shared_cpu_map
> which is just online CPUs. As there's not a cache to CPUs mapping in DT,
> we have to walk all CPU nodes and then walk cache levels.
> 
> The cache_id exposed to user-space has historically been 32 bits, and
> is too late to change. This value is parsed into a u32 by user-space
> libraries such as libvirt:
> https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
> 
> Give up on assigning cache-id's if a CPU h/w id greater than 32 bits
> is found.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: Rob Herring <robh@kernel.org>
> [ ben: converted to use the __free cleanup idiom ]
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> [ morse: Add checks to give up if a value larger than 32 bits is seen. ]
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>