[PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id

James Morse posted 5 patches 3 months, 4 weeks ago
There is a newer version of this series
[PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
Posted by James Morse 3 months, 4 weeks ago
Filesystems like resctrl use the cache-id exposed via sysfs to identify
groups of CPUs. The value is also used for PCIe cache steering tags. On
DT platforms cache-id is not something that is described in the
device-tree, but instead generated from the smallest CPU h/w id of the
CPUs associated with that cache.

CPU h/w ids may be larger than 32 bits.

Add a hook to allow architectures to compress the value from the devicetree
into 32 bits. Returning the same value is always safe as cache_of_set_id()
will stop if a value larger than 32 bits is seen.

For example, on arm64 the value is the MPIDR affinity register, which only
has 32 bits of affinity data, but spread across the 64 bit field. An
arch-specific bit swizzle gives a 32 bit value.

Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/base/cacheinfo.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 9888d87840a2..d8e5b4c7156c 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -184,6 +184,10 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
 	return of_property_read_bool(np, "cache-unified");
 }
 
+#ifndef arch_compact_of_hwid
+#define arch_compact_of_hwid(_x)	(_x)
+#endif
+
 static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
 {
 	struct device_node *cpu;
@@ -193,6 +197,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
 		struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu);
 		u64 id = of_get_cpu_hwid(cpu, 0);
 
+		id = arch_compact_of_hwid(id);
 		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
 			of_node_put(cpu);
 			return;
-- 
2.39.5
Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
Posted by Rob Herring 3 months, 2 weeks ago
On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
>
> Filesystems like resctrl use the cache-id exposed via sysfs to identify
> groups of CPUs. The value is also used for PCIe cache steering tags. On
> DT platforms cache-id is not something that is described in the
> device-tree, but instead generated from the smallest CPU h/w id of the
> CPUs associated with that cache.
>
> CPU h/w ids may be larger than 32 bits.
>
> Add a hook to allow architectures to compress the value from the devicetree
> into 32 bits. Returning the same value is always safe as cache_of_set_id()
> will stop if a value larger than 32 bits is seen.
>
> For example, on arm64 the value is the MPIDR affinity register, which only
> has 32 bits of affinity data, but spread across the 64 bit field. An
> arch-specific bit swizzle gives a 32 bit value.

What's missing here is why do we need the cache id to be only 32-bits?
I suppose it is because the sysfs 'id' file has been implicitly that?
Why can't we just allow 64-bit values there? Obviously, you can't have
a 64-bit value on x86 because that might break existing userspace. But
for Arm, there is no existing userspace to break. Even with 32-bits,
it is entirely possible that an existing userspace assumed values less
than 32-bits and would be broken for Arm as-is. It is obviously nice
if we can avoid modifying userspace, but I don't think that's a
requirement and I'd be surprised if there's not other things that need
to be adapted for MPAM support.

Also, what if an architecture can't swizzle their value into 32-bits?
They would be stuck with requiring userspace to deal with 64-bit
values.

Rob
Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
Posted by James Morse 3 months, 2 weeks ago
Hi Rob,

On 23/06/2025 15:48, Rob Herring wrote:
> On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
>> Filesystems like resctrl use the cache-id exposed via sysfs to identify
>> groups of CPUs. The value is also used for PCIe cache steering tags. On
>> DT platforms cache-id is not something that is described in the
>> device-tree, but instead generated from the smallest CPU h/w id of the
>> CPUs associated with that cache.
>>
>> CPU h/w ids may be larger than 32 bits.
>>
>> Add a hook to allow architectures to compress the value from the devicetree
>> into 32 bits. Returning the same value is always safe as cache_of_set_id()
>> will stop if a value larger than 32 bits is seen.
>>
>> For example, on arm64 the value is the MPIDR affinity register, which only
>> has 32 bits of affinity data, but spread across the 64 bit field. An
>> arch-specific bit swizzle gives a 32 bit value.

> What's missing here is why do we need the cache id to be only 32-bits?
> I suppose it is because the sysfs 'id' file has been implicitly that?

Yup, and its too late to change.


> Why can't we just allow 64-bit values there? Obviously, you can't have
> a 64-bit value on x86 because that might break existing userspace.

It's the same user-space. Users of resctrl should be portable between architectures.
Resctrl isn't the only user, of the cache-id field.


> But for Arm, there is no existing userspace to break.

libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/


> Even with 32-bits,
> it is entirely possible that an existing userspace assumed values less
> than 32-bits and would be broken for Arm as-is.

Sure, but I've not found a project where that is broken yet.


> It is obviously nice
> if we can avoid modifying userspace, but I don't think that's a
> requirement and I'd be surprised if there's not other things that need
> to be adapted for MPAM support.

The whole multi-year effort has been to make existing user-space work without any ABI
changes. The effect is some platforms have features that can't be used because resctrl
expects things to be Xeon shaped.
But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory
bandwidth controls somewhere that is believably the L3), then resctrl works as it does on
Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM
doesn't have an equivalent, so '1' is exposed as a safe value.


> Also, what if an architecture can't swizzle their value into 32-bits?
> They would be stuck with requiring userspace to deal with 64-bit
> values.

Remap them in a more complicated way. Chances are there aren't 2^32 CPUs.


I'll add something about the libvirt example to the commit message.


Thanks,

James
Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
Posted by Rob Herring 3 months, 1 week ago
On Fri, Jun 27, 2025 at 11:38 AM James Morse <james.morse@arm.com> wrote:
>
> Hi Rob,
>
> On 23/06/2025 15:48, Rob Herring wrote:
> > On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
> >> Filesystems like resctrl use the cache-id exposed via sysfs to identify
> >> groups of CPUs. The value is also used for PCIe cache steering tags. On
> >> DT platforms cache-id is not something that is described in the
> >> device-tree, but instead generated from the smallest CPU h/w id of the
> >> CPUs associated with that cache.
> >>
> >> CPU h/w ids may be larger than 32 bits.
> >>
> >> Add a hook to allow architectures to compress the value from the devicetree
> >> into 32 bits. Returning the same value is always safe as cache_of_set_id()
> >> will stop if a value larger than 32 bits is seen.
> >>
> >> For example, on arm64 the value is the MPIDR affinity register, which only
> >> has 32 bits of affinity data, but spread across the 64 bit field. An
> >> arch-specific bit swizzle gives a 32 bit value.
>
> > What's missing here is why do we need the cache id to be only 32-bits?
> > I suppose it is because the sysfs 'id' file has been implicitly that?
>
> Yup, and its too late to change.
>
>
> > Why can't we just allow 64-bit values there? Obviously, you can't have
> > a 64-bit value on x86 because that might break existing userspace.
>
> It's the same user-space. Users of resctrl should be portable between architectures.
> Resctrl isn't the only user, of the cache-id field.
>
>
> > But for Arm, there is no existing userspace to break.
>
> libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588

Looks to me like AMD wasn't even supported til v10.8.0 (2024-10-01)[1].

> DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/

Is that even applied yet?

> > Even with 32-bits,
> > it is entirely possible that an existing userspace assumed values less
> > than 32-bits and would be broken for Arm as-is.
>
> Sure, but I've not found a project where that is broken yet.
>
>
> > It is obviously nice
> > if we can avoid modifying userspace, but I don't think that's a
> > requirement and I'd be surprised if there's not other things that need
> > to be adapted for MPAM support.
>
> The whole multi-year effort has been to make existing user-space work without any ABI
> changes. The effect is some platforms have features that can't be used because resctrl
> expects things to be Xeon shaped.
> But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory
> bandwidth controls somewhere that is believably the L3), then resctrl works as it does on
> Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM
> doesn't have an equivalent, so '1' is exposed as a safe value.

Fair enough, but I'd be rather surprised if there doesn't end up being
changes to support Arm platforms.

> > Also, what if an architecture can't swizzle their value into 32-bits?
> > They would be stuck with requiring userspace to deal with 64-bit
> > values.
>
> Remap them in a more complicated way. Chances are there aren't 2^32 CPUs.

What about using the logical CPU number instead? That's stable for a
given machine and firmware. And then instead of having 3 sets of
numbers (MPIDR, compressed MPIDR, and logical CPU), we'd still only
have 2. And logical CPU is what sysfs already exposes to userspace.

Rob

[1] https://libvirt.org/news.html
Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
Posted by James Morse 3 months, 1 week ago
Hi Rob,

On 30/06/2025 20:43, Rob Herring wrote:
> On Fri, Jun 27, 2025 at 11:38 AM James Morse <james.morse@arm.com> wrote:
>> On 23/06/2025 15:48, Rob Herring wrote:
>>> On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
>>>> Filesystems like resctrl use the cache-id exposed via sysfs to identify
>>>> groups of CPUs. The value is also used for PCIe cache steering tags. On
>>>> DT platforms cache-id is not something that is described in the
>>>> device-tree, but instead generated from the smallest CPU h/w id of the
>>>> CPUs associated with that cache.
>>>>
>>>> CPU h/w ids may be larger than 32 bits.
>>>>
>>>> Add a hook to allow architectures to compress the value from the devicetree
>>>> into 32 bits. Returning the same value is always safe as cache_of_set_id()
>>>> will stop if a value larger than 32 bits is seen.
>>>>
>>>> For example, on arm64 the value is the MPIDR affinity register, which only
>>>> has 32 bits of affinity data, but spread across the 64 bit field. An
>>>> arch-specific bit swizzle gives a 32 bit value.
>>
>>> What's missing here is why do we need the cache id to be only 32-bits?
>>> I suppose it is because the sysfs 'id' file has been implicitly that?
>>
>> Yup, and its too late to change.
>>
>>
>>> Why can't we just allow 64-bit values there? Obviously, you can't have
>>> a 64-bit value on x86 because that might break existing userspace.
>>
>> It's the same user-space. Users of resctrl should be portable between architectures.
>> Resctrl isn't the only user, of the cache-id field.
>>
>>
>>> But for Arm, there is no existing userspace to break.
>>
>> libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
> 
> Looks to me like AMD wasn't even supported til v10.8.0 (2024-10-01)[1].

'when mounted with [a particular option]'

AMDs bandwidth controls count in 1/8ths of 1GB/s - and you have to know you're running on
an AMD machine. I'm aiming for the arm64 support to be portable between Intel and RISC-V.


>> DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
> 
> Is that even applied yet?

No idea, but its equally likely that I haven't found all the places this gets parsed by
user-space. I don't think we have a way of telling people using stable-kernels that we
might change the size of that field. It's pretty clear people don't anticipate it changing!

This is just the downside of exposing anything to user-space!

[...]

>>> It is obviously nice
>>> if we can avoid modifying userspace, but I don't think that's a
>>> requirement and I'd be surprised if there's not other things that need
>>> to be adapted for MPAM support.
>>
>> The whole multi-year effort has been to make existing user-space work without any ABI
>> changes. The effect is some platforms have features that can't be used because resctrl
>> expects things to be Xeon shaped.
>> But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory
>> bandwidth controls somewhere that is believably the L3), then resctrl works as it does on
>> Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM
>> doesn't have an equivalent, so '1' is exposed as a safe value.
> 
> Fair enough, but I'd be rather surprised if there doesn't end up being
> changes to support Arm platforms.
> 
>>> Also, what if an architecture can't swizzle their value into 32-bits?
>>> They would be stuck with requiring userspace to deal with 64-bit
>>> values.
>>
>> Remap them in a more complicated way. Chances are there aren't 2^32 CPUs.

> What about using the logical CPU number instead? That's stable for a
> given machine and firmware.

Hmmm, if you offline CPU-0 then kexec, then CPU-1 becomes the new CPU-0 and the numbers
get doled out differently.


> And then instead of having 3 sets of
> numbers (MPIDR, compressed MPIDR, and logical CPU), we'd still only
> have 2. And logical CPU is what sysfs already exposes to userspace.

I don't think the linux allocated CPU number is robust enough.

We could use the CPU number as seen when walking through the DT to make it stable, but
that would still be a third type of number. It would save the arch hook to swizzle the
bits, but changing the DT would change the numbers which doesn't happen with this scheme.

Let me know if that is what you prefer.
(I'll summarise this on the cover-letter/patch-1 of the incoming series)


Thanks,

James
Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
Posted by Rob Herring 3 months ago
On Fri, Jul 4, 2025 at 12:39 PM James Morse <james.morse@arm.com> wrote:
>
> Hi Rob,
>
> On 30/06/2025 20:43, Rob Herring wrote:
> > On Fri, Jun 27, 2025 at 11:38 AM James Morse <james.morse@arm.com> wrote:
> >> On 23/06/2025 15:48, Rob Herring wrote:
> >>> On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote:
> >>>> Filesystems like resctrl use the cache-id exposed via sysfs to identify
> >>>> groups of CPUs. The value is also used for PCIe cache steering tags. On
> >>>> DT platforms cache-id is not something that is described in the
> >>>> device-tree, but instead generated from the smallest CPU h/w id of the
> >>>> CPUs associated with that cache.
> >>>>
> >>>> CPU h/w ids may be larger than 32 bits.
> >>>>
> >>>> Add a hook to allow architectures to compress the value from the devicetree
> >>>> into 32 bits. Returning the same value is always safe as cache_of_set_id()
> >>>> will stop if a value larger than 32 bits is seen.
> >>>>
> >>>> For example, on arm64 the value is the MPIDR affinity register, which only
> >>>> has 32 bits of affinity data, but spread across the 64 bit field. An
> >>>> arch-specific bit swizzle gives a 32 bit value.
> >>
> >>> What's missing here is why do we need the cache id to be only 32-bits?
> >>> I suppose it is because the sysfs 'id' file has been implicitly that?
> >>
> >> Yup, and its too late to change.
> >>
> >>
> >>> Why can't we just allow 64-bit values there? Obviously, you can't have
> >>> a 64-bit value on x86 because that might break existing userspace.
> >>
> >> It's the same user-space. Users of resctrl should be portable between architectures.
> >> Resctrl isn't the only user, of the cache-id field.
> >>
> >>
> >>> But for Arm, there is no existing userspace to break.
> >>
> >> libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588
> >
> > Looks to me like AMD wasn't even supported til v10.8.0 (2024-10-01)[1].
>
> 'when mounted with [a particular option]'
>
> AMDs bandwidth controls count in 1/8ths of 1GB/s - and you have to know you're running on
> an AMD machine. I'm aiming for the arm64 support to be portable between Intel and RISC-V.
>
>
> >> DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/
> >
> > Is that even applied yet?
>
> No idea, but its equally likely that I haven't found all the places this gets parsed by
> user-space. I don't think we have a way of telling people using stable-kernels that we
> might change the size of that field. It's pretty clear people don't anticipate it changing!
>
> This is just the downside of exposing anything to user-space!
>
> [...]
>
> >>> It is obviously nice
> >>> if we can avoid modifying userspace, but I don't think that's a
> >>> requirement and I'd be surprised if there's not other things that need
> >>> to be adapted for MPAM support.
> >>
> >> The whole multi-year effort has been to make existing user-space work without any ABI
> >> changes. The effect is some platforms have features that can't be used because resctrl
> >> expects things to be Xeon shaped.
> >> But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory
> >> bandwidth controls somewhere that is believably the L3), then resctrl works as it does on
> >> Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM
> >> doesn't have an equivalent, so '1' is exposed as a safe value.
> >
> > Fair enough, but I'd be rather surprised if there doesn't end up being
> > changes to support Arm platforms.
> >
> >>> Also, what if an architecture can't swizzle their value into 32-bits?
> >>> They would be stuck with requiring userspace to deal with 64-bit
> >>> values.
> >>
> >> Remap them in a more complicated way. Chances are there aren't 2^32 CPUs.
>
> > What about using the logical CPU number instead? That's stable for a
> > given machine and firmware.
>
> Hmmm, if you offline CPU-0 then kexec, then CPU-1 becomes the new CPU-0 and the numbers
> get doled out differently.

Ah, I thought they were more stable than that, but indeed there's a
special case for the boot CPU.


> > And then instead of having 3 sets of
> > numbers (MPIDR, compressed MPIDR, and logical CPU), we'd still only
> > have 2. And logical CPU is what sysfs already exposes to userspace.
>
> I don't think the linux allocated CPU number is robust enough.
>
> We could use the CPU number as seen when walking through the DT to make it stable, but
> that would still be a third type of number. It would save the arch hook to swizzle the
> bits, but changing the DT would change the numbers which doesn't happen with this scheme.
>
> Let me know if that is what you prefer.
> (I'll summarise this on the cover-letter/patch-1 of the incoming series)

Let's leave it with what you have...

Rob
Re: [PATCH 2/5] cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id
Posted by Jonathan Cameron 3 months, 3 weeks ago
On Fri, 13 Jun 2025 13:03:53 +0000
James Morse <james.morse@arm.com> wrote:

> Filesystems like resctrl use the cache-id exposed via sysfs to identify
> groups of CPUs. The value is also used for PCIe cache steering tags. On
> DT platforms cache-id is not something that is described in the
> device-tree, but instead generated from the smallest CPU h/w id of the
> CPUs associated with that cache.
> 
> CPU h/w ids may be larger than 32 bits.
> 
> Add a hook to allow architectures to compress the value from the devicetree
> into 32 bits. Returning the same value is always safe as cache_of_set_id()
> will stop if a value larger than 32 bits is seen.
> 
> For example, on arm64 the value is the MPIDR affinity register, which only
> has 32 bits of affinity data, but spread across the 64 bit field. An
> arch-specific bit swizzle gives a 32 bit value.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Looks fine to me
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  drivers/base/cacheinfo.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index 9888d87840a2..d8e5b4c7156c 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -184,6 +184,10 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
>  	return of_property_read_bool(np, "cache-unified");
>  }
>  
> +#ifndef arch_compact_of_hwid
> +#define arch_compact_of_hwid(_x)	(_x)
> +#endif
> +
>  static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
>  {
>  	struct device_node *cpu;
> @@ -193,6 +197,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
>  		struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu);
>  		u64 id = of_get_cpu_hwid(cpu, 0);
>  
> +		id = arch_compact_of_hwid(id);
>  		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
>  			of_node_put(cpu);
>  			return;