Filesystems like resctrl use the cache-id exposed via sysfs to identify
groups of CPUs. The value is also used for PCIe cache steering tags. On
DT platforms cache-id is not something that is described in the
device-tree, but instead generated from the smallest CPU h/w id of the
CPUs associated with that cache.
CPU h/w ids may be larger than 32 bits.
Add a hook to allow architectures to compress the value from the devicetree
into 32 bits. Returning the same value is always safe as cache_of_set_id()
will stop if a value larger than 32 bits is seen.
For example, on arm64 the value is the MPIDR affinity register, which only
has 32 bits of affinity data, but spread across the 64 bit field. An
arch-specific bit swizzle gives a 32 bit value.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/base/cacheinfo.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 9888d87840a2..d8e5b4c7156c 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -184,6 +184,10 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf,
return of_property_read_bool(np, "cache-unified");
}
+#ifndef arch_compact_of_hwid
+#define arch_compact_of_hwid(_x) (_x)
+#endif
+
static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
{
struct device_node *cpu;
@@ -193,6 +197,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np)
struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu);
u64 id = of_get_cpu_hwid(cpu, 0);
+ id = arch_compact_of_hwid(id);
if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
of_node_put(cpu);
return;
--
2.39.5
On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote: > > Filesystems like resctrl use the cache-id exposed via sysfs to identify > groups of CPUs. The value is also used for PCIe cache steering tags. On > DT platforms cache-id is not something that is described in the > device-tree, but instead generated from the smallest CPU h/w id of the > CPUs associated with that cache. > > CPU h/w ids may be larger than 32 bits. > > Add a hook to allow architectures to compress the value from the devicetree > into 32 bits. Returning the same value is always safe as cache_of_set_id() > will stop if a value larger than 32 bits is seen. > > For example, on arm64 the value is the MPIDR affinity register, which only > has 32 bits of affinity data, but spread across the 64 bit field. An > arch-specific bit swizzle gives a 32 bit value. What's missing here is why do we need the cache id to be only 32-bits? I suppose it is because the sysfs 'id' file has been implicitly that? Why can't we just allow 64-bit values there? Obviously, you can't have a 64-bit value on x86 because that might break existing userspace. But for Arm, there is no existing userspace to break. Even with 32-bits, it is entirely possible that an existing userspace assumed values less than 32-bits and would be broken for Arm as-is. It is obviously nice if we can avoid modifying userspace, but I don't think that's a requirement and I'd be surprised if there's not other things that need to be adapted for MPAM support. Also, what if an architecture can't swizzle their value into 32-bits? They would be stuck with requiring userspace to deal with 64-bit values. Rob
Hi Rob, On 23/06/2025 15:48, Rob Herring wrote: > On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote: >> Filesystems like resctrl use the cache-id exposed via sysfs to identify >> groups of CPUs. The value is also used for PCIe cache steering tags. On >> DT platforms cache-id is not something that is described in the >> device-tree, but instead generated from the smallest CPU h/w id of the >> CPUs associated with that cache. >> >> CPU h/w ids may be larger than 32 bits. >> >> Add a hook to allow architectures to compress the value from the devicetree >> into 32 bits. Returning the same value is always safe as cache_of_set_id() >> will stop if a value larger than 32 bits is seen. >> >> For example, on arm64 the value is the MPIDR affinity register, which only >> has 32 bits of affinity data, but spread across the 64 bit field. An >> arch-specific bit swizzle gives a 32 bit value. > What's missing here is why do we need the cache id to be only 32-bits? > I suppose it is because the sysfs 'id' file has been implicitly that? Yup, and its too late to change. > Why can't we just allow 64-bit values there? Obviously, you can't have > a 64-bit value on x86 because that might break existing userspace. It's the same user-space. Users of resctrl should be portable between architectures. Resctrl isn't the only user, of the cache-id field. > But for Arm, there is no existing userspace to break. libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588 DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/ > Even with 32-bits, > it is entirely possible that an existing userspace assumed values less > than 32-bits and would be broken for Arm as-is. Sure, but I've not found a project where that is broken yet. > It is obviously nice > if we can avoid modifying userspace, but I don't think that's a > requirement and I'd be surprised if there's not other things that need > to be adapted for MPAM support. The whole multi-year effort has been to make existing user-space work without any ABI changes. The effect is some platforms have features that can't be used because resctrl expects things to be Xeon shaped. But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory bandwidth controls somewhere that is believably the L3), then resctrl works as it does on Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM doesn't have an equivalent, so '1' is exposed as a safe value. > Also, what if an architecture can't swizzle their value into 32-bits? > They would be stuck with requiring userspace to deal with 64-bit > values. Remap them in a more complicated way. Chances are there aren't 2^32 CPUs. I'll add something about the libvirt example to the commit message. Thanks, James
On Fri, Jun 27, 2025 at 11:38 AM James Morse <james.morse@arm.com> wrote: > > Hi Rob, > > On 23/06/2025 15:48, Rob Herring wrote: > > On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote: > >> Filesystems like resctrl use the cache-id exposed via sysfs to identify > >> groups of CPUs. The value is also used for PCIe cache steering tags. On > >> DT platforms cache-id is not something that is described in the > >> device-tree, but instead generated from the smallest CPU h/w id of the > >> CPUs associated with that cache. > >> > >> CPU h/w ids may be larger than 32 bits. > >> > >> Add a hook to allow architectures to compress the value from the devicetree > >> into 32 bits. Returning the same value is always safe as cache_of_set_id() > >> will stop if a value larger than 32 bits is seen. > >> > >> For example, on arm64 the value is the MPIDR affinity register, which only > >> has 32 bits of affinity data, but spread across the 64 bit field. An > >> arch-specific bit swizzle gives a 32 bit value. > > > What's missing here is why do we need the cache id to be only 32-bits? > > I suppose it is because the sysfs 'id' file has been implicitly that? > > Yup, and its too late to change. > > > > Why can't we just allow 64-bit values there? Obviously, you can't have > > a 64-bit value on x86 because that might break existing userspace. > > It's the same user-space. Users of resctrl should be portable between architectures. > Resctrl isn't the only user, of the cache-id field. > > > > But for Arm, there is no existing userspace to break. > > libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588 Looks to me like AMD wasn't even supported til v10.8.0 (2024-10-01)[1]. > DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/ Is that even applied yet? > > Even with 32-bits, > > it is entirely possible that an existing userspace assumed values less > > than 32-bits and would be broken for Arm as-is. > > Sure, but I've not found a project where that is broken yet. > > > > It is obviously nice > > if we can avoid modifying userspace, but I don't think that's a > > requirement and I'd be surprised if there's not other things that need > > to be adapted for MPAM support. > > The whole multi-year effort has been to make existing user-space work without any ABI > changes. The effect is some platforms have features that can't be used because resctrl > expects things to be Xeon shaped. > But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory > bandwidth controls somewhere that is believably the L3), then resctrl works as it does on > Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM > doesn't have an equivalent, so '1' is exposed as a safe value. Fair enough, but I'd be rather surprised if there doesn't end up being changes to support Arm platforms. > > Also, what if an architecture can't swizzle their value into 32-bits? > > They would be stuck with requiring userspace to deal with 64-bit > > values. > > Remap them in a more complicated way. Chances are there aren't 2^32 CPUs. What about using the logical CPU number instead? That's stable for a given machine and firmware. And then instead of having 3 sets of numbers (MPIDR, compressed MPIDR, and logical CPU), we'd still only have 2. And logical CPU is what sysfs already exposes to userspace. Rob [1] https://libvirt.org/news.html
Hi Rob, On 30/06/2025 20:43, Rob Herring wrote: > On Fri, Jun 27, 2025 at 11:38 AM James Morse <james.morse@arm.com> wrote: >> On 23/06/2025 15:48, Rob Herring wrote: >>> On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote: >>>> Filesystems like resctrl use the cache-id exposed via sysfs to identify >>>> groups of CPUs. The value is also used for PCIe cache steering tags. On >>>> DT platforms cache-id is not something that is described in the >>>> device-tree, but instead generated from the smallest CPU h/w id of the >>>> CPUs associated with that cache. >>>> >>>> CPU h/w ids may be larger than 32 bits. >>>> >>>> Add a hook to allow architectures to compress the value from the devicetree >>>> into 32 bits. Returning the same value is always safe as cache_of_set_id() >>>> will stop if a value larger than 32 bits is seen. >>>> >>>> For example, on arm64 the value is the MPIDR affinity register, which only >>>> has 32 bits of affinity data, but spread across the 64 bit field. An >>>> arch-specific bit swizzle gives a 32 bit value. >> >>> What's missing here is why do we need the cache id to be only 32-bits? >>> I suppose it is because the sysfs 'id' file has been implicitly that? >> >> Yup, and its too late to change. >> >> >>> Why can't we just allow 64-bit values there? Obviously, you can't have >>> a 64-bit value on x86 because that might break existing userspace. >> >> It's the same user-space. Users of resctrl should be portable between architectures. >> Resctrl isn't the only user, of the cache-id field. >> >> >>> But for Arm, there is no existing userspace to break. >> >> libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588 > > Looks to me like AMD wasn't even supported til v10.8.0 (2024-10-01)[1]. 'when mounted with [a particular option]' AMDs bandwidth controls count in 1/8ths of 1GB/s - and you have to know you're running on an AMD machine. I'm aiming for the arm64 support to be portable between Intel and RISC-V. >> DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/ > > Is that even applied yet? No idea, but its equally likely that I haven't found all the places this gets parsed by user-space. I don't think we have a way of telling people using stable-kernels that we might change the size of that field. It's pretty clear people don't anticipate it changing! This is just the downside of exposing anything to user-space! [...] >>> It is obviously nice >>> if we can avoid modifying userspace, but I don't think that's a >>> requirement and I'd be surprised if there's not other things that need >>> to be adapted for MPAM support. >> >> The whole multi-year effort has been to make existing user-space work without any ABI >> changes. The effect is some platforms have features that can't be used because resctrl >> expects things to be Xeon shaped. >> But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory >> bandwidth controls somewhere that is believably the L3), then resctrl works as it does on >> Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM >> doesn't have an equivalent, so '1' is exposed as a safe value. > > Fair enough, but I'd be rather surprised if there doesn't end up being > changes to support Arm platforms. > >>> Also, what if an architecture can't swizzle their value into 32-bits? >>> They would be stuck with requiring userspace to deal with 64-bit >>> values. >> >> Remap them in a more complicated way. Chances are there aren't 2^32 CPUs. > What about using the logical CPU number instead? That's stable for a > given machine and firmware. Hmmm, if you offline CPU-0 then kexec, then CPU-1 becomes the new CPU-0 and the numbers get doled out differently. > And then instead of having 3 sets of > numbers (MPIDR, compressed MPIDR, and logical CPU), we'd still only > have 2. And logical CPU is what sysfs already exposes to userspace. I don't think the linux allocated CPU number is robust enough. We could use the CPU number as seen when walking through the DT to make it stable, but that would still be a third type of number. It would save the arch hook to swizzle the bits, but changing the DT would change the numbers which doesn't happen with this scheme. Let me know if that is what you prefer. (I'll summarise this on the cover-letter/patch-1 of the incoming series) Thanks, James
On Fri, Jul 4, 2025 at 12:39 PM James Morse <james.morse@arm.com> wrote: > > Hi Rob, > > On 30/06/2025 20:43, Rob Herring wrote: > > On Fri, Jun 27, 2025 at 11:38 AM James Morse <james.morse@arm.com> wrote: > >> On 23/06/2025 15:48, Rob Herring wrote: > >>> On Fri, Jun 13, 2025 at 8:04 AM James Morse <james.morse@arm.com> wrote: > >>>> Filesystems like resctrl use the cache-id exposed via sysfs to identify > >>>> groups of CPUs. The value is also used for PCIe cache steering tags. On > >>>> DT platforms cache-id is not something that is described in the > >>>> device-tree, but instead generated from the smallest CPU h/w id of the > >>>> CPUs associated with that cache. > >>>> > >>>> CPU h/w ids may be larger than 32 bits. > >>>> > >>>> Add a hook to allow architectures to compress the value from the devicetree > >>>> into 32 bits. Returning the same value is always safe as cache_of_set_id() > >>>> will stop if a value larger than 32 bits is seen. > >>>> > >>>> For example, on arm64 the value is the MPIDR affinity register, which only > >>>> has 32 bits of affinity data, but spread across the 64 bit field. An > >>>> arch-specific bit swizzle gives a 32 bit value. > >> > >>> What's missing here is why do we need the cache id to be only 32-bits? > >>> I suppose it is because the sysfs 'id' file has been implicitly that? > >> > >> Yup, and its too late to change. > >> > >> > >>> Why can't we just allow 64-bit values there? Obviously, you can't have > >>> a 64-bit value on x86 because that might break existing userspace. > >> > >> It's the same user-space. Users of resctrl should be portable between architectures. > >> Resctrl isn't the only user, of the cache-id field. > >> > >> > >>> But for Arm, there is no existing userspace to break. > >> > >> libvirt: https://github.com/libvirt/libvirt/blob/master/src/util/virresctrl.c#L1588 > > > > Looks to me like AMD wasn't even supported til v10.8.0 (2024-10-01)[1]. > > 'when mounted with [a particular option]' > > AMDs bandwidth controls count in 1/8ths of 1GB/s - and you have to know you're running on > an AMD machine. I'm aiming for the arm64 support to be portable between Intel and RISC-V. > > > >> DPDK: http://inbox.dpdk.org/dev/20241021015246.304431-2-wathsala.vithanage@arm.com/ > > > > Is that even applied yet? > > No idea, but its equally likely that I haven't found all the places this gets parsed by > user-space. I don't think we have a way of telling people using stable-kernels that we > might change the size of that field. It's pretty clear people don't anticipate it changing! > > This is just the downside of exposing anything to user-space! > > [...] > > >>> It is obviously nice > >>> if we can avoid modifying userspace, but I don't think that's a > >>> requirement and I'd be surprised if there's not other things that need > >>> to be adapted for MPAM support. > >> > >> The whole multi-year effort has been to make existing user-space work without any ABI > >> changes. The effect is some platforms have features that can't be used because resctrl > >> expects things to be Xeon shaped. > >> But if your platform looks a bit like a Xeon (cache portion controls on the L3, memory > >> bandwidth controls somewhere that is believably the L3), then resctrl works as it does on > >> Intel. The only thing that has come a little unstuck is the 'num_rmid' property where MPAM > >> doesn't have an equivalent, so '1' is exposed as a safe value. > > > > Fair enough, but I'd be rather surprised if there doesn't end up being > > changes to support Arm platforms. > > > >>> Also, what if an architecture can't swizzle their value into 32-bits? > >>> They would be stuck with requiring userspace to deal with 64-bit > >>> values. > >> > >> Remap them in a more complicated way. Chances are there aren't 2^32 CPUs. > > > What about using the logical CPU number instead? That's stable for a > > given machine and firmware. > > Hmmm, if you offline CPU-0 then kexec, then CPU-1 becomes the new CPU-0 and the numbers > get doled out differently. Ah, I thought they were more stable than that, but indeed there's a special case for the boot CPU. > > And then instead of having 3 sets of > > numbers (MPIDR, compressed MPIDR, and logical CPU), we'd still only > > have 2. And logical CPU is what sysfs already exposes to userspace. > > I don't think the linux allocated CPU number is robust enough. > > We could use the CPU number as seen when walking through the DT to make it stable, but > that would still be a third type of number. It would save the arch hook to swizzle the > bits, but changing the DT would change the numbers which doesn't happen with this scheme. > > Let me know if that is what you prefer. > (I'll summarise this on the cover-letter/patch-1 of the incoming series) Let's leave it with what you have... Rob
On Fri, 13 Jun 2025 13:03:53 +0000 James Morse <james.morse@arm.com> wrote: > Filesystems like resctrl use the cache-id exposed via sysfs to identify > groups of CPUs. The value is also used for PCIe cache steering tags. On > DT platforms cache-id is not something that is described in the > device-tree, but instead generated from the smallest CPU h/w id of the > CPUs associated with that cache. > > CPU h/w ids may be larger than 32 bits. > > Add a hook to allow architectures to compress the value from the devicetree > into 32 bits. Returning the same value is always safe as cache_of_set_id() > will stop if a value larger than 32 bits is seen. > > For example, on arm64 the value is the MPIDR affinity register, which only > has 32 bits of affinity data, but spread across the 64 bit field. An > arch-specific bit swizzle gives a 32 bit value. > > Signed-off-by: James Morse <james.morse@arm.com> Looks fine to me Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> > --- > drivers/base/cacheinfo.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c > index 9888d87840a2..d8e5b4c7156c 100644 > --- a/drivers/base/cacheinfo.c > +++ b/drivers/base/cacheinfo.c > @@ -184,6 +184,10 @@ static bool cache_node_is_unified(struct cacheinfo *this_leaf, > return of_property_read_bool(np, "cache-unified"); > } > > +#ifndef arch_compact_of_hwid > +#define arch_compact_of_hwid(_x) (_x) > +#endif > + > static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np) > { > struct device_node *cpu; > @@ -193,6 +197,7 @@ static void cache_of_set_id(struct cacheinfo *this_leaf, struct device_node *np) > struct device_node *cache_node __free(device_node) = of_find_next_cache_node(cpu); > u64 id = of_get_cpu_hwid(cpu, 0); > > + id = arch_compact_of_hwid(id); > if (FIELD_GET(GENMASK_ULL(63, 32), id)) { > of_node_put(cpu); > return;
© 2016 - 2025 Red Hat, Inc.