V6:
- Modify subject tag "x86/resctrl:" -> "x86,fs/resctrl:"
V5:
- Modify the commit log format again. I am truly grateful to Reinette
for her thoughtful guidance :).
V4:
- Add the Tested-by and Reviewed-by tags.
- Modify the commit log format.
V3:
- Following Reinette's suggestion, revise the commit log description to
make it easier for readers to understand.
V2:
- Keep the sanity checks in __mon_event_count() to ensure execution occurs
exclusively on CPUs sharing the same L3 cache cluster.
- When reading the top level event, obtain a CPU within hdr.cpu_mask.
Then use the cacheinfo shared_cpu_map of this CPU instead of using
hdr.cpu_mask directly
- Adjust code formatting and commit log descriptions.
In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
structure representing a NUMA node relies on the cacheinfo interface
(rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
for monitoring. The L3 cache information of an SNC NUMA node determines
which domains are summed for the "top level" L3-scoped events.
rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
node. When this CPU goes offline, its shared_cpu_map is cleared to contain
only the offline CPU itself. A subsequent attempt to read counters via
smp_call_on_cpu(offline_cpu) fails, and because the error is ignored,
"top-level events" read as zero without any error indication.
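The failure mode described above can be reproduced in a small userspace model. This is only a sketch under stated assumptions: every name prefixed with model_ is hypothetical, CPU masks are plain 64-bit bitmaps, and model_read_on() merely imitates a cross-CPU call that fails on an offline target. It is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for struct cacheinfo: only the field discussed above. */
struct model_cacheinfo {
	uint64_t shared_cpu_map;	/* bit n set: CPU n shares this L3 */
};

/* Old layout: the domain caches a pointer to one CPU's cacheinfo. */
struct model_mon_domain_old {
	const struct model_cacheinfo *ci;
};

static struct model_cacheinfo model_l3[MODEL_NR_CPUS];	/* one per CPU */
static uint64_t model_online;				/* bit n set: CPU n online */

/* Bring up CPUs [first, last], all sharing one L3. */
static void model_online_range(int first, int last)
{
	uint64_t map = 0;

	assert(first >= 0 && first <= last && last < MODEL_NR_CPUS);
	for (int cpu = first; cpu <= last; cpu++)
		map |= UINT64_C(1) << cpu;
	for (int cpu = first; cpu <= last; cpu++)
		model_l3[cpu].shared_cpu_map = map;
	model_online |= map;
}

/* Take a CPU offline: siblings drop it, its own map shrinks to itself. */
static void model_offline(int cpu)
{
	uint64_t bit;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	bit = UINT64_C(1) << cpu;
	for (int sib = 0; sib < MODEL_NR_CPUS; sib++)
		if (sib != cpu)
			model_l3[sib].shared_cpu_map &= ~bit;
	model_l3[cpu].shared_cpu_map = bit;
	model_online &= ~bit;
}

/*
 * Imitates the read path: take the lowest CPU in @map and run the read
 * there. Returns -1 (leaving *val untouched) if @map is empty or the
 * chosen CPU is offline; otherwise stores a dummy count and returns 0.
 */
static int model_read_on(uint64_t map, uint64_t *val)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(map & bit))
			continue;
		if (!(model_online & bit))
			return -1;
		*val = 42;	/* dummy counter value */
		return 0;
	}
	return -1;
}
```

With CPUs 0-3 online and the domain pointing at CPU 0's cacheinfo, the read succeeds. After model_offline(0) the cached map contains only CPU 0, so the read fails even though CPUs 1-3 are still online, and a caller that ignores the return value reports zero.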
Replace the cacheinfo references in struct rdt_mon_domain and struct
rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
domain. When reading "top-level events", select a CPU from
rdt_domain_hdr::cpu_mask and use its L3 shared_cpu_map to determine the
valid CPUs for reading the RMID counters via the MSR interface.
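The lookup described above can be sketched in userspace as follows. This is a simplified model, not the patch itself: the model_ types and helpers are hypothetical, masks are 64-bit bitmaps, and the domain list is a plain array. The point it illustrates is that the domain stores only the L3 cache ID, and the shared_cpu_map is fetched fresh from a CPU that is still in the domain's cpu_mask.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for struct cacheinfo. */
struct model_cacheinfo {
	unsigned int id;		/* L3 cache ID */
	uint64_t shared_cpu_map;	/* bit n set: CPU n shares this L3 */
};

/* New layout: the domain keeps the cache ID, not a cacheinfo pointer. */
struct model_mon_domain {
	uint64_t cpu_mask;	/* online CPUs of this SNC node */
	unsigned int ci_id;	/* ID of the L3 this node belongs to */
};

static struct model_cacheinfo model_l3[MODEL_NR_CPUS];	/* one per CPU */

/* Lowest CPU set in @mask, or -1 if the mask is empty. */
static int model_first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * Return the CPUs usable for reading a top-level event of L3 @ci_id:
 * find a domain with that ID, pick one of its online CPUs, and use that
 * CPU's current shared_cpu_map. Returns 0 if no such CPU exists.
 */
static uint64_t model_sum_read_mask(const struct model_mon_domain *doms,
				    int nr, unsigned int ci_id)
{
	for (int i = 0; i < nr; i++) {
		int cpu;

		if (doms[i].ci_id != ci_id)
			continue;
		cpu = model_first_cpu(doms[i].cpu_mask);
		if (cpu < 0)
			continue;
		return model_l3[cpu].shared_cpu_map;
	}
	return 0;
}

/* Take a CPU offline: update sibling maps and every domain's cpu_mask. */
static void model_offline(struct model_mon_domain *doms, int nr, int cpu)
{
	uint64_t bit;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	bit = UINT64_C(1) << cpu;
	for (int sib = 0; sib < MODEL_NR_CPUS; sib++)
		if (sib != cpu)
			model_l3[sib].shared_cpu_map &= ~bit;
	model_l3[cpu].shared_cpu_map = bit;
	for (int i = 0; i < nr; i++)
		doms[i].cpu_mask &= ~bit;
}
```

For two SNC nodes (CPUs 0-1 and CPUs 2-3) on one L3 with ID 0, taking CPU 0 offline leaves model_sum_read_mask() returning CPUs 1-3, because the map is read from CPU 1 rather than from the CPU that was first online.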
Considering all CPUs associated with the L3 cache improves the chances
of picking a housekeeping CPU on which the counter reading work can be
queued, avoiding an unnecessary IPI.
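Why a wider candidate mask helps can be shown with a small selection helper. This is an illustrative sketch only: model_pick_cpu() and the bitmap representation are assumptions made for the example, and the actual selection logic in resctrl may differ.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 8

/*
 * Prefer a CPU that is in both @candidates and @housekeeping; fall back
 * to any candidate otherwise. Returns -1 if @candidates is empty.
 */
static int model_pick_cpu(uint64_t candidates, uint64_t housekeeping)
{
	uint64_t preferred = candidates & housekeeping;
	uint64_t mask = preferred ? preferred : candidates;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

If CPU 3 is the only housekeeping CPU, a candidate mask limited to one SNC node (CPUs 0-1) yields CPU 0, which is not a housekeeping CPU, while the full L3 mask (CPUs 0-3) yields CPU 3.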
Qinyun Tan (1):
x86/resctrl: Remove unappropriate references to cacheinfo in the
resctrl subsystem.
arch/x86/kernel/cpu/resctrl/core.c | 6 ++++--
fs/resctrl/ctrlmondata.c | 13 +++++++++----
fs/resctrl/internal.h | 4 ++--
fs/resctrl/monitor.c | 6 ++++--
fs/resctrl/rdtgroup.c | 6 +++---
include/linux/resctrl.h | 4 ++--
6 files changed, 24 insertions(+), 15 deletions(-)
--
2.43.5
Dear x86 Maintainers,

Could you please consider this resctrl fix for inclusion?

Thank you very much.

Reinette

On 5/30/25 11:20 AM, Qinyun Tan wrote:
> [...]