[PATCH V6 0/1] x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem.

Posted by Qinyun Tan 6 months, 3 weeks ago
V6:
- Modify subject tag "x86/resctrl:" -> "x86,fs/resctrl:"

V5:
- Modify the commit log format again. I am truly grateful to Reinette
  for her thoughtful guidance :).

V4:
- Add the Tested-by and Reviewed-by tags.
- Modify the commit log format.

V3:
- Following Reinette's suggestion, rework the commit log description to
  make it easier for readers to understand.

V2:
- Keep the sanity checks in __mon_event_count() to ensure execution occurs
  exclusively on CPUs sharing the same L3 cache cluster
- When reading the top-level event, obtain a CPU within hdr.cpu_mask.
  Then use the cacheinfo shared_cpu_map of this CPU instead of using
  hdr.cpu_mask directly
- Adjust code formatting and commit log descriptions.

In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
structure representing a NUMA node relies on the cacheinfo interface
(rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
for monitoring. The L3 cache information of an SNC NUMA node determines
which domains are summed for the "top-level" L3-scoped events.

rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
node. When this CPU goes offline, its shared_cpu_map is cleared to contain
only the offline CPU itself. Subsequently, attempting to read counters
via smp_call_on_cpu(offline_cpu) fails (and the error is ignored), returning
zero values for "top-level events" without any error indication.
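
The failure mode described above can be modelled in plain userspace C.
This is only an illustrative sketch: struct read_req, model_call_on_cpu()
and read_ignoring_error() are hypothetical names, the online mask is a
simple 64-bit bitmap, and -ENXIO is assumed as the error returned for an
offline CPU.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the request passed to the counter reader. */
struct read_req {
	uint64_t val;	/* result; stays 0 if the reader never runs */
};

/*
 * Model of running the reader on @cpu: if @cpu is not set in
 * @online_mask, fail without touching @rr (assumed error: -ENXIO).
 */
int model_call_on_cpu(int cpu, uint64_t online_mask, struct read_req *rr,
		      uint64_t hw_count)
{
	assert(cpu >= 0 && cpu < 64);

	if (!(online_mask & (UINT64_C(1) << cpu)))
		return -ENXIO;

	rr->val = hw_count;
	return 0;
}

/* The problematic pattern: the return value is dropped by the caller. */
uint64_t read_ignoring_error(int cpu, uint64_t online_mask, uint64_t hw_count)
{
	struct read_req rr = { 0 };

	(void)model_call_on_cpu(cpu, online_mask, &rr, hw_count);
	return rr.val;	/* silently 0 when @cpu was offline */
}
```

With CPU 0 offline, the caller sees a count of 0 and has no way to tell
it apart from a genuine zero reading.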

Replace the cacheinfo references in struct rdt_mon_domain and struct
rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
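
As a rough illustration of why an ID is enough to decide which SNC
domains are summed, the userspace sketch below compares IDs instead of
dereferencing a cacheinfo pointer. struct mon_domain_model, its fields
and sum_l3_event() are hypothetical stand-ins, not the kernel structures
touched by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of one SNC monitoring domain. */
struct mon_domain_model {
	unsigned int ci_id;	/* ID of the L3 cache this node belongs to */
	uint64_t count;		/* event count read for this node */
};

/* Sum the counts of every domain that shares the L3 cache @ci_id. */
uint64_t sum_l3_event(const struct mon_domain_model *doms, size_t n,
		      unsigned int ci_id)
{
	uint64_t sum = 0;
	size_t i;

	assert(doms != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (doms[i].ci_id == ci_id)
			sum += doms[i].count;
	}

	return sum;
}
```

Because the ID is a plain value, it stays valid regardless of which CPU
in the node later goes offline.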

rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
domain. When reading "top-level events", select a CPU from
rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine
the valid CPUs for reading RMID counters via the MSR interface.
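
The selection step can be sketched as follows, again as a userspace
model rather than the patch itself. CPU masks are 64-bit bitmaps, and
struct topo_model, first_cpu() and l3_read_mask() are hypothetical names
introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

/*
 * Hypothetical topology model: l3_shared_map[cpu] holds the L3
 * shared_cpu_map of @cpu as a bitmap.
 */
struct topo_model {
	uint64_t l3_shared_map[MODEL_NR_CPUS];
};

/* Return the lowest CPU set in @mask, or -1 if @mask is empty. */
int first_cpu(uint64_t mask)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	}

	return -1;
}

/*
 * Derive the CPUs allowed to read a top-level event for a domain:
 * pick any CPU from the domain's online mask and return that CPU's
 * L3 shared map. Returns 0 if the domain has no online CPU.
 */
uint64_t l3_read_mask(const struct topo_model *t, uint64_t domain_cpu_mask)
{
	int cpu = first_cpu(domain_cpu_mask);

	if (cpu < 0)
		return 0;

	/* An online CPU is expected to appear in its own shared map. */
	assert(t->l3_shared_map[cpu] & (UINT64_C(1) << cpu));

	return t->l3_shared_map[cpu];
}
```

Since the starting CPU always comes from the domain's online mask, the
lookup never starts from a CPU that has gone offline.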

Considering all CPUs associated with the L3 cache improves the chances
of picking a housekeeping CPU on which the counter reading work can be
queued, avoiding an unnecessary IPI.
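
The effect of a wider candidate mask on the choice of reader CPU can be
illustrated with another small userspace model. struct cpu_choice and
pick_reader_cpu() are hypothetical, and the "housekeeping" bitmap is an
assumption standing in for the set of CPUs that are not nohz_full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of picking a CPU to read the counter on (hypothetical). */
struct cpu_choice {
	int cpu;		/* chosen CPU, or -1 if none available */
	bool needs_ipi;		/* true if no housekeeping CPU was found */
};

/*
 * Prefer a CPU that is both a candidate and a housekeeping CPU, so the
 * read can be queued as ordinary work. Fall back to any candidate,
 * which would have to be interrupted with an IPI.
 */
struct cpu_choice pick_reader_cpu(uint64_t candidates, uint64_t housekeeping)
{
	struct cpu_choice c = { -1, false };
	uint64_t preferred = candidates & housekeeping;
	uint64_t pool = preferred ? preferred : candidates;
	int cpu;

	for (cpu = 0; cpu < 64; cpu++) {
		if (pool & (UINT64_C(1) << cpu)) {
			c.cpu = cpu;
			c.needs_ipi = (preferred == 0);
			break;
		}
	}

	assert(c.cpu >= 0 || candidates == 0);
	return c;
}
```

For example, if CPUs 0-1 are housekeeping CPUs and an SNC node covers
only CPUs 2-3, restricting the candidates to the node forces an IPI,
while passing the whole L3 mask (CPUs 0-3) allows CPU 0 to be picked.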

Qinyun Tan (1):
  x86/resctrl: Remove inappropriate references to cacheinfo in the
    resctrl subsystem.

 arch/x86/kernel/cpu/resctrl/core.c |  6 ++++--
 fs/resctrl/ctrlmondata.c           | 13 +++++++++----
 fs/resctrl/internal.h              |  4 ++--
 fs/resctrl/monitor.c               |  6 ++++--
 fs/resctrl/rdtgroup.c              |  6 +++---
 include/linux/resctrl.h            |  4 ++--
 6 files changed, 24 insertions(+), 15 deletions(-)

-- 
2.43.5
Re: [PATCH V6 0/1] x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem.
Posted by Reinette Chatre 6 months, 1 week ago
Dear x86 Maintainers,

Could you please consider this resctrl fix for inclusion?

Thank you very much.

Reinette

On 5/30/25 11:20 AM, Qinyun Tan wrote:
> V6:
> - Modify subject tag "x86/resctrl:" -> "x86,fs/resctrl:"
> 
> V5:
> - Modify the commit log format again. I am truly grateful to Reinette
>   for her thoughtful guidance :).
> 
> V4:
> - Add the Tested-by and Reviewed-by tags.
> - Modify the commit log format.
> 
> V3:
> - Following Reinette's suggestion, rework the commit log description to
>   make it easier for readers to understand.
> 
> V2:
> - Keep the sanity checks in __mon_event_count() to ensure execution occurs
>   exclusively on CPUs sharing the same L3 cache cluster
> - When reading the top-level event, obtain a CPU within hdr.cpu_mask.
>   Then use the cacheinfo shared_cpu_map of this CPU instead of using
>   hdr.cpu_mask directly
> - Adjust code formatting and commit log descriptions.
> 
> In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain
> structure representing a NUMA node relies on the cacheinfo interface
> (rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map)
> for monitoring. The L3 cache information of an SNC NUMA node determines
> which domains are summed for the "top-level" L3-scoped events.
> 
> rdt_mon_domain::ci is initialized using the first online CPU of a NUMA
> node. When this CPU goes offline, its shared_cpu_map is cleared to contain
> only the offline CPU itself. Subsequently, attempting to read counters
> via smp_call_on_cpu(offline_cpu) fails (and the error is ignored), returning
> zero values for "top-level events" without any error indication.
> 
> Replace the cacheinfo references in struct rdt_mon_domain and struct
> rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
> 
> rdt_domain_hdr::cpu_mask contains the online CPUs associated with that
> domain. When reading "top-level events", select a CPU from
> rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine
> the valid CPUs for reading RMID counters via the MSR interface.
> 
> Considering all CPUs associated with the L3 cache improves the chances
> of picking a housekeeping CPU on which the counter reading work can be
> queued, avoiding an unnecessary IPI.
> 
> Qinyun Tan (1):
>   x86/resctrl: Remove inappropriate references to cacheinfo in the
>     resctrl subsystem.
> 
>  arch/x86/kernel/cpu/resctrl/core.c |  6 ++++--
>  fs/resctrl/ctrlmondata.c           | 13 +++++++++----
>  fs/resctrl/internal.h              |  4 ++--
>  fs/resctrl/monitor.c               |  6 ++++--
>  fs/resctrl/rdtgroup.c              |  6 +++---
>  include/linux/resctrl.h            |  4 ++--
>  6 files changed, 24 insertions(+), 15 deletions(-)
>