Documentation/arch/x86/resctrl.rst | 17 ++ include/linux/resctrl.h | 91 +++++-- arch/x86/include/asm/msr-index.h | 1 + arch/x86/kernel/cpu/resctrl/internal.h | 114 ++++++-- arch/x86/kernel/cpu/resctrl/core.c | 312 ++++++++++++++++------ arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 74 +++-- arch/x86/kernel/cpu/resctrl/monitor.c | 220 ++++++++++++--- arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 27 +- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 265 +++++++++++------- 9 files changed, 813 insertions(+), 308 deletions(-)
This series based on top of Linus upstream commit 33e02dc69afb ("Merge
tag 'sound-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound")
The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
that share an L3 cache into two or more sets. This plays havoc with the
Resource Director Technology (RDT) monitoring features. Prior to this
patch Intel has advised that SNC and RDT are incompatible.
Some of these CPUs support an MSR that can partition the RMID counters
in the same way. This allows monitoring features to be used. Legacy
monitoring files provide the sum of counters from each SNC node for
backwards compatibility. Additional files per SNC node provide details
per node.
Cache and memory bandwidth allocation features continue to operate at
the scope of the L3 cache.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
Changes since v18: https://lore.kernel.org/all/20240515222326.74166-1-tony.luck@intel.com/
Global: Consistent use of "Sub-NUMA Cluster (SNC)"
1-4: No change
5: Rename RESCTRL_NODE as RESCTRL_L3_NODE to make it clear that
these "nodes" are each subsets of L3 cache instances.
6: Changes for snc_nodes_per_l3_cache are localized to monitor.c
Don't use it in decision block use of mba_MBps option.
Moved the old get_node_rmid() function here, but renamed it to
logical_rmid_to_physical_rmid() with a block comment explaining
how RMIDs are distributed when SNC is enabled. Function now
checks if snc_nodes_per_l3_cache == 1 for fast return.
7: New patch. Only allow mba_MBps option if scope of MBM matches MBA
8: Replaces old patch 8. "display_id" field is no more. Add and
initialize the @ci (struct cachinfo *) to rdt_mon_domain.
Note that the new get_cpu_cacheinfo_level() helper function is
added to internal.h as it will also be needed by patch 19.
9: Instead of display_id, add pointer to cacheinfo structure to
struct rmid_read. Add kerneldoc description of existing and
new fields.
10: Added to commit comment describing why mkdir_mondata_subdir()
needs to be refactored.
11: Dropped Intel specific description of fields in the mon_evt
structure. Say that choice of bit to steal was arbitrary, but
can be changed in the future.
12: Fixed typo s/and file/and files/ in commit message. Now using
the cacheinfo structure (specifically "id" field) instead of
display_id.
13: Wordsmith commit into imperative.
I looked at using kobject_has_children() to check for empty
directory, but it needs a "struct kobject *" and all I have
is "struct kernfs_node *". I'm now checking how many CPUs
remain in ci->shared_cpu_map to detect whether this is the
last SNC node.
s/kernfs_find_and_get_ns/kernfs_find_and_get/ in all places.
Fix copy/paste error which used "pgrp" instead of "cgrp".
Dropped the firtree fix for a function I hadn't touched.
Old patch 14 split into 14, 15, 16
The "wedging things until it works" path is gone. Instead
of passing in a random SNC domain that has the right display_id
code now makes use of cacheinfo both to get the L3 id, and to
pick the cpu mask for the smp_call*(). rr.d is now NULL in the
sum case as suggested.
14: New patch. Does the "top half" work of filling out the rmid_read
structure prior to the smp_call*().
15: Need to pass the cachinfo struct to resctrl_arch_rmid_read()
16: When "sum", resctrl_arch_rmid_read() loops across domains sharing
L3 cache.
17: (was 15) sanity
Removed "Fix" from the shortlog description. Use ci->shared_cpu_map
in the sanity check for sum case.
18: (new - split out from old 16) Try to do one thing at a time. Split
the MSR 0xCA0 update code from the SNC detection code.
19: (was 16) Fix typo s/couning/counting/
Use upper case for first letter of messages.
Use cpumask_weight() instead of bitmap_weight.
20: (was 17) Dropped the "This patch needs updating" part of commit
Tony Luck (20):
x86/resctrl: Prepare for new domain scope
x86/resctrl: Prepare to split rdt_domain structure
x86/resctrl: Prepare for different scope for control/monitor
operations
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
x86/resctrl: Add node-scope to the options for feature scope
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster
(SNC) systems
x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files
x86/resctrl: Add new fields to struct rmid_read for summation of
domains
x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function
x86/resctrl: Allocate a new bit in union mon_data_bits
x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files
x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC)
mode
x86/resctrl: Fill out rmid_read structure for smp_call*() to read a
counter
x86/resctrl: Pass two extra arguments to resctrl_arch_rmid_read()
x86/resctrl: Make resctrl_arch_rmid_read() handle sum over domains
x86/resctrl: Update CPU sanity checks when reading RMID counters
x86/resctrl: Enable RMID shared RMID mode on Sub-NUMA Cluster (SNC)
systems
x86/resctrl: Sub-NUMA Cluster (SNC) detection and enabling
x86/resctrl: Update documentation with Sub-NUMA cluster changes
Documentation/arch/x86/resctrl.rst | 17 ++
include/linux/resctrl.h | 91 +++++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 114 ++++++--
arch/x86/kernel/cpu/resctrl/core.c | 312 ++++++++++++++++------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 74 +++--
arch/x86/kernel/cpu/resctrl/monitor.c | 220 ++++++++++++---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 27 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 265 +++++++++++-------
9 files changed, 813 insertions(+), 308 deletions(-)
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
--
2.45.0
Hi Tony,
On 5/28/24 3:19 PM, Tony Luck wrote:
> This series based on top of Linus upstream commit 33e02dc69afb ("Merge
> tag 'sound-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound")
>
> The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
> that share an L3 cache into two or more sets. This plays havoc with the
> Resource Director Technology (RDT) monitoring features. Prior to this
> patch Intel has advised that SNC and RDT are incompatible.
>
> Some of these CPUs support an MSR that can partition the RMID counters
> in the same way. This allows monitoring features to be used. Legacy
> monitoring files provide the sum of counters from each SNC node for
> backwards compatibility. Additional files per SNC node provide details
> per node.
>
> Cache and memory bandwidth allocation features continue to operate at
> the scope of the L3 cache.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
>
> ---
> Changes since v18: https://lore.kernel.org/all/20240515222326.74166-1-tony.luck@intel.com/
>
> Global: Consistent use of "Sub-NUMA Cluster (SNC)"
(briefly scanning changes)
>
> 1-4: No change
>
> 5: Rename RESCTRL_NODE as RESCTRL_L3_NODE to make it clear that
> these "nodes" are each subsets of L3 cache instances.
>
> 6: Changes for snc_nodes_per_l3_cache are localized to monitor.c
> Don't use it in decision block use of mba_MBps option.
> Moved the old get_node_rmid() function here, but renamed it to
> logical_rmid_to_physical_rmid() with a block comment explaining
> how RMIDs are distributed when SNC is enabled. Function now
> checks if snc_nodes_per_l3_cache == 1 for fast return.
>
> 7: New patch. Only allow mba_MBps option if scope of MBM matches MBA
>
> 8: Replaces old patch 8. "display_id" field is no more. Add and
> initialize the @ci (struct cachinfo *) to rdt_mon_domain.
> Note that the new get_cpu_cacheinfo_level() helper function is
> added to internal.h as it will also be needed by patch 19.
>
> 9: Instead of display_id, add pointer to cacheinfo structure to
> struct rmid_read. Add kerneldoc description of existing and
> new fields.
>
> 10: Added to commit comment describing why mkdir_mondata_subdir()
> needs to be refactored.
>
> 11: Dropped Intel specific description of fields in the mon_evt
> structure. Say that choice of bit to steal was arbitrary, but
> can be changed in the future.
>
> 12: Fixed typo s/and file/and files/ in commit message. Now using
> the cacheinfo structure (specifically "id" field) instead of
> display_id.
>
> 13: Wordsmith commit into imperative.
> I looked at using kobject_has_children() to check for empty
> directory, but it needs a "struct kobject *" and all I have
> is "struct kernfs_node *". I'm now checking how many CPUs
Consider how kobject_has_children() uses that struct kobject *.
Specifically:
return kobj->sd && kobj->sd->dir.subdirs
It operates on kobj->sd, which is exactly what you have: struct kernfs_node.
> remain in ci->shared_cpu_map to detect whether this is the
> last SNC node.
hmmm, ok, will take a look ... but please finalize discussion of a patch series
before submitting a new series that rejects feedback without discussion and
does something completely different in new version.
Reinette
On Tue, May 28, 2024 at 03:55:29PM -0700, Reinette Chatre wrote:
> Hi Tony,
> > 13: Wordsmith commit into imperative.
> > I looked at using kobject_has_children() to check for empty
> > directory, but it needs a "struct kobject *" and all I have
> > is "struct kernfs_node *". I'm now checking how many CPUs
>
> Consider how kobject_has_children() uses that struct kobject *.
> Specifically:
> return kobj->sd && kobj->sd->dir.subdirs
>
> It operates on kobj->sd, which is exactly what you have: struct kernfs_node.
So right. My turn to grumble about other peoples choice of names. If
that field was named "kn" instead of "sd" I would have spotted this
too.
> > remain in ci->shared_cpu_map to detect whether this is the
> > last SNC node.
>
> hmmm, ok, will take a look ... but please finalize discussion of a patch series
> before submitting a new series that rejects feedback without discussion and
> does something completely different in new version.
Reinette,
So here's what rmdir_mondata_subdir_allrdtgrp() looks like using the
subdirs check. It might need an update/better header comment.
-Tony
---
/*
* Remove all subdirectories of mon_data of ctrl_mon groups
* and monitor groups with given domain id.
*/
static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
struct rdt_mon_domain *d)
{
struct rdtgroup *prgrp, *crgrp;
struct kernfs_node *kn;
char subname[32];
char name[32];
sprintf(name, "mon_%s_%02d", r->name, d->ci->id);
if (r->mon_scope != RESCTRL_L3_CACHE) {
/*
* SNC mode: Unless the last domain is being removed must
* just remove the SNC subdomain.
*/
sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
}
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
kn = kernfs_find_and_get(prgrp->mon.mon_data_kn, name);
if (!kn)
continue;
if (kn->dir.subdirs <= 1)
kernfs_remove(kn);
else
kernfs_remove_by_name(kn, subname);
list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
kn = kernfs_find_and_get(crgrp->mon.mon_data_kn, name);
if (!kn)
continue;
if (kn->dir.subdirs <= 1)
kernfs_remove(kn);
else
kernfs_remove_by_name(kn, subname);
}
}
}
Hi Tony,
On 5/29/24 1:20 PM, Tony Luck wrote:
> On Tue, May 28, 2024 at 03:55:29PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>> 13: Wordsmith commit into imperative.
>>> I looked at using kobject_has_children() to check for empty
>>> directory, but it needs a "struct kobject *" and all I have
>>> is "struct kernfs_node *". I'm now checking how many CPUs
>>
>> Consider how kobject_has_children() uses that struct kobject *.
>> Specifically:
>> return kobj->sd && kobj->sd->dir.subdirs
>>
>> It operates on kobj->sd, which is exactly what you have: struct kernfs_node.
>
> So right. My turn to grumble about other peoples choice of names. If
> that field was named "kn" instead of "sd" I would have spotted this
> too.
>
>>> remain in ci->shared_cpu_map to detect whether this is the
>>> last SNC node.
>>
>> hmmm, ok, will take a look ... but please finalize discussion of a patch series
>> before submitting a new series that rejects feedback without discussion and
>> does something completely different in new version.
>
> Reinette,
>
> So here's what rmdir_mondata_subdir_allrdtgrp() looks like using the
> subdirs check. It might need an update/better header comment.
>
> -Tony
>
> ---
>
> /*
> * Remove all subdirectories of mon_data of ctrl_mon groups
> * and monitor groups with given domain id.
(note comment still considers that domain id is parameter)
> */
> static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> struct rdt_mon_domain *d)
> {
> struct rdtgroup *prgrp, *crgrp;
> struct kernfs_node *kn;
> char subname[32];
I wonder if static checkers will know that this cannot be used
uninitialized?
> char name[32];
>
> sprintf(name, "mon_%s_%02d", r->name, d->ci->id);
> if (r->mon_scope != RESCTRL_L3_CACHE) {
> /*
> * SNC mode: Unless the last domain is being removed must
> * just remove the SNC subdomain.
> */
> sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
> }
>
> list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> kn = kernfs_find_and_get(prgrp->mon.mon_data_kn, name);
> if (!kn)
> continue;
>
> if (kn->dir.subdirs <= 1)
> kernfs_remove(kn);
> else
> kernfs_remove_by_name(kn, subname);
>
> list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
> kn = kernfs_find_and_get(crgrp->mon.mon_data_kn, name);
> if (!kn)
> continue;
>
> if (kn->dir.subdirs <= 1)
> kernfs_remove(kn);
> else
> kernfs_remove_by_name(kn, subname);
> }
> }
> }
This solution looks more intuitive to me. I do think that it may be
missing some kernfs_put()'s?
Reinette
ps. Please do give me a couple of days more with this series before you
submit a new version.
On Wed, May 29, 2024 at 07:46:27PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 5/29/24 1:20 PM, Tony Luck wrote:
> > On Tue, May 28, 2024 at 03:55:29PM -0700, Reinette Chatre wrote:
> > > Hi Tony,
> > > > 13: Wordsmith commit into imperative.
> > > > I looked at using kobject_has_children() to check for empty
> > > > directory, but it needs a "struct kobject *" and all I have
> > > > is "struct kernfs_node *". I'm now checking how many CPUs
> > >
> > > Consider how kobject_has_children() uses that struct kobject *.
> > > Specifically:
> > > return kobj->sd && kobj->sd->dir.subdirs
> > >
> > > It operates on kobj->sd, which is exactly what you have: struct kernfs_node.
> >
> > So right. My turn to grumble about other peoples choice of names. If
> > that field was named "kn" instead of "sd" I would have spotted this
> > too.
> >
> > > > remain in ci->shared_cpu_map to detect whether this is the
> > > > last SNC node.
> > >
> > > hmmm, ok, will take a look ... but please finalize discussion of a patch series
> > > before submitting a new series that rejects feedback without discussion and
> > > does something completely different in new version.
> >
> > Reinette,
> >
> > So here's what rmdir_mondata_subdir_allrdtgrp() looks like using the
> > subdirs check. It might need an update/better header comment.
> >
> > -Tony
> >
> > ---
> >
> > /*
> > * Remove all subdirectories of mon_data of ctrl_mon groups
> > * and monitor groups with given domain id.
>
> (note comment still considers that domain id is parameter)
Will fix.
> > */
> > static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> > struct rdt_mon_domain *d)
> > {
> > struct rdtgroup *prgrp, *crgrp;
> > struct kernfs_node *kn;
> > char subname[32];
>
> I wonder if static checkers will know that this cannot be used
> uninitialized?
I wondered that too. There are no complaints from gcc. How do people
deal with false positives from static checkers? Simplest would be to
provide an initializer:
char subname[32] = "";
While that might shut up the static check, it would be more confusing
for human readers.
> > char name[32];
> >
> > sprintf(name, "mon_%s_%02d", r->name, d->ci->id);
> > if (r->mon_scope != RESCTRL_L3_CACHE) {
> > /*
> > * SNC mode: Unless the last domain is being removed must
> > * just remove the SNC subdomain.
> > */
> > sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
> > }
> >
> > list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> > kn = kernfs_find_and_get(prgrp->mon.mon_data_kn, name);
> > if (!kn)
> > continue;
> >
> > if (kn->dir.subdirs <= 1)
> > kernfs_remove(kn);
> > else
> > kernfs_remove_by_name(kn, subname);
> >
> > list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
> > kn = kernfs_find_and_get(crgrp->mon.mon_data_kn, name);
> > if (!kn)
> > continue;
> >
> > if (kn->dir.subdirs <= 1)
> > kernfs_remove(kn);
> > else
> > kernfs_remove_by_name(kn, subname);
> > }
> > }
> > }
>
> This solution looks more intuitive to me. I do think that it may be
> missing some kernfs_put()'s?
There aren't any kernfs_put()'s in the existing code. Resctrl takes
an extra hold on the CTRL_MON and MON directories and jumps though some
hoops to drop that after the directory has been removed. But the monitor
directories have nothing like that.
> Reinette
>
> ps. Please do give me a couple of days more with this series before you
> submit a new version.
Sure. Will do.
-Tony
Hi Tony,
On 5/30/24 9:36 AM, Tony Luck wrote:
> On Wed, May 29, 2024 at 07:46:27PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 5/29/24 1:20 PM, Tony Luck wrote:
>>> On Tue, May 28, 2024 at 03:55:29PM -0700, Reinette Chatre wrote:
>>>> Hi Tony,
>>>>> 13: Wordsmith commit into imperative.
>>>>> I looked at using kobject_has_children() to check for empty
>>>>> directory, but it needs a "struct kobject *" and all I have
>>>>> is "struct kernfs_node *". I'm now checking how many CPUs
>>>>
>>>> Consider how kobject_has_children() uses that struct kobject *.
>>>> Specifically:
>>>> return kobj->sd && kobj->sd->dir.subdirs
>>>>
>>>> It operates on kobj->sd, which is exactly what you have: struct kernfs_node.
>>>
>>> So right. My turn to grumble about other peoples choice of names. If
>>> that field was named "kn" instead of "sd" I would have spotted this
>>> too.
>>>
>>>>> remain in ci->shared_cpu_map to detect whether this is the
>>>>> last SNC node.
>>>>
>>>> hmmm, ok, will take a look ... but please finalize discussion of a patch series
>>>> before submitting a new series that rejects feedback without discussion and
>>>> does something completely different in new version.
>>>
>>> Reinette,
>>>
>>> So here's what rmdir_mondata_subdir_allrdtgrp() looks like using the
>>> subdirs check. It might need an update/better header comment.
>>>
>>> -Tony
>>>
>>> ---
>>>
>>> /*
>>> * Remove all subdirectories of mon_data of ctrl_mon groups
>>> * and monitor groups with given domain id.
>>
>> (note comment still considers that domain id is parameter)
>
> Will fix.
>
>>> */
>>> static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
>>> struct rdt_mon_domain *d)
>>> {
>>> struct rdtgroup *prgrp, *crgrp;
>>> struct kernfs_node *kn;
>>> char subname[32];
>>
>> I wonder if static checkers will know that this cannot be used
>> uninitialized?
>
> I wondered that too. There are no complaints from gcc. How do people
> deal with false positives from static checkers? Simplest would be to
> provide an initializer:
>
> char subname[32] = "";
>
> While that might shut up the static check, it would be more confusing
> for human readers.
or char subname[32] = {};
Please elaborate how this will be confusing to human readers? A comment
may help to address that.
I took the time to run a static checker on this series and it did
not complain about this issue. I did not run it with this fixup though, with
just original submission that seem to have similar pattern. I do still think
it would be good to initialize the arrays.
btw ... the static checker I ran did have four other complaints, three about
uninitialized data and one about divide by zero. Most problems appear to be
in mbm_update() that does not initialize rr.sumdomains nor rr.ci before
calling __mon_event_count().
Please use available tools to check code before posting.
>
>>> char name[32];
>>>
>>> sprintf(name, "mon_%s_%02d", r->name, d->ci->id);
>>> if (r->mon_scope != RESCTRL_L3_CACHE) {
>>> /*
>>> * SNC mode: Unless the last domain is being removed must
>>> * just remove the SNC subdomain.
>>> */
>>> sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
>>> }
>>>
>>> list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>>> kn = kernfs_find_and_get(prgrp->mon.mon_data_kn, name);
>>> if (!kn)
>>> continue;
>>>
>>> if (kn->dir.subdirs <= 1)
>>> kernfs_remove(kn);
>>> else
>>> kernfs_remove_by_name(kn, subname);
>>>
>>> list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) {
>>> kn = kernfs_find_and_get(crgrp->mon.mon_data_kn, name);
>>> if (!kn)
>>> continue;
>>>
>>> if (kn->dir.subdirs <= 1)
>>> kernfs_remove(kn);
>>> else
>>> kernfs_remove_by_name(kn, subname);
>>> }
>>> }
>>> }
>>
>> This solution looks more intuitive to me. I do think that it may be
>> missing some kernfs_put()'s?
>
> There aren't any kernfs_put()'s in the existing code.
Why should it? Existing code does not have the kernfs_put()'s because
the extra reference is only obtained in this new code.
Reinette
> btw ... the static checker I ran did have four other complaints, three about > uninitialized data and one about divide by zero. Most problems appear to be > in mbm_update() that does not initialize rr.sumdomains nor rr.ci before > calling __mon_event_count(). > > Please use available tools to check code before posting. Which static checker? I tried smatch and it only finds one: arch/x86/kernel/cpu/resctrl/rdtgroup.c:3129 mkdir_mondata_subdir() error: uninitialized symbol 'ret'. Which is a false positive (but I don't fault smatch for not understanding the relationship between the two "if" blocks in this function. -Tony
Hi Tony, On 5/30/24 3:49 PM, Luck, Tony wrote: >> btw ... the static checker I ran did have four other complaints, three about >> uninitialized data and one about divide by zero. Most problems appear to be >> in mbm_update() that does not initialize rr.sumdomains nor rr.ci before >> calling __mon_event_count(). >> >> Please use available tools to check code before posting. > > Which static checker? I tried smatch and it only finds one: I used coverity this time. Reinette
© 2016 - 2025 Red Hat, Inc.