[PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems

Tony Luck posted 18 patches 1 year, 8 months ago
There is a newer version of this series
Documentation/arch/x86/resctrl.rst        |  27 ++
include/linux/resctrl.h                   |  87 ++++--
arch/x86/include/asm/msr-index.h          |   1 +
arch/x86/kernel/cpu/resctrl/internal.h    |  93 +++++--
arch/x86/kernel/cpu/resctrl/core.c        | 312 ++++++++++++++++------
arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  85 +++---
arch/x86/kernel/cpu/resctrl/monitor.c     | 242 ++++++++++++++---
arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  27 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 272 ++++++++++++-------
9 files changed, 835 insertions(+), 311 deletions(-)
[PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Tony Luck 1 year, 8 months ago
This series based on top of tip x86/cache commit f385f0246394
("x86/resctrl: Replace open coded cacheinfo searches")

The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
that share an L3 cache into two or more sets. This plays havoc with the
Resource Director Technology (RDT) monitoring features.  Prior to this
patch Intel has advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID counters
in the same way. This allows monitoring features to be used. Legacy
monitoring files provide the sum of counters from each SNC node for
backwards compatibility. Additional  files per SNC node provide details
per node.

Memory bandwidth allocation features continue to operate at
the scope of the L3 cache.

L3 cache occupancy and allocation operate on the portion of
L3 cache available for each SNC node.

Signed-off-by: Tony Luck <tony.luck@intel.com>

---
Changes since v19: https://lore.kernel.org/all/20240528222006.58283-1-tony.luck@intel.com/

1-4:	Refactor on top of <linux/cacheinfo.h> change.
	Nothing functional.

5:	No change

6:	Updated commit message with note about RMID Sharing mode.
	Renamed __rmid_read() to __rmid_read_phys() and performed
	translation from logical RMID to physical RMID at callsites.
	Updated comment for __rmid_read_phys() with explanation of
	logical/physical RMIDs. Consistently use "SNC node" avoid
	SNC domain. Add specifics for non-SNC mode.
	Joined split line on __rmid_read() definition (even with the
	added "_phys" to its name still fits on one line.

7:	No change

8:	get_cpu_cacheinfo_level() moved to <linux/cacheinfo.h>
	currently in tip x86/cache
	no other changes

9:	Dropped the "sumdomains" field from struct rmid_read (a NULL
	domain field now indicates that summing is needed).
	Fix kerneldoc comments for struct rmid_read.
	Updated commit comments with more "why" than "what".

10:	No change

11:	Fix commit comments per suggestions
	1) Added some "why it is OK to take a bit from evtid"
	2) s/The stolen bit is given to/Give the bit to/
	3) Don't use "l3_cache_id" (which looks like a variable)

12:	Fix commit message.
	s/kernfs_find_and_get_ns()/kernfs_find_and_get()/
	Add kernfs_put() to drop hold from kernfs_find_and_get()
	Drop useless "/* create the directory */" comment.

13:	Add kernfs_put() to drop hold from kernfs_find_and_get() [two places]

14:	Add cpumask parameter to mon_event_read() so SNC decsions are
	all in rdtgroup_mondata_show() instead of spread between functions.
	Add comments in rdtgroup_mondata_show() to explain the sum vs. no-sum
	cases.
	Moved the mon_event_read() call into both arms of the if-else
	instead of "d = NULL; goto got_cacheinfo;"

15:	New (replaces 15-17). Make __mon_event_read() do the sum across
	domains (at filesystem level). Move the CPU/domain sanity check out
	of resctrl_arch_rmid_read() and into __mon_event_read()
	with separate scope tests for single domain vs. sum over
	domains.

16:	[Was 18] Update commit message with details about MSR 0xCA0, what changes
	when bit 0 is cleared, and why this is necessary.
	Dropped "Add an architecture specific hook" language from
	commit message.

17:	[Was 19] Drop "and enabling" from shortlog (enabling done by
	previous commit).
	Added checks that cpumask_weight() isn't returning zero (to keep
	static checkers from warning of possible divide by zero).

18:	[Was 20] Fix some "Sub-NUMA" references to say "Sub-NUMA Cluster"
	Added document section on effect of SNC mode on MBA and L3 CAT.

Tony Luck (18):
  x86/resctrl: Prepare for new domain scope
  x86/resctrl: Prepare to split rdt_domain structure
  x86/resctrl: Prepare for different scope for control/monitor
    operations
  x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
  x86/resctrl: Add node-scope to the options for feature scope
  x86/resctrl: Introduce snc_nodes_per_l3_cache
  x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster
    (SNC) systems
  x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files
  x86/resctrl: Add a new field to struct rmid_read for summation of
    domains
  x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function
  x86/resctrl: Allocate a new field in union mon_data_bits
  x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files
  x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC)
    mode
  x86/resctrl: Fill out rmid_read structure for smp_call*() to read a
    counter
  x86/resctrl: Make __mon_event_count() handle sum domains
  x86/resctrl: Enable RMID shared RMID mode on Sub-NUMA Cluster (SNC)
    systems
  x86/resctrl: Sub-NUMA Cluster (SNC) detection
  x86/resctrl: Update documentation with Sub-NUMA cluster changes

 Documentation/arch/x86/resctrl.rst        |  27 ++
 include/linux/resctrl.h                   |  87 ++++--
 arch/x86/include/asm/msr-index.h          |   1 +
 arch/x86/kernel/cpu/resctrl/internal.h    |  93 +++++--
 arch/x86/kernel/cpu/resctrl/core.c        | 312 ++++++++++++++++------
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  85 +++---
 arch/x86/kernel/cpu/resctrl/monitor.c     | 242 ++++++++++++++---
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  27 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 272 ++++++++++++-------
 9 files changed, 835 insertions(+), 311 deletions(-)


base-commit: f385f024639431bec3e70c33cdbc9563894b3ee5
-- 
2.45.0
Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Moger, Babu 1 year, 8 months ago
Hi Reinette,

I may be little bit out of sync here. Also, sorry to come back late in the
series.

Looking at the series again, I see this approach adds lots of code.
Look at this structure.


@@ -187,10 +196,12 @@ struct rdt_resource {
 	bool			alloc_capable;
 	bool			mon_capable;
 	int			num_rmid;
-	enum resctrl_scope	scope;
+	enum resctrl_scope	ctrl_scope;
+	enum resctrl_scope	mon_scope;
 	struct resctrl_cache	cache;
 	struct resctrl_membw	membw;
-	struct list_head	domains;
+	struct list_head	ctrl_domains;
+	struct list_head	mon_domains;
 	char			*name;
 	int			data_width;
 	u32			default_ctrl;

There are two scope fields.
There are two domains fields.

These are very confusing and very hard to maintain. Also, I am not sure if
these fields are useful for anything other than SNC feature. This approach
adds quite a bit of code for no specific advantage.

Why don't we just split the RDT_RESOURCE_L3 resource
into separate resources, one for control, one for monitoring.
We already have "control" only resources (MBA, SMBA, L2). Lets create new
"monitor" only resource. I feel it will be much cleaner approach.

Tony has already tried that approach and showed that it is much simpler.

v15-RFC :
https://lore.kernel.org/lkml/20240130222034.37181-1-tony.luck@intel.com/

What do you think?

Thanks
Babu


On 6/10/24 13:35, Tony Luck wrote:
> This series based on top of tip x86/cache commit f385f0246394
> ("x86/resctrl: Replace open coded cacheinfo searches")
> 
> The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
> that share an L3 cache into two or more sets. This plays havoc with the
> Resource Director Technology (RDT) monitoring features.  Prior to this
> patch Intel has advised that SNC and RDT are incompatible.
> 
> Some of these CPUs support an MSR that can partition the RMID counters
> in the same way. This allows monitoring features to be used. Legacy
> monitoring files provide the sum of counters from each SNC node for
> backwards compatibility. Additional  files per SNC node provide details
> per node.
> 
> Memory bandwidth allocation features continue to operate at
> the scope of the L3 cache.
> 
> L3 cache occupancy and allocation operate on the portion of
> L3 cache available for each SNC node.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> 
> ---
> Changes since v19: https://lore.kernel.org/all/20240528222006.58283-1-tony.luck@intel.com/
> 
> 1-4:	Refactor on top of <linux/cacheinfo.h> change.
> 	Nothing functional.
> 
> 5:	No change
> 
> 6:	Updated commit message with note about RMID Sharing mode.
> 	Renamed __rmid_read() to __rmid_read_phys() and performed
> 	translation from logical RMID to physical RMID at callsites.
> 	Updated comment for __rmid_read_phys() with explanation of
> 	logical/physical RMIDs. Consistently use "SNC node" avoid
> 	SNC domain. Add specifics for non-SNC mode.
> 	Joined split line on __rmid_read() definition (even with the
> 	added "_phys" to its name still fits on one line.
> 
> 7:	No change
> 
> 8:	get_cpu_cacheinfo_level() moved to <linux/cacheinfo.h>
> 	currently in tip x86/cache
> 	no other changes
> 
> 9:	Dropped the "sumdomains" field from struct rmid_read (a NULL
> 	domain field now indicates that summing is needed).
> 	Fix kerneldoc comments for struct rmid_read.
> 	Updated commit comments with more "why" than "what".
> 
> 10:	No change
> 
> 11:	Fix commit comments per suggestions
> 	1) Added some "why it is OK to take a bit from evtid"
> 	2) s/The stolen bit is given to/Give the bit to/
> 	3) Don't use "l3_cache_id" (which looks like a variable)
> 
> 12:	Fix commit message.
> 	s/kernfs_find_and_get_ns()/kernfs_find_and_get()/
> 	Add kernfs_put() to drop hold from kernfs_find_and_get()
> 	Drop useless "/* create the directory */" comment.
> 
> 13:	Add kernfs_put() to drop hold from kernfs_find_and_get() [two places]
> 
> 14:	Add cpumask parameter to mon_event_read() so SNC decsions are
> 	all in rdtgroup_mondata_show() instead of spread between functions.
> 	Add comments in rdtgroup_mondata_show() to explain the sum vs. no-sum
> 	cases.
> 	Moved the mon_event_read() call into both arms of the if-else
> 	instead of "d = NULL; goto got_cacheinfo;"
> 
> 15:	New (replaces 15-17). Make __mon_event_read() do the sum across
> 	domains (at filesystem level). Move the CPU/domain sanity check out
> 	of resctrl_arch_rmid_read() and into __mon_event_read()
> 	with separate scope tests for single domain vs. sum over
> 	domains.
> 
> 16:	[Was 18] Update commit message with details about MSR 0xCA0, what changes
> 	when bit 0 is cleared, and why this is necessary.
> 	Dropped "Add an architecture specific hook" language from
> 	commit message.
> 
> 17:	[Was 19] Drop "and enabling" from shortlog (enabling done by
> 	previous commit).
> 	Added checks that cpumask_weight() isn't returning zero (to keep
> 	static checkers from warning of possible divide by zero).
> 
> 18:	[Was 20] Fix some "Sub-NUMA" references to say "Sub-NUMA Cluster"
> 	Added document section on effect of SNC mode on MBA and L3 CAT.
> 
> Tony Luck (18):
>   x86/resctrl: Prepare for new domain scope
>   x86/resctrl: Prepare to split rdt_domain structure
>   x86/resctrl: Prepare for different scope for control/monitor
>     operations
>   x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
>   x86/resctrl: Add node-scope to the options for feature scope
>   x86/resctrl: Introduce snc_nodes_per_l3_cache
>   x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster
>     (SNC) systems
>   x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files
>   x86/resctrl: Add a new field to struct rmid_read for summation of
>     domains
>   x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function
>   x86/resctrl: Allocate a new field in union mon_data_bits
>   x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files
>   x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC)
>     mode
>   x86/resctrl: Fill out rmid_read structure for smp_call*() to read a
>     counter
>   x86/resctrl: Make __mon_event_count() handle sum domains
>   x86/resctrl: Enable RMID shared RMID mode on Sub-NUMA Cluster (SNC)
>     systems
>   x86/resctrl: Sub-NUMA Cluster (SNC) detection
>   x86/resctrl: Update documentation with Sub-NUMA cluster changes
> 
>  Documentation/arch/x86/resctrl.rst        |  27 ++
>  include/linux/resctrl.h                   |  87 ++++--
>  arch/x86/include/asm/msr-index.h          |   1 +
>  arch/x86/kernel/cpu/resctrl/internal.h    |  93 +++++--
>  arch/x86/kernel/cpu/resctrl/core.c        | 312 ++++++++++++++++------
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  85 +++---
>  arch/x86/kernel/cpu/resctrl/monitor.c     | 242 ++++++++++++++---
>  arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  27 +-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 272 ++++++++++++-------
>  9 files changed, 835 insertions(+), 311 deletions(-)
> 
> 
> base-commit: f385f024639431bec3e70c33cdbc9563894b3ee5

-- 
Thanks
Babu Moger
Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Reinette Chatre 1 year, 8 months ago
Hi Babu,

On 6/13/24 12:17 PM, Moger, Babu wrote:
> I may be little bit out of sync here. Also, sorry to come back late in the
> series.
> 
> Looking at the series again, I see this approach adds lots of code.
> Look at this structure.
> 
> 
> @@ -187,10 +196,12 @@ struct rdt_resource {
>   	bool			alloc_capable;
>   	bool			mon_capable;
>   	int			num_rmid;
> -	enum resctrl_scope	scope;
> +	enum resctrl_scope	ctrl_scope;
> +	enum resctrl_scope	mon_scope;
>   	struct resctrl_cache	cache;
>   	struct resctrl_membw	membw;
> -	struct list_head	domains;
> +	struct list_head	ctrl_domains;
> +	struct list_head	mon_domains;
>   	char			*name;
>   	int			data_width;
>   	u32			default_ctrl;
> 
> There are two scope fields.
> There are two domains fields.
> 
> These are very confusing and very hard to maintain. Also, I am not sure if
> these fields are useful for anything other than SNC feature. This approach
> adds quite a bit of code for no specific advantage.
> 
> Why don't we just split the RDT_RESOURCE_L3 resource
> into separate resources, one for control, one for monitoring.
> We already have "control" only resources (MBA, SMBA, L2). Lets create new
> "monitor" only resource. I feel it will be much cleaner approach.
> 
> Tony has already tried that approach and showed that it is much simpler.
> 
> v15-RFC :
> https://lore.kernel.org/lkml/20240130222034.37181-1-tony.luck@intel.com/
> 
> What do you think?
> 

Some highlights of my thoughts in response to that series, but the whole thread
may be of interest to you:
https://lore.kernel.org/lkml/78c88c6d-2e8d-42d1-a6f2-1c5ac38fb258@intel.com/
https://lore.kernel.org/lkml/59944211-d34a-4ba3-a1de-095822c0b3f0@intel.com/

Reinette
Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Moger, Babu 1 year, 8 months ago
Hi Reinette,

On 6/13/24 15:32, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/13/24 12:17 PM, Moger, Babu wrote:
>> I may be little bit out of sync here. Also, sorry to come back late in the
>> series.
>>
>> Looking at the series again, I see this approach adds lots of code.
>> Look at this structure.
>>
>>
>> @@ -187,10 +196,12 @@ struct rdt_resource {
>>       bool            alloc_capable;
>>       bool            mon_capable;
>>       int            num_rmid;
>> -    enum resctrl_scope    scope;
>> +    enum resctrl_scope    ctrl_scope;
>> +    enum resctrl_scope    mon_scope;
>>       struct resctrl_cache    cache;
>>       struct resctrl_membw    membw;
>> -    struct list_head    domains;
>> +    struct list_head    ctrl_domains;
>> +    struct list_head    mon_domains;
>>       char            *name;
>>       int            data_width;
>>       u32            default_ctrl;
>>
>> There are two scope fields.
>> There are two domains fields.
>>
>> These are very confusing and very hard to maintain. Also, I am not sure if
>> these fields are useful for anything other than SNC feature. This approach
>> adds quite a bit of code for no specific advantage.
>>
>> Why don't we just split the RDT_RESOURCE_L3 resource
>> into separate resources, one for control, one for monitoring.
>> We already have "control" only resources (MBA, SMBA, L2). Lets create new
>> "monitor" only resource. I feel it will be much cleaner approach.
>>
>> Tony has already tried that approach and showed that it is much simpler.
>>
>> v15-RFC :
>> https://lore.kernel.org/lkml/20240130222034.37181-1-tony.luck@intel.com/
>>
>> What do you think?
>>
> 
> Some highlights of my thoughts in response to that series, but the whole
> thread
> may be of interest to you:
> https://lore.kernel.org/lkml/78c88c6d-2e8d-42d1-a6f2-1c5ac38fb258@intel.com/
> https://lore.kernel.org/lkml/59944211-d34a-4ba3-a1de-095822c0b3f0@intel.com/
> 

Went through the thread, in summary:

The main concerns are related to duplication of code and data structures.

The solutions are

a) Split the domains.
This is what this series is doing now. This creates members like
ctrl_scope, mon_scope, ctrl_domains etc.. These fields are added to all
the resources (MBA, SMBA and L2). Then there is additional domain header.


b) Split the resource.
  Split RDT_RESOURCE_L3 into two, one for "monitor" and one for "control".
  There will be one domain structure for "monitor" and  one for "control"

Both these approaches have code and data duplication. So, there is no
difference that way.

But complexity wise, approach (a) adds quite bit of complexity. Doesn't it?

For me, solution (b) looks simple and easy. Eventually, we may have to
restructure these data structures anyways. I feel it is the right direction.

-- 
Thanks
Babu Moger
Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Reinette Chatre 1 year, 8 months ago
Hi Babu,

On 6/14/24 9:27 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 6/13/24 15:32, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 6/13/24 12:17 PM, Moger, Babu wrote:
>>> I may be little bit out of sync here. Also, sorry to come back late in the
>>> series.
>>>
>>> Looking at the series again, I see this approach adds lots of code.
>>> Look at this structure.
>>>
>>>
>>> @@ -187,10 +196,12 @@ struct rdt_resource {
>>>        bool            alloc_capable;
>>>        bool            mon_capable;
>>>        int            num_rmid;
>>> -    enum resctrl_scope    scope;
>>> +    enum resctrl_scope    ctrl_scope;
>>> +    enum resctrl_scope    mon_scope;
>>>        struct resctrl_cache    cache;
>>>        struct resctrl_membw    membw;
>>> -    struct list_head    domains;
>>> +    struct list_head    ctrl_domains;
>>> +    struct list_head    mon_domains;
>>>        char            *name;
>>>        int            data_width;
>>>        u32            default_ctrl;
>>>
>>> There are two scope fields.
>>> There are two domains fields.
>>>
>>> These are very confusing and very hard to maintain. Also, I am not sure if
>>> these fields are useful for anything other than SNC feature. This approach
>>> adds quite a bit of code for no specific advantage.
>>>
>>> Why don't we just split the RDT_RESOURCE_L3 resource
>>> into separate resources, one for control, one for monitoring.
>>> We already have "control" only resources (MBA, SMBA, L2). Lets create new
>>> "monitor" only resource. I feel it will be much cleaner approach.
>>>
>>> Tony has already tried that approach and showed that it is much simpler.
>>>
>>> v15-RFC :
>>> https://lore.kernel.org/lkml/20240130222034.37181-1-tony.luck@intel.com/
>>>
>>> What do you think?
>>>
>>
>> Some highlights of my thoughts in response to that series, but the whole
>> thread
>> may be of interest to you:
>> https://lore.kernel.org/lkml/78c88c6d-2e8d-42d1-a6f2-1c5ac38fb258@intel.com/
>> https://lore.kernel.org/lkml/59944211-d34a-4ba3-a1de-095822c0b3f0@intel.com/
>>
> 
> Went through the thread, in summary:
> 
> The main concerns are related to duplication of code and data structures.
> 
> The solutions are
> 
> a) Split the domains.
> This is what this series is doing now. This creates members like
> ctrl_scope, mon_scope, ctrl_domains etc.. These fields are added to all
> the resources (MBA, SMBA and L2). Then there is additional domain header.
> 
> 
> b) Split the resource.
>    Split RDT_RESOURCE_L3 into two, one for "monitor" and one for "control".
>    There will be one domain structure for "monitor" and  one for "control"
> 
> Both these approaches have code and data duplication. So, there is no
> difference that way.

Could you please elaborate where code and data duplication of (a) is?

> 
> But complexity wise, approach (a) adds quite bit of complexity. Doesn't it?

"complex" is a subjective term. Could you please elaborate what is complex
about this? Is your concern about the size of the patch? To me that is
not a concern when considering the end result of how the resctrl structures
are organized.

> 
> For me, solution (b) looks simple and easy. Eventually, we may have to
> restructure these data structures anyways. I feel it is the right direction.
> 

I understand that it is tempting to look for smallest patch possible but we
really need to ensure that any work integrates well into resctrl. Doing
so may end up with larger patches but in the end it makes the data structures
and code easier to understand. I specifically find the duplication of structures
troublesome since that requires developers to always be on high alert of
what code is being worked on and what flows the particular code participates in
since the duplication results in members of structures be invalid based on which
code flow is used. To me this is an unnecessary burden on developers and against
your original goal of making resctrl easier to maintain.

Reinette
Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Moger, Babu 1 year, 8 months ago
Hi Reinette,

On 6/14/2024 11:46 AM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/14/24 9:27 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 6/13/24 15:32, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 6/13/24 12:17 PM, Moger, Babu wrote:
>>>> I may be little bit out of sync here. Also, sorry to come back late 
>>>> in the
>>>> series.
>>>>
>>>> Looking at the series again, I see this approach adds lots of code.
>>>> Look at this structure.
>>>>
>>>>
>>>> @@ -187,10 +196,12 @@ struct rdt_resource {
>>>>        bool            alloc_capable;
>>>>        bool            mon_capable;
>>>>        int            num_rmid;
>>>> -    enum resctrl_scope    scope;
>>>> +    enum resctrl_scope    ctrl_scope;
>>>> +    enum resctrl_scope    mon_scope;
>>>>        struct resctrl_cache    cache;
>>>>        struct resctrl_membw    membw;
>>>> -    struct list_head    domains;
>>>> +    struct list_head    ctrl_domains;
>>>> +    struct list_head    mon_domains;
>>>>        char            *name;
>>>>        int            data_width;
>>>>        u32            default_ctrl;
>>>>
>>>> There are two scope fields.
>>>> There are two domains fields.
>>>>
>>>> These are very confusing and very hard to maintain. Also, I am not 
>>>> sure if
>>>> these fields are useful for anything other than SNC feature. This 
>>>> approach
>>>> adds quite a bit of code for no specific advantage.
>>>>
>>>> Why don't we just split the RDT_RESOURCE_L3 resource
>>>> into separate resources, one for control, one for monitoring.
>>>> We already have "control" only resources (MBA, SMBA, L2). Lets 
>>>> create new
>>>> "monitor" only resource. I feel it will be much cleaner approach.
>>>>
>>>> Tony has already tried that approach and showed that it is much 
>>>> simpler.
>>>>
>>>> v15-RFC :
>>>> https://lore.kernel.org/lkml/20240130222034.37181-1-tony.luck@intel.com/ 
>>>>
>>>>
>>>> What do you think?
>>>>
>>>
>>> Some highlights of my thoughts in response to that series, but the whole
>>> thread
>>> may be of interest to you:
>>> https://lore.kernel.org/lkml/78c88c6d-2e8d-42d1-a6f2-1c5ac38fb258@intel.com/ 
>>>
>>> https://lore.kernel.org/lkml/59944211-d34a-4ba3-a1de-095822c0b3f0@intel.com/ 
>>>
>>>
>>
>> Went through the thread, in summary:
>>
>> The main concerns are related to duplication of code and data structures.
>>
>> The solutions are
>>
>> a) Split the domains.
>> This is what this series is doing now. This creates members like
>> ctrl_scope, mon_scope, ctrl_domains etc.. These fields are added to all
>> the resources (MBA, SMBA and L2). Then there is additional domain header.
>>
>>
>> b) Split the resource.
>>    Split RDT_RESOURCE_L3 into two, one for "monitor" and one for 
>> "control".
>>    There will be one domain structure for "monitor" and  one for 
>> "control"
>>
>> Both these approaches have code and data duplication. So, there is no
>> difference that way.
> 
> Could you please elaborate where code and data duplication of (a) is?

We have ctrl_scope, mon_scope, ctrl_domains. mon_domains.  Only one 
resource, RDT_RESOURCE_L3 is going to use these fields. Rest of the 
resources don't need these fields. But these fields are part of all the 
resources.

I am not too worried about the size of the patch.  But, I don't foresee 
these fields will be used anytime soon in these resources(MBA. L3. 
SMBA). Why add it now? In future we may have to cleanup all these anyways.

> 
>>
>> But complexity wise, approach (a) adds quite bit of complexity. 
>> Doesn't it?
> 
> "complex" is a subjective term. Could you please elaborate what is complex
> about this? Is your concern about the size of the patch? To me that is
> not a concern when considering the end result of how the resctrl structures
> are organized.
> 
>>
>> For me, solution (b) looks simple and easy. Eventually, we may have to
>> restructure these data structures anyways. I feel it is the right 
>> direction.
>>
> 
> I understand that it is tempting to look for smallest patch possible but we
> really need to ensure that any work integrates well into resctrl. Doing
> so may end up with larger patches but in the end it makes the data 
> structures
> and code easier to understand. I specifically find the duplication of 
> structures
> troublesome since that requires developers to always be on high alert of
> what code is being worked on and what flows the particular code 
> participates in
> since the duplication results in members of structures be invalid based 
> on which
> code flow is used. To me this is an unnecessary burden on developers and 
> against
> your original goal of making resctrl easier to maintain.
> 
> Reinette

-- 
- Babu Moger
Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Reinette Chatre 1 year, 8 months ago
Hi Babu,

On 6/14/24 2:29 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 6/14/2024 11:46 AM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 6/14/24 9:27 AM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 6/13/24 15:32, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 6/13/24 12:17 PM, Moger, Babu wrote:
>>>>> I may be little bit out of sync here. Also, sorry to come back late in the
>>>>> series.
>>>>>
>>>>> Looking at the series again, I see this approach adds lots of code.
>>>>> Look at this structure.
>>>>>
>>>>>
>>>>> @@ -187,10 +196,12 @@ struct rdt_resource {
>>>>>        bool            alloc_capable;
>>>>>        bool            mon_capable;
>>>>>        int            num_rmid;
>>>>> -    enum resctrl_scope    scope;
>>>>> +    enum resctrl_scope    ctrl_scope;
>>>>> +    enum resctrl_scope    mon_scope;
>>>>>        struct resctrl_cache    cache;
>>>>>        struct resctrl_membw    membw;
>>>>> -    struct list_head    domains;
>>>>> +    struct list_head    ctrl_domains;
>>>>> +    struct list_head    mon_domains;
>>>>>        char            *name;
>>>>>        int            data_width;
>>>>>        u32            default_ctrl;
>>>>>
>>>>> There are two scope fields.
>>>>> There are two domains fields.
>>>>>
>>>>> These are very confusing and very hard to maintain. Also, I am not sure if
>>>>> these fields are useful for anything other than SNC feature. This approach
>>>>> adds quite a bit of code for no specific advantage.
>>>>>
>>>>> Why don't we just split the RDT_RESOURCE_L3 resource
>>>>> into separate resources, one for control, one for monitoring.
>>>>> We already have "control" only resources (MBA, SMBA, L2). Lets create new
>>>>> "monitor" only resource. I feel it will be much cleaner approach.
>>>>>
>>>>> Tony has already tried that approach and showed that it is much simpler.
>>>>>
>>>>> v15-RFC :
>>>>> https://lore.kernel.org/lkml/20240130222034.37181-1-tony.luck@intel.com/
>>>>>
>>>>> What do you think?
>>>>>
>>>>
>>>> Some highlights of my thoughts in response to that series, but the whole
>>>> thread
>>>> may be of interest to you:
>>>> https://lore.kernel.org/lkml/78c88c6d-2e8d-42d1-a6f2-1c5ac38fb258@intel.com/
>>>> https://lore.kernel.org/lkml/59944211-d34a-4ba3-a1de-095822c0b3f0@intel.com/
>>>>
>>>
>>> Went through the thread, in summary:
>>>
>>> The main concerns are related to duplication of code and data structures.
>>>
>>> The solutions are
>>>
>>> a) Split the domains.
>>> This is what this series is doing now. This creates members like
>>> ctrl_scope, mon_scope, ctrl_domains etc.. These fields are added to all
>>> the resources (MBA, SMBA and L2). Then there is additional domain header.
>>>
>>>
>>> b) Split the resource.
>>>    Split RDT_RESOURCE_L3 into two, one for "monitor" and one for "control".
>>>    There will be one domain structure for "monitor" and  one for "control"
>>>
>>> Both these approaches have code and data duplication. So, there is no
>>> difference that way.
>>
>> Could you please elaborate where code and data duplication of (a) is?
> 
> We have ctrl_scope, mon_scope, ctrl_domains. mon_domains.  Only one
> resource, RDT_RESOURCE_L3 is going to use these fields. Rest of the
> resources don't need these fields. But these fields are part of all
> the resources.

Correct. There are two new empty fields per resource that does
not support monitoring. Having the new mon_domains list results in
the benefit of eliminating monitoring fields from all the domains
forming part of resources that do not support monitoring. Providing
more details below but the additional pointer results in a significant
net reduction of unused fields. Having the new mon_scope field results
in the benefit to support SNC.

> 
> I am not too worried about the size of the patch.  But, I don't
> foresee these fields will be used anytime soon in these
> resources(MBA. L3. SMBA). Why add it now? In future we may have to
> cleanup all these anyways.

This work does indeed go through the effort to _eliminate_ unused fields.
Note how all domains of all resources (whether they support monitoring or
not) currently have to carry a significant number of monitoring fields.
These can be found in both struct rdt_domain (*rmid_busy_llc, *mbm_total,
*mbm_local, mbm_over, cqm_limbo, mbm_work_cpu, cqm_work_cpu)  as
well as struct rdt_hw_domain (*arch_mbm_total, *arch_mbm_local).

For a resource that does not support monitoring it is of course
unnecessary to carry all of this for _every_ domain instance and
after this series it no longer will.

Reinette
Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Moger, Babu 1 year, 8 months ago
Hi Reinette,

On 6/14/24 18:11, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/14/24 2:29 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 6/14/2024 11:46 AM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 6/14/24 9:27 AM, Moger, Babu wrote:
>>>> Hi Reinette,
>>>>
>>>> On 6/13/24 15:32, Reinette Chatre wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On 6/13/24 12:17 PM, Moger, Babu wrote:
>>>>>> I may be little bit out of sync here. Also, sorry to come back late
>>>>>> in the
>>>>>> series.
>>>>>>
>>>>>> Looking at the series again, I see this approach adds lots of code.
>>>>>> Look at this structure.
>>>>>>
>>>>>>
>>>>>> @@ -187,10 +196,12 @@ struct rdt_resource {
>>>>>>        bool            alloc_capable;
>>>>>>        bool            mon_capable;
>>>>>>        int            num_rmid;
>>>>>> -    enum resctrl_scope    scope;
>>>>>> +    enum resctrl_scope    ctrl_scope;
>>>>>> +    enum resctrl_scope    mon_scope;
>>>>>>        struct resctrl_cache    cache;
>>>>>>        struct resctrl_membw    membw;
>>>>>> -    struct list_head    domains;
>>>>>> +    struct list_head    ctrl_domains;
>>>>>> +    struct list_head    mon_domains;
>>>>>>        char            *name;
>>>>>>        int            data_width;
>>>>>>        u32            default_ctrl;
>>>>>>
>>>>>> There are two scope fields.
>>>>>> There are two domains fields.
>>>>>>
>>>>>> These are very confusing and very hard to maintain. Also, I am not
>>>>>> sure if
>>>>>> these fields are useful for anything other than SNC feature. This
>>>>>> approach
>>>>>> adds quite a bit of code for no specific advantage.
>>>>>>
>>>>>> Why don't we just split the RDT_RESOURCE_L3 resource
>>>>>> into separate resources, one for control, one for monitoring.
>>>>>> We already have "control" only resources (MBA, SMBA, L2). Lets
>>>>>> create new
>>>>>> "monitor" only resource. I feel it will be much cleaner approach.
>>>>>>
>>>>>> Tony has already tried that approach and showed that it is much
>>>>>> simpler.
>>>>>>
>>>>>> v15-RFC :
>>>>>> https://lore.kernel.org/lkml/20240130222034.37181-1-tony.luck@intel.com/
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>
>>>>> Some highlights of my thoughts in response to that series, but the whole
>>>>> thread
>>>>> may be of interest to you:
>>>>> https://lore.kernel.org/lkml/78c88c6d-2e8d-42d1-a6f2-1c5ac38fb258@intel.com/
>>>>> https://lore.kernel.org/lkml/59944211-d34a-4ba3-a1de-095822c0b3f0@intel.com/
>>>>>
>>>>
>>>> Went through the thread, in summary:
>>>>
>>>> The main concerns are related to duplication of code and data structures.
>>>>
>>>> The solutions are
>>>>
>>>> a) Split the domains.
>>>> This is what this series is doing now. This creates members like
>>>> ctrl_scope, mon_scope, ctrl_domains etc.. These fields are added to all
>>>> the resources (MBA, SMBA and L2). Then there is additional domain header.
>>>>
>>>>
>>>> b) Split the resource.
>>>>    Split RDT_RESOURCE_L3 into two, one for "monitor" and one for
>>>> "control".
>>>>    There will be one domain structure for "monitor" and  one for
>>>> "control"
>>>>
>>>> Both these approaches have code and data duplication. So, there is no
>>>> difference that way.
>>>
>>> Could you please elaborate where code and data duplication of (a) is?
>>
>> We have ctrl_scope, mon_scope, ctrl_domains. mon_domains.  Only one
>> resource, RDT_RESOURCE_L3 is going to use these fields. Rest of the
>> resources don't need these fields. But these fields are part of all
>> the resources.
> 
> Correct. There are two new empty fields per resource that does
> not support monitoring. Having the new mon_domains list results in
> the benefit of eliminating monitoring fields from all the domains
> forming part of resources that do not support monitoring. Providing
> more details below but the additional pointer results in a significant
> net reduction of unused fields. Having the new mon_scope field results
> in the benefit to support SNC.
> 
>>
>> I am not too worried about the size of the patch.  But, I don't
>> foresee these fields will be used anytime soon in these
>> resources(MBA. L3. SMBA). Why add it now? In future we may have to
>> cleanup all these anyways.
> 
> This work does indeed go through the effort to _eliminate_ unused fields.
> Note how all domains of all resources (whether they support monitoring or
> not) currently have to carry a significant number of monitoring fields.
> These can be found in both struct rdt_domain (*rmid_busy_llc, *mbm_total,
> *mbm_local, mbm_over, cqm_limbo, mbm_work_cpu, cqm_work_cpu)  as
> well as struct rdt_hw_domain (*arch_mbm_total, *arch_mbm_local).
> 
> For a resource that does not support monitoring it is of course
> unnecessary to carry all of this for _every_ domain instance and
> after this series it no longer will.

Yes. I see that. Thanks for the explanation. Lets go ahead with the
series. This feature is been pending for a while. I will provide my
comments for series.
-- 
Thanks
Babu Moger
RE: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Luck, Tony 1 year, 8 months ago
> We have ctrl_scope, mon_scope, ctrl_domains. mon_domains.  Only one 
> resource, RDT_RESOURCE_L3 is going to use these fields. Rest of the 
> resources don't need these fields. But these fields are part of all the 
> resources.
>
> I am not too worried about the size of the patch.  But, I don't foresee 
> these fields will be used anytime soon in these resources(MBA. L3. 
> SMBA). Why add it now? In future we may have to cleanup all these anyways.

Babu,

I mentioned yesterday that future patches could split struct rdt_resource. I was noodling
at doing so back in February. Patches (messy, not finished or fit for consumption) are
here (just to give you an idea where things might build from here):

git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git rdt_split_resource

-Tony
Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Moger, Babu 1 year, 8 months ago

On 6/14/2024 4:40 PM, Luck, Tony wrote:
>> We have ctrl_scope, mon_scope, ctrl_domains. mon_domains.  Only one
>> resource, RDT_RESOURCE_L3 is going to use these fields. Rest of the
>> resources don't need these fields. But these fields are part of all the
>> resources.
>>
>> I am not too worried about the size of the patch.  But, I don't foresee
>> these fields will be used anytime soon in these resources(MBA. L3.
>> SMBA). Why add it now? In future we may have to cleanup all these anyways.
> 
> Babu,
> 
> I mentioned yesterday that future patches could split struct rdt_resource. I was noodling
> at doing so back in February. Patches (messy, not finished or fit for consumption) are
> here (just to give you an idea where things might build from here):
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git rdt_split_resource

I see, it is similar to splitting the resource (option a. v15-RFC) and 
little more. I think we should move to that direction eventually.
-- 
Babu Moger
RE: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems
Posted by Luck, Tony 1 year, 8 months ago
> Looking at the series again, I see this approach adds lots of code.
> Look at this structure.
> 
> 
> @@ -187,10 +196,12 @@ struct rdt_resource {
>   	bool			alloc_capable;
>   	bool			mon_capable;
>   	int			num_rmid;
> -	enum resctrl_scope	scope;
> +	enum resctrl_scope	ctrl_scope;
> +	enum resctrl_scope	mon_scope;
>   	struct resctrl_cache	cache;
>   	struct resctrl_membw	membw;
> -	struct list_head	domains;
> +	struct list_head	ctrl_domains;
> +	struct list_head	mon_domains;
>   	char			*name;
>   	int			data_width;
>   	u32			default_ctrl;
> 
> There are two scope fields.
> There are two domains fields.

I might at some future time split struct rdt_resource into struct
rdt_ctrl_resource and struct rdt_mon_resource.  That would get rid of
multiple scope and domain fields. There are also a bunch of fields that
are specific to just ctrl or mon functions.

That would require other churn. E.g. getting rid of the
rdt_resources_all[] array and the macros that scan it to perform
various actions. Replace with two lists, one each for active ctrl/mon
resources. Once James' patches to split into architecture vs. generic
parts are applied this might be useful so that CPU vendors could add
resources that didn't have equivalents for other architectures.

There isn't a pressing need for that today. But splitting the rdt_domain
structure now makes for a good base to build on later (if needed).

-Tony