RE: [PATCH v15-RFC 0/8] Add support for Sub-NUMA cluster (SNC) systems

Luck, Tony posted 8 patches 1 week, 6 days ago
RE: [PATCH v15-RFC 0/8] Add support for Sub-NUMA cluster (SNC) systems
Posted by Luck, Tony 1 week, 6 days ago
>> To be honest, I like this series more than the previous series. I always
>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>
> Would you prefer that your "Reviewed-by" tag be removed from the
> previous series?

I'm thinking that I could continue splitting things and break "struct rdt_resource" into
separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.

Doing that would get rid of the rdt_resources_all[] array, replacing it with individual
rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.

Features found on a system would be added to a list of ctrl or list of mon resources.

-Tony
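A minimal C sketch of the split Tony describes, offered under stated assumptions: the struct names rdt_hw_ctrl_resource and rdt_hw_mon_resource come from the mail, but every field, the example instances, and the helpers add_ctrl(), add_mon(), rdt_enumerate(), count_ctrl() and count_mon() are hypothetical and do not reflect the real resctrl code. Each feature gets its own declaration instead of a slot in rdt_resources_all[], and enumeration links the features that are found into either a ctrl list or a mon list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical control-side resource; fields are illustrative only. */
struct rdt_hw_ctrl_resource {
	const char *name;                   /* schema name shown to user space */
	struct rdt_hw_ctrl_resource *next;  /* link in the list of found ctrl resources */
};

/* Hypothetical monitor-side resource; fields are illustrative only. */
struct rdt_hw_mon_resource {
	const char *name;
	struct rdt_hw_mon_resource *next;   /* link in the list of found mon resources */
};

/* One declaration per feature, replacing a single rdt_resources_all[] array. */
static struct rdt_hw_ctrl_resource l3_ctrl  = { .name = "L3" };
static struct rdt_hw_ctrl_resource mba_ctrl = { .name = "MB" };
static struct rdt_hw_mon_resource  l3_mon   = { .name = "L3" };

/* Heads of the lists of features actually found on this system. */
static struct rdt_hw_ctrl_resource *ctrl_list;
static struct rdt_hw_mon_resource *mon_list;

void add_ctrl(struct rdt_hw_ctrl_resource *r)
{
	r->next = ctrl_list;	/* push onto the head of the ctrl list */
	ctrl_list = r;
}

void add_mon(struct rdt_hw_mon_resource *r)
{
	r->next = mon_list;	/* push onto the head of the mon list */
	mon_list = r;
}

/* Stand-in for CPUID-based feature detection; the flags are hypothetical. */
void rdt_enumerate(bool has_cat, bool has_mba, bool has_cmt)
{
	ctrl_list = NULL;
	mon_list = NULL;
	if (has_cat)
		add_ctrl(&l3_ctrl);
	if (has_mba)
		add_ctrl(&mba_ctrl);
	if (has_cmt)
		add_mon(&l3_mon);
}

int count_ctrl(void)
{
	int n = 0;

	for (struct rdt_hw_ctrl_resource *r = ctrl_list; r; r = r->next)
		n++;
	return n;
}

int count_mon(void)
{
	int n = 0;

	for (struct rdt_hw_mon_resource *r = mon_list; r; r = r->next)
		n++;
	return n;
}
```

In this sketch, schemata code would walk only ctrl_list and monitoring code only mon_list, so neither has to check which half of a combined structure is valid for a given feature.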
Re: [PATCH v15-RFC 0/8] Add support for Sub-NUMA cluster (SNC) systems
Posted by Reinette Chatre 1 week, 6 days ago
Hi Tony,

On 2/12/2024 11:57 AM, Luck, Tony wrote:
>>> To be honest, I like this series more than the previous series. I always
>>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>>
>> Would you prefer that your "Reviewed-by" tag be removed from the
>> previous series?
> 
> I'm thinking that I could continue splitting things and break "struct rdt_resource" into
> separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.

It is not obvious what you mean by "continue splitting things". Are you
speaking about "continue splitting from v14" or "continue splitting from v15-RFC"?

I think that any solution needs to consider what makes sense for resctrl
as a whole instead of how to support SNC with smallest patch possible.

There should not be any changes that make resctrl harder to understand
and maintain, as exemplified by the confusion introduced by something as
simple as the choice of resource name [1].

> 
> Doing that would get rid of the rdt_resources_all[] array, replacing it with individual
> rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.
>
> Features found on a system would be added to a list of ctrl or list of mon resources.

Could you please elaborate what is architecturally wrong with v14 and how this
new proposal addresses that?

Reinette

[1] https://lore.kernel.org/lkml/ZcZyqs5hnQqZ5ZV0@agluck-desk3/
Re: [PATCH v15-RFC 0/8] Add support for Sub-NUMA cluster (SNC) systems
Posted by Tony Luck 1 week, 6 days ago
On Mon, Feb 12, 2024 at 01:43:56PM -0800, Reinette Chatre wrote:
> Hi Tony,
> 
> On 2/12/2024 11:57 AM, Luck, Tony wrote:
> >>> To be honest, I like this series more than the previous series. I always
> >>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
> >>
> >> Would you prefer that your "Reviewed-by" tag be removed from the
> >> previous series?
> > 
> > I'm thinking that I could continue splitting things and break "struct rdt_resource" into
> > separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.
> 
> It is not obvious what you mean by "continue splitting things". Are you
> speaking about "continue splitting from v14" or "continue splitting from v15-RFC"?

I'm speaking of some future potential changes. Not proposing to
do this now.

> I think that any solution needs to consider what makes sense for resctrl
> as a whole instead of how to support SNC with smallest patch possible.

I am officially abandoning my v15-RFC patches. I wasn't clear enough in
my e-mail earlier today.

https://lore.kernel.org/all/SJ1PR11MB608378D1304224D9E8A9016FFC482@SJ1PR11MB6083.namprd11.prod.outlook.com/
> 
> There should not be any changes that make resctrl harder to understand
> and maintain, as exemplified by the confusion introduced by something as
> simple as the choice of resource name [1].
> 
> > 
> > Doing that would get rid of the rdt_resources_all[] array, replacing it with individual
> > rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.
> >
> > Features found on a system would be added to a list of ctrl or list of mon resources.
> 
> Could you please elaborate what is architecturally wrong with v14 and how this
> new proposal addresses that?

There is nothing architecturally wrong with v14. I thought it was more
complex than it needed to be. You have convinced me that my v15-RFC
series, while simpler, is not a reasonable path for long-term resctrl
maintainability.
> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/ZcZyqs5hnQqZ5ZV0@agluck-desk3/

-Tony
Re: [PATCH v15-RFC 0/8] Add support for Sub-NUMA cluster (SNC) systems
Posted by James Morse 1 week, 5 days ago
Hello,

On 12/02/2024 22:05, Tony Luck wrote:
> On Mon, Feb 12, 2024 at 01:43:56PM -0800, Reinette Chatre wrote:
>> On 2/12/2024 11:57 AM, Luck, Tony wrote:
>>>>> To be honest, I like this series more than the previous series. I always
>>>>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>>>>
>>>> Would you prefer that your "Reviewed-by" tag be removed from the
>>>> previous series?
>>>
>>> I'm thinking that I could continue splitting things and break "struct rdt_resource" into
>>> separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.
>>
>> It is not obvious what you mean by "continue splitting things". Are you
>> speaking about "continue splitting from v14" or "continue splitting from v15-RFC"?
> 
> I'm speaking of some future potential changes. Not proposing to
> do this now.
> 
>> I think that any solution needs to consider what makes sense for resctrl
>> as a whole instead of how to support SNC with smallest patch possible.

>> There should not be any changes that make resctrl harder to understand
>> and maintain, as exemplified by the confusion introduced by something as
>> simple as the choice of resource name [1].
>>
>>>
>>> Doing that would get rid of the rdt_resources_all[] array, replacing it with individual
>>> rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.
>>>
>>> Features found on a system would be added to a list of ctrl or list of mon resources.
>>
>> Could you please elaborate what is architecturally wrong with v14 and how this
>> new proposal addresses that?
> 
> There is nothing architecturally wrong with v14. I thought it was more
> complex than it needed to be. You have convinced me that my v15-RFC
> series, while simpler, is not a reasonable path for long-term resctrl
> maintainability.

I'm not sure if it's helpful to describe a third approach at this point, but on the off
chance it's useful:
With SNC enabled, the L3 monitors are unaffected, but the controls behave as if they were
part of some other component in the system.

ACPI describes something called "memory side caches" [0] in the HMAT table, which are
outside the CPU cache hierarchy, and are associated with a Proximity-Domain. I've heard
that one of Arm's partners has built a system with MPAM controls on something like this.
How would we support this - and would this be a better fit for the way SNC behaves?

I think this would be a new resource and schema, 'MSC'(?) with domain-ids using the NUMA
nid. As these aren't CPU caches, they wouldn't appear in the same part of the sysfs
hierarchy, and wouldn't necessarily have a cache-id.
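A small C sketch of the domain-id rule outlined above, assuming a toy per-CPU topology table: the existing L3 resource keys domains by cache id, while the hypothetical 'MSC' resource would key them by NUMA nid. The names enum rdt_scope, struct cpu_topo, domain_id(), count_domains() and the example table are invented for illustration. On a socket that SNC splits into two nodes, all CPUs share one L3 cache id but carry two different nids, so the same CPUs yield one L3 domain and two 'MSC' domains.

```c
#include <stdbool.h>

#define MAX_DOMAIN_ID 64	/* assumption: ids are small non-negative ints */

enum rdt_scope { SCOPE_L3_CACHE, SCOPE_NUMA_NODE };

/* Hypothetical per-CPU topology record. */
struct cpu_topo {
	int l3_cache_id;	/* id of the L3 cache this CPU uses */
	int numa_nid;		/* NUMA node this CPU belongs to */
};

/* Example: one socket, one L3, split into two SNC nodes (illustrative). */
static const struct cpu_topo snc2_socket[] = {
	{ .l3_cache_id = 0, .numa_nid = 0 },
	{ .l3_cache_id = 0, .numa_nid = 0 },
	{ .l3_cache_id = 0, .numa_nid = 1 },
	{ .l3_cache_id = 0, .numa_nid = 1 },
};

/* Pick the domain id for a CPU according to the resource's scope. */
int domain_id(enum rdt_scope scope, const struct cpu_topo *t)
{
	switch (scope) {
	case SCOPE_L3_CACHE:
		return t->l3_cache_id;	/* existing L3 schema */
	case SCOPE_NUMA_NODE:
		return t->numa_nid;	/* hypothetical 'MSC' schema */
	}
	return -1;
}

/* Count the distinct domains a resource of the given scope would expose. */
int count_domains(enum rdt_scope scope, const struct cpu_topo *cpus, int ncpus)
{
	bool seen[MAX_DOMAIN_ID] = { false };
	int n = 0;

	for (int i = 0; i < ncpus; i++) {
		int id = domain_id(scope, &cpus[i]);

		if (id < 0 || id >= MAX_DOMAIN_ID)
			continue;	/* out of range for this toy table */
		if (!seen[id]) {
			seen[id] = true;
			n++;
		}
	}
	return n;
}
```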

For SNC systems, I think this would look like CMT on the L3, and CAT on the 'MSC'.
Existing software wouldn't know to use the new schema, but equally wouldn't be surprised
by the domain-ids being something other than the cache-id, and the controls and monitors
not lining up.
Where it's not quite right for SNC is that sysfs may not describe a memory-side cache, but one
would be present in resctrl. I don't think that's a problem, unless these systems also
have a memory-side cache that behaves differently. (i.e. where the controls are applied at
the 'near' side of the link; I don't think the difference matters.)


I'm a little nervous that the SNC support looks strange if we ever add support for
something like the above. Given it's described in ACPI, I assume there are plenty of
machines out there that look like this.

(Why aren't memory-side caches CPU caches? They live near the memory controller and cache
based on the PA, not on the CPU that issued the transaction.)
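To make that distinction concrete, here is a hedged C sketch of how each kind of cache would pick which domain's controls apply to a memory access: a CPU cache by the issuing CPU, a memory-side cache by the home node of the physical address. The node address ranges, the CPU-to-cache table and both helper functions are hypothetical example data, not taken from ACPI HMAT or any real system.

```c
#include <stdint.h>

/* Hypothetical physical address range owned by one NUMA node: [start, end). */
struct node_range {
	uint64_t start;
	uint64_t end;
	int nid;
};

/* Example data: four CPUs sharing L3 cache id 0; two 64 GiB nodes. */
static const int example_cpu_to_cache_id[4] = { 0, 0, 0, 0 };
static const struct node_range example_ranges[2] = {
	{ .start = 0,          .end = 1ULL << 36, .nid = 0 },
	{ .start = 1ULL << 36, .end = 1ULL << 37, .nid = 1 },
};

/* CPU cache: the domain is chosen by the CPU that issued the access. */
int cpu_cache_domain(int issuing_cpu, const int *cpu_to_cache_id)
{
	return cpu_to_cache_id[issuing_cpu];
}

/* Memory-side cache: the domain is chosen by where the PA lives. */
int mem_side_cache_domain(uint64_t pa, const struct node_range *ranges, int nr)
{
	for (int i = 0; i < nr; i++)
		if (pa >= ranges[i].start && pa < ranges[i].end)
			return ranges[i].nid;
	return -1;	/* PA not backed by any listed node */
}
```

With this example data, CPU 3 touching an address in node 1 is governed by L3 domain 0 for the CPU cache but by domain 1 for the memory-side cache.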


Thanks,

James

[0]
https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#memory-side-cache-overview
RE: [PATCH v15-RFC 0/8] Add support for Sub-NUMA cluster (SNC) systems
Posted by Luck, Tony 1 week, 5 days ago
> With SNC enabled, the L3 monitors are unaffected, but the controls behave as if they were
> part of some other component in the system.

I don't think of it like that. See the attached picture of a single socket divided in two by SNC.
[If the attachment has been stripped off for those reading this via the mailing list and you want
the picture, just send me an e-mail.]

Everything in blue is node 0. Yellow for node 1.

The rectangles in the middle represent the L3 cache (12-way associative). When cores
in node 0 access memory in node 0, it will be cached using the "top" half of the cache
indices. Similarly for node 1 using the "bottom" half.
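The top/bottom split can be modelled with a toy index function. This is a simplified sketch only: Intel does not document the real L3 index hash, and the 64-byte line size, snc_set_index() and its parameters are assumptions made for illustration. The model gives each SNC node a contiguous 1/snc_ways share of the sets and places a line in the share of the node that owns its memory; the mail itself only describes node-local accesses.

```c
#include <stdint.h>

#define LINE_SHIFT 6	/* assumption: 64-byte cache lines */

/*
 * Toy model, not the real hardware hash: node n owns sets
 * [n * sets_per_node, (n + 1) * sets_per_node).
 */
unsigned int snc_set_index(uint64_t pa, unsigned int home_nid,
			   unsigned int num_sets, unsigned int snc_ways)
{
	unsigned int sets_per_node = num_sets / snc_ways;
	/* Line address modulo the size of this node's share of the sets. */
	unsigned int local = (unsigned int)((pa >> LINE_SHIFT) % sets_per_node);

	return home_nid * sets_per_node + local;
}
```

With 8 sets and two nodes, node 0 lines land in sets 0 to 3 and node 1 lines in sets 4 to 7, matching the "top" and "bottom" halves of the picture.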

Here’s how each of the Intel L3 resctrl functions operate with SNC enabled:

CQM: Reports how much of your half of the L3 cache is occupied.

MBM: Reports on memory traffic from your half of the cache to your memory controllers.

CAT: Still controls which ways of the cache are available for allocation (but each way
has half the capacity).

MBA: The same throttling levels are applied to "blue" and "yellow" traffic (because there
are only socket-level controls).
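The CAT point (each way has half the capacity) is simple arithmetic. A hedged C sketch, assuming the 12-way L3 from the picture, an illustrative 12 MiB total size, and hypothetical helper names: a capacity bitmask (CBM) selecting four ways yields 4 MiB without SNC, but each node sees only 2 MiB of it when the socket is split into two SNC nodes.

```c
#include <stdint.h>

/* Number of cache ways selected by a capacity bitmask. */
unsigned int cbm_ways(uint32_t cbm)
{
	unsigned int n = 0;

	while (cbm) {
		n += cbm & 1;
		cbm >>= 1;
	}
	return n;
}

/*
 * Bytes of L3 a CBM gives one SNC node: each way is split across the
 * snc_ways nodes, so a node gets way_bytes / snc_ways per selected way.
 */
uint64_t cat_capacity_per_node(uint32_t cbm, uint64_t l3_bytes,
			       unsigned int num_ways, unsigned int snc_ways)
{
	uint64_t way_bytes = l3_bytes / num_ways;

	return cbm_ways(cbm) * way_bytes / snc_ways;
}
```

In this model the number of ways a CBM selects is unchanged by SNC; only the bytes each node receives per way shrink.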

> I'm a little nervous that the SNC support looks strange if we ever add support for
> something like the above. Given it's described in ACPI, I assume there are plenty of
> machines out there that look like this.

I'm also nervous as h/w designers find various ways to diverge from the old paradigm of

	socket scope == L3 cache scope == NUMA node scope
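One hedged way to see how far a system strays from that paradigm is to compare CPU counts at each scope. The helpers below are hypothetical, and equal CPU counts are only a heuristic for equal scope; real code would compare the CPU masks reported by the topology instead.

```c
#include <stdbool.h>

/*
 * NUMA nodes sharing one L3, i.e. the SNC split factor.
 * 1 means L3 scope == node scope; 0 means the counts are inconsistent.
 */
unsigned int nodes_per_l3(unsigned int cpus_per_l3, unsigned int cpus_per_node)
{
	if (cpus_per_node == 0 || cpus_per_l3 % cpus_per_node != 0)
		return 0;
	return cpus_per_l3 / cpus_per_node;
}

/* socket scope == L3 cache scope == NUMA node scope, judged by CPU counts. */
bool old_paradigm_holds(unsigned int cpus_per_socket, unsigned int cpus_per_l3,
			unsigned int cpus_per_node)
{
	return cpus_per_socket == cpus_per_l3 && cpus_per_l3 == cpus_per_node;
}
```

For an illustrative 64-CPU socket that SNC splits into two nodes, nodes_per_l3(64, 32) returns 2 and old_paradigm_holds(64, 64, 32) is false.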

-Tony