>> To be honest, I like this series more than the previous series. I always
>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>
> Would you prefer that your "Reviewed-by" tag be removed from the
> previous series?

I'm thinking that I could continue splitting things and break "struct rdt_resource" into
separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.

Doing that would get rid of the rdt_resources_all[] array. Replacing with individual
rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.

Features found on a system would be added to a list of ctrl or list of mon resources.

-Tony
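For readers following the thread, the split proposed in this message might look roughly like the model below. This is a Python sketch, not kernel code: the class names, the feature strings, and the `register_resources` helper are all hypothetical, chosen only to mirror the idea of one declaration per feature, collected into a list of ctrl resources and a list of mon resources.

```python
from dataclasses import dataclass, field


@dataclass
class CtrlResource:
    """Hypothetical stand-in for a control-only resource."""
    name: str
    domains: list = field(default_factory=list)


@dataclass
class MonResource:
    """Hypothetical stand-in for a monitor-only resource."""
    name: str
    domains: list = field(default_factory=list)


# One declaration per feature, instead of slots in a single shared array.
L3_CTRL = CtrlResource("L3")
L2_CTRL = CtrlResource("L2")
MBA_CTRL = CtrlResource("MB")
L3_MON = MonResource("L3")

# The feature strings below are made up for this sketch.
CTRL_BY_FEATURE = {"cat_l3": L3_CTRL, "cat_l2": L2_CTRL, "mba": MBA_CTRL}
MON_BY_FEATURE = {"cqm": L3_MON}


def register_resources(detected):
    """Return (ctrl_list, mon_list) holding only the features found."""
    ctrl_list = [res for feat, res in CTRL_BY_FEATURE.items() if feat in detected]
    mon_list = [res for feat, res in MON_BY_FEATURE.items() if feat in detected]
    return ctrl_list, mon_list
```

With this shape, a system that reports only the hypothetical `"cat_l3"` and `"cqm"` features ends up with one entry in each list, and nothing forces the two L3 entries to share a structure.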
Hi Tony,

On 2/12/2024 11:57 AM, Luck, Tony wrote:
>>> To be honest, I like this series more than the previous series. I always
>>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>>
>> Would you prefer that your "Reviewed-by" tag be removed from the
>> previous series?
>
> I'm thinking that I could continue splitting things and break "struct rdt_resource" into
> separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.

It is not obvious what you mean with "continue splitting things". Are you
speaking about "continue splitting from v14" or "continue splitting from v15-RFC"?

I think that any solution needs to consider what makes sense for resctrl
as a whole instead of how to support SNC with smallest patch possible.

There should not be any changes that makes resctrl harder to understand
and maintain, as exemplified by confusion introduced by a simple thing as
resource name choice [1].

>
> Doing that would get rid of the rdt_resources_all[] array. Replacing with individual
> rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.
>
> Features found on a system would be added to a list of ctrl or list of mon resources.

Could you please elaborate what is architecturally wrong with v14 and how this
new proposal addresses that?

Reinette

[1] https://lore.kernel.org/lkml/ZcZyqs5hnQqZ5ZV0@agluck-desk3/
On Mon, Feb 12, 2024 at 01:43:56PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 2/12/2024 11:57 AM, Luck, Tony wrote:
> >>> To be honest, I like this series more than the previous series. I always
> >>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
> >>
> >> Would you prefer that your "Reviewed-by" tag be removed from the
> >> previous series?
> >
> > I'm thinking that I could continue splitting things and break "struct rdt_resource" into
> > separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.
>
> It is not obvious what you mean with "continue splitting things". Are you
> speaking about "continue splitting from v14" or "continue splitting from v15-RFC"?

I'm speaking of some future potential changes. Not proposing to
do this now.

> I think that any solution needs to consider what makes sense for resctrl
> as a whole instead of how to support SNC with smallest patch possible.

I am officially abandoning my v15-RFC patches. I wasn't clear enough in my
e-mail earlier today.

https://lore.kernel.org/all/SJ1PR11MB608378D1304224D9E8A9016FFC482@SJ1PR11MB6083.namprd11.prod.outlook.com/

>
> There should not be any changes that makes resctrl harder to understand
> and maintain, as exemplified by confusion introduced by a simple thing as
> resource name choice [1].
>
> >
> > Doing that would get rid of the rdt_resources_all[] array. Replacing with individual
> > rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.
> >
> > Features found on a system would be added to a list of ctrl or list of mon resources.
>
> Could you please elaborate what is architecturally wrong with v14 and how this
> new proposal addresses that?

There is nothing architecturally wrong with v14. I thought it was more
complex than it needed to be. You have convinced me that my v15-RFC
series, while simpler, is not a reasonable path for long-term resctrl
maintainability.

>
> Reinette
>
> [1] https://lore.kernel.org/lkml/ZcZyqs5hnQqZ5ZV0@agluck-desk3/

-Tony
Hello,

On 12/02/2024 22:05, Tony Luck wrote:
> On Mon, Feb 12, 2024 at 01:43:56PM -0800, Reinette Chatre wrote:
>> On 2/12/2024 11:57 AM, Luck, Tony wrote:
>>>>> To be honest, I like this series more than the previous series. I always
>>>>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>>>>
>>>> Would you prefer that your "Reviewed-by" tag be removed from the
>>>> previous series?
>>>
>>> I'm thinking that I could continue splitting things and break "struct rdt_resource" into
>>> separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.
>>
>> It is not obvious what you mean with "continue splitting things". Are you
>> speaking about "continue splitting from v14" or "continue splitting from v15-RFC"?
>
> I'm speaking of some future potential changes. Not proposing to
> do this now.
>
>> I think that any solution needs to consider what makes sense for resctrl
>> as a whole instead of how to support SNC with smallest patch possible.
>> There should not be any changes that makes resctrl harder to understand
>> and maintain, as exemplified by confusion introduced by a simple thing as
>> resource name choice [1].
>>
>>>
>>> Doing that would get rid of the rdt_resources_all[] array. Replacing with individual
>>> rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.
>>>
>>> Features found on a system would be added to a list of ctrl or list of mon resources.
>>
>> Could you please elaborate what is architecturally wrong with v14 and how this
>> new proposal addresses that?
>
> There is nothing architecturally wrong with v14. I thought it was more
> complex than it needed to be. You have convinced me that my v15-RFC
> series, while simpler, is not a reasonable path for long-term resctrl
> maintainability.
I'm not sure if it's helpful to describe a third approach at this point - but on the off chance it's useful:

With SNC enabled, the L3 monitors are unaffected, but the controls behave as if they were part of some other component in the system.

ACPI describes something called "memory side caches" [0] in the HMAT table, which are outside the CPU cache hierarchy, and are associated with a Proximity-Domain. I've heard that one of Arm's partners has built a system with MPAM controls on something like this. How would we support this - and would this be a better fit for the way SNC behaves?

I think this would be a new resource and schema, 'MSC'(?) with domain-ids using the NUMA nid. As these aren't CPU caches, they wouldn't appear in the same part of the sysfs hierarchy, and wouldn't necessarily have a cache-id.

For SNC systems, I think this would look like CMT on the L3, and CAT on the 'MSC'. Existing software wouldn't know to use the new schema, but equally wouldn't be surprised by the domain-ids being something other than the cache-id, and the controls and monitors not lining up.

Where it's not quite right for SNC is sysfs may not describe a memory side cache, but one would be present in resctrl. I don't think that's a problem - unless these systems do also have a memory-side-cache that behaves differently. (where is the controls being applied at the 'near' side of the link - I don't think the difference matters)

I'm a little nervous that the SNC support looks strange if we ever add support for something like the above. Given it's described in ACPI, I assume there are plenty of machines out there that look like this.

(Why aren't memory-side-caches a CPU cache? They live near the memory controller and cache based on the PA, not the CPU that issued the transaction)

Thanks,

James

[0] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#memory-side-cache-overview
> With SNC enabled, the L3 monitors are unaffected, but the controls behave as if they were
> part of some other component in the system.

I don't think of it like that. See attached picture of a single socket divided in two by SNC.

[If the attachment is stripped off for those reading this via mailing lists and you want the picture, just send me an e-mail.]

Everything in blue is node 0. Yellow for node 1.

The rectangles in the middle represent the L3 cache (12-way associative).

When cores in node 0 access memory in node 0, it will be cached using the "top" half of the cache indices. Similarly for node 1 using the "bottom" half.

Here's how each of the Intel L3 resctrl functions operates with SNC enabled:

CQM: Reports how much of your half of the L3 cache is occupied.

MBM: Reports on memory traffic from your half of the cache to your memory controllers.

CAT: Still controls which ways of the cache are available for allocation (but each way has half the capacity).

MBA: The same throttling levels applied to "blue" and "yellow" traffic (because there are only socket level controls).

> I'm a little nervous that the SNC support looks strange if we ever add support for
> something like the above. Given it's described in ACPI, I assume there are plenty of
> machines out there that look like this.

I'm also nervous as h/w designers find various ways to diverge from the old paradigm of socket scope == L3 cache scope == NUMA node scope

-Tony
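To put numbers on the CAT remark above ("each way has half the capacity"), here is a small arithmetic sketch in Python. The 12 ways and the two SNC nodes come from the message; the 48 MiB cache size and both function names are assumptions made up for illustration, not values or interfaces from resctrl.

```python
MIB = 1024 * 1024


def way_capacity_bytes(l3_bytes, num_ways, snc_nodes=1):
    """Bytes covered by one cache way, as seen from a single SNC node."""
    if l3_bytes % (num_ways * snc_nodes):
        raise ValueError("cache size does not divide evenly")
    return l3_bytes // (num_ways * snc_nodes)


def cat_capacity_bytes(cbm, l3_bytes, num_ways, snc_nodes=1):
    """Capacity reachable through a CAT way bitmask for one SNC node."""
    if cbm <= 0 or cbm >> num_ways:
        raise ValueError("bitmask must be non-zero and fit in num_ways bits")
    ways_enabled = bin(cbm).count("1")
    return ways_enabled * way_capacity_bytes(l3_bytes, num_ways, snc_nodes)


# Assumed 48 MiB L3 with the 12 ways mentioned in the thread.
L3_BYTES = 48 * MIB
no_snc = cat_capacity_bytes(0x00F, L3_BYTES, num_ways=12)
snc_2 = cat_capacity_bytes(0x00F, L3_BYTES, num_ways=12, snc_nodes=2)
print(no_snc // MIB, snc_2 // MIB)
```

The same four-way mask selects the same ways in both cases. Only the capacity behind each way changes, because with SNC each node's traffic is cached in its own half of the cache indices.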