Documentation/arch/x86/resctrl.rst | 44 ++- include/linux/resctrl.h | 85 +++-- arch/x86/include/asm/msr-index.h | 1 + arch/x86/kernel/cpu/resctrl/internal.h | 67 ++-- arch/x86/kernel/cpu/resctrl/core.c | 428 ++++++++++++++++++---- arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 56 +-- arch/x86/kernel/cpu/resctrl/monitor.c | 70 ++-- arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 26 +- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 213 +++++++---- 9 files changed, 718 insertions(+), 272 deletions(-)
The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
that share an L3 cache into two or more sets. This plays havoc with the
Resource Director Technology (RDT) monitoring features. Prior to this
patch Intel has advised that SNC and RDT are incompatible.
Some of these CPU support an MSR that can partition the RMID counters in
the same way. This allows monitoring features to be used. With the caveat
that users must be aware that Linux may migrate tasks more frequently
between SNC nodes than between "regular" NUMA nodes, so reading counters
from all SNC nodes may be needed to get a complete picture of activity
for tasks.
Cache and memory bandwidth allocation features continue to operate at
the scope of the L3 cache.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
Changes since v15: Link: https://lore.kernel.org/all/20240228112935.8087-tony.luck@intel.com/
0) Note that v14 Reviewed/Testing tags have been removed because of the
extent of refactoring to catch up with upstream. But nothing
fundamental changed, so everything should look familiar.
1) Refactor to apply on top of Link: https://lore.kernel.org/all/20240308213846.77075-1-tony.luck@intel.com/
[So base commit is either tip x86/cache, or upstream current merge PLUS
the two patches in that series]
2) Add patch 9 which adds files showing mappings from domains to CPUs
Reinette suggested this, James thinks it duplicates information
that can be gathered from /sys/devices/system/
Discussion here: Link: https://lore.kernel.org/all/ZetcM9GO2PH6SC0j@agluck-desk3/
This part is a nice-to-have. I'm fine if just the first eight patches
are applied without this while the discussion continues.
Tony Luck (9):
x86/resctrl: Prepare for new domain scope
x86/resctrl: Prepare to split rdt_domain structure
x86/resctrl: Prepare for different scope for control/monitor
operations
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
x86/resctrl: Add node-scope to the options for feature scope
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Sub NUMA Cluster detection and enable
x86/resctrl: Update documentation with Sub-NUMA cluster changes
x86/resctrl: Add info files to show mappings from domains to lists of
cpus
Documentation/arch/x86/resctrl.rst | 44 ++-
include/linux/resctrl.h | 85 +++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 67 ++--
arch/x86/kernel/cpu/resctrl/core.c | 428 ++++++++++++++++++----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 56 +--
arch/x86/kernel/cpu/resctrl/monitor.c | 70 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 26 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 213 +++++++----
9 files changed, 718 insertions(+), 272 deletions(-)
--
2.44.0
(+Maciej) Hi Tony, (Please add x86/resctrl to Subject prefix of cover letter) On 3/12/2024 2:42 PM, Tony Luck wrote: > The Sub-NUMA cluster feature on some Intel processors partitions the CPUs > that share an L3 cache into two or more sets. This plays havoc with the > Resource Director Technology (RDT) monitoring features. Prior to this > patch Intel has advised that SNC and RDT are incompatible. > > Some of these CPU support an MSR that can partition the RMID counters in > the same way. This allows monitoring features to be used. With the caveat > that users must be aware that Linux may migrate tasks more frequently > between SNC nodes than between "regular" NUMA nodes, so reading counters > from all SNC nodes may be needed to get a complete picture of activity > for tasks. > > Cache and memory bandwidth allocation features continue to operate at > the scope of the L3 cache. > > Signed-off-by: Tony Luck <tony.luck@intel.com> > > --- > Changes since v15: Link: https://lore.kernel.org/all/20240228112935.8087-tony.luck@intel.com/ > > 0) Note that v14 Reviewed/Testing tags have been removed because of the > extent of refactoring to catch up with upstream. But nothing > fundamental changed, so everything should look familiar. > > 1) Refactor to apply on top of Link: https://lore.kernel.org/all/20240308213846.77075-1-tony.luck@intel.com/ > [So base commit is either tip x86/cache, or upstream current merge PLUS > the two patches in that series] > > 2) Add patch 9 which adds files showing mappings from domains to CPUs > Reinette suggested this, James thinks it duplicates information > that can be gathered from /sys/devices/system/ > Discussion here: Link: https://lore.kernel.org/all/ZetcM9GO2PH6SC0j@agluck-desk3/ > This part is a nice-to-have. I'm fine if just the first eight patches > are applied without this while the discussion continues. I agree to drop patch #9. The core support for SNC continue to look good to me (I just had a few nitpicks). What remains is the user interface that continues to gather opinions [3]. These new discussions were prompted by user space needing a way to determine if resctrl supports SNC. This started by using the "size" file but thinking about it more user space could also look at whether the number of L3 control domains are different from the number of L3 monitoring domains? I am adding Maciej for his opinion (please also include him in future versions of this series). Apart from the user space requirement to know if SNC is supported by resctrl there is also the interface with which user space obtains the monitoring data. James highlighted [1] that the interface used in this series uses existing files to represent different content, and can thus be considered as "broken". It is not obvious to me how to "fix" this. Should we continue to explore interfaces like [2] that attempts to add SNC support into resctrl or should the message continue to be that SNC "plays havoc with the RDT monitoring features" and users wanting to use SNC and RDT at the same time are expected to adapt to the peculiar interface ... or is the preference that after this series "SNC and RDT are compatible" and thus presented with an intuitive interface? Reinette [1] https://lore.kernel.org/lkml/88430722-67b3-4f7d-8db2-95ee52b6f0b0@arm.com/ [2] https://lore.kernel.org/lkml/SJ1PR11MB608309F47C00F964E16205D6FC2D2@SJ1PR11MB6083.namprd11.prod.outlook.com/ [3] https://lore.kernel.org/lkml/SJ1PR11MB608310C72D7189C139EA6302FC212@SJ1PR11MB6083.namprd11.prod.outlook.com/
> I agree to drop patch #9. > > The core support for SNC continue to look good to me (I just had a few nitpicks). > > What remains is the user interface that continues to gather opinions [3]. These new > discussions were prompted by user space needing a way to determine if resctrl supports > SNC. This started by using the "size" file but thinking about it more user space could > also look at whether the number of L3 control domains are different from the number > of L3 monitoring domains? I am adding Maciej for his opinion (please also include him > in future versions of this series). Implementing the revised user interface will make significant changes to the patches needed to support SNC. Working on them now. Minor stuff is the that "size" file won't need to change (because summing across all SNC domains mean that all of the cache is counted in mon_data/mon_L3_xx/llc_occupancy. The new mon_data/mon_L3_xx/mon_NODE_xxllc_occupancy files are the ones limited to 1/snc_ways of L3 capacity. Major stuff is that we now need both the L3-scoped domain list as well as the NODE scoped domain list. So no longer just changing the "mon_scope" field in the L3 rdt_resource from one value to another. I'll maybe have a draft set of patches in a day or two. > Apart from the user space requirement to know if SNC is supported by resctrl there > is also the interface with which user space obtains the monitoring data. > James highlighted [1] that the interface used in this series uses existing files to > represent different content, and can thus be considered as "broken". It is not obvious > to me how to "fix" this. Should we continue to explore interfaces like [2] that > attempts to add SNC support into resctrl or should the message continue to be > that SNC "plays havoc with the RDT monitoring features" and users wanting to use > SNC and RDT at the same time are expected to adapt to the peculiar interface ... > or is the preference that after this series "SNC and RDT are compatible" and > thus presented with an intuitive interface? > > Reinette > > [1] https://lore.kernel.org/lkml/88430722-67b3-4f7d-8db2-95ee52b6f0b0@arm.com/ > [2] https://lore.kernel.org/lkml/SJ1PR11MB608309F47C00F964E16205D6FC2D2@SJ1PR11MB6083.namprd11.prod.outlook.com/ > [3] https://lore.kernel.org/lkml/SJ1PR11MB608310C72D7189C139EA6302FC212@SJ1PR11MB6083.namprd11.prod.outlook.com/
Hi Tony, On 3/19/2024 1:31 PM, Luck, Tony wrote: >> I agree to drop patch #9. >> >> The core support for SNC continue to look good to me (I just had a few nitpicks). >> >> What remains is the user interface that continues to gather opinions [3]. These new >> discussions were prompted by user space needing a way to determine if resctrl supports >> SNC. This started by using the "size" file but thinking about it more user space could >> also look at whether the number of L3 control domains are different from the number >> of L3 monitoring domains? I am adding Maciej for his opinion (please also include him >> in future versions of this series). > > Implementing the revised user interface will make significant changes to the > patches needed to support SNC. Working on them now. > > Minor stuff is the that "size" file won't need to change (because summing across > all SNC domains mean that all of the cache is counted in mon_data/mon_L3_xx/llc_occupancy. > The new mon_data/mon_L3_xx/mon_NODE_xxllc_occupancy files are the ones limited > to 1/snc_ways of L3 capacity. If the "size" file is impacted by this change then I think there is something wrong with the current series because "size" is intended to mirror the schemata file that represents the size in bytes that correspond to the bits from schemata - this monitoring change does not change this relationship. > > Major stuff is that we now need both the L3-scoped domain list as well as the NODE > scoped domain list. So no longer just changing the "mon_scope" field in the L3 > rdt_resource from one value to another. Here may be opportunity to isolate the SNC layout from the customs. For example, the extra layers of files only need to be added when knowing SNC is active and the cache<->node relationships can be determined dynamically on file create and read instead of creating new lists and data structures to try and describe it as part of the architecture. > > I'll maybe have a draft set of patches in a day or two. I was actually hoping to get a discussion started about what folks really want to see instead of you implementing yet another new feature that folks may disagree with. Reinette
Hi Reinette, On 2024-03-19 at 10:51:14 -0700, Reinette Chatre wrote: >What remains is the user interface that continues to gather opinions [3]. These new >discussions were prompted by user space needing a way to determine if resctrl supports >SNC. This started by using the "size" file but thinking about it more user space could >also look at whether the number of L3 control domains are different from the number >of L3 monitoring domains? I am adding Maciej for his opinion (please also include him >in future versions of this series). By this do you mean comparing the contents of main "schemata" file with the number of mon_L3_* files? >Apart from the user space requirement to know if SNC is supported by resctrl there >is also the interface with which user space obtains the monitoring data. >James highlighted [1] that the interface used in this series uses existing files to >represent different content, and can thus be considered as "broken". It is not obvious >to me how to "fix" this. Should we continue to explore interfaces like [2] that >attempts to add SNC support into resctrl or should the message continue to be >that SNC "plays havoc with the RDT monitoring features" and users wanting to use >SNC and RDT at the same time are expected to adapt to the peculiar interface ... >or is the preference that after this series "SNC and RDT are compatible" and >thus presented with an intuitive interface? I kind of liked this idea [1]. Hiding SNC related information behind some not obvious text parsing and size comparisons might eliminate any ease of use for userspace applications. But I agree with you [2] that it's hard to predict the future for this interface and any potential problems with setting up this file structure. [1] https://lore.kernel.org/all/SJ1PR11MB608309F47C00F964E16205D6FC2D2@SJ1PR11MB6083.namprd11.prod.outlook.com/ [2] https://lore.kernel.org/all/7f15a700-f23a-48f9-b335-13ea1735ad84@intel.com/ -- Kind regards Maciej Wieczór-Retman
Hi Maciej, On 3/20/2024 8:21 AM, Maciej Wieczor-Retman wrote: > Hi Reinette, > > On 2024-03-19 at 10:51:14 -0700, Reinette Chatre wrote: >> What remains is the user interface that continues to gather opinions [3]. These new >> discussions were prompted by user space needing a way to determine if resctrl supports >> SNC. This started by using the "size" file but thinking about it more user space could >> also look at whether the number of L3 control domains are different from the number >> of L3 monitoring domains? I am adding Maciej for his opinion (please also include him >> in future versions of this series). > > By this do you mean comparing the contents of main "schemata" file with the > number of mon_L3_* files? (assuming you mean mon_L3_* directories) Yes, the "schemata" file can be used. There is also the "bit_usage" file in the info directory that indicates how many control domains there are. Do you think doing so also falls into the "not obvious text parsing and size comparison" category? > >> Apart from the user space requirement to know if SNC is supported by resctrl there >> is also the interface with which user space obtains the monitoring data. >> James highlighted [1] that the interface used in this series uses existing files to >> represent different content, and can thus be considered as "broken". It is not obvious >> to me how to "fix" this. Should we continue to explore interfaces like [2] that >> attempts to add SNC support into resctrl or should the message continue to be >> that SNC "plays havoc with the RDT monitoring features" and users wanting to use >> SNC and RDT at the same time are expected to adapt to the peculiar interface ... >> or is the preference that after this series "SNC and RDT are compatible" and >> thus presented with an intuitive interface? > > I kind of liked this idea [1]. Hiding SNC related information behind some not > obvious text parsing and size comparisons might eliminate any ease of use for > userspace applications. But I agree with you [2] that it's hard to predict the > future for this interface and any potential problems with setting up this > file structure. Thanks to you for trying this out from user space side and highlighting the difficulty trying to do so. Reinette > > [1] https://lore.kernel.org/all/SJ1PR11MB608309F47C00F964E16205D6FC2D2@SJ1PR11MB6083.namprd11.prod.outlook.com/ > [2] https://lore.kernel.org/all/7f15a700-f23a-48f9-b335-13ea1735ad84@intel.com/ >
On 2024-03-20 at 08:50:51 -0700, Reinette Chatre wrote: >Hi Maciej, > >On 3/20/2024 8:21 AM, Maciej Wieczor-Retman wrote: >> Hi Reinette, >> >> On 2024-03-19 at 10:51:14 -0700, Reinette Chatre wrote: >>> What remains is the user interface that continues to gather opinions [3]. These new >>> discussions were prompted by user space needing a way to determine if resctrl supports >>> SNC. This started by using the "size" file but thinking about it more user space could >>> also look at whether the number of L3 control domains are different from the number >>> of L3 monitoring domains? I am adding Maciej for his opinion (please also include him >>> in future versions of this series). >> >> By this do you mean comparing the contents of main "schemata" file with the >> number of mon_L3_* files? > >(assuming you mean mon_L3_* directories) > >Yes, the "schemata" file can be used. There is also the "bit_usage" file in >the info directory that indicates how many control domains there are. I just did a test on my IceLake server that has SNC enabled and you're right. This can indicate kernel support for SNC. It also seems more reliable in determining the ratio of nodes per socket (since the ratio of cpus per cache/node can potentially fail with offline cpus). >Do you think doing so also falls into the "not obvious text parsing and size >comparison" category? Ideally [1] seems just the most user friendly in my opinion. Comparing schemata with mon_L3_* directories feels like a good next solution though. >>> Apart from the user space requirement to know if SNC is supported by resctrl there >>> is also the interface with which user space obtains the monitoring data. >>> James highlighted [1] that the interface used in this series uses existing files to >>> represent different content, and can thus be considered as "broken". It is not obvious >>> to me how to "fix" this. Should we continue to explore interfaces like [2] that >>> attempts to add SNC support into resctrl or should the message continue to be >>> that SNC "plays havoc with the RDT monitoring features" and users wanting to use >>> SNC and RDT at the same time are expected to adapt to the peculiar interface ... >>> or is the preference that after this series "SNC and RDT are compatible" and >>> thus presented with an intuitive interface? >> >> I kind of liked this idea [1]. Hiding SNC related information behind some not >> obvious text parsing and size comparisons might eliminate any ease of use for >> userspace applications. But I agree with you [2] that it's hard to predict the >> future for this interface and any potential problems with setting up this >> file structure. > >Thanks to you for trying this out from user space side and highlighting the >difficulty trying to do so. I suppose it wasn't that difficult in execution but it required a lot of thinking about what is a reliable way for checking kernel support, what can fail in checking the ratio of nodes to sockets etc. Maybe once we have this figured out all it will take is a good documentation and userspace applications won't have a hard time with SNC. > >Reinette > [1] https://lore.kernel.org/all/SJ1PR11MB608309F47C00F964E16205D6FC2D2@SJ1PR11MB6083.namprd11.prod.outlook.com/ [2] https://lore.kernel.org/all/7f15a700-f23a-48f9-b335-13ea1735ad84@intel.com/ -- Kind regards Maciej Wieczór-Retman
Hi Tony, On 2024-03-12 at 14:42:38 -0700, Tony Luck wrote: >The Sub-NUMA cluster feature on some Intel processors partitions the CPUs >that share an L3 cache into two or more sets. This plays havoc with the >Resource Director Technology (RDT) monitoring features. Prior to this >patch Intel has advised that SNC and RDT are incompatible. > >Some of these CPU support an MSR that can partition the RMID counters in >the same way. This allows monitoring features to be used. With the caveat >that users must be aware that Linux may migrate tasks more frequently >between SNC nodes than between "regular" NUMA nodes, so reading counters >from all SNC nodes may be needed to get a complete picture of activity >for tasks. > >Cache and memory bandwidth allocation features continue to operate at >the scope of the L3 cache. > >Signed-off-by: Tony Luck <tony.luck@intel.com> > >--- >Changes since v15: Link: https://lore.kernel.org/all/20240228112935.8087-tony.luck@intel.com/ > >0) Note that v14 Reviewed/Testing tags have been removed because of the > extent of refactoring to catch up with upstream. But nothing > fundamental changed, so everything should look familiar. > >1) Refactor to apply on top of Link: https://lore.kernel.org/all/20240308213846.77075-1-tony.luck@intel.com/ > [So base commit is either tip x86/cache, or upstream current merge PLUS > the two patches in that series] I'm having problems with cleanly applying this series. Here are the steps I took: - I pulled this series with b4. - I pulled two patches from the "Pass domain to target CPU" series with b4. - Then I hard reseted my branch to tip x86/cache - Tried applying first the two patches and then this series. And here I'm getting conflicts on patch 0002 "Prepare to split rdt_domain structure". I also tried with tip master and applying James Morse's series before your patches but I got the same problem. Am I doing something wrong? I wanted to find what causes the conflict so I can give more details but three way merging doesn't seem to work and git am doesn't leave any conflict markers, just fails. -- Kind regards Maciej Wieczór-Retman
> I wanted to find what causes the conflict so I can give more details but three > way merging doesn't seem to work and git am doesn't leave any conflict markers, > just fails. Maybe there's something else touching resctrl in either TIP or now in upstream? In my local tree I built on upstream after Linus pulled TIP x86/cache (and a bunch or other stuff). So my base commit is: b29f377119f6 Merge tag 'x86-boot-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/ti Then the two patch cleanup: b0eebe2ee45d x86/resctrl: Simplify call convention for MSR update functions edf14f3a6093 x86/resctrl: Pass domain to target CPU Then this v16 SNC series. Reinette pointed out a couple of things to fix in v16, I'll handle those and post v17 next week (based on Linus v6.9-rc1). I'll fix up any additional merge issues while rebasing. -Tony
On 2024-03-21 at 16:51:51 +0000, Luck, Tony wrote: >> I wanted to find what causes the conflict so I can give more details but three >> way merging doesn't seem to work and git am doesn't leave any conflict markers, >> just fails. > >Maybe there's something else touching resctrl in either TIP or now in upstream? > >In my local tree I built on upstream after Linus pulled TIP x86/cache (and a bunch >or other stuff). So my base commit is: > >b29f377119f6 Merge tag 'x86-boot-2024-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/ti Thanks, I'll try using this one. > >Then the two patch cleanup: > >b0eebe2ee45d x86/resctrl: Simplify call convention for MSR update functions >edf14f3a6093 x86/resctrl: Pass domain to target CPU > >Then this v16 SNC series. > >Reinette pointed out a couple of things to fix in v16, I'll handle those and post >v17 next week (based on Linus v6.9-rc1). I'll fix up any additional merge issues >while rebasing. > >-Tony -- Kind regards Maciej Wieczór-Retman
© 2016 - 2026 Red Hat, Inc.