The ITMT infrastructure currently assumes that ITMT rankings are static
and set correctly prior to enabling ITMT support, which allows the CPU
with the highest core ranking to be cached as the "asym_prefer_cpu" in
the sched_group struct. However, with the introduction of Preferred Core
support in amd-pstate, these rankings can change at runtime.
This series adds support for dynamic rankings in the generic scheduler
layer without the need to rebuild the sched domain hierarchy, and fixes
an issue with x86_die_flags() on AMD systems that support Preferred Core
ranking, with some yak shaving done along the way.
Patches 1 to 4 are independent cleanups: tidying the ITMT
infrastructure, removing the x86_smt_flags wrapper, and moving the
"sched_itmt_enabled" sysctl to debugfs.
Patch 5 adds the SD_ASYM_PACKING flag to the PKG domain on all ITMT
enabled systems. The rationale behind the addition is elaborated in the
patch itself. One open question remains for Intel processors with
multiple tiles in a PKG, which advertise themselves as multiple LLCs in
a PKG and support ITMT: is it okay to set SD_ASYM_PACKING for the PKG
domain on these processors?
Patches 6 and 7 are independent, possible micro-optimizations discovered
when auditing update_sg_lb_stats().
Patch 8 uncaches the asym_prefer_cpu from the sched_group struct and
finds it during load balancing in update_sg_lb_stats() before it is used
to make any scheduling decisions. This is the simplest approach; an
alternate approach would be to move the asym_prefer_cpu to
sched_domain_shared and allow the first load balancing instance after a
priority change to update the cached asym_prefer_cpu. On systems with
static priorities, this would retain the benefits of caching, while on
systems with dynamic priorities, it would reduce the overhead of finding
"asym_prefer_cpu" each time update_sg_lb_stats() is called. However,
these benefits come with added code complexity, which is why Patch 8 is
marked as an RFC.
This series is based on
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
at commit 2a77e4be12cb ("sched/fair: Untangle NEXT_BUDDY and
pick_next_task()") and is a spiritual successor to a previous attempt by
Mario at fixing x86_die_flags() on Preferred Core enabled systems, which
can be found at
https://lore.kernel.org/lkml/20241203201129.31957-1-mario.limonciello@amd.com/
---
K Prateek Nayak (8):
x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
x86/itmt: Use guard() for itmt_update_mutex
x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
sched/fair: Do not compute NUMA Balancing stats unnecessarily during
lb
sched/fair: Do not compute overloaded status unnecessarily during lb
sched/fair: Uncache asym_prefer_cpu and find it during
update_sd_lb_stats()
arch/x86/include/asm/topology.h | 4 +-
arch/x86/kernel/itmt.c | 81 ++++++++++++++-------------------
arch/x86/kernel/smpboot.c | 19 +-------
kernel/sched/fair.c | 41 +++++++++++++----
kernel/sched/sched.h | 1 -
kernel/sched/topology.c | 15 +-----
6 files changed, 69 insertions(+), 92 deletions(-)
base-commit: 2a77e4be12cb58bbf774e7c717c8bb80e128b7a4
--
2.34.1
On Wed, 2024-12-11 at 18:55 +0000, K Prateek Nayak wrote:
> The ITMT infrastructure currently assumes ITMT rankings to be static and
> is set correctly prior to enabling ITMT support which allows the CPU
> with the highest core ranking to be cached as the "asym_prefer_cpu" in
> the sched_group struct. However, with the introduction of Preferred Core
> support in amd-pstate, these rankings can change at runtime.
>
> This series adds support for dynamic ranking in generic scheduler layer
> without the need to rebuild the sched domain hierarchy and fixes an
> issue with x86_die_flags() on AMD systems that support Preferred Core
> ranking with some yak shaving done along the way.
>
> Patch 1 to 4 are independent cleanup around ITMT infrastructure, removal
> of x86_smt_flags wrapper, and moving the "sched_itmt_enabled" sysctl to
> debugfs.
>
> Patch 5 adds the SD_ASYM_PACKING flag to the PKG domain on all ITMT
> enabled systems. The rationale behind the addition is elaborates in the
> same. One open question remains is for Intel processors with multiple
> Tiles in a PKG which advertises itself as multiple LLCs in a PKG and
> supports ITMT - is it okay to set SD_ASYM_PACKING for PKG domain on
> these processors?
After talking to my colleagues Ricardo and Srinivas, we think that this
should be fine for Intel CPUs.
Tim
>
> [..snip..]
Hello Tim,

On 12/13/2024 6:03 AM, Tim Chen wrote:
> On Wed, 2024-12-11 at 18:55 +0000, K Prateek Nayak wrote:
>> [..snip..]
>
> After talking to my colleagues Ricardo and Srinivas, we think that this
> should be fine for Intel CPUs.

Thank you for confirming that. Could you also confirm whether my
observations for Intel systems on Patch 5 covered all possible scenarios
for the ones that feature multiple MC groups within a PKG and enable
ITMT support? If I'm missing something, please do let me know and we can
hash out the implementation details.

Thanks a ton for reviewing the series!

--
Thanks and Regards,
Prateek

On Fri, 2024-12-13 at 09:42 +0530, K Prateek Nayak wrote:
> Hello Tim,
>
> [..snip..]
>
> Thank you for confirming that. Could you also confirm whether my
> observations for Intel systems on Patch 5 covered all possible
> scenarios for the ones that feature multiple MC groups within a PKG
> and enable ITMT support?

Your patch 5 implementation should be fine as far as we can tell.

Tim