[PATCH v3 0/8] sched/topology: Optimize sd->shared allocation

K Prateek Nayak posted 8 patches 2 weeks, 3 days ago
include/linux/sched/topology.h |   1 -
kernel/sched/fair.c            |  62 +++++++-----------
kernel/sched/sched.h           |   2 +-
kernel/sched/topology.c        | 111 ++++++++++++++++++++++-----------
4 files changed, 101 insertions(+), 75 deletions(-)
As discussed at LPC'25, allocating per-CPU "sched_domain_shared" objects
for every topology level is unnecessary: the scheduler only ever uses
"sd_llc_shared", and the rest are either reclaimed during __sdt_free()
or remain allocated without any purpose.

Folks are already optimizing away unnecessary sched domain allocations:
commit f79c9aa446d6 ("x86/smpboot: avoid SMT domain attach/destroy
if SMT is not enabled") removes the SMT level entirely on the x86 side
when it is known that the domain will be degenerated anyway by the
scheduler.

This series goes one step further with the "sched_domain_shared"
allocations by moving them out of "sd_data", which is allocated for
every topology level, and into "s_data", which is allocated once per
partition.
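
To make the saving concrete, here is a minimal userspace sketch of the
allocation counts before and after the move. It is not kernel code: the
function names are hypothetical, and it assumes one
"sched_domain_shared" object per CPU per topology level in the old
scheme and one per CPU per partition in the new one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: counts "sched_domain_shared"
 * objects under the two allocation schemes described in this series.
 */

/* Old scheme: sd_data holds one object per CPU for every topology level. */
size_t shared_allocs_per_level(size_t nr_cpus, size_t nr_levels)
{
	assert(nr_levels > 0);	/* a partition has at least one level */
	return nr_cpus * nr_levels;
}

/* New scheme: s_data holds one object per CPU for the whole partition. */
size_t shared_allocs_per_partition(size_t nr_cpus)
{
	return nr_cpus;
}
```

For a partition of 256 CPUs with 4 topology levels, the model gives
1024 objects in the old scheme against 256 in the new one. The real
numbers depend on the machine's topology and on which levels survive
degeneration.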

"sd->shared" is only allocated for the topmost SD_SHARE_LLC domain and
the topology layer uses the sched domain degeneration path to pass the
reference to the final "sd_llc" domain. Since degeneration of a parent
ensures a 1:1 mapping between its span and that of its child, and since
SD_SHARE_LLC domains never overlap, degeneration of an SD_SHARE_LLC
domain means either that its span is the same as that of its child or
that it contains only a single CPU, making it redundant.
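
The hand-off described above can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the code from the
patches: "struct domain_model", "struct shared_model", MODEL_SHARE_LLC
and model_degenerate_parent() are all hypothetical stand-ins, and the
real degeneration path does considerably more work.

```c
#include <stddef.h>

/* Hypothetical stand-ins for sched_domain_shared and sched_domain. */
struct shared_model {
	int nr_busy_cpus;
};

struct domain_model {
	struct domain_model *parent;
	struct domain_model *child;
	struct shared_model *shared;
	unsigned int flags;
};

#define MODEL_SHARE_LLC 0x1u	/* stand-in for SD_SHARE_LLC */

/*
 * Unlink the degenerate parent of @sd. If both domains share the LLC
 * and the parent owns the "shared" reference, pass it down so that the
 * surviving domain ends up holding it.
 */
struct domain_model *model_degenerate_parent(struct domain_model *sd)
{
	struct domain_model *parent = sd->parent;

	if (!parent)
		return sd;

	if ((parent->flags & MODEL_SHARE_LLC) &&
	    (sd->flags & MODEL_SHARE_LLC) && parent->shared) {
		sd->shared = parent->shared;
		parent->shared = NULL;
	}

	/* Splice the parent out of the hierarchy. */
	sd->parent = parent->parent;
	if (parent->parent)
		parent->parent->child = sd;

	return sd;
}
```

In this model the reference is attached to the topmost LLC domain and
only moves when that domain is removed, so the lowest surviving LLC
domain is the one that ends up with a valid "shared" pointer.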

Since the topology layer also checks for the existence of a valid
"sd->shared" when "sd_llc" is present, the handling of "sd_llc_shared"
can also be simplified when a reference to "sd_llc" is already present
in the scope (Patch 7 and Patch 8).

Patches are based on top of:

  git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core

at commit 5d86d542f68f ("sched/fair: Remove nohz.nr_cpus and use weight
of cpumask instead")
---
Changelog rfc v2..v3:

o Broke off the "sd->shared" assignment optimization into a separate
  series for easier review.

o Spotted a case of incorrect calculation of load balancing periods
  in presence of cpuset partitions (Patch 1).

o Broke off the single "sd->shared" assignment optimization patch into
  3 parts for easier review (Patch 2 - Patch 4). The "Reviewed-by:" tag
  from Gautham was dropped as a result.

o Building on recent effort from Peter to remove the superfluous usage
  of rcu_read_lock() in !preemptible() regions, Patch 5 and Patch 6
  clean up the fair task's wakeup path before adding more cleanups in
  Patch 7 and Patch 8.

o Dropped the RFC tag.

v2: https://lore.kernel.org/lkml/20251208083602.31898-1-kprateek.nayak@amd.com/
---
K Prateek Nayak (8):
  sched/topology: Compute sd_weight considering cpuset partitions
  sched/topology: Allocate per-CPU sched_domain_shared in s_data
  sched/topology: Switch to assigning "sd->shared" from s_data
  sched/topology: Remove sched_domain_shared allocation with sd_data
  sched/core: Check for rcu_read_lock_any_held() in idle_get_state()
  sched/fair: Remove superfluous rcu_read_lock() in the wakeup path
  sched/fair: Simplify the entry condition for update_idle_cpu_scan()
  sched/fair: Simplify SIS_UTIL handling in select_idle_cpu()

 include/linux/sched/topology.h |   1 -
 kernel/sched/fair.c            |  62 +++++++-----------
 kernel/sched/sched.h           |   2 +-
 kernel/sched/topology.c        | 111 ++++++++++++++++++++++-----------
 4 files changed, 101 insertions(+), 75 deletions(-)


base-commit: 5d86d542f68fda7ef6d543ac631b741db734101a
-- 
2.34.1
Re: [PATCH v3 0/8] sched/topology: Optimize sd->shared allocation
Posted by Peter Zijlstra 2 weeks, 2 days ago
On Tue, Jan 20, 2026 at 11:32:38AM +0000, K Prateek Nayak wrote:

> "sd->shared" is only allocated for the topmost SD_SHARE_LLC domain and
> the topology layer uses the sched domain degeneration path to pass the
> reference to the final "sd_llc" domain. 

I'm fairly sure we've had patches that introduced it for other levels at
various times, but clearly none of those ever made it.

Anyway, a quick peek seems to suggest it is still easy to extend.


>  include/linux/sched/topology.h |   1 -
>  kernel/sched/fair.c            |  62 +++++++-----------
>  kernel/sched/sched.h           |   2 +-
>  kernel/sched/topology.c        | 111 ++++++++++++++++++++++-----------
>  4 files changed, 101 insertions(+), 75 deletions(-)

Is this really worth the extra lines though?
Re: [PATCH v3 0/8] sched/topology: Optimize sd->shared allocation
Posted by K Prateek Nayak 2 weeks, 2 days ago
Hello Peter,

On 1/21/2026 9:46 PM, Peter Zijlstra wrote:
> On Tue, Jan 20, 2026 at 11:32:38AM +0000, K Prateek Nayak wrote:
> 
>> "sd->shared" is only allocated for the topmost SD_SHARE_LLC domain and
>> the topology layer uses the sched domain degeneration path to pass the
>> reference to the final "sd_llc" domain. 
> 
> I'm fairly sure we've had patches that introduced it for other levels at
> various times, but clearly none of those ever made it.
> 
> Anyway, a quick peek seems to suggest it is still easy to extend.
> 
> 
>>  include/linux/sched/topology.h |   1 -
>>  kernel/sched/fair.c            |  62 +++++++-----------
>>  kernel/sched/sched.h           |   2 +-
>>  kernel/sched/topology.c        | 111 ++++++++++++++++++++++-----------
>>  4 files changed, 101 insertions(+), 75 deletions(-)
> 
> Is this really worth the extra lines though?

The larger plan was to move the "nohz.idle_cpus" tracking into the
sched_domain_shared instance which will bloat these allocations.

Instead of (#CPUs x #topology_levels) surplus, most of which will get
reclaimed at the end anyways, we'll only have #CPUs worth of
allocations now.

-- 
Thanks and Regards,
Prateek
Re: [PATCH v3 0/8] sched/topology: Optimize sd->shared allocation
Posted by Peter Zijlstra 2 weeks ago
On Thu, Jan 22, 2026 at 08:26:29AM +0530, K Prateek Nayak wrote:
> Hello Peter,
> 
> On 1/21/2026 9:46 PM, Peter Zijlstra wrote:
> > On Tue, Jan 20, 2026 at 11:32:38AM +0000, K Prateek Nayak wrote:
> > 
> >> "sd->shared" is only allocated for the topmost SD_SHARE_LLC domain and
> >> the topology layer uses the sched domain degeneration path to pass the
> >> reference to the final "sd_llc" domain. 
> > 
> > I'm fairly sure we've had patches that introduced it for other levels at
> > various times, but clearly none of those ever made it.
> > 
> > Anyway, a quick peek seems to suggest it is still easy to extend.
> > 
> > 
> >>  include/linux/sched/topology.h |   1 -
> >>  kernel/sched/fair.c            |  62 +++++++-----------
> >>  kernel/sched/sched.h           |   2 +-
> >>  kernel/sched/topology.c        | 111 ++++++++++++++++++++++-----------
> >>  4 files changed, 101 insertions(+), 75 deletions(-)
> > 
> > Is this really worth the extra lines though?
> 
> The larger plan was to move the "nohz.idle_cpus" tracking into the
> sched_domain_shared instance which will bloat these allocations.
> 
> Instead of (#CPUs x #topology_levels) surplus, most of which will get
> reclaimed at the end anyways, we'll only have #CPUs worth of
> allocations now.

Fair enough I suppose. Be sure to call this out as the primary reason
for doing this.