The "sd_weight" used for calculating the load balancing interval, and
its limits, considers the span weight of the entire topology level
without accounting for cpuset partitions.

Compute the "sd_weight" after computing the "sd_span" considering the
cpu_map covered by the partition, and set the load balancing interval,
and its limits accordingly.

Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
Changelog rfc v2..v3:

o New patch.
---
 kernel/sched/topology.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index cf643a5ddedd..649674bb6c3c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1638,8 +1638,6 @@ sd_init(struct sched_domain_topology_level *tl,
 	int sd_id, sd_weight, sd_flags = 0;
 	struct cpumask *sd_span;
 
-	sd_weight = cpumask_weight(tl->mask(tl, cpu));
-
 	if (tl->sd_flags)
 		sd_flags = (*tl->sd_flags)();
 	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
@@ -1647,8 +1645,6 @@ sd_init(struct sched_domain_topology_level *tl,
 		sd_flags &= TOPOLOGY_SD_FLAGS;
 
 	*sd = (struct sched_domain){
-		.min_interval		= sd_weight,
-		.max_interval		= 2*sd_weight,
 		.busy_factor		= 16,
 		.imbalance_pct		= 117,
 
@@ -1668,7 +1664,6 @@ sd_init(struct sched_domain_topology_level *tl,
 					,
 
 		.last_balance		= jiffies,
-		.balance_interval	= sd_weight,
 
 		/* 50% success rate */
 		.newidle_call		= 512,
@@ -1685,6 +1680,11 @@ sd_init(struct sched_domain_topology_level *tl,
 	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
 	sd_id = cpumask_first(sd_span);
 
+	sd_weight = cpumask_weight(sd_span);
+	sd->min_interval = sd_weight;
+	sd->max_interval = 2 * sd_weight;
+	sd->balance_interval = sd_weight;
+
 	sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
 
 	WARN_ONCE((sd->flags & (SD_SHARE_CPUCAPACITY | SD_ASYM_CPUCAPACITY)) ==
--
2.34.1
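
For illustration only (not part of the patch): the following userspace
sketch shows how the two ways of computing "sd_weight" diverge once a
cpuset partition restricts cpu_map. The 64-bit masks, mask_weight(),
struct intervals, and both helper functions are hypothetical stand-ins
for the kernel's cpumask API and sched_domain fields, and the CPU layout
in the note below is an assumed example rather than a measured system.

```c
#include <stdint.h>

/* Hypothetical stand-in for cpumask_weight(): count the set bits. */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int w = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		w++;
	}
	return w;
}

/* Hypothetical container for the three fields the patch assigns. */
struct intervals {
	unsigned int min_interval;
	unsigned int max_interval;
	unsigned int balance_interval;
};

/* Same arithmetic as the patch: min = w, max = 2 * w, balance = w. */
static struct intervals intervals_from_weight(unsigned int sd_weight)
{
	struct intervals iv = {
		.min_interval = sd_weight,
		.max_interval = 2 * sd_weight,
		.balance_interval = sd_weight,
	};
	return iv;
}

/* Old behaviour: weight of the whole topology level mask. */
static struct intervals intervals_full_level(uint64_t tl_mask)
{
	return intervals_from_weight(mask_weight(tl_mask));
}

/* New behaviour: weight of the span restricted to the partition. */
static struct intervals intervals_partition(uint64_t tl_mask, uint64_t cpu_map)
{
	uint64_t sd_span = cpu_map & tl_mask;	/* mirrors cpumask_and() */

	return intervals_from_weight(mask_weight(sd_span));
}
```

Assuming a topology level that spans CPUs 0-15 (tl_mask = 0xFFFF) and a
partition that covers CPUs 0-3 (cpu_map = 0xF), intervals_full_level()
yields min/max/balance intervals of 16/32/16, whereas
intervals_partition() yields 4/8/4 for the same domain.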
On 20/01/26 11:32, K Prateek Nayak wrote:
> The "sd_weight" used for calculating the load balancing interval, and
> its limits, considers the span weight of the entire topology level
> without accounting for cpuset partitions.
>
> Compute the "sd_weight" after computing the "sd_span" considering the
> cpu_map covered by the partition, and set the load balancing interval,
> and its limits accordingly.
>
> Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
On 1/20/26 5:02 PM, K Prateek Nayak wrote:
> The "sd_weight" used for calculating the load balancing interval, and
> its limits, considers the span weight of the entire topology level
> without accounting for cpuset partitions.
>
Please add one example showing the wrong sd_weights
while having cpuset partitions. That would be helpful.
> Compute the "sd_weight" after computing the "sd_span" considering the
> cpu_map covered by the partition, and set the load balancing interval,
> and its limits accordingly.
>
> Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> Changelog rfc v2..v3:
>
> o New patch.
> ---
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Hello Shrikanth,
On 1/21/2026 9:12 PM, Shrikanth Hegde wrote:
>
>
> On 1/20/26 5:02 PM, K Prateek Nayak wrote:
>> The "sd_weight" used for calculating the load balancing interval, and
>> its limits, considers the span weight of the entire topology level
>> without accounting for cpuset partitions.
>>
>
> Please add one example showing the wrong sd_weights
> while having cpuset partitions. That would be helpful.
Ack! I'll update with an example in the next version.
>
>> Compute the "sd_weight" after computing the "sd_span" considering the
>> cpu_map covered by the partition, and set the load balancing interval,
>> and its limits accordingly.
>>
>> Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
>> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
>> ---
>> Changelog rfc v2..v3:
>>
>> o New patch.
>> ---
>
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Thanks a ton for the review.
--
Thanks and Regards,
Prateek
On 1/20/2026 7:32 PM, K Prateek Nayak wrote:
> The "sd_weight" used for calculating the load balancing interval, and
> its limits, considers the span weight of the entire topology level
> without accounting for cpuset partitions.
>
> Compute the "sd_weight" after computing the "sd_span" considering the
> cpu_map covered by the partition, and set the load balancing interval,
> and its limits accordingly.
>
> Fixes: cb83b629bae03 ("sched/numa: Rewrite the CONFIG_NUMA sched domain support")
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
This fixes the issue not only for the periodic load balancer but also
for nohz balance, because it gives a correct value for nr_busy_cpus.
So, from my understanding,
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
thanks,
Chenyu