Use the "sd_llc" passed to select_idle_cpu() to obtain the
"sd_llc_shared" instead of dereferencing the per-CPU variable.
Since "sd->shared" is always reclaimed at the same time as "sd" via
call_rcu() and update_top_cache_domain() always ensures a valid
"sd->shared" assignment when "sd_llc" is present, "sd_llc->shared" can
always be dereferenced without needing an additional check.
While at it move the cpumask_and() operation after the SIS_UTIL bailout
check to avoid unnecessarily computing the cpumask.
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
Changelog rfc v2..v3:
o No changes. to the diff. Added more details on directly dereferencing
"sd->shared" without a NULL check in the commit message.
---
kernel/sched/fair.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c308c0700a7f..b4ae9444d32f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7629,21 +7629,18 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
{
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
int i, cpu, idle_cpu = -1, nr = INT_MAX;
- struct sched_domain_shared *sd_share;
-
- cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
if (sched_feat(SIS_UTIL)) {
- sd_share = rcu_dereference_all(per_cpu(sd_llc_shared, target));
- if (sd_share) {
- /* because !--nr is the condition to stop scan */
- nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
- /* overloaded LLC is unlikely to have idle cpu/core */
- if (nr == 1)
- return -1;
- }
+ /* because !--nr is the condition to stop scan */
+ nr = READ_ONCE(sd->shared->nr_idle_scan) + 1;
+ /* overloaded LLC is unlikely to have idle cpu/core */
+ if (nr == 1)
+ return -1;
}
+ if (!cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr))
+ return -1;
+
if (static_branch_unlikely(&sched_cluster_active)) {
struct sched_group *sg = sd->groups;
--
2.34.1
On 1/20/26 5:02 PM, K Prateek Nayak wrote:
> Use the "sd_llc" passed to select_idle_cpu() to obtain the
> "sd_llc_shared" instead of dereferencing the per-CPU variable.
>
> Since "sd->shared" is always reclaimed at the same time as "sd" via
> call_rcu() and update_top_cache_domain() always ensures a valid
> "sd->shared" assignment when "sd_llc" is present, "sd_llc->shared" can
> always be dereferenced without needing an additional check.
>
> While at it move the cpumask_and() operation after the SIS_UTIL bailout
> check to avoid unnecessarily computing the cpumask.
>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> Changelog rfc v2..v3:
>
> o No changes. to the diff. Added more details on directly dereferencing
> "sd->shared" without a NULL check in the commit message.
> ---
> kernel/sched/fair.c | 19 ++++++++-----------
> 1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c308c0700a7f..b4ae9444d32f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7629,21 +7629,18 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> {
> struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
> int i, cpu, idle_cpu = -1, nr = INT_MAX;
> - struct sched_domain_shared *sd_share;
> -
> - cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>
> if (sched_feat(SIS_UTIL)) {
> - sd_share = rcu_dereference_all(per_cpu(sd_llc_shared, target));
> - if (sd_share) {
> - /* because !--nr is the condition to stop scan */
> - nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
> - /* overloaded LLC is unlikely to have idle cpu/core */
> - if (nr == 1)
> - return -1;
> - }
> + /* because !--nr is the condition to stop scan */
> + nr = READ_ONCE(sd->shared->nr_idle_scan) + 1;
> + /* overloaded LLC is unlikely to have idle cpu/core */
> + if (nr == 1)
> + return -1;
> }
I stared at sd->shared->nr_idle_scan for a while to see why it is safe
even when lets say there is no LLC domain.
It is because it is sd_llc here. Not any other domains. and
there is sd_llc check before calling select_idle_cpu.
So maybe add a comment here, saying null check for sd_llc is already there
and that's why it is safe to call it directly.
>
> + if (!cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr))
> + return -1;
> +
> if (static_branch_unlikely(&sched_cluster_active)) {
> struct sched_group *sg = sd->groups;
>
While reading this series, this reminded me we had discussed about unifying
sd_llc->shared and sd_llc_shared thing into one (in v1 or v2).
is that dropped or you plan to fix it after this series?
Other than minor comments and nits series looks good to me.
So, for the series.
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Hello Shrikanth,
On 1/23/2026 11:36 AM, Shrikanth Hegde wrote:
>> + /* because !--nr is the condition to stop scan */
>> + nr = READ_ONCE(sd->shared->nr_idle_scan) + 1;
>> + /* overloaded LLC is unlikely to have idle cpu/core */
>> + if (nr == 1)
>> + return -1;
>> }
>
>
> I stared at sd->shared->nr_idle_scan for a while to see why it is safe
> even when lets say there is no LLC domain.
>
> It is because it is sd_llc here. Not any other domains. and
> there is sd_llc check before calling select_idle_cpu.
Ack! We come here with a valid "sd_llc" from select_idle_sibling()
and "sd" and "sd->shared" are freed at the same time via call_rcu() when
the last reference is dropped so having a reference to "sd" guarantees
"sd->shared" is not freed and the topology bits will ensure
"sd_llc->shared" is always present (or it screams and we crash here).
>
> So maybe add a comment here, saying null check for sd_llc is already there
> and that's why it is safe to call it directly.
>
>> + if (!cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr))
>> + return -1;
>> +
>> if (static_branch_unlikely(&sched_cluster_active)) {
>> struct sched_group *sg = sd->groups;
>>
>
> While reading this series, this reminded me we had discussed about unifying
> sd_llc->shared and sd_llc_shared thing into one (in v1 or v2).
> is that dropped or you plan to fix it after this series?
Must have slipped out of my mind! I believe the only other user of
"sd_llc_shared" directly would then be nohz_balancer_kick() and
{test,set}_idle_cores().
Out of those, I would only consider set_idle_core() from wakeup to
be a fast-path but we'll already have a "sd_llc" reference there
so we should be able to flip the idle_cores indicator without
needing an extra dereference.
We can only keep per-CPU "sd_llc" and remove "sd_llc_shared". I
hope that is what you were suggesting. Otherwise please let me
know if I misinterpreted the question.
>
>
> Other than minor comments and nits series looks good to me.
> So, for the series.
>
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Thank you for reviewing the series.
--
Thanks and Regards,
Prateek
On 1/23/26 11:57 AM, K Prateek Nayak wrote:
> Hello Shrikanth,
>
> On 1/23/2026 11:36 AM, Shrikanth Hegde wrote:
>>> + /* because !--nr is the condition to stop scan */
>>> + nr = READ_ONCE(sd->shared->nr_idle_scan) + 1;
>>> + /* overloaded LLC is unlikely to have idle cpu/core */
>>> + if (nr == 1)
>>> + return -1;
>>> }
>>
>>
>> I stared at sd->shared->nr_idle_scan for a while to see why it is safe
>> even when lets say there is no LLC domain.
>>
>> It is because it is sd_llc here. Not any other domains. and
>> there is sd_llc check before calling select_idle_cpu.
>
> Ack! We come here with a valid "sd_llc" from select_idle_sibling()
> and "sd" and "sd->shared" are freed at the same time via call_rcu() when
> the last reference is dropped so having a reference to "sd" guarantees
> "sd->shared" is not freed and the topology bits will ensure
> "sd_llc->shared" is always present (or it screams and we crash here).
>
>>
>> So maybe add a comment here, saying null check for sd_llc is already there
>> and that's why it is safe to call it directly.
>>
>>> + if (!cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr))
>>> + return -1;
>>> +
>>> if (static_branch_unlikely(&sched_cluster_active)) {
>>> struct sched_group *sg = sd->groups;
>>>
>>
>> While reading this series, this reminded me we had discussed about unifying
>> sd_llc->shared and sd_llc_shared thing into one (in v1 or v2).
>> is that dropped or you plan to fix it after this series?
>
> Must have slipped out of my mind! I believe the only other user of
> "sd_llc_shared" directly would then be nohz_balancer_kick() and
> {test,set}_idle_cores().
>
> Out of those, I would only consider set_idle_core() from wakeup to
> be a fast-path but we'll already have a "sd_llc" reference there
> so we should be able to flip the idle_cores indicator without
> needing an extra dereference.
>
> We can only keep per-CPU "sd_llc" and remove "sd_llc_shared". I
> hope that is what you were suggesting. Otherwise please let me
> know if I misinterpreted the question.
>
you got it right. keep sd_llc only.
>>
>>
>> Other than minor comments and nits series looks good to me.
>> So, for the series.
>>
>> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>
> Thank you for reviewing the series.
>
© 2016 - 2026 Red Hat, Inc.