[PATCH v2 09/23] sched/cache: Count tasks prefering destination LLC in a sched group

Tim Chen posted 23 patches 2 weeks, 1 day ago
There is a newer version of this series
Posted by Tim Chen 2 weeks, 1 day ago
During LLC load balancing, tabulate, for each sched group, the number of
tasks on its runqueues that prefer the LLC containing env->dst_cpu.

For example, consider a system with 4 LLC sched groups (LLC0 to LLC3)
balancing towards LLC3. LLC0 has 3 tasks preferring LLC3, LLC1 has
2, and LLC2 has 1. LLC0, having the most tasks preferring LLC3, is
selected as the busiest source to pick tasks from.

Within a source LLC, the total number of tasks preferring a destination
LLC is computed by summing counts across all CPUs in that LLC. For
instance, if LLC0 has CPU0 with 2 tasks and CPU1 with 1 task preferring
LLC3, the total for LLC0 is 3.

These statistics allow the load balancer to choose tasks from source
sched groups that best match their preferred LLCs.
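
The example above can be modelled in plain userspace C. This is only an
illustrative sketch, not kernel code: toy_rq, toy_group_pref_dst(),
toy_pick_busiest_llc() and toy_example() are hypothetical names, and the
fixed layout of 4 LLCs with 2 CPUs each is an assumption chosen to match
the numbers in the text.

```c
#include <assert.h>

#define NR_LLC		4	/* assumed: LLC0..LLC3, as in the example */
#define CPUS_PER_LLC	2	/* assumed: two CPUs per LLC */

/* Toy stand-in for a runqueue: only the per-LLC preference counts. */
struct toy_rq {
	unsigned int nr_pref_llc[NR_LLC];
};

/* Sum, over the CPUs of one source LLC, the tasks that prefer dst_llc. */
static unsigned int toy_group_pref_dst(const struct toy_rq *rqs, int dst_llc)
{
	unsigned int sum = 0;

	assert(dst_llc >= 0 && dst_llc < NR_LLC);
	for (int cpu = 0; cpu < CPUS_PER_LLC; cpu++)
		sum += rqs[cpu].nr_pref_llc[dst_llc];
	return sum;
}

/* Return the source LLC with the most tasks preferring dst_llc, or -1. */
static int toy_pick_busiest_llc(struct toy_rq rqs[NR_LLC][CPUS_PER_LLC],
				int dst_llc)
{
	int busiest = -1;
	unsigned int best = 0;

	for (int llc = 0; llc < NR_LLC; llc++) {
		unsigned int n;

		if (llc == dst_llc)	/* the destination is never a source */
			continue;
		n = toy_group_pref_dst(rqs[llc], dst_llc);
		if (n > best) {
			best = n;
			busiest = llc;
		}
	}
	return busiest;
}

/* The numbers from the text: LLC0 = 2 + 1, LLC1 = 2, LLC2 = 1, dst = LLC3. */
static int toy_example(void)
{
	struct toy_rq rqs[NR_LLC][CPUS_PER_LLC] = {{{{0}}}};

	rqs[0][0].nr_pref_llc[3] = 2;
	rqs[0][1].nr_pref_llc[3] = 1;
	rqs[1][0].nr_pref_llc[3] = 2;
	rqs[2][1].nr_pref_llc[3] = 1;
	return toy_pick_busiest_llc(rqs, 3);
}
```

With these inputs toy_example() selects LLC0, whose two CPUs together hold
3 tasks preferring LLC3.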

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---

Notes:
    v1->v2:
        Convert nr_pref_llc array in sg_lb_stats to a single
        variable as only the dst LLC stat is needed.
        (K Prateek Nayak)

 kernel/sched/fair.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b0e87616e377..4d7803f69a74 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10445,6 +10445,9 @@ struct sg_lb_stats {
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
 #endif
+#ifdef CONFIG_SCHED_CACHE
+	unsigned int nr_pref_llc;
+#endif
 };
 
 /*
@@ -10912,6 +10915,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 {
 	int i, nr_running, local_group, sd_flags = env->sd->flags;
 	bool balancing_at_rd = !env->sd->parent;
+#ifdef CONFIG_SCHED_CACHE
+	int dst_llc = llc_id(env->dst_cpu);
+#endif
 
 	memset(sgs, 0, sizeof(*sgs));
 
@@ -10932,6 +10938,12 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		if (cpu_overutilized(i))
 			*sg_overutilized = 1;
 
+#ifdef CONFIG_SCHED_CACHE
+		if (sched_cache_enabled() && llc_id(i) != dst_llc &&
+		    dst_llc >= 0)
+			sgs->nr_pref_llc += rq->nr_pref_llc[dst_llc];
+#endif
+
 		/*
 		 * No need to call idle_cpu() if nr_running is not 0
 		 */
-- 
2.32.0
Re: [PATCH v2 09/23] sched/cache: Count tasks prefering destination LLC in a sched group
Posted by Peter Zijlstra 1 week, 2 days ago
On Wed, Dec 03, 2025 at 03:07:28PM -0800, Tim Chen wrote:
> During LLC load balancing, tabulate the number of tasks on each runqueue
> that prefer the LLC contains the env->dst_cpu in a sched group.
> 
> For example, consider a system with 4 LLC sched groups (LLC0 to LLC3)
> balancing towards LLC3. LLC0 has 3 tasks preferring LLC3, LLC1 has
> 2, and LLC2 has 1. LLC0, having the most tasks preferring LLC3, is
> selected as the busiest source to pick tasks from.
> 
> Within a source LLC, the total number of tasks preferring a destination
> LLC is computed by summing counts across all CPUs in that LLC. For
> instance, if LLC0 has CPU0 with 2 tasks and CPU1 with 1 task preferring
> LLC3, the total for LLC0 is 3.
> 
> These statistics allow the load balancer to choose tasks from source
> sched groups that best match their preferred LLCs.
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
> 
> Notes:
>     v1->v2:
>         Convert nr_pref_llc array in sg_lb_stats to a single
>         variable as only the dst LLC stat is needed.
>         (K Prateek Nayak)
> 
>  kernel/sched/fair.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b0e87616e377..4d7803f69a74 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10445,6 +10445,9 @@ struct sg_lb_stats {
>  	unsigned int nr_numa_running;
>  	unsigned int nr_preferred_running;
>  #endif
> +#ifdef CONFIG_SCHED_CACHE
> +	unsigned int nr_pref_llc;
> +#endif

At this point I have to note that rq->nr_pref_llc seems like a horrible
misnomer, for it being an array, and not an actual number like the
naming suggests.
Re: [PATCH v2 09/23] sched/cache: Count tasks prefering destination LLC in a sched group
Posted by Chen, Yu C 1 week, 2 days ago
On 12/10/2025 9:52 PM, Peter Zijlstra wrote:
> On Wed, Dec 03, 2025 at 03:07:28PM -0800, Tim Chen wrote:
>> During LLC load balancing, tabulate the number of tasks on each runqueue
>> that prefer the LLC contains the env->dst_cpu in a sched group.
>>
>> For example, consider a system with 4 LLC sched groups (LLC0 to LLC3)
>> balancing towards LLC3. LLC0 has 3 tasks preferring LLC3, LLC1 has
>> 2, and LLC2 has 1. LLC0, having the most tasks preferring LLC3, is
>> selected as the busiest source to pick tasks from.
>>
>> Within a source LLC, the total number of tasks preferring a destination
>> LLC is computed by summing counts across all CPUs in that LLC. For
>> instance, if LLC0 has CPU0 with 2 tasks and CPU1 with 1 task preferring
>> LLC3, the total for LLC0 is 3.
>>
>> These statistics allow the load balancer to choose tasks from source
>> sched groups that best match their preferred LLCs.
>>
>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>> ---
>>
>> Notes:
>>      v1->v2:
>>          Convert nr_pref_llc array in sg_lb_stats to a single
>>          variable as only the dst LLC stat is needed.
>>          (K Prateek Nayak)
>>
>>   kernel/sched/fair.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index b0e87616e377..4d7803f69a74 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -10445,6 +10445,9 @@ struct sg_lb_stats {
>>   	unsigned int nr_numa_running;
>>   	unsigned int nr_preferred_running;
>>   #endif
>> +#ifdef CONFIG_SCHED_CACHE
>> +	unsigned int nr_pref_llc;
>> +#endif
> 
> At this point I have to note that rq->nr_pref_llc seems like a horrible
> misnomer, for it being an array, and not an actual number like the
> naming suggests.

In v2 it seems that rq->nr_pref_llc is not an array anymore; it indicates
the number of tasks that want to be migrated to env->dst_cpu (dst_llc),
because dst_llc is these tasks' preferred LLC. Maybe rename it to
rq->nr_pref_dst_llc?

thanks,
Chenyu
Re: [PATCH v2 09/23] sched/cache: Count tasks prefering destination LLC in a sched group
Posted by Peter Zijlstra 1 week, 2 days ago
On Wed, Dec 10, 2025 at 11:05:33PM +0900, Chen, Yu C wrote:
> On 12/10/2025 9:52 PM, Peter Zijlstra wrote:
> > On Wed, Dec 03, 2025 at 03:07:28PM -0800, Tim Chen wrote:
> > > During LLC load balancing, tabulate the number of tasks on each runqueue
> > > that prefer the LLC contains the env->dst_cpu in a sched group.
> > > 
> > > For example, consider a system with 4 LLC sched groups (LLC0 to LLC3)
> > > balancing towards LLC3. LLC0 has 3 tasks preferring LLC3, LLC1 has
> > > 2, and LLC2 has 1. LLC0, having the most tasks preferring LLC3, is
> > > selected as the busiest source to pick tasks from.
> > > 
> > > Within a source LLC, the total number of tasks preferring a destination
> > > LLC is computed by summing counts across all CPUs in that LLC. For
> > > instance, if LLC0 has CPU0 with 2 tasks and CPU1 with 1 task preferring
> > > LLC3, the total for LLC0 is 3.
> > > 
> > > These statistics allow the load balancer to choose tasks from source
> > > sched groups that best match their preferred LLCs.
> > > 
> > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > ---
> > > 
> > > Notes:
> > >      v1->v2:
> > >          Convert nr_pref_llc array in sg_lb_stats to a single
> > >          variable as only the dst LLC stat is needed.
> > >          (K Prateek Nayak)
> > > 
> > >   kernel/sched/fair.c | 12 ++++++++++++
> > >   1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index b0e87616e377..4d7803f69a74 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -10445,6 +10445,9 @@ struct sg_lb_stats {
> > >   	unsigned int nr_numa_running;
> > >   	unsigned int nr_preferred_running;
> > >   #endif
> > > +#ifdef CONFIG_SCHED_CACHE
> > > +	unsigned int nr_pref_llc;
> > > +#endif
> > 
> > At this point I have to note that rq->nr_pref_llc seems like a horrible
> > misnomer, for it being an array, and not an actual number like the
> > naming suggests.
> 
> In the v2 it seems that rq->nr_pref_llc is not an array anymore, it

From two patches ago:

+       unsigned int            *nr_pref_llc;

It's a pointer of some sort.


> indicates
> the number of tasks that want to be migrated to the env->dst_cpu (dst_llc),
> because
> these tasks' preferred LLC are env->dst_cpu(dst_llc). Maybe renaming it to
> rq->nr_pref_dst_llc?

Like I said in:

  https://lkml.kernel.org/r/20251210125114.GS3707891@noisy.programming.kicks-ass.net

it might make sense to put it in struct sched_domain instead of struct
rq, since then you can allocate and swap it right along with the rest of
the domain tree.
Re: [PATCH v2 09/23] sched/cache: Count tasks prefering destination LLC in a sched group
Posted by Chen, Yu C 1 week, 1 day ago
On 12/11/2025 12:16 AM, Peter Zijlstra wrote:
> On Wed, Dec 10, 2025 at 11:05:33PM +0900, Chen, Yu C wrote:
>> On 12/10/2025 9:52 PM, Peter Zijlstra wrote:
>>> On Wed, Dec 03, 2025 at 03:07:28PM -0800, Tim Chen wrote:
>>>> During LLC load balancing, tabulate the number of tasks on each runqueue
>>>> that prefer the LLC contains the env->dst_cpu in a sched group.
>>>>
>>>> For example, consider a system with 4 LLC sched groups (LLC0 to LLC3)
>>>> balancing towards LLC3. LLC0 has 3 tasks preferring LLC3, LLC1 has
>>>> 2, and LLC2 has 1. LLC0, having the most tasks preferring LLC3, is
>>>> selected as the busiest source to pick tasks from.
>>>>
>>>> Within a source LLC, the total number of tasks preferring a destination
>>>> LLC is computed by summing counts across all CPUs in that LLC. For
>>>> instance, if LLC0 has CPU0 with 2 tasks and CPU1 with 1 task preferring
>>>> LLC3, the total for LLC0 is 3.
>>>>
>>>> These statistics allow the load balancer to choose tasks from source
>>>> sched groups that best match their preferred LLCs.
>>>>
>>>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>>>> ---
>>>>
>>>> Notes:
>>>>       v1->v2:
>>>>           Convert nr_pref_llc array in sg_lb_stats to a single
>>>>           variable as only the dst LLC stat is needed.
>>>>           (K Prateek Nayak)
>>>>
>>>>    kernel/sched/fair.c | 12 ++++++++++++
>>>>    1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index b0e87616e377..4d7803f69a74 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -10445,6 +10445,9 @@ struct sg_lb_stats {
>>>>    	unsigned int nr_numa_running;
>>>>    	unsigned int nr_preferred_running;
>>>>    #endif
>>>> +#ifdef CONFIG_SCHED_CACHE
>>>> +	unsigned int nr_pref_llc;
>>>> +#endif
>>>
>>> At this point I have to note that rq->nr_pref_llc seems like a horrible
>>> misnomer, for it being an array, and not an actual number like the
>>> naming suggests.
>>
>> In the v2 it seems that rq->nr_pref_llc is not an array anymore, it
> 
>  From two patches ago:
> 
> +       unsigned int            *nr_pref_llc;
> 
> Its a pointer of some sort.
> 
> 

Ah I see, I thought it was the variable in the sgs structure.


>> indicates
>> the number of tasks that want to be migrated to the env->dst_cpu (dst_llc),
>> because
>> these tasks' preferred LLC are env->dst_cpu(dst_llc). Maybe renaming it to
>> rq->nr_pref_dst_llc?
> 
> Like I said in:
> 
>    https://lkml.kernel.org/r/20251210125114.GS3707891@noisy.programming.kicks-ass.net
> 
> it might make sense to put it in struct sched_domain instead of struct
> rq, since then you can allocate and swap it right along with the rest of
> the domain tree.

I'll think more about this. Currently the per-CPU rq's nr_pref_llc is used
to identify the "busiest" runqueue. The busiest runqueue has the most threads
that want to be migrated to llc_id(env->dst_cpu), because that is the threads'
preferred LLC. This way, the success ratio of migrations to the preferred LLC
should be higher without breaking the imbalance too much, IMHO. So we might
have to track the per-CPU rq's statistics during enqueue/dequeue. If we put
it in the domain, I'm not sure how to track that.
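
To make the per-runqueue tracking concrete, here is a minimal userspace
sketch. It is not taken from the series: toy_task, toy_rq, toy_enqueue(),
toy_dequeue() and toy_busiest_rq() are hypothetical names, and using -1 to
mean "no preferred LLC" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define NR_LLC 4	/* assumed number of LLCs */

struct toy_task {
	int preferred_llc;	/* assumed: -1 means no preference */
};

struct toy_rq {
	unsigned int nr_pref_llc[NR_LLC];	/* one counter per LLC */
};

/* Account a task's preference when it is queued on this runqueue. */
static void toy_enqueue(struct toy_rq *rq, const struct toy_task *p)
{
	if (p->preferred_llc >= 0 && p->preferred_llc < NR_LLC)
		rq->nr_pref_llc[p->preferred_llc]++;
}

/* Undo the accounting when the task leaves this runqueue. */
static void toy_dequeue(struct toy_rq *rq, const struct toy_task *p)
{
	if (p->preferred_llc >= 0 && p->preferred_llc < NR_LLC) {
		assert(rq->nr_pref_llc[p->preferred_llc] > 0);
		rq->nr_pref_llc[p->preferred_llc]--;
	}
}

/* Index of the runqueue with the most tasks preferring dst_llc, or -1. */
static int toy_busiest_rq(const struct toy_rq *rqs, size_t nr_rqs, int dst_llc)
{
	int busiest = -1;
	unsigned int best = 0;

	assert(dst_llc >= 0 && dst_llc < NR_LLC);
	for (size_t i = 0; i < nr_rqs; i++) {
		if (rqs[i].nr_pref_llc[dst_llc] > best) {
			best = rqs[i].nr_pref_llc[dst_llc];
			busiest = (int)i;
		}
	}
	return busiest;
}
```

Because the counters change only at enqueue and dequeue, reading them during
load balancing is a plain array lookup per runqueue.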

Thanks,
Chenyu
Re: [PATCH v2 09/23] sched/cache: Count tasks prefering destination LLC in a sched group
Posted by Tim Chen 1 week, 2 days ago
On Wed, 2025-12-10 at 16:16 +0100, Peter Zijlstra wrote:
> On Wed, Dec 10, 2025 at 11:05:33PM +0900, Chen, Yu C wrote:
> > On 12/10/2025 9:52 PM, Peter Zijlstra wrote:
> > > On Wed, Dec 03, 2025 at 03:07:28PM -0800, Tim Chen wrote:
> > > > During LLC load balancing, tabulate the number of tasks on each runqueue
> > > > that prefer the LLC contains the env->dst_cpu in a sched group.
> > > > 
> > > > For example, consider a system with 4 LLC sched groups (LLC0 to LLC3)
> > > > balancing towards LLC3. LLC0 has 3 tasks preferring LLC3, LLC1 has
> > > > 2, and LLC2 has 1. LLC0, having the most tasks preferring LLC3, is
> > > > selected as the busiest source to pick tasks from.
> > > > 
> > > > Within a source LLC, the total number of tasks preferring a destination
> > > > LLC is computed by summing counts across all CPUs in that LLC. For
> > > > instance, if LLC0 has CPU0 with 2 tasks and CPU1 with 1 task preferring
> > > > LLC3, the total for LLC0 is 3.
> > > > 
> > > > These statistics allow the load balancer to choose tasks from source
> > > > sched groups that best match their preferred LLCs.
> > > > 
> > > > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > > > ---
> > > > 
> > > > Notes:
> > > >      v1->v2:
> > > >          Convert nr_pref_llc array in sg_lb_stats to a single
> > > >          variable as only the dst LLC stat is needed.
> > > >          (K Prateek Nayak)
> > > > 
> > > >   kernel/sched/fair.c | 12 ++++++++++++
> > > >   1 file changed, 12 insertions(+)
> > > > 
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index b0e87616e377..4d7803f69a74 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -10445,6 +10445,9 @@ struct sg_lb_stats {
> > > >   	unsigned int nr_numa_running;
> > > >   	unsigned int nr_preferred_running;
> > > >   #endif
> > > > +#ifdef CONFIG_SCHED_CACHE
> > > > +	unsigned int nr_pref_llc;
> > > > +#endif
> > > 
> > > At this point I have to note that rq->nr_pref_llc seems like a horrible
> > > misnomer, for it being an array, and not an actual number like the
> > > naming suggests.
> > 
> > In the v2 it seems that rq->nr_pref_llc is not an array anymore, it
> 
> From two patches ago:
> 
> +       unsigned int            *nr_pref_llc;
> 
> Its a pointer of some sort.


Perhaps I should use a different name here when I update this patch
for v2.

rq->nr_pref_llc[] is an array, as it records the number of tasks preferring
each LLC. However, sgs->nr_pref_llc is a single number representing the number
of tasks in the current sched group that prefer the destination LLC.

Sorry for using the same name, which may have created this confusion.

> 
> 
> > indicates
> > the number of tasks that want to be migrated to the env->dst_cpu (dst_llc),
> > because
> > these tasks' preferred LLC are env->dst_cpu(dst_llc). Maybe renaming it to
> > rq->nr_pref_dst_llc?
> 
> Like I said in:
> 
>   https://lkml.kernel.org/r/20251210125114.GS3707891@noisy.programming.kicks-ass.net
> 
> it might make sense to put it in struct sched_domain instead of struct
> rq, since then you can allocate and swap it right along with the rest of
> the domain tree.

Sent a separate reply to that comment to clarify why I think we need nr_pref_llc[] per
run queue.

Tim