[v2] Introduce SIS_CACHE to choose previous CPU during task wakeup

[PATCH v2 2/3] sched/fair: Calculate the cache-hot time of the idle CPU

Posted by Chen Yu 2 years, 2 months ago

When a CPU is about to become idle due to task dequeue, use
the dequeued task's average sleep time to set the cache-hot
timeout of this idle CPU. SIS can then use this information
to skip the cache-hot idle CPU and scan for the next
cache-cold one. When that task is woken up again, it can choose
its previous CPU and reuse its hot cache.

This is a preparation for the next patch to introduce SIS_CACHE
based task wakeup.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 kernel/sched/fair.c     | 30 +++++++++++++++++++++++++++++-
 kernel/sched/features.h |  1 +
 kernel/sched/sched.h    |  1 +
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 672616503e35..c309b3d203c0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6853,8 +6853,17 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	util_est_update(&rq->cfs, p, task_sleep);
 
 	if (task_sleep) {
-		p->last_dequeue_time = sched_clock_cpu(cpu_of(rq));
+		u64 now = sched_clock_cpu(cpu_of(rq));
+
+		p->last_dequeue_time = now;
 		p->last_dequeue_cpu = cpu_of(rq);
+
+#ifdef CONFIG_SMP
+		/* this rq becomes idle, update its cache hot timeout */
+		if (sched_feat(SIS_CACHE) && !rq->nr_running &&
+		    p->avg_hot_dur)
+			rq->cache_hot_timeout = max(rq->cache_hot_timeout, now + p->avg_hot_dur);
+#endif
 	} else {
 		/* 0 indicates the dequeue is not caused by sleep */
 		p->last_dequeue_time = 0;
@@ -7347,6 +7356,25 @@ static inline int select_idle_smt(struct task_struct *p, int target)
 
 #endif /* CONFIG_SCHED_SMT */
 
+/*
+ * Return true if the idle CPU is cache-hot for someone,
+ * return false otherwise.
+ */
+static __maybe_unused bool cache_hot_cpu(int cpu, int *hot_cpu)
+{
+	if (!sched_feat(SIS_CACHE))
+		return false;
+
+	if (sched_clock_cpu(cpu) >= cpu_rq(cpu)->cache_hot_timeout)
+		return false;
+
+	/* record the first cache hot idle cpu as the backup */
+	if (*hot_cpu == -1)
+		*hot_cpu = cpu;
+
+	return true;
+}
+
 /*
  * Scan the LLC domain for idle CPUs; this is dynamically regulated by
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index a3ddf84de430..0af282712cd1 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -50,6 +50,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
  * When doing wakeups, attempt to limit superfluous scans of the LLC domain.
  */
 SCHED_FEAT(SIS_UTIL, true)
+SCHED_FEAT(SIS_CACHE, true)
 
 /*
  * Issue a WARN when we do multiple update_rq_clock() calls
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e58a54bda77d..191ed62ef06d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1083,6 +1083,7 @@ struct rq {
 #endif
 	u64			idle_stamp;
 	u64			avg_idle;
+	u64			cache_hot_timeout;
 
 	/* This is used to determine avg_idle's max value */
 	u64			max_idle_balance_cost;
-- 
2.25.1

Re: [PATCH v2 2/3] sched/fair: Calculate the cache-hot time of the idle CPU

Posted by Madadi Vineeth Reddy 2 years, 2 months ago

Hi Chen Yu,

On 21/11/23 13:09, Chen Yu wrote:
> When a CPU is about to become idle due to task dequeue, uses
> the dequeued task's average sleep time to set the cache
> hot timeout of this idle CPU. This information can facilitate
> SIS to skip the cache-hot idle CPU and scan for the next
> cache-cold one. When that task is woken up again, it can choose
> its previous CPU and reuses its hot-cache.
> 
> This is a preparation for the next patch to introduce SIS_CACHE
> based task wakeup.
> 
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
>  kernel/sched/fair.c     | 30 +++++++++++++++++++++++++++++-
>  kernel/sched/features.h |  1 +
>  kernel/sched/sched.h    |  1 +
>  3 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 672616503e35..c309b3d203c0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6853,8 +6853,17 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	util_est_update(&rq->cfs, p, task_sleep);
>  
>  	if (task_sleep) {
> -		p->last_dequeue_time = sched_clock_cpu(cpu_of(rq));
> +		u64 now = sched_clock_cpu(cpu_of(rq));
> +
> +		p->last_dequeue_time = now;
>  		p->last_dequeue_cpu = cpu_of(rq);
> +
> +#ifdef CONFIG_SMP
> +		/* this rq becomes idle, update its cache hot timeout */
> +		if (sched_feat(SIS_CACHE) && !rq->nr_running &&
> +		    p->avg_hot_dur)
> +			rq->cache_hot_timeout = max(rq->cache_hot_timeout, now + p->avg_hot_dur);

As per the discussion on the RFC patch, you mentioned that SIS_CACHE only honors the average sleep time
of the latest dequeued task and that we don't know how much of the cache is polluted by the latest task.

So I was wondering what made you use max() here.

Thanks and Regards 
Madadi Vineeth Reddy

Re: [PATCH v2 2/3] sched/fair: Calculate the cache-hot time of the idle CPU

Posted by Chen Yu 2 years, 2 months ago

Hi Madadi,

On 2023-11-25 at 12:40:18 +0530, Madadi Vineeth Reddy wrote:
> Hi Chen Yu,
> 
> On 21/11/23 13:09, Chen Yu wrote:
> > When a CPU is about to become idle due to task dequeue, uses
> > the dequeued task's average sleep time to set the cache
> > hot timeout of this idle CPU. This information can facilitate
> > SIS to skip the cache-hot idle CPU and scan for the next
> > cache-cold one. When that task is woken up again, it can choose
> > its previous CPU and reuses its hot-cache.
> > 
> > This is a preparation for the next patch to introduce SIS_CACHE
> > based task wakeup.
> > 
> > Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> > ---
> >  kernel/sched/fair.c     | 30 +++++++++++++++++++++++++++++-
> >  kernel/sched/features.h |  1 +
> >  kernel/sched/sched.h    |  1 +
> >  3 files changed, 31 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 672616503e35..c309b3d203c0 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6853,8 +6853,17 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >  	util_est_update(&rq->cfs, p, task_sleep);
> >  
> >  	if (task_sleep) {
> > -		p->last_dequeue_time = sched_clock_cpu(cpu_of(rq));
> > +		u64 now = sched_clock_cpu(cpu_of(rq));
> > +
> > +		p->last_dequeue_time = now;
> >  		p->last_dequeue_cpu = cpu_of(rq);
> > +
> > +#ifdef CONFIG_SMP
> > +		/* this rq becomes idle, update its cache hot timeout */
> > +		if (sched_feat(SIS_CACHE) && !rq->nr_running &&
> > +		    p->avg_hot_dur)
> > +			rq->cache_hot_timeout = max(rq->cache_hot_timeout, now + p->avg_hot_dur);
> 
> As per the discussion in the rfc patch, you mentioned that SIS_CACHE only honors the average sleep time
> of the latest dequeued task and that we don't know how much of the cache is polluted by the latest task.
> 
> So I was wondering what made you to put max here.
>

Thanks for taking a look. Yes, previously SIS_CACHE only honored the latest dequeued task.
But as Mathieu pointed out[1], the latest dequeued task might not have had enough time to
overwrite the cache footprint of some older dequeued tasks, so we should also honor the
sleep time of those older dequeued tasks. Consider the following scenario:

Task p1 is dequeued with an average sleep time of 2 msec. Then p2 is scheduled in
on this CPU, but only runs for 10 us (not long enough to pollute p1's cache footprint)
and is dequeued with an average sleep time of 1 msec. Should we set the CPU runqueue's
timeout to 2 msec or 1 msec later? We choose 2 msec. The idea is to only ever move the
timeout forward, so that SIS_CACHE makes it easier for p1 to be woken up on its previous
CPU.
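
To put numbers on the scenario above, here is a minimal userspace sketch, not
kernel code. struct rq_model, dequeue_sleep_max(), dequeue_sleep_latest() and
cache_hot() are hypothetical names made up for illustration; only the max()
update and the "now < timeout" check mirror the patch. It assumes p1 is
dequeued at t = 0 and p2 at t = 10 us, and ignores any delay before p2 starts
running.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC	1000ULL
#define NSEC_PER_MSEC	1000000ULL

/* Hypothetical stand-in for the struct rq fields used by the patch. */
struct rq_model {
	uint64_t cache_hot_timeout;	/* absolute time, in ns */
	unsigned int nr_running;	/* tasks left on the rq after the dequeue */
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* v2 behaviour: the timeout can only move forward. */
static void dequeue_sleep_max(struct rq_model *rq, uint64_t now,
			      uint64_t avg_hot_dur)
{
	assert(rq);
	if (!rq->nr_running && avg_hot_dur)
		rq->cache_hot_timeout = max_u64(rq->cache_hot_timeout,
						now + avg_hot_dur);
}

/* Alternative for comparison: only the latest dequeued task counts. */
static void dequeue_sleep_latest(struct rq_model *rq, uint64_t now,
				 uint64_t avg_hot_dur)
{
	assert(rq);
	if (!rq->nr_running && avg_hot_dur)
		rq->cache_hot_timeout = now + avg_hot_dur;
}

/* Same comparison as cache_hot_cpu(): hot while now < timeout. */
static int cache_hot(const struct rq_model *rq, uint64_t now)
{
	return now < rq->cache_hot_timeout;
}
```

Feeding both variants p1 (t = 0, 2 msec) and then p2 (t = 10 us, 1 msec), the
max() variant leaves the timeout at 2 msec, while the latest-only variant pulls
it back to 1.01 msec. At t = 1.5 msec the CPU is therefore still treated as
cache-hot for p1 only with max().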

[1] https://lore.kernel.org/lkml/2a47ae82-b8cd-95db-9f48-82b3df0730f3@efficios.com/

thanks,
Chenyu