[RESEND RFC PATCH v2 28/29] [EXPERIMENTAL] sched/fair: Add a local counter to rate limit task push

K Prateek Nayak posted 29 patches 1 week, 4 days ago
Posted by K Prateek Nayak 1 week, 4 days ago
Pushing tasks can fail for a multitude of reasons - task affinity, the
unavailability of an idle CPU by the time the balance callback is
executed, etc.

Maintain a CPU local counter in sched_domain to rate limit push attempts
if the failures build up. This counter is reset at the time of periodic
balance to the value in "nr_idle_scan".

Since "nr_idle_scan" is only computed for SIS_UTIL, rate limiting has
been guarded behind the same sched_feat().

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 include/linux/sched/topology.h |  4 ++++
 kernel/sched/fair.c            | 23 +++++++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 074ee2980cdf..ebe26ce82c1a 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -122,6 +122,10 @@ struct sched_domain {
 	unsigned int alb_failed;
 	unsigned int alb_pushed;
 
+	/* Push load balancing */
+	unsigned long last_nr_push_update;
+	int nr_push_attempt;
+
 	/* SD_BALANCE_EXEC stats */
 	unsigned int sbe_count;
 	unsigned int sbe_balanced;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 34aeb8e58e0b..46d33ab63336 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12356,6 +12356,16 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
 		rq->max_idle_balance_cost =
 			max((u64)sysctl_sched_migration_cost, max_cost);
 	}
+	if (sched_feat(SIS_UTIL)) {
+		sd = rcu_dereference(per_cpu(sd_llc, cpu));
+
+		if (sd && sd->shared &&
+		    time_after_eq(jiffies, sd->last_nr_push_update + sd->min_interval)) {
+			sd->nr_push_attempt = READ_ONCE(sd->shared->nr_idle_scan);
+			sd->last_nr_push_update = jiffies;
+		}
+	}
+
 	rcu_read_unlock();
 
 	/*
@@ -13110,8 +13120,6 @@ static inline bool should_push_tasks(struct rq *rq)
 	struct sched_domain *sd;
 	int cpu = cpu_of(rq);
 
-	/* TODO: Add a CPU local failure counter. */
-
 	/* CPU doesn't have any fair task to push. */
 	if (!has_pushable_tasks(rq))
 		return false;
@@ -13126,6 +13134,10 @@ static inline bool should_push_tasks(struct rq *rq)
 	if (!sd)
 		return false;
 
+	/* We've failed to push tasks too many times. */
+	if (sched_feat(SIS_UTIL) && sd->nr_push_attempt <= 0)
+		return false;
+
 	/*
 	 * We may not be able to find a push target.
 	 * Skip for this tick and depend on the periodic
@@ -13176,6 +13188,13 @@ static bool push_fair_task(struct rq *rq)
 		return true;
 	}
 
+	/*
+	 * If the push failed after a full search, decrement the
+	 * attempt counter to discourage further attempts. Periodic
+	 * balancer will reset the "nr_push_attempt" after a while.
+	 */
+	sd->nr_push_attempt--;
+
 	return false;
 }
 
-- 
2.43.0
Re: [RESEND RFC PATCH v2 28/29] [EXPERIMENTAL] sched/fair: Add a local counter to rate limit task push
Posted by Christian Loehle 1 week, 3 days ago
On 12/8/25 09:27, K Prateek Nayak wrote:
> Pushing tasks can fail for a multitude of reasons - task affinity, the
> unavailability of an idle CPU by the time the balance callback is
> executed, etc.
> 
> Maintain a CPU local counter in sched_domain to rate limit push attempts
> if the failures build up. This counter is reset at the time of periodic
> balance to the value in "nr_idle_scan".
> 
> Since "nr_idle_scan" is only computed for SIS_UTIL, rate limiting has
> been guarded behind the same sched_feat().
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>  include/linux/sched/topology.h |  4 ++++
>  kernel/sched/fair.c            | 23 +++++++++++++++++++++--
>  2 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 074ee2980cdf..ebe26ce82c1a 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -122,6 +122,10 @@ struct sched_domain {
>  	unsigned int alb_failed;
>  	unsigned int alb_pushed;
>  
> +	/* Push load balancing */
> +	unsigned long last_nr_push_update;
> +	int nr_push_attempt;
> +
>  	/* SD_BALANCE_EXEC stats */
>  	unsigned int sbe_count;
>  	unsigned int sbe_balanced;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 34aeb8e58e0b..46d33ab63336 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12356,6 +12356,16 @@ static void sched_balance_domains(struct rq *rq, enum cpu_idle_type idle)
>  		rq->max_idle_balance_cost =
>  			max((u64)sysctl_sched_migration_cost, max_cost);
>  	}
> +	if (sched_feat(SIS_UTIL)) {
> +		sd = rcu_dereference(per_cpu(sd_llc, cpu));
> +
> +		if (sd && sd->shared &&
> +		    time_after_eq(jiffies, sd->last_nr_push_update + sd->min_interval)) {
> +			sd->nr_push_attempt = READ_ONCE(sd->shared->nr_idle_scan);
> +			sd->last_nr_push_update = jiffies;
> +		}
> +	}
> +
>  	rcu_read_unlock();
>  
>  	/*
> @@ -13110,8 +13120,6 @@ static inline bool should_push_tasks(struct rq *rq)
>  	struct sched_domain *sd;
>  	int cpu = cpu_of(rq);
>  
> -	/* TODO: Add a CPU local failure counter. */
> -
>  	/* CPU doesn't have any fair task to push. */
>  	if (!has_pushable_tasks(rq))
>  		return false;
> @@ -13126,6 +13134,10 @@ static inline bool should_push_tasks(struct rq *rq)
>  	if (!sd)
>  		return false;
>  
> +	/* We've failed to push tasks too many times. */
> +	if (sched_feat(SIS_UTIL) && sd->nr_push_attempt <= 0)
> +		return false;
> +
>  	/*
>  	 * We may not be able to find a push target.
>  	 * Skip for this tick and depend on the periodic
> @@ -13176,6 +13188,13 @@ static bool push_fair_task(struct rq *rq)
>  		return true;
>  	}
>  
> +	/*
> +	 * If the push failed after a full search, decrement the
> +	 * attempt counter to discourage further attempts. Periodic
> +	 * balancer will reset the "nr_push_attempt" after a while.
> +	 */
> +	sd->nr_push_attempt--;
> +
>  	return false;
>  }
>  

Just to confirm: is this patch included in the configuration the cover letter
calls "push" in the benchmarks?
If so, did it help with the regressions?
Re: [RESEND RFC PATCH v2 28/29] [EXPERIMENTAL] sched/fair: Add a local counter to rate limit task push
Posted by K Prateek Nayak 1 week, 3 days ago
Hello Chris,

On 12/8/2025 6:03 PM, Christian Loehle wrote:
> Just to confirm, but this patch is included when the cover letter mentions "push" for the
> benchmarks?

Yes, this patch is included.

> Did this help the regressions then?

So without this, I was just using tbench as a smoke test - a server-client
pair exchanging very small amounts of data, which can lead to very short
sleep times.

Up to 64 clients there were slight improvements from not rate limiting,
but once I hit the fully loaded case, tbench regresses by up to 10%,
mostly because of the lack of idle targets but also because the CPU
doesn't hit the overutilized threshold to cancel push attempts.

CPUs are still considered "nohz" until the first tick hits and, with CPUs
going in and out of idle at a high frequency, "sd->shared->nr_idle_cpus"
turned out to be unreliable as a bailout threshold in my case.

With this rate limiting, things are within 2-3% of the baseline.
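
For readers following along, the rate limiting can be modelled roughly as
below. This is a minimal userspace sketch, not the kernel code: "struct
push_limiter", the "limiter_*()" helpers, "tick_after_eq()" and
"count_failed_attempts()" are hypothetical names made up for illustration.
It assumes every push attempt fails after a full search, which is the
fully loaded case described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields the patch adds to sched_domain. */
struct push_limiter {
	int nr_push_attempt;        /* remaining budget of failed pushes */
	unsigned long last_update;  /* tick count at the last refill */
	unsigned long min_interval; /* minimum ticks between refills */
};

/* Wraparound-safe "a >= b", in the spirit of the kernel's time_after_eq(). */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Periodic balance: refill the budget at most once per min_interval. */
static void limiter_refill(struct push_limiter *pl, unsigned long now,
			   int nr_idle_scan)
{
	if (tick_after_eq(now, pl->last_update + pl->min_interval)) {
		pl->nr_push_attempt = nr_idle_scan;
		pl->last_update = now;
	}
}

/* Gate checked before attempting a push. */
static bool limiter_should_push(const struct push_limiter *pl)
{
	return pl->nr_push_attempt > 0;
}

/* Called when a push fails after a full search. */
static void limiter_push_failed(struct push_limiter *pl)
{
	pl->nr_push_attempt--;
}

/*
 * Assumption for this sketch: every attempt fails. Returns how many of
 * the "wanted" attempts actually got past the gate.
 */
static int count_failed_attempts(struct push_limiter *pl, int wanted)
{
	int made = 0;

	assert(wanted >= 0);
	for (int i = 0; i < wanted; i++) {
		if (!limiter_should_push(pl))
			continue;
		made++;
		limiter_push_failed(pl);
	}
	return made;
}
```

With a refill value of 4, ten back-to-back failing pushes result in only
four real attempts until the next refill, so the cost of failed searches
is bounded per balance interval.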

-- 
Thanks and Regards,
Prateek