[PATCH] sched/fair: Replace random newidle_balance with Bresenham accumulator

Qing Wang posted 1 patch 1 month, 1 week ago
There is a newer version of this series
include/linux/sched/topology.h |  1 +
kernel/sched/fair.c            | 18 +++++++++++++-----
kernel/sched/topology.c        |  1 +
3 files changed, 15 insertions(+), 5 deletions(-)
[PATCH] sched/fair: Replace random newidle_balance with Bresenham accumulator
Posted by Qing Wang 1 month, 1 week ago
The current NI_RANDOM implementation uses a random dice roll to allow
newidle_balance attempts according to the success rate. There is a better
way to implememte it.

Replace the random dice with a Bresenham accumulator that distributes the
allowed attempts with uniformly spaced evenly across a 1024-step window:

Each step do those:
  - Accumulate (1 + newidle_ratio) into newidle_window_pos.
  - If the accumulator reaches 1024, allow the balance attempt and
    subtract 1024.

This guarantees exactly (1 + newidle_ratio) newidle_balance per 1024 steps
and per newidle_balance with uniformly spaced for any ratio in [0, 1023].

Signed-off-by: Qing Wang <wangqing7171@gmail.com>
---
 include/linux/sched/topology.h |  1 +
 kernel/sched/fair.c            | 18 +++++++++++++-----
 kernel/sched/topology.c        |  1 +
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 36553e14866d..fd6de240fb2b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -95,6 +95,7 @@ struct sched_domain {
 	unsigned int newidle_call;
 	unsigned int newidle_success;
 	unsigned int newidle_ratio;
+	unsigned int newidle_window_pos;
 	u64 newidle_stamp;
 	u64 max_newidle_lb_cost;
 	unsigned long last_decay_max_lb_cost;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69361c63353a..7874f795f62d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12509,6 +12509,7 @@ static inline void update_newidle_stats(struct sched_domain *sd, unsigned int su
 		ratio += sd->newidle_success;
 
 		sd->newidle_ratio = min(1024, ratio);
+		sd->newidle_window_pos = 0;
 		sd->newidle_call /= 2;
 		sd->newidle_success /= 2;
 	}
@@ -13217,16 +13218,23 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 
 			if (sched_feat(NI_RANDOM) && sd->newidle_ratio < 1024) {
 				/*
-				 * Throw a 1k sided dice; and only run
-				 * newidle_balance according to the success
-				 * rate.
+				 * Base on accumulator of Bresenham's line algorithm
+				 *
+				 * Accumulate ratio each time and trigger balance
+				 * when sum ≥ 1024.
+				 *
+				 * Ensures exactly ratio balances per 1024 calls
+				 * and distributes ratio balance operations uniformly
+				 * across every 1024 invocations.
 				 */
-				u32 d1k = sched_rng() % 1024;
 				weight = 1 + sd->newidle_ratio;
-				if (d1k > weight) {
+
+				sd->newidle_window_pos += weight;
+				if (sd->newidle_window_pos < 1024) {
 					update_newidle_stats(sd, 0);
 					continue;
 				}
+				sd->newidle_window_pos -= 1024;
 				weight = (1024 + weight/2) / weight;
 			}
 
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5847b83d9d55..0d8fa413f66c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1708,6 +1708,7 @@ sd_init(struct sched_domain_topology_level *tl,
 		.newidle_call		= 512,
 		.newidle_success	= 256,
 		.newidle_ratio		= 512,
+		.newidle_window_pos	= 0,
 		.newidle_stamp		= now,
 
 		.max_newidle_lb_cost	= 0,
-- 
2.34.1

Re: [PATCH] sched/fair: Replace random newidle_balance with Bresenham accumulator
Posted by K Prateek Nayak 1 month ago
Hello Qing,

On 5/6/2026 8:14 AM, Qing Wang wrote:
> The current NI_RANDOM implementation uses a random dice roll to allow
> newidle_balance attempts according to the success rate. There is a better
> way to implememte it.
> 
> Replace the random dice with a Bresenham accumulator that distributes the
> allowed attempts with uniformly spaced evenly across a 1024-step window:
> 
> Each step do those:
>   - Accumulate (1 + newidle_ratio) into newidle_window_pos.
>   - If the accumulator reaches 1024, allow the balance attempt and
>     subtract 1024.
> 
> This guarantees exactly (1 + newidle_ratio) newidle_balance per 1024 steps
> and per newidle_balance with uniformly spaced for any ratio in [0, 1023].

I took the most sensitive workload I have for newidle balance (tbench)
and took it for a spin with these changes. Following are the results:

    Clients:        tip                  bresenham_accm
        1      321.65 (0.00 pct)       313.97 (-2.38 pct)
        2      641.92 (0.00 pct)       638.74 (-0.49 pct)
        4     1245.65 (0.00 pct)      1237.26 (-0.67 pct)
        8     2435.80 (0.00 pct)      2442.23 ( 0.26 pct)
       16     4717.66 (0.00 pct)      4688.19 (-0.62 pct)
       32     9303.53 (0.00 pct)      9390.71 ( 0.93 pct)
       64    18002.57 (0.00 pct)     17911.56 (-0.50 pct)
      128    27729.26 (0.00 pct)     27621.95 (-0.38 pct)
      256    47134.77 (0.00 pct)     46137.36 (-2.11 pct)
      512    43179.41 (0.00 pct)     43277.53 ( 0.22 pct)
     1024    40339.30 (0.00 pct)     40176.49 (-0.40 pct)

The %diff is in noise range which is a good indication that there
shouldn't be any surprises. I'll queue a run overnight to see if
there are other benchmarks that like / dislike these changes.

I'll let Peter comment on the change itself since he knows these
bits best ;-)

-- 
Thanks and Regards,
Prateek
Re: [PATCH] sched/fair: Replace random newidle_balance with Bresenham accumulator
Posted by Qing Wang 1 month ago
On Wed, 13 May 2026 at 13:29, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> I took the most sensitive workload I have for newidle balance (tbench)
> and took it for a spin with these changes. Following are the results:
> 
>     Clients:        tip                  bresenham_accm
>         1      321.65 (0.00 pct)       313.97 (-2.38 pct)
>         2      641.92 (0.00 pct)       638.74 (-0.49 pct)
>         4     1245.65 (0.00 pct)      1237.26 (-0.67 pct)
>         8     2435.80 (0.00 pct)      2442.23 ( 0.26 pct)
>        16     4717.66 (0.00 pct)      4688.19 (-0.62 pct)
>        32     9303.53 (0.00 pct)      9390.71 ( 0.93 pct)
>        64    18002.57 (0.00 pct)     17911.56 (-0.50 pct)
>       128    27729.26 (0.00 pct)     27621.95 (-0.38 pct)
>       256    47134.77 (0.00 pct)     46137.36 (-2.11 pct)
>       512    43179.41 (0.00 pct)     43277.53 ( 0.22 pct)
>      1024    40339.30 (0.00 pct)     40176.49 (-0.40 pct)
> 
> The %diff is in noise range which is a good indication that there
> shouldn't be any surprises. I'll queue a run overnight to see if
> there are other benchmarks that like / dislike these changes.
> 
> I'll let Peter comment on the change itself since he knows these
> bits best ;-)

Oh! Thanks for your benchmarks :) The %diff looks good.
Let's wait for Peter's comments.

--
Best regards,
Qing
Re: [PATCH] sched/fair: Replace random newidle_balance with Bresenham accumulator
Posted by Qing Wang 1 month ago
Hi Peter,
Just a gentle ping on this patch. How about this idea?
I'm looking forward to your reply.
--
Best regards,
Qing
[PATCH v2] sched/fair: Replace random newidle_balance with Bresenham accumulator
Posted by Qing Wang 1 week, 4 days ago
The current NI_RANDOM implementation uses a random dice roll to allow
newidle_balance attempts according to the success rate. There is a better
way to implememte it.

Replace the random dice with a Bresenham accumulator that distributes the
allowed attempts with uniformly spaced evenly across a 1024-step window:

Each step do those:
  - Accumulate (1 + newidle_ratio) into newidle_window_pos.
  - If the accumulator reaches 1024, allow the balance attempt and
    subtract 1024.

This guarantees exactly (1 + newidle_ratio) newidle_balance per 1024 steps
and per newidle_balance with uniformly spaced for any ratio in [0, 1023].

Signed-off-by: Qing Wang <wangqing7171@gmail.com>
---

V2:
1. Remove sched_rnd_state and sched_rng() since they are no longer used.

 include/linux/sched/topology.h |  1 +
 kernel/sched/core.c            |  3 ---
 kernel/sched/fair.c            | 18 +++++++++++++-----
 kernel/sched/sched.h           |  6 ------
 kernel/sched/topology.c        |  1 +
 5 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 36553e14866d..fd6de240fb2b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -95,6 +95,7 @@ struct sched_domain {
 	unsigned int newidle_call;
 	unsigned int newidle_success;
 	unsigned int newidle_ratio;
+	unsigned int newidle_window_pos;
 	u64 newidle_stamp;
 	u64 max_newidle_lb_cost;
 	unsigned long last_decay_max_lb_cost;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c6..4c0f613326d4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -129,7 +129,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_dl_server_start_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_dl_server_stop_tp);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
-DEFINE_PER_CPU(struct rnd_state, sched_rnd_state);
 
 #ifdef CONFIG_SCHED_PROXY_EXEC
 DEFINE_STATIC_KEY_TRUE(__sched_proxy_exec);
@@ -8820,8 +8819,6 @@ void __init sched_init_smp(void)
 {
 	sched_init_numa(NUMA_NO_NODE);
 
-	prandom_init_once(&sched_rnd_state);
-
 	/*
 	 * There's no userspace yet to cause hotplug operations; hence all the
 	 * CPU masks are stable and all blatant races in the below code cannot
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f982..c48f092dc5d4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12545,6 +12545,7 @@ static inline void update_newidle_stats(struct sched_domain *sd, unsigned int su
 		ratio += sd->newidle_success;
 
 		sd->newidle_ratio = min(1024, ratio);
+		sd->newidle_window_pos = 0;
 		sd->newidle_call /= 2;
 		sd->newidle_success /= 2;
 	}
@@ -13253,16 +13254,23 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 
 			if (sched_feat(NI_RANDOM) && sd->newidle_ratio < 1024) {
 				/*
-				 * Throw a 1k sided dice; and only run
-				 * newidle_balance according to the success
-				 * rate.
+				 * Base on accumulator of Bresenham's line algorithm
+				 *
+				 * Accumulate ratio each time and trigger balance
+				 * when sum ≥ 1024.
+				 *
+				 * Ensures exactly ratio balances per 1024 calls
+				 * and distributes ratio balance operations uniformly
+				 * across every 1024 invocations.
 				 */
-				u32 d1k = sched_rng() % 1024;
 				weight = 1 + sd->newidle_ratio;
-				if (d1k > weight) {
+
+				sd->newidle_window_pos += weight;
+				if (sd->newidle_window_pos < 1024) {
 					update_newidle_stats(sd, 0);
 					continue;
 				}
+				sd->newidle_window_pos -= 1024;
 				weight = (1024 + weight/2) / weight;
 			}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9f63b15d309d..b407835ae9a8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1384,12 +1384,6 @@ static inline bool is_migration_disabled(struct task_struct *p)
 }
 
 DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
-DECLARE_PER_CPU(struct rnd_state, sched_rnd_state);
-
-static inline u32 sched_rng(void)
-{
-	return prandom_u32_state(this_cpu_ptr(&sched_rnd_state));
-}
 
 static __always_inline struct rq *__this_rq(void)
 {
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5847b83d9d55..0d8fa413f66c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1708,6 +1708,7 @@ sd_init(struct sched_domain_topology_level *tl,
 		.newidle_call		= 512,
 		.newidle_success	= 256,
 		.newidle_ratio		= 512,
+		.newidle_window_pos	= 0,
 		.newidle_stamp		= now,
 
 		.max_newidle_lb_cost	= 0,
-- 
2.34.1