[PATCH] sched_ext: Simplify cpumask computation in balance_scx

lirongqing posted 1 patch 11 months, 1 week ago
Posted by lirongqing 11 months, 1 week ago
From: Li RongQing <lirongqing@baidu.com>

Iterate over the whole SMT mask with for_each_cpu() and skip the rq's own
CPU with an explicit comparison. This avoids calling for_each_cpu_andnot()
and cpumask_of(), which are relatively expensive.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 kernel/sched/ext.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 0f1da19..7e40ede 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2920,11 +2920,19 @@ static int balance_scx(struct rq *rq, struct task_struct *prev,
 	 */
 	if (sched_core_enabled(rq)) {
 		const struct cpumask *smt_mask = cpu_smt_mask(cpu_of(rq));
-		int scpu;
+		int scpu, cpu;
 
-		for_each_cpu_andnot(scpu, smt_mask, cpumask_of(cpu_of(rq))) {
-			struct rq *srq = cpu_rq(scpu);
-			struct task_struct *sprev = srq->curr;
+		cpu = cpu_of(rq);
+
+		for_each_cpu(scpu, smt_mask) {
+			struct rq *srq;
+			struct task_struct *sprev;
+
+			if (scpu == cpu)
+				continue;
+
+			srq = cpu_rq(scpu);
+			sprev = srq->curr;
 
 			WARN_ON_ONCE(__rq_lockp(rq) != __rq_lockp(srq));
 			update_rq_clock(srq);
-- 
2.9.4
Re: [PATCH] sched_ext: Simplify cpumask computation in balance_scx
Posted by Tejun Heo 11 months, 1 week ago
On Fri, Mar 07, 2025 at 02:45:33PM +0800, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> Iterate over the whole SMT mask with for_each_cpu() and skip the rq's own
> CPU with an explicit comparison. This avoids calling for_each_cpu_andnot()
> and cpumask_of(), which are relatively expensive.

How is cpumask_of() expensive? I have a hard time seeing how this would
actually improve anything. Do you have any measurements?

Thanks.

-- 
tejun
Re: [PATCH] sched_ext: Simplify cpumask computation in balance_scx
Posted by Li,Rongqing 11 months ago
> How is cpumask_of() expensive? I have a hard time seeing how this would
> actually improve anything. Do you have any measurements?
> 

for_each_cpu_andnot() + cpumask_of() is actually faster.
Sorry for the noise.

thanks
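
For readers comparing the two loop shapes in the diff, the following is a minimal userspace sketch, not kernel code. It assumes a toy 64-bit mask in place of struct cpumask, and every name in it (toy_mask_t, toy_mask_of, visit_andnot, visit_skip, same_siblings) is hypothetical. It also relies on the GCC/Clang builtin __builtin_ctzll. It only illustrates that both patterns visit the same sibling CPUs; it says nothing about which one is faster, which the thread settled in favour of the original code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_mask_t;

/* Single-bit mask for one CPU; loosely models the role of cpumask_of(). */
static toy_mask_t toy_mask_of(int cpu)
{
	return (toy_mask_t)1 << cpu;
}

/* Original pattern: walk only the bits set in (mask & ~excl). */
static int visit_andnot(toy_mask_t mask, toy_mask_t excl, int *out)
{
	toy_mask_t m = mask & ~excl;
	int n = 0;

	while (m) {
		out[n++] = __builtin_ctzll(m);	/* index of lowest set bit */
		m &= m - 1;			/* clear lowest set bit */
	}
	return n;
}

/* Patched pattern: walk every bit in mask and skip self by comparison. */
static int visit_skip(toy_mask_t mask, int self, int *out)
{
	toy_mask_t m = mask;
	int n = 0;

	while (m) {
		int cpu = __builtin_ctzll(m);

		m &= m - 1;
		if (cpu == self)
			continue;
		out[n++] = cpu;
	}
	return n;
}

/* Returns 1 if both patterns visit the same CPUs in the same order. */
static int same_siblings(toy_mask_t mask, int self)
{
	int a[64], b[64];
	int na = visit_andnot(mask, toy_mask_of(self), a);
	int nb = visit_skip(mask, self, b);

	if (na != nb)
		return 0;
	for (int i = 0; i < na; i++)
		if (a[i] != b[i])
			return 0;
	return 1;
}
```

With a toy SMT mask of 0x0F (CPUs 0 to 3) and self set to 2, both helpers fill out with CPUs 0, 1 and 3, so same_siblings() returns 1. The patch therefore leaves the visited set unchanged and differs only in how the rq's own CPU is excluded.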