From nobody Fri Dec 19 21:46:43 2025
Message-ID: <20251107161739.406147760@infradead.org>
Date: Fri, 07 Nov 2025 17:06:46 +0100
From: Peter Zijlstra
To: Chris Mason, Joseph Salisbury, Adam Li, Hazem Mohamed Abuelfotoh, Josh Don
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Subject: [PATCH 1/4] sched/fair: Revert max_newidle_lb_cost bump
References: <20251107160645.929564468@infradead.org>

Many people reported regressions on their database workloads due to:

  155213a2aed4 ("sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails")

For instance Adam Li reported a 6% regression on SpecJBB.

Conversely this will regress schbench again; on my machine from 2.22
Mrps/s down to 2.04 Mrps/s.

Reported-by: Joseph Salisbury
Reported-by: Adam Li
Reported-by: Dietmar Eggemann
Reported-by: Hazem Mohamed Abuelfotoh
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c |   19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12152,14 +12152,8 @@ static inline bool update_newidle_cost(s
 		/*
 		 * Track max cost of a domain to make sure to not delay the
 		 * next wakeup on the CPU.
-		 *
-		 * sched_balance_newidle() bumps the cost whenever newidle
-		 * balance fails, and we don't want things to grow out of
-		 * control. Use the sysctl_sched_migration_cost as the upper
-		 * limit, plus a litle extra to avoid off by ones.
 		 */
-		sd->max_newidle_lb_cost =
-			min(cost, sysctl_sched_migration_cost + 200);
+		sd->max_newidle_lb_cost = cost;
 		sd->last_decay_max_lb_cost = jiffies;
 	} else if (time_after(jiffies, sd->last_decay_max_lb_cost + HZ)) {
 		/*
@@ -12851,17 +12845,10 @@ static int sched_balance_newidle(struct
 
 			t1 = sched_clock_cpu(this_cpu);
 			domain_cost = t1 - t0;
+			update_newidle_cost(sd, domain_cost);
+
 			curr_cost += domain_cost;
 			t0 = t1;
-
-			/*
-			 * Failing newidle means it is not effective;
-			 * bump the cost so we end up doing less of it.
-			 */
-			if (!pulled_task)
-				domain_cost = (3 * sd->max_newidle_lb_cost) / 2;
-
-			update_newidle_cost(sd, domain_cost);
 		}
 
 		/*

From nobody Fri Dec 19 21:46:43 2025
Message-ID: <20251107161739.525916173@infradead.org>
Date: Fri, 07 Nov 2025 17:06:47 +0100
From: Peter Zijlstra
To: Chris Mason, Joseph Salisbury, Adam Li, Hazem Mohamed Abuelfotoh, Josh Don
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Subject: [PATCH 2/4] sched/fair: Small cleanup to sched_balance_newidle()
References: <20251107160645.929564468@infradead.org>

Pull out the !sd check to simplify code.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12811,14 +12811,16 @@ static int sched_balance_newidle(struct
 
 	rcu_read_lock();
 	sd = rcu_dereference_check_sched_domain(this_rq->sd);
+	if (!sd) {
+		rcu_read_unlock();
+		goto out;
+	}
 
 	if (!get_rd_overloaded(this_rq->rd) ||
-	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
+	    this_rq->avg_idle < sd->max_newidle_lb_cost) {
 
-		if (sd)
-			update_next_balance(sd, &next_balance);
+		update_next_balance(sd, &next_balance);
 		rcu_read_unlock();
-
 		goto out;
 	}
 	rcu_read_unlock();

From nobody Fri Dec 19 21:46:43 2025
Message-ID: <20251107161739.655208666@infradead.org>
Date: Fri, 07 Nov 2025 17:06:48 +0100
From: Peter Zijlstra
To: Chris Mason, Joseph Salisbury, Adam Li, Hazem Mohamed Abuelfotoh, Josh Don
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Subject: [PATCH 3/4] sched/fair: Small cleanup to update_newidle_cost()
References: <20251107160645.929564468@infradead.org>

Simplify code by adding a few variables.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12148,22 +12148,25 @@ void update_max_interval(void)
 
 static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
 {
+	unsigned long next_decay = sd->last_decay_max_lb_cost + HZ;
+	unsigned long now = jiffies;
+
 	if (cost > sd->max_newidle_lb_cost) {
 		/*
 		 * Track max cost of a domain to make sure to not delay the
 		 * next wakeup on the CPU.
 		 */
 		sd->max_newidle_lb_cost = cost;
-		sd->last_decay_max_lb_cost = jiffies;
-	} else if (time_after(jiffies, sd->last_decay_max_lb_cost + HZ)) {
+		sd->last_decay_max_lb_cost = now;
+
+	} else if (time_after(now, next_decay)) {
 		/*
 		 * Decay the newidle max times by ~1% per second to ensure that
 		 * it is not outdated and the current max cost is actually
 		 * shorter.
 		 */
 		sd->max_newidle_lb_cost = (sd->max_newidle_lb_cost * 253) / 256;
-		sd->last_decay_max_lb_cost = jiffies;
-
+		sd->last_decay_max_lb_cost = now;
 		return true;
 	}

From nobody Fri Dec 19 21:46:43 2025
Message-ID: <20251107161739.770122091@infradead.org>
Date: Fri, 07 Nov 2025 17:06:49 +0100
From: Peter Zijlstra
To: Chris Mason, Joseph Salisbury, Adam Li, Hazem Mohamed Abuelfotoh, Josh Don
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Subject: [PATCH 4/4] sched/fair: Proportional newidle balance
References: <20251107160645.929564468@infradead.org>

Add a randomized algorithm that runs newidle balancing proportional to
its success rate.

This improves schbench significantly:

 6.18-rc4:                 2.22 Mrps/s
 6.18-rc4+revert:          2.04 Mrps/s
 6.18-rc4+revert+random:   2.18 Mrps/s

Conversely, per Adam Li this affects SpecJBB slightly, reducing it by 1%:

 6.17:                 -6%
 6.17+revert:           0%
 6.17+revert+random:   -1%

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Dietmar Eggemann
Tested-by: Adam Li
Tested-by: Dietmar Eggemann
---
 include/linux/sched/topology.h |    3 ++
 kernel/sched/core.c            |    3 ++
 kernel/sched/fair.c            |   43 +++++++++++++++++++++++++++++++++++++----
 kernel/sched/features.h        |    5 ++++
 kernel/sched/sched.h           |    7 ++++++
 kernel/sched/topology.c        |    6 +++++
 6 files changed, 63 insertions(+), 4 deletions(-)

--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -92,6 +92,9 @@ struct sched_domain {
 	unsigned int nr_balance_failed; /* initialise to 0 */
 
 	/* idle_balance() stats */
+	unsigned int newidle_call;
+	unsigned int newidle_success;
+	unsigned int newidle_ratio;
 	u64 max_newidle_lb_cost;
 	unsigned long last_decay_max_lb_cost;
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -121,6 +121,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_updat
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_compute_energy_tp);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DEFINE_PER_CPU(struct rnd_state, sched_rnd_state);
 
 #ifdef CONFIG_SCHED_PROXY_EXEC
 DEFINE_STATIC_KEY_TRUE(__sched_proxy_exec);
@@ -8589,6 +8590,8 @@ void __init sched_init_smp(void)
 {
 	sched_init_numa(NUMA_NO_NODE);
 
+	prandom_init_once(&sched_rnd_state);
+
 	/*
 	 * There's no userspace yet to cause hotplug operations; hence all the
 	 * CPU masks are stable and all blatant races in the below code cannot
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12146,11 +12146,26 @@ void update_max_interval(void)
 	max_load_balance_interval = HZ*num_online_cpus()/10;
 }
 
-static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
+static inline void update_newidle_stats(struct sched_domain *sd, unsigned int success)
+{
+	sd->newidle_call++;
+	sd->newidle_success += success;
+
+	if (sd->newidle_call >= 1024) {
+		sd->newidle_ratio = sd->newidle_success;
+		sd->newidle_call /= 2;
+		sd->newidle_success /= 2;
+	}
+}
+
+static inline bool
+update_newidle_cost(struct sched_domain *sd, u64 cost, unsigned int success)
 {
 	unsigned long next_decay = sd->last_decay_max_lb_cost + HZ;
 	unsigned long now = jiffies;
 
+	update_newidle_stats(sd, success);
+
 	if (cost > sd->max_newidle_lb_cost) {
 		/*
 		 * Track max cost of a domain to make sure to not delay the
@@ -12198,7 +12213,7 @@ static void sched_balance_domains(struct
 		 * Decay the newidle max times here because this is a regular
 		 * visit to all the domains.
 		 */
-		need_decay = update_newidle_cost(sd, 0);
+		need_decay = update_newidle_cost(sd, 0, 0);
 		max_cost += sd->max_newidle_lb_cost;
 
 		/*
@@ -12843,6 +12858,22 @@ static int sched_balance_newidle(struct
 			break;
 
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
+			unsigned int weight = 1;
+
+			if (sched_feat(NI_RANDOM)) {
+				/*
+				 * Throw a 1k sided dice; and only run
+				 * newidle_balance according to the success
+				 * rate.
+				 */
+				u32 d1k = sched_rng() % 1024;
+				weight = 1 + sd->newidle_ratio;
+				if (d1k > weight) {
+					update_newidle_stats(sd, 0);
+					continue;
+				}
+				weight = (1024 + weight/2) / weight;
+			}
 
 			pulled_task = sched_balance_rq(this_cpu, this_rq,
 						   sd, CPU_NEWLY_IDLE,
@@ -12850,10 +12881,14 @@ static int sched_balance_newidle(struct
 
 			t1 = sched_clock_cpu(this_cpu);
 			domain_cost = t1 - t0;
-			update_newidle_cost(sd, domain_cost);
-
 			curr_cost += domain_cost;
 			t0 = t1;
+
+			/*
+			 * Track max cost of a domain to make sure to not delay the
+			 * next wakeup on the CPU.
+			 */
+			update_newidle_cost(sd, domain_cost, weight * !!pulled_task);
 		}
 
 		/*
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -121,3 +121,8 @@ SCHED_FEAT(WA_BIAS, true)
 SCHED_FEAT(UTIL_EST, true)
 
 SCHED_FEAT(LATENCY_WARN, false)
+
+/*
+ * Do newidle balancing proportional to its success rate using randomization.
+ */
+SCHED_FEAT(NI_RANDOM, true)
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -5,6 +5,7 @@
 #ifndef _KERNEL_SCHED_SCHED_H
 #define _KERNEL_SCHED_SCHED_H
 
+#include
 #include
 #include
 #include
@@ -1348,6 +1349,12 @@ static inline bool is_migration_disabled
 }
 
 DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DECLARE_PER_CPU(struct rnd_state, sched_rnd_state);
+
+static inline u32 sched_rng(void)
+{
+	return prandom_u32_state(this_cpu_ptr(&sched_rnd_state));
+}
 
 #define cpu_rq(cpu)		(&per_cpu(runqueues, (cpu)))
 #define this_rq()		this_cpu_ptr(&runqueues)
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1662,6 +1662,12 @@ sd_init(struct sched_domain_topology_lev
 
 		.last_balance		= jiffies,
 		.balance_interval	= sd_weight,
+
+		/* 50% success rate */
+		.newidle_call		= 512,
+		.newidle_success	= 256,
+		.newidle_ratio		= 512,
+
 		.max_newidle_lb_cost	= 0,
 		.last_decay_max_lb_cost	= jiffies,
 		.child			= child,
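
For anyone who wants to poke at the heuristic from patch 4/4 outside the kernel, here is a minimal userspace sketch. It is not part of the series and is not kernel code. update_newidle_stats() and the weight arithmetic follow the diff above; struct newidle_stats, newidle_roll(), simulate() and the xorshift32 generator are hypothetical stand-ins made up for illustration (the patch itself draws from sched_rng() on a per-CPU rnd_state), and the fixed per-attempt success probability in simulate() is an assumption, not something the patch models.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three fields added to struct sched_domain. */
struct newidle_stats {
	unsigned int call;	/* newidle_call */
	unsigned int success;	/* newidle_success */
	unsigned int ratio;	/* newidle_ratio: successes per 1024 calls */
};

/* Same arithmetic as update_newidle_stats() in the patch. */
static void update_newidle_stats(struct newidle_stats *s, unsigned int success)
{
	s->call++;
	s->success += success;

	if (s->call >= 1024) {
		s->ratio = s->success;
		s->call /= 2;
		s->success /= 2;
	}
}

/* Assumption: a plain xorshift32 replaces the kernel's per-CPU prandom state. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

/*
 * Throw the 1k sided dice. Returns false when this attempt is skipped
 * (and accounts it as a call without success); otherwise returns true
 * and stores the weight a success should be credited with.
 */
static bool newidle_roll(struct newidle_stats *s, uint32_t rnd,
			 unsigned int *weight)
{
	uint32_t d1k = rnd % 1024;
	unsigned int w = 1 + s->ratio;

	assert(weight);
	if (d1k > w) {
		update_newidle_stats(s, 0);
		return false;
	}
	*weight = (1024 + w / 2) / w;
	return true;
}

/*
 * Hypothetical driver: n newidle events, where a balance pass that is
 * actually run pulls a task with probability p1024/1024. Returns how
 * many passes were run rather than skipped.
 */
static unsigned int simulate(struct newidle_stats *s, unsigned int p1024,
			     unsigned int n, uint32_t seed)
{
	unsigned int ran = 0;

	for (unsigned int i = 0; i < n; i++) {
		unsigned int weight;

		if (!newidle_roll(s, xorshift32(&seed), &weight))
			continue;

		ran++;
		unsigned int pulled = (xorshift32(&seed) % 1024) < p1024;
		update_newidle_stats(s, weight * pulled);
	}
	return ran;
}
```

With the sd_init() defaults (512/256/512) a pass runs in roughly half of the rolls and a success is credited with weight 2, so the credit per call stays close to the underlying success rate even though only a fraction of the passes is executed. Calling simulate() with different p1024 values shows how the share of executed passes follows the success rate under these assumptions.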