From: Peter Zijlstra
To: mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, clm@meta.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: [RFC][PATCH 1/5] sched/deadline: Less aggressive dl_server handling
Date: Tue, 20 May 2025 11:45:39 +0200
Message-ID: <20250520101727.507378961@infradead.org>
References: <20250520094538.086709102@infradead.org>

Chris reported that commit 5f6bd380c7bd ("sched/rt: Remove default
bandwidth control") caused a significant dip in his favourite benchmark
of the day. Simply disabling dl_server cured things.

His workload hammers the 0->1, 1->0 transitions, and the
dl_server_{start,stop}() overhead kills it -- fairly obviously a bad
idea in hindsight and all that.

Change things around to only disable the dl_server when there has not
been a fair task around for a whole period. Since the default period is
1 second, this ensures the benchmark never trips this, and the overhead
is gone.

Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
Reported-by: Chris Mason
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
---
 include/linux/sched.h   |    1 +
 kernel/sched/deadline.c |   31 +++++++++++++++++++++++++++----
 2 files changed, 28 insertions(+), 4 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -702,6 +702,7 @@ struct sched_dl_entity {
 	unsigned int			dl_defer	  : 1;
 	unsigned int			dl_defer_armed	  : 1;
 	unsigned int			dl_defer_running  : 1;
+	unsigned int			dl_server_idle    : 1;
 
 	/*
	 * Bandwidth enforcement timer. Each -deadline task has its
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1215,6 +1215,8 @@ static void __push_dl_task(struct rq *rq
 /* a defer timer will not be reset if the runtime consumed was < dl_server_min_res */
 static const u64 dl_server_min_res = 1 * NSEC_PER_MSEC;
 
+static bool dl_server_stopped(struct sched_dl_entity *dl_se);
+
 static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_dl_entity *dl_se)
 {
 	struct rq *rq = rq_of_dl_se(dl_se);
@@ -1234,6 +1236,7 @@ static enum hrtimer_restart dl_server_ti
 
 	if (!dl_se->server_has_tasks(dl_se)) {
 		replenish_dl_entity(dl_se);
+		dl_server_stopped(dl_se);
 		return HRTIMER_NORESTART;
 	}
 
@@ -1639,8 +1642,10 @@ void dl_server_update_idle_time(struct r
 void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
 {
 	/* 0 runtime = fair server disabled */
-	if (dl_se->dl_runtime)
+	if (dl_se->dl_runtime) {
+		dl_se->dl_server_idle = 0;
 		update_curr_dl_se(dl_se->rq, dl_se, delta_exec);
+	}
 }
 
 void dl_server_start(struct sched_dl_entity *dl_se)
@@ -1663,7 +1668,7 @@ void dl_server_start(struct sched_dl_ent
 		setup_new_dl_entity(dl_se);
 	}
 
-	if (!dl_se->dl_runtime)
+	if (!dl_se->dl_runtime || dl_se->dl_server_active)
 		return;
 
 	dl_se->dl_server_active = 1;
@@ -1672,7 +1677,7 @@ void dl_server_start(struct sched_dl_ent
 		resched_curr(dl_se->rq);
 }
 
-void dl_server_stop(struct sched_dl_entity *dl_se)
+static void __dl_server_stop(struct sched_dl_entity *dl_se)
 {
 	if (!dl_se->dl_runtime)
 		return;
@@ -1684,6 +1689,24 @@ void dl_server_stop(struct sched_dl_enti
 	dl_se->dl_server_active = 0;
 }
 
+static bool dl_server_stopped(struct sched_dl_entity *dl_se)
+{
+	if (!dl_se->dl_server_active)
+		return false;
+
+	if (dl_se->dl_server_idle) {
+		__dl_server_stop(dl_se);
+		return true;
+	}
+
+	dl_se->dl_server_idle = 1;
+	return false;
+}
+
+void dl_server_stop(struct sched_dl_entity *dl_se)
+{
+}
+
 void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_has_tasks_f has_tasks,
 		    dl_server_pick_f pick_task)
@@ -2435,7 +2458,7 @@ static struct task_struct *__pick_task_d
 	if (dl_server(dl_se)) {
 		p = dl_se->server_pick_task(dl_se);
 		if (!p) {
-			if (dl_server_active(dl_se)) {
+			if (!dl_server_stopped(dl_se)) {
 				dl_se->dl_yielded = 1;
 				update_curr_dl_se(rq, dl_se, 0);
 			}
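The new dl_server_stopped() is a two-strike scheme: the periodic timer
first marks the server idle, and only if it finds the mark still set a
full period later (i.e. no dl_server_update() cleared it in between)
does it really stop the server. The following is a user-space model of
just that logic -- the struct and names are invented for illustration,
this is not the kernel code:

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the dl_server idle hysteresis added above. */
struct server {
	bool active;
	bool idle;	/* corresponds to dl_se->dl_server_idle */
};

/* A fair task consumed runtime: the server is clearly not idle. */
static void server_update(struct server *s)
{
	s->idle = false;
}

/* Called once per period; stops only on the second idle observation. */
static bool server_stopped(struct server *s)
{
	if (!s->active)
		return false;
	if (s->idle) {		/* second strike: really stop */
		s->active = false;
		return true;
	}
	s->idle = true;		/* first strike: only mark idle */
	return false;
}

int main(void)
{
	struct server s = { .active = true };

	server_stopped(&s);	/* period 1: idle -> marked */
	server_update(&s);	/* fair task ran -> mark cleared */
	printf("stopped after activity: %d\n", server_stopped(&s)); /* 0 */
	printf("stopped after 2nd idle: %d\n", server_stopped(&s)); /* 1 */
	return 0;
}

A workload that touches the runqueue at least once per period, like the
reported benchmark, therefore never pays the stop/start cost.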
From: Peter Zijlstra
To: mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, clm@meta.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: [RFC][PATCH 2/5] sched: Optimize ttwu() / select_task_rq()
Date: Tue, 20 May 2025 11:45:40 +0200
Message-ID: <20250520101727.620602459@infradead.org>
References: <20250520094538.086709102@infradead.org>

Optimize ttwu() by pushing select_idle_siblings() up above waiting for
on_cpu(). This allows making use of the cycles otherwise spent waiting
to search for an idle CPU.
One little detail is that since the task we're looking for an idle CPU
for might still be on the CPU, that CPU won't report as running the
idle task, and thus the search won't find its own CPU idle, even when
it is. To compensate, remove the 'rq->curr == rq->idle' condition from
idle_cpu() -- it doesn't really make sense anyway.

Additionally, Chris found (concurrently) that perf-c2c reported that
test as being a cache-miss monster.

Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
---
 kernel/sched/core.c     |    3 ++-
 kernel/sched/syscalls.c |    3 ---
 2 files changed, 2 insertions(+), 4 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4305,6 +4305,8 @@ int try_to_wake_up(struct task_struct *p
 		    ttwu_queue_wakelist(p, task_cpu(p), wake_flags))
 			break;
 
+		cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
+
 		/*
		 * If the owning (remote) CPU is still in the middle of schedule() with
		 * this task as prev, wait until it's done referencing the task.
@@ -4316,7 +4318,6 @@ int try_to_wake_up(struct task_struct *p
		 */
 		smp_cond_load_acquire(&p->on_cpu, !VAL);
 
-		cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
 		if (task_cpu(p) != cpu) {
 			if (p->in_iowait) {
 				delayacct_blkio_end(p);
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -203,9 +203,6 @@ int idle_cpu(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 
-	if (rq->curr != rq->idle)
-		return 0;
-
 	if (rq->nr_running)
 		return 0;
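On the idle_cpu() change: with select_task_rq() now running before the
wakee has finished schedule() on its previous CPU, that CPU can have
nothing runnable yet still have the wakee as rq->curr, so the old
'rq->curr == rq->idle' test would report it busy. A self-contained
sketch of the before/after predicate -- the struct is reduced to the
fields the check touches and is an assumed layout, not the kernel's:

#include <stdio.h>

struct task { int dummy; };

/* Simplified runqueue: just what idle_cpu() inspects. */
struct rq {
	struct task *curr;
	struct task *idle;
	int nr_running;
	int ttwu_pending;
};

static int idle_cpu_old(struct rq *rq)
{
	if (rq->curr != rq->idle)	/* the condition being removed */
		return 0;
	if (rq->nr_running || rq->ttwu_pending)
		return 0;
	return 1;
}

static int idle_cpu_new(struct rq *rq)	/* after this patch */
{
	if (rq->nr_running || rq->ttwu_pending)
		return 0;
	return 1;
}

int main(void)
{
	struct task wakee, idle_task;
	/* Wakee is mid-schedule(): already dequeued, but still rq->curr. */
	struct rq rq = { .curr = &wakee, .idle = &idle_task };

	/* old: 0  new: 1 -- the search can now treat the CPU as idle */
	printf("old: %d  new: %d\n", idle_cpu_old(&rq), idle_cpu_new(&rq));
	return 0;
}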
From: Peter Zijlstra
To: mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, clm@meta.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: [RFC][PATCH 3/5] sched: Split up ttwu_runnable()
Date: Tue, 20 May 2025 11:45:41 +0200
Message-ID: <20250520101727.732703833@infradead.org>
References: <20250520094538.086709102@infradead.org>

Split up ttwu_runnable() in preparation for more changes.

Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
---
 kernel/sched/core.c  |   43 +++++++++++++++++++++----------------------
 kernel/sched/sched.h |    5 +++++
 2 files changed, 26 insertions(+), 22 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3765,6 +3765,25 @@ ttwu_do_activate(struct rq *rq, struct t
 #endif
 }
 
+static int __ttwu_runnable(struct rq *rq, struct task_struct *p, int wake_flags)
+{
+	if (!task_on_rq_queued(p))
+		return 0;
+
+	update_rq_clock(rq);
+	if (p->se.sched_delayed)
+		enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
+	if (!task_on_cpu(rq, p)) {
+		/*
+		 * When on_rq && !on_cpu the task is preempted, see if
+		 * it should preempt the task that is current now.
+		 */
+		wakeup_preempt(rq, p, wake_flags);
+	}
+	ttwu_do_wakeup(p);
+	return 1;
+}
+
 /*
 * Consider @p being inside a wait loop:
 *
@@ -3792,28 +3811,8 @@ ttwu_do_activate(struct rq *rq, struct t
 */
 static int ttwu_runnable(struct task_struct *p, int wake_flags)
 {
-	struct rq_flags rf;
-	struct rq *rq;
-	int ret = 0;
-
-	rq = __task_rq_lock(p, &rf);
-	if (task_on_rq_queued(p)) {
-		update_rq_clock(rq);
-		if (p->se.sched_delayed)
-			enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
-		if (!task_on_cpu(rq, p)) {
-			/*
-			 * When on_rq && !on_cpu the task is preempted, see if
-			 * it should preempt the task that is current now.
-			 */
-			wakeup_preempt(rq, p, wake_flags);
-		}
-		ttwu_do_wakeup(p);
-		ret = 1;
-	}
-	__task_rq_unlock(rq, &rf);
-
-	return ret;
+	CLASS(__task_rq_lock, guard)(p);
+	return __ttwu_runnable(guard.rq, p, wake_flags);
 }
 
 #ifdef CONFIG_SMP
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1831,6 +1831,11 @@ task_rq_unlock(struct rq *rq, struct tas
 	raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
 }
 
+DEFINE_LOCK_GUARD_1(__task_rq_lock, struct task_struct,
+		    _T->rq = __task_rq_lock(_T->lock, &_T->rf),
+		    __task_rq_unlock(_T->rq, &_T->rf),
+		    struct rq *rq; struct rq_flags rf)
+
 DEFINE_LOCK_GUARD_1(task_rq_lock, struct task_struct,
		    _T->rq = task_rq_lock(_T->lock, &_T->rf),
		    task_rq_unlock(_T->rq, _T->lock, &_T->rf),
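The CLASS(__task_rq_lock, guard)(p) form uses the kernel's scope-based
cleanup helpers (include/linux/cleanup.h): DEFINE_LOCK_GUARD_1()
generates a type whose constructor takes the lock and whose destructor
releases it when the variable leaves scope, which is what lets
ttwu_runnable() shrink to two lines. A rough user-space analogue built
on the same compiler feature, __attribute__((cleanup)), with a pthread
mutex standing in for the rq lock -- a sketch of the mechanism, not the
kernel's macro:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct guard { pthread_mutex_t *m; };

/* Destructor: runs automatically when the guard leaves scope. */
static void guard_release(struct guard *g)
{
	pthread_mutex_unlock(g->m);
}

/* Poor man's CLASS(): lock on init, auto-unlock at end of scope. */
#define GUARD(name, mutex)						\
	struct guard name __attribute__((cleanup(guard_release))) =	\
		{ .m = (mutex) };					\
	pthread_mutex_lock(name.m)

static int critical(void)
{
	GUARD(g, &lock);
	/* Every return path from here drops the lock automatically. */
	return 1;
}

int main(void)
{
	printf("%d\n", critical());	/* lock released on return */
	return 0;
}

Compile with -pthread; gcc and clang both support the cleanup
attribute, which is exactly what the kernel's guard infrastructure is
built on.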
From: Peter Zijlstra
To: mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, clm@meta.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: [RFC][PATCH 4/5] sched: Add ttwu_queue controls
Date: Tue, 20 May 2025 11:45:42 +0200
Message-ID: <20250520101727.874587738@infradead.org>
References: <20250520094538.086709102@infradead.org>

There are two (soon three) callers of ttwu_queue_wakelist();
distinguish them with their own WF_ flags and add some knobs.

Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
---
 kernel/sched/core.c     |   22 ++++++++++++----------
 kernel/sched/features.h |    2 ++
 kernel/sched/sched.h    |    2 ++
 3 files changed, 16 insertions(+), 10 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3928,7 +3928,7 @@ bool cpus_share_resources(int this_cpu,
 	return per_cpu(sd_share_id, this_cpu) == per_cpu(sd_share_id, that_cpu);
 }
 
-static inline bool ttwu_queue_cond(struct task_struct *p, int cpu)
+static inline bool ttwu_queue_cond(struct task_struct *p, int cpu, bool def)
 {
 	/* See SCX_OPS_ALLOW_QUEUED_WAKEUP. */
 	if (!scx_allow_ttwu_queue(p))
@@ -3969,18 +3969,19 @@ static inline bool ttwu_queue_cond(struc
 	if (!cpu_rq(cpu)->nr_running)
 		return true;
 
-	return false;
+	return def;
 }
 
 static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
 {
-	if (sched_feat(TTWU_QUEUE) && ttwu_queue_cond(p, cpu)) {
-		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
-		__ttwu_queue_wakelist(p, cpu, wake_flags);
-		return true;
-	}
+	bool def = sched_feat(TTWU_QUEUE_DEFAULT);
 
-	return false;
+	if (!ttwu_queue_cond(p, cpu, def))
+		return false;
+
+	sched_clock_cpu(cpu); /* Sync clocks across CPUs */
+	__ttwu_queue_wakelist(p, cpu, wake_flags);
+	return true;
 }
 
 #else /* !CONFIG_SMP */
@@ -3997,7 +3998,7 @@ static void ttwu_queue(struct task_struc
 	struct rq *rq = cpu_rq(cpu);
 	struct rq_flags rf;
 
-	if (ttwu_queue_wakelist(p, cpu, wake_flags))
+	if (sched_feat(TTWU_QUEUE) && ttwu_queue_wakelist(p, cpu, wake_flags))
 		return;
 
 	rq_lock(rq, &rf);
@@ -4301,7 +4302,8 @@ int try_to_wake_up(struct task_struct *p
	 * scheduling.
	 */
 	if (smp_load_acquire(&p->on_cpu) &&
-	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags))
+	    sched_feat(TTWU_QUEUE_ON_CPU) &&
+	    ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_ON_CPU))
 		break;
 
 	cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -81,6 +81,8 @@ SCHED_FEAT(TTWU_QUEUE, false)
 */
 SCHED_FEAT(TTWU_QUEUE, true)
 #endif
+SCHED_FEAT(TTWU_QUEUE_ON_CPU, true)
+SCHED_FEAT(TTWU_QUEUE_DEFAULT, false)
 
 /*
 * When doing wakeups, attempt to limit superfluous scans of the LLC domain.
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2312,6 +2312,8 @@ static inline int task_on_rq_migrating(s
 #define WF_CURRENT_CPU	0x40 /* Prefer to move the wakee to the current CPU. */
 #define WF_RQ_SELECTED	0x80 /* ->select_task_rq() was called */
 
+#define WF_ON_CPU	0x0100
+
 #ifdef CONFIG_SMP
 static_assert(WF_EXEC == SD_BALANCE_EXEC);
 static_assert(WF_FORK == SD_BALANCE_FORK);
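Net effect of the knobs: try_to_wake_up() only attempts the wake-list
path for a still-running task when TTWU_QUEUE_ON_CPU is set, and
ttwu_queue_cond() now falls through to a caller-supplied default
(TTWU_QUEUE_DEFAULT) instead of a hard false. A compressed model of
that decision structure -- the two gates are folded into one function
and the positive conditions are reduced to two booleans, so this is a
sketch of the shape, not the kernel's logic:

#include <stdbool.h>
#include <stdio.h>

#define WF_ON_CPU	0x0100	/* value from the sched.h hunk above */

/* Knobs corresponding to the new sched_feat()s. */
static bool ttwu_queue_on_cpu = true;	/* TTWU_QUEUE_ON_CPU */
static bool ttwu_queue_default = false;	/* TTWU_QUEUE_DEFAULT */

/* Stand-in for ttwu_queue_cond(): the clear 'yes' cases return true,
 * everything else falls through to the caller-supplied default. */
static bool queue_cond(bool remote_llc, bool cpu_idle, bool def)
{
	if (remote_llc)
		return true;
	if (cpu_idle)
		return true;
	return def;
}

static bool queue_wakelist(int wake_flags, bool remote_llc, bool cpu_idle)
{
	/* In the kernel this gate sits at the try_to_wake_up() call site. */
	if ((wake_flags & WF_ON_CPU) && !ttwu_queue_on_cpu)
		return false;

	return queue_cond(remote_llc, cpu_idle, ttwu_queue_default);
}

int main(void)
{
	/* Same-LLC target, busy CPU: previously a hard false, now a knob. */
	printf("%d\n", queue_wakelist(WF_ON_CPU, false, false));
	return 0;
}

This is what the next patch exploits: a wake flag (WF_DELAYED) can flip
the default to true for one caller without touching the others.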
From: Peter Zijlstra
To: mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, clm@meta.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: [RFC][PATCH 5/5] sched: Add ttwu_queue support for delayed tasks
Date: Tue, 20 May 2025 11:45:43 +0200
Message-ID: <20250520101727.984171377@infradead.org>
References: <20250520094538.086709102@infradead.org>

One of the things lost with the introduction of DELAY_DEQUEUE is the
ability of TTWU to move those tasks around on wakeup, since they're
on_rq and, as such, need to be woken in-place.

Doing the in-place thing adds quite a bit of cross-CPU latency; add a
little something that gets remote CPUs to do their own in-place
wakeups, significantly reducing rq->lock contention.

Reported-by: Chris Mason
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
---
 kernel/sched/core.c     |   74 ++++++++++++++++++++++++++++++++++++++++++------
 kernel/sched/fair.c     |    5 ++-
 kernel/sched/features.h |    1 
 kernel/sched/sched.h    |    1 
 4 files changed, 72 insertions(+), 9 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3784,6 +3784,8 @@ static int __ttwu_runnable(struct rq *rq
 	return 1;
 }
 
+static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags);
+
 /*
 * Consider @p being inside a wait loop:
 *
@@ -3811,6 +3813,33 @@ static int __ttwu_runnable(struct rq *rq
 */
 static int ttwu_runnable(struct task_struct *p, int wake_flags)
 {
+#ifdef CONFIG_SMP
+	if (sched_feat(TTWU_QUEUE_DELAYED) && READ_ONCE(p->se.sched_delayed)) {
+		/*
+		 * Similar to try_to_block_task():
+		 *
+		 *       __schedule()			ttwu()
+		 *   prev_state = prev->state		  if (p->sched_delayed)
+		 *   if (prev_state)			    smp_acquire__after_ctrl_dep()
+		 *     try_to_block_task()		    p->state = TASK_WAKING
+		 *       ... set_delayed()
+		 *         RELEASE p->sched_delayed = 1
+		 *
+		 * __schedule() and ttwu() have matching control dependencies.
+		 *
+		 * Notably, once we observe sched_delayed we know the task has
+		 * passed try_to_block_task() and p->state is ours to modify.
+		 *
+		 * TASK_WAKING controls ttwu() concurrency.
+		 */
+		smp_acquire__after_ctrl_dep();
+		WRITE_ONCE(p->__state, TASK_WAKING);
+
+		if (ttwu_queue_wakelist(p, task_cpu(p), wake_flags | WF_DELAYED))
+			return 1;
+	}
+#endif
+
 	CLASS(__task_rq_lock, guard)(p);
 	return __ttwu_runnable(guard.rq, p, wake_flags);
 }
@@ -3830,12 +3859,41 @@ void sched_ttwu_pending(void *arg)
 	update_rq_clock(rq);
 
 	llist_for_each_entry_safe(p, t, llist, wake_entry.llist) {
+		struct rq *p_rq = task_rq(p);
+		int ret;
+
+		/*
+		 * This is the ttwu_runnable() case. Notably it is possible for
+		 * on-rq entities to get migrated -- even sched_delayed ones.
+		 */
+		if (unlikely(p_rq != rq)) {
+			rq_unlock(rq, &rf);
+			p_rq = __task_rq_lock(p, &rf);
+		}
+
+		ret = __ttwu_runnable(p_rq, p, WF_TTWU);
+
+		if (unlikely(p_rq != rq)) {
+			if (!ret)
+				set_task_cpu(p, cpu_of(rq));
+
+			__task_rq_unlock(p_rq, &rf);
+			rq_lock(rq, &rf);
+			update_rq_clock(rq);
+		}
+
+		if (ret) {
+			// XXX ttwu_stat()
+			continue;
+		}
+
+		/*
+		 * This is the 'normal' case where the task is blocked.
+		 */
+
 		if (WARN_ON_ONCE(p->on_cpu))
 			smp_cond_load_acquire(&p->on_cpu, !VAL);
 
-		if (WARN_ON_ONCE(task_cpu(p) != cpu_of(rq)))
-			set_task_cpu(p, cpu_of(rq));
-
 		ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0, &rf);
 	}
 
@@ -3974,7 +4032,7 @@ static inline bool ttwu_queue_cond(struc
 
 static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
 {
-	bool def = sched_feat(TTWU_QUEUE_DEFAULT);
+	bool def = sched_feat(TTWU_QUEUE_DEFAULT) || (wake_flags & WF_DELAYED);
 
 	if (!ttwu_queue_cond(p, cpu, def))
 		return false;
@@ -4269,8 +4327,8 @@ int try_to_wake_up(struct task_struct *p
	 * __schedule(). See the comment for smp_mb__after_spinlock().
	 *
	 * Form a control-dep-acquire with p->on_rq == 0 above, to ensure
-	 * schedule()'s deactivate_task() has 'happened' and p will no longer
-	 * care about it's own p->state. See the comment in __schedule().
+	 * schedule()'s try_to_block_task() has 'happened' and p will no longer
+	 * care about it's own p->state. See the comment in try_to_block_task().
	 */
 	smp_acquire__after_ctrl_dep();
 
@@ -6712,8 +6770,8 @@ static void __sched notrace __schedule(i
 	preempt = sched_mode == SM_PREEMPT;
 
 	/*
-	 * We must load prev->state once (task_struct::state is volatile), such
-	 * that we form a control dependency vs deactivate_task() below.
+	 * We must load prev->state once, such that we form a control
+	 * dependency vs try_to_block_task() below.
	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (sched_mode == SM_IDLE) {
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5395,7 +5395,10 @@ static __always_inline void return_cfs_r
 
 static void set_delayed(struct sched_entity *se)
 {
-	se->sched_delayed = 1;
+	/*
+	 * See TTWU_QUEUE_DELAYED in ttwu_runnable().
+	 */
+	smp_store_release(&se->sched_delayed, 1);
 
 	/*
	 * Delayed se of cfs_rq have no tasks queued on them.
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -82,6 +82,7 @@ SCHED_FEAT(TTWU_QUEUE, false)
 SCHED_FEAT(TTWU_QUEUE, true)
 #endif
 SCHED_FEAT(TTWU_QUEUE_ON_CPU, true)
+SCHED_FEAT(TTWU_QUEUE_DELAYED, false)
 SCHED_FEAT(TTWU_QUEUE_DEFAULT, false)
 
 /*
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2313,6 +2313,7 @@ static inline int task_on_rq_migrating(s
 #define WF_RQ_SELECTED	0x80 /* ->select_task_rq() was called */
 
 #define WF_ON_CPU	0x0100
+#define WF_DELAYED	0x0200
 
 #ifdef CONFIG_SMP
 static_assert(WF_EXEC == SD_BALANCE_EXEC);
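The ordering story in short: set_delayed() now publishes sched_delayed
with smp_store_release(), and ttwu_runnable() pairs that with
READ_ONCE() plus smp_acquire__after_ctrl_dep(), so once the waker
observes sched_delayed == 1 it also observes everything __schedule()
did before the release, making p->state safely its to write. A C11
approximation of that pairing -- an acquire load stands in for the
kernel's control dependency upgraded by smp_acquire__after_ctrl_dep(),
and the names are invented for the example:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int sched_delayed;
static int task_state;	/* ordered by the release/acquire pair below */

/* __schedule() side: everything written before the release store is
 * visible to whoever observes sched_delayed == 1. */
static void *blocker(void *arg)
{
	task_state = 42;	/* "try_to_block_task() has happened" */
	atomic_store_explicit(&sched_delayed, 1, memory_order_release);
	return NULL;
}

/* ttwu() side: the kernel spells this as READ_ONCE() plus a control
 * dependency upgraded to acquire; plain C11 uses an acquire load. */
static void *waker(void *arg)
{
	while (!atomic_load_explicit(&sched_delayed, memory_order_acquire))
		;	/* spin until the delayed flag is published */
	printf("state seen by waker: %d\n", task_state);	/* always 42 */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, blocker, NULL);
	pthread_create(&b, NULL, waker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Compile with -pthread. Without the release/acquire pair the waker could
see sched_delayed == 1 yet a stale task_state, which is exactly the
window the WRITE_ONCE(p->__state, TASK_WAKING) in ttwu_runnable() must
not race through.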