From nobody Thu Sep 11 01:28:20 2025
Message-ID: <20250910155809.916720757@infradead.org>
User-Agent: quilt/0.68
Date: Wed, 10 Sep 2025 17:44:23 +0200
From: Peter Zijlstra
To: tj@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
 rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com, void@manifault.com,
 arighi@nvidia.com, changwoo@igalia.com, cgroups@vger.kernel.org,
 sched-ext@lists.linux.dev, liuwenfang@honor.com, tglx@linutronix.de
Subject: [PATCH 14/14] sched/ext: Implement p->srq_lock support
References: <20250910154409.446470175@infradead.org>
Have enqueue set p->srq_lock to &dsq->lock and have dequeue clear it,
when the destination DSQ is non-local.

When enqueue sees ENQUEUE_LOCKED, it must take dsq->lock (since
p->srq_lock will be NULL on entry) but must not unlock on exit, since it
sets p->srq_lock.

When dequeue sees DEQUEUE_LOCKED, it must not take dsq->lock when
p->srq_lock is set (instead it must verify that they are the same lock),
but it must unlock on exit, since it will have cleared p->srq_lock.

For a DEQUEUE_SAVE/ENQUEUE_RESTORE pair, p->srq_lock can be retained,
since the extra unlock+lock cycle is pointless.

Note: set_next_task_scx() relies on LOCKED to avoid self-recursion on
dsq->lock in the enqueue_task/set_next_task case.

Signed-off-by: Peter Zijlstra (Intel)
---
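Not part of the patch, purely illustrative: a minimal userspace model of
the lock hand-off described above. pthread_mutex_t stands in for
raw_spinlock_t, Q_LOCKED/Q_SAVE stand in for the ENQUEUE/DEQUEUE flag
bits, and every type, field and function name below is made up for the
sketch.

/*
 * Userspace model of the p->srq_lock hand-off (illustrative only, not
 * kernel code).
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define Q_LOCKED	0x1	/* caller already holds the queue lock */
#define Q_SAVE		0x2	/* models DEQUEUE_SAVE / ENQUEUE_RESTORE */

struct dsq {
	pthread_mutex_t lock;
};

struct task {
	struct dsq *dsq;
	pthread_mutex_t *srq_lock;	/* models p->srq_lock */
};

/* Enqueue: take dsq->lock unless LOCKED and p->srq_lock says the caller
 * already holds it; record the lock; only drop it again when !LOCKED. */
static void model_enqueue(struct task *p, struct dsq *dsq, int flags)
{
	if (!(flags & Q_LOCKED) || !p->srq_lock)
		pthread_mutex_lock(&dsq->lock);

	p->dsq = dsq;
	p->srq_lock = &dsq->lock;	/* publish which lock covers @p */

	if (!(flags & Q_LOCKED))
		pthread_mutex_unlock(&dsq->lock);
	/* else: return with dsq->lock held, tracked via p->srq_lock */
}

/* Dequeue: with LOCKED the lock is not re-taken (the caller holds it and
 * it must be the one recorded in p->srq_lock); it is dropped on exit
 * once p->srq_lock has been cleared, i.e. unless SAVE retains it for a
 * later RESTORE. */
static void model_dequeue(struct task *p, int flags)
{
	struct dsq *dsq = p->dsq;

	if (!(flags & Q_LOCKED))
		pthread_mutex_lock(&dsq->lock);
	else
		assert(p->srq_lock == &dsq->lock);

	p->dsq = NULL;
	if (!(flags & Q_SAVE))
		p->srq_lock = NULL;

	if (!(flags & Q_LOCKED) || !p->srq_lock)
		pthread_mutex_unlock(&dsq->lock);
}

int main(void)
{
	struct dsq q = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct task t = { 0 };

	/* Plain cycle: lock taken and dropped inside each call. */
	model_enqueue(&t, &q, 0);
	model_dequeue(&t, 0);

	/* LOCKED cycle around a SAVE/RESTORE pair: the lock is taken
	 * once, carried across the pair, and dropped at the end. */
	model_enqueue(&t, &q, Q_LOCKED);	/* returns with q.lock held */
	model_dequeue(&t, Q_LOCKED | Q_SAVE);	/* keeps q.lock held */
	model_enqueue(&t, &q, Q_LOCKED);	/* restore: no re-lock */
	model_dequeue(&t, Q_LOCKED);		/* clears srq_lock, drops q.lock */

	return 0;
}

The main() walk-through mirrors the Note above: the lock is taken once by
the first LOCKED enqueue, survives the SAVE/RESTORE pair, and is only
dropped by the final LOCKED dequeue once p->srq_lock has been cleared.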
 kernel/sched/ext.c |   68 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 47 insertions(+), 21 deletions(-)

--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1952,13 +1952,16 @@ static void dispatch_enqueue(struct scx_
 				  struct task_struct *p, u64 enq_flags)
 {
 	bool is_local = dsq->id == SCX_DSQ_LOCAL;
+	bool locked = enq_flags & ENQUEUE_LOCKED;
+	bool restore = enq_flags & ENQUEUE_RESTORE;
 
 	WARN_ON_ONCE(p->scx.dsq || !list_empty(&p->scx.dsq_list.node));
 	WARN_ON_ONCE((p->scx.dsq_flags & SCX_TASK_DSQ_ON_PRIQ) ||
 		     !RB_EMPTY_NODE(&p->scx.dsq_priq));
 
 	if (!is_local) {
-		raw_spin_lock(&dsq->lock);
+		if (!locked || !p->srq_lock)
+			raw_spin_lock(&dsq->lock);
 		if (unlikely(dsq->id == SCX_DSQ_INVALID)) {
 			scx_error(sch, "attempting to dispatch to a destroyed dsq");
 			/* fall back to the global dsq */
@@ -2028,6 +2031,10 @@ static void dispatch_enqueue(struct scx_
 
 	dsq_mod_nr(dsq, 1);
 	p->scx.dsq = dsq;
+	if (!is_local) {
+		WARN_ON_ONCE(locked && restore && p->srq_lock && p->srq_lock != &dsq->lock);
+		p->srq_lock = &dsq->lock;
+	}
 
 	/*
 	 * scx.ddsp_dsq_id and scx.ddsp_enq_flags are only relevant on the
@@ -2059,13 +2066,17 @@ static void dispatch_enqueue(struct scx_
 				      rq->curr->sched_class))
 			resched_curr(rq);
 	} else {
-		raw_spin_unlock(&dsq->lock);
+		if (!locked)
+			raw_spin_unlock(&dsq->lock);
 	}
 }
 
 static void task_unlink_from_dsq(struct task_struct *p,
-				 struct scx_dispatch_q *dsq)
+				 struct scx_dispatch_q *dsq,
+				 int deq_flags)
 {
+	bool save = deq_flags & DEQUEUE_SAVE;
+
 	WARN_ON_ONCE(list_empty(&p->scx.dsq_list.node));
 
 	if (p->scx.dsq_flags & SCX_TASK_DSQ_ON_PRIQ) {
@@ -2076,12 +2087,15 @@ static void task_unlink_from_dsq(struct
 
 	list_del_init(&p->scx.dsq_list.node);
 	dsq_mod_nr(dsq, -1);
+	if (!save)
+		p->srq_lock = NULL;
 }
 
-static void dispatch_dequeue(struct rq *rq, struct task_struct *p)
+static void dispatch_dequeue(struct rq *rq, struct task_struct *p, int deq_flags)
 {
 	struct scx_dispatch_q *dsq = p->scx.dsq;
 	bool is_local = dsq == &rq->scx.local_dsq;
+	bool locked = deq_flags & DEQUEUE_LOCKED;
 
 	if (!dsq) {
 		/*
@@ -2103,8 +2117,10 @@ static void dispatch_dequeue(struct rq *
 		return;
 	}
 
-	if (!is_local)
-		raw_spin_lock(&dsq->lock);
+	if (!is_local) {
+		if (!locked)
+			raw_spin_lock(&dsq->lock);
+	}
 
 	/*
 	 * Now that we hold @dsq->lock, @p->holding_cpu and @p->scx.dsq_* can't
@@ -2112,7 +2128,8 @@ static void dispatch_dequeue(struct rq *
 	 */
 	if (p->scx.holding_cpu < 0) {
 		/* @p must still be on @dsq, dequeue */
-		task_unlink_from_dsq(p, dsq);
+		WARN_ON_ONCE(!is_local && !p->srq_lock);
+		task_unlink_from_dsq(p, dsq, deq_flags);
 	} else {
 		/*
 		 * We're racing against dispatch_to_local_dsq() which already
@@ -2125,8 +2142,10 @@ static void dispatch_dequeue(struct rq *
 	}
 	p->scx.dsq = NULL;
 
-	if (!is_local)
-		raw_spin_unlock(&dsq->lock);
+	if (!is_local) {
+		if (!locked || !p->srq_lock)
+			raw_spin_unlock(&dsq->lock);
+	}
 }
 
 static struct scx_dispatch_q *find_dsq_for_dispatch(struct scx_sched *sch,
@@ -2372,7 +2391,7 @@ static void clr_task_runnable(struct tas
 	p->scx.flags |= SCX_TASK_RESET_RUNNABLE_AT;
 }
 
-static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first);
+static void __set_next_task_scx(struct rq *rq, struct task_struct *p, u32 qf);
 
 static void enqueue_task_scx(struct rq *rq, struct task_struct *p, u32 enq_flags)
 {
@@ -2421,7 +2440,7 @@ static void enqueue_task_scx(struct rq *
 		__scx_add_event(sch, SCX_EV_SELECT_CPU_FALLBACK, 1);
 
 	if (enq_flags & ENQUEUE_CURR)
-		set_next_task_scx(rq, p, false);
+		__set_next_task_scx(rq, p, enq_flags);
 }
 
 static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
@@ -2516,7 +2535,7 @@ static bool dequeue_task_scx(struct rq *
 	rq->scx.nr_running--;
 	sub_nr_running(rq, 1);
 
-	dispatch_dequeue(rq, p);
+	dispatch_dequeue(rq, p, deq_flags);
 
 out:
 	if (deq_flags & DEQUEUE_CURR)
@@ -2710,7 +2729,7 @@ static bool unlink_dsq_and_lock_src_rq(s
 	lockdep_assert_held(&dsq->lock);
 
 	WARN_ON_ONCE(p->scx.holding_cpu >= 0);
-	task_unlink_from_dsq(p, dsq);
+	task_unlink_from_dsq(p, dsq, 0);
 	p->scx.holding_cpu = cpu;
 
 	raw_spin_unlock(&dsq->lock);
@@ -2782,7 +2801,7 @@ static struct rq *move_task_between_dsqs
 	if (dst_dsq->id == SCX_DSQ_LOCAL) {
 		/* @p is going from a non-local DSQ to a local DSQ */
 		if (src_rq == dst_rq) {
-			task_unlink_from_dsq(p, src_dsq);
+			task_unlink_from_dsq(p, src_dsq, 0);
 			move_local_task_to_local_dsq(p, enq_flags,
 						     src_dsq, dst_rq);
 			raw_spin_unlock(&src_dsq->lock);
@@ -2796,7 +2815,7 @@ static struct rq *move_task_between_dsqs
 		 * @p is going from a non-local DSQ to a non-local DSQ. As
 		 * $src_dsq is already locked, do an abbreviated dequeue.
 		 */
-		task_unlink_from_dsq(p, src_dsq);
+		task_unlink_from_dsq(p, src_dsq, 0);
 		p->scx.dsq = NULL;
 		raw_spin_unlock(&src_dsq->lock);
 
@@ -2862,7 +2881,7 @@ static bool consume_dispatch_q(struct sc
 		struct rq *task_rq = task_rq(p);
 
 		if (rq == task_rq) {
-			task_unlink_from_dsq(p, dsq);
+			task_unlink_from_dsq(p, dsq, 0);
 			move_local_task_to_local_dsq(p, 0, dsq, rq);
 			raw_spin_unlock(&dsq->lock);
 			return true;
@@ -3256,7 +3275,7 @@ static void process_ddsp_deferred_locals
 	}
 }
 
-static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first)
+static void __set_next_task_scx(struct rq *rq, struct task_struct *p, u32 qf)
 {
 	struct scx_sched *sch = scx_root;
 
@@ -3266,7 +3285,7 @@ static void set_next_task_scx(struct rq
 		 * dispatched. Call ops_dequeue() to notify the BPF scheduler.
 		 */
 		ops_dequeue(rq, p, SCX_DEQ_CORE_SCHED_EXEC);
-		dispatch_dequeue(rq, p);
+		dispatch_dequeue(rq, p, qf);
 	}
 
 	p->se.exec_start = rq_clock_task(rq);
@@ -3300,6 +3319,11 @@ static void set_next_task_scx(struct rq
 	}
 }
 
+static void set_next_task_scx(struct rq *rq, struct task_struct *p, bool first)
+{
+	__set_next_task_scx(rq, p, 0);
+}
+
 static enum scx_cpu_preempt_reason
 preempt_reason_from_class(const struct sched_class *class)
 {
@@ -5012,7 +5036,8 @@ static void scx_disable_workfn(struct kt
 
 	scx_task_iter_start(&sti);
 	while ((p = scx_task_iter_next_locked(&sti))) {
-		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
+		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE |
+					   DEQUEUE_NOCLOCK | DEQUEUE_LOCKED;
 		const struct sched_class *old_class = p->sched_class;
 		const struct sched_class *new_class =
 			__setscheduler_class(p->policy, p->prio);
@@ -5756,7 +5781,8 @@ static int scx_enable(struct sched_ext_o
 	percpu_down_write(&scx_fork_rwsem);
 	scx_task_iter_start(&sti);
 	while ((p = scx_task_iter_next_locked(&sti))) {
-		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
+		unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE |
+					   DEQUEUE_NOCLOCK | DEQUEUE_LOCKED;
 		const struct sched_class *old_class = p->sched_class;
 		const struct sched_class *new_class =
 			__setscheduler_class(p->policy, p->prio);
@@ -6808,7 +6834,7 @@ __bpf_kfunc u32 scx_bpf_reenqueue_local(
 		if (p->migration_pending || is_migration_disabled(p) || p->nr_cpus_allowed == 1)
 			continue;
 
-		dispatch_dequeue(rq, p);
+		dispatch_dequeue(rq, p, 0);
 		list_add_tail(&p->scx.dsq_list.node, &tasks);
 	}