Message-ID: <20251127154725.413564507@infradead.org>
Date: Thu, 27 Nov 2025 16:39:44 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, void@manifault.com,
    arighi@nvidia.com, changwoo@igalia.com, sched-ext@lists.linux.dev
Subject: [PATCH 1/5] sched/fair: Fold the sched_avg update
References: <20251127153943.696191429@infradead.org>
Nine (and a half) instances of the same pattern is just silly; fold the lot.

Notably, the half instance in enqueue_load_avg() is right after setting
cfs_rq->avg.load_sum to cfs_rq->avg.load_avg * get_pelt_divider(&cfs_rq->avg).
Since get_pelt_divider() >= PELT_MIN_DIVIDER, this ends up being a no-op
change.

Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/compiler_types.h |   19 +++++++
 kernel/sched/fair.c            |  108 ++++++++++++----------------------
 2 files changed, 51 insertions(+), 76 deletions(-)

--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -558,6 +558,25 @@ struct ftrace_likely_data {
         __scalar_type_to_expr_cases(long long),         \
         default: (x)))
 
+/*
+ * __signed_scalar_typeof(x) - Declare a signed scalar type, leaving
+ * non-scalar types unchanged.
+ */
+
+#define __scalar_type_to_signed_cases(type)             \
+        unsigned type: (signed type)0,                  \
+        signed type: (signed type)0
+
+#define __signed_scalar_typeof(x) typeof(               \
+        _Generic((x),                                   \
+                char: (signed char)0,                   \
+                __scalar_type_to_signed_cases(char),    \
+                __scalar_type_to_signed_cases(short),   \
+                __scalar_type_to_signed_cases(int),     \
+                __scalar_type_to_signed_cases(long),    \
+                __scalar_type_to_signed_cases(long long), \
+                default: (x)))
+
 /* Is this type a native word size -- useful for atomic operations */
 #define __native_word(t) \
         (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3693,7 +3693,7 @@ account_entity_dequeue(struct cfs_rq *cf
  */
 #define add_positive(_ptr, _val) do {                           \
         typeof(_ptr) ptr = (_ptr);                              \
-        typeof(_val) val = (_val);                              \
+        __signed_scalar_typeof(*ptr) val = (_val);              \
         typeof(*ptr) res, var = READ_ONCE(*ptr);                \
                                                                 \
         res = var + val;                                        \
@@ -3705,23 +3705,6 @@ account_entity_dequeue(struct cfs_rq *cf
 } while (0)
 
 /*
- * Unsigned subtract and clamp on underflow.
- *
- * Explicitly do a load-store to ensure the intermediate value never hits
- * memory. This allows lockless observations without ever seeing the negative
- * values.
- */
-#define sub_positive(_ptr, _val) do {                           \
-        typeof(_ptr) ptr = (_ptr);                              \
-        typeof(*ptr) val = (_val);                              \
-        typeof(*ptr) res, var = READ_ONCE(*ptr);                \
-        res = var - val;                                        \
-        if (res > var)                                          \
-                res = 0;                                        \
-        WRITE_ONCE(*ptr, res);                                  \
-} while (0)
-
-/*
  * Remove and clamp on negative, from a local variable.
  *
  * A variant of sub_positive(), which does not use explicit load-store
@@ -3732,21 +3715,37 @@ account_entity_dequeue(struct cfs_rq *cf
         *ptr -= min_t(typeof(*ptr), *ptr, _val);                \
 } while (0)
 
+
+/*
+ * Because of rounding, se->util_sum might ends up being +1 more than
+ * cfs->util_sum. Although this is not a problem by itself, detaching
+ * a lot of tasks with the rounding problem between 2 updates of
+ * util_avg (~1ms) can make cfs->util_sum becoming null whereas
+ * cfs_util_avg is not.
+ *
+ * Check that util_sum is still above its lower bound for the new
+ * util_avg. Given that period_contrib might have moved since the last
+ * sync, we are only sure that util_sum must be above or equal to
+ *    util_avg * minimum possible divider
+ */
+#define __update_sa(sa, name, delta_avg, delta_sum) do {                \
+        add_positive(&(sa)->name##_avg, delta_avg);                     \
+        add_positive(&(sa)->name##_sum, delta_sum);                     \
+        (sa)->name##_sum = max_t(typeof((sa)->name##_sum),              \
+                                 (sa)->name##_sum,                      \
+                                 (sa)->name##_avg * PELT_MIN_DIVIDER);  \
+} while (0)
+
 static inline void
 enqueue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-        cfs_rq->avg.load_avg += se->avg.load_avg;
-        cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum;
+        __update_sa(&cfs_rq->avg, load, se->avg.load_avg, se->avg.load_sum);
 }
 
 static inline void
 dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-        sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
-        sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
-        /* See update_cfs_rq_load_avg() */
-        cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
-                                          cfs_rq->avg.load_avg * PELT_MIN_DIVIDER);
+        __update_sa(&cfs_rq->avg, load, -se->avg.load_avg, -se->avg.load_sum);
 }
 
 static void place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags);
@@ -4239,7 +4238,6 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq
          */
         divider = get_pelt_divider(&cfs_rq->avg);
 
-        /* Set new sched_entity's utilization */
         se->avg.util_avg = gcfs_rq->avg.util_avg;
         new_sum = se->avg.util_avg * divider;
@@ -4247,12 +4245,7 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq
         se->avg.util_sum = new_sum;
 
         /* Update parent cfs_rq utilization */
-        add_positive(&cfs_rq->avg.util_avg, delta_avg);
-        add_positive(&cfs_rq->avg.util_sum, delta_sum);
-
-        /* See update_cfs_rq_load_avg() */
-        cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
-                                          cfs_rq->avg.util_avg * PELT_MIN_DIVIDER);
+        __update_sa(&cfs_rq->avg, util, delta_avg, delta_sum);
 }
 
 static inline void
@@ -4278,11 +4271,7 @@ update_tg_cfs_runnable(struct cfs_rq *cf
         se->avg.runnable_sum = new_sum;
 
         /* Update parent cfs_rq runnable */
-        add_positive(&cfs_rq->avg.runnable_avg, delta_avg);
-        add_positive(&cfs_rq->avg.runnable_sum, delta_sum);
-        /* See update_cfs_rq_load_avg() */
-        cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
-                                              cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER);
+        __update_sa(&cfs_rq->avg, runnable, delta_avg, delta_sum);
 }
 
 static inline void
@@ -4346,11 +4335,7 @@ update_tg_cfs_load(struct cfs_rq *cfs_rq
 
         se->avg.load_sum = runnable_sum;
         se->avg.load_avg = load_avg;
-        add_positive(&cfs_rq->avg.load_avg, delta_avg);
-        add_positive(&cfs_rq->avg.load_sum, delta_sum);
-        /* See update_cfs_rq_load_avg() */
-        cfs_rq->avg.load_sum = max_t(u32, cfs_rq->avg.load_sum,
-                                          cfs_rq->avg.load_avg * PELT_MIN_DIVIDER);
+        __update_sa(&cfs_rq->avg, load, delta_avg, delta_sum);
 }
 
 static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum)
@@ -4549,33 +4534,13 @@ update_cfs_rq_load_avg(u64 now, struct c
                 raw_spin_unlock(&cfs_rq->removed.lock);
 
                 r = removed_load;
-                sub_positive(&sa->load_avg, r);
-                sub_positive(&sa->load_sum, r * divider);
-                /* See sa->util_sum below */
-                sa->load_sum = max_t(u32, sa->load_sum, sa->load_avg * PELT_MIN_DIVIDER);
+                __update_sa(sa, load, -r, -r*divider);
 
                 r = removed_util;
-                sub_positive(&sa->util_avg, r);
-                sub_positive(&sa->util_sum, r * divider);
-                /*
-                 * Because of rounding, se->util_sum might ends up being +1 more than
-                 * cfs->util_sum. Although this is not a problem by itself, detaching
-                 * a lot of tasks with the rounding problem between 2 updates of
-                 * util_avg (~1ms) can make cfs->util_sum becoming null whereas
-                 * cfs_util_avg is not.
-                 * Check that util_sum is still above its lower bound for the new
-                 * util_avg. Given that period_contrib might have moved since the last
-                 * sync, we are only sure that util_sum must be above or equal to
-                 *    util_avg * minimum possible divider
-                 */
-                sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * PELT_MIN_DIVIDER);
+                __update_sa(sa, util, -r, -r*divider);
 
                 r = removed_runnable;
-                sub_positive(&sa->runnable_avg, r);
-                sub_positive(&sa->runnable_sum, r * divider);
-                /* See sa->util_sum above */
-                sa->runnable_sum = max_t(u32, sa->runnable_sum,
-                                              sa->runnable_avg * PELT_MIN_DIVIDER);
+                __update_sa(sa, runnable, -r, -r*divider);
 
                 /*
                  * removed_runnable is the unweighted version of removed_load so we
@@ -4660,17 +4625,8 @@ static void attach_entity_load_avg(struc
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
         dequeue_load_avg(cfs_rq, se);
-        sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
-        sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
-        /* See update_cfs_rq_load_avg() */
-        cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
-                                          cfs_rq->avg.util_avg * PELT_MIN_DIVIDER);
-
-        sub_positive(&cfs_rq->avg.runnable_avg, se->avg.runnable_avg);
-        sub_positive(&cfs_rq->avg.runnable_sum, se->avg.runnable_sum);
-        /* See update_cfs_rq_load_avg() */
-        cfs_rq->avg.runnable_sum = max_t(u32, cfs_rq->avg.runnable_sum,
-                                              cfs_rq->avg.runnable_avg * PELT_MIN_DIVIDER);
+        __update_sa(&cfs_rq->avg, util, -se->avg.util_avg, -se->avg.util_sum);
+        __update_sa(&cfs_rq->avg, runnable, -se->avg.runnable_avg, -se->avg.runnable_sum);
 
         add_tg_cfs_propagate(cfs_rq, -se->avg.load_sum);
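
A quick aside on the mechanics, since the fold leans entirely on the new macro:
below is a small stand-alone, user-space sketch (not part of the patch; build
with gcc or clang since typeof is a GNU extension, and the values are made up
for illustration) of how __signed_scalar_typeof() resolves an unsigned
accumulator to its signed counterpart, which is what lets add_positive() take
negative deltas and thereby absorb sub_positive().

#include <stdio.h>

#define __scalar_type_to_signed_cases(type)     \
        unsigned type: (signed type)0,          \
        signed type: (signed type)0

#define __signed_scalar_typeof(x) typeof(               \
        _Generic((x),                                   \
                char: (signed char)0,                   \
                __scalar_type_to_signed_cases(char),    \
                __scalar_type_to_signed_cases(short),   \
                __scalar_type_to_signed_cases(int),     \
                __scalar_type_to_signed_cases(long),    \
                __scalar_type_to_signed_cases(long long), \
                default: (x)))

int main(void)
{
        unsigned long avg = 10;                 /* stand-in for an unsigned *_avg field */
        __signed_scalar_typeof(avg) val = -15;  /* resolves to 'long', keeps the sign */
        unsigned long res = avg + val;          /* wraps around on underflow */

        if (val < 0 && res > avg)               /* the add_positive() style clamp */
                res = 0;

        printf("res = %lu\n", res);             /* prints 0 instead of a wrapped value */
        return 0;
}

With the old typeof(_val), a negative delta passed as an unsigned expression
would silently lose its sign; the _Generic mapping is what makes the single
add_positive() path safe for both directions.
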
header.b="KAftj7Kc" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=Fg/dshgqdIk9MpN0Q/eQ+5nOs/fbqHzIMhoq7/G0x7Q=; b=KAftj7Kc7WT3PAypmJH+7yDeqx GUT0RRLjsZf5uMQe0p1lDEWpfcL5gKFXMrthwp13UCkWZxsYIZAniRAaulN75i4go7gRdhfZT522b wdi+A64zEh4qqrkaIJdVP8qqzGfTX/8OWZNyVP3QBeY1zPCQI5faxCROIJo4DIrhaT2o/J0Xcdinu 92GE1moKnQrZWPneI6EhmhcRF4P8M6wWUEObmwr7BHL4bbygySOQTnpt+cJx3o9L/OLJez+CcbEbu WpW4RKduIDNGTjqPtzNs3XoOZIS+mCn8aKaCFvY5hiJ6lzh2t4ULXB4ZscmgJOMCAknDZTk3sVtqj yB/hVB1g==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1vOdMT-0000000AP1y-3J2p; Thu, 27 Nov 2025 14:52:46 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id AAA7D300F1A; Thu, 27 Nov 2025 16:48:06 +0100 (CET) Message-ID: <20251127154725.532469061@infradead.org> User-Agent: quilt/0.68 Date: Thu, 27 Nov 2025 16:39:45 +0100 From: Peter Zijlstra To: mingo@kernel.org, vincent.guittot@linaro.org Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, void@manifault.com, arighi@nvidia.com, changwoo@igalia.com, sched-ext@lists.linux.dev Subject: [PATCH 2/5] sched/fair: Avoid rq->lock bouncing in sched_balance_newidle() References: <20251127153943.696191429@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" While poking at this code recently I noted we do a pointless unlock+lock cycle in sched_balance_newidle(). We drop the rq->lock (so we can balance) but then instantly grab the same rq->lock again in sched_balance_update_blocked_averages(). 
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Shrikanth Hegde
---
 kernel/sched/fair.c |   27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9902,15 +9902,11 @@ static unsigned long task_h_load(struct
 }
 #endif /* !CONFIG_FAIR_GROUP_SCHED */
 
-static void sched_balance_update_blocked_averages(int cpu)
+static void __sched_balance_update_blocked_averages(struct rq *rq)
 {
         bool decayed = false, done = true;
-        struct rq *rq = cpu_rq(cpu);
-        struct rq_flags rf;
 
-        rq_lock_irqsave(rq, &rf);
         update_blocked_load_tick(rq);
-        update_rq_clock(rq);
 
         decayed |= __update_blocked_others(rq, &done);
         decayed |= __update_blocked_fair(rq, &done);
@@ -9918,7 +9914,15 @@ static void sched_balance_update_blocked
         update_blocked_load_status(rq, !done);
         if (decayed)
                 cpufreq_update_util(rq, 0);
-        rq_unlock_irqrestore(rq, &rf);
+}
+
+static void sched_balance_update_blocked_averages(int cpu)
+{
+        struct rq *rq = cpu_rq(cpu);
+
+        guard(rq_lock_irqsave)(rq);
+        update_rq_clock(rq);
+        __sched_balance_update_blocked_averages(rq);
 }
 
 /********** Helpers for sched_balance_find_src_group ************************/
@@ -12865,12 +12869,17 @@ static int sched_balance_newidle(struct
         }
         rcu_read_unlock();
 
+        /*
+         * Include sched_balance_update_blocked_averages() in the cost
+         * calculation because it can be quite costly -- this ensures we skip
+         * it when avg_idle gets to be very low.
+         */
+        t0 = sched_clock_cpu(this_cpu);
+        __sched_balance_update_blocked_averages(this_rq);
+
         rq_modified_clear(this_rq);
         raw_spin_rq_unlock(this_rq);
 
-        t0 = sched_clock_cpu(this_cpu);
-        sched_balance_update_blocked_averages(this_cpu);
-
         rcu_read_lock();
         for_each_domain(this_cpu, sd) {
                 u64 domain_cost;
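
The shape of this change is the usual "__helper plus locking wrapper" split.
A minimal user-space analogue (pthreads; all names below are illustrative and
not kernel APIs) of what sched_balance_newidle() gains by calling the
already-locked inner helper directly:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long blocked_load = 1024;

/* Caller must already hold rq_lock (the __ prefix marks the lockless entry). */
static void __update_blocked_averages(void)
{
        blocked_load -= blocked_load >> 2;      /* stand-in for the decay work */
}

/* Standalone entry point: takes and releases the lock itself. */
static void update_blocked_averages(void)
{
        pthread_mutex_lock(&rq_lock);
        __update_blocked_averages();
        pthread_mutex_unlock(&rq_lock);
}

/*
 * newidle-style caller: already holds rq_lock, so it calls the inner helper
 * directly instead of unlocking just so the wrapper can relock.
 */
static void newidle_balance(void)
{
        pthread_mutex_lock(&rq_lock);
        __update_blocked_averages();            /* no unlock+lock bounce */
        pthread_mutex_unlock(&rq_lock);         /* drop the lock once, to go balance */
}

int main(void)
{
        update_blocked_averages();
        newidle_balance();
        printf("blocked_load = %lu\n", blocked_load);
        return 0;
}

The kernel version additionally folds the helper's runtime into the
newidle-balance cost accounting, as the added comment in the patch explains.
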
Message-ID: <20251127154725.647502625@infradead.org>
Date: Thu, 27 Nov 2025 16:39:46 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, void@manifault.com,
    arighi@nvidia.com, changwoo@igalia.com, sched-ext@lists.linux.dev
Subject: [PATCH 3/5] sched: Change rcu_dereference_check_sched_domain() to rcu-sched
References: <20251127153943.696191429@infradead.org>

By changing rcu_dereference_check_sched_domain() to use
rcu_dereference_sched_check(), it also considers preempt_disable() to be
equivalent to rcu_read_lock().

Since rcu fully implies rcu_sched, this results in absolutely no change in
behaviour, but it does allow removing a bunch of otherwise redundant
rcu_read_lock() noise.
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c  |    9 +--------
 kernel/sched/sched.h |    2 +-
 2 files changed, 2 insertions(+), 9 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12853,21 +12853,16 @@ static int sched_balance_newidle(struct
          */
         rq_unpin_lock(this_rq, rf);
 
-        rcu_read_lock();
         sd = rcu_dereference_check_sched_domain(this_rq->sd);
-        if (!sd) {
-                rcu_read_unlock();
+        if (!sd)
                 goto out;
-        }
 
         if (!get_rd_overloaded(this_rq->rd) ||
             this_rq->avg_idle < sd->max_newidle_lb_cost) {
 
                 update_next_balance(sd, &next_balance);
-                rcu_read_unlock();
                 goto out;
         }
-        rcu_read_unlock();
 
         /*
          * Include sched_balance_update_blocked_averages() in the cost
@@ -12880,7 +12875,6 @@ static int sched_balance_newidle(struct
         rq_modified_clear(this_rq);
         raw_spin_rq_unlock(this_rq);
 
-        rcu_read_lock();
         for_each_domain(this_cpu, sd) {
                 u64 domain_cost;
 
@@ -12930,7 +12924,6 @@ static int sched_balance_newidle(struct
                 if (pulled_task || !continue_balancing)
                         break;
         }
-        rcu_read_unlock();
 
         raw_spin_rq_lock(this_rq);
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2009,7 +2009,7 @@ queue_balance_callback(struct rq *rq,
 }
 
 #define rcu_dereference_check_sched_domain(p) \
-        rcu_dereference_check((p), lockdep_is_held(&sched_domains_mutex))
+        rcu_dereference_sched_check((p), lockdep_is_held(&sched_domains_mutex))
 
 /*
  * The domain tree (rq->sd) is protected by RCU's quiescent state transition.
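
As a toy model of what the rcu-sched flavour of the check buys (user-space,
with plain booleans standing in for lockdep state; none of these names are
kernel APIs): a caller that already runs with preemption disabled, for
instance because it holds the runqueue lock, satisfies the relaxed check
without needing an explicit rcu_read_lock()/rcu_read_unlock() pair.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool rcu_reader_active;      /* models "rcu_read_lock() held" */
static bool preemption_disabled;    /* models e.g. "runqueue lock held" */
static bool holds_domains_mutex;    /* models "sched_domains_mutex held" */

/* Old-style check: only an explicit RCU reader or the mutex satisfies it. */
static int *deref_check(int *p)
{
        assert(rcu_reader_active || holds_domains_mutex);
        return p;
}

/* rcu-sched style check: a preempt-off section also qualifies. */
static int *deref_sched_check(int *p)
{
        assert(rcu_reader_active || preemption_disabled || holds_domains_mutex);
        return p;
}

int main(void)
{
        int sd = 42;

        rcu_reader_active = true;                /* explicit reader: both happy */
        printf("%d\n", *deref_check(&sd));
        rcu_reader_active = false;

        preemption_disabled = true;              /* "rq->lock held", no reader section */
        printf("%d\n", *deref_sched_check(&sd)); /* fine with the sched flavour */
        return 0;
}

That is why the patch can simply delete the rcu_read_lock()/unlock() pairs in
sched_balance_newidle(): the runqueue lock already keeps preemption off there.
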
Message-ID: <20251127154725.771691954@infradead.org>
Date: Thu, 27 Nov 2025 16:39:47 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, void@manifault.com,
    arighi@nvidia.com, changwoo@igalia.com, sched-ext@lists.linux.dev
Subject: [PATCH 4/5] sched: Add assertions to QUEUE_CLASS
References: <20251127153943.696191429@infradead.org>

Add some checks to the sched_change pattern to validate assumptions around
changing classes.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/core.c  |   13 +++++++++++++
 kernel/sched/sched.h |    1 +
 2 files changed, 14 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10806,6 +10806,7 @@ struct sched_change_ctx *sched_change_be
 
         *ctx = (struct sched_change_ctx){
                 .p = p,
+                .class = p->sched_class,
                 .flags = flags,
                 .queued = task_on_rq_queued(p),
                 .running = task_current_donor(rq, p),
@@ -10836,6 +10837,11 @@ void sched_change_end(struct sched_chang
 
         lockdep_assert_rq_held(rq);
 
+        /*
+         * Changing class without *QUEUE_CLASS is bad.
+         */
+        WARN_ON_ONCE(p->sched_class != ctx->class && !(ctx->flags & ENQUEUE_CLASS));
+
         if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to)
                 p->sched_class->switching_to(rq, p);
 
@@ -10847,6 +10853,13 @@ void sched_change_end(struct sched_chang
         if (ctx->flags & ENQUEUE_CLASS) {
                 if (p->sched_class->switched_to)
                         p->sched_class->switched_to(rq, p);
+
+                /*
+                 * If this was a degradation in class someone should have set
+                 * need_resched by now.
+                 */
+                WARN_ON_ONCE(sched_class_above(ctx->class, p->sched_class) &&
+                             !test_tsk_need_resched(p));
         } else {
                 p->sched_class->prio_changed(rq, p, ctx->prio);
         }
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -4027,6 +4027,7 @@ extern void balance_callbacks(struct rq
 struct sched_change_ctx {
         u64                     prio;
         struct task_struct      *p;
+        const struct sched_class *class;
         int                     flags;
         bool                    queued;
         bool                    running;
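
The assertions hang off the usual "snapshot at begin, check at end" context
pattern. A stand-alone sketch of that pattern (user-space, assert() instead of
WARN_ON_ONCE(), every name here is illustrative rather than the kernel's):

#include <assert.h>
#include <stdio.h>

struct thing {
        int class;      /* stand-in for p->sched_class */
        int prio;
};

struct change_ctx {
        struct thing *t;
        int class;      /* snapshot taken at begin */
        int flags;
};

#define CHANGE_CLASS 0x1

static struct change_ctx change_begin(struct thing *t, int flags)
{
        /* Record the state we expect to either keep or explicitly change. */
        return (struct change_ctx){ .t = t, .class = t->class, .flags = flags };
}

static void change_end(struct change_ctx ctx)
{
        /* Changing class without announcing CHANGE_CLASS is a caller bug. */
        assert(ctx.t->class == ctx.class || (ctx.flags & CHANGE_CLASS));
}

int main(void)
{
        struct thing t = { .class = 1, .prio = 10 };
        struct change_ctx ctx = change_begin(&t, CHANGE_CLASS);

        t.class = 2;            /* allowed: the caller announced a class change */
        change_end(ctx);
        printf("class = %d\n", t.class);
        return 0;
}

Stashing the old class in the context is also what makes the "degradation
should have set need_resched" check possible: at end time both the old and the
new class are known.
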
Message-ID: <20251127154725.901391274@infradead.org>
Date: Thu, 27 Nov 2025 16:39:48 +0100
From: Peter Zijlstra
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, juri.lelli@redhat.com,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    mgorman@suse.de, vschneid@redhat.com, tj@kernel.org, void@manifault.com,
    arighi@nvidia.com, changwoo@igalia.com, sched-ext@lists.linux.dev
Subject: [PATCH 5/5] sched: Rework sched_class::wakeup_preempt() and rq_modified_*()
References: <20251127153943.696191429@infradead.org>

Change sched_class::wakeup_preempt() to also get called for cross-class
wakeups, specifically those where the woken task is of a higher class than
the previous highest class.

In order to do this, track the current highest class of the runqueue in
rq::next_class and have wakeup_preempt() track this upwards for each new
wakeup. Additionally have set_next_task() re-set the value to the current
class.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/core.c      |   32 +++++++++++++++++++++++---------
 kernel/sched/deadline.c  |   14 +++++++++-----
 kernel/sched/ext.c       |    9 ++++-----
 kernel/sched/fair.c      |   17 ++++++++++-------
 kernel/sched/idle.c      |    3 ---
 kernel/sched/rt.c        |    9 ++++++---
 kernel/sched/sched.h     |   26 ++------------------------
 kernel/sched/stop_task.c |    3 ---
 8 files changed, 54 insertions(+), 59 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2090,7 +2090,6 @@ void enqueue_task(struct rq *rq, struct
          */
         uclamp_rq_inc(rq, p, flags);
 
-        rq->queue_mask |= p->sched_class->queue_mask;
         p->sched_class->enqueue_task(rq, p, flags);
 
         psi_enqueue(p, flags);
@@ -2123,7 +2122,6 @@ inline bool dequeue_task(struct rq *rq,
          * and mark the task ->sched_delayed.
          */
         uclamp_rq_dec(rq, p);
-        rq->queue_mask |= p->sched_class->queue_mask;
         return p->sched_class->dequeue_task(rq, p, flags);
 }
 
@@ -2174,10 +2172,14 @@ void wakeup_preempt(struct rq *rq, struc
 {
         struct task_struct *donor = rq->donor;
 
-        if (p->sched_class == donor->sched_class)
-                donor->sched_class->wakeup_preempt(rq, p, flags);
-        else if (sched_class_above(p->sched_class, donor->sched_class))
+        if (p->sched_class == rq->next_class) {
+                rq->next_class->wakeup_preempt(rq, p, flags);
+
+        } else if (sched_class_above(p->sched_class, rq->next_class)) {
+                rq->next_class->wakeup_preempt(rq, p, flags);
                 resched_curr(rq);
+                rq->next_class = p->sched_class;
+        }
 
         /*
          * A queue event has occurred, and we're going to schedule.  In
@@ -6797,6 +6799,7 @@ static void __sched notrace __schedule(i
 pick_again:
         next = pick_next_task(rq, rq->donor, &rf);
         rq_set_donor(rq, next);
+        rq->next_class = next->sched_class;
         if (unlikely(task_is_blocked(next))) {
                 next = find_proxy_task(rq, next, &rf);
                 if (!next)
@@ -8646,6 +8649,8 @@ void __init sched_init(void)
                 rq->rt.rt_runtime = global_rt_runtime();
                 init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
 #endif
+                rq->next_class = &idle_sched_class;
+
                 rq->sd = NULL;
                 rq->rd = NULL;
                 rq->cpu_capacity = SCHED_CAPACITY_SCALE;
@@ -10771,10 +10776,8 @@ struct sched_change_ctx *sched_change_be
                 flags |= DEQUEUE_NOCLOCK;
         }
 
-        if (flags & DEQUEUE_CLASS) {
-                if (p->sched_class->switching_from)
-                        p->sched_class->switching_from(rq, p);
-        }
+        if ((flags & DEQUEUE_CLASS) && p->sched_class->switching_from)
+                p->sched_class->switching_from(rq, p);
 
         *ctx = (struct sched_change_ctx){
                 .p = p,
@@ -10827,6 +10830,17 @@ void sched_change_end(struct sched_chang
                         p->sched_class->switched_to(rq, p);
 
                 /*
+                 * If this was a class promotion; let the old class know it
+                 * got preempted. Note that none of the switch*_from() methods
+                 * know the new class and none of the switch*_to() methods
+                 * know the old class.
+                 */
+                if (ctx->running && sched_class_above(p->sched_class, ctx->class)) {
+                        rq->next_class->wakeup_preempt(rq, p, 0);
+                        rq->next_class = p->sched_class;
+                }
+
+                /*
                  * If this was a degradation in class someone should have set
                  * need_resched by now.
                  */
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2499,9 +2499,16 @@ static int balance_dl(struct rq *rq, str
  * Only called when both the current and waking task are -deadline
  * tasks.
  */
-static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
-                                  int flags)
+static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags)
 {
+        /*
+         * Can only get preempted by stop-class, and those should be
+         * few and short lived, doesn't really make sense to push
+         * anything away for that.
+         */
+        if (p->sched_class != &dl_sched_class)
+                return;
+
         if (dl_entity_preempt(&p->dl, &rq->donor->dl)) {
                 resched_curr(rq);
                 return;
@@ -3304,9 +3311,6 @@ static int task_is_throttled_dl(struct t
 #endif
 
 DEFINE_SCHED_CLASS(dl) = {
-
-        .queue_mask             = 8,
-
         .enqueue_task           = enqueue_task_dl,
         .dequeue_task           = dequeue_task_dl,
         .yield_task             = yield_task_dl,
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2338,12 +2338,12 @@ static struct task_struct *pick_task_scx
         bool keep_prev, kick_idle = false;
         struct task_struct *p;
 
-        rq_modified_clear(rq);
+        rq->next_class = &ext_sched_class;
         rq_unpin_lock(rq, rf);
         balance_one(rq, prev);
         rq_repin_lock(rq, rf);
         maybe_queue_balance_callback(rq);
-        if (rq_modified_above(rq, &ext_sched_class))
+        if (sched_class_above(rq->next_class, &ext_sched_class))
                 return RETRY_TASK;
 
         keep_prev = rq->scx.flags & SCX_RQ_BAL_KEEP;
@@ -2967,7 +2967,8 @@ static void switched_from_scx(struct rq
         scx_disable_task(p);
 }
 
-static void wakeup_preempt_scx(struct rq *rq, struct task_struct *p,int wake_flags) {}
+static void wakeup_preempt_scx(struct rq *rq, struct task_struct *p, int wake_flags) {}
+
 static void switched_to_scx(struct rq *rq, struct task_struct *p) {}
 
 int scx_check_setscheduler(struct task_struct *p, int policy)
@@ -3216,8 +3217,6 @@ static void scx_cgroup_unlock(void) {}
  * their current sched_class. Call them directly from sched core instead.
  */
 DEFINE_SCHED_CLASS(ext) = {
-        .queue_mask             = 1,
-
         .enqueue_task           = enqueue_task_scx,
         .dequeue_task           = dequeue_task_scx,
         .yield_task             = yield_task_scx,
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8697,7 +8697,7 @@ preempt_sync(struct rq *rq, int wake_fla
 /*
  * Preempt the current task with a newly woken task if needed:
  */
-static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
+static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_flags)
 {
         enum preempt_wakeup_action preempt_action = PREEMPT_WAKEUP_PICK;
         struct task_struct *donor = rq->donor;
@@ -8705,6 +8705,12 @@ static void check_preempt_wakeup_fair(st
         struct cfs_rq *cfs_rq = task_cfs_rq(donor);
         int cse_is_idle, pse_is_idle;
 
+        /*
+         * XXX Getting preempted by higher class, try and find idle CPU?
+         */
+        if (p->sched_class != &fair_sched_class)
+                return;
+
         if (unlikely(se == pse))
                 return;
 
@@ -12872,7 +12878,7 @@ static int sched_balance_newidle(struct
         t0 = sched_clock_cpu(this_cpu);
         __sched_balance_update_blocked_averages(this_rq);
 
-        rq_modified_clear(this_rq);
+        this_rq->next_class = &fair_sched_class;
         raw_spin_rq_unlock(this_rq);
 
         for_each_domain(this_cpu, sd) {
@@ -12939,7 +12945,7 @@ static int sched_balance_newidle(struct
                 pulled_task = 1;
 
         /* If a higher prio class was modified, restart the pick */
-        if (rq_modified_above(this_rq, &fair_sched_class))
+        if (sched_class_above(this_rq->next_class, &fair_sched_class))
                 pulled_task = -1;
 
 out:
@@ -13837,15 +13843,12 @@ static unsigned int get_rr_interval_fair
  * All the scheduling class methods:
  */
 DEFINE_SCHED_CLASS(fair) = {
-
-        .queue_mask             = 2,
-
         .enqueue_task           = enqueue_task_fair,
         .dequeue_task           = dequeue_task_fair,
         .yield_task             = yield_task_fair,
         .yield_to_task          = yield_to_task_fair,
 
-        .wakeup_preempt         = check_preempt_wakeup_fair,
+        .wakeup_preempt         = wakeup_preempt_fair,
 
         .pick_task              = pick_task_fair,
         .pick_next_task         = pick_next_task_fair,
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -534,9 +534,6 @@ static void update_curr_idle(struct rq *
  * Simple, special scheduling class for the per-CPU idle tasks:
  */
 DEFINE_SCHED_CLASS(idle) = {
-
-        .queue_mask             = 0,
-
         /* no enqueue/yield_task for idle tasks */
 
         /* dequeue is not valid, we print a debug message there: */
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1615,6 +1615,12 @@ static void wakeup_preempt_rt(struct rq
 {
         struct task_struct *donor = rq->donor;
 
+        /*
+         * XXX If we're preempted by DL, queue a push?
+         */
+        if (p->sched_class != &rt_sched_class)
+                return;
+
         if (p->prio < donor->prio) {
                 resched_curr(rq);
                 return;
@@ -2568,9 +2574,6 @@ static int task_is_throttled_rt(struct t
 #endif /* CONFIG_SCHED_CORE */
 
 DEFINE_SCHED_CLASS(rt) = {
-
-        .queue_mask             = 4,
-
         .enqueue_task           = enqueue_task_rt,
         .dequeue_task           = dequeue_task_rt,
         .yield_task             = yield_task_rt,
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1119,7 +1119,6 @@ struct rq {
         raw_spinlock_t          __lock;
 
         /* Per class runqueue modification mask; bits in class order. */
-        unsigned int            queue_mask;
         unsigned int            nr_running;
 #ifdef CONFIG_NUMA_BALANCING
         unsigned int            nr_numa_running;
@@ -1179,6 +1178,7 @@ struct rq {
         struct sched_dl_entity  *dl_server;
         struct task_struct      *idle;
         struct task_struct      *stop;
+        const struct sched_class *next_class;
         unsigned long           next_balance;
         struct mm_struct        *prev_mm;
 
@@ -2426,15 +2426,6 @@ struct sched_class {
 #ifdef CONFIG_UCLAMP_TASK
         int uclamp_enabled;
 #endif
-        /*
-         * idle:  0
-         * ext:   1
-         * fair:  2
-         * rt:    4
-         * dl:    8
-         * stop: 16
-         */
-        unsigned int            queue_mask;
 
         /*
          * move_queued_task/activate_task/enqueue_task:  rq->lock
@@ -2593,20 +2584,6 @@ struct sched_class {
 #endif
 };
 
-/*
- * Does not nest; only used around sched_class::pick_task() rq-lock-breaks.
- */
-static inline void rq_modified_clear(struct rq *rq)
-{
-        rq->queue_mask = 0;
-}
-
-static inline bool rq_modified_above(struct rq *rq, const struct sched_class * class)
-{
-        unsigned int mask = class->queue_mask;
-        return rq->queue_mask & ~((mask << 1) - 1);
-}
-
 static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
 {
         WARN_ON_ONCE(rq->donor != prev);
@@ -3899,6 +3876,7 @@ void move_queued_task_locked(struct rq *
         deactivate_task(src_rq, task, 0);
         set_task_cpu(task, dst_rq->cpu);
         activate_task(dst_rq, task, 0);
+        wakeup_preempt(dst_rq, task, 0);
 }
 
 static inline
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -97,9 +97,6 @@ static void update_curr_stop(struct rq *
  * Simple, special scheduling class for the per-CPU stop tasks:
  */
 DEFINE_SCHED_CLASS(stop) = {
-
-        .queue_mask             = 16,
-
         .enqueue_task           = enqueue_task_stop,
         .dequeue_task           = dequeue_task_stop,
         .yield_task             = yield_task_stop,
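
To see the bookkeeping that replaces the queue_mask bits, here is a
stand-alone model (user-space, with a plain integer rank standing in for the
sched_class ordering; nothing below is kernel code) of the rq->next_class
ratchet: wakeups only ever move it upwards, the pick path resets it, and the
old "did a higher class queue something during the lock break" test becomes a
single class comparison.

#include <stdio.h>

enum class_rank { IDLE, EXT, FAIR, RT, DL, STOP };      /* low to high */

struct rq_model {
        enum class_rank next_class;     /* highest class queued since last pick */
};

static int class_above(enum class_rank a, enum class_rank b)
{
        return a > b;
}

/* Wakeup path: ratchet next_class upwards, never down. */
static void wakeup_preempt_model(struct rq_model *rq, enum class_rank wakee)
{
        if (class_above(wakee, rq->next_class)) {
                printf("resched: class %d preempts class %d\n", wakee, rq->next_class);
                rq->next_class = wakee;
        }
}

int main(void)
{
        struct rq_model rq = { .next_class = FAIR };    /* pick-time reset */

        wakeup_preempt_model(&rq, RT);          /* RT wakeup during a fair pick */

        /* Lock-break check: retry the pick iff something higher got queued. */
        printf("retry pick: %d\n", class_above(rq.next_class, FAIR));   /* 1 */
        return 0;
}

The extra benefit over the bitmask is that the old top class gets its
wakeup_preempt() callback even on cross-class wakeups, which is what the
dl/rt/fair hooks in this patch start to rely on.
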