Date: Thu, 04 Jun 2026 18:45:27 -0000
From: "tip-bot2 for K Prateek Nayak"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] sched/fair: Convert cfs bandwidth throttling to use guards
Cc: K Prateek Nayak, "Peter Zijlstra (Intel)", Ben Segall, Aaron Lu, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260602050005.11160-2-kprateek.nayak@amd.com>
References: <20260602050005.11160-2-kprateek.nayak@amd.com>
Message-ID: <178059872705.710.16108764162246002638.tip-bot2@tip-bot2>

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     1abbecd1d2d2fdd96e52f541f07ee2b163631bee
Gitweb:        https://git.kernel.org/tip/1abbecd1d2d2fdd96e52f541f07ee2b163631bee
Author:        K Prateek Nayak
AuthorDate:    Tue, 02 Jun 2026 05:00:01
Committer:     Peter Zijlstra
CommitterDate: Tue, 02 Jun 2026 12:26:11 +02:00

sched/fair: Convert cfs bandwidth throttling to use guards

Routine conversion of rcu_read_lock(), spin_lock*, and rq_lock usage
within the cfs bandwidth controller to use class guards.

Only notable changes are:

 - Checking for "cfs_rq->runtime_remaining <= 0" instead of the inverse
   to spot a throttle and break early. This also saves the need for
   extra indentation in the unthrottle case.

 - Reordering of list_del_rcu() against throttled_clock indicator update
   in unthrottle_cfs_rq(). Both are done with "cfs_b->lock" held after
   the "cfs_rq->throttled" is cleared which make the reordering safe
   against concurrent list modifications.

No functional changes intended.

Signed-off-by: K Prateek Nayak
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ben Segall
Tested-by: Aaron Lu
Link: https://patch.msgid.link/20260602050005.11160-2-kprateek.nayak@amd.com
---
 kernel/sched/fair.c | 193 ++++++++++++++++++++-----------------------
 1 file changed, 90 insertions(+), 103 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d4ed88..261e5ce 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5035,13 +5035,13 @@ static void __maybe_unused clear_tg_offline_cfs_rqs(struct rq *rq)
 	 */
 	rq_clock_start_loop_update(rq);
 
-	rcu_read_lock();
+	guard(rcu)();
+
 	list_for_each_entry_rcu(tg, &task_groups, list) {
 		struct cfs_rq *cfs_rq = tg_cfs_rq(tg, cpu_of(rq));
 
 		clear_tg_load_avg(cfs_rq);
 	}
-	rcu_read_unlock();
 
 	rq_clock_stop_loop_update(rq);
 }
@@ -6540,13 +6540,10 @@ static int __assign_cfs_rq_runtime(struct cfs_bandwidth *cfs_b,
 static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	int ret;
 
-	raw_spin_lock(&cfs_b->lock);
-	ret = __assign_cfs_rq_runtime(cfs_b, cfs_rq, sched_cfs_bandwidth_slice());
-	raw_spin_unlock(&cfs_b->lock);
+	guard(raw_spinlock)(&cfs_b->lock);
 
-	return ret;
+	return __assign_cfs_rq_runtime(cfs_b, cfs_rq, sched_cfs_bandwidth_slice());
 }
 
 static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
@@ -6835,33 +6832,32 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	int dequeue = 1;
 
-	raw_spin_lock(&cfs_b->lock);
-	/* This will start the period timer if necessary */
-	if (__assign_cfs_rq_runtime(cfs_b, cfs_rq, 1)) {
+	scoped_guard(raw_spinlock, &cfs_b->lock) {
 		/*
-		 * We have raced with bandwidth becoming available, and if we
-		 * actually throttled the timer might not unthrottle us for an
-		 * entire period. We additionally needed to make sure that any
-		 * subsequent check_cfs_rq_runtime calls agree not to throttle
-		 * us, as we may commit to do cfs put_prev+pick_next, so we ask
-		 * for 1ns of runtime rather than just check cfs_b.
+		 * Check if We have raced with bandwidth becoming available. If
+		 * we actually throttled the timer might not unthrottle us for
+		 * an entire period. We additionally needed to make sure that
+		 * any subsequent check_cfs_rq_runtime calls agree not to
+		 * throttle us, as we may commit to do cfs put_prev+pick_next,
+		 * so we ask for 1ns of runtime rather than just check cfs_b.
+		 *
+		 * This will start the period timer if necessary.
+		 */
+		if (__assign_cfs_rq_runtime(cfs_b, cfs_rq, 1))
+			return false;
+
+		/*
+		 * No bandwidth available; Add ourselves on the list to be
+		 * unthrottled later.
 		 */
-		dequeue = 0;
-	} else {
 		list_add_tail_rcu(&cfs_rq->throttled_list,
 				  &cfs_b->throttled_cfs_rq);
 	}
-	raw_spin_unlock(&cfs_b->lock);
-
-	if (!dequeue)
-		return false; /* Throttle no longer required. */
 
 	/* freeze hierarchy runnable averages while throttled */
-	rcu_read_lock();
-	walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
-	rcu_read_unlock();
+	scoped_guard(rcu)
+		walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
 
 	/*
 	 * Note: distribution will already see us throttled via the
@@ -6894,13 +6890,15 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 
 	update_rq_clock(rq);
 
-	raw_spin_lock(&cfs_b->lock);
-	if (cfs_rq->throttled_clock) {
+	scoped_guard(raw_spinlock, &cfs_b->lock) {
+		list_del_rcu(&cfs_rq->throttled_list);
+
+		if (!cfs_rq->throttled_clock)
+			break;
+
 		cfs_b->throttled_time += rq_clock(rq) - cfs_rq->throttled_clock;
 		cfs_rq->throttled_clock = 0;
 	}
-	list_del_rcu(&cfs_rq->throttled_list);
-	raw_spin_unlock(&cfs_b->lock);
 
 	/* update hierarchical throttle state */
 	walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
@@ -6929,9 +6927,8 @@ static void __cfsb_csd_unthrottle(void *arg)
 {
 	struct cfs_rq *cursor, *tmp;
 	struct rq *rq = arg;
-	struct rq_flags rf;
 
-	rq_lock(rq, &rf);
+	guard(rq_lock)(rq);
 
 	/*
 	 * Iterating over the list can trigger several call to
@@ -6948,7 +6945,7 @@ static void __cfsb_csd_unthrottle(void *arg)
 	 * race with group being freed in the window between removing it
 	 * from the list and advancing to the next entry in the list.
 	 */
-	rcu_read_lock();
+	guard(rcu)();
 
 	list_for_each_entry_safe(cursor, tmp, &rq->cfsb_csd_list,
 				 throttled_csd_list) {
@@ -6958,10 +6955,7 @@ static void __cfsb_csd_unthrottle(void *arg)
 			unthrottle_cfs_rq(cursor);
 	}
 
-	rcu_read_unlock();
-
 	rq_clock_stop_loop_update(rq);
-	rq_unlock(rq, &rf);
 }
 
 static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
@@ -7001,11 +6995,11 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 	u64 runtime, remaining = 1;
 	bool throttled = false;
 	struct cfs_rq *cfs_rq, *tmp;
-	struct rq_flags rf;
 	struct rq *rq;
 	LIST_HEAD(local_unthrottle);
 
-	rcu_read_lock();
+	guard(rcu)();
+
 	list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
 				throttled_list) {
 		rq = rq_of(cfs_rq);
@@ -7015,65 +7009,63 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 			break;
 		}
 
-		rq_lock_irqsave(rq, &rf);
+		guard(rq_lock_irqsave)(rq);
+
 		if (!cfs_rq_throttled(cfs_rq))
-			goto next;
+			continue;
 
 		/* Already queued for async unthrottle */
 		if (!list_empty(&cfs_rq->throttled_csd_list))
-			goto next;
+			continue;
 
 		/* By the above checks, this should never be true */
 		WARN_ON_ONCE(cfs_rq->runtime_remaining > 0);
 
-		raw_spin_lock(&cfs_b->lock);
-		runtime = -cfs_rq->runtime_remaining + 1;
-		if (runtime > cfs_b->runtime)
-			runtime = cfs_b->runtime;
-		cfs_b->runtime -= runtime;
-		remaining = cfs_b->runtime;
-		raw_spin_unlock(&cfs_b->lock);
+		scoped_guard(raw_spinlock, &cfs_b->lock) {
+			runtime = -cfs_rq->runtime_remaining + 1;
+			if (runtime > cfs_b->runtime)
+				runtime = cfs_b->runtime;
+			cfs_b->runtime -= runtime;
+			remaining = cfs_b->runtime;
+		}
 
 		cfs_rq->runtime_remaining += runtime;
 
-		/* we check whether we're throttled above */
-		if (cfs_rq->runtime_remaining > 0) {
-			if (cpu_of(rq) != this_cpu) {
-				unthrottle_cfs_rq_async(cfs_rq);
-			} else {
-				/*
-				 * We currently only expect to be unthrottling
-				 * a single cfs_rq locally.
-				 */
-				WARN_ON_ONCE(!list_empty(&local_unthrottle));
-				list_add_tail(&cfs_rq->throttled_csd_list,
-					      &local_unthrottle);
-			}
-		} else {
+		/*
+		 * Ran out of bandwidth during distribution!
+		 * Indicate throttled entities and break early.
+		 */
+		if (cfs_rq->runtime_remaining <= 0) {
 			throttled = true;
+			break;
 		}
 
-next:
-		rq_unlock_irqrestore(rq, &rf);
+		/* we check whether we're throttled above */
+		if (cpu_of(rq) != this_cpu) {
+			unthrottle_cfs_rq_async(cfs_rq);
+			continue;
+		}
+
+		/*
+		 * We currently only expect to be unthrottling
+		 * a single cfs_rq locally.
+		 */
+		WARN_ON_ONCE(!list_empty(&local_unthrottle));
+		list_add_tail(&cfs_rq->throttled_csd_list, &local_unthrottle);
 	}
 
 	list_for_each_entry_safe(cfs_rq, tmp, &local_unthrottle,
 				 throttled_csd_list) {
 		struct rq *rq = rq_of(cfs_rq);
 
-		rq_lock_irqsave(rq, &rf);
+		guard(rq_lock_irqsave)(rq);
 
 		list_del_init(&cfs_rq->throttled_csd_list);
-
 		if (cfs_rq_throttled(cfs_rq))
 			unthrottle_cfs_rq(cfs_rq);
-
-		rq_unlock_irqrestore(rq, &rf);
 	}
 	WARN_ON_ONCE(!list_empty(&local_unthrottle));
 
-	rcu_read_unlock();
-
 	return throttled;
 }
 
@@ -7196,7 +7188,8 @@ static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	if (slack_runtime <= 0)
 		return;
 
-	raw_spin_lock(&cfs_b->lock);
+	guard(raw_spinlock)(&cfs_b->lock);
+
 	if (cfs_b->quota != RUNTIME_INF) {
 		cfs_b->runtime += slack_runtime;
 
@@ -7205,7 +7198,6 @@ static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 		    !list_empty(&cfs_b->throttled_cfs_rq))
 			start_cfs_slack_bandwidth(cfs_b);
 	}
-	raw_spin_unlock(&cfs_b->lock);
 
 	/* even if it's not valid for return we don't want to try again */
 	cfs_rq->runtime_remaining -= slack_runtime;
@@ -7228,25 +7220,21 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
  */
 static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 {
-	u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
-	unsigned long flags;
-
 	/* confirm we're still not at a refresh boundary */
-	raw_spin_lock_irqsave(&cfs_b->lock, flags);
-	cfs_b->slack_started = false;
+	scoped_guard(raw_spinlock_irqsave, &cfs_b->lock) {
+		u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
 
-	if (runtime_refresh_within(cfs_b, min_bandwidth_expiration)) {
-		raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
-		return;
-	}
+		cfs_b->slack_started = false;
 
-	if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice)
-		runtime = cfs_b->runtime;
+		if (runtime_refresh_within(cfs_b, min_bandwidth_expiration))
+			return;
 
-	raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
+		if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice)
+			runtime = cfs_b->runtime;
 
-	if (!runtime)
-		return;
+		if (!runtime)
+			return;
+	}
 
 	distribute_cfs_runtime(cfs_b);
 }
@@ -7335,18 +7323,18 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
 		container_of(timer, struct cfs_bandwidth, period_timer);
-	unsigned long flags;
 	int overrun;
 	int idle = 0;
 	int count = 0;
 
-	raw_spin_lock_irqsave(&cfs_b->lock, flags);
+	CLASS(raw_spinlock_irqsave, cfsb_guard)(&cfs_b->lock);
+
 	for (;;) {
 		overrun = hrtimer_forward_now(timer, cfs_b->period);
 		if (!overrun)
 			break;
 
-		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
+		idle = do_sched_cfs_period_timer(cfs_b, overrun, cfsb_guard.flags);
 
 		if (++count > 3) {
 			u64 new, old = ktime_to_ns(cfs_b->period);
@@ -7379,11 +7367,13 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 			count = 0;
 		}
 	}
-	if (idle)
+
+	if (idle) {
 		cfs_b->period_active = 0;
-	raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
+		return HRTIMER_NORESTART;
+	}
 
-	return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;
+	return HRTIMER_RESTART;
 }
 
 void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent)
@@ -7450,14 +7440,12 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
 	 */
 	for_each_possible_cpu(i) {
 		struct rq *rq = cpu_rq(i);
-		unsigned long flags;
 
 		if (list_empty(&rq->cfsb_csd_list))
 			continue;
 
-		local_irq_save(flags);
-		__cfsb_csd_unthrottle(rq);
-		local_irq_restore(flags);
+		scoped_guard(irqsave)
+			__cfsb_csd_unthrottle(rq);
 	}
 }
 
@@ -7475,16 +7463,15 @@ static void __maybe_unused update_runtime_enabled(struct rq *rq)
 
 	lockdep_assert_rq_held(rq);
 
-	rcu_read_lock();
+	guard(rcu)();
+
 	list_for_each_entry_rcu(tg, &task_groups, list) {
 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 		struct cfs_rq *cfs_rq = tg_cfs_rq(tg, cpu_of(rq));
 
-		raw_spin_lock(&cfs_b->lock);
-		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
-		raw_spin_unlock(&cfs_b->lock);
+		scoped_guard(raw_spinlock, &cfs_b->lock)
+			cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
 	}
-	rcu_read_unlock();
 }
 
 /* cpu offline callback */
@@ -7505,7 +7492,8 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 	 */
 	rq_clock_start_loop_update(rq);
 
-	rcu_read_lock();
+	guard(rcu)();
+
 	list_for_each_entry_rcu(tg, &task_groups, list) {
 		struct cfs_rq *cfs_rq = tg_cfs_rq(tg, cpu_of(rq));
 
@@ -7528,7 +7516,6 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 		cfs_rq->runtime_remaining = 1;
 		unthrottle_cfs_rq(cfs_rq);
 	}
-	rcu_read_unlock();
 
 	rq_clock_stop_loop_update(rq);
 }
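
For readers unfamiliar with the guard pattern this patch adopts, here is a
minimal userspace sketch of scope-based lock release. It assumes a GCC or
Clang compiler that supports the cleanup variable attribute, and it is not
the kernel's <linux/cleanup.h> implementation. The names mutex_guard(),
scoped_mutex_guard(), struct budget, take_runtime(), drain_if_positive() and
is_unlocked() are hypothetical and exist only for illustration; a pthread
mutex stands in for cfs_b->lock. The sketch shows the two properties the
conversion relies on: an early return drops the lock, and a break inside a
scoped guard leaves the guarded scope, as in the new unthrottle_cfs_rq().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void mutex_guard_release(pthread_mutex_t **m)
{
	if (*m)
		pthread_mutex_unlock(*m);
}

static inline pthread_mutex_t *mutex_guard_acquire(pthread_mutex_t *m)
{
	pthread_mutex_lock(m);
	return m;
}

/* Hypothetical stand-in for guard(): lock held until end of enclosing scope. */
#define mutex_guard(m)							\
	pthread_mutex_t *guard_var_					\
		__attribute__((cleanup(mutex_guard_release), unused)) =	\
		mutex_guard_acquire(m)

/*
 * Hypothetical stand-in for scoped_guard(): a one-iteration for loop whose
 * init variable carries the cleanup, so "break" exits the guarded scope.
 */
#define scoped_mutex_guard(m)						\
	for (pthread_mutex_t *scope_var_				\
		__attribute__((cleanup(mutex_guard_release))) =	\
		mutex_guard_acquire(m), *scope_done_ = NULL;		\
	     !scope_done_; scope_done_ = (pthread_mutex_t *)1)

struct budget {
	pthread_mutex_t lock;
	long runtime;
};

/* Early return with the lock held: the cleanup handler unlocks. */
static bool take_runtime(struct budget *b, long want)
{
	mutex_guard(&b->lock);

	if (b->runtime < want)
		return false;

	b->runtime -= want;
	return true;
}

/* "break" leaves the guarded scope; code after it runs unlocked. */
static long drain_if_positive(struct budget *b)
{
	long taken = 0;

	scoped_mutex_guard(&b->lock) {
		if (b->runtime <= 0)
			break;

		taken = b->runtime;
		b->runtime = 0;
	}

	return taken;
}

/* Test helper: true if the mutex can be taken right now. */
static bool is_unlocked(pthread_mutex_t *m)
{
	if (pthread_mutex_trylock(m) != 0)
		return false;
	pthread_mutex_unlock(m);
	return true;
}
```

Calling take_runtime() and drain_if_positive() from a main() and then
checking is_unlocked() exercises both exit paths; building with
"cc -pthread" should be sufficient on a typical Linux toolchain.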