kernel/sched/fair.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
Commit c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ
kick path") removed the rcu_read_lock()/unlock() pair from
set_cpu_sd_state_busy() and set_cpu_sd_state_idle() on the assumption
that all callers run in a safe context for rcu_dereference_all(): IRQs
disabled or cpus_write_lock() held.
That assumption is wrong for the CPU hotplug teardown path. When CPUs
are taken offline, set_cpu_sd_state_busy() is invoked via:
cpuhp/N kthread
cpuhp_thread_fun()
cpuhp_invoke_callback()
sched_cpu_deactivate()
nohz_balance_exit_idle()
set_cpu_sd_state_busy()
rcu_dereference_all(per_cpu(sd_llc, cpu))
The cpuhp kthread holds cpu_hotplug_lock (percpu-rwsem) but runs with
preemption and IRQs enabled. As a result, lockdep correctly reports a
suspicious RCU usage on CPU offline, e.g.:
# echo 0 > /sys/devices/system/cpu/cpu1/online
=============================
WARNING: suspicious RCU usage
-----------------------------
kernel/sched/fair.c:12793 suspicious rcu_dereference_check() usage!
...
2 locks held by cpuhp/1/20:
#0: (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x42/0x1ae
#1: (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0x72/0x1ae
Call Trace:
lockdep_rcu_suspicious
nohz_balance_exit_idle
sched_cpu_deactivate
cpuhp_invoke_callback
cpuhp_thread_fun
smpboot_thread_fn
Restore the rcu_read_lock()/unlock() pair in both helpers;
nohz_balancer_kick() is left as is, since its IRQ-disabled context is
genuinely sufficient.
Fixes: c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ kick path")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/all/38fe0a1d-1a48-435a-910a-c278024d9ac9@samsung.com/
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
kernel/sched/fair.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aac24cfddecdf..381f378db71fa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -14070,6 +14070,8 @@ static void nohz_balancer_kick(struct rq *rq)
static void set_cpu_sd_state_busy(int cpu)
{
struct sched_domain *sd;
+
+ rcu_read_lock();
sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
/*
@@ -14077,10 +14079,12 @@ static void set_cpu_sd_state_busy(int cpu)
* domain has no shared object there is nothing to clear or account.
*/
if (!sd || !sd->shared || !sd->nohz_idle)
- return;
+ goto unlock;
sd->nohz_idle = 0;
atomic_inc(&sd->shared->nr_busy_cpus);
+unlock:
+ rcu_read_unlock();
}
void nohz_balance_exit_idle(struct rq *rq)
@@ -14099,14 +14103,18 @@ void nohz_balance_exit_idle(struct rq *rq)
static void set_cpu_sd_state_idle(int cpu)
{
struct sched_domain *sd;
+
+ rcu_read_lock();
sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
/* See set_cpu_sd_state_busy(): nohz_idle is only used with sd->shared. */
if (!sd || !sd->shared || sd->nohz_idle)
- return;
+ goto unlock;
sd->nohz_idle = 1;
atomic_dec(&sd->shared->nr_busy_cpus);
+unlock:
+ rcu_read_unlock();
}
/*
base-commit: b07a332d9cbbd9cc9cfa923a21bd061cfb69bea5
--
2.54.0
On Thu, May 21, 2026 at 10:51:15PM +0200, Andrea Righi wrote:
> Commit c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ
> kick path") removed the rcu_read_lock()/unlock() pair from
> set_cpu_sd_state_busy() and set_cpu_sd_state_idle() on the assumption
> that all callers run in a safe context for rcu_dereference_all(): IRQs
> disabled or cpus_write_lock() held.
>
> That assumption is wrong for the CPU hotplug teardown path. When CPUs
> are taken offline, set_cpu_sd_state_busy() is invoked via:
>
> cpuhp/N kthread
> cpuhp_thread_fun()
> cpuhp_invoke_callback()
> sched_cpu_deactivate()
> nohz_balance_exit_idle()
> set_cpu_sd_state_busy()
> rcu_dereference_all(per_cpu(sd_llc, cpu))
>
> Restore the rcu_read_lock()/unlock() pair in both helpers;
> nohz_balancer_kick() is left as is, since its IRQ-disabled context is
> genuinely sufficient.
>
> Fixes: c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ kick path")
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Closes: https://lore.kernel.org/all/38fe0a1d-1a48-435a-910a-c278024d9ac9@samsung.com/
> Signed-off-by: Andrea Righi <arighi@nvidia.com>
So the obvious alternative is to disable RCU in the one caller that
doesn't play ball.
Was that considered?
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8699,7 +8699,8 @@ int sched_cpu_deactivate(unsigned int cp
* Remove CPU from nohz.idle_cpus_mask to prevent participating in
* load balancing when not active
*/
- nohz_balance_exit_idle(rq);
+ scoped_guard (rcu)
+ nohz_balance_exit_idle(rq);
set_cpu_active(cpu, false);
Hi Peter,
On Fri, May 22, 2026 at 09:28:53AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2026 at 10:51:15PM +0200, Andrea Righi wrote:
> > Commit c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ
> > kick path") removed the rcu_read_lock()/unlock() pair from
> > set_cpu_sd_state_busy() and set_cpu_sd_state_idle() on the assumption
> > that all callers run in a safe context for rcu_dereference_all(): IRQs
> > disabled or cpus_write_lock() held.
> >
> > That assumption is wrong for the CPU hotplug teardown path. When CPUs
> > are taken offline, set_cpu_sd_state_busy() is invoked via:
> >
> > cpuhp/N kthread
> > cpuhp_thread_fun()
> > cpuhp_invoke_callback()
> > sched_cpu_deactivate()
> > nohz_balance_exit_idle()
> > set_cpu_sd_state_busy()
> > rcu_dereference_all(per_cpu(sd_llc, cpu))
>
> >
> > Restore the rcu_read_lock()/unlock() pair in both helpers;
> > nohz_balancer_kick() is left as is, since its IRQ-disabled context is
> > genuinely sufficient.
> >
> > Fixes: c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ kick path")
> > Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Closes: https://lore.kernel.org/all/38fe0a1d-1a48-435a-910a-c278024d9ac9@samsung.com/
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
>
> So the obvious alternative is to disable RCU in the one caller that
> doesn't play ball.
>
> Was that considered?
This also works (tested, just in case). Since the original intent was to drop
these redundant RCU read locks, we should probably go this way. I'll send a new
patch shortly with your Suggested-by.
Thanks!
-Andrea
>
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8699,7 +8699,8 @@ int sched_cpu_deactivate(unsigned int cp
> * Remove CPU from nohz.idle_cpus_mask to prevent participating in
> * load balancing when not active
> */
> - nohz_balance_exit_idle(rq);
> + scoped_guard (rcu)
> + nohz_balance_exit_idle(rq);
>
> set_cpu_active(cpu, false);
>
>
Hello Andrea,
On 5/22/2026 2:21 AM, Andrea Righi wrote:
> @@ -14070,6 +14070,8 @@ static void nohz_balancer_kick(struct rq *rq)
> static void set_cpu_sd_state_busy(int cpu)
> {
> struct sched_domain *sd;
> +
> + rcu_read_lock();
guard(rcu)() saves on the need for an unlock label but this works fine
too. Feel free to include:
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
> sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
>
> /*
--
Thanks and Regards,
Prateek
On Fri, May 22, 2026 at 09:44:15AM +0530, K Prateek Nayak wrote:
> Hello Andrea,
>
> On 5/22/2026 2:21 AM, Andrea Righi wrote:
> > @@ -14070,6 +14070,8 @@ static void nohz_balancer_kick(struct rq *rq)
> > static void set_cpu_sd_state_busy(int cpu)
> > {
> > struct sched_domain *sd;
> > +
> > + rcu_read_lock();
>
> guard(rcu)() saves on the need for an unlock label but this works fine
> too. Feel free to include:
>
> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Ah yes, could be a good opportunity to switch to guard(rcu)(), I'll send a v2.
Thanks,
-Andrea
>
> > sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
> >
> > /*
> --
> Thanks and Regards,
> Prateek
>
© 2016 - 2026 Red Hat, Inc.