[PATCH] sched/fair: Restore RCU read lock in set_cpu_sd_state_{busy,idle}()

Andrea Righi posted 1 patch 2 days, 21 hours ago
There is a newer version of this series
kernel/sched/fair.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
[PATCH] sched/fair: Restore RCU read lock in set_cpu_sd_state_{busy,idle}()
Posted by Andrea Righi 2 days, 21 hours ago
Commit c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ
kick path") removed the rcu_read_lock()/unlock() pair from
set_cpu_sd_state_busy() and set_cpu_sd_state_idle() on the assumption
that all callers run in a safe context for rcu_dereference_all(): IRQs
disabled or cpus_write_lock() held.

That assumption is wrong for the CPU hotplug teardown path. When CPUs
are taken offline, set_cpu_sd_state_busy() is invoked via:

 cpuhp/N kthread
   cpuhp_thread_fun()
     cpuhp_invoke_callback()
       sched_cpu_deactivate()
         nohz_balance_exit_idle()
           set_cpu_sd_state_busy()
             rcu_dereference_all(per_cpu(sd_llc, cpu))

The cpuhp kthread holds cpu_hotplug_lock (percpu-rwsem) but runs with
preemption and IRQs enabled. As a result, lockdep correctly reports a
suspicious RCU usage on CPU offline, e.g.:

  # echo 0 > /sys/devices/system/cpu/cpu1/online

  =============================
  WARNING: suspicious RCU usage
  -----------------------------
  kernel/sched/fair.c:12793 suspicious rcu_dereference_check() usage!
  ...
  2 locks held by cpuhp/1/20:
   #0: (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x42/0x1ae
   #1: (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0x72/0x1ae

  Call Trace:
    lockdep_rcu_suspicious
    nohz_balance_exit_idle
    sched_cpu_deactivate
    cpuhp_invoke_callback
    cpuhp_thread_fun
    smpboot_thread_fn

Restore the rcu_read_lock()/unlock() pair in both helpers;
nohz_balancer_kick() is left as is, since its IRQ-disabled context is
genuinely sufficient.

Fixes: c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ kick path")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Closes: https://lore.kernel.org/all/38fe0a1d-1a48-435a-910a-c278024d9ac9@samsung.com/
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/fair.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aac24cfddecdf..381f378db71fa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -14070,6 +14070,8 @@ static void nohz_balancer_kick(struct rq *rq)
 static void set_cpu_sd_state_busy(int cpu)
 {
 	struct sched_domain *sd;
+
+	rcu_read_lock();
 	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
 
 	/*
@@ -14077,10 +14079,12 @@ static void set_cpu_sd_state_busy(int cpu)
 	 * domain has no shared object there is nothing to clear or account.
 	 */
 	if (!sd || !sd->shared || !sd->nohz_idle)
-		return;
+		goto unlock;
 	sd->nohz_idle = 0;
 
 	atomic_inc(&sd->shared->nr_busy_cpus);
+unlock:
+	rcu_read_unlock();
 }
 
 void nohz_balance_exit_idle(struct rq *rq)
@@ -14099,14 +14103,18 @@ void nohz_balance_exit_idle(struct rq *rq)
 static void set_cpu_sd_state_idle(int cpu)
 {
 	struct sched_domain *sd;
+
+	rcu_read_lock();
 	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
 
 	/* See set_cpu_sd_state_busy(): nohz_idle is only used with sd->shared. */
 	if (!sd || !sd->shared || sd->nohz_idle)
-		return;
+		goto unlock;
 	sd->nohz_idle = 1;
 
 	atomic_dec(&sd->shared->nr_busy_cpus);
+unlock:
+	rcu_read_unlock();
 }
 
 /*

base-commit: b07a332d9cbbd9cc9cfa923a21bd061cfb69bea5
-- 
2.54.0
Re: [PATCH] sched/fair: Restore RCU read lock in set_cpu_sd_state_{busy,idle}()
Posted by Peter Zijlstra 2 days, 11 hours ago
On Thu, May 21, 2026 at 10:51:15PM +0200, Andrea Righi wrote:
> Commit c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ
> kick path") removed the rcu_read_lock()/unlock() pair from
> set_cpu_sd_state_busy() and set_cpu_sd_state_idle() on the assumption
> that all callers run in a safe context for rcu_dereference_all(): IRQs
> disabled or cpus_write_lock() held.
> 
> That assumption is wrong for the CPU hotplug teardown path. When CPUs
> are taken offline, set_cpu_sd_state_busy() is invoked via:
> 
>  cpuhp/N kthread
>    cpuhp_thread_fun()
>      cpuhp_invoke_callback()
>        sched_cpu_deactivate()
>          nohz_balance_exit_idle()
>            set_cpu_sd_state_busy()
>              rcu_dereference_all(per_cpu(sd_llc, cpu))

> 
> Restore the rcu_read_lock()/unlock() pair in both helpers;
> nohz_balancer_kick() is left as is, since its IRQ-disabled context is
> genuinely sufficient.
> 
> Fixes: c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ kick path")
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Closes: https://lore.kernel.org/all/38fe0a1d-1a48-435a-910a-c278024d9ac9@samsung.com/
> Signed-off-by: Andrea Righi <arighi@nvidia.com>

So the obvious alternative is to disable RCU in the one caller that
doesn't play ball.

Was that considered?


--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8699,7 +8699,8 @@ int sched_cpu_deactivate(unsigned int cp
 	 * Remove CPU from nohz.idle_cpus_mask to prevent participating in
 	 * load balancing when not active
 	 */
-	nohz_balance_exit_idle(rq);
+	scoped_guard (rcu)
+		nohz_balance_exit_idle(rq);
 
 	set_cpu_active(cpu, false);
Re: [PATCH] sched/fair: Restore RCU read lock in set_cpu_sd_state_{busy,idle}()
Posted by Andrea Righi 2 days, 9 hours ago
Hi Peter,

On Fri, May 22, 2026 at 09:28:53AM +0200, Peter Zijlstra wrote:
> On Thu, May 21, 2026 at 10:51:15PM +0200, Andrea Righi wrote:
> > Commit c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ
> > kick path") removed the rcu_read_lock()/unlock() pair from
> > set_cpu_sd_state_busy() and set_cpu_sd_state_idle() on the assumption
> > that all callers run in a safe context for rcu_dereference_all(): IRQs
> > disabled or cpus_write_lock() held.
> > 
> > That assumption is wrong for the CPU hotplug teardown path. When CPUs
> > are taken offline, set_cpu_sd_state_busy() is invoked via:
> > 
> >  cpuhp/N kthread
> >    cpuhp_thread_fun()
> >      cpuhp_invoke_callback()
> >        sched_cpu_deactivate()
> >          nohz_balance_exit_idle()
> >            set_cpu_sd_state_busy()
> >              rcu_dereference_all(per_cpu(sd_llc, cpu))
> 
> > 
> > Restore the rcu_read_lock()/unlock() pair in both helpers;
> > nohz_balancer_kick() is left as is, since its IRQ-disabled context is
> > genuinely sufficient.
> > 
> > Fixes: c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ kick path")
> > Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Closes: https://lore.kernel.org/all/38fe0a1d-1a48-435a-910a-c278024d9ac9@samsung.com/
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> 
> So the obvious alternative is to disable RCU in the one caller that
> doesn't play ball.
> 
> Was that considered?

This also works (tested, just in case). Since the original intent was to drop
these redundant RCU read locks, we should probably go this way. I'll send a new
patch shortly with your Suggested-by.

Thanks!
-Andrea

> 
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8699,7 +8699,8 @@ int sched_cpu_deactivate(unsigned int cp
>  	 * Remove CPU from nohz.idle_cpus_mask to prevent participating in
>  	 * load balancing when not active
>  	 */
> -	nohz_balance_exit_idle(rq);
> +	scoped_guard (rcu)
> +		nohz_balance_exit_idle(rq);
>  
>  	set_cpu_active(cpu, false);
>  
>
Re: [PATCH] sched/fair: Restore RCU read lock in set_cpu_sd_state_{busy,idle}()
Posted by K Prateek Nayak 2 days, 14 hours ago
Hello Andrea,

On 5/22/2026 2:21 AM, Andrea Righi wrote:
> @@ -14070,6 +14070,8 @@ static void nohz_balancer_kick(struct rq *rq)
>  static void set_cpu_sd_state_busy(int cpu)
>  {
>  	struct sched_domain *sd;
> +
> +	rcu_read_lock();

guard(rcu)() saves on the need for an unlock label but this works fine
too. Feel free to include:

Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>

>  	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
>  
>  	/*
-- 
Thanks and Regards,
Prateek
Re: [PATCH] sched/fair: Restore RCU read lock in set_cpu_sd_state_{busy,idle}()
Posted by Andrea Righi 2 days, 13 hours ago
On Fri, May 22, 2026 at 09:44:15AM +0530, K Prateek Nayak wrote:
> Hello Andrea,
> 
> On 5/22/2026 2:21 AM, Andrea Righi wrote:
> > @@ -14070,6 +14070,8 @@ static void nohz_balancer_kick(struct rq *rq)
> >  static void set_cpu_sd_state_busy(int cpu)
> >  {
> >  	struct sched_domain *sd;
> > +
> > +	rcu_read_lock();
> 
> guard(rcu)() saves on the need for an unlock label but this works fine
> too. Feel free to include:
> 
> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>

Ah yes, could be a good opportunity to switch to guard(rcu)(), I'll send a v2.

Thanks,
-Andrea

> 
> >  	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
> >  
> >  	/*
> -- 
> Thanks and Regards,
> Prateek
>