[v3] rcu/nocb: Cleanup patches for next merge window

[PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload handling

Posted by Joel Fernandes 2 weeks, 6 days ago

During callback overload (exceeding qhimark), the NOCB code attempts
opportunistic advancement via rcu_advance_cbs_nowake(). Analysis shows
this entire code path is dead:

- 30 overload conditions triggered during a 300,000-callback flood
- 0 advancements actually occurred
- 100% of the time, advancement was blocked because the current GP was not done

The overload condition triggers when callbacks are coming in at a high
rate with GPs not completing as fast. But the advancement requires the
GP to be complete - a logical contradiction. Even if the GP did complete
in time, nocb_gp_wait() has to wake up anyway to do the advancement, so
it is pointless.

Since the advancement is dead code, the entire overload handling block
serves no purpose. Remove it entirely.

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 kernel/rcu/tree_nocb.h | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f525e4f7985b..64a8ff350f92 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -526,8 +526,6 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 				 __releases(rdp->nocb_lock)
 {
 	long bypass_len;
-	unsigned long cur_gp_seq;
-	unsigned long j;
 	long lazy_len;
 	long len;
 	struct task_struct *t;
@@ -562,16 +560,6 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 		}
 
 		return;
-	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
-		/* ... or if many callbacks queued. */
-		rdp->qlen_last_fqs_check = len;
-		j = jiffies;
-		if (j != rdp->nocb_gp_adv_time &&
-		    rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
-		    rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq)) {
-			rcu_advance_cbs_nowake(rdp->mynode, rdp);
-			rdp->nocb_gp_adv_time = j;
-		}
 	}
 
 	rcu_nocb_unlock(rdp);
-- 
2.34.1

Re: [PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload handling

Posted by Paul E. McKenney 2 weeks, 2 days ago

On Mon, Jan 19, 2026 at 06:12:22PM -0500, Joel Fernandes wrote:
> During callback overload (exceeding qhimark), the NOCB code attempts
> opportunistic advancement via rcu_advance_cbs_nowake(). Analysis shows
> this entire code path is dead:
> 
> - 30 overload conditions triggered with 300,000 callback flood
> - 0 advancements actually occurred
> - 100% of time blocked because current GP not done
> 
> The overload condition triggers when callbacks are coming in at a high
> rate with GPs not completing as fast. But the advancement requires the
> GP to be complete - a logical contradiction. Even if the GP did complete
> in time, nocb_gp_wait() has to wake up anyway to do the advancement, so
> it is pointless.
> 
> Since the advancement is dead code, the entire overload handling block
> serves no purpose. Remove it entirely.
> 
> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

Re: [PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload handling

Posted by Paul E. McKenney 2 weeks, 3 days ago

On Mon, Jan 19, 2026 at 06:12:22PM -0500, Joel Fernandes wrote:
> During callback overload (exceeding qhimark), the NOCB code attempts
> opportunistic advancement via rcu_advance_cbs_nowake(). Analysis shows
> this entire code path is dead:
> 
> - 30 overload conditions triggered with 300,000 callback flood
> - 0 advancements actually occurred
> - 100% of time blocked because current GP not done
> 
> The overload condition triggers when callbacks are coming in at a high
> rate with GPs not completing as fast. But the advancement requires the
> GP to be complete - a logical contradiction. Even if the GP did complete
> in time, nocb_gp_wait() has to wake up anyway to do the advancement, so
> it is pointless.
> 
> Since the advancement is dead code, the entire overload handling block
> serves no purpose. Remove it entirely.
> 
> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  kernel/rcu/tree_nocb.h | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index f525e4f7985b..64a8ff350f92 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -526,8 +526,6 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>  				 __releases(rdp->nocb_lock)
>  {
>  	long bypass_len;
> -	unsigned long cur_gp_seq;
> -	unsigned long j;
>  	long lazy_len;
>  	long len;
>  	struct task_struct *t;
> @@ -562,16 +560,6 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>  		}
>  
>  		return;
> -	} else if (len > rdp->qlen_last_fqs_check + qhimark) {
> -		/* ... or if many callbacks queued. */
> -		rdp->qlen_last_fqs_check = len;
> -		j = jiffies;
> -		if (j != rdp->nocb_gp_adv_time &&
> -		    rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&

This places in cur_gp_seq not the grace period for the current callback
(which would be unlikely to have finished), but rather the grace period
for the oldest callback that has not yet been marked as done.  And that
callback started some time ago, and thus might well have finished.

So while this code might not have been executed in your tests, it is
definitely not a logical contradiction.

Or am I missing something subtle here?

						Thanx, Paul

> -		    rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq)) {
> -			rcu_advance_cbs_nowake(rdp->mynode, rdp);
> -			rdp->nocb_gp_adv_time = j;
> -		}
>  	}
>  
>  	rcu_nocb_unlock(rdp);
> -- 
> 2.34.1
>

Re: [PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload handling

Posted by Frederic Weisbecker 2 weeks, 6 days ago

On Mon, Jan 19, 2026 at 06:12:22PM -0500, Joel Fernandes wrote:
> During callback overload (exceeding qhimark), the NOCB code attempts
> opportunistic advancement via rcu_advance_cbs_nowake(). Analysis shows
> this entire code path is dead:
> 
> - 30 overload conditions triggered with 300,000 callback flood
> - 0 advancements actually occurred
> - 100% of time blocked because current GP not done
> 
> The overload condition triggers when callbacks are coming in at a high
> rate with GPs not completing as fast. But the advancement requires the
> GP to be complete - a logical contradiction. Even if the GP did complete
> in time, nocb_gp_wait() has to wake up anyway to do the advancement, so
> it is pointless.
> 
> Since the advancement is dead code, the entire overload handling block
> serves no purpose. Remove it entirely.
> 
> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Would be nice to have Paul's ack as well, in case we missed something subtle
here.

Also, probably for the upcoming merge window + 1: similar code, with a similar
removal opportunity, resides in rcu_nocb_try_bypass(). ->nocb_gp_adv_time
could then be removed as well.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

Re: [PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload handling

Posted by Paul E. McKenney 2 weeks, 6 days ago

On Tue, Jan 20, 2026 at 12:53:26AM +0100, Frederic Weisbecker wrote:
> On Mon, Jan 19, 2026 at 06:12:22PM -0500, Joel Fernandes wrote:
> > During callback overload (exceeding qhimark), the NOCB code attempts
> > opportunistic advancement via rcu_advance_cbs_nowake(). Analysis shows
> > this entire code path is dead:
> > 
> > - 30 overload conditions triggered with 300,000 callback flood
> > - 0 advancements actually occurred
> > - 100% of time blocked because current GP not done
> > 
> > The overload condition triggers when callbacks are coming in at a high
> > rate with GPs not completing as fast. But the advancement requires the
> > GP to be complete - a logical contradiction. Even if the GP did complete
> > in time, nocb_gp_wait() has to wake up anyway to do the advancement, so
> > it is pointless.
> > 
> > Since the advancement is dead code, the entire overload handling block
> > serves no purpose. Remove it entirely.
> > 
> > Suggested-by: Frederic Weisbecker <frederic@kernel.org>
> > Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> 
> Would be nice to have Paul's ack as well, in case we missed something subtle
> here.

Given that you are good with it, I will take a look.  And test it.  ;-)

> Also probably for upcoming merge window + 1, note that similar code with
> similar removal opportunity resides in rcu_nocb_try_bypass().
> And ->nocb_gp_adv_time could then be removed.

Further simplification sounds like a good thing!  Just not too simple,
you understand!  ;-)

							Thanx, Paul

Re: [PATCH -next v3 2/3] rcu/nocb: Remove dead callback overload handling

Posted by joelagnelf@nvidia.com 2 weeks, 6 days ago

> On Jan 19, 2026, at 7:07 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Tue, Jan 20, 2026 at 12:53:26AM +0100, Frederic Weisbecker wrote:
>> On Mon, Jan 19, 2026 at 06:12:22PM -0500, Joel Fernandes wrote:
>>> During callback overload (exceeding qhimark), the NOCB code attempts
>>> opportunistic advancement via rcu_advance_cbs_nowake(). Analysis shows
>>> this entire code path is dead:
>>> 
>>> - 30 overload conditions triggered with 300,000 callback flood
>>> - 0 advancements actually occurred
>>> - 100% of time blocked because current GP not done
>>> 
>>> The overload condition triggers when callbacks are coming in at a high
>>> rate with GPs not completing as fast. But the advancement requires the
>>> GP to be complete - a logical contradiction. Even if the GP did complete
>>> in time, nocb_gp_wait() has to wake up anyway to do the advancement, so
>>> it is pointless.
>>> 
>>> Since the advancement is dead code, the entire overload handling block
>>> serves no purpose. Remove it entirely.
>>> 
>>> Suggested-by: Frederic Weisbecker <frederic@kernel.org>
>>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>> 
>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>> 
>> Would be nice to have Paul's ack as well, in case we missed something subtle
>> here.
> 
> Given that you are good with it, I will take a look.  And test it.  ;-)

Sure, thanks!

>> Also probably for upcoming merge window + 1, note that similar code with
>> similar removal opportunity resides in rcu_nocb_try_bypass().
>> And ->nocb_gp_adv_time could then be removed.
> 
> Further simplification sounds like a good thing!  Just not too simple,
> you understand!  ;-)

Yes, I have more queued in my local tree that I plan to send for merge window + 1. :-)

By the way, I have another recent idea: why don't we enable nocb poll mode
automatically under overload conditions? Currently rcu_nocb_poll is only set via
the boot parameter and stays constant. My testing shows that poll mode can speed
up GP completion during overload, so dynamically enabling it when we exceed
qhimark could be beneficial. The question then is how to turn it off
dynamically as well: perhaps when the callback count drops below qlowmark, with
some debounce logic to avoid toggling too frequently?

>                            Thanx, Paul

thanks,

 - Joel