[PATCH -next 5/8] rcu/nocb: Add warning to detect if overload advancement is ever useful

Joel Fernandes posted 8 patches 1 month, 1 week ago
Posted by Joel Fernandes 1 month, 1 week ago
During callback overload, the NOCB code attempts an opportunistic
advancement via rcu_advance_cbs_nowake().

Tracing while flooding 300,000 callbacks shows that this optimization is
likely dead code:
- 30 overload conditions triggered
- 0 advancements actually occurred
- in 100% of cases, nothing was advanced because the current GP had not
  completed.

I also ran the TREE05 and TREE08 rcutorture scenarios for 2 hours and
could not trigger it.

When callbacks overflow (that is, exceed qhimark), they are waiting for
a grace period that has not yet completed. The optimization can only
advance callbacks once that GP is complete, but the overload condition
is itself caused by callbacks piling up faster than GPs can complete.
The two conditions are therefore effectively contradictory, so the
advancement practically never happens.
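To make the argument concrete, here is a minimal userspace sketch of the
gate that must pass before any advancement. It assumes the kernel's
encoding: the low two bits of a gp_seq value hold GP state
(RCU_SEQ_STATE_MASK in kernel/rcu/rcu.h), callbacks wait for a target
modeled on rcu_seq_snap(), and rcu_seq_done() is a wrap-safe
ULONG_CMP_GE(). The *_sketch helpers and the numeric gp_seq values are
hypothetical illustrations, not kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Assumption for this sketch: the low two bits of gp_seq hold the GP
 * state (0 = idle, 1 = GP in progress); the upper bits count GPs.
 */
#define SEQ_STATE_MASK 3UL

/* Wrap-safe "a >= b", modeled on the kernel's ULONG_CMP_GE(). */
static bool ulong_cmp_ge(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

/*
 * Modeled on rcu_seq_snap(): the gp_seq value at which a full grace
 * period beginning after "gp_seq" is guaranteed to have completed.
 * Callbacks queued now must wait until gp_seq reaches this value.
 */
static unsigned long seq_snap_sketch(unsigned long gp_seq)
{
	/* Model assumption: only the idle and in-progress states occur. */
	assert((gp_seq & SEQ_STATE_MASK) <= 1);
	return (gp_seq + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

/* Modeled on rcu_seq_done(): has gp_seq reached the target value? */
static bool seq_done_sketch(unsigned long gp_seq, unsigned long target)
{
	return ulong_cmp_ge(gp_seq, target);
}
```

In this model, a callback queued while GP 0x105 is in progress gets the
target seq_snap_sketch(0x105) == 0x10c. seq_done_sketch() stays false at
0x108 and 0x109 and first becomes true at 0x10c, i.e. only after the
following GP has fully ended. While callbacks keep piling up, the
overload check runs before that point.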

In *theory* this could happen if the GP completes just in the nick of
time as we hit the overload, but that is so rare that it can be
considered impossible: we cannot hit it even with synthetic callback
flooding. Attempting the advancement wastes cycles without ever being
useful, and the code adds maintenance burden and complexity we don't
need.

I suggest deleting it. However, out of extreme caution, first add a
WARN_ON_ONCE() for a merge window or two, and delete the code after that.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 kernel/rcu/tree_nocb.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 7e9d465c8ab1..d3e6a0e77210 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -571,8 +571,20 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 		if (j != rdp->nocb_gp_adv_time &&
 		    rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
 		    rcu_seq_done(&rdp->mynode->gp_seq, cur_gp_seq)) {
+			long done_before = rcu_segcblist_get_seglen(&rdp->cblist, RCU_DONE_TAIL);
+
 			rcu_advance_cbs_nowake(rdp->mynode, rdp);
 			rdp->nocb_gp_adv_time = j;
+
+			/*
+			 * The advance_cbs call above is not useful. Under an
+			 * overload condition, nocb_gp_wait() is always waiting
+			 * for GP completion, so nothing can be moved from the
+			 * WAIT segment to the DONE segment of the list. WARN if
+			 * an advancement happened (next step: delete the advance).
+			 */
+			WARN_ON_ONCE(rcu_segcblist_get_seglen(&rdp->cblist,
+				     RCU_DONE_TAIL) > done_before);
 		}
 	}
 
-- 
2.34.1
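The check added in the hunk above snapshots the DONE segment length,
calls the advance, and warns if DONE grew. A simplified userspace model
of that pattern follows. struct cblist_sketch, advance_sketch() and
warn_on_once_sketch() are hypothetical stand-ins for struct
rcu_segcblist, rcu_segcblist_advance() and WARN_ON_ONCE(), and the gate
is only loosely modeled on rcu_segcblist_nextgp() plus rcu_seq_done().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct rcu_segcblist. */
enum seg_sketch { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, SEG_COUNT };

struct cblist_sketch {
	long len[SEG_COUNT];			/* callbacks per segment */
	unsigned long gp_seq[SEG_COUNT];	/* gp_seq each segment waits for */
};

/* Wrap-safe "a >= b", modeled on the kernel's ULONG_CMP_GE(). */
static bool ulong_cmp_ge(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

/* Hypothetical stand-in for WARN_ON_ONCE(): report once, return cond. */
static bool warn_on_once_sketch(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/*
 * Loosely modeled on rcu_segcblist_advance(): move WAIT and NEXT_READY
 * callbacks whose grace period has completed into DONE. Segments wait
 * for non-decreasing gp_seq values, so stop at the first one not done.
 */
static void advance_sketch(struct cblist_sketch *cl, unsigned long gp_seq)
{
	for (int i = SEG_WAIT; i <= SEG_NEXT_READY; i++) {
		if (!ulong_cmp_ge(gp_seq, cl->gp_seq[i]))
			break;
		cl->len[SEG_DONE] += cl->len[i];
		cl->len[i] = 0;
	}
	assert(cl->len[SEG_DONE] >= 0);
}

/*
 * Mirrors the shape of the patched overload path: gate on the WAIT
 * segment's GP being done, advance, then warn if DONE grew.
 * Returns true if an advancement happened.
 */
static bool overload_advance_sketch(struct cblist_sketch *cl,
				    unsigned long gp_seq)
{
	static bool warned;
	long done_before;

	if (cl->len[SEG_WAIT] == 0 ||
	    !ulong_cmp_ge(gp_seq, cl->gp_seq[SEG_WAIT]))
		return false;	/* GP not done: the case seen in tracing */

	done_before = cl->len[SEG_DONE];
	advance_sketch(cl, gp_seq);
	return warn_on_once_sketch(cl->len[SEG_DONE] > done_before, &warned,
				   "overload advancement moved callbacks to DONE");
}
```

In this model the warning can only fire on the "nick of time" path,
where gp_seq has already reached the WAIT segment's target when the
overload is detected, which is exactly the case the patch expects never
to see.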
Re: [PATCH -next 5/8] rcu/nocb: Add warning to detect if overload advancement is ever useful
Posted by Joel Fernandes 3 weeks, 5 days ago
Since I am resubmitting the nocb patches from this series (3 of them)
for the next merge window, I thought I'd replace this particular patch
with a plain deletion of the rcu_advance_cbs_nowake() call instead of
bloating the code path with warnings and comments.

linux-next and many days of testing on my side also look good.

Thoughts?  Once I get some opinions, I'll change this patch to do the
deletion. I am also adding one other (trivial) patch to this series:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=nocb-7.0&id=84669d678b9cb28ff8774a3b6457186a4a187c75

Running overnight tests on all 4 patches now...

thanks,

 - Joel

On 1/1/2026 11:34 AM, Joel Fernandes wrote:
> [...]