We observed a regression in our customer’s environment after enabling
CONFIG_LAZY_RCU. In the Android Update Engine scenario, where ioctl() is
used heavily, we found that callbacks queued via call_rcu_hurry (such as
percpu_ref_switch_to_atomic_rcu) can sometimes be delayed by up to 5
seconds before execution. This occurs because the new grace period does
not start immediately after the previous one completes.
The root cause is that the wake_nocb_gp_defer() function now checks
"rdp->nocb_defer_wakeup" instead of "rdp_gp->nocb_defer_wakeup". On CPUs
that are not rcuog, "rdp->nocb_defer_wakeup" may always be
RCU_NOCB_WAKE_NOT. This can cause "rdp_gp->nocb_defer_wakeup" to be
downgraded and the "rdp_gp->nocb_timer" to be postponed by up to 10
seconds, delaying the execution of hurry RCU callbacks.
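For reference, here is the lazy-wake branch of wake_nocb_gp_defer() as it
looks before this patch, with our annotations added (the unannotated lines
match the hunk further below):

	if (waketype == RCU_NOCB_WAKE_LAZY &&
	    rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
		/*
		 * Annotation: rdp is the queueing CPU's rcu_data, rdp_gp is
		 * its rcuog leader (rdp->nocb_gp_rdp).  On CPUs that are not
		 * the rcuog CPU, rdp->nocb_defer_wakeup tends to stay
		 * RCU_NOCB_WAKE_NOT, so this branch fires even when rdp_gp
		 * already has a one-jiffy RCU_NOCB_WAKE pending: the timer
		 * is pushed out to the lazy flush delay and the pending wake
		 * type is overwritten with RCU_NOCB_WAKE_LAZY.
		 */
		mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
	}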
The trace log of one scenario we encountered is as follows:
// previous GP ends at this point
rcu_preempt [000] d..1. 137.240210: rcu_grace_period: rcu_preempt 8369 end
rcu_preempt [000] ..... 137.240212: rcu_grace_period: rcu_preempt 8372 reqwait
// call_rcu_hurry enqueues "percpu_ref_switch_to_atomic_rcu", the callback waited on by UpdateEngine
update_engine [002] d..1. 137.301593: __call_rcu_common: wyy: unlikely p_ref = 00000000********. lazy = 0
// FirstQ on cpu 2: rdp_gp->nocb_timer is set to fire after 1 jiffy (4ms)
// and rdp_gp->nocb_defer_wakeup is set to RCU_NOCB_WAKE
update_engine [002] d..2. 137.301595: rcu_nocb_wake: rcu_preempt 2 FirstQ on cpu2 with rdp_gp (cpu0).
// A FirstBQ event on cpu2 during that 1 jiffy postpones the timer by 10 seconds.
// Also, rdp_gp->nocb_defer_wakeup is overwritten to RCU_NOCB_WAKE_LAZY
update_engine [002] d..1. 137.301601: rcu_nocb_wake: rcu_preempt 2 WakeEmptyIsDeferred
...
...
...
// Before the 10-second timeout, cpu0 receives another call_rcu_hurry,
// which resets the timer to jiffies+1 and sets the waketype to RCU_NOCB_WAKE.
kworker/u32:0 [000] d..2. 142.557564: rcu_nocb_wake: rcu_preempt 0 FirstQ
kworker/u32:0 [000] d..1. 142.557576: rcu_nocb_wake: rcu_preempt 0 WakeEmptyIsDeferred
kworker/u32:0 [000] d..1. 142.558296: rcu_nocb_wake: rcu_preempt 0 WakeNot
kworker/u32:0 [000] d..1. 142.558562: rcu_nocb_wake: rcu_preempt 0 WakeNot
// idle (do_nocb_deferred_wakeup) wakes rcuog because waketype == RCU_NOCB_WAKE
<idle> [000] d..1. 142.558786: rcu_nocb_wake: rcu_preempt 0 DoWake
<idle> [000] dN.1. 142.558839: rcu_nocb_wake: rcu_preempt 0 DeferredWake
rcuog/0 [000] ..... 142.558871: rcu_nocb_wake: rcu_preempt 0 EndSleep
rcuog/0 [000] ..... 142.558877: rcu_nocb_wake: rcu_preempt 0 Check
// finally, rcuog requests a new GP at this point (5 seconds after the FirstQ event)
rcuog/0 [000] d..2. 142.558886: rcu_grace_period: rcu_preempt 8372 newreq
rcu_preempt [001] d..1. 142.559458: rcu_grace_period: rcu_preempt 8373 start
...
rcu_preempt [000] d..1. 142.564258: rcu_grace_period: rcu_preempt 8373 end
rcuop/2 [000] D..1. 142.566337: rcu_batch_start: rcu_preempt CBs=219 bl=10
// the hurry CB is invoked at this point
rcuop/2 [000] b.... 142.566352: blk_queue_usage_counter_release: wyy: wakeup. p_ref = 00000000********.
This patch changes the condition to check "rdp_gp->nocb_defer_wakeup" in
the lazy path. This prevents an already scheduled "rdp_gp->nocb_timer"
from being postponed and avoids overwriting "rdp_gp->nocb_defer_wakeup"
when it is not RCU_NOCB_WAKE_NOT.
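Walking the trace above through both versions of the check (assuming the
usual ordering RCU_NOCB_WAKE_NOT < RCU_NOCB_WAKE_BYPASS < RCU_NOCB_WAKE_LAZY
< RCU_NOCB_WAKE):
1. FirstQ (hurry) on cpu2: rdp_gp->nocb_defer_wakeup = RCU_NOCB_WAKE,
   rdp_gp->nocb_timer = jiffies + 1.
2. FirstBQ (lazy) on cpu2, old check: rdp(cpu2)->nocb_defer_wakeup is still
   RCU_NOCB_WAKE_NOT, so the lazy branch fires, pushing the timer out by
   jiffies_lazy_flush and downgrading the wake type to RCU_NOCB_WAKE_LAZY.
3. FirstBQ (lazy) on cpu2, new check: rdp_gp->nocb_defer_wakeup is already
   RCU_NOCB_WAKE, so the lazy branch is skipped and the already-armed
   one-jiffy timer and pending wake type are left untouched.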
Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power")
Co-developed-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com>
Co-developed-by: Lorry.Luo@mediatek.com
Signed-off-by: Lorry.Luo@mediatek.com
Tested-by: weiyangyang@vivo.com
Signed-off-by: weiyangyang@vivo.com
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
---
The regression was first observed by wyy in the Update Engine scenario
with CONFIG_LAZY_RCU enabled: there is an additional delay of 4–5
seconds during the heavy ioctl calls, waiting for
percpu_ref_switch_to_atomic_rcu (an RCU hurry CB) to complete.
Initially, we suspected that the percpu_ref_switch_to_atomic_rcu
function itself was taking too long. However, after enabling some
custom tracepoints together with the rcu_do_batch, rcu_nocb_wake, and
rcu_grace_period trace events, we found that the root cause was that
rcuog was not being woken up in time to request a new GP. This led to
the delay in invoking the hurry RCU callback (percpu_ref_switch_to_atomic_rcu).
Environment:
Android-16, Kernel: 6.12, 8 CPUs (ARM)
Configuration:
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_LAZY_RCU=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y
rcu_nocb_gp_stride = -1 (default is 4 for 8 cores)
jiffies_lazy_flush = 10 * HZ
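As a sanity check on the numbers (assuming HZ=250, consistent with the 4 ms
jiffy noted in the trace): jiffies_lazy_flush = 10 * HZ = 2500 jiffies ~= 10 s,
so the nocb_timer postponed at 137.301 s would not have fired until roughly
147.3 s. The new GP was only requested at 142.558 s because a later
call_rcu_hurry on cpu0 re-armed the timer to jiffies + 1, which is where the
observed ~5 s delay comes from.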
Contributions:
Tze-Nan Wu:
Collaborated with Cheng-Jui to discuss which tracepoints needed to be
added, jointly analyzed the trace logs, identified the root cause, and
proposed this upstream change.
Cheng-Jui Wang:
Provided many valuable suggestions during the debugging process,
repeatedly found breakthroughs when we were stuck, and helped identify
the root cause.
Lorry Luo:
Assisted in verifying whether the RCU hurry callback was taking too long
to execute or being deferred, supported the testing, and helped with
communication.
Weiyangyang:
Main tester who discovered the regression scenario, confirmed that
enabling CONFIG_LAZY_RCU caused the regression, and verified that this
patch resolves the issue.
Note:
With my limited understanding of lazy RCU, I am not fully confident that
this is a real issue. In my opinion, hurry callbacks should not be
delayed by other events such as the FirstBQ event.
If my understanding is incorrect, I would greatly appreciate any
guidance or clarification from the maintainers.
---
kernel/rcu/tree_nocb.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 08eb9b0e2fab..e6cd56603cad 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -276,7 +276,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
* callback storms, no need to wake up too early.
*/
if (waketype == RCU_NOCB_WAKE_LAZY &&
- rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
+ rdp_gp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
--
2.45.2
On Thu, Jul 17, 2025 at 01:53:38PM +0800, Tze-nan Wu wrote:
> [...]
> Contributions:
> Tze-Nan Wu:
> Collaborated with Cheng-Jui to discuss which tracepoints needed to be
> added, jointly analyzed the trace logs, identified the root cause, and
> proposed this upstream change.
>
> Cheng-Jui Wang:
> Provided many valuable suggestions during the debugging process,
> repeatedly found breakthroughs when we were stuck, and helped identify
> the root cause.
>
> Lorry Luo:
> Assisted in verifying whether rcu-hurry-callback was executed too long
> or deferred, supported with testing, and helped with communication.
>
> Weiyangyang:
> Main tester who discovered the regression scenario, confirmed that
> enabling CONFIG_LAZY_RCU caused the regression, and verified that this
> patch resolves the issue
Nice team work :-)
>
> Note:
> With my limited understanding of lazy RCU, I am not fully confident that
> this is a real issue. In my opinion, hurry callbacks should not be
> delayed by other events such as firstBQ trace event.
> If my understanding is incorrect, I would greatly appreciate any
> guidance or clarification from the maintainers.
Your understanding looks quite right!
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Since the issue is there for 3 years now and was introduced with
the CONFIG_LAZY_RCU new feature, it can probably wait a few weeks
for the next merge window.
--
Frederic Weisbecker
SUSE Labs
On Thu, 2025-07-17 at 15:34 +0200, Frederic Weisbecker wrote:
> On Thu, Jul 17, 2025 at 01:53:38PM +0800, Tze-nan Wu wrote:
> > [...]
>
> Nice team work :-)
>
Thanks :)
> >
> > Note:
> > With my limited understanding of lazy RCU, I am not fully confident
> > that
> > this is a real issue. In my opinion, hurry callbacks should not be
> > delayed by other events such as firstBQ trace event.
> > If my understanding is incorrect, I would greatly appreciate any
> > guidance or clarification from the maintainers.
>
> Your understanding looks quite right!
>
Glad to know I was on the right track.
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
> Since the issue is there for 3 years now and was introduced with
> the CONFIG_LAZY_RCU new feature, it can probably wait a few weeks
> for the next merge window.
>
> --
> Frederic Weisbecker
> SUSE Labs
Totally understand, we'll follow the kernel's development schedule.
--
Tze-Nan Wu
On 7/18/2025 5:10 AM, Tze-nan Wu (吳澤南) wrote:
> On Thu, 2025-07-17 at 15:34 +0200, Frederic Weisbecker wrote:
>> [...]
>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>>
>> Since the issue is there for 3 years now and was introduced with
>> the CONFIG_LAZY_RCU new feature, it can probably wait a few weeks
>> for the next merge window.
Nice, thanks!
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
I suggest let us add the Fixes tag too so that stable gets it.
thanks,
- Joel
On Wed, 2025-07-23 at 10:32 -0400, Joel Fernandes wrote:
> On 7/18/2025 5:10 AM, Tze-nan Wu (吳澤南) wrote:
> > [...]
>
> Nice, thanks!
>
> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
>
> I suggest let us add the Fixes tag too so that stable gets it.
>
> thanks,
>
> - Joel
>
I noticed that I’ve already added the Fixes: tag; did you mean that we
should also add the line "Cc: stable@vger.kernel.org"?
If that’s the case, should I submit a new version of this patch (v2)?
thanks,
--
Tze-Nan Wu
> > Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
> >
> > I suggest let us add the Fixes tag too so that stable gets it.
> >
> > thanks,
> >
> > - Joel
> >
> I noticed that I’ve already added the FIXES: tag, did you mean that we
> should also add this line "CC:stable@vger.kernel.org" ?
>
> If that’s the case, should I submit a new version of this patch(V2)?

Thanks Joel for reviewing the patch! I have already queued it for 6.17 [1].
So, v2 won't be required.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git/log/?h=next

- Neeraj