We observed a regression in our customer’s environment after enabling
CONFIG_LAZY_RCU. In the Android Update Engine scenario, where ioctl() is
used heavily, we found that callbacks queued via call_rcu_hurry (such as
percpu_ref_switch_to_atomic_rcu) can sometimes be delayed by up to 5
seconds before execution. This occurs because the new grace period does
not start immediately after the previous one completes.
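For illustration, here is a minimal, hypothetical sketch (not the actual
percpu_ref code; the names hurry_waiter, hurry_cb and wait_one_hurry_gp are
made up) of how a caller ends up blocked behind a hurry callback, so any
delay in starting the grace period shows up directly as caller latency:

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/completion.h>

struct hurry_waiter {
	struct rcu_head rh;
	struct completion done;
};

static void hurry_cb(struct rcu_head *rh)
{
	struct hurry_waiter *w = container_of(rh, struct hurry_waiter, rh);

	complete(&w->done);
}

static void wait_one_hurry_gp(struct hurry_waiter *w)
{
	init_completion(&w->done);
	/* lazy == 0 path: this callback is not meant to wait for the lazy timer */
	call_rcu_hurry(&w->rh, hurry_cb);
	/* this is where the caller stalls for ~5 seconds in the trace below */
	wait_for_completion(&w->done);
}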
The root cause is that the wake_nocb_gp_defer() function now checks
"rdp->nocb_defer_wakeup" instead of "rdp_gp->nocb_defer_wakeup". On CPUs
that are not rcuog, "rdp->nocb_defer_wakeup" may always be
RCU_NOCB_WAKE_NOT. This can cause "rdp_gp->nocb_defer_wakeup" to be
downgraded and the "rdp_gp->nocb_timer" to be postponed by up to 10
seconds, delaying the execution of hurry RCU callbacks.
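For reference, a trimmed sketch of the pre-fix lazy branch in
wake_nocb_gp_defer() (locking and the other waketype branches are omitted
here):

	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;	/* the rcuog leader's rcu_data */

	if (waketype == RCU_NOCB_WAKE_LAZY &&
	    rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
		/*
		 * On a non-rcuog CPU, rdp->nocb_defer_wakeup tends to stay
		 * RCU_NOCB_WAKE_NOT, so a lazy FirstBQ enqueue takes this
		 * branch even though rdp_gp already has a one-jiffy
		 * RCU_NOCB_WAKE timer pending ...
		 */
		mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
		/* ... and the pending RCU_NOCB_WAKE is downgraded to LAZY. */
		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
	}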
The trace log of one scenario we encountered is as follows:
// previous GP ends at this point
rcu_preempt [000] d..1. 137.240210: rcu_grace_period: rcu_preempt 8369 end
rcu_preempt [000] ..... 137.240212: rcu_grace_period: rcu_preempt 8372 reqwait
// call_rcu_hurry enqueues "percpu_ref_switch_to_atomic_rcu", the callback waited on by UpdateEngine
update_engine [002] d..1. 137.301593: __call_rcu_common: wyy: unlikely p_ref = 00000000********. lazy = 0
// FirstQ on cpu 2: rdp_gp->nocb_timer is set to fire after 1 jiffy (4 ms)
// and the rdp_gp->nocb_defer_wakeup is set to RCU_NOCB_WAKE
update_engine [002] d..2. 137.301595: rcu_nocb_wake: rcu_preempt 2 FirstQ on cpu2 with rdp_gp (cpu0).
// A FirstBQ event on cpu2 during that 1 jiffy postpones the timer by 10 seconds.
// Also, rdp_gp->nocb_defer_wakeup is overwritten to RCU_NOCB_WAKE_LAZY
update_engine [002] d..1. 137.301601: rcu_nocb_wake: rcu_preempt 2 WakeEmptyIsDeferred
...
...
...
// Before the 10-second timeout, cpu0 receives another call_rcu_hurry,
// which resets the timer to jiffies+1 and sets the waketype to RCU_NOCB_WAKE.
kworker/u32:0 [000] d..2. 142.557564: rcu_nocb_wake: rcu_preempt 0 FirstQ
kworker/u32:0 [000] d..1. 142.557576: rcu_nocb_wake: rcu_preempt 0 WakeEmptyIsDeferred
kworker/u32:0 [000] d..1. 142.558296: rcu_nocb_wake: rcu_preempt 0 WakeNot
kworker/u32:0 [000] d..1. 142.558562: rcu_nocb_wake: rcu_preempt 0 WakeNot
// idle (do_nocb_deferred_wakeup) wakes rcuog because waketype == RCU_NOCB_WAKE
<idle> [000] d..1. 142.558786: rcu_nocb_wake: rcu_preempt 0 DoWake
<idle> [000] dN.1. 142.558839: rcu_nocb_wake: rcu_preempt 0 DeferredWake
rcuog/0 [000] ..... 142.558871: rcu_nocb_wake: rcu_preempt 0 EndSleep
rcuog/0 [000] ..... 142.558877: rcu_nocb_wake: rcu_preempt 0 Check
// Finally, rcuog requests a new GP at this point (5 seconds after the FirstQ event)
rcuog/0 [000] d..2. 142.558886: rcu_grace_period: rcu_preempt 8372 newreq
rcu_preempt [001] d..1. 142.559458: rcu_grace_period: rcu_preempt 8373 start
...
rcu_preempt [000] d..1. 142.564258: rcu_grace_period: rcu_preempt 8373 end
rcuop/2 [000] D..1. 142.566337: rcu_batch_start: rcu_preempt CBs=219 bl=10
// the hurry CB is invoked at this point
rcuop/2 [000] b.... 142.566352: blk_queue_usage_counter_release: wyy: wakeup. p_ref = 00000000********.
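For reference, the arithmetic from the trace above (timestamps in seconds):
  137.301593  hurry CB enqueued by update_engine (call_rcu_hurry)
  137.301601  WakeEmptyIsDeferred postpones rdp_gp->nocb_timer by 10 s (would fire around 147.3)
  142.557564  a second FirstQ on cpu0 re-arms the timer to jiffies+1
  142.558886  rcuog finally requests the new GP (newreq)
  142.566352  the hurry CB runs, roughly 142.566 - 137.302 ≈ 5.26 s after it was queued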
This patch changes the condition to check "rdp_gp->nocb_defer_wakeup" in
the lazy path. This prevents an already scheduled "rdp_gp->nocb_timer"
from being postponed and avoids overwriting "rdp_gp->nocb_defer_wakeup"
when it is not RCU_NOCB_WAKE_NOT.
Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power")
Co-developed-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com>
Co-developed-by: Lorry.Luo@mediatek.com
Signed-off-by: Lorry.Luo@mediatek.com
Tested-by: weiyangyang@vivo.com
Signed-off-by: weiyangyang@vivo.com
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
---
The regression was first observed by wyy in the Update Engine scenario
with CONFIG_LAZY_RCU enabled: there is an additional delay of 4–5
seconds during the heavy ioctl calls, waiting for
percpu_ref_switch_to_atomic_rcu (the RCU hurry CB) to complete.
Initially, we suspected that the percpu_ref_switch_to_atomic_rcu
function itself was taking too long. However, after enabling some
custom tracepoints along with the rcu_do_batch, rcu_nocb_wake, and
rcu_grace_period trace events, we found that the root cause was that
rcuog was not being woken up in time to request a new GP. This led to
the delay in invoking the hurry RCU callback
(percpu_ref_switch_to_atomic_rcu).
Environment:
Android-16, Kernel: 6.12, 8 CPUs (ARM)
Configuration:
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_LAZY_RCU=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y
rcu_nocb_gp_stride = -1 (default is 4 for 8 cores)
jiffies_lazy_flush = 10 * HZ
Contributions:
Tze-Nan Wu:
Collaborated with Cheng-Jui to discuss which tracepoints needed to be
added, jointly analyzed the trace logs, identified the root cause, and
proposed this upstream change.
Cheng-Jui Wang:
Provided many valuable suggestions during the debugging process,
repeatedly found breakthroughs when we were stuck, and helped identify
the root cause.
Lorry Luo:
Assisted in verifying whether rcu-hurry-callback was executed too long
or deferred, supported with testing, and helped with communication.
Weiyangyang:
Main tester who discovered the regression scenario, confirmed that
enabling CONFIG_LAZY_RCU caused the regression, and verified that this
patch resolves the issue.
Note:
With my limited understanding of lazy RCU, I am not fully confident that
this is a real issue. In my opinion, hurry callbacks should not be
delayed by other events such as the FirstBQ event.
If my understanding is incorrect, I would greatly appreciate any
guidance or clarification from the maintainers.
---
kernel/rcu/tree_nocb.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 08eb9b0e2fab..e6cd56603cad 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -276,7 +276,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
* callback storms, no need to wake up too early.
*/
if (waketype == RCU_NOCB_WAKE_LAZY &&
- rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
+ rdp_gp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
--
2.45.2
On Thu, Jul 17, 2025 at 01:53:38PM +0800, Tze-nan Wu wrote:
> We observed a regression in our customer’s environment after enabling
> CONFIG_LAZY_RCU. In the Android Update Engine scenario, where ioctl() is
> used heavily, we found that callbacks queued via call_rcu_hurry (such as
> percpu_ref_switch_to_atomic_rcu) can sometimes be delayed by up to 5
> seconds before execution.
[...]
> Weiyangyang:
> Main tester who discovered the regression scenario, confirmed that
> enabling CONFIG_LAZY_RCU caused the regression, and verified that this
> patch resolves the issue

Nice team work :-)

> Note:
> With my limited understanding of lazy RCU, I am not fully confident that
> this is a real issue. In my opinion, hurry callbacks should not be
> delayed by other events such as firstBQ trace event.
> If my understanding is incorrect, I would greatly appreciate any
> guidance or clarification from the maintainers.

Your understanding looks quite right!

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Since the issue is there for 3 years now and was introduced with
the CONFIG_LAZY_RCU new feature, it can probably wait a few weeks
for the next merge window.

--
Frederic Weisbecker
SUSE Labs
On Thu, 2025-07-17 at 15:34 +0200, Frederic Weisbecker wrote:
[...]
> Nice team work :-)

Thanks :)

> Your understanding looks quite right!

Glad to know I was on the right track.

> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
> Since the issue is there for 3 years now and was introduced with
> the CONFIG_LAZY_RCU new feature, it can probably wait a few weeks
> for the next merge window.

Totally understand, we'll follow the kernel's development schedule.

--
Tze-Nan Wu
On 7/18/2025 5:10 AM, Tze-nan Wu (吳澤南) wrote:
> On Thu, 2025-07-17 at 15:34 +0200, Frederic Weisbecker wrote:
[...]
>> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>>
>> Since the issue is there for 3 years now and was introduced with
>> the CONFIG_LAZY_RCU new feature, it can probably wait a few weeks
>> for the next merge window.

Nice, thanks!

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>

I suggest let us add the Fixes tag too so that stable gets it.

thanks,

- Joel
On Wed, 2025-07-23 at 10:32 -0400, Joel Fernandes wrote:
> On 7/18/2025 5:10 AM, Tze-nan Wu (吳澤南) wrote:
[...]
> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
>
> I suggest let us add the Fixes tag too so that stable gets it.

I noticed that I’ve already added the Fixes: tag. Did you mean that we
should also add the line "Cc: stable@vger.kernel.org"?

If that’s the case, should I submit a new version of this patch (v2)?

thanks,

--
Tze-Nan Wu
> I noticed that I’ve already added the Fixes: tag. Did you mean that we
> should also add the line "Cc: stable@vger.kernel.org"?
>
> If that’s the case, should I submit a new version of this patch (v2)?

Thanks Joel for reviewing the patch! I have already queued it for 6.17 [1].
So, v2 won't be required.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git/log/?h=next

- Neeraj