This series fixes a bug where rdp->defer_qs_pending can remain stuck in
PENDING when a preempted reader's quiescent state is reported up-tree via
a path other than the deferred-QS irq-work handler (FQS scan, hotplug
transition, expedited GP IPI, context switch). Once stuck, the pending
gate in rcu_read_unlock_special() silently suppresses all future arming
attempts on that CPU. The series adds PENDING -> IDLE transitions at the
missing sites.
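
For illustration only, the stuck-gate failure mode described above can be
modeled in user space roughly as follows. This is a toy sketch, not kernel
code: every toy_* name and the with_fix switch are hypothetical and are not
taken from the patches. It assumes only what is stated above, namely that
arming is refused while the state is PENDING.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, not kernel code. All toy_* names are hypothetical. */
enum toy_defer_qs_state { TOY_DEFER_QS_IDLE, TOY_DEFER_QS_PENDING };

struct toy_rdp {
	enum toy_defer_qs_state defer_qs_pending;
};

/* Models the pending gate: arming is only allowed from IDLE. */
static bool toy_try_arm(struct toy_rdp *rdp)
{
	if (rdp->defer_qs_pending == TOY_DEFER_QS_PENDING)
		return false;	/* gate closed: arming silently suppressed */
	rdp->defer_qs_pending = TOY_DEFER_QS_PENDING;
	return true;
}

/*
 * Models a quiescent state reported by a path other than the deferred-QS
 * handler (for example an FQS scan). Without the fix the state is left
 * untouched; with the fix it goes PENDING -> IDLE.
 */
static void toy_report_qs_other_path(struct toy_rdp *rdp, bool with_fix)
{
	if (with_fix)
		rdp->defer_qs_pending = TOY_DEFER_QS_IDLE;
}

/* Returns whether a second arming attempt succeeds after the QS report. */
bool toy_scenario(bool with_fix)
{
	struct toy_rdp rdp = { .defer_qs_pending = TOY_DEFER_QS_IDLE };
	bool armed = toy_try_arm(&rdp);

	assert(armed);	/* the first attempt always starts from IDLE */
	toy_report_qs_other_path(&rdp, with_fix);
	return toy_try_arm(&rdp);
}
```

In this model, toy_scenario(false) returns false (the second arming attempt
is suppressed because the state never left PENDING), while toy_scenario(true)
returns true.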

The series also handles the case where the deferred-QS irq-work handler may
run between segments of a compound section (per Paul McKenney's
counter-example). Patch 5 covers the expedited GP case. I have not yet looked
at how this interacts with the softirq paths, so I am keeping the RFC tag on.

The last patch is a debug-only detector (CONFIG_RCU_GP_CLEANUP_STALE_CHECK,
marked [TEST COMMIT], not for merge). Applied alone on unmodified mainline,
without patches 2-5, it reliably fires a WARN within 5 minutes under TREE03
rcutorture, confirming that the bug exists and that the detector catches it.
With the full fix applied, I could not reproduce the issue.
The git tree with all patches can be found at:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/tag/?h=rcu-dqs-stuck-rfc-v1-20260522

Joel Fernandes (6):
  rcu: introduce rcu_defer_qs_clear() helper
  rcu: clear defer_qs_pending when notifying GP changes
  rcu: clear defer_qs_pending in handler for compounded sections
  rcu: drop redundant defer_qs_pending clear in irqrestore handler
  rcu: clear defer_qs_pending at expedited IPI entry
  [TEST COMMIT] rcu: detect stuck defer_qs_pending at GP cleanup

 kernel/rcu/Kconfig.debug | 11 +++++++++
 kernel/rcu/tree.c        | 49 ++++++++++++++++++++++++++++++++++++++++
 kernel/rcu/tree.h        | 13 +++++++++++
 kernel/rcu/tree_exp.h    |  6 +++++
 kernel/rcu/tree_plugin.h | 30 ++++++++++++------------
 5 files changed, 94 insertions(+), 15 deletions(-)
--
2.34.1