[PATCH-tip v3] debugobjects: Don't call fill_pool() in early boot non-task context

Waiman Long posted 1 patch 4 days, 1 hour ago
lib/debugobjects.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
When booting a debug PREEMPT_RT kernel on an arm64 system with a Grace
processor, the following lockdep warning was reported during early boot:

  ================================
  WARNING: inconsistent lock state
  7.1.0-rc4-test+ #1 Not tainted
  --------------------------------
  inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
  swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
  ffff0000803346a0 (&n->list_lock){?.+.}-{3:3}, at: get_from_partial_node+0x74/0xa0
    :
  Call trace:
    :
   rt_spin_lock+0xa0/0x400
   get_from_partial_node+0x74/0xa0
   ___slab_alloc+0x94/0x4f8
   kmem_cache_alloc_noprof+0x2d4/0x598
   kmem_alloc_batch+0x54/0x170
   fill_pool+0x12c/0x438
   debug_objects_fill_pool+0x58/0x60
   debug_object_activate+0xfc/0x3d0
   add_timer_on+0x250/0x3a0
   add_interrupt_randomness+0x2d4/0x340
   handle_percpu_devid_irq+0x2e0/0x4e0
   handle_irq_desc+0xc0/0x120
   generic_handle_domain_irq+0x20/0x40
   __gic_handle_irq_from_irqson.isra.0+0x3c4/0x708
   gic_handle_irq+0x7c/0xe0
   call_on_irq_stack+0x30/0x48
   do_interrupt_handler+0x134/0x158
   el1_interrupt+0x48/0xb0
    :

The {IN-HARDIRQ-W} usage happens when debug_objects_fill_pool() calls
fill_pool() in hardirq context during early boot. It is caused by the
"system_state < SYSTEM_SCHEDULING" check in debug_objects_fill_pool()
which allows fill_pool() to be called from any context during early
boot before scheduling is enabled.

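Calling fill_pool() from any context is problematic. On PREEMPT_RT the
slab list_lock is a sleeping lock (see rt_spin_lock() in the trace), so
taking it in hardirq context can deadlock against a task that already
holds it, even though the early boot window should be short. Fix that
by restricting the early boot fill_pool() call to in_task() context.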

Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 lib/debugobjects.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

 [v3] Rebased on top of tip/urgent/core and trimmed the call trace.
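
 For illustration only, not part of the patch: a minimal userspace sketch
 of the gating condition before and after this change. struct ctx,
 may_fill_old() and may_fill_new() are hypothetical names; each field
 stands in for the kernel predicate named in its comment, and the sketch
 assumes those predicates can be treated as independent booleans.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the kernel condition inspects. */
struct ctx {
	bool preempt_rt;	/* IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool early_boot;	/* system_state < SYSTEM_SCHEDULING */
	bool in_task;		/* in_task() */
	bool preemptible;	/* preemptible() */
	bool pi_blocked;	/* debug_objects_is_pi_blocked_on() */
};

/* Condition before the patch: early boot allows a refill from any context. */
static bool may_fill_old(const struct ctx *c)
{
	return !c->preempt_rt || c->early_boot ||
	       (c->preemptible && !c->pi_blocked);
}

/* Condition after the patch: early boot allows a refill only in task context. */
static bool may_fill_new(const struct ctx *c)
{
	return !c->preempt_rt ||
	       (c->early_boot && c->in_task) ||
	       (c->preemptible && !c->pi_blocked);
}
```

 With preempt_rt and early_boot set and every other field false (the
 hardirq case from the trace above), may_fill_old() returns true and
 may_fill_new() returns false. Setting in_task as well makes both return
 true, so early boot refills from task context are unaffected.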

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 772ddabcbe7d..76bfc2571591 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -736,11 +736,15 @@ static void debug_objects_fill_pool(void)
 
 	/*
 	 * On RT enabled kernels the pool refill must happen in preemptible
-	 * context and not enqueued on an rt_mutex -- for !RT kernels we rely
-	 * on the fact that spinlock_t and raw_spinlock_t are basically the
-	 * same type and this lock-type inversion works just fine.
+	 * context and not enqueued on an rt_mutex or in task context during
+	 * early boot before scheduling starts.
+	 *
+	 * For !RT kernels we rely on the fact that spinlock_t and
+	 * raw_spinlock_t are basically the same type and this lock-type
+	 * inversion works just fine.
 	 */
-	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || system_state < SYSTEM_SCHEDULING ||
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) ||
+	    (system_state < SYSTEM_SCHEDULING && in_task()) ||
 	    (preemptible() && !debug_objects_is_pi_blocked_on())) {
 		/*
 		 * Annotate away the spinlock_t inside raw_spinlock_t warning
-- 
2.54.0