[PATCH v4] debugobjects: Don't call fill_pool() in early boot hardirq context

Waiman Long posted 1 patch 4 days, 12 hours ago
There is a newer version of this series
lib/debugobjects.c | 46 +++++++++++++++++++++++++++++++++++++---------
1 file changed, 37 insertions(+), 9 deletions(-)
Posted by Waiman Long 4 days, 12 hours ago
When booting a debug PREEMPT_RT kernel on an arm64 system with a Grace
processor, the following lockdep warning was reported during early boot.

  ================================
  WARNING: inconsistent lock state
  7.1.0-rc4-test+ #1 Not tainted
  --------------------------------
  inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
  swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
  ffff0000803346a0 (&n->list_lock){?.+.}-{3:3}, at: get_from_partial_node+0x74/0xa0
    :
  Call trace:
    :
   rt_spin_lock+0xa0/0x400
   get_from_partial_node+0x74/0xa0
   ___slab_alloc+0x94/0x4f8
   kmem_cache_alloc_noprof+0x2d4/0x598
   kmem_alloc_batch+0x54/0x170
   fill_pool+0x12c/0x438
   debug_objects_fill_pool+0x58/0x60
   debug_object_activate+0xfc/0x3d0
   add_timer_on+0x250/0x3a0
   add_interrupt_randomness+0x2d4/0x340
   handle_percpu_devid_irq+0x2e0/0x4e0
   handle_irq_desc+0xc0/0x120
   generic_handle_domain_irq+0x20/0x40
   __gic_handle_irq_from_irqson.isra.0+0x3c4/0x708
   gic_handle_irq+0x7c/0xe0
   call_on_irq_stack+0x30/0x48
   do_interrupt_handler+0x134/0x158
   el1_interrupt+0x48/0xb0
    :

During early boot, interrupts are enabled before the scheduler is
running. In this window (before SYSTEM_SCHEDULING is set) interrupts
can fire and attempt to fill the pool from within hardirq context. This
can lead to a deadlock if the interrupt occurred while the interrupted
context was in the memory allocator.

Add a new can_fill_pool() helper that reorders the exception rules and
forbids this scenario by excluding allocations from hardirq context
before SYSTEM_SCHEDULING.
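
The deadlock described above can be illustrated with a small userspace
sketch. It is only a model: struct model_lock, model_trylock(),
model_unlock() and hardirq_would_deadlock() are hypothetical names that do
not exist in the kernel, and the flag-based lock stands in for a
non-recursive lock such as n->list_lock. The point is that the interrupted
context cannot release the lock until the handler returns, so a handler
that waits for that lock never makes progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive lock such as n->list_lock. */
struct model_lock {
	bool held;
};

/* Try to take the lock; returns false if it is already held. */
static bool model_trylock(struct model_lock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

/* Release the lock; releasing a lock that is not held is a bug. */
static void model_unlock(struct model_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/*
 * Model a hardirq handler that wants to refill the pool and therefore
 * needs the allocator lock. Returns true if the handler would have to
 * wait for a lock owned by the context it interrupted, which can never
 * be released while the handler is running.
 */
static bool hardirq_would_deadlock(struct model_lock *lock, bool refill_allowed)
{
	if (!refill_allowed)
		return false;	/* refill skipped, lock never touched */

	if (!model_trylock(lock))
		return true;	/* owner cannot run until the handler returns */

	model_unlock(lock);
	return false;
}
```

If the lock is taken before the call, as it would be when the interrupt
lands inside the allocator, the function reports a deadlock when the
refill is allowed and none when it is refused. Refusing the refill in
hardirq context is the approach this patch takes.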

Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
Co-developed-by: Waiman Long <longman@redhat.com>
Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 lib/debugobjects.c | 46 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index b18a682fe3da..6fb00e08a4e2 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -720,6 +720,41 @@ static inline bool debug_objects_is_pi_blocked_on(void)
 #endif
 }
 
+static inline bool can_fill_pool(void)
+{
+	/*
+	 * On !RT enabled kernels there are no restrictions and spinlock_t and
+	 * raw_spinlock_t are the same types.
+	 */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		return true;
+
+	/*
+	 * On RT enabled kernels, the task must not be blocked on a lock as
+	 * that could corrupt the PI state when blocking on a lock in the
+	 * allocation path.
+	 */
+	if (debug_objects_is_pi_blocked_on())
+		return false;
+
+	/*
+	 * On RT enabled kernels the pool refill should happen in preemptible
+	 * context.
+	 */
+	if (preemptible())
+		return true;
+
+	/*
+	 * During system boot, before scheduling is set up, preemption is
+	 * disabled and the pool can get exhausted. Before scheduling is active
+	 * a task cannot be blocked on a sleeping lock, but it might hold one.
+	 * If it is interrupted then, hard interrupt context might run into a
+	 * lock inversion. So exclude hard interrupt context from allocations
+	 * before scheduling is active.
+	 */
+	return system_state < SYSTEM_SCHEDULING && !in_hardirq();
+}
+
 static void debug_objects_fill_pool(void)
 {
 	if (!static_branch_likely(&obj_cache_enabled))
@@ -734,18 +769,11 @@ static void debug_objects_fill_pool(void)
 	if (likely(!pool_should_refill(&pool_global)))
 		return;
 
-	/*
-	 * On RT enabled kernels the pool refill must happen in preemptible
-	 * context and not enqueued on an rt_mutex -- for !RT kernels we rely
-	 * on the fact that spinlock_t and raw_spinlock_t are basically the
-	 * same type and this lock-type inversion works just fine.
-	 */
-	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || system_state < SYSTEM_SCHEDULING ||
-	    (preemptible() && !debug_objects_is_pi_blocked_on())) {
+	if (can_fill_pool()) {
 		/*
 		 * Annotate away the spinlock_t inside raw_spinlock_t warning
 		 * by temporarily raising the wait-type to LD_WAIT_CONFIG, matching
-		 * the preemptible() condition above.
+		 * the preemptible() condition in can_fill_pool().
 		 */
 		static DEFINE_WAIT_OVERRIDE_MAP(fill_pool_map, LD_WAIT_CONFIG);
 		lock_map_acquire_try(&fill_pool_map);
-- 
2.54.0
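
For readers who want to see where the old and new conditions differ, here
is a userspace model of both rules. It is a sketch under assumptions, not
kernel code: struct model_ctx and enum model_system_state are hypothetical
stand-ins for the values the kernel obtains from
IS_ENABLED(CONFIG_PREEMPT_RT), debug_objects_is_pi_blocked_on(),
preemptible(), in_hardirq() and system_state. Only the ordering of the
modelled states matters here.

```c
#include <stdbool.h>

/* Subset of the kernel's system states, in boot order (model only). */
enum model_system_state {
	MODEL_SYSTEM_BOOTING,
	MODEL_SYSTEM_SCHEDULING,
	MODEL_SYSTEM_RUNNING,
};

/* Inputs that the kernel reads from its own helpers and globals. */
struct model_ctx {
	bool preempt_rt;		/* IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool pi_blocked_on;		/* debug_objects_is_pi_blocked_on() */
	bool preemptible;		/* preemptible() */
	bool in_hardirq;		/* in_hardirq() */
	enum model_system_state state;	/* system_state */
};

/* The condition removed by the patch. */
static bool old_rule(const struct model_ctx *c)
{
	return !c->preempt_rt || c->state < MODEL_SYSTEM_SCHEDULING ||
	       (c->preemptible && !c->pi_blocked_on);
}

/* The same checks, in the same order, as can_fill_pool() in the patch. */
static bool new_rule(const struct model_ctx *c)
{
	if (!c->preempt_rt)
		return true;
	if (c->pi_blocked_on)
		return false;
	if (c->preemptible)
		return true;
	return c->state < MODEL_SYSTEM_SCHEDULING && !c->in_hardirq;
}
```

With preempt_rt set, state at MODEL_SYSTEM_BOOTING and in_hardirq set,
old_rule() allows the refill and new_rule() refuses it, which is the case
from the lockdep report. The model also shows the effect of the
reordering: a PI-blocked task before MODEL_SYSTEM_SCHEDULING passes the
old rule but not the new one, although the comment in the patch notes
that a task cannot be blocked on a sleeping lock that early.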
Re: [PATCH v4] debugobjects: Don't call fill_pool() in early boot hardirq context
Posted by Sebastian Andrzej Siewior 3 days, 2 hours ago
On 2026-06-03 15:52:50 [-0400], Waiman Long wrote:
> When booting a debug PREEMPT_RT kernel on an arm64 system with a Grace
> processor, the following lockdep warning was reported during early boot.
> 
>   ================================
>   WARNING: inconsistent lock state
>   7.1.0-rc4-test+ #1 Not tainted
>   --------------------------------
>   inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>   swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
>   ffff0000803346a0 (&n->list_lock){?.+.}-{3:3}, at: get_from_partial_node+0x74/0xa0
>     :
>   Call trace:
>     :
>    rt_spin_lock+0xa0/0x400
>    get_from_partial_node+0x74/0xa0
>    ___slab_alloc+0x94/0x4f8
>    kmem_cache_alloc_noprof+0x2d4/0x598
>    kmem_alloc_batch+0x54/0x170
>    fill_pool+0x12c/0x438
>    debug_objects_fill_pool+0x58/0x60
>    debug_object_activate+0xfc/0x3d0
>    add_timer_on+0x250/0x3a0
>    add_interrupt_randomness+0x2d4/0x340
>    handle_percpu_devid_irq+0x2e0/0x4e0
>    handle_irq_desc+0xc0/0x120
>    generic_handle_domain_irq+0x20/0x40
>    __gic_handle_irq_from_irqson.isra.0+0x3c4/0x708
>    gic_handle_irq+0x7c/0xe0
>    call_on_irq_stack+0x30/0x48
>    do_interrupt_handler+0x134/0x158
>    el1_interrupt+0x48/0xb0
>     :

I would strip that backtrace since it is obvious from the description
below.

> During early boot, interrupts are enabled before the scheduler is
> running. In this window (before SYSTEM_SCHEDULING is set) interrupts
> can fire and attempt to fill the pool from within hardirq context. This
> can lead to a deadlock if the interrupt occurred while the interrupted
> context was in the memory allocator.
> 
> Add a new can_fill_pool() helper that reorders the exception rules and
> forbids this scenario by excluding allocations from hardirq context
> before SYSTEM_SCHEDULING.
> 
> Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
> Co-developed-by: Waiman Long <longman@redhat.com>
> Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Waiman Long <longman@redhat.com>

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian
Re: [PATCH v4] debugobjects: Don't call fill_pool() in early boot hardirq context
Posted by Waiman Long 2 days, 15 hours ago
On 6/5/26 2:31 AM, Sebastian Andrzej Siewior wrote:
> On 2026-06-03 15:52:50 [-0400], Waiman Long wrote:
>> When booting a debug PREEMPT_RT kernel on an arm64 system with a Grace
>> processor, the following lockdep warning was reported during early boot.
>>
>>    ================================
>>    WARNING: inconsistent lock state
>>    7.1.0-rc4-test+ #1 Not tainted
>>    --------------------------------
>>    inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>>    swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
>>    ffff0000803346a0 (&n->list_lock){?.+.}-{3:3}, at: get_from_partial_node+0x74/0xa0
>>      :
>>    Call trace:
>>      :
>>     rt_spin_lock+0xa0/0x400
>>     get_from_partial_node+0x74/0xa0
>>     ___slab_alloc+0x94/0x4f8
>>     kmem_cache_alloc_noprof+0x2d4/0x598
>>     kmem_alloc_batch+0x54/0x170
>>     fill_pool+0x12c/0x438
>>     debug_objects_fill_pool+0x58/0x60
>>     debug_object_activate+0xfc/0x3d0
>>     add_timer_on+0x250/0x3a0
>>     add_interrupt_randomness+0x2d4/0x340
>>     handle_percpu_devid_irq+0x2e0/0x4e0
>>     handle_irq_desc+0xc0/0x120
>>     generic_handle_domain_irq+0x20/0x40
>>     __gic_handle_irq_from_irqson.isra.0+0x3c4/0x708
>>     gic_handle_irq+0x7c/0xe0
>>     call_on_irq_stack+0x30/0x48
>>     do_interrupt_handler+0x134/0x158
>>     el1_interrupt+0x48/0xb0
>>      :
> I would strip that backtrace since it is obvious from the description
> below.

Sure. I can do that.

>
>> During early boot, interrupts are enabled before the scheduler is
>> running. In this window (before SYSTEM_SCHEDULING is set) interrupts
>> can fire and attempt to fill the pool from within hardirq context. This
>> can lead to a deadlock if the interrupt occurred while the interrupted
>> context was in the memory allocator.
>>
>> Add a new can_fill_pool() helper that reorders the exception rules and
>> forbids this scenario by excluding allocations from hardirq context
>> before SYSTEM_SCHEDULING.
>>
>> Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
>> Co-developed-by: Waiman Long <longman@redhat.com>
>> Co-developed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Waiman Long <longman@redhat.com>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Thanks,
Longman
>
> Sebastian
>