[PATCH] debugobjects: Don't call fill_pool() in early boot non-task context of RT kernel

Waiman Long posted 1 patch 4 days, 4 hours ago
There is a newer version of this series
lib/debugobjects.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Posted by Waiman Long 4 days, 4 hours ago
When booting a debug PREEMPT_RT kernel on an arm64 system with a Grace
processor, the following lockdep splat was reported during early boot.

  ================================
  WARNING: inconsistent lock state
  7.1.0-rc4-test+ #1 Not tainted
  --------------------------------
  inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
  swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
  ffff0000803346a0 (&n->list_lock){?.+.}-{3:3}, at: get_from_partial_node+0x74/0xa0
  {HARDIRQ-ON-W} state was registered at:
    __lock_acquire+0x3d4/0xb70
    lock_acquire.part.0+0x178/0x2e0
    lock_acquire+0xa0/0x240
    rt_spin_lock+0xa0/0x400
    __refill_objects_node+0x8c/0x638
    refill_objects+0x60/0x120
    __pcs_replace_empty_main+0x11c/0x3a8
    __kmalloc_noprof+0x550/0x5e0
    __alloc_workqueue+0x7a4/0xb68
    alloc_workqueue_noprof+0xc0/0x118
    kmem_cache_init_late+0x3c/0xd8
    start_kernel+0x360/0x460
    __primary_switched+0x8c/0xa0
  irq event stamp: 12818
  hardirqs last  enabled at (12817): [<ffffda0f4322be98>] __raw_spin_unlock_irqrestore+0xb8/0xe8
  hardirqs last disabled at (12818): [<ffffda0f45a611f4>] el1_interrupt+0x34/0xb0
  softirqs last  enabled at (0): [<0000000000000000>] 0x0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
    :
  Call trace:
   show_stack+0x20/0x40 (C)
   dump_stack_lvl+0x7c/0x160
   dump_stack+0x1c/0x48
   print_usage_bug.part.0+0x248/0x270
   mark_lock_irq+0x410/0x608
   mark_lock+0x1ec/0x3a8
   mark_usage+0x138/0x170
   __lock_acquire+0x3d4/0xb70
   lock_acquire.part.0+0x178/0x2e0
   lock_acquire+0xa0/0x240
   rt_spin_lock+0xa0/0x400
   get_from_partial_node+0x74/0xa0
   ___slab_alloc+0x94/0x4f8
   kmem_cache_alloc_noprof+0x2d4/0x598
   kmem_alloc_batch+0x54/0x170
   fill_pool+0x12c/0x438
   debug_objects_fill_pool.part.0+0x88/0x100
   debug_objects_fill_pool+0x58/0x60
   debug_object_activate+0xfc/0x3d0
   add_timer_on+0x250/0x3a0
   add_interrupt_randomness+0x2d4/0x340
   handle_percpu_devid_irq+0x2e0/0x4e0
   handle_irq_desc+0xc0/0x120
   generic_handle_domain_irq+0x20/0x40
   __gic_handle_irq_from_irqson.isra.0+0x3c4/0x708
   gic_handle_irq+0x7c/0xe0
   call_on_irq_stack+0x30/0x48
   do_interrupt_handler+0x134/0x158
   el1_interrupt+0x48/0xb0
   el1h_64_irq_handler+0x18/0x28
   el1h_64_irq+0x80/0x88

The {IN-HARDIRQ-W} usage happens when debug_objects_fill_pool() calls
fill_pool() in hardirq context during early boot. This is due to the
"system_state < SYSTEM_SCHEDULING" check in debug_objects_fill_pool(),
which allows fill_pool() to be called from any context during early
boot.

It shouldn't really be a problem as fill_pool() will not be called from
non-preemptible context after early boot, but the lockdep warning can
still cause confusion and anxiety. Fix that by further restricting the
call to only in_task() context during early boot.

Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 lib/debugobjects.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 12e2e42e6a31..236ea5e716df 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -727,11 +727,14 @@ static void debug_objects_fill_pool(void)
 
 	/*
 	 * On RT enabled kernels the pool refill must happen in preemptible
-	 * context -- for !RT kernels we rely on the fact that spinlock_t and
+	 * context or in task context during early boot.
+	 *
+	 * For !RT kernels we rely on the fact that spinlock_t and
 	 * raw_spinlock_t are basically the same type and this lock-type
 	 * inversion works just fine.
 	 */
-	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible() || system_state < SYSTEM_SCHEDULING) {
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible() ||
+	   (system_state < SYSTEM_SCHEDULING && in_task())) {
 		/*
 		 * Annotate away the spinlock_t inside raw_spinlock_t warning
 		 * by temporarily raising the wait-type to LD_WAIT_CONFIG, matching
-- 
2.54.0
Re: [PATCH] debugobjects: Don't call fill_pool() in early boot non-task context of RT kernel
Posted by Waiman Long 4 days, 4 hours ago
On 5/20/26 12:43 PM, Waiman Long wrote:
> When booting a debug PREEMPT_RT kernel on an arm64 system with a Grace
> processor, the following lockdep splat was reported during early boot.
>
>    ================================
>    WARNING: inconsistent lock state
>    7.1.0-rc4-test+ #1 Not tainted
>    --------------------------------
>    inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>    swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
>    ffff0000803346a0 (&n->list_lock){?.+.}-{3:3}, at: get_from_partial_node+0x74/0xa0
>    {HARDIRQ-ON-W} state was registered at:
>      __lock_acquire+0x3d4/0xb70
>      lock_acquire.part.0+0x178/0x2e0
>      lock_acquire+0xa0/0x240
>      rt_spin_lock+0xa0/0x400
>      __refill_objects_node+0x8c/0x638
>      refill_objects+0x60/0x120
>      __pcs_replace_empty_main+0x11c/0x3a8
>      __kmalloc_noprof+0x550/0x5e0
>      __alloc_workqueue+0x7a4/0xb68
>      alloc_workqueue_noprof+0xc0/0x118
>      kmem_cache_init_late+0x3c/0xd8
>      start_kernel+0x360/0x460
>      __primary_switched+0x8c/0xa0
>    irq event stamp: 12818
>    hardirqs last  enabled at (12817): [<ffffda0f4322be98>] __raw_spin_unlock_irqrestore+0xb8/0xe8
>    hardirqs last disabled at (12818): [<ffffda0f45a611f4>] el1_interrupt+0x34/0xb0
>    softirqs last  enabled at (0): [<0000000000000000>] 0x0
>    softirqs last disabled at (0): [<0000000000000000>] 0x0
>      :
>    Call trace:
>     show_stack+0x20/0x40 (C)
>     dump_stack_lvl+0x7c/0x160
>     dump_stack+0x1c/0x48
>     print_usage_bug.part.0+0x248/0x270
>     mark_lock_irq+0x410/0x608
>     mark_lock+0x1ec/0x3a8
>     mark_usage+0x138/0x170
>     __lock_acquire+0x3d4/0xb70
>     lock_acquire.part.0+0x178/0x2e0
>     lock_acquire+0xa0/0x240
>     rt_spin_lock+0xa0/0x400
>     get_from_partial_node+0x74/0xa0
>     ___slab_alloc+0x94/0x4f8
>     kmem_cache_alloc_noprof+0x2d4/0x598
>     kmem_alloc_batch+0x54/0x170
>     fill_pool+0x12c/0x438
>     debug_objects_fill_pool.part.0+0x88/0x100
>     debug_objects_fill_pool+0x58/0x60
>     debug_object_activate+0xfc/0x3d0
>     add_timer_on+0x250/0x3a0
>     add_interrupt_randomness+0x2d4/0x340
>     handle_percpu_devid_irq+0x2e0/0x4e0
>     handle_irq_desc+0xc0/0x120
>     generic_handle_domain_irq+0x20/0x40
>     __gic_handle_irq_from_irqson.isra.0+0x3c4/0x708
>     gic_handle_irq+0x7c/0xe0
>     call_on_irq_stack+0x30/0x48
>     do_interrupt_handler+0x134/0x158
>     el1_interrupt+0x48/0xb0
>     el1h_64_irq_handler+0x18/0x28
>     el1h_64_irq+0x80/0x88
>
> The {IN-HARDIRQ-W} usage happens when debug_objects_fill_pool() calls
> fill_pool() in the hardirq context during early boot. It is because of
> the "system_state < SYSTEM_SCHEDULING" check in debug_objects_fill_pool()
> which allows fill_pool() to be called from any context during early
> boot.
>
> It shouldn't really be a problem as fill_pool() will not be called from
> non-preemptible context after early boot, but the lockdep warning can
> still cause confusion and anxiety. Fix that by further restricting the
> call to only in_task() context during early boot.

Sorry, the above paragraph isn't right. Will send out a v2.

Cheers,
Longman

> Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>   lib/debugobjects.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 12e2e42e6a31..236ea5e716df 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -727,11 +727,14 @@ static void debug_objects_fill_pool(void)
>   
>   	/*
>   	 * On RT enabled kernels the pool refill must happen in preemptible
> -	 * context -- for !RT kernels we rely on the fact that spinlock_t and
> +	 * context or in task context during early boot.
> +	 *
> +	 * For !RT kernels we rely on the fact that spinlock_t and
>   	 * raw_spinlock_t are basically the same type and this lock-type
>   	 * inversion works just fine.
>   	 */
> -	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible() || system_state < SYSTEM_SCHEDULING) {
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible() ||
> +	   (system_state < SYSTEM_SCHEDULING && in_task())) {
>   		/*
>   		 * Annotate away the spinlock_t inside raw_spinlock_t warning
>   		 * by temporarily raising the wait-type to LD_WAIT_CONFIG, matching
Re: [PATCH] debugobjects: Don't call fill_pool() in early boot non-task context of RT kernel
Posted by Thomas Gleixner 4 days, 3 hours ago
On Wed, May 20 2026 at 12:57, Waiman Long wrote:
> On 5/20/26 12:43 PM, Waiman Long wrote:
>> It shouldn't really be a problem as fill_pool() will not be called from
>> non-preemptible context after early boot, but the lockdep warning can
>> still cause confusion and anxiety. Fix that by further restricting the
>> call to only in_task() context during early boot.
>
> Sorry, the above paragraph isn't right. Will send out a v2.

And please trim the back trace while at it.

>> Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   lib/debugobjects.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>> index 12e2e42e6a31..236ea5e716df 100644
>> --- a/lib/debugobjects.c
>> +++ b/lib/debugobjects.c
>> @@ -727,11 +727,14 @@ static void debug_objects_fill_pool(void)
>>   
>>   	/*
>>   	 * On RT enabled kernels the pool refill must happen in preemptible
>> -	 * context -- for !RT kernels we rely on the fact that spinlock_t and
>> +	 * context or in task context during early boot.
>> +	 *
>> +	 * For !RT kernels we rely on the fact that spinlock_t and
>>   	 * raw_spinlock_t are basically the same type and this lock-type
>>   	 * inversion works just fine.
>>   	 */
>> -	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible() || system_state < SYSTEM_SCHEDULING) {
>> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible() ||
>> +	   (system_state < SYSTEM_SCHEDULING && in_task())) {

Conflicts with:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=core/urgent
Re: [PATCH] debugobjects: Don't call fill_pool() in early boot non-task context of RT kernel
Posted by Waiman Long 4 days, 3 hours ago
On 5/20/26 2:04 PM, Thomas Gleixner wrote:
> On Wed, May 20 2026 at 12:57, Waiman Long wrote:
>> On 5/20/26 12:43 PM, Waiman Long wrote:
>>> It shouldn't really be a problem as fill_pool() will not be called from
>>> non-preemptible context after early boot, but the lockdep warning can
>>> still cause confusion and anxiety. Fix that by further restricting the
>>> call to only in_task() context during early boot.
>> Sorry, the above paragraph isn't right. Will send out a v2.
> And please trim the back trace while at it.
OK, will send a v3.
>
>>> Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>>    lib/debugobjects.c | 7 +++++--
>>>    1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
>>> index 12e2e42e6a31..236ea5e716df 100644
>>> --- a/lib/debugobjects.c
>>> +++ b/lib/debugobjects.c
>>> @@ -727,11 +727,14 @@ static void debug_objects_fill_pool(void)
>>>    
>>>    	/*
>>>    	 * On RT enabled kernels the pool refill must happen in preemptible
>>> -	 * context -- for !RT kernels we rely on the fact that spinlock_t and
>>> +	 * context or in task context during early boot.
>>> +	 *
>>> +	 * For !RT kernels we rely on the fact that spinlock_t and
>>>    	 * raw_spinlock_t are basically the same type and this lock-type
>>>    	 * inversion works just fine.
>>>    	 */
>>> -	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible() || system_state < SYSTEM_SCHEDULING) {
>>> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible() ||
>>> +	   (system_state < SYSTEM_SCHEDULING && in_task())) {
> Conflicts with:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=core/urgent
>
Will rebase it to tip.

Thanks,
Longman