[PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects

Waiman Long posted 3 patches 6 months, 2 weeks ago
[PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects
Posted by Waiman Long 6 months, 2 weeks ago
A circular locking dependency lockdep splat was hit recently with a
debug kernel. The dependency chain (in reverse order) is:

  -> #3 (&zone->lock){-.-.}-{2:2}:
  -> #2 (&base->lock){-.-.}-{2:2}:
  -> #1 (&console_sch_key){-.-.}-{2:2}:
  -> #0 (console_owner){..-.}-{0:0}:

The last one is from calling printk() within the rmqueue_bulk() call in
mm/page_alloc.c. The "base->lock" is from lock_timer_base() and first
one is due to calling add_timer_on() leading to debug_object_activate()
doing actual memory allocation acquiring the zone lock.

The console_sch_key comes from a s390 console driver in driver/s390/cio.
The console_sch_key -> timer dependency happens because the console
driver is setting a timeout value while holding its lock. Apparently it
is pretty common for a console driver to use timer for timeout or other
timing purposes. So this may happen to other console drivers as well.

One way to break this circular locking dependency is to disallow any
memory allocation when a timer debug object is being handled. Do this by
setting the ODEBUG_FLAG_NO_ALLOC flag in the timer_debug_descr structure.

The figures below show the number of times the debug_objects_fill_pool()
function has reached the statement right before and after the no_alloc
check in initial bootup and after running a parallel kernel build on
a 2-socket 96-threads x86-64 system.

			 Before      After     non-timer %
		 	 ------      -----     -----------
  Initial bootup	  150,730     148,198     98.3%
  Parallel kernel build 5,974,464   5,893,116     98.6%

So from object pre-allocation perspective, timer debug objects represent
just a small slice of the total number of debug objects to be processed.

The allocation of debug_object to the global pool happens when its object
count falls below the (256 + 16 * num_possible_cpus) threshold. Even
then, there may still be free objects available in the percpu pool. So
the chance that debug_objects gets disabled because it is running out
of free debug_object should be minimal.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/time/timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 553fa469d7cc..e0be64591e43 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -775,6 +775,7 @@ static bool timer_fixup_assert_init(void *addr, enum debug_obj_state state)
 
 static const struct debug_obj_descr timer_debug_descr = {
 	.name			= "timer_list",
+	.flags			= ODEBUG_FLAG_NO_ALLOC,
 	.debug_hint		= timer_debug_hint,
 	.is_static_object	= timer_is_static_object,
 	.fixup_init		= timer_fixup_init,
-- 
2.49.0
Re: [PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects
Posted by Thomas Gleixner 6 months, 1 week ago
On Thu, Jun 05 2025 at 23:15, Waiman Long wrote:
> A circular locking dependency lockdep splat was hit recently with a
> debug kernel. The dependency chain (in reverse order) is:
>
>   -> #3 (&zone->lock){-.-.}-{2:2}:
>   -> #2 (&base->lock){-.-.}-{2:2}:
>   -> #1 (&console_sch_key){-.-.}-{2:2}:
>   -> #0 (console_owner){..-.}-{0:0}:
>
> The last one is from calling printk() within the rmqueue_bulk() call in
> mm/page_alloc.c. The "base->lock" is from lock_timer_base() and first
> one is due to calling add_timer_on() leading to debug_object_activate()
> doing actual memory allocation acquiring the zone lock.
>
> The console_sch_key comes from a s390 console driver in driver/s390/cio.
> The console_sch_key -> timer dependency happens because the console
> driver is setting a timeout value while holding its lock. Apparently it
> is pretty common for a console driver to use timer for timeout or other
> timing purposes. So this may happen to other console drivers as well.
>
> One way to break this circular locking dependency is to disallow any
> memory allocation when a timer debug object is being handled. Do this by
> setting the ODEBUG_FLAG_NO_ALLOC flag in the timer_debug_descr
> structure.

Well. I'm absolutely not convinced that this is the right approach.

I have a hard time to find the printk() in rmqueue_bulk(). But if there
is one then it has to go or has to be converted to a deferred printk()
simply because that code can be called from so many contexts, which all
can legitimately create a lock dependency chain into the console drivers
in some way or the other. That's like invoking printk() from the guts of
the scheduler or locking code.

> The figures below show the number of times the debug_objects_fill_pool()
> function has reached the statement right before and after the no_alloc
> check in initial bootup and after running a parallel kernel build on
> a 2-socket 96-threads x86-64 system.
>
> 			 Before      After     non-timer %
> 		 	 ------      -----     -----------
>   Initial bootup	  150,730     148,198     98.3%
>   Parallel kernel build 5,974,464   5,893,116     98.6%
>
> So from object pre-allocation perspective, timer debug objects represent
> just a small slice of the total number of debug objects to be processed.

That math is skewed due to the way how debugobjects handles the
allocations for the global pool.

The initial decision to attempt a refill is:

    count < min_cnt

where min_cnt = 256 + 16 * num_possible_cpus()

That makes _one_ context go into the allocation path unless

    count < min_cnt / 2

which forces all contexts to try allocating in order not to deplete the
pool.

So let's assume we have 16 CPUs, then min_cnt = 512 and therefore
min_cnt / 2 = 256.

As the initial context which allocates when the count goes below 512
might be preempted, the rest of the callers can lower the pool count to
256 easily.

In the 0-day splat the debug_objects OOM happens from o2net_init():

[ 92.566274][ T1] debug_object_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/lib/debugobjects.c:785) 
[ 92.566777][ T1] init_timer_key (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/include/asm/jump_label.h:36
[ 92.567230][ T1] o2net_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/ocfs2/cluster/tcp.c:2128 (discriminator 3)) 
[ 92.567629][ T1] init_o2nm (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/ocfs2/cluster/nodemanager.c:832) 
[ 92.568023][ T1] do_one_initcall (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1257) 

o2net_init() initializes 255 nodes and each node has three delayed works. Each
delayed work contains a timer for which debugobjects needs to create a
new tracking object. So with your brute force approach of disabling
allocations for timers blindy o2net_init() can trivially deplete the
pool.

For the o2net case this requires that workqueue debugobjects are
disabled, but you can't argue that this is silly because there are other
code paths which do bulk initialization of timers w/o having a work
involved.

So using the percentage of timer operations for evaluating how this
change can cause a debug object OOM is just bogus and wishful thinking.

Let's take a step back and ask the obvious question, when there is
actually consumption of debug objects happening:

  1) For all dynamically initialized objects it happens in
     debug_object_init()

  2) For statically initialized objects it happens in
     debug_object_activate()

#2 is arguably irrelevant as there are not gazillions of statically
   initialized objects, which are trackable by debugobjects

#1 is the vast majority and the good news is that the initialization of
   such objects (after allocation) happens mostly in preemptible context
   with no locks held.

So the obvious thing to try is not adding some random flag to timers
(and tomorrow to RCU and work), but to restrict the allocation
requirement to debug_object_init().

Something like the untested below should make all of these headaches
go away. Except for the general observation that debugobjects is not the
only way to create nasty dependency chains into console driver locks,
but that's a headache the MM folks have to sort out.

Thanks,

        tglx
---        

--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -811,8 +811,6 @@ int debug_object_activate(void *addr, co
 	if (!debug_objects_enabled)
 		return 0;
 
-	debug_objects_fill_pool();
-
 	db = get_bucket((unsigned long) addr);
 
 	raw_spin_lock_irqsave(&db->lock, flags);
@@ -1000,8 +998,6 @@ void debug_object_assert_init(void *addr
 	if (!debug_objects_enabled)
 		return;
 
-	debug_objects_fill_pool();
-
 	db = get_bucket((unsigned long) addr);
 
 	raw_spin_lock_irqsave(&db->lock, flags);
Re: [PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects
Posted by Waiman Long 6 months, 1 week ago
On 6/12/25 6:03 PM, Thomas Gleixner wrote:
> On Thu, Jun 05 2025 at 23:15, Waiman Long wrote:
>> A circular locking dependency lockdep splat was hit recently with a
>> debug kernel. The dependency chain (in reverse order) is:
>>
>>    -> #3 (&zone->lock){-.-.}-{2:2}:
>>    -> #2 (&base->lock){-.-.}-{2:2}:
>>    -> #1 (&console_sch_key){-.-.}-{2:2}:
>>    -> #0 (console_owner){..-.}-{0:0}:
>>
>> The last one is from calling printk() within the rmqueue_bulk() call in
>> mm/page_alloc.c. The "base->lock" is from lock_timer_base() and first
>> one is due to calling add_timer_on() leading to debug_object_activate()
>> doing actual memory allocation acquiring the zone lock.
>>
>> The console_sch_key comes from a s390 console driver in driver/s390/cio.
>> The console_sch_key -> timer dependency happens because the console
>> driver is setting a timeout value while holding its lock. Apparently it
>> is pretty common for a console driver to use timer for timeout or other
>> timing purposes. So this may happen to other console drivers as well.
>>
>> One way to break this circular locking dependency is to disallow any
>> memory allocation when a timer debug object is being handled. Do this by
>> setting the ODEBUG_FLAG_NO_ALLOC flag in the timer_debug_descr
>> structure.
> Well. I'm absolutely not convinced that this is the right approach.
>
> I have a hard time to find the printk() in rmqueue_bulk(). But if there
> is one then it has to go or has to be converted to a deferred printk()
> simply because that code can be called from so many contexts, which all
> can legitimately create a lock dependency chain into the console drivers
> in some way or the other. That's like invoking printk() from the guts of
> the scheduler or locking code.

Actually, rmqueue_bulk() calls expand() which, in turn, calls 
__add_to_free_list() and the printk() comes from the VM_WARN_ONCE() 
macro there.

In a sense, printk() is called because of some other issues in the mm code.

>
>> The figures below show the number of times the debug_objects_fill_pool()
>> function has reached the statement right before and after the no_alloc
>> check in initial bootup and after running a parallel kernel build on
>> a 2-socket 96-threads x86-64 system.
>>
>> 			 Before      After     non-timer %
>> 		 	 ------      -----     -----------
>>    Initial bootup	  150,730     148,198     98.3%
>>    Parallel kernel build 5,974,464   5,893,116     98.6%
>>
>> So from object pre-allocation perspective, timer debug objects represent
>> just a small slice of the total number of debug objects to be processed.
> That math is skewed due to the way how debugobjects handles the
> allocations for the global pool.
>
> The initial decision to attempt a refill is:
>
>      count < min_cnt
>
> where min_cnt = 256 + 16 * num_possible_cpus()
>
> That makes _one_ context go into the allocation path unless
>
>      count < min_cnt / 2
>
> which forces all contexts to try allocating in order not to deplete the
> pool.
>
> So let's assume we have 16 CPUs, then min_cnt = 512 and therefore
> min_cnt / 2 = 256.
>
> As the initial context which allocates when the count goes below 512
> might be preempted, the rest of the callers can lower the pool count to
> 256 easily.
>
> In the 0-day splat the debug_objects OOM happens from o2net_init():
>
> [ 92.566274][ T1] debug_object_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/lib/debugobjects.c:785)
> [ 92.566777][ T1] init_timer_key (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/include/asm/jump_label.h:36
> [ 92.567230][ T1] o2net_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/ocfs2/cluster/tcp.c:2128 (discriminator 3))
> [ 92.567629][ T1] init_o2nm (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/ocfs2/cluster/nodemanager.c:832)
> [ 92.568023][ T1] do_one_initcall (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1257)
>
> o2net_init() initializes 255 nodes and each node has three delayed works. Each
> delayed work contains a timer for which debugobjects needs to create a
> new tracking object. So with your brute force approach of disabling
> allocations for timers blindy o2net_init() can trivially deplete the
> pool.
>
> For the o2net case this requires that workqueue debugobjects are
> disabled, but you can't argue that this is silly because there are other
> code paths which do bulk initialization of timers w/o having a work
> involved.
>
> So using the percentage of timer operations for evaluating how this
> change can cause a debug object OOM is just bogus and wishful thinking.
>
> Let's take a step back and ask the obvious question, when there is
> actually consumption of debug objects happening:
>
>    1) For all dynamically initialized objects it happens in
>       debug_object_init()
>
>    2) For statically initialized objects it happens in
>       debug_object_activate()
>
> #2 is arguably irrelevant as there are not gazillions of statically
>     initialized objects, which are trackable by debugobjects
>
> #1 is the vast majority and the good news is that the initialization of
>     such objects (after allocation) happens mostly in preemptible context
>     with no locks held.
>
> So the obvious thing to try is not adding some random flag to timers
> (and tomorrow to RCU and work), but to restrict the allocation
> requirement to debug_object_init().
>
> Something like the untested below should make all of these headaches
> go away. Except for the general observation that debugobjects is not the
> only way to create nasty dependency chains into console driver locks,
> but that's a headache the MM folks have to sort out.
>
> Thanks,
>
>          tglx
> ---
>
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -811,8 +811,6 @@ int debug_object_activate(void *addr, co
>   	if (!debug_objects_enabled)
>   		return 0;
>   
> -	debug_objects_fill_pool();
> -
>   	db = get_bucket((unsigned long) addr);
>   
>   	raw_spin_lock_irqsave(&db->lock, flags);
> @@ -1000,8 +998,6 @@ void debug_object_assert_init(void *addr
>   	if (!debug_objects_enabled)
>   		return;
>   
> -	debug_objects_fill_pool();
> -
>   	db = get_bucket((unsigned long) addr);
>   
>   	raw_spin_lock_irqsave(&db->lock, flags);
>
Thanks for the suggestion. I will take further look at this problem.

Cheers,
Longman
Re: [PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects
Posted by Thomas Gleixner 6 months, 1 week ago
On Fri, Jun 13 2025 at 11:13, Waiman Long wrote:
> On 6/12/25 6:03 PM, Thomas Gleixner wrote:
>> I have a hard time to find the printk() in rmqueue_bulk(). But if there
>> is one then it has to go or has to be converted to a deferred printk()
>> simply because that code can be called from so many contexts, which all
>> can legitimately create a lock dependency chain into the console drivers
>> in some way or the other. That's like invoking printk() from the guts of
>> the scheduler or locking code.
>
> Actually, rmqueue_bulk() calls expand() which, in turn, calls 
> __add_to_free_list() and the printk() comes from the VM_WARN_ONCE() 
> macro there.

Duh. I missed that one ....

> In a sense, printk() is called because of some other issues in the mm code.

:)

Thanks,

        tglx
Re: [PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects
Posted by kernel test robot 6 months, 1 week ago

Hello,

kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:

commit: 89fd87e046372ce70f8146357324cacd365369fe ("[PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects")
url: https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/debugobjects-Add-ODEBUG_FLAG_NO_ALLOC-to-disable-memory-allocation/20250606-111718
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 6c58d2791d6046727d87db50a5e46644f195dcf9
patch link: https://lore.kernel.org/all/20250606031539.1004644-4-longman@redhat.com/
patch subject: [PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects

in testcase: boot

config: x86_64-randconfig-003-20250608
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


we observe the issue does not always happen, 21 times out of 100 runs as below.
the parent keeps clean.


fc7e5b596de17969 89fd87e046372ce70f814635732
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :100         21%          21:100   dmesg.WARNING:possible_circular_locking_dependency_detected
           :100         21%          21:100   dmesg.WARNING:possible_circular_locking_dependency_detected_systemd_is_trying_to_acquire_lock:at:__pm_runtime_resume_but_task_is_already_holding_lock:at:uart_write



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202506121115.b69b8c2-lkp@intel.com


[   92.544844][    T1] WARNING: possible circular locking dependency detected
[   92.545407][    T1] 6.15.0-rc1-00010-g89fd87e04637 #1 Tainted: G                T
[   92.546120][    T1] ------------------------------------------------------
[   92.546721][    T1] systemd/1 is trying to acquire lock:
[ 92.547161][ T1] ffff888118c93968 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/power/runtime.c:1180) 
[   92.548064][    T1]
[   92.548064][    T1] but task is already holding lock:
[ 92.548680][ T1] ffffffff8806faf8 (&port_lock_key){....}-{3:3}, at: uart_write (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/serial_core.h:638 kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/serial_core.h:672 kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/serial_core.h:716 kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/serial_core.c:626) 
[   92.549493][    T1]
[   92.549493][    T1] which lock already depends on the new lock.
[   92.549493][    T1]
[   92.550293][    T1]
[   92.550293][    T1] the existing dependency chain (in reverse order) is:
[   92.551049][    T1]
[   92.551049][    T1] -> #4 (&port_lock_key){....}-{3:3}:
[ 92.551674][ T1] __lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5235) 
[ 92.552145][ T1] lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:472 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5868) 
[ 92.552621][ T1] _raw_spin_lock_irqsave (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/spinlock_api_smp.h:111 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/spinlock.c:162) 
[ 92.553094][ T1] serial8250_console_write (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/serial_core.h:638 kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/serial_core.h:672 kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/serial_core.h:716 kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/8250/8250_port.c:3415) 
[ 92.553560][ T1] univ8250_console_write (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/8250/8250_core.c:396) 
[ 92.554015][ T1] console_emit_next_record (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3055 (discriminator 3) kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3139 (discriminator 3)) 
[ 92.554517][ T1] console_flush_all (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3227) 
[ 92.555013][ T1] console_unlock (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3286 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3325) 
[ 92.555431][ T1] vprintk_emit (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:2450) 
[ 92.555844][ T1] vprintk_default (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:2465) 
[ 92.556277][ T1] vprintk (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk_safe.c:82) 
[ 92.556641][ T1] _printk (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:2475) 
[ 92.557085][ T1] register_console (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:4126) 
[ 92.557584][ T1] univ8250_console_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/8250/8250_core.c:514) 
[ 92.558034][ T1] console_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:4323) 
[ 92.558479][ T1] start_kernel (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1035 (discriminator 3)) 
[ 92.558928][ T1] x86_64_start_reservations (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/kernel/head64.c:501) 
[ 92.559398][ T1] x86_64_start_kernel (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/kernel/head64.c:443 (discriminator 17)) 
[ 92.559835][ T1] common_startup_64 (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/kernel/head_64.S:419) 
[   92.560249][    T1]
[   92.560249][    T1] -> #3 (console_owner){....}-{0:0}:
[ 92.560869][ T1] __lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5235) 
[ 92.561272][ T1] lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:472 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5868) 
[ 92.561827][ T1] console_lock_spinning_enable (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:1924) 
[ 92.562373][ T1] console_emit_next_record (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3050 (discriminator 3) kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3137 (discriminator 3)) 
[ 92.562932][ T1] console_flush_all (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3227) 
[ 92.563398][ T1] console_unlock (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3286 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:3325) 
[ 92.563806][ T1] vprintk_emit (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:2450) 
[ 92.564198][ T1] vprintk_default (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:2465) 
[ 92.564592][ T1] vprintk (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk_safe.c:82) 
[ 92.564990][ T1] _printk (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:2475) 
[ 92.565340][ T1] lookup_object_or_alloc (kbuild/obj/consumer/x86_64-randconfig-003-20250608/lib/debugobjects.c:700) 
[ 92.565838][ T1] __debug_object_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/lib/debugobjects.c:750) 
[ 92.566274][ T1] debug_object_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/lib/debugobjects.c:785) 
[ 92.566777][ T1] init_timer_key (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/include/asm/jump_label.h:36 kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/trace/events/timer.h:33 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/time/timer.c:838 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/time/timer.c:881) 
[ 92.567230][ T1] o2net_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/ocfs2/cluster/tcp.c:2128 (discriminator 3)) 
[ 92.567629][ T1] init_o2nm (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/ocfs2/cluster/nodemanager.c:832) 
[ 92.568023][ T1] do_one_initcall (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1257) 
[ 92.568441][ T1] do_initcalls (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1318 kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1335) 
[ 92.568835][ T1] kernel_init_freeable (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1569) 
[ 92.569283][ T1] kernel_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1459) 
[ 92.569690][ T1] ret_from_fork (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/kernel/process.c:159) 
[ 92.570193][ T1] ret_from_fork_asm (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/entry/entry_64.S:255) 
[   92.570619][    T1]
[   92.570619][    T1] -> #2 (&obj_hash[i].lock){-.-.}-{2:2}:
[ 92.571302][ T1] __lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5235) 
[ 92.571722][ T1] lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:472 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5868) 
[ 92.572151][ T1] _raw_spin_lock_irqsave (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/spinlock_api_smp.h:111 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/spinlock.c:162) 
[ 92.572704][ T1] debug_object_assert_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/lib/debugobjects.c:1014) 
[ 92.573200][ T1] __mod_timer (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/time/timer.c:1033) 
[ 92.573613][ T1] mod_timer (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/time/timer.c:1196) 
[ 92.573983][ T1] worker_enter_idle (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/workqueue.c:1053) 
[ 92.574452][ T1] create_worker (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/workqueue.c:2833) 
[ 92.574850][ T1] worker_thread (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/workqueue.c:3063 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/workqueue.c:3115 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/workqueue.c:3375) 
[ 92.575278][ T1] kthread (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/kthread.c:464) 
[ 92.575652][ T1] ret_from_fork (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/kernel/process.c:159) 
[ 92.576118][ T1] ret_from_fork_asm (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/entry/entry_64.S:255) 
[   92.576589][    T1]
[   92.576589][    T1] -> #1 (&pool->lock){-.-.}-{2:2}:
[ 92.577245][ T1] __lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5235) 
[ 92.577686][ T1] lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:472 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5868) 
[ 92.578132][ T1] _raw_spin_lock (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/spinlock_api_smp.h:134 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/spinlock.c:154) 
[ 92.578524][ T1] __queue_work (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/workqueue.c:2311) 
[ 92.578929][ T1] queue_work_on (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/workqueue.c:2393) 
[ 92.579474][ T1] rpm_suspend (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/workqueue.h:662 kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/power/runtime.c:673) 
[ 92.579939][ T1] rpm_idle (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/power/runtime.c:547 (discriminator 1)) 
[ 92.580379][ T1] __pm_runtime_idle (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/spinlock.h:406 (discriminator 5) kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/power/runtime.c:1113 (discriminator 5)) 
[ 92.580896][ T1] __device_attach (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/device.h:937 kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/dd.c:1051) 
[ 92.581512][ T1] device_initial_probe (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/dd.c:1079) 
[ 92.581983][ T1] bus_probe_device (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/bus.c:537) 
[ 92.582481][ T1] device_add (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/core.c:3673) 
[ 92.582934][ T1] serial_base_port_add (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/serial_base_bus.c:179) 
[ 92.583403][ T1] serial_core_register_port (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/serial_core.c:3342 kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/serial_core.c:3381) 
[ 92.583950][ T1] serial_ctrl_register_port (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/serial_ctrl.c:41) 
[ 92.584518][ T1] uart_add_one_port (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/serial_port.c:143) 
[ 92.585031][ T1] serial8250_register_8250_port (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/8250/8250_core.c:822) 
[ 92.585578][ T1] serial_pnp_probe (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/8250/8250_pnp.c:480) 
[ 92.586119][ T1] pnp_device_probe (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/pnp/driver.c:111) 
[ 92.586574][ T1] really_probe (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/dd.c:579 kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/dd.c:658) 
[ 92.586973][ T1] __driver_probe_device (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/dd.c:800) 
[ 92.587468][ T1] driver_probe_device (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/dd.c:830) 
[ 92.587949][ T1] __driver_attach (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/dd.c:1217) 
[ 92.588438][ T1] bus_for_each_dev (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/bus.c:370) 
[ 92.588872][ T1] driver_attach (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/dd.c:1234) 
[ 92.589342][ T1] bus_add_driver (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/bus.c:678) 
[ 92.589759][ T1] driver_register (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/driver.c:249) 
[ 92.590215][ T1] pnp_register_driver (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/pnp/driver.c:280) 
[ 92.590609][ T1] serial8250_pnp_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/8250/8250_pnp.c:530) 
[ 92.591096][ T1] serial8250_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/8250/8250_platform.c:315) 
[ 92.591488][ T1] do_one_initcall (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1257) 
[ 92.591939][ T1] do_initcalls (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1318 kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1335) 
[ 92.592334][ T1] kernel_init_freeable (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1569) 
[ 92.592879][ T1] kernel_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/init/main.c:1459) 
[ 92.593289][ T1] ret_from_fork (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/kernel/process.c:159) 
[ 92.593701][ T1] ret_from_fork_asm (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/entry/entry_64.S:255) 
[   92.594139][    T1]
[   92.594139][    T1] -> #0 (&dev->power.lock){-...}-{3:3}:
[ 92.594791][ T1] check_prev_add (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:3167) 
[ 92.595300][ T1] validate_chain (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:3286 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:3909) 
[ 92.595732][ T1] __lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5235) 
[ 92.596198][ T1] lock_acquire (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:472 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/lockdep.c:5868) 
[ 92.596626][ T1] _raw_spin_lock_irqsave (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/spinlock_api_smp.h:111 kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/locking/spinlock.c:162) 
[ 92.597149][ T1] __pm_runtime_resume (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/base/power/runtime.c:1180) 
[ 92.597624][ T1] __uart_start (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/pm_runtime.h:414 kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/serial_core.c:148) 
[ 92.598037][ T1] uart_write (kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/serial_core.h:638 kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/serial_core.h:682 kbuild/obj/consumer/x86_64-randconfig-003-20250608/include/linux/serial_core.h:785 kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/serial/serial_core.c:636) 
[ 92.598483][ T1] process_output_block (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/n_tty.c:561) 
[ 92.598979][ T1] n_tty_write (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/n_tty.c:2377) 
[ 92.599384][ T1] iterate_tty_write (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/tty_io.c:1015) 
[ 92.599892][ T1] file_tty_write+0x13f/0x240 
[ 92.600397][ T1] redirected_tty_write (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/tty_io.c:1111 kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/tty_io.c:1134) 
[ 92.600918][ T1] do_iter_readv_writev (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/read_write.c:825) 
[ 92.601380][ T1] vfs_writev (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/read_write.c:1055) 
[ 92.601825][ T1] do_writev (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/read_write.c:1101) 
[ 92.602226][ T1] __ia32_sys_writev (kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/read_write.c:1169 kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/read_write.c:1166 kbuild/obj/consumer/x86_64-randconfig-003-20250608/fs/read_write.c:1166) 
[ 92.602662][ T1] ia32_sys_call (kbuild/obj/consumer/x86_64-randconfig-003-20250608/./arch/x86/include/generated/asm/syscalls_32.h:147) 
[ 92.603141][ T1] do_int80_emulation (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/entry/syscall_32.c:83 kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/entry/syscall_32.c:172) 
[ 92.603655][ T1] asm_int80_emulation (kbuild/obj/consumer/x86_64-randconfig-003-20250608/arch/x86/include/asm/idtentry.h:626) 
[   92.604247][    T1]
[   92.604247][    T1] other info that might help us debug this:
[   92.604247][    T1]
[   92.605052][    T1] Chain exists of:
[   92.605052][    T1]   &dev->power.lock --> console_owner --> &port_lock_key
[   92.605052][    T1]
[   92.606202][    T1]  Possible unsafe locking scenario:
[   92.606202][    T1]
[   92.606763][    T1]        CPU0                    CPU1
[   92.607130][    T1]        ----                    ----
[   92.607543][    T1]   lock(&port_lock_key);
[   92.607947][    T1]                                lock(console_owner);
[   92.608624][    T1]                                lock(&port_lock_key);
[   92.609191][    T1]   lock(&dev->power.lock);
[   92.609615][    T1]
[   92.609615][    T1]  *** DEADLOCK ***
[   92.609615][    T1]
[   92.610237][    T1] 5 locks held by systemd/1:
[ 92.610655][ T1] #0: ffff8881193a3bb8 (&tty->ldisc_sem){++++}-{0:0}, at: ldsem_down_read (kbuild/obj/consumer/x86_64-randconfig-003-20250608/drivers/tty/tty_ldsem.c:340) 


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250612/202506121115.b69b8c2-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects
Posted by Thomas Gleixner 6 months, 1 week ago
On Thu, Jun 12 2025 at 12:55, kernel test robot wrote:
> kernel test robot noticed
> "WARNING:possible_circular_locking_dependency_detected" on:

The lockdep issue comes from patch 2/3 which prints with the hash bucket
lock held.

> commit: 89fd87e046372ce70f8146357324cacd365369fe ("[PATCH v2 3/3] timers: Disable memory pre-allocation of timer debug objects")

This one triggers it because the preallocation disable makes debug
object go out of memory.

> [ 92.564990][ T1] _printk (kbuild/obj/consumer/x86_64-randconfig-003-20250608/kernel/printk/printk.c:2475) 
> [ 92.565340][ T1] lookup_object_or_alloc (kbuild/obj/consumer/x86_64-randconfig-003-20250608/lib/debugobjects.c:700) 

That's the:

       debug_objects_disable("out of memory");

introduced by patch 2/3. The printk() in debug_objects_disable() needs
to be deferred.

> [ 92.565838][ T1] __debug_object_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/lib/debugobjects.c:750) 
> [ 92.566274][ T1] debug_object_init (kbuild/obj/consumer/x86_64-randconfig-003-20250608/lib/debugobjects.c:785) 

Which means that the changelog saying:

  "So the chance that debug_objects gets disabled because it is running out
   of free debug_object should be minimal."

is slightly off ....

Thanks,

        tglx