A soft lockup warning was observed on a relatively small x86-64 system
with 16 GB of memory when running a debug kernel with kmemleak
enabled.
watchdog: BUG: soft lockup - CPU#8 stuck for 33s! [kworker/8:1:134]
The test system was running a workload with hot unplug happening
in parallel. Then kmemleak decided to disable itself due to its
inability to allocate more kmemleak objects. The debug kernel has its
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE set to 40,000.
The soft lockup happened in kmemleak_do_cleanup() when the existing
kmemleak objects were being removed and deleted one-by-one in a loop
via a workqueue. In this particular case, there were at least 40,000
objects that needed to be processed, and given the slowness of a debug
kernel and the fact that a raw_spinlock has to be acquired and released
in __delete_object(), it could take a while to properly handle all
these objects.
As kmemleak has been disabled in this case, the object removal and
deletion process can be further optimized as locking isn't really
needed. However, it is probably not worth the effort to optimize for
such an edge case that should rarely happen. So the simple solution is
to call cond_resched() at periodic intervals in the iteration loop to
avoid soft lockup.
Signed-off-by: Waiman Long <longman@redhat.com>
---
mm/kmemleak.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 8d588e685311..620abd95e680 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -2181,6 +2181,7 @@ static const struct file_operations kmemleak_fops = {
static void __kmemleak_do_cleanup(void)
{
struct kmemleak_object *object, *tmp;
+ unsigned int cnt = 0;
/*
* Kmemleak has already been disabled, no need for RCU list traversal
@@ -2189,6 +2190,10 @@ static void __kmemleak_do_cleanup(void)
list_for_each_entry_safe(object, tmp, &object_list, object_list) {
__remove_object(object);
__delete_object(object);
+
+ /* Call cond_resched() once per 64 iterations to avoid soft lockup */
+ if (!(++cnt & 0x3f))
+ cond_resched();
}
}
--
2.50.0
On Mon, Jul 28, 2025 at 03:02:48PM -0400, Waiman Long wrote:
> A soft lockup warning was observed on a relatively small x86-64 system
> with 16 GB of memory when running a debug kernel with kmemleak
> enabled.
[...]
> Signed-off-by: Waiman Long <longman@redhat.com>

I agree, it's not worth rewriting this path for an unlikely event. So
I'm fine with this approach. Thanks.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
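
For reference, below is a minimal, self-contained userspace sketch of the
same throttling pattern, not the actual kernel code: sched_yield() stands
in for cond_resched(), and process_item() is a hypothetical placeholder
for the per-object __remove_object()/__delete_object() work. Because 64
is a power of two, !(++cnt & 0x3f) fires exactly once every 64 iterations,
equivalent to (++cnt % 64) == 0 but without a division.

/* sketch only; build with: cc -O2 -o yield_demo yield_demo.c */
#include <sched.h>
#include <stdio.h>

#define YIELD_PERIOD 64		/* must be a power of two */

static void process_item(unsigned int i)
{
	/* hypothetical stand-in for the per-object cleanup work */
	(void)i;
}

int main(void)
{
	unsigned int cnt = 0;

	for (unsigned int i = 0; i < 40000; i++) {
		process_item(i);

		/* true once every YIELD_PERIOD iterations */
		if (!(++cnt & (YIELD_PERIOD - 1)))
			sched_yield();	/* userspace analogue of cond_resched() */
	}

	printf("processed %u items, yielded %u times\n",
	       cnt, cnt / YIELD_PERIOD);
	return 0;
}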