[PATCH] locking/semaphore: Use wake_q to wake up processes outside lock critical section
Posted by Waiman Long 1 year ago
A circular lock dependency splat has been seen involving down_trylock().

[ 4011.795602] ======================================================
[ 4011.795603] WARNING: possible circular locking dependency detected
[ 4011.795607] 6.12.0-41.el10.s390x+debug
[ 4011.795612] ------------------------------------------------------
[ 4011.795613] dd/32479 is trying to acquire lock:
[ 4011.795617] 0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
[ 4011.795636]
[ 4011.795636] but task is already holding lock:
[ 4011.795637] 000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0

  the existing dependency chain (in reverse order) is:
  -> #4 (&zone->lock){-.-.}-{2:2}:
  -> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
  -> #2 (&rq->__lock){-.-.}-{2:2}:
  -> #1 (&p->pi_lock){-.-.}-{2:2}:
  -> #0 ((console_sem).lock){-.-.}-{2:2}:

The last two dependencies are due to calling try_to_wake_up() while
holding console_sem's raw_spinlock. Break this circular lock dependency
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the lock. By doing so, the semaphore's raw_spinlock becomes a
terminal lock that takes no further raw_spinlocks underneath it.
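
In pattern form, the fix looks like the sketch below (simplified; the
exact code is in the diff that follows). wake_q_add() takes a reference
on the task, so the wakeup can safely be issued after the lock has been
dropped:

	DEFINE_WAKE_Q(wake_q);

	raw_spin_lock_irqsave(&sem->lock, flags);
	/* was: wake_up_process(waiter->task), which takes p->pi_lock here */
	wake_q_add(&wake_q, waiter->task);	/* only queues the task */
	raw_spin_unlock_irqrestore(&sem->lock, flags);
	wake_up_q(&wake_q);	/* p->pi_lock etc. now taken after unlock */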

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/semaphore.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index cb27cbf5162f..751e3cbb899f 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -29,6 +29,7 @@
 #include <linux/export.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/wake_q.h>
 #include <linux/semaphore.h>
 #include <linux/spinlock.h>
 #include <linux/ftrace.h>
@@ -38,7 +39,7 @@ static noinline void __down(struct semaphore *sem);
 static noinline int __down_interruptible(struct semaphore *sem);
 static noinline int __down_killable(struct semaphore *sem);
 static noinline int __down_timeout(struct semaphore *sem, long timeout);
-static noinline void __up(struct semaphore *sem);
+static noinline void __up(struct semaphore *sem, struct wake_q_head *wake_q);
 
 /**
  * down - acquire the semaphore
@@ -185,13 +186,16 @@ EXPORT_SYMBOL(down_timeout);
 void __sched up(struct semaphore *sem)
 {
 	unsigned long flags;
+	DEFINE_WAKE_Q(wake_q);
 
 	raw_spin_lock_irqsave(&sem->lock, flags);
 	if (likely(list_empty(&sem->wait_list)))
 		sem->count++;
 	else
-		__up(sem);
+		__up(sem, &wake_q);
 	raw_spin_unlock_irqrestore(&sem->lock, flags);
+	if (!wake_q_empty(&wake_q))
+		wake_up_q(&wake_q);
 }
 EXPORT_SYMBOL(up);
 
@@ -271,11 +275,12 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long timeout)
 	return __down_common(sem, TASK_UNINTERRUPTIBLE, timeout);
 }
 
-static noinline void __sched __up(struct semaphore *sem)
+static noinline void __sched __up(struct semaphore *sem,
+				  struct wake_q_head *wake_q)
 {
 	struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
 						struct semaphore_waiter, list);
 	list_del(&waiter->list);
 	waiter->up = true;
-	wake_up_process(waiter->task);
+	wake_q_add(wake_q, waiter->task);
 }
-- 
2.47.1
Re: [PATCH] locking/semaphore: Use wake_q to wake up processes outside lock critical section
Posted by Peter Zijlstra 1 year ago
On Tue, Jan 21, 2025 at 08:13:14PM -0500, Waiman Long wrote:
> A circular lock dependency splat has been seen involving down_trylock().
> 
> [ 4011.795602] ======================================================
> [ 4011.795603] WARNING: possible circular locking dependency detected
> [ 4011.795607] 6.12.0-41.el10.s390x+debug
> [ 4011.795612] ------------------------------------------------------
> [ 4011.795613] dd/32479 is trying to acquire lock:
> [ 4011.795617] 0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
> [ 4011.795636]
> [ 4011.795636] but task is already holding lock:
> [ 4011.795637] 000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
> 
>   the existing dependency chain (in reverse order) is:
>   -> #4 (&zone->lock){-.-.}-{2:2}:
>   -> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
>   -> #2 (&rq->__lock){-.-.}-{2:2}:
>   -> #1 (&p->pi_lock){-.-.}-{2:2}:
>   -> #0 ((console_sem).lock){-.-.}-{2:2}:

The whole #3->#4 thing seems dodgy, where is that? Specifically
hrtimer_bases.lock is a raw_spinlock, while zone->lock is a spinlock,
this is not a valid nesting.
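
(For context: under PREEMPT_RT, spinlock_t is a sleeping lock while
raw_spinlock_t always spins, so the nesting being flagged is,
schematically:

	raw_spin_lock(&hrtimer_bases.lock);	/* raw_spinlock_t: never sleeps */
	...
	spin_lock(&zone->lock);			/* spinlock_t: may sleep on RT */

i.e. a potentially sleeping lock acquired inside a raw-spinlock
critical section.)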
Re: [PATCH] locking/semaphore: Use wake_q to wake up processes outside lock critical section
Posted by Waiman Long 1 year ago
On 1/22/25 5:39 AM, Peter Zijlstra wrote:
> On Tue, Jan 21, 2025 at 08:13:14PM -0500, Waiman Long wrote:
>> A circular lock dependency splat has been seen involving down_trylock().
>>
>> [ 4011.795602] ======================================================
>> [ 4011.795603] WARNING: possible circular locking dependency detected
>> [ 4011.795607] 6.12.0-41.el10.s390x+debug
>> [ 4011.795612] ------------------------------------------------------
>> [ 4011.795613] dd/32479 is trying to acquire lock:
>> [ 4011.795617] 0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90
>> [ 4011.795636]
>> [ 4011.795636] but task is already holding lock:
>> [ 4011.795637] 000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0
>>
>>    the existing dependency chain (in reverse order) is:
>>    -> #4 (&zone->lock){-.-.}-{2:2}:
>>    -> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
>>    -> #2 (&rq->__lock){-.-.}-{2:2}:
>>    -> #1 (&p->pi_lock){-.-.}-{2:2}:
>>    -> #0 ((console_sem).lock){-.-.}-{2:2}:
> The whole #3->#4 thing seems dodgy, where is that? Specifically
> hrtimer_bases.lock is a raw_spinlock, while zone->lock is a spinlock,
> this is not a valid nesting.

Ah, you are right. That is another raw_spinlock to spinlock nesting 
issue that needs to be addressed.

[ 4011.795646] -> #4 (&zone->lock){-.-.}-{2:2}:
[ 4011.795650]        __lock_acquire+0xe86/0x1cc0
[ 4011.795655]        lock_acquire.part.0+0x258/0x630
[ 4011.795657]        lock_acquire+0xb8/0xe0
[ 4011.795659]        _raw_spin_lock_irqsave+0xb4/0x120
[ 4011.795663]        rmqueue_bulk+0xac/0x8f0
[ 4011.795665]        __rmqueue_pcplist+0x580/0x830
[ 4011.795667]        rmqueue_pcplist+0xfc/0x470
[ 4011.795669]        rmqueue.isra.0+0xdec/0x11b0
[ 4011.795671]        get_page_from_freelist+0x2ee/0xeb0
[ 4011.795673]        __alloc_pages_noprof+0x2c2/0x520
[ 4011.795676]        alloc_pages_mpol_noprof+0x1fc/0x4d0
[ 4011.795681]        alloc_pages_noprof+0x8c/0xe0
[ 4011.795684]        allocate_slab+0x320/0x460
[ 4011.795686]        ___slab_alloc+0xa58/0x12b0
[ 4011.795688]        __slab_alloc.isra.0+0x42/0x60
[ 4011.795690]        kmem_cache_alloc_noprof+0x304/0x350
[ 4011.795692]        fill_pool+0xf6/0x450
[ 4011.795697]        debug_object_activate+0xfe/0x360
[ 4011.795700]        enqueue_hrtimer+0x34/0x190
[ 4011.795703]        __run_hrtimer+0x3c8/0x4c0
[ 4011.795705]        __hrtimer_run_queues+0x1b2/0x260
[ 4011.795707]        hrtimer_interrupt+0x316/0x760
[ 4011.795709]        do_IRQ+0x9a/0xe0
[ 4011.795712]        do_irq_async+0xf6/0x160

We probably need to look at the debug objects code to avoid doing
memory allocation in contexts like this. PROVE_RAW_LOCK_NESTING was not
enabled, so this case wasn't caught. Will enable that in the next minor
release to find more instances of such bugs.
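
(Assuming a tree where PROVE_RAW_LOCK_NESTING is still a separate
Kconfig option, that means building the debug kernel with something
like:

	CONFIG_PROVE_LOCKING=y
	CONFIG_PROVE_RAW_LOCK_NESTING=y

so that lockdep complains as soon as a spinlock_t is acquired under a
raw_spinlock_t, rather than only when a full circular chain happens to
be observed.)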

Anyway, do you think this patch is worth taking? There may be other
cases like this hiding somewhere and showing up from time to time.

Cheers, Longman