kernel/locking/rtmutex.c | 8 -------- 1 file changed, 8 deletions(-)
before rt_mutex_adjust_prio_chain(),unlock lock->wait_lock will cause
BUG_ON at kernel/locking/rtmutex.c:1331:
[147800.420240] Call trace:
[147800.420243] dump_backtrace+0x0/0x1cc
[147800.420260] show_stack+0x18/0x24
[147800.420267] dump_stack+0xcc/0x12c
[147800.420278] nmi_cpu_backtrace+0xbc/0xc8
[147800.420287] nmi_trigger_cpumask_backtrace+0xcc/0x1c0
[147800.420295] arch_trigger_cpumask_backtrace+0x28/0x40
[147800.420304] oops_enter+0x3c/0x48
[147800.420313] die+0x54/0x228
[147800.420320] bug_handler+0x44/0x6c
[147800.420327] call_break_hook+0x40/0x74
[147800.420336] brk_handler+0x1c/0x5c
[147800.420342] do_debug_exception+0x88/0xc4
[147800.420350] el1_dbg+0x38/0x54
[147800.420359] el1_sync_handler+0xac/0xb8
[147800.420365] el1_sync+0x88/0x140
[147800.420372] task_blocks_on_rt_mutex+0x94/0x1b4
[147800.420383] rt_spin_lock_slowlock_locked+0x90/0x1bc
[147800.420391] rt_spin_lock_slowlock+0x5c/0x94
[147800.420399] rt_spin_lock_fastlock.constprop.0+0x28/0x34
[147800.420408] rt_spin_lock+0x10/0x24
[147800.420415] local_lock_acquire+0x28/0xd0
[147800.420425] free_unref_page+0x94/0x114
[147800.420432] free_the_page+0x14/0x2c
[147800.420440] __free_pages+0x30/0x78
[147800.420447] __vunmap+0x188/0x1c8
[147800.420453] __vfree+0x4c/0x50
[147800.420460] vfree+0x30/0x40
[147800.420465] free_thread_stack+0xd0/0x120
[147800.420472] put_task_stack+0x60/0x6c
[147800.420479] __put_task_struct+0x4c/0xd4
[147800.420485] put_task_struct+0x44/0x78
[147800.420493] rt_mutex_adjust_prio_chain+0x1a0/0x370
[147800.420502] task_blocks_on_rt_mutex+0x188/0x1b4
[147800.420511] rt_mutex_slowlock_locked+0xb0/0x170
[147800.420519] rt_mutex_slowlock+0x7c/0xd4
[147800.420526] __rt_mutex_lock_state+0x3c/0x50
[147800.420534] _mutex_lock_blk_flush+0x5c/0x6c
[147800.420543] _mutex_lock+0x14/0x20
[147800.420550] ion_alloc+0x5f8/0x62c
[147800.420561] ion_ioctl+0x18c/0x694
[147800.420569] vfs_ioctl+0x28/0x48
[147800.420578] __arm64_sys_ioctl+0x78/0xcc
[147800.420586] el0_svc_common.constprop.0+0x148/0x1e8
[147800.420593] do_el0_svc+0x50/0x80
[147800.420599] el0_svc+0x14/0x20
[147800.420606] el0_sync_handler+0xcc/0x154
[147800.420612] el0_sync+0x180/0x1c0
A\B\C is task.
L1\L2 is lock.
adj: means rt_mutex_adjust_prio_chain()
key process:
1. A owns L1,and blocked on L2.
2. B blocked on L1,B execute mutex_lock or spinlock will adjust A's
priority by execute adj func.
3. before execute adj,it will unlock L1->wait_lock
4. If at this point,C release L2.A owns L2,and finish the whole thread
work very quickly,Finally the B thread exited.In this process,
unlock L1 will assign 0x1 to L1->owner,what orign value is A
task_struct.But in adj func,the parameter of task is still A's
pointer.becaues of A already exited,put_task_struct will release
task A.
5. If local page.lock is locked,it will cause a BUG_ON,becaues one
task A be blocked on two lock.
====A================B===============C================
| | |->owns L2
|->owns L1 | |
|->block on L2 |->lock L1.rawspin_wait_lock
| |->block on L1 |
| | |->unlock L2
| |->get A task_truct
|->owns L2 |->unlocked L1.rawspin_wait_lock
|->lock L1.rawspin_wait_lock
|->unlock L1 | |
| | |
|->unlock L1.rawspin_wait_lock
|->release L2 | |
|->A exit & not free
| |->put A task_struct
| | ↓
[5]
Signed-off-by: yuxin.ye <yeyuxin0925@gmail.com>
---
kernel/locking/rtmutex.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 7779ee8ab..52e9cebc3 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1175,13 +1175,9 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
*/
get_task_struct(owner);
- raw_spin_unlock_irq(&lock->wait_lock);
-
res = rt_mutex_adjust_prio_chain(owner, chwalk, lock,
next_lock, waiter, task);
- raw_spin_lock_irq(&lock->wait_lock);
-
return res;
}
@@ -1461,12 +1457,8 @@ static void __sched remove_waiter(struct rt_mutex_base *lock,
/* gets dropped in rt_mutex_adjust_prio_chain()! */
get_task_struct(owner);
- raw_spin_unlock_irq(&lock->wait_lock);
-
rt_mutex_adjust_prio_chain(owner, RT_MUTEX_MIN_CHAINWALK, lock,
next_lock, NULL, current);
-
- raw_spin_lock_irq(&lock->wait_lock);
}
/**
--
2.17.1
On 7/20/22 03:28, yuxin.ye wrote: > before rt_mutex_adjust_prio_chain(),unlock lock->wait_lock will cause > BUG_ON at kernel/locking/rtmutex.c:1331: The current upstream kernel/locking/rtmutex.c has no BUG_ON() call. Which version of the kernel are you using? Cheers, Longman
On Wed, Jul 20, 2022 at 10:25:17PM -0400, Waiman Long wrote: > On 7/20/22 03:28, yuxin.ye wrote: > > before rt_mutex_adjust_prio_chain(),unlock lock->wait_lock will cause > > BUG_ON at kernel/locking/rtmutex.c:1331: > > The current upstream kernel/locking/rtmutex.c has no BUG_ON() call. Which > version of the kernel are you using? > > Cheers, > Longman > The Linux version is 5.10. The upstream has indeed removed the BUG_ON, But in rt_mutex_adjust_prio_chain() it is still possible to have a thread is blocked by two locks. Can this situation be ignored without BUG_ON? Thanks.
On 7/21/22 03:17, yuxin.ye wrote: > On Wed, Jul 20, 2022 at 10:25:17PM -0400, Waiman Long wrote: >> On 7/20/22 03:28, yuxin.ye wrote: >>> before rt_mutex_adjust_prio_chain(),unlock lock->wait_lock will cause >>> BUG_ON at kernel/locking/rtmutex.c:1331: >> The current upstream kernel/locking/rtmutex.c has no BUG_ON() call. Which >> version of the kernel are you using? >> >> Cheers, >> Longman >> > The Linux version is 5.10. > The upstream has indeed removed the BUG_ON, But in rt_mutex_adjust_prio_chain() > it is still possible to have a thread is blocked by two locks. Can this situation > be ignored without BUG_ON? No. However, we don't remove the lock like what you do with your patch. It will corrupt the data if multiple CPUs are allowed to run rt_mutex_adjust_prio_chain() for the same rt_mutex simultaneously. You need to find a way to fix the underlying problem. BTW, I still can't see a BUG_ON at line 1331 of rtmutex.c with a v5.10 kernel. Does your source tree have some out-of-tree patches that modifies rtmutex? Cheers, Longman
On Thu, Jul 21, 2022 at 02:14:16PM -0400, Waiman Long wrote: > > On 7/21/22 03:17, yuxin.ye wrote: > > On Wed, Jul 20, 2022 at 10:25:17PM -0400, Waiman Long wrote: > > > On 7/20/22 03:28, yuxin.ye wrote: > > > > before rt_mutex_adjust_prio_chain(),unlock lock->wait_lock will cause > > > > BUG_ON at kernel/locking/rtmutex.c:1331: > > > The current upstream kernel/locking/rtmutex.c has no BUG_ON() call. Which > > > version of the kernel are you using? > > > > > > Cheers, > > > Longman > > > > > The Linux version is 5.10. > > The upstream has indeed removed the BUG_ON, But in rt_mutex_adjust_prio_chain() > > it is still possible to have a thread is blocked by two locks. Can this situation > > be ignored without BUG_ON? > > No. However, we don't remove the lock like what you do with your patch. It > will corrupt the data if multiple CPUs are allowed to run > rt_mutex_adjust_prio_chain() for the same rt_mutex simultaneously. You need > to find a way to fix the underlying problem. > > BTW, I still can't see a BUG_ON at line 1331 of rtmutex.c with a v5.10 > kernel. Does your source tree have some out-of-tree patches that modifies > rtmutex? > > Cheers, > Longman > Yes, I'm sorry I overlooked that earlier. We applied the RT patch,and the BUG_ON are also introduced by these patches. Back to the question, I think remove the wait_lock unlock before rt_mutex_adjust_prio_chain() is more likely to protect some data. The commont on task_blocks_on_rt_mutex() indicates that must be called with wait_lock held, but it unlock before call rt_mutex_adjust_prio_chain(). This may cause the owner thread to unlock the orig_lock and exit the thead. Finally, when calling put_task_struct(owner) in rt_mutex_adjust_prio_chain(), the thread is blocked by another lock that is deeply hidden. Actully, I'm not sure why rt_mutex_adjust_prio_chain() dosen't need wait_lock protection. Thanks again.
© 2016 - 2026 Red Hat, Inc.