From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Lockdep reports the following possible deadlock:
============================================
WARNING: possible recursive locking detected
--------------------------------------------
sh/31444 is trying to acquire lock:
ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
stop_per_cpu_kthreads+0x7/0x60
but task is already holding lock:
ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
start_per_cpu_kthreads+0x28/0x140
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(cpu_hotplug_lock);
lock(cpu_hotplug_lock);
Call Trace:
<TASK>
__lock_acquire+0x1612/0x29b0
lock_acquire+0xd0/0x2e0
cpus_read_lock+0x49/0x120
stop_per_cpu_kthreads+0x7/0x60
start_kthread+0x105/0x120
start_per_cpu_kthreads+0xdd/0x140
osnoise_workload_start+0x261/0x2f0
osnoise_tracer_start+0x18/0x4
In start_kthread(), when kthread_run_on_cpu() fails, stop_per_cpu_kthreads()
is called while cpu_hotplug_lock is still read-held, so cpus_read_unlock()
should be called before stop_per_cpu_kthreads(). In addition, both
start_per_cpu_kthreads() and start_kthread() call the error handling
routine stop_per_cpu_kthreads(), which is redundant; only one call is
necessary.

To fix this, move stop_per_cpu_kthreads() out of start_kthread() and use
the return value of start_kthread() to detect kthread creation errors.

The same issue exists in osnoise_hotplug_workfn().
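For illustration, the recursive acquisition looks roughly like this (a
simplified sketch of the two paths, not the exact code):

	static int start_per_cpu_kthreads(void)
	{
		int cpu, retval;

		cpus_read_lock();			/* first cpus_read_lock() */
		for_each_cpu(cpu, current_mask) {
			retval = start_kthread(cpu);
			if (retval) {
				cpus_read_unlock();
				stop_per_cpu_kthreads();	/* second, redundant cleanup call */
				return retval;
			}
		}
		cpus_read_unlock();
		return 0;
	}

	static int start_kthread(unsigned int cpu)
	{
		...
		if (IS_ERR(kthread)) {
			/* cpu_hotplug_lock is still read-held by the caller here */
			stop_per_cpu_kthreads();	/* takes cpus_read_lock() again: AA deadlock */
			return -ENOMEM;
		}
		...
	}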
Reviewed-by: Yang Guang <yang.guang5@zte.com.cn>
Reviewed-by: Wang Yong <wang.yong12@zte.com.cn>
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
kernel/trace/trace_osnoise.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 92e16f03fa4e..38fb0c655f5b 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2029,7 +2029,6 @@ static int start_kthread(unsigned int cpu)
 
 	if (IS_ERR(kthread)) {
 		pr_err(BANNER "could not start sampling thread\n");
-		stop_per_cpu_kthreads();
 		return -ENOMEM;
 	}
 
@@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
 		return;
 
 	guard(mutex)(&interface_lock);
-	guard(cpus_read_lock)();
+	cpus_read_lock();
 
 	if (!cpu_online(cpu))
 		return;
 
@@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
 	if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
 		return;
 
-	start_kthread(cpu);
+	if (start_kthread(cpu)) {
+		cpus_read_unlock();
+		stop_per_cpu_kthreads();
+		return;
+	}
+	cpus_read_unlock();
 }
 
 static DECLARE_WORK(osnoise_hotplug_work, osnoise_hotplug_workfn);
--
2.15.2
Hi,
On Tue, Feb 25, 2025 at 12:31:32PM +0000, Ran Xiaokai wrote:
> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
> Lockdep reports the following possible deadlock:
> ============================================
> WARNING: possible recursive locking detected
> --------------------------------------------
> sh/31444 is trying to acquire lock:
> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
> stop_per_cpu_kthreads+0x7/0x60
>
> but task is already holding lock:
> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
> start_per_cpu_kthreads+0x28/0x140
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(cpu_hotplug_lock);
> lock(cpu_hotplug_lock);
>
> Call Trace:
> <TASK>
> __lock_acquire+0x1612/0x29b0
> lock_acquire+0xd0/0x2e0
> cpus_read_lock+0x49/0x120
> stop_per_cpu_kthreads+0x7/0x60
> start_kthread+0x105/0x120
> start_per_cpu_kthreads+0xdd/0x140
> osnoise_workload_start+0x261/0x2f0
> osnoise_tracer_start+0x18/0x4
>
> In start_kthread(), when kthread_run_on_cpu() fails, stop_per_cpu_kthreads()
> is called while cpu_hotplug_lock is still read-held, so cpus_read_unlock()
> should be called before stop_per_cpu_kthreads(). In addition, both
> start_per_cpu_kthreads() and start_kthread() call the error handling
> routine stop_per_cpu_kthreads(), which is redundant; only one call is
> necessary.
> To fix this, move stop_per_cpu_kthreads() out of start_kthread() and use
> the return value of start_kthread() to detect kthread creation errors.
> The same issue exists in osnoise_hotplug_workfn().
>
> Reviewed-by: Yang Guang <yang.guang5@zte.com.cn>
> Reviewed-by: Wang Yong <wang.yong12@zte.com.cn>
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> ---
> kernel/trace/trace_osnoise.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index 92e16f03fa4e..38fb0c655f5b 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -2029,7 +2029,6 @@ static int start_kthread(unsigned int cpu)
>
> if (IS_ERR(kthread)) {
> pr_err(BANNER "could not start sampling thread\n");
> - stop_per_cpu_kthreads();
> return -ENOMEM;
> }
>
> @@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
> return;
>
> guard(mutex)(&interface_lock);
> - guard(cpus_read_lock)();
> + cpus_read_lock();
>
> if (!cpu_online(cpu))
> return;
> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
> if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
> return;
>
> - start_kthread(cpu);
> + if (start_kthread(cpu)) {
> + cpus_read_unlock();
> + stop_per_cpu_kthreads();
Is it right to call stop_per_cpu_kthreads(), which stops the osnoise
kthread on every other CPU in the system, if a failure occurs during
hotplug of a single CPU?
On another note, since stop_per_cpu_kthreads() invokes stop_kthread()
for every online CPU, it's better to remove stop_per_cpu_kthreads() from
start_kthread() and handle the error in osnoise_hotplug_workfn().
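For reference, stop_per_cpu_kthreads() is roughly the following (a
simplified sketch, not the exact code):

	static void stop_per_cpu_kthreads(void)
	{
		int cpu;

		cpus_read_lock();	/* the nested read lock that lockdep flags */
		for_each_online_cpu(cpu)
			stop_kthread(cpu);
		cpus_read_unlock();
	}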
Vishal
> + return;
> + }
> + cpus_read_unlock();
> }
>
> static DECLARE_WORK(osnoise_hotplug_work, osnoise_hotplug_workfn);
> --
> 2.15.2
>
>On Tue, Feb 25, 2025 at 12:31:32PM +0000, Ran Xiaokai wrote:
>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>
>> Lockdep reports the following possible deadlock:
>> ============================================
>> WARNING: possible recursive locking detected
>> --------------------------------------------
>> sh/31444 is trying to acquire lock:
>> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
>> stop_per_cpu_kthreads+0x7/0x60
>>
>> but task is already holding lock:
>> ffffffff82c51af0 (cpu_hotplug_lock){++++}-{0:0}, at:
>> start_per_cpu_kthreads+0x28/0x140
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock(cpu_hotplug_lock);
>> lock(cpu_hotplug_lock);
>>
>> Call Trace:
>> <TASK>
>> __lock_acquire+0x1612/0x29b0
>> lock_acquire+0xd0/0x2e0
>> cpus_read_lock+0x49/0x120
>> stop_per_cpu_kthreads+0x7/0x60
>> start_kthread+0x105/0x120
>> start_per_cpu_kthreads+0xdd/0x140
>> osnoise_workload_start+0x261/0x2f0
>> osnoise_tracer_start+0x18/0x4
>>
>> In start_kthread(), when kthread_run_on_cpu() fails, stop_per_cpu_kthreads()
>> is called while cpu_hotplug_lock is still read-held, so cpus_read_unlock()
>> should be called before stop_per_cpu_kthreads(). In addition, both
>> start_per_cpu_kthreads() and start_kthread() call the error handling
>> routine stop_per_cpu_kthreads(), which is redundant; only one call is
>> necessary.
>> To fix this, move stop_per_cpu_kthreads() out of start_kthread() and use
>> the return value of start_kthread() to detect kthread creation errors.
>> The same issue exists in osnoise_hotplug_workfn().
>>
>> Reviewed-by: Yang Guang <yang.guang5@zte.com.cn>
>> Reviewed-by: Wang Yong <wang.yong12@zte.com.cn>
>> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>> ---
>> kernel/trace/trace_osnoise.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> index 92e16f03fa4e..38fb0c655f5b 100644
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -2029,7 +2029,6 @@ static int start_kthread(unsigned int cpu)
>>
>> if (IS_ERR(kthread)) {
>> pr_err(BANNER "could not start sampling thread\n");
>> - stop_per_cpu_kthreads();
>> return -ENOMEM;
>> }
>>
>> @@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
>> return;
>>
>> guard(mutex)(&interface_lock);
>> - guard(cpus_read_lock)();
>> + cpus_read_lock();
>>
>> if (!cpu_online(cpu))
>> return;
>> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
>> if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
>> return;
>>
>> - start_kthread(cpu);
>> + if (start_kthread(cpu)) {
>> + cpus_read_unlock();
>> + stop_per_cpu_kthreads();
>
>Is it right to call stop_per_cpu_kthreads(), which stops the osnoise
>kthread on every other CPU in the system, if a failure occurs during
>hotplug of a single CPU?
I also suspect that this is not reasonable behavior.
>On another note, since stop_per_cpu_kthreads() invokes stop_kthread()
>for every online CPU, it's better to remove stop_per_cpu_kthreads() from
>start_kthread() and handle the error in osnoise_hotplug_workfn().
Hi Vishal,
I did this in my first version, something like this:
@@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
 		return;
 
 	guard(mutex)(&interface_lock);
-	guard(cpus_read_lock)();
+	cpus_read_lock();
 
-	if (!cpu_online(cpu))
+	if (!cpu_online(cpu)) {
+		cpus_read_unlock();
 		return;
+	}
 
-	if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
+	if (!cpumask_test_cpu(cpu, &osnoise_cpumask)) {
+		cpus_read_unlock();
 		return;
+	}
 
-	start_kthread(cpu);
+	if (start_kthread(cpu)) {
+		cpus_read_unlock();
+		stop_per_cpu_kthreads();
+		return;
+	}
+	cpus_read_unlock();
 }
We have to drop the guard() and call cpus_read_unlock() manually on every
exit path, which makes the code somewhat redundant.
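For comparison, scoped_guard() from <linux/cleanup.h> could keep the
automatic unlock while still ending the critical section before the
cleanup call; an untested sketch:

	int err = 0;

	scoped_guard(cpus_read_lock) {
		if (!cpu_online(cpu))
			return;

		if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
			return;

		err = start_kthread(cpu);
	}
	/* cpus_read_lock is dropped here, before the cleanup */
	if (err)
		stop_per_cpu_kthreads();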
> Vishal
>> + return;
>> + }
>> + cpus_read_unlock();
>> }
>>
>> static DECLARE_WORK(osnoise_hotplug_work, osnoise_hotplug_workfn);
>> --
>> 2.15.2
>>
On Tue, 25 Feb 2025 12:31:32 +0000
Ran Xiaokai <ranxiaokai627@163.com> wrote:
> @@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
> return;
>
> guard(mutex)(&interface_lock);
> - guard(cpus_read_lock)();
> + cpus_read_lock();
>
> if (!cpu_online(cpu))
> return;
This is buggy. You removed the guard, and right below we have an error exit
that will leave this function without unlocking the cpus_read_lock().
> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
> if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
> return;
>
> - start_kthread(cpu);
> + if (start_kthread(cpu)) {
> + cpus_read_unlock();
> + stop_per_cpu_kthreads();
> + return;
If all you want to do is to unlock before calling stop_per_cpu_kthreads(),
then this should simply be:
	if (start_kthread(cpu)) {
		cpus_read_unlock();
		stop_per_cpu_kthreads();
		cpus_read_lock(); // The guard() above will unlock this
		return;
	}
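This works because guard(cpus_read_lock)() takes the read lock at the
point of declaration and arranges for cpus_read_unlock() to run
automatically on every exit from the function, roughly:

	{
		guard(cpus_read_lock)();	/* cpus_read_lock() runs here */
		...
		return;				/* cpus_read_unlock() runs automatically */
	}

so re-taking the lock after the manual unlock keeps the guard's implicit
unlock balanced.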
But I still have to verify that this is indeed the issue here.
-- Steve
> + }
> + cpus_read_unlock();
> }
>
> static DECLARE_WORK(osnoise_hotplug_work, osnoise_hotplug_workfn);
>On Tue, 25 Feb 2025 12:31:32 +0000
>Ran Xiaokai <ranxiaokai627@163.com> wrote:
>
>> @@ -2097,7 +2096,7 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
>> return;
>>
>> guard(mutex)(&interface_lock);
>> - guard(cpus_read_lock)();
>> + cpus_read_lock();
>>
>> if (!cpu_online(cpu))
>> return;
>
>This is buggy. You removed the guard, and right below we have an error exit
>that will leave this function without unlocking the cpus_read_lock().
Indeed.
I will run the LTP cpu-hotplug testcases before the next version.
>> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
>> if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
>> return;
>>
>> - start_kthread(cpu);
>> + if (start_kthread(cpu)) {
>> + cpus_read_unlock();
>> + stop_per_cpu_kthreads();
>> + return;
>
>If all you want to do is to unlock before calling stop_per_cpu_kthreads(),
>then this should simply be:
>
> 	if (start_kthread(cpu)) {
> 		cpus_read_unlock();
> 		stop_per_cpu_kthreads();
> 		cpus_read_lock(); // The guard() above will unlock this
> 		return;
> 	}
This is the deadlock scenario:

start_per_cpu_kthreads()
	cpus_read_lock();		// first lock call
	start_kthread(cpu)
		... kthread_run_on_cpu() fails:
		if (IS_ERR(kthread)) {
			stop_per_cpu_kthreads() {
				cpus_read_lock();	// second lock call, causing the AA deadlock
			}
		}
	stop_per_cpu_kthreads();	// the redundant cleanup call in the caller's error path

Besides, stop_per_cpu_kthreads() is called both in start_per_cpu_kthreads()
and start_kthread(), which is unnecessary.
So the fix should be inside start_kthread()?
How about this?
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2029,7 +2029,9 @@ static int start_kthread(unsigned int cpu)
 
 	if (IS_ERR(kthread)) {
 		pr_err(BANNER "could not start sampling thread\n");
+		cpus_read_unlock();
 		stop_per_cpu_kthreads();
+		cpus_read_lock();
 		return -ENOMEM;
 	}
 
@@ -2076,7 +2078,6 @@ static int start_per_cpu_kthreads(void)
 		retval = start_kthread(cpu);
 		if (retval) {
 			cpus_read_unlock();
-			stop_per_cpu_kthreads();
 			return retval;
 		}
 	}
>
>But I still have to verify that this is indeed the issue here.
>
>-- Steve
>
>
>> + }
>> + cpus_read_unlock();
>> }
>>
>> static DECLARE_WORK(osnoise_hotplug_work, osnoise_hotplug_workfn);
On Wed, 26 Feb 2025 03:42:53 +0000
Ran Xiaokai <ranxiaokai627@163.com> wrote:
> >> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
> >> if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
> >> return;
> >>
> >> - start_kthread(cpu);
> >> + if (start_kthread(cpu)) {
> >> + cpus_read_unlock();
> >> + stop_per_cpu_kthreads();
> >> + return;
> >
> >If all you want to do is to unlock before calling stop_per_cpu_kthreads(),
> >then this should simply be:
> >
> > 	if (start_kthread(cpu)) {
> > 		cpus_read_unlock();
> > 		stop_per_cpu_kthreads();
> > 		cpus_read_lock(); // The guard() above will unlock this
> > 		return;
> > 	}
>
> This is the deadlock scenario:
>
> start_per_cpu_kthreads()
> 	cpus_read_lock();		// first lock call
> 	start_kthread(cpu)
> 		... kthread_run_on_cpu() fails:
> 		if (IS_ERR(kthread)) {
> 			stop_per_cpu_kthreads() {
> 				cpus_read_lock();	// second lock call, causing the AA deadlock
> 			}
> 		}
> 	stop_per_cpu_kthreads();	// the redundant cleanup call in the caller's error path
>
> Besides, stop_per_cpu_kthreads() is called both in start_per_cpu_kthreads()
> and start_kthread(), which is unnecessary.
>
> So the fix should be inside start_kthread()?
> How about this?
No! You misunderstood what I wrote above.
Instead of removing the guard, keep it!
Do everything the same, but instead of having the error path of:
[..]
	if (start_kthread(cpu)) {
		cpus_read_unlock();
		stop_per_cpu_kthreads();
		return;
	}
	cpus_read_unlock();
}
Which requires removing the guard. Just do:
	if (start_kthread(cpu)) {
		cpus_read_unlock();
		stop_per_cpu_kthreads();
		cpus_read_lock(); // The guard() will unlock this
	}
}
I'm just saying not to replace the guard with open-coded locking of
cpus_read_lock().
-- Steve
>On Wed, 26 Feb 2025 03:42:53 +0000
>Ran Xiaokai <ranxiaokai627@163.com> wrote:
>
>> >> @@ -2105,7 +2104,12 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
>> >> if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
>> >> return;
>> >>
>> >> - start_kthread(cpu);
>> >> + if (start_kthread(cpu)) {
>> >> + cpus_read_unlock();
>> >> + stop_per_cpu_kthreads();
>> >> + return;
>> >
>> >If all you want to do is to unlock before calling stop_per_cpu_kthreads(),
>> >then this should simply be:
>> >
>> > 	if (start_kthread(cpu)) {
>> > 		cpus_read_unlock();
>> > 		stop_per_cpu_kthreads();
>> > 		cpus_read_lock(); // The guard() above will unlock this
>> > 		return;
>> > 	}
>>
>> This is the deadlock scenario:
>>
>> start_per_cpu_kthreads()
>> 	cpus_read_lock();		// first lock call
>> 	start_kthread(cpu)
>> 		... kthread_run_on_cpu() fails:
>> 		if (IS_ERR(kthread)) {
>> 			stop_per_cpu_kthreads() {
>> 				cpus_read_lock();	// second lock call, causing the AA deadlock
>> 			}
>> 		}
>> 	stop_per_cpu_kthreads();	// the redundant cleanup call in the caller's error path
>>
>> Besides, stop_per_cpu_kthreads() is called both in start_per_cpu_kthreads()
>> and start_kthread(), which is unnecessary.
>>
>> So the fix should be inside start_kthread()?
>> How about this?
>
>No! You misunderstood what I wrote above.
>
>Instead of removing the guard, keep it!
>
>Do everything the same, but instead of having the error path of:
>
>[..]
> 	if (start_kthread(cpu)) {
> 		cpus_read_unlock();
> 		stop_per_cpu_kthreads();
> 		return;
> 	}
> 	cpus_read_unlock();
> }
>
>Which requires removing the guard. Just do:
>
> 	if (start_kthread(cpu)) {
> 		cpus_read_unlock();
> 		stop_per_cpu_kthreads();
> 		cpus_read_lock(); // The guard() will unlock this
> 	}
> }
Hi Steve,
Sorry for the late response.
Yes, this will fix the deadlock issue.
What I mentioned before is that there is a redundant call of
stop_per_cpu_kthreads() in start_per_cpu_kthreads():
start_per_cpu_kthreads()
	for_each_cpu(cpu, current_mask) {
		retval = start_kthread(cpu);
		{ /* inside start_kthread() */
			if (IS_ERR(kthread)) {
				stop_per_cpu_kthreads(); // first cleanup call of stop_per_cpu_kthreads()
				return -ENOMEM;
			}
		}
		if (retval) {
			cpus_read_unlock();
			stop_per_cpu_kthreads(); // the redundant call of stop_per_cpu_kthreads()
But the second call will not cause any trouble, so I will send a v2
according to your suggestion.
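With that, the hotplug callback in v2 should look roughly like this (a
sketch; the checks above the guards are elided):

	static void osnoise_hotplug_workfn(struct work_struct *dummy)
	{
		unsigned int cpu = smp_processor_id();

		/* ... earlier checks elided ... */

		guard(mutex)(&interface_lock);
		guard(cpus_read_lock)();

		if (!cpu_online(cpu))
			return;

		if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
			return;

		if (start_kthread(cpu)) {
			cpus_read_unlock();
			stop_per_cpu_kthreads();
			cpus_read_lock();	/* the guard() above does the final unlock */
		}
	}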
>I'm just saying not to replace the guard with open-coded locking of
>cpus_read_lock().
>
>-- Steve