From: Xiu Jianfeng <xiujianfeng@huawei.com>
When booting kernel with lockdown=confidentiality parameter, the system
will hang at rv_register_reactor() due to waiting for rv_interface_lock,
as shown in the following log,
INFO: task swapper/0:1 blocked for more than 122 seconds.
Not tainted 6.17.0-rc6-next-20250915+ #29
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:swapper/0 state:D stack:0 pid:1 tgid:1 ppid:0
Call Trace:
<TASK>
__schedule+0x492/0x1600
schedule+0x27/0xf0
schedule_preempt_disabled+0x15/0x30
__mutex_lock.constprop.0+0x538/0x9e0
? vprintk+0x18/0x50
? _printk+0x5f/0x90
__mutex_lock_slowpath+0x13/0x20
mutex_lock+0x3b/0x50
rv_register_reactor+0x48/0xe0
? __pfx_register_react_printk+0x10/0x10
register_react_printk+0x15/0x20
do_one_initcall+0x5d/0x340
kernel_init_freeable+0x351/0x540
? __pfx_kernel_init+0x10/0x10
kernel_init+0x1b/0x200
? __pfx_kernel_init+0x10/0x10
ret_from_fork+0x1fb/0x220
? __pfx_kernel_init+0x10/0x10
ret_from_fork_asm+0x1a/0x30
The root cause is that, when kernel lockdown is in confidentiality
mode, rv_create_dir(), which is essentially tracefs_create_dir(),
returns NULL. This in turn causes create_monitor_dir() to return
-ENOMEM, and rv_register_monitor() then returns without unlocking
rv_interface_lock.
Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
---
kernel/trace/rv/rv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
index 1482e91c39f4..e35565dd2dc5 100644
--- a/kernel/trace/rv/rv.c
+++ b/kernel/trace/rv/rv.c
@@ -805,7 +805,7 @@ int rv_register_monitor(struct rv_monitor *monitor, struct rv_monitor *parent)
retval = create_monitor_dir(monitor, parent);
if (retval)
- return retval;
+ goto out_unlock;
/* keep children close to the parent for easier visualisation */
if (parent)
--
2.43.0
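The one-line change above can be illustrated outside the kernel. What follows is only a minimal user-space sketch, assuming a pthread mutex as a stand-in for rv_interface_lock; create_dir(), register_buggy(), register_fixed() and lock_is_free() are hypothetical names made up for the illustration and are not the code in kernel/trace/rv/rv.c.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for rv_interface_lock (assumption: a plain pthread mutex). */
static pthread_mutex_t interface_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for create_monitor_dir(): fails on demand. */
static int create_dir(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Shape of the bug: the error path returns with the mutex still held. */
static int register_buggy(int fail)
{
	int retval;

	pthread_mutex_lock(&interface_lock);
	retval = create_dir(fail);
	if (retval)
		return retval;	/* lock is never released on this path */

	pthread_mutex_unlock(&interface_lock);
	return 0;
}

/* Shape of the fix: every exit goes through the unlock label. */
static int register_fixed(int fail)
{
	int retval;

	pthread_mutex_lock(&interface_lock);
	retval = create_dir(fail);
	if (retval)
		goto out_unlock;

	/* ... the rest of the registration would go here ... */

out_unlock:
	pthread_mutex_unlock(&interface_lock);
	return retval;
}

/* Returns 1 if the mutex can be taken right now, 0 if it is held. */
static int lock_is_free(void)
{
	if (pthread_mutex_trylock(&interface_lock) != 0)
		return 0;
	pthread_mutex_unlock(&interface_lock);
	return 1;
}
```

After register_fixed(1) fails, lock_is_free() still reports the mutex as available. After register_buggy(1) fails it does not, so the next caller that takes the lock blocks forever, which matches the hung rv_register_reactor() in the trace.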
On Wed, 2025-09-17 at 12:57 +0000, Xiu Jianfeng wrote:
> From: Xiu Jianfeng <xiujianfeng@huawei.com>
>
> When booting kernel with lockdown=confidentiality parameter, the
> system
> will hang at rv_register_reactor() due to waiting for
> rv_interface_lock,
> as shown in the following log,
>
Thanks for finding this, the problem was already fixed in [1], which is
on its way to getting merged.
[1] -
https://lore.kernel.org/all/20250903065112.1878330-1-zhen.ni@easystack.cn
> INFO: task swapper/0:1 blocked for more than 122 seconds.
> Not tainted 6.17.0-rc6-next-20250915+ #29
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> task:swapper/0 state:D stack:0 pid:1 tgid:1 ppid:0
> Call Trace:
> <TASK>
> __schedule+0x492/0x1600
> schedule+0x27/0xf0
> schedule_preempt_disabled+0x15/0x30
> __mutex_lock.constprop.0+0x538/0x9e0
> ? vprintk+0x18/0x50
> ? _printk+0x5f/0x90
> __mutex_lock_slowpath+0x13/0x20
> mutex_lock+0x3b/0x50
> rv_register_reactor+0x48/0xe0
> ? __pfx_register_react_printk+0x10/0x10
> register_react_printk+0x15/0x20
> do_one_initcall+0x5d/0x340
> kernel_init_freeable+0x351/0x540
> ? __pfx_kernel_init+0x10/0x10
> kernel_init+0x1b/0x200
> ? __pfx_kernel_init+0x10/0x10
> ret_from_fork+0x1fb/0x220
> ? __pfx_kernel_init+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
>
> The root cause is that, when kernel lockdown is in confidentiality
> mode, rv_create_dir(), which is essentially tracefs_create_dir(),
> returns NULL. This in turn causes create_monitor_dir() to return
> -ENOMEM, and rv_register_monitor() then returns without unlocking
> rv_interface_lock.
>
> Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct
> rv_monitor")
> Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
> ---
> kernel/trace/rv/rv.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
> index 1482e91c39f4..e35565dd2dc5 100644
> --- a/kernel/trace/rv/rv.c
> +++ b/kernel/trace/rv/rv.c
> @@ -805,7 +805,7 @@ int rv_register_monitor(struct rv_monitor
> *monitor, struct rv_monitor *parent)
>
> retval = create_monitor_dir(monitor, parent);
> if (retval)
> - return retval;
> + goto out_unlock;
>
> /* keep children close to the parent for easier
> visualisation */
> if (parent)
Gabriele Monaco <gmonaco@redhat.com> writes:

> On Wed, 2025-09-17 at 12:57 +0000, Xiu Jianfeng wrote:
>> From: Xiu Jianfeng <xiujianfeng@huawei.com>
>>
>> When booting kernel with lockdown=confidentiality parameter, the
>> system
>> will hang at rv_register_reactor() due to waiting for
>> rv_interface_lock,
>> as shown in the following log,
>>
>
> Thanks for finding this, the problem was already fixed in [1], which is
> on its way to getting merged.
>
> [1] -
> https://lore.kernel.org/all/20250903065112.1878330-1-zhen.ni@easystack.cn

Yeah, but it is interesting that this is causing real boot problem. I
thought that commit merely fixes a theoretical bug. I guess this is an
even stronger motivation to use lock guards.

Nam
On Wed, 2025-09-17 at 16:07 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@redhat.com> writes:
> > On Wed, 2025-09-17 at 12:57 +0000, Xiu Jianfeng wrote:
> > > From: Xiu Jianfeng <xiujianfeng@huawei.com>
> > >
> > > When booting kernel with lockdown=confidentiality parameter, the
> > > system will hang at rv_register_reactor() due to waiting for
> > > rv_interface_lock, as shown in the following log,
> > >
> >
> > Thanks for finding this, the problem was already fixed in [1],
> > which is on its way to getting merged.
> >
> > [1] -
> > https://lore.kernel.org/all/20250903065112.1878330-1-zhen.ni@easystack.cn
>
> Yeah, but it is interesting that this is causing real boot problem. I
> thought that commit merely fixes a theoretical bug. I guess this is
> an even stronger motivation to use lock guards.

Yeah totally, I have the feeling that with the kernel there's no such a
thing as a "theoretical bug", kinda like a good consequence of Murphy's
Law.

But I agree, that's something we may want to do sooner than later.
I'm currently not refactoring the RV core so I won't be touching that
for a while, but any patch is more than welcome!

Gabriele
On Thu, 18 Sep 2025 at 10:36, Gabriele Monaco <gmonaco@redhat.com> wrote:
>
> Yeah totally, I have the feeling that with the kernel there's no such a
> thing as a "theoretical bug", kinda like a good consequence of Murphy's
> Law.
>

My understanding of "theoretical bug" is that it's code that is
semantically equivalent to a bug-free code, but becomes buggy after
doing an "innocent" change. The bug might be more or less
"theoretical" based on how "innocent" that change is. Of course, in a
codebase of the size of a Linux kernel, this tends to happen quite
often, and is not always possible to get rid of completely...

Tomas
On Thu, 2025-09-18 at 11:48 +0200, Tomas Glozar wrote:
> On Thu, 18 Sep 2025 at 10:36, Gabriele Monaco <gmonaco@redhat.com> wrote:
> >
> > Yeah totally, I have the feeling that with the kernel there's no such a
> > thing as a "theoretical bug", kinda like a good consequence of Murphy's
> > Law.
>
> My understanding of "theoretical bug" is that it's code that is
> semantically equivalent to a bug-free code, but becomes buggy after
> doing an "innocent" change. The bug might be more or less
> "theoretical" based on how "innocent" that change is. Of course, in a
> codebase of the size of a Linux kernel, this tends to happen quite
> often, and is not always possible to get rid of completely...

Yeah good point, we are getting philosophical here :) .

This wasn't a theoretical bug then, just something you don't think will
really happen (a failure creating a sysfs directory) ... until it
happens. The fact there is a way to make that function fail on-demand
(kernel lockdown), makes it just more "real".

Moral of the story, better get the compiler check things for you (lock
guards).

Anyway the fix is now upstream.

Gabriele
Gabriele Monaco <gmonaco@redhat.com> writes:
> Moral of the story, better get the compiler check things for you (lock
> guards).

I will make the patches, once I'm done with some other things..

Nam
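For readers unfamiliar with the "lock guards" mentioned in this thread: the idea is to tie the unlock to the scope of a variable, so that an early return cannot skip it. The thread does not show what the eventual patches look like. The following is only a user-space sketch of the technique, assuming GCC or Clang (it relies on the `cleanup` variable attribute, a compiler extension) and a pthread mutex as a stand-in for rv_interface_lock. MUTEX_GUARD, mutex_guard_release(), create_dir(), register_guarded() and lock_is_free() are hypothetical names made up for the illustration.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for rv_interface_lock (assumption: a plain pthread mutex). */
static pthread_mutex_t interface_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs automatically when the guard variable goes out of scope. */
static void mutex_guard_release(pthread_mutex_t **lock)
{
	if (*lock)
		pthread_mutex_unlock(*lock);
}

/*
 * Hypothetical helper: lock now, unlock when `name` leaves scope.
 * Uses the GCC/Clang cleanup attribute, not standard C.
 */
#define MUTEX_GUARD(name, lockptr)					\
	pthread_mutex_t *name						\
	__attribute__((cleanup(mutex_guard_release))) =			\
		(pthread_mutex_lock(lockptr), (lockptr))

/* Hypothetical stand-in for create_monitor_dir(): fails on demand. */
static int create_dir(int fail)
{
	return fail ? -ENOMEM : 0;
}

static int register_guarded(int fail)
{
	int retval;
	MUTEX_GUARD(guard, &interface_lock);

	retval = create_dir(fail);
	if (retval)
		return retval;	/* the guard unlocks on this early return */

	/* ... the rest of the registration would go here ... */
	return 0;		/* and on the normal return as well */
}

/* Returns 1 if the mutex can be taken right now, 0 if it is held. */
static int lock_is_free(void)
{
	if (pthread_mutex_trylock(&interface_lock) != 0)
		return 0;
	pthread_mutex_unlock(&interface_lock);
	return 1;
}
```

With this shape, the plain `return retval;` that caused the hang would have been harmless, because the unlock no longer depends on each exit path reaching a label. In the kernel, the analogous facility is the scope-based cleanup support in <linux/cleanup.h> (for example guard(mutex)). That this is the helper meant in the thread is an assumption here.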