rv: Fix boot failure when kernel lockdown is active

[PATCH] rv: Fix boot failure when kernel lockdown is active

Posted by Xiu Jianfeng 4 months, 3 weeks ago

From: Xiu Jianfeng <xiujianfeng@huawei.com>

When booting the kernel with the lockdown=confidentiality parameter, the
system hangs in rv_register_reactor() while waiting for rv_interface_lock,
as shown in the following log:

INFO: task swapper/0:1 blocked for more than 122 seconds.
      Not tainted 6.17.0-rc6-next-20250915+ #29
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:swapper/0       state:D stack:0     pid:1     tgid:1     ppid:0
Call Trace:
 <TASK>
 __schedule+0x492/0x1600
 schedule+0x27/0xf0
 schedule_preempt_disabled+0x15/0x30
 __mutex_lock.constprop.0+0x538/0x9e0
 ? vprintk+0x18/0x50
 ? _printk+0x5f/0x90
 __mutex_lock_slowpath+0x13/0x20
 mutex_lock+0x3b/0x50
 rv_register_reactor+0x48/0xe0
 ? __pfx_register_react_printk+0x10/0x10
 register_react_printk+0x15/0x20
 do_one_initcall+0x5d/0x340
 kernel_init_freeable+0x351/0x540
 ? __pfx_kernel_init+0x10/0x10
 kernel_init+0x1b/0x200
 ? __pfx_kernel_init+0x10/0x10
 ret_from_fork+0x1fb/0x220
 ? __pfx_kernel_init+0x10/0x10
 ret_from_fork_asm+0x1a/0x30

The root cause is that, when kernel lockdown is in confidentiality
mode, rv_create_dir(), which is essentially tracefs_create_dir(),
returns NULL. This, in turn, causes create_monitor_dir() to return
-ENOMEM, and rv_register_monitor() then returns early without
unlocking rv_interface_lock, so the next mutex_lock() on it, in
rv_register_reactor(), blocks forever.

Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
---
 kernel/trace/rv/rv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
index 1482e91c39f4..e35565dd2dc5 100644
--- a/kernel/trace/rv/rv.c
+++ b/kernel/trace/rv/rv.c
@@ -805,7 +805,7 @@ int rv_register_monitor(struct rv_monitor *monitor, struct rv_monitor *parent)
 
 	retval = create_monitor_dir(monitor, parent);
 	if (retval)
-		return retval;
+		goto out_unlock;
 
 	/* keep children close to the parent for easier visualisation */
 	if (parent)
-- 
2.43.0
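The lock leak described in the commit message can be reproduced outside the kernel. The sketch below is a hypothetical user-space analogue, not the RV code: a POSIX mutex stands in for rv_interface_lock, and fake_create_monitor_dir(), register_buggy(), register_fixed() and lock_is_free() are made-up names. register_buggy() mirrors the removed `return retval;` line; register_fixed() mirrors the `goto out_unlock` from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for rv_interface_lock. */
static pthread_mutex_t interface_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical stand-in for create_monitor_dir(): fails with -ENOMEM
 * when directory creation is refused, as happens when
 * tracefs_create_dir() returns NULL under lockdown=confidentiality.
 */
static int fake_create_monitor_dir(int lockdown)
{
	return lockdown ? -ENOMEM : 0;
}

/* Buggy shape: the early return skips the unlock. */
static int register_buggy(int lockdown)
{
	int retval;

	pthread_mutex_lock(&interface_lock);
	retval = fake_create_monitor_dir(lockdown);
	if (retval)
		return retval;	/* interface_lock is still held here */
	/* ... rest of registration ... */
	pthread_mutex_unlock(&interface_lock);
	return 0;
}

/* Fixed shape: every exit path goes through out_unlock. */
static int register_fixed(int lockdown)
{
	int retval;

	pthread_mutex_lock(&interface_lock);
	retval = fake_create_monitor_dir(lockdown);
	if (retval)
		goto out_unlock;
	/* ... rest of registration ... */
out_unlock:
	pthread_mutex_unlock(&interface_lock);
	return retval;
}

/* Returns 1 if nobody holds the lock, i.e. a later lock would not block. */
static int lock_is_free(void)
{
	int rc = pthread_mutex_trylock(&interface_lock);

	assert(rc == 0 || rc == EBUSY);
	if (rc == 0)
		pthread_mutex_unlock(&interface_lock);
	return rc == 0;
}
```

After a failing register_fixed(1) the lock is free again, whereas a failing register_buggy(1) leaves it held, so any later caller that takes the lock (rv_register_reactor() in the trace above) waits forever.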

Re: [PATCH] rv: Fix boot failure when kernel lockdown is active

Posted by Gabriele Monaco 4 months, 3 weeks ago

On Wed, 2025-09-17 at 12:57 +0000, Xiu Jianfeng wrote:
> From: Xiu Jianfeng <xiujianfeng@huawei.com>
> 
> When booting the kernel with the lockdown=confidentiality parameter, the
> system hangs in rv_register_reactor() while waiting for rv_interface_lock,
> as shown in the following log:
> 

Thanks for finding this, the problem was already fixed in [1], which is
on its way to getting merged.

[1] https://lore.kernel.org/all/20250903065112.1878330-1-zhen.ni@easystack.cn

> INFO: task swapper/0:1 blocked for more than 122 seconds.
>       Not tainted 6.17.0-rc6-next-20250915+ #29
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:swapper/0       state:D stack:0     pid:1     tgid:1     ppid:0
> Call Trace:
>  <TASK>
>  __schedule+0x492/0x1600
>  schedule+0x27/0xf0
>  schedule_preempt_disabled+0x15/0x30
>  __mutex_lock.constprop.0+0x538/0x9e0
>  ? vprintk+0x18/0x50
>  ? _printk+0x5f/0x90
>  __mutex_lock_slowpath+0x13/0x20
>  mutex_lock+0x3b/0x50
>  rv_register_reactor+0x48/0xe0
>  ? __pfx_register_react_printk+0x10/0x10
>  register_react_printk+0x15/0x20
>  do_one_initcall+0x5d/0x340
>  kernel_init_freeable+0x351/0x540
>  ? __pfx_kernel_init+0x10/0x10
>  kernel_init+0x1b/0x200
>  ? __pfx_kernel_init+0x10/0x10
>  ret_from_fork+0x1fb/0x220
>  ? __pfx_kernel_init+0x10/0x10
>  ret_from_fork_asm+0x1a/0x30
> 
> The root cause is that, when kernel lockdown is in confidentiality
> mode, rv_create_dir(), which is essentially tracefs_create_dir(),
> returns NULL. This, in turn, causes create_monitor_dir() to return
> -ENOMEM, and rv_register_monitor() then returns early without
> unlocking rv_interface_lock, so the next mutex_lock() on it, in
> rv_register_reactor(), blocks forever.
> 
> Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor")
> Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
> ---
>  kernel/trace/rv/rv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
> index 1482e91c39f4..e35565dd2dc5 100644
> --- a/kernel/trace/rv/rv.c
> +++ b/kernel/trace/rv/rv.c
> @@ -805,7 +805,7 @@ int rv_register_monitor(struct rv_monitor *monitor, struct rv_monitor *parent)
>  
>  	retval = create_monitor_dir(monitor, parent);
>  	if (retval)
> -		return retval;
> +		goto out_unlock;
>  
>  	/* keep children close to the parent for easier visualisation */
>  	if (parent)

Re: [PATCH] rv: Fix boot failure when kernel lockdown is active

Posted by Nam Cao 4 months, 3 weeks ago

Gabriele Monaco <gmonaco@redhat.com> writes:
> On Wed, 2025-09-17 at 12:57 +0000, Xiu Jianfeng wrote:
>> From: Xiu Jianfeng <xiujianfeng@huawei.com>
>> 
>> When booting the kernel with the lockdown=confidentiality parameter, the
>> system hangs in rv_register_reactor() while waiting for rv_interface_lock,
>> as shown in the following log:
>> as shown in the following log,
>> 
>
> Thanks for finding this, the problem was already fixed in [1], which is
> on its way to getting merged.
>
> [1] https://lore.kernel.org/all/20250903065112.1878330-1-zhen.ni@easystack.cn

Yeah, but it is interesting that this is causing a real boot problem. I
thought that commit merely fixed a theoretical bug. I guess this is an
even stronger motivation to use lock guards.

Nam
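For readers unfamiliar with lock guards: in the kernel they come from include/linux/cleanup.h (for example `guard(mutex)(&lock)`), which builds on the compiler's cleanup variable attribute so the unlock runs automatically when the guard goes out of scope. The following is a minimal user-space sketch of the same idea, assuming GCC or Clang (`__attribute__((cleanup))` is a compiler extension). GUARD_MUTEX, unlock_guard(), fake_create_monitor_dir(), register_guarded() and lock_is_free() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t interface_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: receives a pointer to the guard variable itself. */
static void unlock_guard(pthread_mutex_t **lockp)
{
	pthread_mutex_unlock(*lockp);
}

/* Hypothetical macro: take the lock now, release it at scope exit. */
#define GUARD_MUTEX(m) \
	pthread_mutex_t *guard_ __attribute__((cleanup(unlock_guard))) = \
		(pthread_mutex_lock(m), (m))

/* Hypothetical stand-in for create_monitor_dir() failing under lockdown. */
static int fake_create_monitor_dir(int lockdown)
{
	return lockdown ? -ENOMEM : 0;
}

static int register_guarded(int lockdown)
{
	GUARD_MUTEX(&interface_lock);
	int retval = fake_create_monitor_dir(lockdown);

	if (retval)
		return retval;	/* unlock_guard() still runs on this path */
	/* ... rest of registration ... */
	return 0;
}

/* Returns 1 if nobody holds the lock. */
static int lock_is_free(void)
{
	int rc = pthread_mutex_trylock(&interface_lock);

	assert(rc == 0 || rc == EBUSY);
	if (rc == 0)
		pthread_mutex_unlock(&interface_lock);
	return rc == 0;
}
```

The early `return retval;` is the same line that caused the hang in the original patch, but here it needs no matching unlock: the compiler inserts the call to unlock_guard() on every exit from the scope.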

Re: [PATCH] rv: Fix boot failure when kernel lockdown is active

Posted by Gabriele Monaco 4 months, 3 weeks ago

On Wed, 2025-09-17 at 16:07 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@redhat.com> writes:
> > On Wed, 2025-09-17 at 12:57 +0000, Xiu Jianfeng wrote:
> > > From: Xiu Jianfeng <xiujianfeng@huawei.com>
> > > 
> > > When booting the kernel with the lockdown=confidentiality parameter,
> > > the system hangs in rv_register_reactor() while waiting for
> > > rv_interface_lock, as shown in the following log:
> > > 
> > 
> > Thanks for finding this, the problem was already fixed in [1],
> > which is on its way to getting merged.
> > 
> > [1] https://lore.kernel.org/all/20250903065112.1878330-1-zhen.ni@easystack.cn
> 
> Yeah, but it is interesting that this is causing a real boot problem. I
> thought that commit merely fixed a theoretical bug. I guess this is
> an even stronger motivation to use lock guards.

Yeah, totally. I have the feeling that with the kernel there's no such
thing as a "theoretical bug", kinda like a corollary of Murphy's Law.

But I agree, that's something we may want to do sooner rather than later.
I'm currently not refactoring the RV core, so I won't be touching that
for a while, but any patch is more than welcome!

Gabriele

Re: [PATCH] rv: Fix boot failure when kernel lockdown is active

Posted by Tomas Glozar 4 months, 3 weeks ago

On Thu, 18 Sep 2025 at 10:36, Gabriele Monaco <gmonaco@redhat.com> wrote:
>
> Yeah, totally. I have the feeling that with the kernel there's no such
> thing as a "theoretical bug", kinda like a corollary of Murphy's Law.
>

My understanding of a "theoretical bug" is code that is semantically
equivalent to bug-free code, but becomes buggy after an "innocent"
change. The bug is more or less "theoretical" depending on how
"innocent" that change is. Of course, in a codebase the size of the
Linux kernel, this tends to happen quite often, and it is not always
possible to get rid of completely...

Tomas

Re: [PATCH] rv: Fix boot failure when kernel lockdown is active

Posted by Gabriele Monaco 4 months, 3 weeks ago

On Thu, 2025-09-18 at 11:48 +0200, Tomas Glozar wrote:
> On Thu, 18 Sep 2025 at 10:36, Gabriele Monaco <gmonaco@redhat.com> wrote:
> > 
> > Yeah, totally. I have the feeling that with the kernel there's no such
> > thing as a "theoretical bug", kinda like a corollary of Murphy's Law.
> 
> My understanding of a "theoretical bug" is code that is semantically
> equivalent to bug-free code, but becomes buggy after an "innocent"
> change. The bug is more or less "theoretical" depending on how
> "innocent" that change is. Of course, in a codebase the size of the
> Linux kernel, this tends to happen quite often, and it is not always
> possible to get rid of completely...

Yeah, good point, we are getting philosophical here :). This wasn't a
theoretical bug then, just something you don't think will really happen (a
failure creating a tracefs directory)... until it happens.

The fact that there is a way to make that function fail on demand (kernel
lockdown) just makes it more "real". Moral of the story: better to let the
compiler check things for you (lock guards).

Anyway, the fix is now upstream.

Gabriele

Re: [PATCH] rv: Fix boot failure when kernel lockdown is active

Posted by Nam Cao 4 months, 3 weeks ago

Gabriele Monaco <gmonaco@redhat.com> writes:
> Moral of the story: better to let the compiler check things for you (lock
> guards).

I will make the patches once I'm done with some other things.

Nam