rv: Add explicit lockdep context for reactors

[PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Thomas Weißschuh 3 months, 4 weeks ago

Reactors can be called from any context through tracepoints.
When developing reactors care needs to be taken to only call APIs which
are safe. As the tracepoints used during testing may not actually be
called from restrictive contexts lockdep may not be helpful.

Add explicit overrides to help lockdep find invalid code patterns.

The usage of LD_WAIT_FREE will trigger lockdep warnings in the panic
reactor. These are indeed valid warnings but they are out of scope for
RV and will instead be fixed by the printk subsystem.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
 kernel/trace/rv/rv_reactors.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/rv/rv_reactors.c b/kernel/trace/rv/rv_reactors.c
index 8c02426bc3bd944265f809e431283d1a20d56a8c..d9d335ae9badaa320f1d35dd159a033c3a30eb1a 100644
--- a/kernel/trace/rv/rv_reactors.c
+++ b/kernel/trace/rv/rv_reactors.c
@@ -61,6 +61,7 @@
  *      printk
  */
 
+#include <linux/lockdep.h>
 #include <linux/slab.h>
 
 #include "rv.h"
@@ -480,6 +481,7 @@ int init_rv_reactors(struct dentry *root_dir)
 
 void rv_react(struct rv_monitor *monitor, const char *msg, ...)
 {
+	static DEFINE_WAIT_OVERRIDE_MAP(rv_react_map, LD_WAIT_FREE);
 	va_list args;
 
 	if (!rv_reacting_on() || !monitor->react)
@@ -487,7 +489,9 @@ void rv_react(struct rv_monitor *monitor, const char *msg, ...)
 
 	va_start(args, msg);
 
+	lock_map_acquire_try(&rv_react_map);
 	monitor->react(msg, args);
+	lock_map_release(&rv_react_map);
 
 	va_end(args);
 }

-- 
2.51.0

Re: [PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Nam Cao 3 months, 4 weeks ago

Thomas Weißschuh <thomas.weissschuh@linutronix.de> writes:
> Reactors can be called from any context through tracepoints.
> When developing reactors care needs to be taken to only call APIs which
> are safe. As the tracepoints used during testing may not actually be
> called from restrictive contexts lockdep may not be helpful.
>
> Add explicit overrides to help lockdep find invalid code patterns.
>
> The usage of LD_WAIT_FREE will trigger lockdep warnings in the panic
> reactor. These are indeed valid warnings but they are out of scope for
> RV and will instead be fixed by the printk subsystem.
>
> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> ---
...
>  void rv_react(struct rv_monitor *monitor, const char *msg, ...)
>  {
> +	static DEFINE_WAIT_OVERRIDE_MAP(rv_react_map, LD_WAIT_FREE);
>  	va_list args;
>  
>  	if (!rv_reacting_on() || !monitor->react)
> @@ -487,7 +489,9 @@ void rv_react(struct rv_monitor *monitor, const char *msg, ...)
>  
>  	va_start(args, msg);
>  
> +	lock_map_acquire_try(&rv_react_map);
>  	monitor->react(msg, args);
> +	lock_map_release(&rv_react_map);
>  
>  	va_end(args);
>  }

The reactors are invoked in tracepoints' handlers, thus they must not
trigger another tracepoint, otherwise we may be stuck in an infinite loop.
(this is why preempt_enable_notrace() exists alongside preempt_enable()).

I'm not familiar with the internal lockdep. But I think these would
trigger trace_lock_acquire() and trace_lock_release().

Nam

Re: [PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Thomas Weißschuh 3 months, 4 weeks ago

On Tue, Oct 14, 2025 at 09:38:09AM +0200, Nam Cao wrote:
> Thomas Weißschuh <thomas.weissschuh@linutronix.de> writes:
> > Reactors can be called from any context through tracepoints.
> > When developing reactors care needs to be taken to only call APIs which
> > are safe. As the tracepoints used during testing may not actually be
> > called from restrictive contexts lockdep may not be helpful.
> >
> > Add explicit overrides to help lockdep find invalid code patterns.
> >
> > The usage of LD_WAIT_FREE will trigger lockdep warnings in the panic
> > reactor. These are indeed valid warnings but they are out of scope for
> > RV and will instead be fixed by the printk subsystem.
> >
> > Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> > ---
> ...
> >  void rv_react(struct rv_monitor *monitor, const char *msg, ...)
> >  {
> > +	static DEFINE_WAIT_OVERRIDE_MAP(rv_react_map, LD_WAIT_FREE);
> >  	va_list args;
> >  
> >  	if (!rv_reacting_on() || !monitor->react)
> > @@ -487,7 +489,9 @@ void rv_react(struct rv_monitor *monitor, const char *msg, ...)
> >  
> >  	va_start(args, msg);
> >  
> > +	lock_map_acquire_try(&rv_react_map);
> >  	monitor->react(msg, args);
> > +	lock_map_release(&rv_react_map);
> >  
> >  	va_end(args);
> >  }
> 
> The reactors are invoked in tracepoints' handlers, thus they must not
> trigger another tracepoint, otherwise we may be stuck in an infinite loop.
> (this is why preempt_enable_notrace() exists alongside preempt_enable()).

Sounds reasonable. However today not even the printk reactor satisfies this
rule as it transitively calls trace_console().

> I'm not familiar with the internal lockdep. But I think these would
> trigger trace_lock_acquire() and trace_lock_release().

Indeed. Right now no monitor attaches to those tracepoints. We could
prevent monitors from attaching to certain "well-known" tracepoints.
But then we still need to manually track which those are, which is ugly.
Or we move the invocation of the reactor to a workqueue/task_work.


Thomas

Re: [PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Gabriele Monaco 3 months, 4 weeks ago

On Tue, 2025-10-14 at 11:46 +0200, Thomas Weißschuh wrote:
> On Tue, Oct 14, 2025 at 09:38:09AM +0200, Nam Cao wrote:
> > The reactors are invoked in tracepoints' handlers, thus they must not
> > trigger another tracepoint, otherwise we may be stuck in an infinite loop.
> > (this is why preempt_enable_notrace() exists alongside preempt_enable()).
> 
> Sounds reasonable. However today not even the printk reactor satisfies this
> rule as it transitively calls trace_console().

That's a valid concern, I assume it would become a problem also if we wanted to
use locks inside event handlers, as it was discussed some time ago to better
handle concurrency.

> > I'm not familiar with the internal lockdep. But I think these would
> > trigger trace_lock_acquire() and trace_lock_release().
> 
> Indeed. Right now no monitor attaches to those tracepoints. We could
> prevent monitors from attaching to certain "well-known" tracepoints.
> But then we still need to manually track which those are, which is ugly.
> Or we move the invocation of the reactor to a workqueue/task_work.

I'm afraid also workqueues might open a rabbit-hole (waking up a task fights
with locks in many scheduling tracepoints).
At a quick glance task_works also do some IPI/wakeups that are traced.
If I get it correctly we are looking for something absolutely lock-free/trace-
free, I can't really think of much at the moment, maybe abusing RCU callbacks
but those would have their set of problems too.

As much as it might be interesting to write monitors on lockdep tracepoints,
this seems challenging.

We could opt for a foolproof Kconfig solution and prevent reactors if lockdep is
active (leaving only the error tracepoints that are hopefully still safe).

Gabriele

Re: [PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Thomas Weißschuh 3 months, 4 weeks ago

On Tue, Oct 14, 2025 at 12:22:06PM +0200, Gabriele Monaco wrote:
> On Tue, 2025-10-14 at 11:46 +0200, Thomas Weißschuh wrote:
> > On Tue, Oct 14, 2025 at 09:38:09AM +0200, Nam Cao wrote:
> > > The reactors are invoked in tracepoints' handlers, thus they must not
> > > trigger another tracepoint, otherwise we may be stuck in an infinite loop.
> > > (this is why preempt_enable_notrace() exists alongside preempt_enable()).

(...)

> > > I'm not familiar with the internal lockdep. But I think these would
> > > trigger trace_lock_acquire() and trace_lock_release().
> > 
> > Indeed. Right now no monitor attaches to those tracepoints. We could
> > prevent monitors from attaching to certain "well-known" tracepoints.
> > But then we still need to manually track which those are, which is ugly.
> > Or we move the invocation of the reactor to a workqueue/task_work.
> 
> I'm afraid also workqueues might open a rabbit-hole (waking up a task fights
> with locks in many scheduling tracepoints).
> At a quick glance task_works also do some IPI/wakeups that are traced.
> If I get it correctly we are looking for something absolutely lock-free/trace-
> free, I can't really think of much at the moment, maybe abusing RCU callbacks
> but those would have their set of problems too.

Agreed.

> As much as it might be interesting to write monitors on lockdep tracepoints,
> this seems challenging.

> We could opt for a foolproof Kconfig solution and prevent reactors if lockdep is
> active (leaving only the error tracepoints that are hopefully still safe).

I can't follow here. lockdep can indicate problems, but it should not introduce
problems on its own. So preventing the usage together with lockdep would be the
proverbial head in the sand. If the tracepoints called by lockdep are an issue
then we would just not call into lockdep in the first place. lockdep triggering
these tracepoints should not be an issue in practice. I don't see a bulletproof
way to prevent a tracepoint handler from calling another tracepoint, except
maybe extending lockdep to also track that.


Thomas

Re: [PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Gabriele Monaco 3 months, 4 weeks ago

On Tue, 2025-10-14 at 14:51 +0200, Thomas Weißschuh wrote:
> I can't follow here. lockdep can indicate problems, but it should not
> introduce
> problems on its own. So preventing the usage together with lockdep would be
> the
> proverbial head in the sand. If the tracepoints called by lockdep are an issue
> then we would just not call into lockdep in the first place. lockdep
> triggering
> these tracepoints should not be an issue in practice. I don't see a
> bulletproof
> way to prevent a tracepoint handler from calling another tracepoint, except
> maybe extending lockdep to also track that.

Forget about it, you're right. This leads to not using lockdep inside reactors
in the first place. We could even have notrace versions of the lockdep calls
(I'm not sure lockdep itself needs them), but that's getting horrid.

Leaving for a moment concurrency quirks aside, a monitor that is reacting should
be done for a while and can be marked as not monitoring before reacting, instead
of after.
Trace handlers triggered in the same tracepoints should, in principle, be able
to tell they are not supposed to run. This at least stands for DA monitors, but
the same idea could work on LTL as well.

Of course this gets more complicated in practice, but perhaps suspending
monitors during reaction can be enough to allow these lockdep calls without
risking infinite loops.

Thoughts?

Gabriele

Re: [PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Thomas Weißschuh 3 months, 4 weeks ago

On Tue, Oct 14, 2025 at 03:45:39PM +0200, Gabriele Monaco wrote:
> On Tue, 2025-10-14 at 14:51 +0200, Thomas Weißschuh wrote:
> > I can't follow here. lockdep can indicate problems, but it should not
> > introduce
> > problems on its own. So preventing the usage together with lockdep would be
> > the
> > proverbial head in the sand. If the tracepoints called by lockdep are an issue
> > then we would just not call into lockdep in the first place. lockdep
> > triggering
> > these tracepoints should not be an issue in practice. I don't see a
> > bulletproof
> > way to prevent a tracepoint handler from calling another tracepoint, except
> > maybe extending lockdep to also track that.
> 
> Forget about it, you're right. This leads to not using lockdep inside reactors
> in the first place. We could even have notrace versions of the lockdep calls
> (I'm not sure lockdep itself needs them), but that's getting horrid.

I still don't understand why the tracepoints called from lockdep are worse then
the ones called from the reactors themselves? Any solution should also apply to
those. Especially as even the simplest printk reactor runs into the same issue.

> Leaving for a moment concurrency quirks aside, a monitor that is reacting should
> be done for a while and can be marked as not monitoring before reacting, instead
> of after.
> Trace handlers triggered in the same tracepoints should, in principle, be able
> to tell they are not supposed to run. This at least stands for DA monitors, but
> the same idea could work on LTL as well.
> 
> Of course this gets more complicated in practice, but perhaps suspending
> monitors during reaction can be enough to allow these lockdep calls without
> risking infinite loops.

What would it mean to suspend a monitor? In my opinion we shouldn't sacrifice
the accuracy of the monitors or the reliability of the reactors while trying to
mitigate a theoretical problem.


Thomas

Re: [PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Gabriele Monaco 3 months, 4 weeks ago

On Tue, 2025-10-14 at 16:18 +0200, Thomas Weißschuh wrote:
> On Tue, Oct 14, 2025 at 03:45:39PM +0200, Gabriele Monaco wrote:
> > On Tue, 2025-10-14 at 14:51 +0200, Thomas Weißschuh wrote:
> > > I can't follow here. lockdep can indicate problems, but it should not
> > > introduce
> > > problems on its own. So preventing the usage together with lockdep would
> > > be
> > > the
> > > proverbial head in the sand. If the tracepoints called by lockdep are an
> > > issue
> > > then we would just not call into lockdep in the first place. lockdep
> > > triggering
> > > these tracepoints should not be an issue in practice. I don't see a
> > > bulletproof
> > > way to prevent a tracepoint handler from calling another tracepoint,
> > > except
> > > maybe extending lockdep to also track that.
> > 
> > Forget about it, you're right. This leads to not using lockdep inside
> > reactors
> > in the first place. We could even have notrace versions of the lockdep calls
> > (I'm not sure lockdep itself needs them), but that's getting horrid.
> 
> I still don't understand why the tracepoints called from lockdep are worse
> then
> the ones called from the reactors themselves? Any solution should also apply
> to
> those. Especially as even the simplest printk reactor runs into the same
> issue.

They aren't in fact, so yes, we already had this problem without knowing about
it.

> > Leaving for a moment concurrency quirks aside, a monitor that is reacting
> > should be done for a while and can be marked as not monitoring before
> > reacting, instead of after.
> > Trace handlers triggered in the same tracepoints should, in principle, be
> > able to tell they are not supposed to run. This at least stands for DA
> > monitors, but the same idea could work on LTL as well.
> > 
> > Of course this gets more complicated in practice, but perhaps suspending
> > monitors during reaction can be enough to allow these lockdep calls without
> > risking infinite loops.
> 
> What would it mean to suspend a monitor? In my opinion we shouldn't sacrifice
> the accuracy of the monitors or the reliability of the reactors while trying
> to mitigate a theoretical problem.

I don't mean to really sacrifice accuracy, DA monitors are disabled after a
reaction. This comes from the assumption that the model becomes invalid, so
whatever comes after might be meaningless. Monitors restart as soon as we are
sure we reached the initial state.
In this case, it already doesn't make sense to monitor events triggered by
reactors.

LTL is a bit more complex, so it might make sense to continue monitoring just
after a reaction, but I'm not sure how useful that is.

Re: [PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Gabriele Monaco 3 months, 4 weeks ago

On Tue, 2025-10-14 at 07:51 +0200, Thomas Weißschuh wrote:
> Reactors can be called from any context through tracepoints.
> When developing reactors care needs to be taken to only call APIs which
> are safe. As the tracepoints used during testing may not actually be
> called from restrictive contexts lockdep may not be helpful.
> 
> Add explicit overrides to help lockdep find invalid code patterns.
> 
> The usage of LD_WAIT_FREE will trigger lockdep warnings in the panic
> reactor. These are indeed valid warnings but they are out of scope for
> RV and will instead be fixed by the printk subsystem.

Looks like a nice addition!
If I get it correctly, this patch does trigger a lockdep warning with the
current state of the kernel. Is there a plan of fixing the warning in printk?
I assume this series would need to wait for that or did you have other ideas?

Thanks,
Gabriele

> 
> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
> ---
>  kernel/trace/rv/rv_reactors.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/trace/rv/rv_reactors.c b/kernel/trace/rv/rv_reactors.c
> index
> 8c02426bc3bd944265f809e431283d1a20d56a8c..d9d335ae9badaa320f1d35dd159a033c3a30
> eb1a 100644
> --- a/kernel/trace/rv/rv_reactors.c
> +++ b/kernel/trace/rv/rv_reactors.c
> @@ -61,6 +61,7 @@
>   *      printk
>   */
>  
> +#include <linux/lockdep.h>
>  #include <linux/slab.h>
>  
>  #include "rv.h"
> @@ -480,6 +481,7 @@ int init_rv_reactors(struct dentry *root_dir)
>  
>  void rv_react(struct rv_monitor *monitor, const char *msg, ...)
>  {
> +	static DEFINE_WAIT_OVERRIDE_MAP(rv_react_map, LD_WAIT_FREE);
>  	va_list args;
>  
>  	if (!rv_reacting_on() || !monitor->react)
> @@ -487,7 +489,9 @@ void rv_react(struct rv_monitor *monitor, const char *msg,
> ...)
>  
>  	va_start(args, msg);
>  
> +	lock_map_acquire_try(&rv_react_map);
>  	monitor->react(msg, args);
> +	lock_map_release(&rv_react_map);
>  
>  	va_end(args);
>  }

Re: [PATCH 3/3] rv: Add explicit lockdep context for reactors

Posted by Thomas Weißschuh 3 months, 4 weeks ago

On Tue, Oct 14, 2025 at 08:55:44AM +0200, Gabriele Monaco wrote:
> On Tue, 2025-10-14 at 07:51 +0200, Thomas Weißschuh wrote:
> > Reactors can be called from any context through tracepoints.
> > When developing reactors care needs to be taken to only call APIs which
> > are safe. As the tracepoints used during testing may not actually be
> > called from restrictive contexts lockdep may not be helpful.
> > 
> > Add explicit overrides to help lockdep find invalid code patterns.
> > 
> > The usage of LD_WAIT_FREE will trigger lockdep warnings in the panic
> > reactor. These are indeed valid warnings but they are out of scope for
> > RV and will instead be fixed by the printk subsystem.
> 
> Looks like a nice addition!

Thanks!

> If I get it correctly, this patch does trigger a lockdep warning with the
> current state of the kernel. Is there a plan of fixing the warning in printk?

This will be fixed as soon as the console drivers are converted to the
nonblocking APIs. And there are a few other lockdep warnings that will pop up
after that in other kernel subsystems which are currently hidden by the printk
one. None of which should be the responsibility of RV to fix.

> I assume this series would need to wait for that or did you have other ideas?

I think this series can go in as-is. Only the panic reactor is affected and the
panic will still go through anyways. *After* the panic will be a lockdep splat.

Thomas