[v3] rv: Add monitors to validate task switch

[PATCH v3 11/17] rv: Retry when da monitor detects race conditions

Posted by Gabriele Monaco 6 months, 4 weeks ago

DA monitor can be accessed from multiple cores simultaneously, this is
likely, for instance when dealing with per-task monitors reacting on
events that do not always occur on the CPU where the task is running.
This can cause race conditions where two events change the next state
and we see inconsistent values. E.g.:

  [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
  [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
  [63] error_srs: 27: event sched_switch_suspend not expected in the state running

In this case the monitor fails because the event on CPU 62 wins against
the one on CPU 63, although the correct state should have been
sleepable, since the task get suspended.

Detect if the current state was modified by using try_cmpxchg while
storing the next value. If it was, try again reading the current state.
After a maximum number of failed retries, react as if it was an error
with invalid current state (we cannot determine it).

Remove the functions da_monitor_curr_state() and da_monitor_set_state()
as they only hide the underlying implementation in this case.

Monitors where this type of condition can occur must be able to account
for racing events in any possible order, as we cannot know the winner.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/linux/rv.h      |  3 +-
 include/rv/da_monitor.h | 91 ++++++++++++++++++++++-------------------
 2 files changed, 51 insertions(+), 43 deletions(-)

diff --git a/include/linux/rv.h b/include/linux/rv.h
index 97baf58d88b28..0250a04f4524c 100644
--- a/include/linux/rv.h
+++ b/include/linux/rv.h
@@ -7,7 +7,8 @@
 #ifndef _LINUX_RV_H
 #define _LINUX_RV_H
 
-#define MAX_DA_NAME_LEN	32
+#define MAX_DA_NAME_LEN			32
+#define MAX_DA_RETRY_RACING_EVENTS	3
 
 #ifdef CONFIG_RV
 #include <linux/bitops.h>
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index ed3c34fe18d61..bd3096a3181f8 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -54,23 +54,6 @@ static inline void da_monitor_reset_##name(struct da_monitor *da_mon)				\
 	da_mon->curr_state = model_get_initial_state_##name();					\
 }												\
 												\
-/*												\
- * da_monitor_curr_state_##name - return the current state					\
- */												\
-static inline type da_monitor_curr_state_##name(struct da_monitor *da_mon)			\
-{												\
-	return da_mon->curr_state;								\
-}												\
-												\
-/*												\
- * da_monitor_set_state_##name - set the new current state					\
- */												\
-static inline void										\
-da_monitor_set_state_##name(struct da_monitor *da_mon, enum states_##name state)		\
-{												\
-	da_mon->curr_state = state;								\
-}												\
-												\
 /*												\
  * da_monitor_start_##name - start monitoring							\
  *												\
@@ -127,57 +110,72 @@ static inline bool da_monitor_handling_event_##name(struct da_monitor *da_mon)
  * Event handler for implicit monitors. Implicit monitor is the one which the
  * handler does not need to specify which da_monitor to manipulate. Examples
  * of implicit monitor are the per_cpu or the global ones.
+ *
+ * Retry, in case there is a race while getting and setting the next state
+ * return an invalid current state if we run out of retries. The monitor should
+ * be able to handle various orders.
  */
 #define DECLARE_DA_MON_MODEL_HANDLER_IMPLICIT(name, type)					\
 												\
 static inline bool										\
 da_event_##name(struct da_monitor *da_mon, enum events_##name event)				\
 {												\
-	type curr_state = da_monitor_curr_state_##name(da_mon);					\
-	type next_state = model_get_next_state_##name(curr_state, event);			\
-												\
-	if (next_state != INVALID_STATE) {							\
-		da_monitor_set_state_##name(da_mon, next_state);				\
-												\
-		trace_event_##name(model_get_state_name_##name(curr_state),			\
-				   model_get_event_name_##name(event),				\
-				   model_get_state_name_##name(next_state),			\
-				   model_is_final_state_##name(next_state));			\
-												\
-		return true;									\
+	enum states_##name curr_state, next_state;						\
+												\
+	curr_state = READ_ONCE(da_mon->curr_state);						\
+	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {					\
+		next_state = model_get_next_state_##name(curr_state, event);			\
+		if (next_state == INVALID_STATE)						\
+			goto out_react;								\
+		if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, next_state)))		\
+			goto out_success;							\
 	}											\
+	/* Special invalid transition if we run out of retries. */				\
+	curr_state = INVALID_STATE;								\
 												\
+out_react:											\
 	cond_react_##name(curr_state, event);							\
 												\
 	trace_error_##name(model_get_state_name_##name(curr_state),				\
 			   model_get_event_name_##name(event));					\
 												\
 	return false;										\
+												\
+out_success:											\
+	trace_event_##name(model_get_state_name_##name(curr_state),				\
+			   model_get_event_name_##name(event),					\
+			   model_get_state_name_##name(next_state),				\
+			   model_is_final_state_##name(next_state));				\
+												\
+	return true;										\
 }												\
 
 /*
  * Event handler for per_task monitors.
+ *
+ * Retry, in case there is a race while getting and setting the next state
+ * return an invalid current state if we run out of retries. The monitor should
+ * be able to handle various orders.
  */
 #define DECLARE_DA_MON_MODEL_HANDLER_PER_TASK(name, type)					\
 												\
 static inline bool da_event_##name(struct da_monitor *da_mon, struct task_struct *tsk,		\
 				   enum events_##name event)					\
 {												\
-	type curr_state = da_monitor_curr_state_##name(da_mon);					\
-	type next_state = model_get_next_state_##name(curr_state, event);			\
-												\
-	if (next_state != INVALID_STATE) {							\
-		da_monitor_set_state_##name(da_mon, next_state);				\
-												\
-		trace_event_##name(tsk->pid,							\
-				   model_get_state_name_##name(curr_state),			\
-				   model_get_event_name_##name(event),				\
-				   model_get_state_name_##name(next_state),			\
-				   model_is_final_state_##name(next_state));			\
-												\
-		return true;									\
+	enum states_##name curr_state, next_state;						\
+												\
+	curr_state = READ_ONCE(da_mon->curr_state);						\
+	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {					\
+		next_state = model_get_next_state_##name(curr_state, event);			\
+		if (next_state == INVALID_STATE)						\
+			goto out_react;								\
+		if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, next_state)))		\
+			goto out_success;							\
 	}											\
+	/* Special invalid transition if we run out of retries. */				\
+	curr_state = INVALID_STATE;								\
 												\
+out_react:											\
 	cond_react_##name(curr_state, event);							\
 												\
 	trace_error_##name(tsk->pid,								\
@@ -185,6 +183,15 @@ static inline bool da_event_##name(struct da_monitor *da_mon, struct task_struct
 			   model_get_event_name_##name(event));					\
 												\
 	return false;										\
+												\
+out_success:											\
+	trace_event_##name(tsk->pid,								\
+			   model_get_state_name_##name(curr_state),				\
+			   model_get_event_name_##name(event),					\
+			   model_get_state_name_##name(next_state),				\
+			   model_is_final_state_##name(next_state));				\
+												\
+	return true;										\
 }
 
 /*
-- 
2.50.1

Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions

Posted by Nam Cao 6 months, 4 weeks ago

On Tue, Jul 15, 2025 at 09:14:28AM +0200, Gabriele Monaco wrote:
>  static inline bool										\
>  da_event_##name(struct da_monitor *da_mon, enum events_##name event)				\
>  {												\
> -	type curr_state = da_monitor_curr_state_##name(da_mon);					\
> -	type next_state = model_get_next_state_##name(curr_state, event);			\
> -												\
> -	if (next_state != INVALID_STATE) {							\
> -		da_monitor_set_state_##name(da_mon, next_state);				\
> -												\
> -		trace_event_##name(model_get_state_name_##name(curr_state),			\
> -				   model_get_event_name_##name(event),				\
> -				   model_get_state_name_##name(next_state),			\
> -				   model_is_final_state_##name(next_state));			\
> -												\
> -		return true;									\
> +	enum states_##name curr_state, next_state;						\
> +												\
> +	curr_state = READ_ONCE(da_mon->curr_state);						\
> +	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {					\
> +		next_state = model_get_next_state_##name(curr_state, event);			\
> +		if (next_state == INVALID_STATE)						\
> +			goto out_react;								\
> +		if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, next_state)))		\
> +			goto out_success;							\
>  	}											\
> +	/* Special invalid transition if we run out of retries. */				\
> +	curr_state = INVALID_STATE;								\
>  												\
> +out_react:											\
>  	cond_react_##name(curr_state, event);							\
>  												\
>  	trace_error_##name(model_get_state_name_##name(curr_state),				\
>  			   model_get_event_name_##name(event));					\

If I understand correctly, if after 3 tries and we still fail to change the
state, we will invoke the reactor and trace_error? Doesn't that cause a
false positive? Because it is not a violation of the model, it is just a
race making us fail to change the state.

Same below.

Also, I wouldn't use goto unless necessary. Perhaps it is better to put the
code at "out_react:" and "out_success:" into the loop. But that's just my
personal preference, up to you.

Nam

Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions

Posted by Gabriele Monaco 6 months, 3 weeks ago

On Tue, 2025-07-15 at 17:23 +0200, Nam Cao wrote:
> On Tue, Jul 15, 2025 at 09:14:28AM +0200, Gabriele Monaco wrote:
> >  static inline
> > bool										\
> >  da_event_##name(struct da_monitor *da_mon, enum events_##name
> > event)				\
> >  {								
> > 				\
> > -	type curr_state =
> > da_monitor_curr_state_##name(da_mon);					\
> > -	type next_state = model_get_next_state_##name(curr_state,
> > event);			\
> > -
> > 												\
> > -	if (next_state != INVALID_STATE)
> > {							\
> > -		da_monitor_set_state_##name(da_mon,
> > next_state);				\
> > -
> > 												\
> > -
> > 		trace_event_##name(model_get_state_name_##name(curr_state),			\
> > -				  
> > model_get_event_name_##name(event),				\
> > -				  
> > model_get_state_name_##name(next_state),			\
> > -				  
> > model_is_final_state_##name(next_state));			\
> > -
> > 												\
> > -		return
> > true;									\
> > +	enum states_##name curr_state,
> > next_state;						\
> > +								
> > 				\
> > +	curr_state = READ_ONCE(da_mon-
> > >curr_state);						\
> > +	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++)
> > {					\
> > +		next_state =
> > model_get_next_state_##name(curr_state, event);			\
> > +		if (next_state ==
> > INVALID_STATE)						\
> > +			goto
> > out_react;								\
> > +		if (likely(try_cmpxchg(&da_mon->curr_state,
> > &curr_state, next_state)))		\
> > +			goto
> > out_success;							\
> >  	}							
> > 				\
> > +	/* Special invalid transition if we run out of retries.
> > */				\
> > +	curr_state =
> > INVALID_STATE;								\
> >  								
> > 				\
> > +out_react:							
> > 				\
> >  	cond_react_##name(curr_state,
> > event);							\
> >  								
> > 				\
> >  	trace_error_##name(model_get_state_name_##name(curr_state)
> > ,				\
> >  			  
> > model_get_event_name_##name(event));					\
> 
> If I understand correctly, if after 3 tries and we still fail to
> change the
> state, we will invoke the reactor and trace_error? Doesn't that cause
> a
> false positive? Because it is not a violation of the model, it is
> just a
> race making us fail to change the state.
> 

Yes, that's correct.
My rationale was that, at that point, the monitor is likely no longer
in sync, so silently ignoring the situation is not really an option.
In this case, the reaction includes an invalid current state (because
in fact we don't know what the current state is) and tools may be able
to understand that. I know you wouldn't be able to do that in LTL..
By the way, LTL uses multiple statuses, so this lockless approach may
not really work.

I don't see this situation happening often: I only ever observed 2
events able to race, 4 happening at the same time is wild, but of
course cannot be excluded in principle for any possible monitor.
Yet, I have the feeling a monitor where this can happen is not well
designed and RV should point that out.
Do you have ideas of potential monitors where more than 3 events can
race?

Perhaps a full blown reaction is a bit aggressive in this situation, as
the /fault/ may not be necessarily in the monitor.
We could think of a special tracepoint or just printing.

> Same below.
> 
> Also, I wouldn't use goto unless necessary. Perhaps it is better to
> put the
> code at "out_react:" and "out_success:" into the loop. But that's
> just my
> personal preference, up to you.

That could be done if we do a whole different thing when retries run
out, instead of defaulting to out_react.
I liked to avoid excessive indentation with those goto as well but
yeah, it may not be quite necessary.

I'll have a deeper thought on this.

Thanks,
Gabriele

Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions

Posted by Nam Cao 6 months, 3 weeks ago

On Wed, Jul 16, 2025 at 10:20:39AM +0200, Gabriele Monaco wrote:
> On Tue, 2025-07-15 at 17:23 +0200, Nam Cao wrote:
> > If I understand correctly, if after 3 tries and we still fail to
> > change the
> > state, we will invoke the reactor and trace_error? Doesn't that cause
> > a
> > false positive? Because it is not a violation of the model, it is
> > just a
> > race making us fail to change the state.
> > 
> 
> Yes, that's correct.
> My rationale was that, at that point, the monitor is likely no longer
> in sync, so silently ignoring the situation is not really an option.
> In this case, the reaction includes an invalid current state (because
> in fact we don't know what the current state is) and tools may be able
> to understand that.

Can't you bring the monitor back to the init state, and start over again?

I think "da_mon->monitoring = 0;" does the trick?

> I know you wouldn't be able to do that in LTL..  By the way, LTL uses
> multiple statuses, so this lockless approach may not really work.

Let's worry about one thing at a time ;)

> I don't see this situation happening often: I only ever observed 2
> events able to race, 4 happening at the same time is wild, but of
> course cannot be excluded in principle for any possible monitor.
> Yet, I have the feeling a monitor where this can happen is not well
> designed and RV should point that out.
> Do you have ideas of potential monitors where more than 3 events can
> race?
> 
> Perhaps a full blown reaction is a bit aggressive in this situation, as
> the /fault/ may not be necessarily in the monitor.
> We could think of a special tracepoint or just printing.
> 
> > Same below.
> > 
> > Also, I wouldn't use goto unless necessary. Perhaps it is better to
> > put the
> > code at "out_react:" and "out_success:" into the loop. But that's
> > just my
> > personal preference, up to you.
> 
> That could be done if we do a whole different thing when retries run
> out, instead of defaulting to out_react.
> I liked to avoid excessive indentation with those goto as well but
> yeah, it may not be quite necessary.

Sure, as I said before, "just my personal preference, up to you."

Nam

Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions

Posted by Gabriele Monaco 6 months, 3 weeks ago


On Wed, 2025-07-16 at 10:27 +0200, Nam Cao wrote:
> On Wed, Jul 16, 2025 at 10:20:39AM +0200, Gabriele Monaco wrote:
> > On Tue, 2025-07-15 at 17:23 +0200, Nam Cao wrote:
> > > If I understand correctly, if after 3 tries and we still fail to
> > > change the
> > > state, we will invoke the reactor and trace_error? Doesn't that
> > > cause
> > > a
> > > false positive? Because it is not a violation of the model, it is
> > > just a
> > > race making us fail to change the state.
> > > 
> > 
> > Yes, that's correct.
> > My rationale was that, at that point, the monitor is likely no
> > longer
> > in sync, so silently ignoring the situation is not really an
> > option.
> > In this case, the reaction includes an invalid current state
> > (because
> > in fact we don't know what the current state is) and tools may be
> > able
> > to understand that.
> 
> Can't you bring the monitor back to the init state, and start over
> again?
> 
> I think "da_mon->monitoring = 0;" does the trick?
> 

Yes you can, but I wouldn't do so silently.
I'd say the cleanest approach without reaction is to still return false
for the system to do all the cleanup but trace the event or, at the
very least, print a warning.

But you're right, this is more relevant for who develops the monitor
rather than for the user, so should probably be tracked separately.

Thanks,
Gabriele

> > I know you wouldn't be able to do that in LTL..  By the way, LTL
> > uses
> > multiple statuses, so this lockless approach may not really work.
> 
> Let's worry about one thing at a time ;)
> 
> > I don't see this situation happening often: I only ever observed 2
> > events able to race, 4 happening at the same time is wild, but of
> > course cannot be excluded in principle for any possible monitor.
> > Yet, I have the feeling a monitor where this can happen is not well
> > designed and RV should point that out.
> > Do you have ideas of potential monitors where more than 3 events
> > can
> > race?
> > 
> > Perhaps a full blown reaction is a bit aggressive in this
> > situation, as
> > the /fault/ may not be necessarily in the monitor.
> > We could think of a special tracepoint or just printing.
> > 
> > > Same below.
> > > 
> > > Also, I wouldn't use goto unless necessary. Perhaps it is better
> > > to
> > > put the
> > > code at "out_react:" and "out_success:" into the loop. But that's
> > > just my
> > > personal preference, up to you.
> > 
> > That could be done if we do a whole different thing when retries
> > run
> > out, instead of defaulting to out_react.
> > I liked to avoid excessive indentation with those goto as well but
> > yeah, it may not be quite necessary.
> 
> Sure, as I said before, "just my personal preference, up to you."
> 
> Nam

Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions

Posted by Nam Cao 6 months, 3 weeks ago

On Wed, Jul 16, 2025 at 10:38:11AM +0200, Gabriele Monaco wrote:
> On Wed, 2025-07-16 at 10:27 +0200, Nam Cao wrote:
> > Can't you bring the monitor back to the init state, and start over
> > again?
> > 
> > I think "da_mon->monitoring = 0;" does the trick?
> > 
> 
> Yes you can, but I wouldn't do so silently.

Why not? The absolute worst that we get, is the rare case where a bug
appears at the exact same time. In that case, we would get a false
negative.

And I think that is really really rare.

> I'd say the cleanest approach without reaction is to still return false
> for the system to do all the cleanup but trace the event or, at the
> very least, print a warning.
> 
> But you're right, this is more relevant for who develops the monitor
> rather than for the user, so should probably be tracked separately.

Yes, if you really want to emit some sort of warning here, it should be
absolutely clear that the monitor itself is having a hiccup, not the
monitored kernel.

Nam

Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions

Posted by Gabriele Monaco 6 months, 3 weeks ago

On Wed, 2025-07-16 at 10:45 +0200, Nam Cao wrote:
> On Wed, Jul 16, 2025 at 10:38:11AM +0200, Gabriele Monaco wrote:
> > Yes you can, but I wouldn't do so silently.
> 
> Why not? The absolute worst that we get, is the rare case where a bug
> appears at the exact same time. In that case, we would get a false
> negative.
> 
> And I think that is really really rare.

Well, we wouldn't even know how rare it is because we don't track it!

It may be a harmless note but might also be a design flaw in the
monitor, so it should be possible to understand when it happens.

> Yes, if you really want to emit some sort of warning here, it should
> be
> absolutely clear that the monitor itself is having a hiccup, not the
> monitored kernel.

Yeah totally, not saying that other errors reported in the wild are
going to be necessarily the kernel's fault though ;)
But let's keep these cases separate going forward nonetheless.

Thanks,
Gabriele

Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions

Posted by Nam Cao 6 months, 3 weeks ago

On Wed, Jul 16, 2025 at 10:59:44AM +0200, Gabriele Monaco wrote:
> On Wed, 2025-07-16 at 10:45 +0200, Nam Cao wrote:
> > On Wed, Jul 16, 2025 at 10:38:11AM +0200, Gabriele Monaco wrote:
> > > Yes you can, but I wouldn't do so silently.
> > 
> > Why not? The absolute worst that we get, is the rare case where a bug
> > appears at the exact same time. In that case, we would get a false
> > negative.
> > 
> > And I think that is really really rare.
> 
> Well, we wouldn't even know how rare it is because we don't track it!
> 
> It may be a harmless note but might also be a design flaw in the
> monitor, so it should be possible to understand when it happens.

Oh, so you just want to track when this happens, as a RV developer. That
makes sense.

Just make sure it won't confuse other people who use the monitors to test
the kernel.

Best regards,
Nam