[RFC PATCH 6/9] sched: Treat try_to_block_task with pending signal as wakeup

Gabriele Monaco posted 9 patches 10 months, 1 week ago
There is a newer version of this series
[RFC PATCH 6/9] sched: Treat try_to_block_task with pending signal as wakeup
Posted by Gabriele Monaco 10 months, 1 week ago
If a task sets itself to interruptible and schedules, the __schedule
function checks whether there's a pending signal and, if that's the
case, updates the state of the task to runnable instead of dequeuing.
By looking at the tracepoints, we see the task enters the scheduler
while sleepable but exits as runnable. From a modelling perspective,
this is equivalent to a wakeup and the tracepoints should reflect that.

Add the waking/wakeup tracepoints in the try_to_block_task function and
set the prev_state used by the sched_switch tracepoint to TASK_RUNNING
if the task had a pending signal and was not blocked.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/sched/core.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f2f79236d5811..48cb32abce01a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6584,7 +6584,12 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
 	int flags = DEQUEUE_NOCLOCK;
 
 	if (signal_pending_state(task_state, p)) {
-		WRITE_ONCE(p->__state, TASK_RUNNING);
+		/*
+		 * From a modelling perspective, this is equivalent to a wakeup
+		 * before dequeuing the task: trace accordingly.
+		 */
+		trace_sched_waking(p);
+		ttwu_do_wakeup(p);
 		return false;
 	}
 
@@ -6721,7 +6726,9 @@ static void __sched notrace __schedule(int sched_mode)
 			goto picked;
 		}
 	} else if (!preempt && prev_state) {
-		try_to_block_task(rq, prev, prev_state);
+		/* Task was not blocked due to a signal and is again runnable */
+		if (!try_to_block_task(rq, prev, prev_state))
+			prev_state = TASK_RUNNING;
 		switch_count = &prev->nvcsw;
 	}
 
-- 
2.49.0
Re: [RFC PATCH 6/9] sched: Treat try_to_block_task with pending signal as wakeup
Posted by Nam Cao 10 months ago
On Fri, Apr 04, 2025 at 10:45:19AM +0200, Gabriele Monaco wrote:
> If a task sets itself to interruptible and schedules, the __schedule
> function checks whether there's a pending signal and, if that's the
> case, updates the state of the task to runnable instead of dequeuing.
> By looking at the tracepoints, we see the task enters the scheduler
> while sleepable but exits as runnable. From a modelling perspective,
> this is equivalent to a wakeup and the tracepoints should reflect that.
> 
> Add the waking/wakeup tracepoints in the try_to_block_task function and
> set the prev_state used by the sched_switch tracepoint to TASK_RUNNING
> if the task had a pending signal and was not blocked.
> 
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>  kernel/sched/core.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f2f79236d5811..48cb32abce01a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6584,7 +6584,12 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
>  	int flags = DEQUEUE_NOCLOCK;
>  
>  	if (signal_pending_state(task_state, p)) {
> -		WRITE_ONCE(p->__state, TASK_RUNNING);
> +		/*
> +		 * From a modelling perspective, this is equivalent to a wakeup
> +		 * before dequeuing the task: trace accordingly.
> +		 */
> +		trace_sched_waking(p);
> +		ttwu_do_wakeup(p);

I don't think we should put trace_sched_waking() here. trace_sched_waking()
"is guaranteed to be called from the waking context", and this is not the
waking context.

There is already a trace_sched_waking() in signal_wake_up_state(). This is
duplicating that, in the wrong context.

ttwu_do_wakeup() alone should be sufficient?

Best regards,
Nam
Re: [RFC PATCH 6/9] sched: Treat try_to_block_task with pending signal as wakeup
Posted by Gabriele Monaco 10 months ago

On Sun, 2025-04-13 at 17:05 +0200, Nam Cao wrote:
> On Fri, Apr 04, 2025 at 10:45:19AM +0200, Gabriele Monaco wrote:
> > If a task sets itself to interruptible and schedules, the
> > __schedule
> > function checks whether there's a pending signal and, if that's the
> > case, updates the state of the task to runnable instead of
> > dequeuing.
> > By looking at the tracepoints, we see the task enters the scheduler
> > while sleepable but exits as runnable. From a modelling
> > perspective,
> > this is equivalent to a wakeup and the tracepoints should reflect
> > that.
> > 
> > Add the waking/wakeup tracepoints in the try_to_block_task function
> > and
> > set the prev_state used by the sched_switch tracepoint to
> > TASK_RUNNING
> > if the task had a pending signal and was not blocked.
> > 
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > ---
> >  kernel/sched/core.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f2f79236d5811..48cb32abce01a 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6584,7 +6584,12 @@ static bool try_to_block_task(struct rq *rq,
> > struct task_struct *p,
> >  	int flags = DEQUEUE_NOCLOCK;
> >  
> >  	if (signal_pending_state(task_state, p)) {
> > -		WRITE_ONCE(p->__state, TASK_RUNNING);
> > +		/*
> > +		 * From a modelling perspective, this is
> > equivalent to a wakeup
> > +		 * before dequeuing the task: trace accordingly.
> > +		 */
> > +		trace_sched_waking(p);
> > +		ttwu_do_wakeup(p);
> 
> I don't think we should put trace_sched_waking() here.
> trace_sched_waking()
> "is guaranteed to be called from the waking context", and this is not
> the
> waking context.
> 
> There is already a trace_sched_waking() in signal_wake_up_state().
> This is
> duplicating that, in the wrong context.
> 
> ttwu_do_wakeup() alone should be sufficient?

Mmh, that's a good point.
The thing is: this happens when the signal is generated while we are
scheduling (on different CPUs), so we take a short-cut and put the task
to running directly.
This thing is already racy, so we may or may not see the waking/wakeup.

Now probably waking shouldn't be there for the reason you said, but I'm
not sure a wakeup not following a waking would be correct either.
I might be missing something here, though.

Thanks,
Gabriele
Re: [RFC PATCH 6/9] sched: Treat try_to_block_task with pending signal as wakeup
Posted by Nam Cao 9 months, 4 weeks ago
On Mon, Apr 14, 2025 at 12:31:12PM +0200, Gabriele Monaco wrote:
> Mmh, that's a good point.
> The thing is: this happens when the signal is generated while we are
> scheduling (on different CPUs), so we take a short-cut and put the task
> to running directly.
> This thing is already racy, so we may or may not see the waking/wakeup.
> 
> Now probably waking shouldn't be there for the reason you said, but I'm
> not sure a wakeup not following a waking would be correct either.
> I might be missing something here, though.

I'm not familiar with signal and sched, so I don't have anything more to
add, sorry.

I presume this is to make the srs monitor works? Perhaps it is possible to
modify the model so that this patch is not required? Let me stare at srs,
maybe I will have something..

Nam
Re: [RFC PATCH 6/9] sched: Treat try_to_block_task with pending signal as wakeup
Posted by Gabriele Monaco 9 months, 4 weeks ago
2025-04-15T11:05:06Z Nam Cao <namcao@linutronix.de>:

> On Mon, Apr 14, 2025 at 12:31:12PM +0200, Gabriele Monaco wrote:
>> Mmh, that's a good point.
>> The thing is: this happens when the signal is generated while we are
>> scheduling (on different CPUs), so we take a short-cut and put the task
>> to running directly.
>> This thing is already racy, so we may or may not see the waking/wakeup.
>>
>> Now probably waking shouldn't be there for the reason you said, but I'm
>> not sure a wakeup not following a waking would be correct either.
>> I might be missing something here, though.
>
> I'm not familiar with signal and sched, so I don't have anything more to
> add, sorry.
>

No worries, that was helpful already!

> I presume this is to make the srs monitor works? Perhaps it is possible to
> modify the model so that this patch is not required? Let me stare at srs,
> maybe I will have something..

Yeah, that's another option, we could have a special case just for that, but I wanted to avoid it.
But still, let me know if you got a good idea how to rearrange it!

Thanks,
Gabriele
Re: [RFC PATCH 6/9] sched: Treat try_to_block_task with pending signal as wakeup
Posted by Nam Cao 9 months, 4 weeks ago
On Fri, Apr 04, 2025 at 10:45:19AM +0200, Gabriele Monaco wrote:
> If a task sets itself to interruptible and schedules, the __schedule
> function checks whether there's a pending signal and, if that's the
> case, updates the state of the task to runnable instead of dequeuing.
> By looking at the tracepoints, we see the task enters the scheduler
> while sleepable but exits as runnable. From a modelling perspective,
> this is equivalent to a wakeup and the tracepoints should reflect that.
> 
> Add the waking/wakeup tracepoints in the try_to_block_task function and
> set the prev_state used by the sched_switch tracepoint to TASK_RUNNING
> if the task had a pending signal and was not blocked.
> 
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>  kernel/sched/core.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f2f79236d5811..48cb32abce01a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6584,7 +6584,12 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
>  	int flags = DEQUEUE_NOCLOCK;
>  
>  	if (signal_pending_state(task_state, p)) {
> -		WRITE_ONCE(p->__state, TASK_RUNNING);
> +		/*
> +		 * From a modelling perspective, this is equivalent to a wakeup
> +		 * before dequeuing the task: trace accordingly.
> +		 */
> +		trace_sched_waking(p);
> +		ttwu_do_wakeup(p);
>  		return false;
>  	}
>  
> @@ -6721,7 +6726,9 @@ static void __sched notrace __schedule(int sched_mode)
>  			goto picked;
>  		}
>  	} else if (!preempt && prev_state) {
> -		try_to_block_task(rq, prev, prev_state);
> +		/* Task was not blocked due to a signal and is again runnable */
> +		if (!try_to_block_task(rq, prev, prev_state))
> +			prev_state = TASK_RUNNING;
>  		switch_count = &prev->nvcsw;
>  	}

I couldn't reproduce the problem that this patch is solving. But staring at
the srs monitor, I made an educated guess that this is to accomodate the
transition "sleepable x wakeup -> running"?

But for this transition, no real wakeup happens, just the task's state is
changed to "sleepable" then back to "running", right? Sleep hasn't actually
happened yet?

If that is the case, would the patch below also solves it? It would turn
the transition into "sleepable x set_runnable -> running", which I think
describe it more accurately.

Best regards,
Nam

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9a14d792a681..e39b4aadeba2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6592,6 +6592,7 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
 	int flags = DEQUEUE_NOCLOCK;
 
 	if (signal_pending_state(task_state, p)) {
+		trace_sched_set_state_tp(p, TASK_RUNNING);
 		WRITE_ONCE(p->__state, TASK_RUNNING);
 		return false;
 	}
Re: [RFC PATCH 6/9] sched: Treat try_to_block_task with pending signal as wakeup
Posted by Gabriele Monaco 9 months, 4 weeks ago

On Wed, 2025-04-16 at 11:20 +0200, Nam Cao wrote:
> On Fri, Apr 04, 2025 at 10:45:19AM +0200, Gabriele Monaco wrote:
> > If a task sets itself to interruptible and schedules, the
> > __schedule
> > function checks whether there's a pending signal and, if that's the
> > case, updates the state of the task to runnable instead of
> > dequeuing.
> > By looking at the tracepoints, we see the task enters the scheduler
> > while sleepable but exits as runnable. From a modelling
> > perspective,
> > this is equivalent to a wakeup and the tracepoints should reflect
> > that.
> > 
> > Add the waking/wakeup tracepoints in the try_to_block_task function
> > and
> > set the prev_state used by the sched_switch tracepoint to
> > TASK_RUNNING
> > if the task had a pending signal and was not blocked.
> > 
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > ---
> >  kernel/sched/core.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f2f79236d5811..48cb32abce01a 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6584,7 +6584,12 @@ static bool try_to_block_task(struct rq *rq,
> > struct task_struct *p,
> >  	int flags = DEQUEUE_NOCLOCK;
> >  
> >  	if (signal_pending_state(task_state, p)) {
> > -		WRITE_ONCE(p->__state, TASK_RUNNING);
> > +		/*
> > +		 * From a modelling perspective, this is
> > equivalent to a wakeup
> > +		 * before dequeuing the task: trace accordingly.
> > +		 */
> > +		trace_sched_waking(p);
> > +		ttwu_do_wakeup(p);
> >  		return false;
> >  	}
> >  
> > @@ -6721,7 +6726,9 @@ static void __sched notrace __schedule(int
> > sched_mode)
> >  			goto picked;
> >  		}
> >  	} else if (!preempt && prev_state) {
> > -		try_to_block_task(rq, prev, prev_state);
> > +		/* Task was not blocked due to a signal and is
> > again runnable */
> > +		if (!try_to_block_task(rq, prev, prev_state))
> > +			prev_state = TASK_RUNNING;
> >  		switch_count = &prev->nvcsw;
> >  	}
> 
> I couldn't reproduce the problem that this patch is solving. But
> staring at
> the srs monitor, I made an educated guess that this is to accomodate
> the
> transition "sleepable x wakeup -> running"?
> 
> But for this transition, no real wakeup happens, just the task's
> state is
> changed to "sleepable" then back to "running", right? Sleep hasn't
> actually
> happened yet?
> 
> If that is the case, would the patch below also solves it? It would
> turn
> the transition into "sleepable x set_runnable -> running", which I
> think
> describe it more accurately.

Yeah that's pretty much it, there are a few problems though:
1. set_state should occur in task context and not while scheduling
2. set_state doesn't expect a task switch to occur

One way to solve this is to do like you said but add a flag to the
tracepoint to tell the model this set state is a special one happening
while scheduling, after that one, we may be scheduled out.

I didn't really like adding another state so I dropped that.

However, a task can be woken up before being scheduled out (I'd agree
with you it's not quite a wakeup as it wasn't yet sleeping, but it
happens, e.g. p == current in try_to_wake_up).
This case with the signal is, in that sense, a wakeup. We can even see
the tracepoint at times.

Anyway, that issue was mostly hypothetical, the patch also fixes the
prev_state (there's a patch by Peter on tip doing the same thing) and I
need to make sure it's really possible to see the issue after that too.

Thanks for looking into it,
Gabriele