[PATCH] sched/tracing: correct the task blocking state

alexs@kernel.org posted 1 patch 1 year, 11 months ago
include/linux/sched.h   | 4 ++++
kernel/sched/deadline.c | 2 +-
kernel/sched/fair.c     | 2 +-
kernel/sched/rt.c       | 2 +-
4 files changed, 7 insertions(+), 3 deletions(-)
[PATCH] sched/tracing: correct the task blocking state
Posted by alexs@kernel.org 1 year, 11 months ago
From: Alex Shi <alexs@kernel.org>

commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
stopped the idle kthreads contribution to loadavg. Also task idle should
separated from blocked state too, otherwise we will get incorrect task
blocking state from event tracing sched:sched_stat_blocked.

Originally-from: Curu Wong <curuwang@tencent.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
To: linux-kernel@vger.kernel.org
To: Valentin Schneider <vschneid@redhat.com>
To: Daniel Bristot de Oliveira <bristot@redhat.com>
To: Mel Gorman <mgorman@suse.de>
To: Ben Segall <bsegall@google.com>
To: Steven Rostedt <rostedt@goodmis.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
To: Juri Lelli <juri.lelli@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
To: Ingo Molnar <mingo@redhat.com>
---
 include/linux/sched.h   | 4 ++++
 kernel/sched/deadline.c | 2 +-
 kernel/sched/fair.c     | 2 +-
 kernel/sched/rt.c       | 2 +-
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..341e62255ea7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -140,6 +140,10 @@ struct user_event_mm;
 #define is_special_task_state(state)				\
 	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | TASK_DEAD))
 
+/* blocked task is UNINTERRUPTIBLE but not NOLOAD */
+#define is_blocked_task_state(state)				\
+	((state) & TASK_UNINTERRUPTIBLE && (!((state) & TASK_NOLOAD)))
+
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 # define debug_normal_state_change(state_value)				\
 	do {								\
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b28114478b82..b6afa596f071 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
 			__schedstat_set(p->stats.sleep_start,
 					rq_clock(rq_of_dl_rq(dl_rq)));
 
-		if (state & TASK_UNINTERRUPTIBLE)
+		if (is_blocked_task_state(state))
 			__schedstat_set(p->stats.block_start,
 					rq_clock(rq_of_dl_rq(dl_rq)));
 	}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7a3c63a2171..349b0c5104b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1281,7 +1281,7 @@ update_stats_dequeue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int fl
 		if (state & TASK_INTERRUPTIBLE)
 			__schedstat_set(tsk->stats.sleep_start,
 				      rq_clock(rq_of(cfs_rq)));
-		if (state & TASK_UNINTERRUPTIBLE)
+		if (is_blocked_task_state(state))
 			__schedstat_set(tsk->stats.block_start,
 				      rq_clock(rq_of(cfs_rq)));
 	}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6aaf0a3d6081..2fdf3d71428d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1375,7 +1375,7 @@ update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
 			__schedstat_set(p->stats.sleep_start,
 					rq_clock(rq_of_rt_rq(rt_rq)));
 
-		if (state & TASK_UNINTERRUPTIBLE)
+		if (is_blocked_task_state(state))
 			__schedstat_set(p->stats.block_start,
 					rq_clock(rq_of_rt_rq(rt_rq)));
 	}
-- 
2.43.0
Re: [PATCH] sched/tracing: correct the task blocking state
Posted by Valentin Schneider 1 year, 11 months ago
On 02/01/24 15:33, alexs@kernel.org wrote:
> From: Alex Shi <alexs@kernel.org>
>
> commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> stopped the idle kthreads contribution to loadavg. Also task idle should
> separated from blocked state too, otherwise we will get incorrect task
> blocking state from event tracing sched:sched_stat_blocked.
>

Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
loadavg yes, but they are still in an UNINTERRUPTIBLE wait.

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b28114478b82..b6afa596f071 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
>                       __schedstat_set(p->stats.sleep_start,
>                                       rq_clock(rq_of_dl_rq(dl_rq)));
>
> -		if (state & TASK_UNINTERRUPTIBLE)
> +		if (is_blocked_task_state(state))
>                       __schedstat_set(p->stats.block_start,
>                                       rq_clock(rq_of_dl_rq(dl_rq)));

This change makes it so tasks waiting in TASK_IDLE have their waiting
ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).
Re: [PATCH] sched/tracing: correct the task blocking state
Posted by Alex Shi 1 year, 11 months ago
On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 02/01/24 15:33, alexs@kernel.org wrote:
> > From: Alex Shi <alexs@kernel.org>
> >
> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > stopped the idle kthreads contribution to loadavg. Also task idle should
> > separated from blocked state too, otherwise we will get incorrect task
> > blocking state from event tracing sched:sched_stat_blocked.
> >
>
> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
>
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index b28114478b82..b6afa596f071 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> >                       __schedstat_set(p->stats.sleep_start,
> >                                       rq_clock(rq_of_dl_rq(dl_rq)));
> >
> > -             if (state & TASK_UNINTERRUPTIBLE)
> > +             if (is_blocked_task_state(state))
> >                       __schedstat_set(p->stats.block_start,
> >                                       rq_clock(rq_of_dl_rq(dl_rq)));
>
> This change makes it so tasks waiting in TASK_IDLE have their waiting
> ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).

Right, I will fix it by adding idle time into sleep. will send the 2nd
version patch.

Thanks
Alex
Re: [PATCH] sched/tracing: correct the task blocking state
Posted by Alex Shi 1 year, 11 months ago
On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 02/01/24 15:33, alexs@kernel.org wrote:
> > From: Alex Shi <alexs@kernel.org>
> >
> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > stopped the idle kthreads contribution to loadavg. Also task idle should
> > separated from blocked state too, otherwise we will get incorrect task
> > blocking state from event tracing sched:sched_stat_blocked.
> >
>
> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.


Hi Valentin,
Thanks a lot for the reply.

I agree with you the current usage, but if so, we account for the idle task into
blocked state. And it's better to distinguish between idle and block.

>
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index b28114478b82..b6afa596f071 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> >                       __schedstat_set(p->stats.sleep_start,
> >                                       rq_clock(rq_of_dl_rq(dl_rq)));
> >
> > -             if (state & TASK_UNINTERRUPTIBLE)
> > +             if (is_blocked_task_state(state))
> >                       __schedstat_set(p->stats.block_start,
> >                                       rq_clock(rq_of_dl_rq(dl_rq)));
>
> This change makes it so tasks waiting in TASK_IDLE have their waiting
> ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).
>

Right, maybe it's time to add a new state for TASK_IDLE?

Many thanks!
Alex Shi
Re: [PATCH] sched/tracing: correct the task blocking state
Posted by Valentin Schneider 1 year, 11 months ago
On 02/01/24 21:00, Alex Shi wrote:
> On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> On 02/01/24 15:33, alexs@kernel.org wrote:
>> > From: Alex Shi <alexs@kernel.org>
>> >
>> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
>> > stopped the idle kthreads contribution to loadavg. Also task idle should
>> > separated from blocked state too, otherwise we will get incorrect task
>> > blocking state from event tracing sched:sched_stat_blocked.
>> >
>>
>> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
>> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
>> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
>
>
> Hi Valentin,
> Thanks a lot for the reply.
>
> I agree with you the current usage, but if so, we account for the idle task into
> blocked state. And it's better to distinguish between idle and block.
>

Why is that an issue? If those tasks didn't have to be
TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').

What problem are you facing with those tasks being flagged as blocked during
their wait?
Re: [PATCH] sched/tracing: correct the task blocking state
Posted by Alex Shi 1 year, 11 months ago
On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 02/01/24 21:00, Alex Shi wrote:
> > On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <vschneid@redhat.com> wrote:
> >>
> >> On 02/01/24 15:33, alexs@kernel.org wrote:
> >> > From: Alex Shi <alexs@kernel.org>
> >> >
> >> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> >> > stopped the idle kthreads contribution to loadavg. Also task idle should
> >> > separated from blocked state too, otherwise we will get incorrect task
> >> > blocking state from event tracing sched:sched_stat_blocked.
> >> >
> >>
> >> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> >> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> >> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
> >
> >
> > Hi Valentin,
> > Thanks a lot for the reply.
> >
> > I agree with you the current usage, but if so, we account for the idle task into
> > blocked state. And it's better to distinguish between idle and block.
> >
>
> Why is that an issue? If those tasks didn't have to be
> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
>
> What problem are you facing with those tasks being flagged as blocked during
> their wait?
>

Uh, Tencent cloud has some latency sensitive services, a blocked state
means the service has
 some trouble, but with IDLE state involved, it's failed on this judgement.
and 2nd, if a service has abnormal, we want to check if it's hanging
on io or sth else, but the top
3 D tasks are often queuework in our system, and even a task in
blocked state we have no
quick way to figure out if it's IDLE or BLOCKED.  2 different states
will help us a lot.

Thanks!
Re: [PATCH] sched/tracing: correct the task blocking state
Posted by Valentin Schneider 1 year, 11 months ago
On 03/01/24 11:14, Alex Shi wrote:
> On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> Why is that an issue? If those tasks didn't have to be
>> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
>> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
>>
>> What problem are you facing with those tasks being flagged as blocked during
>> their wait?
>>
>
> Uh, Tencent cloud has some latency sensitive services, a blocked state
> means the service has
>  some trouble, but with IDLE state involved, it's failed on this judgement.
> and 2nd, if a service has abnormal, we want to check if it's hanging
> on io or sth else, but the top
> 3 D tasks are often queuework in our system, and even a task in
> blocked state we have no
> quick way to figure out if it's IDLE or BLOCKED.  2 different states
> will help us a lot.
>

That's useful information - generally it's good to add the motivation for a
patch in the changelog.

> Thanks!
Re: [PATCH] sched/tracing: correct the task blocking state
Posted by Alex Shi 1 year, 11 months ago
On Wed, Jan 3, 2024 at 9:56 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 03/01/24 11:14, Alex Shi wrote:
> > On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <vschneid@redhat.com> wrote:
> >>
> >> Why is that an issue? If those tasks didn't have to be
> >> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
> >> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
> >>
> >> What problem are you facing with those tasks being flagged as blocked during
> >> their wait?
> >>
> >
> > Uh, Tencent cloud has some latency sensitive services, a blocked state
> > means the service has
> >  some trouble, but with IDLE state involved, it's failed on this judgement.
> > and 2nd, if a service has abnormal, we want to check if it's hanging
> > on io or sth else, but the top
> > 3 D tasks are often queuework in our system, and even a task in
> > blocked state we have no
> > quick way to figure out if it's IDLE or BLOCKED.  2 different states
> > will help us a lot.
> >
>
> That's useful information - generally it's good to add the motivation for a
> patch in the changelog.

Thanks a lot for coaching, I changed the commit log in the v2 patch:
https://lore.kernel.org/lkml/20240103081042.1549189-1-alexs@kernel.org/
Btw, which way is usual prefered, send v2 patch in a separate thread
or add '--in-reply-to' to follow this thread?

Thanks
Alex

>
> > Thanks!
>
Re: [PATCH] sched/tracing: correct the task blocking state
Posted by Valentin Schneider 1 year, 11 months ago
On 04/01/24 15:56, Alex Shi wrote:
> On Wed, Jan 3, 2024 at 9:56 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> That's useful information - generally it's good to add the motivation for a
>> patch in the changelog.
>
> Thanks a lot for coaching, I changed the commit log in the v2 patch:
> https://lore.kernel.org/lkml/20240103081042.1549189-1-alexs@kernel.org/
> Btw, which way is usual prefered, send v2 patch in a separate thread
> or add '--in-reply-to' to follow this thread?
>

New threads are usually better, it makes it clear that it is a new version
of the patch and not just a reply to the previous version's
thread.

Reviewers/Maintainers can also jump more easily on the v(n+1) when it's a
separate thread, in case they missed / didn't have time to review the
previous version.

> Thanks
> Alex
>
>>
>> > Thanks!
>>