[PATCH 2/9] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()

Ingo Molnar posted 9 patches 1 year, 11 months ago
There is a newer version of this series
[PATCH 2/9] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()
Posted by Ingo Molnar 1 year, 11 months ago
From: Shrikanth Hegde <sshegde@linux.ibm.com>

Shrikanth Hegde reported that show_schedstat() output broke when
the ordering of the definitions in 'enum cpu_idle_type' is changed,
because show_schedstat() assumed that 'CPU_IDLE' is 0.

Fix it before we change the definition order & values.

[ mingo: Added changelog. ]

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Valentin Schneider <vschneid@redhat.com>
---
 kernel/sched/stats.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..85277953cc72 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -150,8 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
 
 			seq_printf(seq, "domain%d %*pb", dcount++,
 				   cpumask_pr_args(sched_domain_span(sd)));
-			for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
-					itype++) {
+			for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
 				seq_printf(seq, " %u %u %u %u %u %u %u %u",
 				    sd->lb_count[itype],
 				    sd->lb_balanced[itype],
-- 
2.40.1
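
To illustrate why the loop start matters, here is a minimal userspace sketch. It assumes a hypothetical reordering in which CPU_NOT_IDLE takes value 0; the R_-prefixed enum and slots_visited() are illustrative names, not kernel code. A loop that starts at the CPU_IDLE enumerator then skips slot 0, while a loop that starts at 0 still visits every slot:

```c
#include <assert.h>

/*
 * Hypothetical reordering, for illustration only: CPU_NOT_IDLE takes
 * slot 0, so CPU_IDLE is no longer the first enumerator.
 */
enum reordered_idle_type {
	R_CPU_NOT_IDLE,
	R_CPU_IDLE,
	R_CPU_NEWLY_IDLE,
	R_CPU_MAX_IDLE_TYPES
};

/* Number of array slots visited by a loop that starts at 'first'. */
unsigned int slots_visited(int first)
{
	unsigned int n = 0;
	int itype;

	assert(first >= 0);
	for (itype = first; itype < R_CPU_MAX_IDLE_TYPES; itype++)
		n++;

	return n;
}
```

Under this assumed ordering, slots_visited(R_CPU_IDLE) covers only two of the three slots, whereas slots_visited(0) covers all three regardless of how the enumerators are arranged.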
Re: [PATCH 2/9] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()
Posted by Shrikanth Hegde 1 year, 11 months ago

On 3/4/24 3:18 PM, Ingo Molnar wrote:
> From: Shrikanth Hegde <sshegde@linux.ibm.com>
> 
> Shrikanth Hegde reported that show_schedstat() output broke when
> the ordering of the definitions in 'enum cpu_idle_type' is changed,
> because show_schedstat() assumed that 'CPU_IDLE' is 0.
>
Hi Ingo. 
Feel free to drop me from the changelog. 

> @@ -150,8 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
>  
>  			seq_printf(seq, "domain%d %*pb", dcount++,
>  				   cpumask_pr_args(sched_domain_span(sd)));
> -			for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
> -					itype++) {
> +			for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {


The output would still not be in the same order as the current schedstat documentation
describes, no? The documentation would need changes too. Should SCHEDSTAT_VERSION be bumped to 16?

 
The current documentation says:
--------------------
The next 24 are a variety of load_balance() statistics in grouped into types
of idleness (idle, busy, and newly idle):

Once the enum is reordered, the loop above will instead print them as:
(busy, idle, and newly idle)


--------------------
I verified this with the v3 patch as well, using the same method as before.
Before patch:
cpu0 0 0 4400 1485 1624 1229 301472313236 120382198 7714    
				   [-------- idle --------][-----------busy--------][-------new-idle--]                    
domain0 00000000,00000000,00000055 1661 1661 0 0 0 0 0 1661 2495 2495 0 0 0 0 0 2495 67 66 1 2 0 0 0 66 0 0 0 0 0 0 0 0 0 133 38 0
domain1 ff000000,00ff0000,ffffffff 382 369 13 13 4 0 2 207 198 195 3 36 0 0 0 195 67 64 3 3 0 0 0 64 4 0 4 0 0 0 0 0 0 124 9 0
domain2 ff00ffff,00ffffff,ffffffff 586 585 1 6 0 0 0 365 118 116 2 96 0 0 0 116 67 67 0 0 0 0 0 67 0 0 0 0 0 0 0 0 0 59 0 0
domain3 ffffffff,ffffffff,ffffffff 481 479 2 58 0 0 0 387 97 97 0 0 0 0 0 96 67 67 0 0 0 0 0 67 0 0 0 0 0 0 0 0 0 79 0 0

After patch:
cpu0 0 0 3924 728 1940 1540 302019558490 425784368 8793
				   [-------- busy ----------][-----------idle--------][-------new-idle--]   
domain0 00000000,00000000,00000055 2494 2489 3 37 2 0 0 2489 1691 1691 0 0 0 0 0 1691 21 19 0 2 2 0 0 19 0 0 0 0 0 0 0 0 0 89 2 0
domain1 ff000000,00ff0000,ffffffff 196 193 3 44 0 0 0 193 411 400 10 2060 4 1 4 260 19 16 3 1028 0 0 0 16 3 0 3 0 0 0 0 0 0 59 2 0
domain2 ff00ffff,00ffffff,ffffffff 116 116 0 0 0 0 0 116 590 588 2 3 0 0 0 447 19 18 1 2 0 0 0 18 0 0 0 0 0 0 0 0 0 192 0 0
domain3 ffffffff,ffffffff,ffffffff 97 97 0 0 0 0 0 96 457 457 0 0 0 0 0 427 19 18 1 27 0 0 0 18 0 0 0 0 0 0 0 0 0 60 0 0
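
For tooling that has to cope with either ordering, a rough sketch of splitting a domain line into its three groups of eight load_balance() counters might look like this. The function name parse_domain_lb() and the two macros are made up for this example; the only layout assumed is the one visible in the dumps above (a domainN token, a cpumask token, then 24 counters). Which idle type each group belongs to is left to the caller, since that is exactly what differs between the two outputs:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_IDLE_TYPES	3	/* groups per domain line */
#define NR_LB_FIELDS	8	/* counters per group */

/*
 * Parse "domainN <cpumask> <24 counters> ..." and store the counters as
 * out[group][field]. Returns 0 on success, -1 if the line is too short.
 */
int parse_domain_lb(const char *line,
		    unsigned long out[NR_IDLE_TYPES][NR_LB_FIELDS])
{
	const char *p = line;
	char *end;
	int t, f;

	assert(line && out);

	/* Skip the "domainN" and cpumask tokens. */
	for (t = 0; t < 2; t++) {
		p += strspn(p, " \t");
		if (*p == '\0')
			return -1;
		p += strcspn(p, " \t");
	}

	for (t = 0; t < NR_IDLE_TYPES; t++) {
		for (f = 0; f < NR_LB_FIELDS; f++) {
			out[t][f] = strtoul(p, &end, 10);
			if (end == p)	/* no digits left to convert */
				return -1;
			p = end;
		}
	}

	return 0;
}
```

Fed the domain0 line from the "After patch" output, group 0 starts with 2494, group 1 with 1691 and group 2 with 21. The parser itself cannot tell which of those is the busy group, which is why the ordering has to come from the version number or from the documentation.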



>  				seq_printf(seq, " %u %u %u %u %u %u %u %u",
>  				    sd->lb_count[itype],
>  				    sd->lb_balanced[itype],
Re: [PATCH 2/9] sched/balancing: Remove reliance on 'enum cpu_idle_type' ordering when iterating [CPU_MAX_IDLE_TYPES] arrays in show_schedstat()
Posted by Ingo Molnar 1 year, 11 months ago
* Shrikanth Hegde <sshegde@linux.ibm.com> wrote:

> 
> 
> On 3/4/24 3:18 PM, Ingo Molnar wrote:
> > From: Shrikanth Hegde <sshegde@linux.ibm.com>
> > 
> > Shrikanth Hegde reported that show_schedstat() output broke when
> > the ordering of the definitions in 'enum cpu_idle_type' is changed,
> > because show_schedstat() assumed that 'CPU_IDLE' is 0.
> >
> Hi Ingo. 
> Feel free to drop me from the changelog. 

Yeah - I made you the author of the commit, and indeed it should not refer 
to you in the third person. :-) Fixed.

> 
> > @@ -150,8 +150,7 @@ static int show_schedstat(struct seq_file *seq, void *v)
> >  
> >  			seq_printf(seq, "domain%d %*pb", dcount++,
> >  				   cpumask_pr_args(sched_domain_span(sd)));
> > -			for (itype = CPU_IDLE; itype < CPU_MAX_IDLE_TYPES;
> > -					itype++) {
> > +			for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
> 
> 
> The output would still not be in the same order as the current schedstat
> documentation describes, no? The documentation would need changes too.
> Should SCHEDSTAT_VERSION be bumped to 16?

Correct. I've bumped SCHEDSTAT_VERSION up to 16 now, but since it hasn't 
been changed for the last 10+ years I'm wondering whether that's the right 
thing to do, or whether we should add a quirk to maintain the v15 ordering?

I think we should also output the actual symbolic cpu_idle_type names into 
schedstat, so that tooling (and observant kernel developers) can see the 
actual ordering of the [CPU_MAX_IDLE_TYPES] columns.

A new line like this (mockup):

  cpu0 0 0 4400 1485 1624 1229 301472313236 120382198 7714    
+ cpu_idle_type CPU_IDLE 0 CPU_NOT_IDLE 1 CPU_NEWLY_IDLE 2 CPU_MAX_IDLE_TYPES 3
  domain0 00000000,00000000,00000055 1661 1661 0 0 0 0 0 1661 2495 2495 0 0 0 0 0 2495 67 66 1 2 0 0 0 66 0 0 0 0 0 0 0 0 0 133 38 0

... and after the change this would become:

  cpu_idle_type CPU_NOT_IDLE 0 CPU_IDLE 1 CPU_NEWLY_IDLE 2 CPU_MAX_IDLE_TYPES 3

or so?

This gives tooling (that cares) a way to enumerate the idle types, without 
having to rely on their numeric values. Adding a new line to schedstat 
shouldn't break existing tooling - and if it does, we've increased 
SCHEDSTAT_VERSION to 16 anyway. ;-)
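
As a sketch of how tooling could consume such a line: the format below is only the mockup from this mail, not something the kernel emits today, and idle_type_index() is a hypothetical helper name. It looks up the column index of a symbolic idle type by name instead of hardcoding its numeric value:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up 'name' in a line of the (mocked-up) form
 *   "cpu_idle_type NAME0 0 NAME1 1 ..."
 * Returns the number paired with 'name', or -1 if the line does not
 * start with "cpu_idle_type" or the name is not listed.
 */
int idle_type_index(const char *line, const char *name)
{
	const char *p = line;
	char tok[64];
	int val, used;

	assert(line && name);

	if (sscanf(p, "%63s%n", tok, &used) != 1 ||
	    strcmp(tok, "cpu_idle_type") != 0)
		return -1;
	p += used;

	/* Walk the NAME VALUE pairs until the name matches or input ends. */
	while (sscanf(p, "%63s %d%n", tok, &val, &used) == 2) {
		if (strcmp(tok, name) == 0)
			return val;
		p += used;
	}

	return -1;
}
```

With the first mockup line, looking up "CPU_IDLE" yields 0; with the reordered one it yields 1, so a parser written this way would keep picking the right group of columns across the change.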

Thanks,

	Ingo