[PATCH v2 1/6] sched/stats: Print domain name in /proc/schedstat

Swapnil Sapkal posted 6 patches 1 week ago
[PATCH v2 1/6] sched/stats: Print domain name in /proc/schedstat
Posted by Swapnil Sapkal 1 week ago
From: K Prateek Nayak <kprateek.nayak@amd.com>

Currently, there does not exist a straightforward way to extract the
names of the sched domains and match them to the per-cpu domain entry in
/proc/schedstat other than looking at the debugfs files which are only
visible after enabling "verbose" debug after commit 34320745dfc9
("sched/debug: Put sched/domains files under the verbose flag")

Since tools like `perf sched schedstat` require displaying per-domain
information in user friendly manner, display the names of sched domain,
alongside their level in /proc/schedstat if CONFIG_SCHED_DEBUG is enabled.

Domain names also makes the /proc/schedstat data unambiguous when some
of the cpus are offline. For example, on a 128 cpus AMD Zen3 machine
where CPU0 and CPU64 are SMT siblings and CPU64 is offline:

Before:
    cpu0 ...
    domain0 ...
    domain1 ...
    cpu1 ...
    domain0 ...
    domain1 ...
    domain2 ...

After:
    cpu0 ...
    domain0:MC ...
    domain1:PKG ...
    cpu1 ...
    domain0:SMT ...
    domain1:MC ...
    domain2:PKG ...

schedstat version has not been updated since this change merely adds
additional information to the domain name field and does not add a new
field altogether.

Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: James Clark <james.clark@linaro.org>
---
 Documentation/scheduler/sched-stats.rst | 8 ++++++--
 kernel/sched/stats.c                    | 6 +++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index 7c2b16c4729d..b60a3e7bc108 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -6,6 +6,8 @@ Version 16 of schedstats changed the order of definitions within
 'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
 columns in show_schedstat(). In particular the position of CPU_IDLE
 and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
+With CONFIG_SCHED_DEBUG enabled, the domain field can also print the
+name of the corresponding sched domain.
 
 Version 15 of schedstats dropped counters for some sched_yield:
 yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
@@ -71,9 +73,11 @@ Domain statistics
 -----------------
 One of these is produced per domain for each cpu described. (Note that if
 CONFIG_SMP is not defined, *no* domains are utilized and these lines
-will not appear in the output.)
+will not appear in the output. [:<name>] is an optional extension to the domain
+field that prints the name of the corresponding sched domain. It can appear in
+schedstat version 16 and above, and requires CONFIG_SCHED_DEBUG.)
 
-domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
+domain<N>[:<name>] <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
 
 The first field is a bit mask indicating what cpus this domain operates over.
 
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index eb0cdcd4d921..bd4ed737e894 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -138,7 +138,11 @@ static int show_schedstat(struct seq_file *seq, void *v)
 		for_each_domain(cpu, sd) {
 			enum cpu_idle_type itype;
 
-			seq_printf(seq, "domain%d %*pb", dcount++,
+			seq_printf(seq, "domain%d", dcount++);
+#ifdef CONFIG_SCHED_DEBUG
+			seq_printf(seq, ":%s", sd->name);
+#endif
+			seq_printf(seq, " %*pb",
 				   cpumask_pr_args(sched_domain_span(sd)));
 			for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
 				seq_printf(seq, " %u %u %u %u %u %u %u %u",
-- 
2.43.0
Re: [PATCH v2 1/6] sched/stats: Print domain name in /proc/schedstat
Posted by Peter Zijlstra 1 week ago
On Fri, Nov 22, 2024 at 08:44:47AM +0000, Swapnil Sapkal wrote:

> schedstat version has not been updated since this change merely adds
> additional information to the domain name field and does not add a new
> field altogether.

So I don't care much either way, but if an existing tool is trying to
parse the domain number, it might now get confused by the extra
characters. I know of no such tool, just being pedantic.
Re: [PATCH v2 1/6] sched/stats: Print domain name in /proc/schedstat
Posted by Sapkal, Swapnil 4 days, 6 hours ago
Hello Peter,

On 11/22/2024 4:42 PM, Peter Zijlstra wrote:
> On Fri, Nov 22, 2024 at 08:44:47AM +0000, Swapnil Sapkal wrote:
> 
>> schedstat version has not been updated since this change merely adds
>> additional information to the domain name field and does not add a new
>> field altogether.
> 
> So I don't care much either way, but if an existing tool is trying to
> parse the domain number, it might now get confused by the extra
> characters. I know of no such tool, just being pedantic.

I will change the schedstat version so that it will not break any 
existing tools.

However the domain name is under CONFIG_SCHED_DEBUG and thus 
incrementing the schedstats version is not sufficient when this config 
option is disabled. Is it okay for you to move domain name out of 
CONFIG_SCHED_DEBUG?

Previously, I sent couple of patches to change the schedstat layout for
* Correctly accounting the different types of imbalances [1]
* Accurately computing the task hot migrations [2] (This was your patch)
I would like to include these two patches that better helps justify the 
version change. Is that okay by you?

[1] 
https://lore.kernel.org/all/20240514044445.1409976-1-swapnil.sapkal@amd.com/
[2] 
https://lore.kernel.org/all/20230614102224.12555-1-swapnil.sapkal@amd.com/

--
Thanks and Regards,
Swapnil
Re: [PATCH v2 1/6] sched/stats: Print domain name in /proc/schedstat
Posted by Peter Zijlstra 4 days, 4 hours ago
On Mon, Nov 25, 2024 at 06:01:16PM +0530, Sapkal, Swapnil wrote:
> Hello Peter,
> 
> On 11/22/2024 4:42 PM, Peter Zijlstra wrote:
> > On Fri, Nov 22, 2024 at 08:44:47AM +0000, Swapnil Sapkal wrote:
> > 
> > > schedstat version has not been updated since this change merely adds
> > > additional information to the domain name field and does not add a new
> > > field altogether.
> > 
> > So I don't care much either way, but if an existing tool is trying to
> > parse the domain number, it might now get confused by the extra
> > characters. I know of no such tool, just being pedantic.
> 
> I will change the schedstat version so that it will not break any existing
> tools.
> 
> However the domain name is under CONFIG_SCHED_DEBUG and thus incrementing
> the schedstats version is not sufficient when this config option is
> disabled. Is it okay for you to move domain name out of CONFIG_SCHED_DEBUG?

I suppose so..

> Previously, I sent couple of patches to change the schedstat layout for
> * Correctly accounting the different types of imbalances [1]
> * Accurately computing the task hot migrations [2] (This was your patch)
> I would like to include these two patches that better helps justify the
> version change. Is that okay by you?

Yeah, sure.
Re: [sos-linux-ext-patches] [PATCH v2 1/6] sched/stats: Print domain name in /proc/schedstat
Posted by Sapkal, Swapnil 1 week ago
On 11/22/2024 2:14 PM, Swapnil Sapkal wrote:
> From: K Prateek Nayak <kprateek.nayak@amd.com>
> 
> Currently, there does not exist a straightforward way to extract the
> names of the sched domains and match them to the per-cpu domain entry in
> /proc/schedstat other than looking at the debugfs files which are only
> visible after enabling "verbose" debug after commit 34320745dfc9
> ("sched/debug: Put sched/domains files under the verbose flag")
> 
> Since tools like `perf sched schedstat` require displaying per-domain
> information in user friendly manner, display the names of sched domain,
> alongside their level in /proc/schedstat if CONFIG_SCHED_DEBUG is enabled.
> 
> Domain names also makes the /proc/schedstat data unambiguous when some
> of the cpus are offline. For example, on a 128 cpus AMD Zen3 machine
> where CPU0 and CPU64 are SMT siblings and CPU64 is offline:
> 
> Before:
>      cpu0 ...
>      domain0 ...
>      domain1 ...
>      cpu1 ...
>      domain0 ...
>      domain1 ...
>      domain2 ...
> 
> After:
>      cpu0 ...
>      domain0:MC ...
>      domain1:PKG ...
>      cpu1 ...
>      domain0:SMT ...
>      domain1:MC ...
>      domain2:PKG ...
> 
> schedstat version has not been updated since this change merely adds
> additional information to the domain name field and does not add a new
> field altogether.
> 
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> Tested-by: James Clark <james.clark@linaro.org>

Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
Re: [sos-linux-ext-patches] [PATCH v2 1/6] sched/stats: Print domain name in /proc/schedstat
Posted by Peter Zijlstra 1 week ago
On Fri, Nov 22, 2024 at 02:25:22PM +0530, Sapkal, Swapnil wrote:
> On 11/22/2024 2:14 PM, Swapnil Sapkal wrote:
> > From: K Prateek Nayak <kprateek.nayak@amd.com>
> > 
> > Currently, there does not exist a straightforward way to extract the
> > names of the sched domains and match them to the per-cpu domain entry in
> > /proc/schedstat other than looking at the debugfs files which are only
> > visible after enabling "verbose" debug after commit 34320745dfc9
> > ("sched/debug: Put sched/domains files under the verbose flag")
> > 
> > Since tools like `perf sched schedstat` require displaying per-domain
> > information in user friendly manner, display the names of sched domain,
> > alongside their level in /proc/schedstat if CONFIG_SCHED_DEBUG is enabled.
> > 
> > Domain names also makes the /proc/schedstat data unambiguous when some
> > of the cpus are offline. For example, on a 128 cpus AMD Zen3 machine
> > where CPU0 and CPU64 are SMT siblings and CPU64 is offline:
> > 
> > Before:
> >      cpu0 ...
> >      domain0 ...
> >      domain1 ...
> >      cpu1 ...
> >      domain0 ...
> >      domain1 ...
> >      domain2 ...
> > 
> > After:
> >      cpu0 ...
> >      domain0:MC ...
> >      domain1:PKG ...
> >      cpu1 ...
> >      domain0:SMT ...
> >      domain1:MC ...
> >      domain2:PKG ...
> > 
> > schedstat version has not been updated since this change merely adds
> > additional information to the domain name field and does not add a new
> > field altogether.
> > 
> > Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> > Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> > Tested-by: James Clark <james.clark@linaro.org>
> 
> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>

Surely you mean either acked-by or reviewed-by ? Otherwise I suggest you
re-read the documentation on tags.
Re: [sos-linux-ext-patches] [PATCH v2 1/6] sched/stats: Print domain name in /proc/schedstat
Posted by Sapkal, Swapnil 4 days, 14 hours ago
Hi Peter,

On 11/22/2024 4:38 PM, Peter Zijlstra wrote:
> On Fri, Nov 22, 2024 at 02:25:22PM +0530, Sapkal, Swapnil wrote:
>> On 11/22/2024 2:14 PM, Swapnil Sapkal wrote:
>>> From: K Prateek Nayak <kprateek.nayak@amd.com>
>>>
>>> Currently, there does not exist a straightforward way to extract the
>>> names of the sched domains and match them to the per-cpu domain entry in
>>> /proc/schedstat other than looking at the debugfs files which are only
>>> visible after enabling "verbose" debug after commit 34320745dfc9
>>> ("sched/debug: Put sched/domains files under the verbose flag")
>>>
>>> Since tools like `perf sched schedstat` require displaying per-domain
>>> information in user friendly manner, display the names of sched domain,
>>> alongside their level in /proc/schedstat if CONFIG_SCHED_DEBUG is enabled.
>>>
>>> Domain names also makes the /proc/schedstat data unambiguous when some
>>> of the cpus are offline. For example, on a 128 cpus AMD Zen3 machine
>>> where CPU0 and CPU64 are SMT siblings and CPU64 is offline:
>>>
>>> Before:
>>>       cpu0 ...
>>>       domain0 ...
>>>       domain1 ...
>>>       cpu1 ...
>>>       domain0 ...
>>>       domain1 ...
>>>       domain2 ...
>>>
>>> After:
>>>       cpu0 ...
>>>       domain0:MC ...
>>>       domain1:PKG ...
>>>       cpu1 ...
>>>       domain0:SMT ...
>>>       domain1:MC ...
>>>       domain2:PKG ...
>>>
>>> schedstat version has not been updated since this change merely adds
>>> additional information to the domain name field and does not add a new
>>> field altogether.
>>>
>>> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>>> Tested-by: James Clark <james.clark@linaro.org>
>>
>> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
> 
> Surely you mean either acked-by or reviewed-by ? Otherwise I suggest you
> re-read the documentation on tags.

I took over the series from Ravi but forgot to add my S-o-b. I will make 
sure to include it from next time onward.

--
Thanks and Regards,
Swapnil