From: K Prateek Nayak <kprateek.nayak@amd.com>
Currently, there does not exist a straightforward way to extract the
names of the sched domains and match them to the per-cpu domain entry in
/proc/schedstat other than looking at the debugfs files which are only
visible after enabling "verbose" debug after commit 34320745dfc9
("sched/debug: Put sched/domains files under the verbose flag")
Since tools like `perf sched schedstat` require displaying per-domain
information in user friendly manner, display the names of sched domain,
alongside their level in /proc/schedstat if CONFIG_SCHED_DEBUG is enabled.
Domain names also makes the /proc/schedstat data unambiguous when some
of the cpus are offline. For example, on a 128 cpus AMD Zen3 machine
where CPU0 and CPU64 are SMT siblings and CPU64 is offline:
Before:
cpu0 ...
domain0 ...
domain1 ...
cpu1 ...
domain0 ...
domain1 ...
domain2 ...
After:
cpu0 ...
domain0:MC ...
domain1:PKG ...
cpu1 ...
domain0:SMT ...
domain1:MC ...
domain2:PKG ...
schedstat version has not been updated since this change merely adds
additional information to the domain name field and does not add a new
field altogether.
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: James Clark <james.clark@linaro.org>
---
Documentation/scheduler/sched-stats.rst | 8 ++++++--
kernel/sched/stats.c | 6 +++++-
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/Documentation/scheduler/sched-stats.rst b/Documentation/scheduler/sched-stats.rst
index 7c2b16c4729d..b60a3e7bc108 100644
--- a/Documentation/scheduler/sched-stats.rst
+++ b/Documentation/scheduler/sched-stats.rst
@@ -6,6 +6,8 @@ Version 16 of schedstats changed the order of definitions within
'enum cpu_idle_type', which changed the order of [CPU_MAX_IDLE_TYPES]
columns in show_schedstat(). In particular the position of CPU_IDLE
and __CPU_NOT_IDLE changed places. The size of the array is unchanged.
+With CONFIG_SCHED_DEBUG enabled, the domain field can also print the
+name of the corresponding sched domain.
Version 15 of schedstats dropped counters for some sched_yield:
yld_exp_empty, yld_act_empty and yld_both_empty. Otherwise, it is
@@ -71,9 +73,11 @@ Domain statistics
-----------------
One of these is produced per domain for each cpu described. (Note that if
CONFIG_SMP is not defined, *no* domains are utilized and these lines
-will not appear in the output.)
+will not appear in the output. [:<name>] is an optional extension to the domain
+field that prints the name of the corresponding sched domain. It can appear in
+schedstat version 16 and above, and requires CONFIG_SCHED_DEBUG.)
-domain<N> <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
+domain<N>[:<name>] <cpumask> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
The first field is a bit mask indicating what cpus this domain operates over.
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index eb0cdcd4d921..bd4ed737e894 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -138,7 +138,11 @@ static int show_schedstat(struct seq_file *seq, void *v)
for_each_domain(cpu, sd) {
enum cpu_idle_type itype;
- seq_printf(seq, "domain%d %*pb", dcount++,
+ seq_printf(seq, "domain%d", dcount++);
+#ifdef CONFIG_SCHED_DEBUG
+ seq_printf(seq, ":%s", sd->name);
+#endif
+ seq_printf(seq, " %*pb",
cpumask_pr_args(sched_domain_span(sd)));
for (itype = 0; itype < CPU_MAX_IDLE_TYPES; itype++) {
seq_printf(seq, " %u %u %u %u %u %u %u %u",
--
2.43.0
On Fri, Nov 22, 2024 at 08:44:47AM +0000, Swapnil Sapkal wrote: > schedstat version has not been updated since this change merely adds > additional information to the domain name field and does not add a new > field altogether. So I don't care much either way, but if an existing tool is trying to parse the domain number, it might now get confused by the extra characters. I know of no such tool, just being pedantic.
Hello Peter, On 11/22/2024 4:42 PM, Peter Zijlstra wrote: > On Fri, Nov 22, 2024 at 08:44:47AM +0000, Swapnil Sapkal wrote: > >> schedstat version has not been updated since this change merely adds >> additional information to the domain name field and does not add a new >> field altogether. > > So I don't care much either way, but if an existing tool is trying to > parse the domain number, it might now get confused by the extra > characters. I know of no such tool, just being pedantic. I will change the schedstat version so that it will not break any existing tools. However the domain name is under CONFIG_SCHED_DEBUG and thus incrementing the schedstats version is not sufficient when this config option is disabled. Is it okay for you to move domain name out of CONFIG_SCHED_DEBUG? Previously, I sent couple of patches to change the schedstat layout for * Correctly accounting the different types of imbalances [1] * Accurately computing the task hot migrations [2] (This was your patch) I would like to include these two patches that better helps justify the version change. Is that okay by you? [1] https://lore.kernel.org/all/20240514044445.1409976-1-swapnil.sapkal@amd.com/ [2] https://lore.kernel.org/all/20230614102224.12555-1-swapnil.sapkal@amd.com/ -- Thanks and Regards, Swapnil
On Mon, Nov 25, 2024 at 06:01:16PM +0530, Sapkal, Swapnil wrote: > Hello Peter, > > On 11/22/2024 4:42 PM, Peter Zijlstra wrote: > > On Fri, Nov 22, 2024 at 08:44:47AM +0000, Swapnil Sapkal wrote: > > > > > schedstat version has not been updated since this change merely adds > > > additional information to the domain name field and does not add a new > > > field altogether. > > > > So I don't care much either way, but if an existing tool is trying to > > parse the domain number, it might now get confused by the extra > > characters. I know of no such tool, just being pedantic. > > I will change the schedstat version so that it will not break any existing > tools. > > However the domain name is under CONFIG_SCHED_DEBUG and thus incrementing > the schedstats version is not sufficient when this config option is > disabled. Is it okay for you to move domain name out of CONFIG_SCHED_DEBUG? I suppose so.. > Previously, I sent couple of patches to change the schedstat layout for > * Correctly accounting the different types of imbalances [1] > * Accurately computing the task hot migrations [2] (This was your patch) > I would like to include these two patches that better helps justify the > version change. Is that okay by you? Yeah, sure.
On 11/22/2024 2:14 PM, Swapnil Sapkal wrote:
> From: K Prateek Nayak <kprateek.nayak@amd.com>
>
> Currently, there does not exist a straightforward way to extract the
> names of the sched domains and match them to the per-cpu domain entry in
> /proc/schedstat other than looking at the debugfs files which are only
> visible after enabling "verbose" debug after commit 34320745dfc9
> ("sched/debug: Put sched/domains files under the verbose flag")
>
> Since tools like `perf sched schedstat` require displaying per-domain
> information in user friendly manner, display the names of sched domain,
> alongside their level in /proc/schedstat if CONFIG_SCHED_DEBUG is enabled.
>
> Domain names also makes the /proc/schedstat data unambiguous when some
> of the cpus are offline. For example, on a 128 cpus AMD Zen3 machine
> where CPU0 and CPU64 are SMT siblings and CPU64 is offline:
>
> Before:
> cpu0 ...
> domain0 ...
> domain1 ...
> cpu1 ...
> domain0 ...
> domain1 ...
> domain2 ...
>
> After:
> cpu0 ...
> domain0:MC ...
> domain1:PKG ...
> cpu1 ...
> domain0:SMT ...
> domain1:MC ...
> domain2:PKG ...
>
> schedstat version has not been updated since this change merely adds
> additional information to the domain name field and does not add a new
> field altogether.
>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
On Fri, Nov 22, 2024 at 02:25:22PM +0530, Sapkal, Swapnil wrote:
> On 11/22/2024 2:14 PM, Swapnil Sapkal wrote:
> > From: K Prateek Nayak <kprateek.nayak@amd.com>
> >
> > Currently, there does not exist a straightforward way to extract the
> > names of the sched domains and match them to the per-cpu domain entry in
> > /proc/schedstat other than looking at the debugfs files which are only
> > visible after enabling "verbose" debug after commit 34320745dfc9
> > ("sched/debug: Put sched/domains files under the verbose flag")
> >
> > Since tools like `perf sched schedstat` require displaying per-domain
> > information in user friendly manner, display the names of sched domain,
> > alongside their level in /proc/schedstat if CONFIG_SCHED_DEBUG is enabled.
> >
> > Domain names also makes the /proc/schedstat data unambiguous when some
> > of the cpus are offline. For example, on a 128 cpus AMD Zen3 machine
> > where CPU0 and CPU64 are SMT siblings and CPU64 is offline:
> >
> > Before:
> > cpu0 ...
> > domain0 ...
> > domain1 ...
> > cpu1 ...
> > domain0 ...
> > domain1 ...
> > domain2 ...
> >
> > After:
> > cpu0 ...
> > domain0:MC ...
> > domain1:PKG ...
> > cpu1 ...
> > domain0:SMT ...
> > domain1:MC ...
> > domain2:PKG ...
> >
> > schedstat version has not been updated since this change merely adds
> > additional information to the domain name field and does not add a new
> > field altogether.
> >
> > Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> > Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> > Tested-by: James Clark <james.clark@linaro.org>
>
> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
Surely you mean either acked-by or reviewed-by ? Otherwise I suggest you
re-read the documentation on tags.
Hi Peter,
On 11/22/2024 4:38 PM, Peter Zijlstra wrote:
> On Fri, Nov 22, 2024 at 02:25:22PM +0530, Sapkal, Swapnil wrote:
>> On 11/22/2024 2:14 PM, Swapnil Sapkal wrote:
>>> From: K Prateek Nayak <kprateek.nayak@amd.com>
>>>
>>> Currently, there does not exist a straightforward way to extract the
>>> names of the sched domains and match them to the per-cpu domain entry in
>>> /proc/schedstat other than looking at the debugfs files which are only
>>> visible after enabling "verbose" debug after commit 34320745dfc9
>>> ("sched/debug: Put sched/domains files under the verbose flag")
>>>
>>> Since tools like `perf sched schedstat` require displaying per-domain
>>> information in user friendly manner, display the names of sched domain,
>>> alongside their level in /proc/schedstat if CONFIG_SCHED_DEBUG is enabled.
>>>
>>> Domain names also makes the /proc/schedstat data unambiguous when some
>>> of the cpus are offline. For example, on a 128 cpus AMD Zen3 machine
>>> where CPU0 and CPU64 are SMT siblings and CPU64 is offline:
>>>
>>> Before:
>>> cpu0 ...
>>> domain0 ...
>>> domain1 ...
>>> cpu1 ...
>>> domain0 ...
>>> domain1 ...
>>> domain2 ...
>>>
>>> After:
>>> cpu0 ...
>>> domain0:MC ...
>>> domain1:PKG ...
>>> cpu1 ...
>>> domain0:SMT ...
>>> domain1:MC ...
>>> domain2:PKG ...
>>>
>>> schedstat version has not been updated since this change merely adds
>>> additional information to the domain name field and does not add a new
>>> field altogether.
>>>
>>> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
>>> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
>>> Tested-by: James Clark <james.clark@linaro.org>
>>
>> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
>
> Surely you mean either acked-by or reviewed-by ? Otherwise I suggest you
> re-read the documentation on tags.
I took over the series from Ravi but forgot to add my S-o-b. I will make
sure to include it from next time onward.
--
Thanks and Regards,
Swapnil
© 2016 - 2026 Red Hat, Inc.