Add a clarification that domain levels are system-specific
and where to check for system details.

Add CPU clusters to the scheduler domain levels table.

Signed-off-by: Vitalii Bursov <vitaly@bursov.com>
---
Documentation/admin-guide/cgroup-v1/cpusets.rst | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst
index 7d3415eea..d16a3967d 100644
--- a/Documentation/admin-guide/cgroup-v1/cpusets.rst
+++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst
@@ -568,19 +568,25 @@ on the next tick. For some applications in special situation, waiting

The 'cpuset.sched_relax_domain_level' file allows you to request changing
this searching range as you like. This file takes int value which
-indicates size of searching range in levels ideally as follows,
+indicates size of searching range in levels approximately as follows,
otherwise initial value -1 that indicates the cpuset has no request.

====== ===========================================================
-1 no request. use system default or follow request of others.
0 no search.
1 search siblings (hyperthreads in a core).
- 2 search cores in a package.
- 3 search cpus in a node [= system wide on non-NUMA system]
- 4 search nodes in a chunk of node [on NUMA system]
- 5 search system wide [on NUMA system]
+ 2 search cpu clusters
+ 3 search cores in a package.
+ 4 search cpus in a node [= system wide on non-NUMA system]
+ 5 search nodes in a chunk of node [on NUMA system]
+ 6 search system wide [on NUMA system]
====== ===========================================================

+Not all levels can be present and values can change depending on the
+system architecture and kernel configuration. Check
+/sys/kernel/debug/sched/domains/cpu*/domain*/ for system-specific
+details.
+
The system default is architecture dependent. The system default
can be changed using the relax_domain_level= boot parameter.
--
2.20.1
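The new paragraph sends readers to /sys/kernel/debug/sched/domains/cpu*/domain*/. A minimal Python sketch of that check follows. It assumes debugfs is mounted at /sys/kernel/debug and that the script runs as root. It also assumes each domainN directory holds a `name` file and, on newer kernels, a `level` file; the set of files there varies by kernel version, so both are treated as optional. `list_domains` is a hypothetical helper, not a kernel or library API.

```python
import os
import re
import sys
from typing import List, Optional, Tuple

# Path named in the patch; assumes debugfs is mounted at /sys/kernel/debug.
DEFAULT_BASE = "/sys/kernel/debug/sched/domains"


def _read_text(path: str) -> Optional[str]:
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def list_domains(cpu: int,
                 base: str = DEFAULT_BASE) -> List[Tuple[int, str, Optional[int]]]:
    """List (directory index, name, level) for every domainN directory of one CPU.

    Hypothetical helper. Assumes each domainN directory may contain a "name"
    file and a "level" file; a missing name is reported as "?" and a missing
    or non-numeric level as None.
    """
    cpu_dir = os.path.join(base, f"cpu{cpu}")
    domains = []
    for entry in os.listdir(cpu_dir):
        match = re.fullmatch(r"domain(\d+)", entry)
        if match is None:
            continue  # ignore anything that is not a domainN directory
        domain_dir = os.path.join(cpu_dir, entry)
        name = _read_text(os.path.join(domain_dir, "name")) or "?"
        level_text = _read_text(os.path.join(domain_dir, "level"))
        level = int(level_text) if level_text and level_text.isdigit() else None
        domains.append((int(match.group(1)), name, level))
    # Sort by the numeric index so that domain10 follows domain9.
    domains.sort(key=lambda d: d[0])
    return domains


if __name__ == "__main__":
    cpu = 0  # inspect CPU 0; change as needed
    try:
        for index, name, level in list_domains(cpu):
            shown = "?" if level is None else str(level)
            print(f"cpu{cpu}/domain{index}: name={name} level={shown}")
    except OSError as exc:
        # Typically: debugfs not mounted, not running as root, or no such CPU.
        print(f"cannot read sched domains: {exc}", file=sys.stderr)
```

The domainN directory index counts only the domains attached to that CPU, so it can differ from the level number the relax value is compared with: on the systems listed later in this thread, for example, the levels are 0, 2 and 3 rather than 0, 1 and 2.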
On 3/31/24 9:31 PM, Vitalii Bursov wrote:
> Add a clarification that domain levels are system-specific
> and where to check for system details.
>
> Add CPU clusters to the scheduler domain levels table.
>
> Signed-off-by: Vitalii Bursov <vitaly@bursov.com>
> ---
[...]
> ====== ===========================================================
> -1 no request. use system default or follow request of others.
> 0 no search.
> 1 search siblings (hyperthreads in a core).
> - 2 search cores in a package.
> - 3 search cpus in a node [= system wide on non-NUMA system]
> - 4 search nodes in a chunk of node [on NUMA system]
> - 5 search system wide [on NUMA system]
> + 2 search cpu clusters
> + 3 search cores in a package.
> + 4 search cpus in a node [= system wide on non-NUMA system]
> + 5 search nodes in a chunk of node [on NUMA system]
> + 6 search system wide [on NUMA system]

I think the above block of documentation need not change. SD_CLUSTER is a software
construct, not a sched domain per se.

IMO the next paragraph that is added is good enough and the above change can be removed.

> ====== ===========================================================
>
> +Not all levels can be present and values can change depending on the
> +system architecture and kernel configuration. Check
> +/sys/kernel/debug/sched/domains/cpu*/domain*/ for system-specific
> +details.
> +
> The system default is architecture dependent. The system default
> can be changed using the relax_domain_level= boot parameter.
>
On 01.04.24 07:05, Shrikanth Hegde wrote:
>
>
> On 3/31/24 9:31 PM, Vitalii Bursov wrote:
>> Add a clarification that domain levels are system-specific
>> and where to check for system details.
>>
>> Add CPU clusters to the scheduler domain levels table.
>>
>> Signed-off-by: Vitalii Bursov <vitaly@bursov.com>
>> ---
[...]
>> ====== ===========================================================
>> -1 no request. use system default or follow request of others.
>> 0 no search.
>> 1 search siblings (hyperthreads in a core).
>> - 2 search cores in a package.
>> - 3 search cpus in a node [= system wide on non-NUMA system]
>> - 4 search nodes in a chunk of node [on NUMA system]
>> - 5 search system wide [on NUMA system]
>> + 2 search cpu clusters
>> + 3 search cores in a package.
>> + 4 search cpus in a node [= system wide on non-NUMA system]
>> + 5 search nodes in a chunk of node [on NUMA system]
>> + 6 search system wide [on NUMA system]
>
> I think above block of documentation need not change. SD_CLUSTER is a software
> construct, not a sched domain per se.
>

I added "cpu clusters" because the original table:
====== ===========================================================
-1 no request. use system default or follow request of others.
0 no search.
1 search siblings (hyperthreads in a core).
2 search cores in a package.
3 search cpus in a node [= system wide on non-NUMA system]
4 search nodes in a chunk of node [on NUMA system]
5 search system wide [on NUMA system]
====== ===========================================================
does not match what I see on a few systems I checked.

AMD Ryzen and the same dual-CPU Intel server with NUMA disabled:
level:0 - SMT
level:2 - MC
level:3 - PKG

Server with NUMA enabled:
level:0 - SMT
level:2 - MC
level:5 - NUMA

So, for the relax level original table:
1 -> enables 0 SMT -> OK
2 -> enables 1 unknown -> does not enable cores in a package
3 -> enables 2 MC -> OK for NUMA, but not system wide on non-NUMA system
5 -> enables 4 unknown -> does not enable system wide on NUMA

The updated table
====== ===========================================================
-1 no request. use system default or follow request of others.
0 no search.
1 search siblings (hyperthreads in a core).
2 search cpu clusters
3 search cores in a package.
4 search cpus in a node [= system wide on non-NUMA system]
5 search nodes in a chunk of node [on NUMA system]
6 search system wide [on NUMA system]
====== ===========================================================
would work like this:
1 -> enables 0 SMT -> OK
2 -> enables 1 unknown -> does nothing new
3 -> enables 2 MC -> OK, cores in a package for NUMA and non-NUMA system
4 -> enables 3 PKG -> OK on non-NUMA system
6 -> enables 5 NUMA -> OK

I think it would look more correct on "average" systems, but anyway,
please confirm and I'll remove the table update in an updated patch.

Thanks

> IMO the next paragraph that is added is good enough and the above change can be removed.
>
>> ====== ===========================================================
>>
>> +Not all levels can be present and values can change depending on the
>> +system architecture and kernel configuration. Check
>> +/sys/kernel/debug/sched/domains/cpu*/domain*/ for system-specific
>> +details.
>> +
>> The system default is architecture dependent. The system default
>> can be changed using the relax_domain_level= boot parameter.
>>
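The walkthrough above reads a request value N as turning on the search in every domain whose level is below N (1 enables level 0, 3 enables levels up to 2, and so on). A small Python model of that reading, applied to the two level listings reported in the email, is below. It is a sketch of the behaviour described in this thread, not the kernel's implementation; `enabled_domains`, `NUMA_DISABLED` and `NUMA_ENABLED` are hypothetical names, and the model rejects -1 ("no request") because the system default is architecture dependent.

```python
from typing import Dict, List

# Level listings reported in the thread: level number -> domain name.
NUMA_DISABLED = {0: "SMT", 2: "MC", 3: "PKG"}
NUMA_ENABLED = {0: "SMT", 2: "MC", 5: "NUMA"}


def enabled_domains(levels: Dict[int, str], request: int) -> List[str]:
    """Return names of domains searched for a non-negative relax request.

    Models the walkthrough: request N enables every domain whose level is
    below N. Levels missing from `levels` (not built or degenerated) are
    simply never enabled.
    """
    if request < 0:
        # -1 means "no request": the system default applies, not modelled here.
        raise ValueError("negative requests fall back to the system default")
    return [levels[lvl] for lvl in sorted(levels) if lvl < request]


if __name__ == "__main__":
    for label, levels in (("NUMA disabled", NUMA_DISABLED),
                          ("NUMA enabled", NUMA_ENABLED)):
        print(label)
        for request in range(7):
            names = enabled_domains(levels, request)
            print(f"  {request}: {', '.join(names) or 'no search'}")
```

Under this model, request 2 adds nothing over request 1 on either system because no level-1 domain is listed, and on the NUMA-enabled server requests 3 through 5 all stop at MC because no PKG domain is present there.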
On 4/1/24 4:05 PM, Vitalii Bursov wrote:
>
>
> On 01.04.24 07:05, Shrikanth Hegde wrote:
>>
>>
>> On 3/31/24 9:31 PM, Vitalii Bursov wrote:
[...]
>>> + 6 search system wide [on NUMA system]
>>
>> I think above block of documentation need not change. SD_CLUSTER is a software
>> construct, not a sched domain per se.
>>
>
> I added "cpu clusters" because the original table:
> ====== ===========================================================
> -1 no request. use system default or follow request of others.
> 0 no search.
> 1 search siblings (hyperthreads in a core).
> 2 search cores in a package.
> 3 search cpus in a node [= system wide on non-NUMA system]
> 4 search nodes in a chunk of node [on NUMA system]
> 5 search system wide [on NUMA system]
> ====== ===========================================================
> does not match what I see on a few systems I checked.
>
> AMD Ryzen and the same dual-CPU Intel server with NUMA disabled:
> level:0 - SMT
> level:2 - MC
> level:3 - PKG
>
> Server with NUMA enabled:
> level:0 - SMT
> level:2 - MC
> level:5 - NUMA
>

None of these are "cpu clusters". From what I know, the descriptions of the above are:

SMT - multi-threads/hyperthreads
MC - Multi-Core
PKG - Package/Socket level
NUMA - Node level.

When you enable NUMA, PKG gets degenerated since the PKG mask and the NUMA mask would have been the same.

> So, for the relax level original table:
> 1 -> enables 0 SMT -> OK
> 2 -> enables 1 unknown -> does not enable cores in a package
> 3 -> enables 2 MC -> OK for NUMA, but not system wide on non-NUMA system
> 5 -> enables 4 unknown -> does not enable system wide on NUMA
>
> The updated table
> ====== ===========================================================
> -1 no request. use system default or follow request of others.
> 0 no search.
> 1 search siblings (hyperthreads in a core).
> 2 search cpu clusters
> 3 search cores in a package.
> 4 search cpus in a node [= system wide on non-NUMA system]
> 5 search nodes in a chunk of node [on NUMA system]
> 6 search system wide [on NUMA system]
> ====== ===========================================================
> would work like this:
> 1 -> enables 0 SMT -> OK
> 2 -> enables 1 unknown -> does nothing new
> 3 -> enables 2 MC -> OK, cores in a package for NUMA and non-NUMA system
> 4 -> enables 3 PKG -> OK on non-NUMA system

It won't; the PKG domain itself won't be there. It gets removed.

> 6 -> enables 5 NUMA -> OK
>
> I think it would look more correct on "average" systems, but anyway,
> please confirm and I'll remove the table update in an updated patch.
>

IMHO, the table need not be updated. Just adding a paragraph that points to the sysfs files is good enough.

> Thanks
>
>> IMO the next paragraph that is added is good enough and the above change can be removed.
>
[...]